Bug 564911 - [webkit2][GTK3] Deadlock when closing windows with an embedded webkit view
Status: NEW
Alias: None
Product: Platform
Classification: Eclipse Project
Component: SWT
Version: 4.14
Hardware: PC Linux
Importance: P3 normal
Target Milestone: ---
Assignee: Platform-SWT-Inbox CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-07-03 12:00 EDT by Alexander Diewald CLA
Modified: 2020-10-14 14:40 EDT (History)

See Also:


Attachments
Timeout-for-deadlock (1.56 KB, patch)
2020-07-03 12:00 EDT, Alexander Diewald CLA
Stack trace without JFX (5.78 KB, application/octet-stream)
2020-07-04 04:57 EDT, Alexander Diewald CLA
Stack Trace with JFX code loaded (5.92 KB, application/octet-stream)
2020-07-04 04:59 EDT, Alexander Diewald CLA

Description Alexander Diewald CLA 2020-07-03 12:00:04 EDT
Created attachment 283501
Timeout-for-deadlock

Hi there,

For some time now, I have observed a deadlock when I try to close an Eclipse window with an embedded WebKit2 view, but I only just got around to taking a closer look. Unfortunately, this disrupts the UX.

In this situation, Eclipse hangs in the method "execAsyncAndWaitForReturn" in the class WebKit: the call to "OS.g_main_context_iteration (0, false)" never returns. This happens at least with a Wayland-based DM where Eclipse runs atop XWayland.

Right now, I have created a patch that (in my personal view) fixes the current timeout implementation in the very same method. I have attached it to this bug report. I am sure that this is not the correct solution, but it helps at least a bit.
I would be grateful if someone who is into this code could have a look at this issue.
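
To illustrate the pattern discussed here, the following is a minimal, self-contained Java sketch of a timeout-bounded "pump until the callback arrives" wait. It is neither the SWT code nor the attached patch: the class BoundedWait, the method pumpUntil, and the Runnable that stands in for OS.g_main_context_iteration (0, false) are hypothetical names chosen for illustration only.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical stand-in for a "pump the event loop until a callback fires" wait. */
final class BoundedWait {

    /**
     * Runs one non-blocking loop iteration at a time until {@code done}
     * reports true or the deadline has passed.
     *
     * @return true if the callback arrived in time, false on timeout
     */
    static boolean pumpUntil(BooleanSupplier done, Runnable iterateOnce, long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (!done.getAsBoolean()) {
            // Overflow-safe deadline comparison for System.nanoTime().
            if (System.nanoTime() - deadline >= 0) {
                return false; // give up instead of waiting forever
            }
            // Assumption: stands in for a single non-blocking event loop iteration.
            iterateOnce.run();
        }
        return true;
    }

    public static void main(String[] args) {
        int[] ticks = {0};
        // The "callback" arrives after three iterations.
        boolean completed = pumpUntil(() -> ticks[0] >= 3, () -> ticks[0]++, 1_000);
        // The "callback" never arrives, so the wait ends once the deadline passes.
        boolean timedOut = !pumpUntil(() -> false, () -> { }, 50);
        System.out.println("completed=" + completed + ", timedOut=" + timedOut);
    }
}
```

Note that a deadline of this kind only bounds the wait if every single iteration returns. If one native iteration blocks, the deadline check is never reached, so a loop-level timeout alone may not be enough.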

I have verified that Webkit2 is used (Here: version 2.28.0).

My other system specs:
* Arch Linux
* Gnome Desktop v3.36.3
* Wayland

A colleague of mine did observe the bug using pure X11 as well, AFAIK.

I have found something very similar in FreeBSD: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=238844

TBH, I have no idea whether this should be fixed in eclipse or in Webkit-Gtk, so some advice would be very helpful. I could not observe the issue with other applications embedding a Webkit view.


Best regards,

Alexander
Comment 1 Andrey Loskutov CLA 2020-07-03 12:09:46 EDT
@Alexander, can you please push a Gerrit patch for review and also provide a jstack from the hang case.
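
If attaching jstack from outside is inconvenient, a comparable Java-level dump can be produced from inside the JVM. The sketch below is only an illustration: the class ThreadDump and the method dumpAllThreads are hypothetical names, and it relies solely on the standard Thread.getAllStackTraces() call. It shows Java frames only (native frames still need a tool such as pstack), and in a hang it would have to be invoked from a thread other than the blocked UI thread.

```java
import java.util.Map;

/** Hypothetical helper that renders a jstack-like dump of all live Java threads. */
final class ThreadDump {

    static String dumpAllThreads() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
            Thread thread = entry.getKey();
            // Header line: quoted thread name plus its current state.
            sb.append('"').append(thread.getName()).append('"')
              .append(" state=").append(thread.getState()).append('\n');
            // One line per Java stack frame, innermost frame first.
            for (StackTraceElement frame : entry.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dumpAllThreads());
    }
}
```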

Funnily enough, I've reported a similar hang on closing an Eclipse window internally, in our WebKit-based application, but so far I've had no time to analyze it.

My stack, as I saw it in YourKit, was (upper part):

  org.eclipse.swt.internal.gtk.OS.g_main_context_iteration(long, boolean) OS.java (native)
  org.eclipse.swt.browser.WebKit$Webkit2AsyncToSync.execAsyncAndWaitForReturn(Browser, Consumer, String) WebKit.java:1410
  org.eclipse.swt.browser.WebKit$Webkit2AsyncToSync.runjavascript(String, Browser, long) WebKit.java:1179
  org.eclipse.swt.browser.WebKit$Webkit2AsyncToSync.evaluate(String, Browser, long) WebKit.java:1127
  org.eclipse.swt.browser.WebKit.evaluate(String) WebKit.java:1445
  org.eclipse.swt.browser.WebKit.close(boolean) WebKit.java:952
  org.eclipse.swt.browser.WebKit.onDispose(Event) WebKit.java:1934
  org.eclipse.swt.browser.WebKit.lambda$4(Event) WebKit.java:858
  org.eclipse.swt.browser.WebKit$$Lambda$1255.handleEvent(Event)
  org.eclipse.swt.widgets.EventTable.sendEvent(Event) EventTable.java:89
  org.eclipse.swt.widgets.Display.sendEvent(EventTable, Event) Display.java:7087
  org.eclipse.swt.widgets.Widget.sendEvent(Event) Widget.java:1452
  org.eclipse.swt.widgets.Widget.sendEvent(int, Event, boolean) Widget.java:1479
  org.eclipse.swt.widgets.Widget.sendEvent(int) Widget.java:1457
  org.eclipse.swt.widgets.Widget.release(boolean) Widget.java:1269
  org.eclipse.swt.widgets.Control.release(boolean) Control.java:4683
  org.eclipse.swt.widgets.Widget.dispose() Widget.java:541
Comment 2 Alexander Diewald CLA 2020-07-03 13:10:31 EDT
@Andrey
Thank you for the very quick reply!

I just noticed that I can directly copy the stack trace from the eclipse debugger... *facepalm* Up to now I effectively used them only when exceptions occurred. (Thanks for the jstack pointer!)

So, here it is (from a 2019-12 / 4.14 based application):
OS._g_main_context_iteration(long, boolean) line: not available [native method]	
OS.g_main_context_iteration(long, boolean) line: 1604	
WebKit$Webkit2AsyncToSync.execAsyncAndWaitForReturn(Browser, Consumer<Integer>, String) line: 1900	
WebKit$Webkit2AsyncToSync.runjavascript(String, Browser, long) line: 1802	
WebKit$Webkit2AsyncToSync.evaluate(String, Browser, long) line: 1750	
WebKit.evaluate(String) line: 1936	
WebKit.close(boolean) line: 1557	
WebKit.onDispose(Event) line: 2569	
WebKit.lambda$4(Event) line: 1318	
1735688275.handleEvent(Event) line: not available	
EventTable.sendEvent(Event) line: 89	
Display.sendEvent(EventTable, Event) line: 5676	
Browser(Widget).sendEvent(Event) line: 1423	
Browser(Widget).sendEvent(int, Event, boolean) line: 1449	
Browser(Widget).sendEvent(int) line: 1428	
Browser(Widget).release(boolean) line: 1240	
Browser(Control).release(boolean) line: 4628	
Composite.releaseChildren(boolean) line: 1504	
Composite(Widget).release(boolean) line: 1243	
Composite(Control).release(boolean) line: 4628	
Composite.releaseChildren(boolean) line: 1504	
Composite(Widget).release(boolean) line: 1243	
Composite(Control).release(boolean) line: 4628	
ContributedPartRenderer$1(Composite).releaseChildren(boolean) line: 1504	

It looks pretty similar to yours.

However, I found one very interesting thing when getting the stack trace:
The issue does not appear with a recent standard eclipse installation (2020-06) and its welcome page. I can reproduce the deadlock only with our RCP that uses the compatibility layer. It is based on eclipse 2019-12.
I also switched to the 4.14 maintenance branch in the standard Eclipse installation where I have the SWT plugins, and verified that Eclipse closes correctly there, too. Previously, I had only tested that the version with my patch applied actually works (which is not good, I know...).

While scanning the history, I could not see any fix after the 2019-12 release that would explain why the problem is resolved on the R4-14 maintenance branch.

The whole issue seems a bit more complicated than a pure WebKit/GTK problem. For now, I can only guess that the Compatibility Layer is somehow involved, but again, this is no more than a guess.


The patch I created can be found here:
https://git.eclipse.org/r/c/platform/eclipse.platform.swt/+/165810

But please note that this version seems to introduce a forced wait of 10 s, which I only just noticed.

I hope this helps.


Best,
Alex
Comment 3 Simeon Andreev CLA 2020-07-03 14:07:14 EDT
See the documentation of WebKit.runjavascript().

In the stack traces from Alexander and Andrey, the browser is running some close window javascript function. Andrey has listed those pstack entries in our internal tracker:

Thread 129 (Thread 0x7ffef4dfa700 (LWP 28520)):
#0  0x00007ffff72bea3d in poll () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007fff82d9932c in g_main_context_iterate.isra () at /lib64/libglib-2.0.so.0
#2  0x00007fff82d9967a in g_main_loop_run () at /lib64/libglib-2.0.so.0
#3  0x00007fff15e0d9d0 in WTF::RunLoop::run() () at /lib64/libjavascriptcoregtk-4.0.so.18
#4  0x00007fff15e0c65e in std::_Function_handler<void (), WTF::WorkQueue::platformInitialize(char const*, WTF::WorkQueue::Type, WTF::WorkQueue::QOS)::{lambda()#1}>::_M_invoke(std::_Any_data const&) () at /lib64/libjavascriptcoregtk-4.0.so.18
#5  0x00007fff15de73c5 in WTF::threadEntryPoint(void*) () at /lib64/libjavascriptcoregtk-4.0.so.18
#6  0x00007fff15e0b2ea in WTF::wtfThreadEntryPoint(void*) () at /lib64/libjavascriptcoregtk-4.0.so.18
#7  0x00007ffff79b0e25 in start_thread (arg=0x7ffef4dfa700) at pthread_create.c:308
#8  0x00007ffff72c934d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 128 (Thread 0x7ffef57fb700 (LWP 28519)):
#0  0x00007ffff72bea3d in poll () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007fff82d9932c in g_main_context_iterate.isra () at /lib64/libglib-2.0.so.0
#2  0x00007fff82d9967a in g_main_loop_run () at /lib64/libglib-2.0.so.0
#3  0x00007fff15e0d9d0 in WTF::RunLoop::run() () at /lib64/libjavascriptcoregtk-4.0.so.18
#4  0x00007fff15e0c65e in std::_Function_handler<void (), WTF::WorkQueue::platformInitialize(char const*, WTF::WorkQueue::Type, WTF::WorkQueue::QOS)::{lambda()#1}>::_M_invoke(std::_Any_data const&) () at /lib64/libjavascriptcoregtk-4.0.so.18
#5  0x00007fff15de73c5 in WTF::threadEntryPoint(void*) () at /lib64/libjavascriptcoregtk-4.0.so.18
#6  0x00007fff15e0b2ea in WTF::wtfThreadEntryPoint(void*) () at /lib64/libjavascriptcoregtk-4.0.so.18
#7  0x00007ffff79b0e25 in start_thread (arg=0x7ffef57fb700) at pthread_create.c:308
#8  0x00007ffff72c934d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 127 (Thread 0x7ffef75fe700 (LWP 28508)):
#0  0x00007ffff79b4cf2 in pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1  0x00007fff15de4753 in WTF::ParkingLot::parkConditionallyImpl(void const*, WTF::ScopedLambda<bool ()> const&, WTF::ScopedLambda<void ()> const&, std::chrono::time_point<std::chrono::_V2::steady_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >) () at /lib64/libjavascriptcoregtk-4.0.so.18
#2  0x00007fff15dd78f2 in WTF::sleep(double) () at /lib64/libjavascriptcoregtk-4.0.so.18
#3  0x00007fff0609693d in std::_Function_handler<void (), WebKit::MemoryPressureMonitor::MemoryPressureMonitor()::{lambda()#1}>::_M_invoke(std::_Any_data const&) () at /lib64/libwebkit2gtk-4.0.so.37
#4  0x00007fff15de73c5 in WTF::threadEntryPoint(void*) () at /lib64/libjavascriptcoregtk-4.0.so.18
#5  0x00007fff15e0b2ea in WTF::wtfThreadEntryPoint(void*) () at /lib64/libjavascriptcoregtk-4.0.so.18
#6  0x00007ffff79b0e25 in start_thread (arg=0x7ffef75fe700) at pthread_create.c:308
#7  0x00007ffff72c934d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 126 (Thread 0x7ffef7fff700 (LWP 28504)):
#0  0x00007ffff72bea3d in poll () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007fff82d9932c in g_main_context_iterate.isra () at /lib64/libglib-2.0.so.0
#2  0x00007fff82d9967a in g_main_loop_run () at /lib64/libglib-2.0.so.0
#3  0x00007fff15e0d9d0 in WTF::RunLoop::run() () at /lib64/libjavascriptcoregtk-4.0.so.18
#4  0x00007fff15e0c65e in std::_Function_handler<void (), WTF::WorkQueue::platformInitialize(char const*, WTF::WorkQueue::Type, WTF::WorkQueue::QOS)::{lambda()#1}>::_M_invoke(std::_Any_data const&) () at /lib64/libjavascriptcoregtk-4.0.so.18
#5  0x00007fff15de73c5 in WTF::threadEntryPoint(void*) () at /lib64/libjavascriptcoregtk-4.0.so.18
#6  0x00007fff15e0b2ea in WTF::wtfThreadEntryPoint(void*) () at /lib64/libjavascriptcoregtk-4.0.so.18
#7  0x00007ffff79b0e25 in start_thread (arg=0x7ffef7fff700) at pthread_create.c:308
#8  0x00007ffff72c934d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 125 (Thread 0x7ffefd1ff700 (LWP 28503)):
#0  0x00007ffff72bea3d in poll () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007fff82d9932c in g_main_context_iterate.isra () at /lib64/libglib-2.0.so.0
#2  0x00007fff82d9967a in g_main_loop_run () at /lib64/libglib-2.0.so.0
#3  0x00007fff15e0d9d0 in WTF::RunLoop::run() () at /lib64/libjavascriptcoregtk-4.0.so.18
#4  0x00007fff15e0c65e in std::_Function_handler<void (), WTF::WorkQueue::platformInitialize(char const*, WTF::WorkQueue::Type, WTF::WorkQueue::QOS)::{lambda()#1}>::_M_invoke(std::_Any_data const&) () at /lib64/libjavascriptcoregtk-4.0.so.18
#5  0x00007fff15de73c5 in WTF::threadEntryPoint(void*) () at /lib64/libjavascriptcoregtk-4.0.so.18
#6  0x00007fff15e0b2ea in WTF::wtfThreadEntryPoint(void*) () at /lib64/libjavascriptcoregtk-4.0.so.18
#7  0x00007ffff79b0e25 in start_thread (arg=0x7ffefd1ff700) at pthread_create.c:308
#8  0x00007ffff72c934d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113


At the very least, name-wise this one seems very weird for something to call during a close/dispose:

 WTF::WorkQueue::platformInitialize()

Since we have this also in our product, we can hopefully provide a reproducer. Probably not irrelevant in our case is: 2 Eclipse windows, SWT Browser in a status line contribution, hang is seen on closing the 2nd window.

Andrey, should we ping RHEL? This is not the only WebKit2-related issue we have, so it would at least be good to know whether anyone is maintaining the SWT WebKit code (and someone knowledgeable could give us pointers to speed up reproduction).
Comment 4 Alexander Diewald CLA 2020-07-03 14:43:24 EDT
Andrey asked me about steps to reproduce. Here they are:

1. Build an RCP that uses the compatibility layer (Inside eclipse is fine).
2. Ensure there is no runtime workspace for the upcoming launch of the RCP.
3. Start the RCP and notice the Welcome screen (it must be enabled in the RCP).
4. Close the Eclipse window, not the welcome view.
5. Eclipse will hang.

I doubt that this is useful for debugging because of point 1. However, if you would like a test case, you can follow the instructions at https://af3-developer.fortiss.org/projects/autofocus3/wiki/AF3_Developer_Installation. But this is anything but a minimal example...

I am not aware of any other triggers than that :/
Comment 5 Andrey Loskutov CLA 2020-07-03 14:50:47 EDT
(In reply to Simeon Andreev from comment #3)
> At the very least, name-wise this one seems very weird for something to call
> during a close/dispose:
> 
>  WTF::WorkQueue::platformInitialize()
> 
> Since we have this also in our product, we can hopefully provide a
> reproducer. Probably not irrelevant in our case is: 2 Eclipse windows, SWT
> Browser in a status line contribution, hang is seen on closing the 2nd
> window.

*If* platformInitialize() is something that happens on *starting* WebKit, then what probably happens is that we dispose the WebKit instance before it is properly started, so that the callback on close is triggered in some undefined state.

I remember that I created and closed windows shortly after each other, to see if *our* internal problem is somehow related to the number of opened/closed windows, and at some point the UI just froze.

> Andrey should we ping RHEL? This is not the only WebKit2 related issue we
> have, it would at least be good to know if anyone is maintaining the SWT
> WebKit code (and someone knowledgeable could give us pointers to speed up
> reproduction).

*Our* webkit is too old for RH, but Alexander is on the "latest greatest". 

If we can get simple steps to reproduce, we can open an RH ticket. I believe that should also work with an opened "internal web browser" or "javadoc" view. I guess: open javadoc, open a new window (it should have javadoc opened), close the old window, and repeat that until it hangs... Unfortunately *I* can't reproduce it with this sequence. Maybe an "empty" javadoc view doesn't immediately create a new browser; I haven't checked that yet.
Comment 6 Andrey Loskutov CLA 2020-07-03 15:47:48 EDT
See also bug 564464. May be related.
Comment 7 Andrey Loskutov CLA 2020-07-03 15:52:30 EDT
Bug 563990 could also be related (the hang, not the crash after).
Comment 8 Alexander Diewald CLA 2020-07-03 16:29:13 EDT
https://bugs.eclipse.org/bugs/show_bug.cgi?id=563990 could indeed be related. We are also using JavaFX. In our case we also use e(fx)clipse, but only a small portion of it.

This is essentially my second guess after the compatibility layer. Nevertheless, it *might* be the case that JFX interferes with the event handling. What about you, Andrey and Simeon? Is JFX used anywhere in your application?

Thanks a lot for the detailed information and the linked bugs. I hope we can get a clearer picture of what is going on here.
Comment 9 Alexander Diewald CLA 2020-07-04 04:51:16 EDT
It seems to me that using e(fx)clipse or JavaFX itself is causing the problem that I can observe.

In order to check this assumption from my last comment, I removed almost all UI plugins from our RCP and disabled all views with embedded JFX content. Eclipse did close properly in such a scenario when the welcome page is shown.
As a counterexample, I repeated the test and added a single view that initializes the e(fx)clipse/JFX code. Eclipse did hang again with the welcome page and the JFX-loading view open.
These two cases can be separated very well: If no e(fx)clipse/JFX code is loaded, the main Java Thread is named "main", whereas in the other case it is named "JavaFX Application Thread".
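
The thread-name distinction described above can be checked with a few lines of plain Java. This is only a sketch under stated assumptions: the class UiThreadCheck and its method are hypothetical names, and the only fact taken from the observation above is that the main thread is named "JavaFX Application Thread" once the e(fx)clipse/JFX code is loaded and "main" otherwise.

```java
/** Hypothetical helper that classifies a thread by the names reported above. */
final class UiThreadCheck {

    // Thread name observed in this report when e(fx)clipse/JFX code is loaded.
    static final String JFX_THREAD_NAME = "JavaFX Application Thread";

    /** Returns true if the given thread carries the JavaFX application thread name. */
    static boolean looksLikeJavaFxThread(Thread thread) {
        return JFX_THREAD_NAME.equals(thread.getName());
    }

    public static void main(String[] args) {
        Thread current = Thread.currentThread();
        System.out.println("Thread name: " + current.getName()
                + ", JFX-named: " + looksLikeJavaFxThread(current));
    }
}
```

Calling such a check from the UI thread (for example, inside a dispose listener) would show which of the two cases a given launch falls into; it says nothing about the cause of the hang itself.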

@Andrey, @Simeon: Are you using e(fx)clipse / JFX somewhere in your code? If not, I assume we have two different issues.

I will now try the FX container for the WebKit engine, embedded in a SWT Eclipse ViewPart to see if it helps somehow. Also, I'll abandon the hack in the gerrit instance.
Comment 10 Alexander Diewald CLA 2020-07-04 04:57:40 EDT
Created attachment 283507
Stack trace without JFX
Comment 11 Alexander Diewald CLA 2020-07-04 04:59:14 EDT
Created attachment 283508
Stack Trace with JFX code loaded

Note that there is no difference in the stack of the main thread, with JFX loaded and without it being loaded.
Comment 12 Simeon Andreev CLA 2020-07-04 05:25:55 EDT
(In reply to Alexander Diewald from comment #9)
> @Andrey, @Simeon: Are you using e(fx)clipse / JFX somewhere in your code?

I believe we don't use those.

> If not, I assume we have two different issues.

Or maybe two different paths to the same issue. We'll try to reproduce on our end and we'll see.
Comment 13 Alexander Diewald CLA 2020-07-04 08:18:05 EDT
(In reply to Simeon Andreev from comment #12)
> (In reply to Alexander Diewald from comment #9)
> > @Andrey, @Simeon: Are you using e(fx)clipse / JFX somewhere in your code?
> 
> I believe we don't use those.

Alright, thanks!

> 
> > If not, I assume we have two different issues.
> 
> Or maybe two different paths to the same issue. We'll try to reproduce on
> our end  and we'll see.

Possible, yes.

Using a JFX-embedded WebKit is unfortunately impossible due to the structure of the intro pages. For a simpler case, where we present some HTML-based help content using a custom implementation, I noticed that the FX/SWT Browser support is still experimental and not even released. In my opinion it makes no sense to go this route, as the effort is too large given the risk that the end result won't work well either.

Additionally, I tried to decouple the JavaFX Application Thread from the main thread, but Eclipse then shuts itself down after the splash screen and before presenting any GUI.

I ran out of ideas for now...