Bug 546743 - [GTK] Remove Platform.lock
Summary: [GTK] Remove Platform.lock
Status: RESOLVED FIXED
Alias: None
Product: Platform
Classification: Eclipse Project
Component: SWT (show other bugs)
Version: 4.12   Edit
Hardware: PC Linux
: P3 normal (vote)
Target Milestone: 4.15 M3   Edit
Assignee: Nikita Nemkin CLA
QA Contact:
URL:
Whiteboard:
Keywords: triaged
Depends on:
Blocks: 553533 553535
  Show dependency tree
 
Reported: 2019-04-25 10:05 EDT by Alexandr Miloslavskiy CLA
Modified: 2021-08-26 02:20 EDT (History)
6 users (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Alexandr Miloslavskiy CLA 2019-04-25 10:05:38 EDT
Problems begin when some Java code gets called from inside event loop. For example, Shell's 'SWT.Close' event gets called from 'g_main_context_iteration'. For the entire duration of the code, lock is held, and all other threads will have to wait.

This for example means that on System.exit() java's shutdown hooks will deadlock with main thread if they call pretty much any 'OS' or 'GTK' code:
1) main thread called System.exit() and waits for shutdown hook to complete
2) shutdown hook (which is a different thread) tries to get 'Platform.lock' currently obtained in main thread which hold it in 'g_main_context_iteration'.

This caused problems for me in Bug 531634.

I did some research:
1) First version of GTK lock was implemented in this commit:
   4212fbaf by Christophe Cornu at 2003-02-06 22:16:42
   PR24372
2) There's a typo in commit message and it's actually Bug 23472
3) It seems that a small problem was fixed in overkill way by adding lock everywhere, which later caused some subsequent problems with deadlocks.
4) I measured the time spent in 'Lock.lock' in SmartGit, and it's quite low (~0.3%)
Comment 1 Alexandr Miloslavskiy CLA 2019-04-25 10:09:48 EDT
To clarify, I believe that lock can be removed from most native APIs.
Comment 2 Alexandr Miloslavskiy CLA 2019-04-25 10:10:26 EDT
On win32, SWT doesn't use locks in native APIs.
Comment 3 Eric Williams CLA 2019-04-25 10:23:18 EDT
Yes this has been lurking in the background for years. IIRC the locks were introduced in a time when GTK supported making calls from separate threads (GTK2, maybe even GTK1). That has been long gone in GTK3, so the locks are probably not needed.
Comment 4 Nikita Nemkin CLA 2020-01-15 04:56:25 EST
GTK3 thread safety requirements are similar to Win32/Cocoa and agree with restrictions posed by public SWT API. Blanket lock on every platform function is unnecessary.
Comment 5 Eclipse Genie CLA 2020-01-15 05:00:31 EST
New Gerrit change created: https://git.eclipse.org/r/155907
Comment 6 Alexandr Miloslavskiy CLA 2020-01-15 05:40:05 EST
Nikita, thanks for working on this!

I must admit, it seems that I didn't do my homework properly.

In Bug 531634, I ran into issues with the lock guarding GLib APIs.

Back then, I studied GLib and found that it doesn't need to be guarded:
----------------
https://developer.gnome.org/glib/2.32/glib-Threads.html

GLib itself is internally completely thread-safe (all global data is automatically locked), but individual data structure instances are not automatically locked for performance reasons.

For example, you must coordinate accesses to the same GHashTable from multiple threads.

The two notable exceptions from this rule are GMainLoop and GAsyncQueue, which are thread-safe and need no further application-level locking to be accessed from multiple threads.

Most refcounting functions such as g_object_ref() are also thread-safe. 
----------------

What I didn't do, though, is checking whether GTK needs to be guarded. As a result, I suggested a too broad idea to remove from most native APIs.

Today I decided to actually look into other components and found this:
----------------
https://developer.gnome.org/gdk3/stable/gdk3-Threads.html

GTK+, however, is not thread safe.

You should only use GTK+ and GDK from the thread gtk_init() and gtk_main() were called on.

This is usually referred to as the "main thread".
----------------

Which left me wondering. On one hand, it seems that GTK APIs are simply not allowed to run in other threads, which suggests that lock is useless. On the other hand, this made me feel somewhat uneasy.

It seems that you covered everything in your patch. Did you have a look whether those other things (like cairo) need to be guarded? What's your stance on GTK?

Maybe it would be reasonable to split the big patch into individual patches, one per component, and provide some docs in commit message?
Comment 7 Alexandr Miloslavskiy CLA 2020-01-15 06:09:08 EST
Hmm, now that I started thinking about it more carefully, GLib still wants locking for accessing individual things simultaneously. It's not that I expect SWT to read&write the same thing from multiple threads, but I also can't say that this is not possible.

In the perfect world, I would expect SWT to know/explain which methods can be called in a multi-threaded fashion and use 'synchronized' or locking properly. Most likely that wouldn't be on OS API level. Rather, small regions of SWT methods would be guarded: if for example SWT inserts data into native container and then reads the data, guarding on OS API level is not sufficient and could result in reading something else, because another thread managed to write already.

That leads me to another thought. I think I don't expect it to get any worse without API locks. Let's for example discuss disposing. Thread A wants to dispose GLib object, and thread B wants to do some action on it. Without SWT-side locking, having OS APIs under locks is not sufficient, because thread A will atomically dispose and then thread B will atomically access already-freed memory. This is already so bad that it couldn't get much worse.

Frankly, I don't exactly understand which code is allowed to run multi-threaded in SWT. If there is some, I'm beginning to be afraid that it could have relied on platform locks.
Comment 8 Nikita Nemkin CLA 2020-01-15 08:07:27 EST
SWT API that can be used from non-GUI threads:

1) Printer and its base class Device;
2) Image, Font, Path, TextLayout and other graphics objects;
3) GC on an Image or Printer;
4) Display.getDefault/getThread/syncExec/asyncExec/post/etc.

1-3 aren't thread safe, just thread-agnostic. User is responsible for locking as necessary.
4 is fully thread-safe, since it's meant to be called from non-GUI threads.
Everything else must be called from the main thread.

This fits pretty well with GTK requirements. Graphics use Pango and Cairo, widgets use GTK/GDK/X and everyone is happy.

In reality, GTK ports of Printer, Device and Image call some gtk_ and gdk_ functions. I have no idea if it's allowed at all and whether locking helps in this case. (Maybe Printer instances must be created on the main thread.)

Also there's a single FontConfig call (FcConfigAppFontAddFile) in Device.loadFont that I'm not sure if thread-safe or not. It would be prudent to protect it with Device.class lock.

Correctly written code, both SWT internal and external, doesn't need Platform.lock. I suggest that we remove the lock and fix API misuse as it's discovered.

PS. Alexandr you're right that per-function locking doesn't guarantee correcteness. Some (hypothetical) code blocks really need to be synchronized as a whole.
Comment 9 Eclipse Genie CLA 2020-01-16 05:17:06 EST
New Gerrit change created: https://git.eclipse.org/r/155989
Comment 10 Eclipse Genie CLA 2020-01-16 14:10:37 EST
New Gerrit change created: https://git.eclipse.org/r/156029
Comment 13 Lars Vogel CLA 2020-01-21 09:38:21 EST
Thanks, Nikita. 

Is the assumption correct that the resulting SWT code should be faster now that we do not use Platform.lock anymore?
Comment 14 Nikita Nemkin CLA 2020-01-21 09:58:16 EST
(In reply to Lars Vogel from comment #13)
> Thanks, Nikita. 
> 
> Is the assumption correct that the resulting SWT code should be faster now
> that we do not use Platform.lock anymore?

Yes, but the difference is likely unnoticeable.
Comment 15 Alexandr Miloslavskiy CLA 2020-01-21 09:59:19 EST
> 4) I measured the time spent in 'Lock.lock' in SmartGit, and it's quite low (~0.3%)
Comment 16 Lars Vogel CLA 2020-01-22 03:31:26 EST
As I do not see anything else related committed yesterday I think this change improved the startup time for the Eclipse IDE.

I can now start Eclipse SDK with Git and Markplace below 4 seconds.

eclipse -data ~/workspace/empty -debug

Debug options:
    file:/home/vogella/dev/eclipse-2020-01-21/eclipse/.options not found
Time to load bundles: 3
Starting application: 735
log4j:WARN No appenders could be found for logger (org.eclipse.jgit.internal.storage.file.FileSnapshot).
log4j:WARN Please initialize the log4j system properly.
Application started in : 3977ms

Thanks again Nikita for working on this.
Comment 17 Serban Ionica CLA 2020-09-28 08:35:56 EDT
Starting with Eclipse 4.15 I've encountered a lot of exotic crashes and dead-locks in an application I work on. After some digging to isolate the problem I've discovered that a previously working drawing engine was the source of the crashes. More specifically it was drawing a very complex Image off-GUI. When I implemented that engine I had the impression that Image and GC on an Image APIs are thread-agnostic and thread-safe (I'm not sure if this was documented or my conclusion was the result of experimentation).

I guess that removing the Platform.lock() also revealed that these APIs are not thread-safe (as Nikita said, the APIs aren't thread safe, just thread-agnostic).

My question is, how can I ensure thread-safety when drawing with GC on an Image from a non-GUI thread from user code when I can't control what gets called on the GUI thread (in an Eclipse plugin-based application any plugin can use the GUI thread at the same time with my off-GUI thread).

The only way I could achieve the synchronization was to move the code on the GUI thread, losing the benefit of off-GUI drawing. Are there any other options I could use?

If there are no other options, wouldn't it be a good idea to guard ALL SWT API with a SWT Exception "Invalid thread access" and just force the user to do everything on the GUI thread?

Finding the root of these crashes was not straightforward, the whole JVM dead-locked or crashed with a wide range of messages, none of them generating a hs_err_* file or something that could lead to the real problem and as any concurrency problem, it never crashed in the same place in the thread's stack.