501997 – Investigate asynchronous installation/uninstallation of IRefreshMonitor on workspace resources

Bug 501997 - Investigate asynchronous installation/uninstallation of IRefreshMonitor on workspace resources

Summary: Investigate asynchronous installation/uninstallation of IRefreshMonitor on wo...

Status:	REOPENED

Alias:	None

Product:	Platform
Classification:	Eclipse Project
Component:	Resources (show other bugs)
Version:	4.6
Hardware:	PC Mac OS X

Importance:	P3 normal (vote)
Target Milestone:	---
Assignee:	Alex Blewitt
QA Contact:

URL:
Whiteboard:	performance
Keywords:

Depends on:	508419 514090 572128
Blocks:	507092 507812
	Show dependency tree

Reported:	2016-09-22 09:24 EDT by Mikaël Barbero
Modified:	2021-03-21 06:48 EDT (History)
CC List:	8 users (show)

See Also:	501306 Gerrit Change Git Commit Gerrit Change Gerrit Change Git Commit Git Commit Gerrit Change

Attachments
Class loaded during Workspace init (63.83 KB, text/plain) 2016-09-29 11:09 EDT, Mikaël Barbero	no flags	Details
View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Mikaël Barbero

2016-09-22 09:24:59 EDT

When Eclipse is started and the "Refresh using native hooks or polling" option is activated, the o.e.core.resources plugin activation is stopped by the installation of native hooks or polling (see bug 501306 comment 4). If it ever takes longer than 5 seconds, OSGi class loader will time out and Eclipse would not be properly initialized. We should run the installation asynchronously and call org.eclipse.core.internal.refresh.PollingMonitor.runOnce() after the installation in order to be sure we are fully synchronized (otherwise, we can loose some change events during the monitor installation).

Comment 1 Mikaël Barbero

2016-09-22 12:33:11 EDT

One interesting finding: 

We may not be able to "simply" install/uninstall refresh monitors asynchronously as it seems that AliasManager requires to be started after MonitorManager (see bug 94829). The bug does not explain all the details, but if the AliasManager requires the monitors to be installed then we will need to do both startup sequentially, in a single background task.

Comment 2 Eclipse Genie

2016-09-23 08:37:46 EDT

New Gerrit change created: https://git.eclipse.org/r/81786

Comment 3 Mikaël Barbero

2016-09-23 08:40:05 EDT

https://git.eclipse.org/r/81786 is a bold change: I've moved the whole workspace startup in a background process. I'm looking forward to seeing what you think about that.

Comment 4 Sergey Prigogin

2016-09-23 20:24:06 EDT

(In reply to Mikaël Barbero from comment #3)
Bold indeed! God only knows what kinds of deadlocks and other problems could happen because of this change. What is your plan for testing it?

Comment 5 Mikaël Barbero

2016-09-27 06:30:45 EDT

Deadlocks can happen if a thread is spawn during the background initialization of the workspace, if this thread calls ResourcesPlugin.getWorkspace() and the spawner joins on this later thread. 

AFAICT, all the workspace init code is internal and only the refresh manager delegates some processing to extensions (during startup). It means that apart from refresh manager, we have a hand on all the init code. 

Is it reasonable to add a constraint on ResourcesPlugin.getWorkspace() for internal init code and refresh providers? The constraint would be "do not call ResourcesPlugin.getWorkspace() from a different thread than your caller's and later join on this thread."

Also note that workspace jobs can still be scheduled during init because WorkManager locks the whole workspace during init.

For the testing, I simply made the init job very long to execute (sleeping for several seconds, at different locations in the init code) and checked that no error was raised. I also added a log message before joining the init job: 

getPlugin().getLog().log(new Status(IStatus.INFO, PI_RESOURCES, Thread.currentThread() + " blocked in #getWorkspace()", new Exception())); //$NON-NLS-1$

Only EMF showed up in this log:

Thread[main,6,main] blocked in #getWorkspace()
	at org.eclipse.core.resources.ResourcesPlugin.getWorkspace(ResourcesPlugin.java:425)
	at org.eclipse.emf.ecore.plugin.EcorePlugin.getWorkspaceRoot(EcorePlugin.java:1131)
	at org.eclipse.emf.ecore.resource.impl.ExtensibleURIConverterImpl.<clinit>(ExtensibleURIConverterImpl.java:393)
	at org.eclipse.emf.ecore.resource.impl.ResourceSetImpl.getURIConverter(ResourceSetImpl.java:499)
	at org.eclipse.emf.ecore.resource.impl.ResourceSetImpl.getResource(ResourceSetImpl.java:369)
	at org.eclipse.e4.ui.internal.workbench.ResourceHandler.getResource(ResourceHandler.java:286)
	at org.eclipse.e4.ui.internal.workbench.ResourceHandler.loadResource(ResourceHandler.java:262)
	at org.eclipse.e4.ui.internal.workbench.ResourceHandler.loadMostRecentModel(ResourceHandler.java:169)
	at org.eclipse.e4.ui.internal.workbench.swt.E4Application.loadApplicationModel(E4Application.java:378)
	at org.eclipse.e4.ui.internal.workbench.swt.E4Application.createE4Workbench(E4Application.java:253)
	at org.eclipse.ui.internal.Workbench$5.run(Workbench.java:626)
	[...]
	at org.eclipse.equinox.launcher.Main.main(Main.java:1492)

Comment 6 Sergey Prigogin

2016-09-27 13:58:08 EDT

(In reply to Mikaël Barbero from comment #5)
This testing looks like a good start.

> AFAICT, all the workspace init code is internal and only the refresh manager
> delegates some processing to extensions (during startup). It means that apart
> from refresh manager, we have a hand on all the init code.

This is not quite true since visible editors and views trigger activation of their host plugins and a lot of other things. Experience shows that initialization scenario varies widely depending on the set of open editors and views.  We definitely need more real life testing.

Comment 7 Mikaël Barbero

2016-09-29 11:09:43 EDT

Created attachment 264496 [details]
Class loaded during Workspace init

I've investigated a bit more. With the current master of o.e.c.resources, I've modified the ResourcesPlugin#start method to sysout markers and the beginning and the end of the ResourcesPlugin#start() method. Then, I've started a runtime with verbose:class and identified which classes are loaded between the beginning and the end of the workspace init (see attachment for the list). From this list, it is easy to extract which plugins are potentially activated. Here is the list of plugins:

org.eclipse.core.contenttype_3.6.0.v20160919-1839.jar
org.eclipse.core.filesystem
org.eclipse.core.jobs_3.8.0.v20160901-1938.jar
org.eclipse.core.resources
org.eclipse.core.runtime_3.12.0.v20160919-2241.jar
org.eclipse.equinox.common_3.9.0.v20160831-0037.jar
org.eclipse.equinox.preferences_3.6.100.v20160815-1421.jar

(o.e.c.resources and o.e.c.filesystem were in my workspace, hence the name)

So the risk of deadlocking is mainly in the activator of these plugins. Here is a summary of what the plugin activators do:

org.eclipse.core.contenttype => register service
org.eclipse.core.filesystem => do nothing
org.eclipse.core.jobs => register services
org.eclipse.core.runtime => most probably already started, and anyway it just get the application args
org.eclipse.equinox.common => register services (also most probably already started)
org.eclipse.equinox.preferences => register services

It looks safe. 

(In reply to Sergey Prigogin from comment #6)
> We definitely need more real life testing.

I'm running my eclipses with this patch for a week now, and experienced no deadlock. What do you suggest for more real life testing? What about integrating a patch with a timeout (20secs?) on the join and we throw an exception, with info about a potential deadlock between the job init thread and the current thread that has reached the timeout (both with stacktraces). I'm also willing to add a property (-Ddisable.concurrent.workspace.init=true) to deactivate the background init of the workspace in case it blocks people (and add this info in the exception message). WDYT?

Comment 8 Sergey Prigogin

2016-09-30 00:38:10 EDT

(In reply to Mikaël Barbero from comment #7)
The amount of testing you've done already is sufficient in my view for submitting this change to master even without the disable.concurrent.workspace.init red button.

Comment 9 Mikaël Barbero

2016-10-03 07:19:47 EDT

(In reply to Sergey Prigogin from comment #8)
> (In reply to Mikaël Barbero from comment #7)
> The amount of testing you've done already is sufficient in my view for
> submitting this change to master even without the
> disable.concurrent.workspace.init red button.

Thanks. I've updated the patch with a timed join that logs a warning when the workspace init takes too long.

Comment 10 Eclipse Genie

2016-10-07 16:38:56 EDT

Gerrit change https://git.eclipse.org/r/81786 was merged to [master].
Commit: http://git.eclipse.org/c/platform/eclipse.platform.resources.git/commit/?id=8d90fb030df310a974aae5d27d0bc2610c1a14ad

Comment 11 Sergey Prigogin

2016-10-07 16:39:55 EDT

Crossing my fingers.

Comment 12 Andrey Loskutov

2016-11-05 04:59:50 EDT

(In reply to Sergey Prigogin from comment #11)
> Crossing my fingers.

I'm sorry, crossing fingers didn't help, see bug 507092 :-(

Comment 13 Sergey Prigogin

2017-03-22 20:31:55 EDT

Reopened due to the issue described in https://bugs.eclipse.org/bugs/show_bug.cgi?id=514090#c4

Comment 14 Eclipse Genie

2017-03-22 20:38:47 EDT

New Gerrit change created: https://git.eclipse.org/r/93664

Comment 15 Eclipse Genie

2017-03-22 23:05:40 EDT

Gerrit change https://git.eclipse.org/r/93664 was merged to [master].
Commit: http://git.eclipse.org/c/platform/eclipse.platform.resources.git/commit/?id=7069f16072925b47d40a4ed8d58c1561ae6c276a

Comment 16 Andrey Loskutov

2017-03-23 04:27:01 EDT

(In reply to Sergey Prigogin from comment #13)
> Reopened due to the issue described in
> https://bugs.eclipse.org/bugs/show_bug.cgi?id=514090#c4

Doe this mean, we don't plan to init ResourcesPlugin asynchronously anymore? If yes, we can close bug 508419.

Comment 17 Sergey Prigogin

2017-03-23 14:20:38 EDT

> Doe this mean, we don't plan to init ResourcesPlugin asynchronously anymore?
> If yes, we can close bug 508419.

Asynchronous initialization of ResourcesPlugin is still desirable but it cannot be done safely unless suspension of the job manager is removed.

Comment 18 Jens Kuebler

2017-05-24 04:11:16 EDT

I'm wondering why not use a osgi component for the workspace. It is registered as service anyways. Providing a configuration for the storage locations seems to be straight forward. If configuration should be non persistent the component factory comes in handy as well.

I can't see that for creating the service a scheduling rule is required so the job manager should not be necessary at all and the osgi scr handles startup with its threading model.
Although not the best solution the static getWorkspace could do a service lookup and return null if the service is not yet registered.

Comment 19 Eclipse Genie

2017-05-24 07:12:07 EDT

New Gerrit change created: https://git.eclipse.org/r/97851

Comment 20 Jens Kuebler

2017-05-24 07:18:51 EDT

I don't wanna tamper with with the initial approach. I just think that due to the numerous cycle dependencies in this plugin more needs to be refactored. Most of the time passing the arguments seems totally fine to me.

The service lookup is certainly not safe at the moment but adding more time to the lookup or using an event based method seems totally fine to me.

The presented patch is a stub for further discussion and not production ready.

Comment 21 Mikaël Barbero

2017-05-26 05:02:09 EDT

(In reply to Jens  Kuebler from comment #18)
> Although not the best solution the static getWorkspace could do a service
> lookup and return null if the service is not yet registered.

The issue is that *many* plugins expect the workspace to be registered/initialized when the they call getWorskpace() (or throw an IllegalStateException).

Comment 22 Jens Kuebler

2017-05-29 01:16:44 EDT

I think the design problem that the old implementation had, was that the lifecycle of the bundle and the associated class loading for the bundle was coupled with initialization of the workbench. The initialization may only happen after it has been configured (either by the application, the command line or a user interaction). However this forbids loading classes from that bundle until configuration has happened. Using the Jobs API allows for using scheduling rules and doing things asynchronously. However it does not guarantee that the workbench location has been configured unless done programmatically. Currently this seems like choosing between the lesser of two evils, since the workbench crashes if classes from the resource bundle are loaded too early. On the other hand consumers receive NPEs if the workbench has not been initialized yet. Switching to an event bases system seems reasonable for me and since the OSGi service seems to be a natural fit I would go for this, even if clients relied on runtime behavior. Maybe a combination with a job / scheduling rule during initialization is reasonable since it needs to lock the workspace anyways then.
Clients would have a clean migration path since they all must wait on the workbench to become available until they may proceed or an error will occur.
Maybe providing a legacy mode is another option here.

Comment 23 Alex Blewitt

2020-07-02 11:49:27 EDT

There hasn't been any work on this recently. I'm happy to take it over if there aren't any objections?

Comment 24 Mikaël Barbero

2020-07-02 11:50:48 EDT

(In reply to Alex Blewitt from comment #23)
> There hasn't been any work on this recently. I'm happy to take it over if
> there aren't any objections?

Glad you have a look.