Community
Participate
Working Groups
When Eclipse is started and the "Refresh using native hooks or polling" option is activated, the o.e.core.resources plugin activation is stopped by the installation of native hooks or polling (see bug 501306 comment 4). If it ever takes longer than 5 seconds, OSGi class loader will time out and Eclipse would not be properly initialized. We should run the installation asynchronously and call org.eclipse.core.internal.refresh.PollingMonitor.runOnce() after the installation in order to be sure we are fully synchronized (otherwise, we can loose some change events during the monitor installation).
One interesting finding: We may not be able to "simply" install/uninstall refresh monitors asynchronously as it seems that AliasManager requires to be started after MonitorManager (see bug 94829). The bug does not explain all the details, but if the AliasManager requires the monitors to be installed then we will need to do both startup sequentially, in a single background task.
New Gerrit change created: https://git.eclipse.org/r/81786
https://git.eclipse.org/r/81786 is a bold change: I've moved the whole workspace startup in a background process. I'm looking forward to seeing what you think about that.
(In reply to Mikaël Barbero from comment #3) Bold indeed! God only knows what kinds of deadlocks and other problems could happen because of this change. What is your plan for testing it?
Deadlocks can happen if a thread is spawn during the background initialization of the workspace, if this thread calls ResourcesPlugin.getWorkspace() and the spawner joins on this later thread. AFAICT, all the workspace init code is internal and only the refresh manager delegates some processing to extensions (during startup). It means that apart from refresh manager, we have a hand on all the init code. Is it reasonable to add a constraint on ResourcesPlugin.getWorkspace() for internal init code and refresh providers? The constraint would be "do not call ResourcesPlugin.getWorkspace() from a different thread than your caller's and later join on this thread." Also note that workspace jobs can still be scheduled during init because WorkManager locks the whole workspace during init. For the testing, I simply made the init job very long to execute (sleeping for several seconds, at different locations in the init code) and checked that no error was raised. I also added a log message before joining the init job: getPlugin().getLog().log(new Status(IStatus.INFO, PI_RESOURCES, Thread.currentThread() + " blocked in #getWorkspace()", new Exception())); //$NON-NLS-1$ Only EMF showed up in this log: Thread[main,6,main] blocked in #getWorkspace() at org.eclipse.core.resources.ResourcesPlugin.getWorkspace(ResourcesPlugin.java:425) at org.eclipse.emf.ecore.plugin.EcorePlugin.getWorkspaceRoot(EcorePlugin.java:1131) at org.eclipse.emf.ecore.resource.impl.ExtensibleURIConverterImpl.<clinit>(ExtensibleURIConverterImpl.java:393) at org.eclipse.emf.ecore.resource.impl.ResourceSetImpl.getURIConverter(ResourceSetImpl.java:499) at org.eclipse.emf.ecore.resource.impl.ResourceSetImpl.getResource(ResourceSetImpl.java:369) at org.eclipse.e4.ui.internal.workbench.ResourceHandler.getResource(ResourceHandler.java:286) at org.eclipse.e4.ui.internal.workbench.ResourceHandler.loadResource(ResourceHandler.java:262) at org.eclipse.e4.ui.internal.workbench.ResourceHandler.loadMostRecentModel(ResourceHandler.java:169) at org.eclipse.e4.ui.internal.workbench.swt.E4Application.loadApplicationModel(E4Application.java:378) at org.eclipse.e4.ui.internal.workbench.swt.E4Application.createE4Workbench(E4Application.java:253) at org.eclipse.ui.internal.Workbench$5.run(Workbench.java:626) [...] at org.eclipse.equinox.launcher.Main.main(Main.java:1492)
(In reply to Mikaël Barbero from comment #5) This testing looks like a good start. > AFAICT, all the workspace init code is internal and only the refresh manager > delegates some processing to extensions (during startup). It means that apart > from refresh manager, we have a hand on all the init code. This is not quite true since visible editors and views trigger activation of their host plugins and a lot of other things. Experience shows that initialization scenario varies widely depending on the set of open editors and views. We definitely need more real life testing.
Created attachment 264496 [details] Class loaded during Workspace init I've investigated a bit more. With the current master of o.e.c.resources, I've modified the ResourcesPlugin#start method to sysout markers and the beginning and the end of the ResourcesPlugin#start() method. Then, I've started a runtime with verbose:class and identified which classes are loaded between the beginning and the end of the workspace init (see attachment for the list). From this list, it is easy to extract which plugins are potentially activated. Here is the list of plugins: org.eclipse.core.contenttype_3.6.0.v20160919-1839.jar org.eclipse.core.filesystem org.eclipse.core.jobs_3.8.0.v20160901-1938.jar org.eclipse.core.resources org.eclipse.core.runtime_3.12.0.v20160919-2241.jar org.eclipse.equinox.common_3.9.0.v20160831-0037.jar org.eclipse.equinox.preferences_3.6.100.v20160815-1421.jar (o.e.c.resources and o.e.c.filesystem were in my workspace, hence the name) So the risk of deadlocking is mainly in the activator of these plugins. Here is a summary of what the plugin activators do: org.eclipse.core.contenttype => register service org.eclipse.core.filesystem => do nothing org.eclipse.core.jobs => register services org.eclipse.core.runtime => most probably already started, and anyway it just get the application args org.eclipse.equinox.common => register services (also most probably already started) org.eclipse.equinox.preferences => register services It looks safe. (In reply to Sergey Prigogin from comment #6) > We definitely need more real life testing. I'm running my eclipses with this patch for a week now, and experienced no deadlock. What do you suggest for more real life testing? What about integrating a patch with a timeout (20secs?) on the join and we throw an exception, with info about a potential deadlock between the job init thread and the current thread that has reached the timeout (both with stacktraces). I'm also willing to add a property (-Ddisable.concurrent.workspace.init=true) to deactivate the background init of the workspace in case it blocks people (and add this info in the exception message). WDYT?
(In reply to Mikaël Barbero from comment #7) The amount of testing you've done already is sufficient in my view for submitting this change to master even without the disable.concurrent.workspace.init red button.
(In reply to Sergey Prigogin from comment #8) > (In reply to Mikaël Barbero from comment #7) > The amount of testing you've done already is sufficient in my view for > submitting this change to master even without the > disable.concurrent.workspace.init red button. Thanks. I've updated the patch with a timed join that logs a warning when the workspace init takes too long.
Gerrit change https://git.eclipse.org/r/81786 was merged to [master]. Commit: http://git.eclipse.org/c/platform/eclipse.platform.resources.git/commit/?id=8d90fb030df310a974aae5d27d0bc2610c1a14ad
Crossing my fingers.
(In reply to Sergey Prigogin from comment #11) > Crossing my fingers. I'm sorry, crossing fingers didn't help, see bug 507092 :-(
Reopened due to the issue described in https://bugs.eclipse.org/bugs/show_bug.cgi?id=514090#c4
New Gerrit change created: https://git.eclipse.org/r/93664
Gerrit change https://git.eclipse.org/r/93664 was merged to [master]. Commit: http://git.eclipse.org/c/platform/eclipse.platform.resources.git/commit/?id=7069f16072925b47d40a4ed8d58c1561ae6c276a
(In reply to Sergey Prigogin from comment #13) > Reopened due to the issue described in > https://bugs.eclipse.org/bugs/show_bug.cgi?id=514090#c4 Doe this mean, we don't plan to init ResourcesPlugin asynchronously anymore? If yes, we can close bug 508419.
> Doe this mean, we don't plan to init ResourcesPlugin asynchronously anymore? > If yes, we can close bug 508419. Asynchronous initialization of ResourcesPlugin is still desirable but it cannot be done safely unless suspension of the job manager is removed.
I'm wondering why not use a osgi component for the workspace. It is registered as service anyways. Providing a configuration for the storage locations seems to be straight forward. If configuration should be non persistent the component factory comes in handy as well. I can't see that for creating the service a scheduling rule is required so the job manager should not be necessary at all and the osgi scr handles startup with its threading model. Although not the best solution the static getWorkspace could do a service lookup and return null if the service is not yet registered.
New Gerrit change created: https://git.eclipse.org/r/97851
I don't wanna tamper with with the initial approach. I just think that due to the numerous cycle dependencies in this plugin more needs to be refactored. Most of the time passing the arguments seems totally fine to me. The service lookup is certainly not safe at the moment but adding more time to the lookup or using an event based method seems totally fine to me. The presented patch is a stub for further discussion and not production ready.
(In reply to Jens Kuebler from comment #18) > Although not the best solution the static getWorkspace could do a service > lookup and return null if the service is not yet registered. The issue is that *many* plugins expect the workspace to be registered/initialized when the they call getWorskpace() (or throw an IllegalStateException).
I think the design problem that the old implementation had, was that the lifecycle of the bundle and the associated class loading for the bundle was coupled with initialization of the workbench. The initialization may only happen after it has been configured (either by the application, the command line or a user interaction). However this forbids loading classes from that bundle until configuration has happened. Using the Jobs API allows for using scheduling rules and doing things asynchronously. However it does not guarantee that the workbench location has been configured unless done programmatically. Currently this seems like choosing between the lesser of two evils, since the workbench crashes if classes from the resource bundle are loaded too early. On the other hand consumers receive NPEs if the workbench has not been initialized yet. Switching to an event bases system seems reasonable for me and since the OSGi service seems to be a natural fit I would go for this, even if clients relied on runtime behavior. Maybe a combination with a job / scheduling rule during initialization is reasonable since it needs to lock the workspace anyways then. Clients would have a clean migration path since they all must wait on the workbench to become available until they may proceed or an error will occur. Maybe providing a legacy mode is another option here.
There hasn't been any work on this recently. I'm happy to take it over if there aren't any objections?
(In reply to Alex Blewitt from comment #23) > There hasn't been any work on this recently. I'm happy to take it over if > there aren't any objections? Glad you have a look.