Bug 225360 - [files] Deadlock on Startup with a remote file in the Editor
Status: RESOLVED DUPLICATE of bug 228353
Alias: None
Product: Target Management
Classification: Tools
Component: RSE
Version: 3.0
Hardware: PC Windows XP
Importance: P3 critical
Target Milestone: 3.0 M7
Assignee: Martin Oberhuber CLA
QA Contact: Martin Oberhuber CLA
Depends on: 182363 190231 218304 228353
Blocks:
Reported: 2008-04-02 11:24 EDT by Ryan CLA
Modified: 2008-09-16 10:35 EDT (History)

Attachments
dump of hung eclipse (26.74 KB, text/plain)
2008-04-02 11:24 EDT, Ryan CLA
copy of .log file with error messages. (19.08 KB, application/octet-stream)
2008-04-02 14:30 EDT, Ryan CLA

Description Ryan CLA 2008-04-02 11:24:41 EDT
Created attachment 94558
dump of hung eclipse

Build ID: M20071023-1652

Steps To Reproduce:
Eclipse will no longer boot up in my default workspace. I've dumped a log while it's hung and there are several references to waiting on an object monitor, followed by some references to RSE classes, e.g.: org.eclipse.rse.internal.subsystems.files.core.Activator$1.run(Activator.java:54)

It was working OK yesterday, except that at one point I had to restart to browse one remote connection. I can load a different workspace (not my default) after I have hung the original. It says something like "default workspace is in use". Unfortunately, I don't have another workspace with my info in it. I'm kinda screwed here as I need to use the workspace, and I'm not sure if I can somehow migrate my settings from this crashed workspace to another. Please help get my Eclipse back up and running.

More information:
Comment 1 Ryan CLA 2008-04-02 14:30:49 EDT
Created attachment 94596
copy of .log file with error messages.
Comment 2 Ryan CLA 2008-04-02 14:37:37 EDT
After further investigation, I attempted to follow these instructions (from http://dev.zhourenjian.com/blog/2007/11/07/eclipse-freezing-on-start.html):

I opened file
workspace\.metadata\.log
there was a line saying:

    The workspace exited with unsaved changes in the previous session; refreshing workspace to recover changes 

After taking many tries, I found a solution for this bug. Just remove folder
workspace\.metadata\.plugins\org.eclipse.core.resources\.root\.indexes
and restart Eclipse.

That didn't work, Eclipse was still hanging. So I removed .metadata/.plugins/org.eclipse.core.resources/.safetable and restarted. That gave me a popup error stating "An error has occurred. See the log file ... \workspace\.metadata\.log"

In looking in the log file, I get the following sorts of error messages:

Caused by: org.eclipse.core.internal.dtree.ObjectNotFoundException: Tree element '/RemoteSystemsTempFiles/SERVER/file'  not found.

I've attached a complete .log file. Also, I've verified that the file in question existed in the temp files directory. I've also tried completely removing the directory, but I still get the same error message saying the tree element can't be found.

Is there a way to "clean up" so eclipse will start without this problem? I don't really need the temp files as I can just download new copies.

Comment 3 Ryan CLA 2008-04-02 14:56:42 EDT
Ok, now I deleted the .snap file in:
workspace\.metadata\.plugins\org.eclipse.core.resources

as it seemed to be in the same general area as the other files I deleted and when I viewed it, the contents seemed related to all of the RSE connections I had. I then started eclipse. And it started!!! But it had a couple errors, which is fine. I closed the two files that the editor attempted to open and I think I'm back to normal.
Comment 4 Martin Oberhuber CLA 2008-04-21 08:52:59 EDT
Analyzing the Log, the following seems to happen:

1.) Thread "files.ui adapter loader": On Eclipse Start, the
    subsystems.files.core plugin starts the adapter loader Thread.

1.1.) That Thread forces activation of rse.files.ui plugin very early. As
      part of its Activator initialization sequence, it calls
          SystemRemoteEditManager.refreshRemoteEditProject()
      which performs a Workspace Operation (and thus acquires a Workspace Lock)
      very early.

2.) Thread "[main]": The Dispatch thread is used at Eclipse Start by Eclipse
    trying to restore the editors that you had open during the previous 
    session:
         EditorManager.setVisibleEditor()
    This, in turn, leads to loading the text editor (also on the main Thread):
         StatusTextEditor.doSetInput()
    which in turn leads the FileDocumentProvider to refresh its file:
         File.refreshLocal()
    This is a Workspace Operation, so it requires the workspace lock that had
    been obtained in (1.1) above, and locks the Main Thread until that lock
    is available.

3.) There are two more Threads which require Display.syncExec() access to the
    main Thread in order to do their work. Since that one is locked due to (2)
    they cannot continue:

3.1.) Thread "Worker-0" is busy refreshing an EFS-Shared file:
          Display.syncExec()
          RSEFileStoreImpl.getConnectedFileSubSystem()
          localstore.UnifiedTree.createChildForLinkedResource()
          localstore.FileSystemResourceManager.refresh()
      but cannot continue because the Dispatch Thread is currently taken.

3.2.) Thread "Thread-2" is busy initializing the Workbench itself:
          Display.syncExec()
          EditorManager.restoreState()
          WorkbenchPage.restoreState()
          WorkbenchConfigurer.restoreState()
          WorkbenchAdvisor$1.run()
      This Thread seems to be the reason for the editor activation taking
      place in step (2).
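
The cycle in steps (2) and (3) can be sketched in plain Java. This is a simplified illustration, not RSE or Platform code: a single-thread executor stands in for the dispatch thread, a ReentrantLock stands in for the workspace lock, and the class name, method name and timeouts are assumptions made only so that the example terminates instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.locks.ReentrantLock;

class StartupDeadlockSketch {

    /**
     * Returns true if the worker's syncExec-style call could not run because
     * the simulated dispatch thread was itself waiting for the workspace lock.
     */
    static boolean workerStarved(long timeoutMillis) throws Exception {
        ReentrantLock workspaceLock = new ReentrantLock();               // stand-in for the workspace lock
        ExecutorService dispatchThread = Executors.newSingleThreadExecutor(); // stand-in for the dispatch thread
        ExecutorService worker = Executors.newSingleThreadExecutor();
        CountDownLatch lockTaken = new CountDownLatch(1);
        CountDownLatch dispatchBusy = new CountDownLatch(1);
        try {
            Future<Boolean> workerResult = worker.submit(() -> {
                workspaceLock.lock();                                    // like the early workspace operation
                try {
                    lockTaken.countDown();
                    dispatchBusy.await();                                // wait until the dispatch thread is occupied
                    Future<?> syncExec = dispatchThread.submit(() -> { }); // queued behind the blocked task
                    try {
                        syncExec.get(timeoutMillis, TimeUnit.MILLISECONDS);
                        return false;                                    // the dispatch thread served the request
                    } catch (TimeoutException e) {
                        syncExec.cancel(false);
                        return true;                                     // without a timeout this would wait forever
                    }
                } finally {
                    workspaceLock.unlock();
                }
            });

            lockTaken.await();
            dispatchThread.submit(() -> {
                dispatchBusy.countDown();
                // Like File.refreshLocal() on the main thread: needs the lock the worker holds.
                // The longer timeout exists only so the sketch can finish.
                if (workspaceLock.tryLock(10 * timeoutMillis, TimeUnit.MILLISECONDS)) {
                    workspaceLock.unlock();
                }
                return null;
            });
            return workerResult.get();
        } finally {
            dispatchThread.shutdownNow();
            worker.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("worker starved: " + workerStarved(200));
    }
}
```

With real blocking calls (no timeouts), neither side could give up, which matches the hang seen in the thread dump.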


Looking at this analysis, I see two possible causes for the 
deadlock that you observe:

(a) You had quit the Workbench with an open editor that was editing an
    EFS-shared file provided by RSE. On Workbench Startup, Eclipse
    tries to re-open that editor on the dispatch thread, but RSE cannot
    provide the editor contents because it also requires the dispatch thread
    in order to do the subSystem.connect() that's required.

(b) A somewhat more complex scenario that also involves the "files.ui adapter
    loader" - which cannot continue because it's being blocked by a related
    Workspace Job on the same scheduling rule.

Now in case option (b) is true, that would be good for us because the problem would most likely be fixed with the fix for bug 197167, which defers the UI Adapter Loading to a later time. So the issue would be fixed with RSE 3.0M6.

In case the adapter loader is not related, and option (a) is true, there are really only two ingredients in the deadlock: The editor performing a load-file on the dispatch Thread (which it shouldn't do), and the RSE EFS Provider requiring the dispatch Thread for connect (which it also shouldn't do). Following these thoughts, there are two possibilities for breaking the deadlock and thus fixing the issue:

(i) Platform Editor not performing any load-file operations while the dispatch
    thread is owned. This is IMHO a no-no anyways because load-file can be 
    a long running operation and should thus not happen while the dispatch
    thread is owned. We should either file a bug against the Platform for this,
    or (most likely) find an existing bug in the Platform for this and link
    to it. At the very least, we can expect the Platform Editor to have a 
    watchdog which kills a hanging thread that tries to load something after
    a fixed timeout - just to avoid deadlock (like OSGi Bundle Activators 
    do it).

(ii) In RSE, we could avoid the Display.syncExec() in the subSystem connect 
     Thread, if we can. This means that if we already have a saved password,
     we should use it without switching to the dispatch thread; and, only
     switch to the dispatch thread if we need to ask for a password.
     Now this would fix the problem in most cases (where a stored password
     is available); but it would not fix the problem when we need to ask for
     a password. In any case, problem (ii) can most likely be addressed along
     with bug 190231 so I'm marking that one as dependent bug.
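
The idea in (ii) can be sketched as follows. This is only an illustration under stated assumptions: `SavedPasswordSketch`, `PasswordPrompt` and `resolvePassword` are hypothetical names, not RSE API, and the `PasswordPrompt` callback stands in for whatever would have to run on the dispatch thread (for example inside a Display.syncExec() call).

```java
import java.util.HashMap;
import java.util.Map;

class SavedPasswordSketch {

    /** Hypothetical stand-in for a prompt that has to run on the dispatch thread. */
    interface PasswordPrompt {
        String ask(String host);
    }

    /**
     * Uses a saved password when one exists, so the calling thread never
     * touches the dispatch thread; falls back to the prompt otherwise.
     */
    static String resolvePassword(String host, Map<String, String> savedPasswords, PasswordPrompt prompt) {
        String saved = savedPasswords.get(host);
        if (saved != null) {
            return saved;            // common case: no switch to the dispatch thread
        }
        return prompt.ask(host);     // only this path would need the dispatch thread
    }

    public static void main(String[] args) {
        Map<String, String> saved = new HashMap<>();
        saved.put("serverA", "secret");
        PasswordPrompt prompt = host -> "typed-by-user";
        System.out.println(resolvePassword("serverA", saved, prompt));
        System.out.println(resolvePassword("serverB", saved, prompt));
    }
}
```

As noted in (ii), the prompt path can still block when the dispatch thread is unavailable; the sketch only removes the dependency for connections with a stored password.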

Based on the Analysis, I set severity CRITICAL since it locked out all of Eclipse. I also changed the Summary, previous value was:
eclipse won't start up with RSE

Ryan -- in order to verify the theory, and in order to see whether problem (b) can be ruled out: Can you confirm that you had an EFS-shared RSE-provided file open in the editor before you quit, and that this was in fact the problem? And: Can you please update to RSE 3.0M6 and see whether you can still reproduce the problem there, or whether it's indeed fixed with RSE 3.0M6?

Thanks!
Comment 5 Martin Oberhuber CLA 2008-04-21 08:54:38 EDT
Ryan: At any rate, please provide as exact a description as possible of what file you had in the editor before you quit Eclipse and made your workspace hang; and, what sort of RSE connection it was; and, whether you have saved a password for that connection; and, whether you can reproduce it with RSE 3.0M6 or not. Thanks!
Comment 6 Martin Oberhuber CLA 2008-04-21 09:44:09 EDT
Correction: The problem (b) would not be fixed with bug 197167, but with bug 218304. The fix for bug 218304, however, apparently has some other problematic implication as shown in bug 227944: In that case, deferred loading of the adapters again leads to issues, though not as problematic as here (only an error log entry, but not a deadlock).
Comment 7 Ryan CLA 2008-04-21 10:51:38 EDT
>please provide as exact a description as possible of what file you had in the editor before you quit Eclipse and made your workspace hang;

I believe I had two files in the editor, both would have been temp files that
were downloaded from RSE using SSH Only. Both would have been saved, and should
have been in sync with the server when I exited eclipse. I believe they were
either shell files or ant files, if they were shell files, they would have used
the ShellEd editor, and if ant, then the built in ant editor. It's also
possible they were xml files and used the xml editor provided by web tools. I
would have had saved passwords for the connections the files used. It's
possible they were on different servers, or on the same server using different
connections (because of a different username)

It occurs to me that if eclipse were to hang for another reason and exit
ungracefully (which NEVER happens), that could leave the files in an 'unsaved
changes' mode, which might be a different case altogether.

I'm probably not going to be able to reproduce this bug (unless it just happens
again). I think when I deleted the various files to get eclipse to boot up, I
also lost some connection information that I had to recreate. I've recently
downloaded 3.0M6 and am using that in conjunction with eclipse 3.4. I'm not
using it for all my daily work yet, but certainly if the error happens again,
you'll be the first to know.

It would be good to know a safe way to restore a workspace, and a safe way to
copy/backup all of the connection settings. It's a fair amount of work to
recreate connections with all of the filters, I don't mind so much if a temp
copy of a file I have on the server is lost in the event of a system failure.
Comment 8 Martin Oberhuber CLA 2008-04-22 13:54:40 EDT
A feature for exporting all your profiles, connections and filters to a ZIP file will be added with bug 216858 / bug 189274. Until this is complete, you can simply back up your connections with

   zip -a C:/rse_backup.zip <workspace>/.metadata/.plugins/org.eclipse.rse.core/Profiles
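
If no command-line zip tool is at hand, the same backup can be sketched with the standard java.util.zip classes. This is a minimal example under assumptions: the class and method names are made up for illustration, and the source directory is whatever Profiles folder you pass in (the path shown above, with <workspace> replaced by your own workspace location).

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

class ProfileBackupSketch {

    /** Zips every regular file below sourceDir into zipFile; returns the number of files written. */
    static int zipDirectory(Path sourceDir, Path zipFile) throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(sourceDir)) {
            files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        try (OutputStream out = Files.newOutputStream(zipFile);
             ZipOutputStream zip = new ZipOutputStream(out)) {
            for (Path file : files) {
                // Entry names are relative to sourceDir and use '/' as required by the ZIP format.
                String name = sourceDir.relativize(file).toString().replace('\\', '/');
                zip.putNextEntry(new ZipEntry(name));
                Files.copy(file, zip);
                zip.closeEntry();
            }
        }
        return files.size();
    }

    public static void main(String[] args) throws IOException {
        // args[0]: the Profiles directory to back up, args[1]: the ZIP file to create
        int count = zipDirectory(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("files written: " + count);
    }
}
```

Run it with Eclipse closed so that the profile files are not being written while they are copied.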

Are you sure that you did not have any EFS-provided files? (File > New > Advanced > Link to folder in file system > RSE > ... or RSE "Create Remote Project") ? So you were using the RSE SystemView in the Remote Systems Perspective only?

With 3.0M6, the deadlock should no longer happen and you should be in the bug 227944 situation. I think, though, that the fix for this should be relatively easy by moving SystemRemoteEditManager initialization out of the files.ui Activator.start() method and into a deferred startup Thread -- I'll file a separate bug for that. The final solution, however, will be with bug 182363 I guess.
Comment 9 Ryan CLA 2008-04-22 14:32:32 EDT
>Are you sure that you did not have any EFS-provided files? (File > New >
Advanced > Link to folder in file system > RSE > ... or RSE "Create Remote
Project") ? So you were using the RSE SystemView in the Remote Systems
Perspective only?

I am pretty sure I was using the Remote Systems View in the Remote Systems Perspective. I haven't really done anything with File > New [folder] > Advanced Link to folder in file system > RSE > ... or RSE "Create Remote
Project", I'm sure you'll hear about it when I do though ;-)
Comment 10 Martin Oberhuber CLA 2008-04-22 14:47:45 EDT
Thanks. Based on your word, I changed the summary - previous value was:
[efs] Deadlock on Startup with an EFS-shared RSE-provided file in the Editor

The thing, however, is that this part of your thread dump definitely identifies an EFS-shared file:

3.1.) Thread "Worker-0" is busy refreshing an EFS-Shared file:
          Display.syncExec()
          RSEFileStoreImpl.getConnectedFileSubSystem()
          localstore.UnifiedTree.createChildForLinkedResource()
          localstore.FileSystemResourceManager.refresh()
      but cannot continue because the Dispatch Thread is currently taken.

So you MUST have created a remote project or remote linked resource somehow. Is there any chance that your workspace (.project file(s)) is still available? The .project file(s) hold the information about the linked resources.
Comment 11 Martin Oberhuber CLA 2008-05-20 17:44:59 EDT
Bulk update of target milestone
Comment 12 Martin Oberhuber CLA 2008-09-16 10:35:39 EDT
After checking the logs again, I'm very confident that this has actually been fixed with the fix for bug 228353.

We haven't got any report about such behavior again, and bug 228353 does address the most problematic issues.

*** This bug has been marked as a duplicate of bug 228353 ***