Community
Participate
Working Groups
Created attachment 156810 [details] In this session RSE crashed I'm not exactly sure what is happening here. I'm editing some files on my server at home, and suddenly my Eclipse crashes. At some point RSE can't sync or list a file or folder contents. The process of doing so is pending in the background. Every action after this (save, refresh etc) gets me to a full crash where Eclipse won't respond to anything i do and i have to manually kill the process. I can't really get much clearer then this. Sometimes i can work for hours without any crashes, sometimes i have crashes for every 2 files i'm trying to edit. In the meanwhile my files are not saved or synced so I'm losing work here. I am using * Eclipse 3.5.1 from the ubuntu repositories * Ubuntu... (2.6.31-17, but had same with 16) * RSE End-User Runtime 3.1.1.v200907301400 * java-6-openjdk Today i've been so lucky (...) to have lots of crashes, so it was pretty easy to get a clean logfile. Attached file ([projectfolder]/.metadata/.log) was empty when i started the session, and this is whats left after i killed the process. However it doesn't look very informative for this particular problem. The logs in .metadata/.plugins/org.eclipse.rse.* are empty. Are there any more places i can look?
From your description, it sounds like you are experiencing a deadlock (product gets unresponsive), and not a crash -- correct? Fortunately, it is quite easy to get a thread dump out of the deadlocked application, and such a thread dump typically makes analyzing the situation quite straightforward. See http://wiki.eclipse.org/How_to_report_a_deadlock for how to obtain the thread dump, please attach it here on the bug. What connection type are you using in RSE, I assume SSH? I increased priority and severity based on your description. Thanks for reporting the exact version numbers used.
Created attachment 157127 [details] a JStack trace of a deadlocked eclipse
I believe you are right about it being a deadlock rather than a crash. I had no luck killing the process with kill -3, it did not give me any output nor did it kill the application. I used the jstack application to get the output i posted in a new attachment. I am connected to this remote machine via (and only via) FTP.
Now the thread dump does help! Quite apparently the problem is that multiple Threads send a NOOP command to the remote and expect a response; but since the channel to the remote is not multi-threaded it is not clear what channel should receive the response, and some senders end up waiting forever. If the main Thread is waiting, Eclipse is dead. The real problem here is that in the org.eclipse.rse.services.files.ftp plugin, FTPService.isConnected() communicates with the remote via the command channel (by sending a NOOP and expecting a response), but this communication is not gated through the _commandMutex semaphore to ensure single-threaded communication. I think that this must be fixed together with bug 269171. Since the isConnected() method can apparently be called on the main Thread, we must make sure that isConnected() can always return quickly without waiting for a remote response to return. One option for doing so is relying on local socket state to see whether we are connected; another is caching the connected state and updating that cached state in a background thread if a timeout has occurred. In both cases, we may need to revisit the current feature of automatically re-connecting when the remote server has timed out on the connection and closed the socket (but we still think that we are connected). Comparing behavior against what we do in SSH should help getting the logic right. I'll be happy to help any interested community contributor with getting this right and will accept patches. As mentioned, looking at FTPService.java for fixing this should be sufficient.
Created attachment 157224 [details] Simple patch fixing the issue Attached is a simple patch fixing this issue. This patch introduces a regression for bug 243921 -- connection timeout will likely lead to an error expanding folders again. But this product hang-up issue seems more severe than bug 243921. Joost, do you think you have a chance testing this patch? Ideas for solving bug 243921 differently (without sending NOOP) would be appreciated.
Created attachment 157328 [details] Stacktrace of a patched deadlocked RSE
Created attachment 157329 [details] Console output of Eclipse launching a patched RSE
Created attachment 157330 [details] Patched RSE/Eclipse crashing
My results so far. I'm at a WinXP sp3 machine now, same RSE version and Sun Java 1.6.0_14 1. I created a new workspace 2. Imported the cvs workspaces using http://www.eclipse.org/dsdp/tm/development/rse-anonymous.psf 3. I applied the patch to org.eclise.rse.services.files.ftp 4. Made an Eclipse Application run configuration (detault settings) 5. On the first launch i only got a firewall (windows) warning about java wanted to access internet, I choose to block it (yeah, weird right?). I got some messages in my console (see output1.txt). Nearly all of my features are installed, including rse. 6. When i connected to an ftp and opened the root directory, Eclipse deadlocked, although it still paints the window once in a while (and its 'Running' in taskmanager). The stacktrace is in stacktrace1.txt. The launching Eclipse now keeps getting (with one or two minutes interval) Error while logging event loop exception: java.lang.OutOfMemoryError: PermGen space Logging exception: java.lang.OutOfMemoryError: PermGen space I have 1 Eclipse process which is steady using about 275Mb ram I have 1 javaw process which is using about 200~220Mb, clicking stuff or moving makes cpu usage skyrocket, otherwise its calm 7 Killed all and tried again. Lets try and allow Eclipse to reach the FTP in the first place now... (that is, at launching the app, not on connecting to ftp) 8. I succesfully connected to the ftp, but opening the file make it crash (crash this time because the launching eclipse caught and killed it). I shortly got a (uncopyable) error popup saying the php editor couldn't be opened. See log1.txt 9. Relauching, opened the file the regular text editor now. It succesfully opened the file but gave a ParmGen space error on intializing php tooling. Is it worth trying to disable or avoid using anything else than RSE or did the patch break something it shouldn't have?
I see my file-names are not kept in here Stacktrace of a patched deadlocked RSE - stacktrace1.txt Console output of Eclipse launching a patched RSE - output1.txt Patched RSE/Eclipse crashing - Forgot to mention this, its the log from the launch where i opened a file but the php editor couldn't be opened
Comment on attachment 157329 [details] Console output of Eclipse launching a patched RSE Hiding irrelevant messages
Comment on attachment 157330 [details] Patched RSE/Eclipse crashing Hiding irrelevant .log
(In reply to comment #9) I cannot see anything related to RSE in your thread dump. From your description, it looks like the problem is just that your host-eclipse ran out of PermGen memory (not surprising, looking at how much stuff you seem to have with Aptana, DLTK etc). The OutOfMem of your debug host seems to cause the slow response times in your debuggee. Try launching your debug host with -vmargs -XX:MaxPermSize=256m as described in section "4. Running Eclipse" of the file readme/readme_eclipse.html that comes with your Eclipse SDK.
Created attachment 162117 [details] Additional patch Attached additional patch ensures that * NOOP is not sent more often that every 30 seconds for checking connection * NOOP is never sent outside command mutex protected region * Console logging does not print plaintext password after reconnect * setReadOnly() and setTimeStamp() are capable of reconnect-on-demand if needed I ran a couple of tests on multiple hosts, allowing connections to run into timeouts. Apparently the original issue from bug 243921 which led to adding the NOOP commands and thus introduced this regression is now fixed differently, so the excessive sending of NOOP in isConnected() is no longer necessary. I think that this issue is finally fixed now.
Released into 3.2M6. Thanks Joost for your thread dumps and diligent investigations, which finally made fixing this possible.
*** Bug 280515 has been marked as a duplicate of this bug. ***
*** Bug 281731 has been marked as a duplicate of this bug. ***
*** Bug 260796 has been marked as a duplicate of this bug. ***
*** Bug 300455 has been marked as a duplicate of this bug. ***
*** Bug 255117 has been marked as a duplicate of this bug. ***
*** Bug 232915 has been marked as a duplicate of this bug. ***