20070606 I got into a state where Eclipse took over almost all of the CPU time for the last 30 minutes. Ctrl+Break didn't work, but I could connect to the VM with a profiler. According to the profiler, all time is spent in:

  org.eclipse.team.internal.core.streams.TimeoutOutputStream.runThread()
  org.eclipse.team.internal.core.streams.TimeoutOutputStream.writeUntilDone()
  java.io.BufferedOutputStream.flush()
  com.jcraft.jsch.Channel$1.write(int)
  com.jcraft.jsch.Channel$1.write(byte[], int, int)

The UI was not blocked, but just very slow. No operations were shown in the Progress view. I had to kill Eclipse to continue.
This could be related to the fact that I had just run an rlog command against the server to extract all the user IDs that have committed code to the repository, so we could include them in the IP log. The command took quite a while to run. This may have led to a situation where the connection never timed out but sending and receiving data were affected. It would be good to know what you were doing so we can try to reproduce this. Were you synchronizing?
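For context, extracting the committer IDs from rlog output is straightforward to sketch. The snippet below is only an illustration of that step, not the actual command that was run; the sample rlog lines and the RlogAuthors class name are assumptions, based on the usual "author: <id>;" field in CVS rlog revision headers.

  import java.util.LinkedHashSet;
  import java.util.Set;
  import java.util.regex.Matcher;
  import java.util.regex.Pattern;

  public class RlogAuthors {
      // Matches the "author: <id>;" field of a CVS rlog revision line, e.g.
      // "date: 2007/06/06 10:00:00;  author: mwenz;  state: Exp;  lines: +1 -1"
      private static final Pattern AUTHOR = Pattern.compile("author:\\s*([^;]+);");

      public static Set<String> authors(String rlogOutput) {
          Set<String> ids = new LinkedHashSet<>();
          Matcher m = AUTHOR.matcher(rlogOutput);
          while (m.find()) {
              ids.add(m.group(1).trim());
          }
          return ids;
      }

      public static void main(String[] args) {
          String sample = String.join("\n",
              "date: 2007/06/06 10:00:00;  author: mwenz;  state: Exp;",
              "date: 2007/06/05 09:00:00;  author: ymnk;  state: Exp;",
              "date: 2007/06/04 08:00:00;  author: mwenz;  state: Exp;");
          System.out.println(authors(sample));  // [mwenz, ymnk]
      }
  }

The point is just that the rlog run itself can be long and keeps the connection busy for its whole duration, which matters for the timeout scenario described above.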
I think it was something that ran in the background, maybe Quick Diff. I later managed to close all editors (that took 15 minutes), but the CPU was still blocked.
Is that problem reproducible? In the JSch 0.1.31 source code, I found lines where CPU time is eaten, but I have not found the exact reason. It seems that the channel had not been correctly initialized for some rare reason. If that problem occurs on JSch 0.1.28, the Eclipse SDK will not be blocked, but an ArrayIndexOutOfBoundsException will be thrown. I have modified JSch 0.1.31 to work around such blocking and named it 0.1.31.1. It is available at http://eclipse.jcraft.com/plugins/com.jcraft.jsch_0.1.31.1.jar and you can update it automatically from http://eclipse.jcraft.com/site.xml Is it acceptable to use 0.1.31.1? If not, IMHO, Eclipse 3.3 should use JSch 0.1.28. Of course, we will publicly release the source code of 0.1.31.1, and the next formal release, 0.1.34, will also include the same fix.
I have a question for the reporter. At that time, did you get an error message dialog like "I/O has been interrupted."?
I can't reproduce this problem and it is the first time I've seen it happen. There was no error dialog and also no error in the log, but maybe it got lost because I had to kill the process.
At this stage in the release cycle it is too late to make any changes (i.e. we are shipping with 0.1.31). If, over the next few weeks, we start seeing more reports like this, we can investigate the possibility of including 0.1.31.1 in 3.3.1.
(In reply to comment #5)
> I can't reproduce this problem and it is the first time I've seen it happen.
> There was no error dialog and also no error in the log, but maybe it got lost
> because I had to kill the process.

Thank you for the information. If the problem described in comment #3 had occurred, no data would have been transferred, and the error dialog would have appeared soon, thanks to TimeoutOutputStream. So I guess that data was being sent continuously. Now I think that my comment #3 is not related to this problem and that Michael's guess in comment #1 may be right. I'll withdraw jsch 0.1.31.1.
I believe this is caused by what I described in comment 1. As such, it is not necessary to address this for 3.3.1.
Created attachment 79389 [details]
Thread dump of Eclipse eating 100% CPU

I think I just ran into the very same issue, with the brand-new Eclipse 3.3.1. Here is what I did:

* Start Eclipse, Team > Synchronize --> Found several files to be synchronized, because local resources were not up-to-date --> Double-clicked each of them in a row to open the compare viewer (about 50 of them). The compare viewer opened and closed immediately because it found that there were no real diffs and just the local file had been out of sync.
* Clear the Error Log because it showed lots of "Resource out of Sync"
* Select all projects, Team > Tag as Version "R2_0_1" --> Tagging ran for a while, but I did _not_ put it into the background
* When tagging was done, open a few editors, with History open
* Work in a TM Terminal

At that point, I noticed Eclipse taking 100% CPU. I ran jconsole, and the StackTrace tool from http://www.adaptj.com/root/main/tracehowtos

Trying to find out who the culprit was, I closed all open editors and all perspectives. The result is the thread dump attached. I find it odd that there are _two_ instances of TimeoutOutputStream. These seem to be the only ones that are always RUNNABLE, so I assume it's them eating the CPU.
I did not see any other errors logged, or any error dialog. The Progress view was empty.

I wonder how likely this scenario is to occur? I wonder who is really hogging the CPU by constantly running code? In jconsole, I see the memory usage follow a sawtooth pattern, so it looks like somebody is constantly creating objects which the gc later removes.

I inspected the code in JSch's Channel.java around line 275, and I'm not sure, but it looks like this could loop forever in the case where the remote minimum packet size (rmpsize) is very small or even still zero, and dataLen==0:

  public void write(byte[] buf, int s, int l) throws java.io.IOException{
    /*...*/
    byte[] _buf=buffer.buffer;         //initialized to rmpsize
    int _bufl=_buf.length;             //_bufl==rmpsize, e.g. 0
    while(l>0){                        //user wants to write e.g. 32 bytes
      int _l=l;
      if(l>_bufl-(14+dataLen)-32-20){  //_bufl very small so this holds true
        _l=_bufl-(14+dataLen)-32-20;   //_l gets negative
      }
      if(_l<=0){
        flush();                       //does nothing if dataLen==0
        continue;                      //back to the loop, l is unchanged!!
      }
      /*...*/
    }
  }

Atsuhiko, could you investigate? I'm wondering under what circumstances this OutputStream is used and how likely this issue is to occur... (I tried debugging it but never dropped into the code in question)
I also noticed that when making multiple thread dumps at different times, they all looked exactly the same: in particular, the TimeoutOutputStreams were always running in the same threads, locking the same objects, always RUNNABLE and never in a stack frame below flush(). So I tend to assume that this issue is indeed an endless loop in the OutputStream's write() method, with a very specific set of parameters such that the "l" variable is never modified, because flush() returns immediately and thus the while() loop never ends.
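The no-progress condition described above can be checked in isolation. The following is a minimal, self-contained model of the chunking loop, not JSch's actual code: the WriteLoopModel class name and the reset of dataLen after each "send" are assumptions for illustration. A guard counter stands in for the infinite spin so the simulation terminates.

  public class WriteLoopModel {
      /**
       * Simulates the chunking loop in Channel's write(byte[],int,int).
       * Returns the number of iterations needed to consume l bytes, or -1
       * if the loop makes no progress within the guard limit (i.e. the
       * real code would spin forever).
       */
      public static int iterations(int bufl, int dataLen, int l) {
          int guard = 1000;
          int count = 0;
          while (l > 0) {
              if (++count > guard) return -1;   // real code would loop forever here
              int _l = l;
              if (l > bufl - (14 + dataLen) - 32 - 20) {
                  _l = bufl - (14 + dataLen) - 32 - 20;
              }
              if (_l <= 0) {
                  // flush() is a no-op when dataLen == 0, so nothing changes
                  continue;
              }
              dataLen += _l;   // bytes staged into the packet buffer
              l -= _l;         // progress happens only on this path
              dataLen = 0;     // assume the packet is sent and the buffer reset
          }
          return count;
      }

      public static void main(String[] args) {
          System.out.println(iterations(2048, 0, 32));  // 1: one chunk suffices
          System.out.println(iterations(0, 0, 32));     // -1: no progress, endless loop
      }
  }

With bufl = 0 (an uninitialized rmpsize), _l is always negative and l is never decremented, which matches the thread-dump observation that the stream is permanently RUNNABLE and never below flush().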
May I ask you to download http://eclipse.jcraft.com/plugins/com.jcraft.jsch_0.1.31.1.jar to your plugins/ directory? # You may need to remove com.jcraft.jsch_0.1.31.jar to enable that new jar.

As I wrote in comment #3:
> In JSch 0.1.31 source code, I found lines, where CPU time is eaten, but
> I have not found the exact reason. It seems that the channel was not been
> correctly initialized due to some rare reason.

That new jar carefully initializes the channel.
Are you asking me to perform any tests with 0.1.31.1, or is it just to make my environment safe and rule out potential problem candidates in case it happens again for me? I'd still like to know under what circumstances Eclipse Team/CVS uses Channel.getOutputStream() such that I'd have a better chance of trying to reproduce the issue / verify the fix. Thanks!
(In reply to comment #13)
> Would you ask me to perform any tests with 0.1.31.1 or would it just be to make
> my environment safe, and rule out potential problem candidates in case it
> happens again for me?

It seems you can reproduce that problem very easily, so I just ask you to test whether you can reproduce it with 0.1.31.1 or not. If you can, my guess is not right.

> I'd still like to know under what circumstances Eclipse Team/CVS uses
> Channel.getOutputStream() such that I'd have a better chance trying to
> reproduce the issue / verify the fix.

  out=channel.getOutputStream();
  channel.connect();
  out.write("??");

If the channel is not ready before writing data to the output stream, that problem will happen. The channel is initialized by another thread, and under heavy load there is the possibility that the initialization may be delayed.
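The race described above can be modelled without JSch at all. The sketch below is a simplified, hypothetical stand-in (the ChannelInitRace class and its methods are invented for illustration): the remote buffer size starts at 0 and is filled in by a second thread once the channel-open confirmation arrives, so a writer that starts too early can observe the uninitialized value.

  import java.util.concurrent.CountDownLatch;

  public class ChannelInitRace {
      // Stand-in for the channel's remote maximum packet size (rmpsize),
      // which is filled in when the channel-open confirmation arrives.
      private volatile int rmpsize = 0;
      private final CountDownLatch initialized = new CountDownLatch(1);

      void completeInit(int size) {
          rmpsize = size;
          initialized.countDown();
      }

      /** Unsafe: reads rmpsize immediately; may see the uninitialized 0. */
      int unsafeBufferSize() {
          return rmpsize;
      }

      /** Safe: waits until the initializer thread has run. */
      int safeBufferSize() throws InterruptedException {
          initialized.await();
          return rmpsize;
      }

      public static void main(String[] args) throws Exception {
          ChannelInitRace ch = new ChannelInitRace();
          // The initializer completes only after a delay, modelling a
          // heavily loaded system where channel setup is slow.
          Thread init = new Thread(() -> {
              try { Thread.sleep(100); } catch (InterruptedException ignored) {}
              ch.completeInit(2048);
          });
          init.start();
          System.out.println(ch.unsafeBufferSize()); // usually 0: writer raced ahead
          System.out.println(ch.safeBufferSize());   // 2048: waited for init
          init.join();
      }
  }

Under this model, a fix along the lines of 0.1.31.1 would correspond to the safe path: making write() wait until the channel initialization has actually completed before using the buffer size.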
No, it's not easy to reproduce... it looks like the system must be under really heavy load, as Michael indicated with his large remote cvs rlog. My several thread dumps were all from one and the same occurrence; I haven't been able to reproduce it since (with the old jsch-0.1.31). I guess I'm still going to install jsch-0.1.31.1 though.
I have not been able to reproduce this problem yet, but I have found the bug which may cause it and fixed it. That fix has been included in jsch-0.1.35, which is already released. If you have opportunities to run a large remote cvs rlog, please try http://eclipse.jcraft.com/plugins/com.jcraft.jsch_0.1.35.jar (or install it via the update site, http://eclipse.jcraft.com/site.xml) As for the changes since the 0.1.31 included in Eclipse SDK 3.3, there was no code from others (third parties), so I hope that the legal review will not take too much time.
Atsuhiko, since you are the provider of the Jsch client and an Eclipse committer, it may be easiest if you submit the request to the IP team for permission to include the latest JSch client in Eclipse. You can issue the request by going to the My Foundation Portal at: https://dev.eclipse.org/portal/myfoundation/portal/portal.php Once you log in using your dev.eclipse user name and password, you can click on the [request] link in the IP Policy section. The fields should be fairly easy for you to fill out.
(In reply to comment #17)
> Atsuhiko, since you are the provider of the Jsch client and an Eclipse
> committer, it may be easiest if you submit the request to the IP team for
> permission to include the latest JSch client in Eclipse. You can issue the
> request by going to the My Foundation Portal at:
> https://dev.eclipse.org/portal/myfoundation/portal/portal.php

Thank you for your suggestion. I have filed it; CQ 1826.
This bug hasn't had any activity in quite some time. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet. If you have further information on the current state of the bug, please add it. That information could be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant.