Bug 191374 - [performance] Refresh a big directory in SSH connection takes very long time
Status: CLOSED WORKSFORME
Alias: None
Product: Target Management
Classification: Tools
Component: RSE
Version: 2.0
Hardware: PC Windows XP
Importance: P2 major
Target Milestone: 2.0.1
Assignee: Xuan Chen CLA
QA Contact: Martin Oberhuber CLA
URL:
Whiteboard:
Keywords: performance
Depends on: 196662 196664
Blocks: 198143
 
Reported: 2007-06-06 17:17 EDT by Xuan Chen CLA
Modified: 2008-01-23 13:09 EST (History)

Attachments
Stackdump when the workspace hangs (10.64 KB, text/plain)
2007-07-25 17:05 EDT, Xuan Chen CLA

Description Xuan Chen CLA 2007-06-06 17:17:59 EDT
This problem was found when I tried to look into bug 191280.

To mimic Martin's scenario, I wrote a program to duplicate about 4000 folders in one of my directories on a Linux machine (the directory is named "dummy").

I expanded dummy in an SSH connection and got the results almost right away.
Then I refreshed this directory, and its job just kept on running. It eventually finished, but took at least 20 minutes. The whole IDE hung, even though the refresh seemed to be running in a separate job.

-----------Enter bugs above this line-----------
TM RC2 Testing
installation : eclipse-platform-3.3M7 (I20070503-1400)
RSE install  : RSE-SDK-2.0RC2.zip
java.runtime : Sun 1.5.0_06-b05
os.name:     : Windows XP 5.1, Service Pack 2
------------------------------------------------
systemtype   : Linux - SSH
Comment 1 Martin Oberhuber CLA 2007-06-06 17:34:24 EDT
Ouch. Refresh should not take longer than initial population.
Comment 2 Xuan Chen CLA 2007-06-08 10:29:49 EDT
Kevin tried it with the latest dev driver and could not reproduce it. So I tried again today using the RC2 driver (the driver used for testing before) and could not reproduce it either.

So I will close it for now.
Comment 3 Kevin Doyle CLA 2007-06-08 10:52:47 EDT
Kushal and I tried this and can't reproduce it using the latest dev driver with Eclipse 3.3RC3. Sometimes the refresh took a little longer, and other times it appeared to take the same amount of time.
Comment 4 Xuan Chen CLA 2007-07-15 23:23:27 EDT
It seems I still get this problem occasionally.

I downloaded the 0713 2.0.1 driver and tried this scenario.

The first time I refreshed a very large folder (with 4000 subfolders), I waited for 3 minutes with the UI frozen. Then I needed to step away from my computer. When I came back after around 15 minutes, the folder had been expanded correctly.
I refreshed it again, and it ran for more than 11 minutes with the UI totally frozen, so I had to kill the workbench. Then I started it again and did not have this problem any more (I've tried at least 7-8 times, with no problem at all).
Comment 5 Martin Oberhuber CLA 2007-07-16 11:10:52 EDT
I tried the scenario and found two potential problems, which I created separate bugs to track:

1. Bug 196662 - During Refresh, a getFile() query is made on the dispatch 
   thread. Since SSH only has a single dirChannel, the backgroundThread can
   block the UI thread through the mutex being used, leading all of Eclipse to
   freeze. I marked that bug as Major / P2.

2. Bug 196664 - When refreshing multiple times in a row, more and more getFile()
   queries are scheduled on parents of the given directory.

Issue 2 could be the reason why it sometimes works and sometimes not, although I'd personally think that the long delay comes from problems with the file system on the remote that we cannot get fixed - so our only chance is to make sure all remote queries are done in the background.
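
A minimal sketch of the "remote queries in the background" idea, in plain Java rather than the actual RSE code. `SingleChannelConnection`, `listBlocking` and `listAsync` are hypothetical names; the lock stands in for the single dirChannel mutex and the sleep simulates a slow remote. None of this is taken from RSE or the SSH library.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical stand-in for a connection that has only one shared channel. */
final class SingleChannelConnection {
    // Plays the role of the mutex guarding the single directory channel.
    private final ReentrantLock channelLock = new ReentrantLock();

    // One daemon worker thread; all remote queries are serialized on it.
    private final ExecutorService worker = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "remote-query-worker");
        t.setDaemon(true);
        return t;
    });

    /** Blocking query: the calling thread waits for the channel lock and the remote. */
    List<String> listBlocking(String path, long simulatedDelayMillis) throws InterruptedException {
        channelLock.lock();
        try {
            Thread.sleep(simulatedDelayMillis); // simulated slow remote round trip
            return List.of(path + "/a", path + "/b");
        } finally {
            channelLock.unlock();
        }
    }

    /** Background variant: the caller (for example a UI thread) returns immediately. */
    CompletableFuture<List<String>> listAsync(String path, long simulatedDelayMillis) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                return listBlocking(path, simulatedDelayMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("query interrupted", e);
            }
        }, worker);
    }

    /** Stops the worker thread. */
    void shutdown() {
        worker.shutdownNow();
    }
}
```

A UI thread would call `listAsync(...)` and attach a callback (for example with `thenAccept`) that posts the result back to the display, instead of calling `listBlocking(...)` and waiting on the lock itself.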

Reopening this to reverify when the two dependent bugs are addressed.
Comment 6 Martin Oberhuber CLA 2007-07-16 11:11:37 EDT
Assigning Xuan to reverify once the dependent bugs are addressed, since she's got the 4000 file setup.
Comment 7 Xuan Chen CLA 2007-07-25 17:02:55 EDT
I synced up with CVS and tried this scenario again.

I got this problem again; 20 minutes have already passed and it is still hanging.

I will attach the stack dump for it.

I've taken around 4 stack dumps, and they are similar, so I will attach just one of them.

Comment 8 Xuan Chen CLA 2007-07-25 17:04:56 EDT
Just as I finished typing the last comment, the operation completed. Now my workspace is back to normal. It took about 5:04 - 4:39 = 25 minutes.
Comment 9 Xuan Chen CLA 2007-07-25 17:05:53 EDT
Created attachment 74626
Stackdump when the workspace hangs
Comment 10 Martin Oberhuber CLA 2007-07-25 17:18:12 EDT
Did you really test this with SSH? - The stackdump seems to indicate dstore is running, but not SSH.

From the stackdump, it looks to me like the problem is in the main thread at SystemView.recursiveFindAllRemoteItemReferences() -- which would mean that the remote is not necessarily involved and the performance issue could be found with a profiler.

Did you ever try refreshing 4000 directories on Local?
Comment 11 Martin Oberhuber CLA 2007-07-25 17:23:24 EDT
Did you look at the Task manager / cpu utilization while this was running?
Was your cpu mostly idle (indicating a problem with the remote) or at 100% (indicating a problem with the refresh / recursiveFind)?
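
One rough way to tell the two cases apart from inside the JVM, as a sketch only: compare the CPU time a thread consumed with the wall-clock time that passed. `CpuVersusWait` and `cpuFraction` are hypothetical names for illustration; `ThreadMXBean` is the standard `java.lang.management` API. A fraction near 0 suggests the thread was mostly waiting (for example on the remote), while a fraction near 1 suggests it was busy computing.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

/** Hypothetical helper: how much of the elapsed time did the current thread spend on the CPU? */
final class CpuVersusWait {
    /**
     * Runs the task on the current thread and returns cpuTime / wallTime,
     * or -1.0 if the JVM cannot measure per-thread CPU time.
     */
    static double cpuFraction(Runnable task) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        if (!threads.isCurrentThreadCpuTimeSupported() || !threads.isThreadCpuTimeEnabled()) {
            return -1.0;
        }
        long wallStart = System.nanoTime();
        long cpuStart = threads.getCurrentThreadCpuTime();
        task.run();
        long cpuNanos = threads.getCurrentThreadCpuTime() - cpuStart;
        long wallNanos = System.nanoTime() - wallStart;
        return wallNanos == 0 ? 0.0 : (double) cpuNanos / wallNanos;
    }
}
```

Wrapping a suspect operation in `cpuFraction(...)` gives a per-thread figure, which can be easier to interpret than the process-wide number shown in the Task Manager on a multi-core machine.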
Comment 12 Xuan Chen CLA 2007-07-25 17:45:22 EDT
You are right. I have two connections (one dstore and one ssh), and I accidentally picked the dstore one to try.

I tried the ssh one twice, and it worked fine.

But when I tried the dstore one, I still got the same problem.

My workbench is still hanging, and it seems to hang in the same spot.

My javaw.exe takes about 30-40% of my CPU.
Comment 13 Martin Oberhuber CLA 2007-07-27 11:52:15 EDT
Does this mean that the issue for SSH is no longer there?

Then we should either change the summary to reference dstore, or mark this bug fixed and create a new one for the dstore issue.
Comment 14 Xuan Chen CLA 2007-07-27 13:01:20 EDT
I gave it a try again, and SSH is fine.

But I still get the hang with the dstore connection, and the stack dump shows the same place.

I will close this one and open a different bug to track the DStore problem.

Comment 15 Martin Oberhuber CLA 2007-09-10 09:21:56 EDT
Xuan: we're tracking dstore now with bug 198143, is it ok to close this ssh one?
Comment 16 Xuan Chen CLA 2007-09-10 09:37:39 EDT
I cannot reproduce it any more.
Comment 17 Xuan Chen CLA 2007-09-10 09:38:19 EDT
Closing it.