Bug 265504 - [Performance] IResource refreshLocal over-used => very poor performance for some users
Status: NEW
Alias: None
Product: CDT
Classification: Tools
Component: cdt-core
Version: 6.0
Hardware: All All
Importance: P3 normal with 2 votes
Target Milestone: ---
Assignee: Project Inbox
QA Contact: Jonah Graham
Depends on: 313673
Reported: 2009-02-19 13:38 EST by James Blackburn
Modified: 2020-09-04 15:26 EDT
CC: 7 users

Description James Blackburn 2009-02-19 13:38:25 EST
Build ID: HEAD

Steps To Reproduce:
CDT eagerly attempts to bring the workspace state into sync with the filesystem by running IResource#refreshLocal() whenever an external process may have modified the filesystem.

This is bad because:
1) #refreshLocal() requires the resource's scheduling rule, preventing the user from making further resource modifications. If the resource is a project, the user is locked out of the whole project until the refresh is done.
2) It ignores any RefreshProviders which give notification callbacks to the Workspace.
3) Most callers of #refreshLocal() don't care about the refresh happening in the same thread (even standard external make doesn't care). They just care that the resource tree is brought up to date in a reasonably timely manner.

Because of (1), a user who is iteratively developing and debugging incurs at least 2 project refreshes each time they launch their program: 1 at the end of the build (+1 for the build before the launch) and 1 at the end of the launch. On NFS this takes seconds; on ClearCase it takes minutes.
Because of (2), these refreshes may be entirely unnecessary. Eclipse has a refresh provider for Windows, and Rational provides a refresh provider for ClearCase. Other providers have been proposed / developed for Linux & Mac OS X.


What the platform provides:

The platform provides a polling refresh monitor and a refresh provider for Windows.  The polling provider uses at most 2.5% to 5% of CPU (in a low-priority thread) when there is no platform-specific refresh provider.  It _doesn't_ hold any scheduling rules during the file stat(); it schedules a #refreshLocal() (which does require a resource scheduling rule) only for out-of-sync resources.  If nothing has changed, there is no concurrency penalty.

The polling provider, or any other external refresh monitor, is enabled when "Refresh Automatically" is enabled.  This is disabled by default due to a platform decision: they (John A) don't see external modification of Java resources frequently enough to justify it.  In the CDT we clearly do see this problem.

Proposed fix:

1) Enable the "Refresh Automatically" by default in the CDT product.
2) Create our own refresh manager in cdt.core to be used by all the places that currently call #refreshLocal() but don't care that resources are refreshed immediately.
3) Refresh Manager would use isSynchronized(...) before scheduling the resource for refresh with potentially more fine-grained resource out of sync discovery (rather than stat()'ing the IProject tree once and then again during the refresh, we could iteratively stat the resources in the tree).

What would be good:
4) The RefreshManager would, if possible, queue resources with the platform's RefreshManager for refresh. If not possible, just do refreshLocal().
5) The RefreshManager might discover that the resource is already covered by a refresh monitor; if so, there is no need to refresh.
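A minimal sketch of the proposed refresh manager, covering points (3) and (5). This is a hypothetical design in plain Java, not existing CDT code; the class name `RefreshManagerSketch` and the predicates standing in for IResource#isSynchronized() and "an IRefreshMonitor covers this resource" are invented for illustration:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Predicate;

// Hypothetical sketch of the proposed cdt.core refresh manager -- plain Java
// standing in for the Eclipse APIs, not existing CDT code. Call sites enqueue
// resources instead of calling refreshLocal() directly; a background pass then
// refreshes only resources that are stale and not covered by a refresh monitor.
class RefreshManagerSketch {
    private final Deque<String> queue = new ArrayDeque<>();
    private final Predicate<String> isSynchronized; // stands in for IResource#isSynchronized
    private final Predicate<String> hasMonitor;     // stands in for "an IRefreshMonitor covers this resource"
    int refreshLocalCalls = 0;                      // each call would take a blocking scheduling rule

    RefreshManagerSketch(Predicate<String> isSynchronized, Predicate<String> hasMonitor) {
        this.isSynchronized = isSynchronized;
        this.hasMonitor = hasMonitor;
    }

    // Replaces a direct IResource#refreshLocal() call at the call site; returns immediately.
    void asyncRefresh(String resourcePath) {
        queue.add(resourcePath);
    }

    // Would run in a low-priority background Job.
    void processQueue() {
        while (!queue.isEmpty()) {
            String r = queue.poll();
            if (hasMonitor.test(r)) continue;     // point (5): a monitor keeps it in sync already
            if (isSynchronized.test(r)) continue; // point (3): lock-free check, nothing to do
            refreshLocalCalls++;                  // only now pay for the scheduling rule
        }
    }
}

public class RefreshManagerDemo {
    public static void main(String[] args) {
        RefreshManagerSketch mgr = new RefreshManagerSketch(
                r -> r.endsWith("insync"),    // pretend only this resource matches the fs
                r -> r.startsWith("/mon"));   // pretend /mon... is watched by a monitor
        mgr.asyncRefresh("/proj/insync");
        mgr.asyncRefresh("/mon/proj/file");
        mgr.asyncRefresh("/proj/stale");
        mgr.processQueue();
        System.out.println("refreshLocal calls: " + mgr.refreshLocalCalls); // 1 (only the stale one)
    }
}
```

Of the three queued resources, only the genuinely stale, unmonitored one pays the locking cost; the builder-side call sites never block at all.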

There are some core.resources bugs that would help the above (in particular 4 & 5):
 - it turns out the polling monitor is under-scheduled...
 - It would be neat to notify the platform that we want a resource refreshed best effort.  If there's already a refresh provider for the resource sub-tree => nothing to do.

Without any platform changes I believe we can improve the experience significantly. With the platform changes, the CDT RefreshManager would just end up wrapping Platform API (much like ResourceLookup might one day do :) ).


Any thoughts / ideas more than welcome!

More information:
Related Bugs:
bug 265498 - 2 refreshLocal on launch tear down
bug 261695 - clearcase build performance problem (most likely refresh related)
bug 133881 - refresh projects is killing me
Comment 1 Chris Recoskie 2009-02-19 14:57:32 EST
Refreshing in general is a thorny problem.  The big problem is that there is no single refresh policy that can satisfy everyone.

As an example, when working on remote projects, deep refreshes on large trees are very slow.  The platform likes to do this every time you open a closed project, though.  In the remote situation, it makes more sense to have refreshes be largely "on demand" rather than automatic.

In the case of a managed build, we could do better by only refreshing the resources that are generated (which we should largely know by the input/output rules).  I think there's a bug somewhere requesting that... maybe one of the ones you mention but I'm not sure.  It doesn't work in all cases though because it's possible that your tool has secondary outputs which are not handled by the input/output chain.

In the case of standard make, it is very hard (nigh impossible) to know what to refresh, so we're stuck with either refreshing everything, or refreshing nothing.

I am not sure how your proposal meshes with the use case you mention.  Maybe I just need more detail.  To my mind, if someone hits the debug button, which then triggers a build, then the debug session can't be created until there's an executable to debug, which means that we need to wait for the refresh to finish.  This is not a good example of something that doesn't care if the refresh is immediate, because it can't continue properly until the refresh is done.

I agree that minimizing the number of refreshes by eliminating duplicate refreshes is a good thing.  I don't think launching should initiate a refresh at all... that should come as a result of the build.
Comment 2 James Blackburn 2009-02-19 17:18:32 EST
(In reply to comment #1)
Thanks for taking the time to reply Chris!

> Refreshing in general is a thorny problem.  The big problem is, there is no one
> refresh policy that can satisfy everyone.

Ok. So the major point of the proposal is that we can improve interactive performance without making any changes to our refresh policy by taking into account:
1) IResource#isSynchronized() doesn't require any locks -- i.e. you can test whether you're out of sync without blocking the user
2) IResource#refreshLocal() always locks -- updating the sync state requires locking the resource and all its children
3) IRefreshMonitor is already implemented for Win32 / ClearCase (and in the future will be for Mac OS and Linux)

Given (1), we can bring the workspace back into sync with the fs piecemeal without blocking the user from saving resources.  Given (3), most of the top-level refreshLocal() calls on Win32 and on ClearCase with MVFS block the user for no reason and waste CPU cycles.
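The difference between points (1) and (2) can be sketched with a toy model. This is plain Java modelling the behaviour, not the real org.eclipse.core.resources API; `ResourceModel` and the stamp fields are invented for illustration:

```java
// Plain-Java model (NOT the real org.eclipse.core.resources API) of the two
// patterns: an eager refreshLocal() that always takes the blocking scheduling
// rule, versus a lock-free isSynchronized() check that gates the refresh.
class ResourceModel {
    long workspaceStamp;      // modification stamp as recorded in the workspace tree
    long filesystemStamp;     // modification stamp on the real filesystem
    int ruleAcquisitions = 0; // times the blocking scheduling rule was taken

    ResourceModel(long ws, long fs) { workspaceStamp = ws; filesystemStamp = fs; }

    // Pattern (2): always locks, whether or not anything actually changed.
    void refreshLocal() {
        ruleAcquisitions++;   // user is locked out of the resource while this runs
        workspaceStamp = filesystemStamp;
    }

    // Pattern (1): a stat() comparison; no scheduling rule required.
    boolean isSynchronized() { return workspaceStamp == filesystemStamp; }

    // Gated refresh: only pay the locking cost when actually out of sync.
    void refreshIfStale() { if (!isSynchronized()) refreshLocal(); }
}

public class RefreshCostDemo {
    public static void main(String[] args) {
        ResourceModel eager = new ResourceModel(1, 1); // already in sync
        ResourceModel gated = new ResourceModel(1, 1);
        for (int i = 0; i < 5; i++) { eager.refreshLocal(); gated.refreshIfStale(); }
        System.out.println("eager rule acquisitions: " + eager.ruleAcquisitions); // 5
        System.out.println("gated rule acquisitions: " + gated.ruleAcquisitions); // 0
    }
}
```

When the workspace is already in sync (the common case with a refresh monitor running), the gated pattern never takes the rule at all.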

> As an example, when working on remote projects, deep refreshes on large trees
> is very slow.  The platform likes to do this every time you open a closed
> project though.  

True. The platform uses its RefreshManager -> RefreshJob for this, which is expensive and locking [it doesn't do anything clever with checking sync state (1); it uses multiple iterations of refreshLocal()].  The RefreshJob implementation does, however, break the refreshLocal() calls up into chunks so the project's lock isn't held from beginning to end.

What _may_ be hurting you is that the RefreshJob is optimised for Java projects (i.e. projects with deep trees).  It looks like the algorithm refreshes breadth first iteratively (it does 2 layers at a time). So if you've got a shallow project with many files, yes, you will suffer badly.  This was noted in bug 74392 comment 36 and it hasn't been fixed (Ed doesn't appear to have filed a bug -- and the code still seems to have this issue)...
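The two-layers-at-a-time chunking above can be sketched as a toy model. The chunk arithmetic is illustrative only, not measured from the real RefreshJob code:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the RefreshJob chunking described above: breadth-first, two
// tree levels per chunk, with the scheduling rule held for one chunk at a
// time. Illustrative only -- not derived from the real Eclipse implementation.
public class ChunkModel {
    // Returns the size of each locked chunk for a tree described by the
    // number of entries at each depth level, taking two levels per chunk.
    static List<Integer> chunkSizes(int[] entriesPerLevel) {
        List<Integer> chunks = new ArrayList<>();
        for (int i = 0; i < entriesPerLevel.length; i += 2) {
            int size = entriesPerLevel[i];
            if (i + 1 < entriesPerLevel.length) size += entriesPerLevel[i + 1];
            chunks.add(size);
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Deep, narrow tree (Java-style): many small chunks, lock released often.
        System.out.println(chunkSizes(new int[]{10, 10, 10, 10, 10, 10})); // [20, 20, 20]
        // Shallow, wide tree (one directory of 10000 files): one huge chunk,
        // so the lock is held for essentially the whole refresh.
        System.out.println(chunkSizes(new int[]{1, 10000})); // [10001]
    }
}
```

A deep tree yields many small chunks with the lock released between them; a shallow, wide tree collapses into a single huge chunk, which is exactly the pathological case described above.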

In bug 74392 comment 22 John A discusses the implementation and his intentions:
"It performs population of new projects in the background, in small chunks of work so that
the user can continue browsing and modifying the workspace in the meantime. 
When an attempt is made to access a directory that has not yet been populated
(for example by expanding in the Navigator), it moves that discovery request to
the front of the refresh queue. It does not implement the "Pending..." and busy
cursor behaviour in the CVS repository view, but since local refresh of a
single directory (just discovering one level of children) is never likely to
take a long  time I'm not sure if this is necessary.

I have been experimenting with this implementation on a simulated "slow" file
system (I added sleeps to file system calls). It feels pretty good, but it
would be nice to get feedback from real users on these systems to see how it
behaves."

> In the remote situation, it makes more sense to have refreshes
> be largely "on demand" rather than automatic.

I agree. But I think the remote case is a wholly different one (and one I'm interested in exploring...) from the common case of the user who does everything - build / edit / index / debug / profile - locally.

> In the case of a managed build, we could do better by only refreshing the
> resources that are generated (which we should largely know by the input/output
> rules).  I think there's a bug somewhere requesting that... maybe one of the
> ones you mention but I'm not sure.  It doesn't work in all cases though because
> it's possible that your tool has secondary outputs which are not handled by the
> input/output chain.

Yes, that is the idea.  I purposefully didn't include managed build in the summary because it does need to make all its resource changes within the build job such that next time it's run IncrementalProjectBuilder#getDelta() can return the changed resources.  If it can refresh individual IResources rather than the whole project, then all the better!

> In the case of standard make, it is very hard (nigh impossible) to know what to
> refresh, so we're stuck with either refreshing everything, or refreshing
> nothing.

Well, there is a middle ground: we bring the workspace back into sync asap after the build, using isSynchronized() to check sync state and/or breaking the refreshLocal() into chunks as the platform does. During the build the WorkspaceRoot rule is held, so if we refresh the project during the build, we increase the time the user is prevented from editing files.  As far as I can see it makes no difference whether the refresh is done in the build job or out of it.  [In fact it almost makes sense, in my mind, to run Standard Builds off of our own Job so we can schedule it as we see fit and _don't_ lock the workspace...]

>  To my mind, if someone hits the debug button, which
> then triggers a build, then the debug session can't be created until there's an
> executable to debug, which means that we need to wait for the refresh to
> finish.  This is not a good example of something that doesn't care if the
> refresh is immediate, because it can't continue properly until the refresh is
> done.

The executable _does_ exist on the filesystem, and is ready to debug.  
If the launch is running some external tool, then that external tool will see the resource.  
If, as part of the Launch, an IResource needs to be accessed, then the launch can refresh the resource itself (if it cares about the sync state).  None of the read operations on resources require holding locks, and the API provides a way to say whether you care about the sync state. Sync state is obviously unimportant when running external tools.

At least this way round the Launch can refresh the resources it needs.  The builder doesn't have to refresh everything just-in-case.

When creating launch configurations, there is definitely a need to find IBinarys in the project, which is why we need to get the workspace back in sync asap. It just doesn't _necessarily_ need to block the launch; it can block the launch if the launch wants it to.  Effectively this moves the decision away from the builder to the consumer of the build.

> I agree that minimizing the number of refreshes by eliminating duplicate
> refreshes is a good thing.  I don't think launching should initiate a refresh
> at all... that should come as a result of the build.

Well, we've got launches that produce output under the project.  We don't much care about this test data, but it's nice that Eclipse eventually notices it's there.  I suspect whoever wrote the code had something similar in mind.

Presumably the refreshLocal() calls have accumulated because people found they were out of sync when they didn't expect or want to be, and someone complained. I think we can do better using the approach outlined above.

The remote project case & ClearCase are likely to be the worst affected by all this currently, and the best potential winners from it.  But Windows will also win -- the user will get some CPU cycles back :).

Do let me know if any of the above doesn't make sense or you think I've missed something.