Bug 244979 - Lazy refresh
Summary: Lazy refresh
Status: NEW
Alias: None
Product: Platform
Classification: Eclipse Project
Component: Resources
Version: 3.4
Hardware: PC Mac OS X - Carbon (unsup.)
Importance: P3 enhancement with 6 votes
Target Milestone: ---
Assignee: Martin Oberhuber CLA
QA Contact:
URL:
Whiteboard:
Keywords: investigate, performance
Duplicates: 244504
Depends on:
Blocks:
 
Reported: 2008-08-22 13:10 EDT by Kelly Campbell CLA
Modified: 2019-09-06 15:37 EDT
CC List: 15 users

See Also:


Attachments
Stack from the thread running the refresh (3.01 KB, text/plain)
2008-08-22 13:10 EDT, Kelly Campbell CLA
First try of a fix (2.23 KB, patch)
2008-10-01 09:58 EDT, Jerome Lanneluc CLA

Description Kelly Campbell CLA 2008-08-22 13:10:14 EDT
Created attachment 110702
Stack from the thread running the refresh

Build ID: I20080617-2000

Steps To Reproduce:
1. Create a java project
2. Link in an external folder sourcepath with many files
3. Refresh the project

The refresh can take a long time, over an hour in our case, and it does seem to be able to be canceled which usually causes the user to have to kill eclipse.


More information:
We have a shared readonly directory where we keep source for everything we use, e.g. third-party open source, a readonly view of the latest revision of our entire code base from our revision control system, etc. Our Eclipse project's classpath refers to built jars, and those jars' sourcepaths are pointed at this readonly directory for reference.

This is very useful for developers who only need to edit a small portion of files but want to browse the other files. It's also very useful in debugging large binaries without checking all the code out from revision control.

With Europa, this worked fine. We were able to browse sources and debug large binaries. With Ganymede's new External Folder alias feature, the refresh doesn't complete in a reasonable amount of time.

To fix this, we have to remove any reference to the readonly directory from our sourcepaths in the .classpath file and delete the .metadata/.plugins/org.eclipse.jdt.core/.org.eclipse.jdt.core.external.folders file which contains the alias.

I will attach an example stack trace from the refresh thread.
Comment 1 Kelly Campbell CLA 2008-08-22 13:24:10 EDT
I had a typo in my initial description:

This: 
"it does seem to be able to be canceled which usually causes the user to have to kill eclipse"

should have said this:

"it doesn't seem to be able to be canceled which usually causes the user to have to kill eclipse"
Comment 2 Jerome Lanneluc CLA 2008-08-26 05:02:00 EDT
I'm confused. You said "Link in an external folder sourcepath", but the trace shows that an external library folder is being refreshed.

If you're linking an external library folder, how could this work in Europa since it was not possible to add an external library folder on the build path?
Comment 3 Kelly Campbell CLA 2008-08-27 09:59:22 EDT
Maybe I used some overloaded terminology so it wasn't completely clear.

The stack is from a refresh because we have a script which generates the .classpath file from our build. But the same thing happens when you add a source path manually in the UI. Here's an example entry for a third party jar file:

<classpathentry kind='lib' path='/third_party/log4j/log4j.jar' sourcepath='/third_party/all_java_src'/>

In Europa, the files in all_java_src are accessed only when needed during debugging or browsing code. In Ganymede, the entire directory is scanned during project refresh even though we will likely never access 99% of the files it contains.

The simple fix we're using currently is to leave out the sourcepath, but this makes browsing and debugging less usable than previous versions.
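
For anyone who wants to script that workaround, here is a minimal sketch (an illustration only, not something attached to this bug). It assumes the .classpath content is available as a string and simply drops the sourcepath attribute from every classpathentry element; the class name SourcepathStripper and its strip method are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical helper: removes source attachments from .classpath content. */
final class SourcepathStripper {

    /** Returns the given .classpath XML with every sourcepath attribute removed. */
    static String strip(String classpathXml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(classpathXml)));

        NodeList entries = doc.getElementsByTagName("classpathentry");
        for (int i = 0; i < entries.getLength(); i++) {
            Element entry = (Element) entries.item(i);
            // removeAttribute is a no-op when the attribute is absent
            entry.removeAttribute("sourcepath");
        }

        // Serialize the modified document back to a string
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }
}
```

As noted above, this trades away source browsing and debugging for those jars, so it is only a stopgap.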
Comment 4 Jerome Lanneluc CLA 2008-08-27 10:06:55 EDT
Thanks, I now understand. We usually call the 'sourcepath' attribute 'source attachment'. This is why I was confused.

I will have a look.
Comment 5 Szymon Brandys CLA 2008-09-08 06:27:48 EDT
*** Bug 244504 has been marked as a duplicate of this bug. ***
Comment 6 Jerome Lanneluc CLA 2008-10-01 09:58:00 EDT
Created attachment 113992
First try of a fix

With this patch, some JDT/Core tests now fail randomly. I suspect that using BACKGROUND_REFRESH may cause more grief. I will need to invest more time in this issue.
Comment 7 Bradley Hawkes CLA 2008-10-09 14:52:23 EDT
If I understand this patch correctly it seems not to address the source of the problem (no pun intended). Does this simply move the source tree traversal to the background?

One of the big issues here is that the traversal is taking place at all. It's not clear why Eclipse should be looking at this source attachment tree unless it is trying to show the source of a class inside a jar. In my case the attached source tree includes a very large amount of code and is actually stored on a network server. Traversing this tree uses a lot of bandwidth and puts a lot of unnecessary load on the server. Is there any way to prevent the traversal instead?

(In reply to comment #6)
> Created an attachment (id=113992) [details]
> First try of a fix
> 
> With this patch, some JDT/Core tests now fail randomly. I suspect that using
> BACKGROUND_REFRESH may cause more grief. I will need to invest more time in
> this issue.
> 

Comment 8 Jerome Lanneluc CLA 2009-01-09 08:03:08 EST
Agreed that the traversal should be done more lazily.
Moving to Platform/Resources to investigate another solution (BACKGROUND_REFRESH still causes grief).
Comment 9 John Arthorne CLA 2009-01-12 10:10:23 EST
Martin, this is your "lazy refresh" idea... did you ever enter a bug report for that? See also bug 91432.
Comment 10 Martin Oberhuber CLA 2009-03-04 04:36:22 EST
I've added this to the list of things to investigate for e4 resources.

Today, all Resources in an Eclipse workspace are refreshed eagerly (for timestamp caching, batched change notifications, CM integrations and more).

In order to address the issue here, I'd like to add support for specifying folders as "lazy" resources. The contents below such "lazy" folders would be available for viewing and editing (once a location is known), but would never be part of any change notifications or Team support. Similar requirements exist if we want to bring remote files (accessible as URLs, for instance) into the workspace for easy editing and referencing.

We'd need to explore what other implications such "lazy folders" may have. IWorkspaceRoot.findFilesForLocation() for instance would likely not work because not all files below the lazy location would be known. Any other "global" workspace operations such as creating an index would need to leave these files out.

We might, however, want to "promote" files that have been loaded into an editor once from "lazy" state into "known" state thus providing full IResource support for these specific files (but not their containers etc). 

In the end, this touches on our understanding of what a "workspace" is. Is it just a collection of items that we are working on (lazy or not), or is the workspace a more managed collection of projects that's subject to eager refresh? In the latter case, the lazy resources would always live outside the workspace, but could be referenced from the workspace by means of linking them in.
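
To make the "lazy folder" idea a bit more concrete, here is a rough sketch in plain Java. It is not an Eclipse API and not a proposed implementation; LazyFolder and its methods are hypothetical names. The only point it shows is that creating the node performs no I/O, and the directory is read once, on first access.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Stream;

/** Hypothetical lazy folder node: the file system is only read on first access. */
final class LazyFolder {
    private final Path location;
    private List<Path> children; // null until someone asks for the contents
    private int listings;        // how many times the directory was actually read

    LazyFolder(Path location) {
        this.location = location; // no I/O here, so construction stays cheap
    }

    /** Lists the direct children, reading the directory only on the first call. */
    synchronized List<Path> children() {
        if (children == null) {
            try (Stream<Path> entries = Files.list(location)) {
                List<Path> found = new ArrayList<>();
                entries.sorted().forEach(found::add);
                children = Collections.unmodifiableList(found);
                listings++;
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return children;
    }

    /** Returns a lazy node for a subfolder, again without touching the disk. */
    LazyFolder subfolder(String name) {
        return new LazyFolder(location.resolve(name));
    }

    /** Number of directory reads performed so far by this node. */
    synchronized int listings() {
        return listings;
    }
}
```

Note that nothing in this sketch detects later changes on disk, which is exactly the change-notification gap described above.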
Comment 11 James Blackburn CLA 2009-03-04 09:06:26 EST
(In reply to comment #10)
> I've added this to the list of things to investigate for e4 resources.
> 
> Today, all Resources in an Eclipse workspace are refreshed eagerly (for
> timestamp caching, batched change notifications, CM integrations and more).
> 
> In order to address the issue here, I'd like to add support for specifying
> folders as "lazy" resources. The contents below such "lazy" folders would be
> available for viewing and editing (once a location is known), but would never
> be part of any change notifications or Team support. Similar requirements exist
> if we want to bring remote files (accessible as URLs, for instance) into the
> workspace for easy editing and referencing.

You probably also break local history if you do this...

> We might, however, want to "promote" files that have been loaded into an editor
> once from "lazy" state into "known" state thus providing full IResource support
> for these specific files (but not their containers etc). 

That sounds neat!

I filed a few bugs about the issues with refresh.

In CDT we have:
bug 265504 -- CDT is over-eager with refreshLocal(...): it usually calls IProject#refreshLocal() after an external tool is run.

For our workflows, we do want to keep the IResource features - team, local history, notification, etc. However the refreshLocal(...) API isn't expressive enough.

1) If the resource has a refresh provider which calls back when there's an external modification (such as Windows and ClearCase, and others in the future), then the expensive refresh when the user calls the API is unneeded.
2) core.resources uses RefreshManager to refresh the workspace piecemeal and asynchronously. There's no way for consumers to use this, i.e. they're blocked until refreshLocal(...) is finished. The particular IResource (sub-)tree is locked for the entirety of refreshLocal().
3) RefreshManager is currently optimised for deep (Java-like) trees. It refreshes the requested resource iteratively, breadth first, 2 layers at a time.


I agree it would be good to have lazy updates and lazy population of the tree. However, it should surely be possible to do this without breaking everything that relies upon delta notification (which is a large number of IResource consumers currently...). Is there not a half-way house: don't refresh eagerly, but when a refresh does occur, fire a resource changed event to listeners?

The alternative is that we may get plugins polling these trees for changes... Or worse, integrators using a mix of responding to resource change events and polling parts of the tree (if they can discover which bits aren't kept in sync). I worry that we're throwing the problem out of platform core.resources to the user and asking them to make a choice when they create a linked resource. Most users aren't going to understand the full implications of this (e.g. some builders don't work on that sub-tree).

It also won't help in the case where users are creating projects on slow filesystems like RDT or ClearCase, where you really do still want the IResource API to work, but fs latency is high.
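
The half-way house could look roughly like the sketch below. This is plain Java for illustration only, not core.resources code; OnDemandRefresher is a hypothetical name, and the scanner is assumed to be any caller-supplied function returning name-to-timestamp pairs with non-null timestamps. Nothing is scanned until refresh() is called, but when it is, listeners still get a delta.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hypothetical on-demand refresh that still reports deltas to listeners. */
final class OnDemandRefresher {
    private final Supplier<Map<String, Long>> scanner; // name -> timestamp (assumed non-null)
    private final List<Consumer<Set<String>>> listeners = new ArrayList<>();
    private Map<String, Long> lastSeen = new HashMap<>();

    OnDemandRefresher(Supplier<Map<String, Long>> scanner) {
        this.scanner = scanner; // nothing is scanned until refresh() is called
    }

    void addListener(Consumer<Set<String>> listener) {
        listeners.add(listener);
    }

    /** Scans once, then tells listeners which names were added, removed or modified. */
    Set<String> refresh() {
        Map<String, Long> now = new HashMap<>(scanner.get());
        Set<String> changed = new TreeSet<>();
        for (Map.Entry<String, Long> entry : now.entrySet()) {
            if (!entry.getValue().equals(lastSeen.get(entry.getKey()))) {
                changed.add(entry.getKey()); // added or modified since the last refresh
            }
        }
        for (String name : lastSeen.keySet()) {
            if (!now.containsKey(name)) {
                changed.add(name); // removed since the last refresh
            }
        }
        lastSeen = now;
        if (!changed.isEmpty()) {
            for (Consumer<Set<String>> listener : listeners) {
                listener.accept(changed);
            }
        }
        return changed;
    }
}
```

Consumers that rely on change events would keep working; they would just hear about changes later than they do with eager refresh.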
Comment 12 Anton Leherbauer CLA 2010-03-11 03:51:08 EST
(In reply to comment #11)
> 2) core.resources uses refreshManager to refresh the workspace piece-meal
> asynchronously. There's no way for consumers to use this i.e. they're blocked
> until refreshLocal(...) is finished.  The particular IResource (sub-)tree is
> locked for the entirety of refreshLocal().

I think adding API to schedule a background refresh on any resource would be a valuable addition.  It does not solve all problems mentioned in this bug, but it would help with the typical use case of external builders which need to get the resource tree into sync again, but don't want to lock the workspace for a long time.
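
As a sketch of the shape such an API might take (plain JDK concurrency, not an existing Eclipse API; BackgroundRefresher and schedule are hypothetical names): the caller hands over the refresh work and gets a future back immediately instead of blocking.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical scheduler: callers queue a refresh and continue immediately. */
final class BackgroundRefresher implements AutoCloseable {
    // A single daemon worker, so queued refreshes run one at a time in order
    private final ExecutorService worker = Executors.newSingleThreadExecutor(runnable -> {
        Thread thread = new Thread(runnable, "background-refresh");
        thread.setDaemon(true);
        return thread;
    });

    /** Queues the refresh work and returns at once; the future completes when it is done. */
    CompletableFuture<Void> schedule(Runnable refreshWork) {
        return CompletableFuture.runAsync(refreshWork, worker);
    }

    /** Stops accepting new work; already queued refreshes still run. */
    @Override
    public void close() {
        worker.shutdown();
    }
}
```

An external builder could call schedule(...) after its tool finishes and attach a continuation to the returned future, rather than holding a lock on the resource tree while it waits.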
Comment 13 Eclipse Webmaster CLA 2019-09-06 15:37:21 EDT
This bug hasn't had any activity in quite some time. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet.

If you have further information on the current state of the bug, please add it. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant.