Bug 306575 - Allow project to reference a refresh snapshot for use with standard import
Summary: Allow project to reference a refresh snapshot for use with standard import
Status: RESOLVED FIXED
Alias: None
Product: Platform
Classification: Eclipse Project
Component: Resources (show other bugs)
Version: 3.6   Edit
Hardware: All All
: P3 enhancement (vote)
Target Milestone: 3.6 M7   Edit
Assignee: Martin Oberhuber CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on: 301563
Blocks: 310476 310508 310509 310514
  Show dependency tree
 
Reported: 2010-03-19 15:31 EDT by Martin Oberhuber CLA
Modified: 2010-04-27 05:01 EDT (History)
3 users (show)

See Also:
Szymon.Brandys: review+


Attachments
patch (16.61 KB, patch)
2010-04-22 08:29 EDT, Markus Schorn CLA
mober.at+eclipse: iplog+
Details | Diff
patch v1 rebased to HEAD (14.86 KB, patch)
2010-04-22 09:40 EDT, Martin Oberhuber CLA
no flags Details | Diff
patch v2 (API only) (2.38 KB, patch)
2010-04-23 02:10 EDT, Martin Oberhuber CLA
no flags Details | Diff
patch v2 (no API) (12.25 KB, patch)
2010-04-23 02:18 EDT, Martin Oberhuber CLA
no flags Details | Diff
patch v3 (with API and tests) (30.22 KB, patch)
2010-04-23 13:59 EDT, Martin Oberhuber CLA
no flags Details | Diff
patch v4 (no API) (21.32 KB, patch)
2010-04-25 19:08 EDT, Martin Oberhuber CLA
no flags Details | Diff
patch v4 (Unittests) (5.91 KB, patch)
2010-04-25 19:13 EDT, Martin Oberhuber CLA
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Martin Oberhuber CLA 2010-03-19 15:31:01 EDT
+++ This bug was initially created as a clone of Bug #301563 +++

Follow-up from bug 301563 comment 37: 

As a provider of a large source code base, I want to provide this source code along with a pre-computed refresh snapshot, such that people importing my source base can get access to the project quickly - regardless of what wizards, tools or plugins they use for importing these projects today.

For the case where the provider of code also owns the Eclipse .project file, it is easiest to have the refresh snapshot referenced in the project. This is the workflow which is successfully being used by the CDT team-shared index: snapshot location defined as a project-scope preference, with allowing core variables replacement to parameterize the snapshot location when needed.

Being able to combine a CDT team-shared index with a pre-computed refresh snapshot using the same location semantics would result in a very powerful project database that includes both files and symbol index and can be opened and browsed very quickly (today, the value of the CDT team-shared index is compromised on slow file systems by the need for refreshLocal).

Moreover, this method of deploying a refresh snapshot along with the project very much simplifies adopting the refresh snapshot with end users, since the existing, known and working import-project operations can be used. Conceptually, by importing this special kind of project, one is importing a project without the need for refreshLocal.
Comment 1 Martin Oberhuber CLA 2010-04-22 06:42:51 EDT
Changed summary, previous value was:
Support a project-scope preference for refresh snapshot to use with standard import

In order to match the original request without forcing a particular implementation. Initial experiments show that a project-scope Preference is problematic for this request, since we need to access the setting before Project Preferences are initialized. Thus getting to the Preference on import would be messy.
Comment 2 Markus Schorn CLA 2010-04-22 08:29:34 EDT
Created attachment 165748 [details]
patch

The patch adds the flag 'SET_AUTOLOAD_SNAPSHOT' that can be used when creating a snapshot. It causes the location of the snapshot to be saved in the project description. 

When importing a project whose project description points to a snapshot, it will be loaded automatically when the project is created.
Comment 3 Martin Oberhuber CLA 2010-04-22 09:40:42 EDT
Created attachment 165761 [details]
patch v1 rebased to HEAD

I like the approach of saving the snapshot location URI in the project description rather than project prefs, since the project description is available to any client before opening the project (so they can potentially see that a snapshot autoload is requested, and act accordingly).

I also like the minimal new API that's involved (in fact it looks like we might be able to do this without API at all for a start). I also like that the auto-loading happens on IProject#create() such that a later #loadSnapshot() can override the snapshot that was auto-loaded.

Original patch was against M6, I have rebased the patch to HEAD with minor modifications:
- Updated copyright comments
- Removed unnecessary IProjectDescription API additions
- Renamed SET_AUTOLOAD_SNAPSHOT constant to SNAPSHOT_SET_AUTOLOAD

The SNAPSHOT_SET_AUTOLOAD constant is now the only API addition, and we could do without this API for 3.6 if we must (by clients like CDT just "knowing" that the magic value to set is 2).

The only problem I see is that #saveSnapshot requires an absolute URI, so we'll always write an absolute URI into the project description. This will not work for clients which import the project on a different file system. We should 
 - at least support URI's that are relative to the location of the project
 - perhaps also support Eclipse Variables in URI's

Other than that, the original User Story requirements are all met, so I like it!
Comment 4 Szymon Brandys CLA 2010-04-22 10:52:30 EDT
I think we should rather keep the location of a project in the snapshot file. Storing the location of a snapshot in .project files looks wrong to me.
Comment 5 Martin Oberhuber CLA 2010-04-22 11:44:34 EDT
(In reply to comment #4)
Well... clearly the approach of storing location of a snapshot in the .project file won't address all use-cases, but I do think it exactly addresses the use-case outlined in comment 0.

This is the case where the "owner" of the project wants to equip that project with a snapshot such that his clients can import it faster. To me, this absolutely makes sense. It is a way of making the project richer and more useful by adding meta-information. And, the owner who has permission to change the .project will also know under what circumstances the timestamps in the snapshot need updating -- consider a read-only project that never changes for instance.

People who don't like putting a snapshot into the .project can just not do it.
Comment 6 Szymon Brandys CLA 2010-04-22 12:35:59 EDT
It looks like I'm confused by bug 303751, comment 18 where another improvement was suggested. Martin, you suggested to store the project location in the snapshot. Am I right? I think this also makes sense.
Comment 7 Martin Oberhuber CLA 2010-04-23 02:10:20 EDT
Created attachment 165877 [details]
patch v2 (API only)

Yes I think both the approach from bug 303751 (project associated with snapshot) and this approach (snapshot associated with project) are needed to address different usage scenarios. For this workflow, the usage scenario is clearly outlined in comment 0.

I reviewed the patch again, and I think that rather than setting the snapshot location as a side-effect of saving the snapshot, we should have clients set the snapshot explicitly in the project description. 

This simplifies the code, and makes it a little more flexible. For instance, a nightly cronjob could open a workspace where snapshot locations are set, and just save snapshots to the known locations (in order to update the snapshots). Such an update job needs to read the snapshot locations, but never set them.

Attached patch only contains the proposed API. I believe that this API is simple and safe because it's only data manipulation (set and read the corresponding slot in the .project file) without any more hidden functionality or promise. The new slot int the project description is fully backward compatible since the project description reader always ignored any slots it didn't know.

I'll attach the implementation next.
Comment 8 Martin Oberhuber CLA 2010-04-23 02:18:14 EDT
Created attachment 165878 [details]
patch v2 (no API)

Attached patch is the updated implementation with following changes:

- renamed snapshotAutoloadLocation etc into snapshotLocation for readability
- got rid of the "save" side-effect in saveSnapshot()
- got rid of the related API constant which is no longer needed
- added uri.resolve() in order to support URI's relative to the project

If we consider the implementation of internal methods
   ProjectDescription#getSnapshotLocationURI()
   ProjectDescription#setSnapshotLocationURI(URI)
as provisional API which the CDT export team index wizard could use, then I believe that this patch is extremely valuable to CDT even without adding API.

The one thing I'm not yet exactly sure about is whether we shouldn't allow Eclipse variables (such as ${env_var:FOO}) in the snapshot location, to make it more flexible. The CDT team-shared index does use such variables, so it would be necessary if all CDT workflows should be supported. I didn't check whether variables could be supported inside the URI's we pass around now, or if would need to change the ProjectDescription methods to set/get Strings rather than URI's.
Comment 9 Martin Oberhuber CLA 2010-04-23 02:26:04 EDT
Another thought is whether the snapshot auto-loading code shouldn't be moved from the Project#create() method into Project#open().

This would ensure clean semantics when a project is renamed, moved or otherwise "treated" after creating but before opening. With the new API approach, clients who want to explicitly disable the snapshot auto-loading could always 
   IProjectDescription#setSnapshotLocationURI(null)
on the internal project description before creating / opening the project.

Markus, what do you think?
Comment 10 Markus Schorn CLA 2010-04-23 04:31:22 EDT
(In reply to comment #9)
> Another thought is whether the snapshot auto-loading code shouldn't be moved
> from the Project#create() method into Project#open().
> 
> This would ensure clean semantics when a project is renamed, moved or otherwise
> "treated" after creating but before opening. With the new API approach, clients
> who want to explicitly disable the snapshot auto-loading could always 
>    IProjectDescription#setSnapshotLocationURI(null)
> on the internal project description before creating / opening the project.
> 
> Markus, what do you think?

After explicitly loading a snapshot it should not be necessary to use an unobvious trick to suppress the auto-loading. Rather than that, in this case the auto-loading should not be done.
Comment 11 Szymon Brandys CLA 2010-04-23 07:55:30 EDT
The reason why I don't like storing snapshotLocation in .project is that we use it only once per project's life. It's something temporary and perhaps should not be attached to a project forever. However the same issue is with the name in .project.

I think that Martin's approach (without the flag) is better. It's just setting the default snapshot location in .project that can be overridden with IProject#loadSnapshot. And one could change it anytime, not only while saving the snapshot.

There is a problem we can hit, when will start storing the project location in the snapshot. If .project points out a snapshot and the location in the snapshot doesn't match, we need to handle it somehow. 

To sum up. If CTD is going to consume it Helios, I would introduce this enhancement without API changes. We want to address both cases described in comment 7 and the suggested API may be not enough.
Comment 12 Martin Oberhuber CLA 2010-04-23 08:15:53 EDT
(In reply to comment #11)
> The reason why I don't like storing snapshotLocation in .project is that we
> use it only once per project's life.

True, the snapshotLocation is only used once when importing the project; on the other hand, the content in the project description is made exactly for this case (ie describe everything I need to know about a project before it is open). 

You are right that this is very similar to the name in .project - it's only relevant as a suggestion or hint when importing the project.

> There is a problem we can hit, when will start storing the project location in
> the snapshot. If .project points out a snapshot and the location in the
> snapshot doesn't match, we need to handle it somehow. 

I don't think that this is a problem. The project location that's stored in the snapshot would always just be a hint (ie where was the snapshot exported from). I could always be free and use that same snapshot for importing the project at any other location. 

In other words, the importer would always ignore the location that's stored in the snapshot. In some sense, bug 303751 and this one are just different ways of initially associating a project with a snapshot. Once the association is made, any hints for making the association are not relevant any more.

> To sum up. If CDT is going to consume it Helios, I would introduce this
> enhancement without API changes. We want to address both cases described in
> comment 7 and the suggested API may be not enough.

Thanks. Can I deduce from this statement that you're ok with us writing something into the .project file, as long as we don't use API to do so?

When exploring the use of Eclipse Variables for the snapshot location (like CDT does), I have come across the issue that URI's in fact don't work with the '{' character. So we cannot use URI in the get/setSnapshotLocation() API but we have to use String. This, again, may require some quoting for the .project file in order to ensure we write valid XML. I'm going to look at this next.
Comment 13 Szymon Brandys CLA 2010-04-23 09:31:51 EDT
(In reply to comment #12)
> Thanks. Can I deduce from this statement that you're ok with us writing
> something into the .project file, as long as we don't use API to do so?

Yes. As long as this something is snapshotLocation ;-)

> When exploring the use of Eclipse Variables for the snapshot location (like CDT
> does), I have come across the issue that URI's in fact don't work with the '{'
> character. So we cannot use URI in the get/setSnapshotLocation() 

Maybe this is similar. We use path variables in URIs for linked resources.
Adding Serge to comment.
Comment 14 Serge Beauchamp CLA 2010-04-23 09:49:02 EDT
(In reply to comment #13)
> (In reply to comment #12)
> > Thanks. Can I deduce from this statement that you're ok with us writing
> > something into the .project file, as long as we don't use API to do so?
> 
> Yes. As long as this something is snapshotLocation ;-)
> 
> > When exploring the use of Eclipse Variables for the snapshot location (like CDT
> > does), I have come across the issue that URI's in fact don't work with the '{'
> > character. So we cannot use URI in the get/setSnapshotLocation() 
> 
> Maybe this is similar. We use path variables in URIs for linked resources.
> Adding Serge to comment.

Reading the comment threads, I agree that using a URI for the location and allowing path variables in the URI location could simplify things.

1) The URI could be a simple as "PROJECT_LOC/.snapshot", or as complex as "ENVIRONMENT-MY_VAR/.snapshot", and also extensible through the path variable extension mechanism to anyone.

2) We already have UI for creating URI with path variable.

My general impression, was that the snapshot could be simply a ".project_tree" file besides the .project file, since there can't exist more than one at any point of time, no?  And the user's option would simply be to enable that project-snapshot generation or not

But I guess I don't have all the use-cases in mind.
Comment 15 Martin Oberhuber CLA 2010-04-23 10:44:07 EDT
(In reply to comment #14)
> 2) We already have UI for creating URI with path variable.

Thanks Serge, this is interesting. Can you elaborate?

> My general impression, was that the snapshot could be simply a ".project_tree"
> file besides the .project file, since there can't exist more than one at any

Well yes, having a fixed well-known location for the snapshot is the simplest use-case. In our case, the default location could be

   .settings/resource-index.zip

for instance - and in fact the CDT team-shared index started as simple as this. The CDT experience showed, though, that many people wanted to have their team-shared index stored in a well-known location outside the project sources, such that their nightly cronjobs for updating the indexes could do so in one central location without interfering with users' workspaces.

So, the concept of using variables was introduced, e.g. 

   ${env_var:SNAPSHOTS_ROOT}/${env_var:BASELINE}/${project_name}-tree.zip

Does that make sense to you?
Comment 16 Serge Beauchamp CLA 2010-04-23 10:59:49 EDT
(In reply to comment #15)
> (In reply to comment #14)
> > 2) We already have UI for creating URI with path variable.
> 
> Thanks Serge, this is interesting. Can you elaborate?
> 

Well, basically, the CreateLinkedResourceGroup and PathVariableDialog in o.e.ui.ide are essentially editing a URI string that can include path variables.

> > My general impression, was that the snapshot could be simply a ".project_tree"
> > file besides the .project file, since there can't exist more than one at any
> 
> Well yes, having a fixed well-known location for the snapshot is the simplest
> use-case. In our case, the default location could be
> 
>    .settings/resource-index.zip
> 
> for instance - and in fact the CDT team-shared index started as simple as this.
> The CDT experience showed, though, that many people wanted to have their
> team-shared index stored in a well-known location outside the project sources,
> such that their nightly cronjobs for updating the indexes could do so in one
> central location without interfering with users' workspaces.
> 
> So, the concept of using variables was introduced, e.g. 
> 
>    ${env_var:SNAPSHOTS_ROOT}/${env_var:BASELINE}/${project_name}-tree.zip
> 
> Does that make sense to you?

Sure it does.  

By using the existing path variable mechanism, you can have something equivalent by declaring a path variable that contains the macros, and then referring to that path variable in the URI.

The path variable value itself can contain multiple macros, such as FOOBAR=${FOO}/${BAR}.zip, but the URI must refer to a single path variable only ("FOOBAR" for example).

Note that there's no built-in path variable for 'env_var' at the moment (because of jdk 1.4 issues), nor 'project_name', but it can be added easily through an extension point.

The problem with using core.variables is that the core.variables values aren't context sensitive, so the values can't vary according to what project the string is being resolved - as opposed to CDT variables.

That's one of the reason why path variable in core.resources don't use core.variables.  Eventually, though, it would be good to improve the core.variables design so it can support all requirements from the core.resources and the CDT use cases.
Comment 17 Martin Oberhuber CLA 2010-04-23 13:59:43 EDT
Created attachment 165943 [details]
patch v3 (with API and tests)

Ok, here's another snapshot:

- added unittests for the snapshot autoload functionality
- moved autoload from Project#create() into Project#open()
- get/set snapshot location as String to support core.variables
- added error reporting (warnings) for invalid snapshot location on open

The combination of URI and variables works, but it feels a bit messy. For instance, if a variable expands to a Windows path (like java.io.tmpdir, or eclipse.home), URI won't swallow it. CDT uses core.variables with IPath to avoid this; and I'd like to be compatible with CDT. But on the other hand I'd like the snapshot location to be fully EFS-enabled so I want URI's. Is this a problem we have solved elsewhere?

Is the additional dependency of core.resources on core.variables even acceptable? Serge, how can I test the Path Variables UI in the SDK?

I'm attaching my complete patch including the IProjectDescription API; getting rid of the API and replacing by (ProjectDescrition) casts is easy once we are happy.
Comment 18 Serge Beauchamp CLA 2010-04-23 14:46:14 EDT
(In reply to comment #17)
> Created an attachment (id=165943) [details]
> The combination of URI and variables works, but it feels a bit messy. For
> instance, if a variable expands to a Windows path (like java.io.tmpdir, or
> eclipse.home), URI won't swallow it. CDT uses core.variables with IPath to
> avoid this; and I'd like to be compatible with CDT. But on the other hand I'd
> like the snapshot location to be fully EFS-enabled so I want URI's. Is this a
> problem we have solved elsewhere?

Wrapping the string in a URI is part of the function of the path variable provider extension point.

You can wrap regular strings in a URI by doing something like:

URI uri;
try {
   uri = new URI(string);
} catch (URISyntaxException e) {
   uri = URIUtil.toURI(Path.fromOSString(string));
}

That's roughly what the UI is doing, for instance.

Then you can have something like "foo" or "c:\bar" in "string", and it will be converted back and forth from URI to path properly.

> 
> Is the additional dependency of core.resources on core.variables even
> acceptable? Serge, how can I test the Path Variables UI in the SDK?
> 

Well, you can add new path variable provider through an extension point, so there's no need to have the dependency on core.variables.

If someone would like to extend the path variable available, such as glu'ing in the core.variables, they have to implement an extension point somewhere.
Comment 19 Martin Oberhuber CLA 2010-04-25 02:20:45 EDT
Thanks for the hints, Serge. I'm leaning towards using path variables, because I don't want to introduce the core.variables dependency now, and I like that path variables are specialized for dealing with file system paths.

What I don't like about path variables is that a client cannot tell the difference between a relative URI and an URI with an unresolved path variable.

In the context of this bug, in most cases the resource snapshot will reside inside the project (relative to the project location), so forcing the client to deal with the PROJECT_LOC pseudo-variable isn't very compelling. 

Or is there some client-accessible code that helps doing this?
Comment 20 Martin Oberhuber CLA 2010-04-25 19:08:35 EDT
Created attachment 166034 [details]
patch v4 (no API)

Here is my final patch, which I intend to commit. After moving to Path Variables, I was able to move back to a version that's much more similar to the original contribution again:

- Supports Path Variables in URI's, forces client to use absolute URI's
- Back to URI only in ProjectDescription - fail fast in case of invalid URI
- Back to a SNAPSHOT_SET_AUTOLOAD constant for setting the value via 
  saveSnapshot() -- rationale is that when writing Unittests and the CDT 
  implementation I noticed this is much easier to handle for clients, API
  shouldn't force clients use boilerplate code
- Further improved error reporting
- Removed all API, marked internal provisional API with EXPERIMENTAL javadoc

I'm going to attach tests separately this time. All tests pass, and I'm able to equip CDT team-shared index with a resource-index.
Comment 21 Martin Oberhuber CLA 2010-04-25 19:13:55 EDT
Created attachment 166035 [details]
patch v4 (Unittests)

Here are the new Unittests for the feature.

One problem that was discovered by the unittests was that 
   IProject.getPathVariableManager() 
only works on project-scope when the project is open. That's not surprising, but it made it a bit tricky to (a) have loadSnapshot() work with path variables, because loadSnapshot must be called before the project is open; and (b) to support project-relative path variables in snapshot autoloading (when project is just being opened).
Comment 22 Martin Oberhuber CLA 2010-04-25 19:19:38 EDT
Comment on attachment 165748 [details]
patch

Committed patch v4 to HEAD.

I'm not sure if I'm allowed to release this to the mapfile.
Comment 23 Martin Oberhuber CLA 2010-04-26 10:00:44 EDT
Szymon can you review v4 for inclusion in M7 ?
Comment 24 Szymon Brandys CLA 2010-04-26 11:19:29 EDT
This change looks good. 

Please raise bugs for follow-up work. I'm aware of one issue to address, i.e. SNAPSHOT_FORCE_LOAD that "is needed to work around load() only allowed while project is closed, but path variable resolving only while open". I think we should change the implementation to not use the options of #loadSnapshot.
Comment 25 Martin Oberhuber CLA 2010-04-26 13:33:30 EDT
Bugs created and fixed, I think this is done.
Comment 26 Szymon Brandys CLA 2010-04-27 05:01:01 EDT
Yes, thanks Martin. I will release changes for the next IBuild.