Bug 274025 - [publisher] p2 metadata generator does not scale
Summary: [publisher] p2 metadata generator does not scale
Status: RESOLVED FIXED
Alias: None
Product: Equinox
Classification: Eclipse Project
Component: p2 (show other bugs)
Version: 3.5   Edit
Hardware: PC Linux
: P3 normal with 1 vote (vote)
Target Milestone: 3.6 M6   Edit
Assignee: Ian Bull CLA
QA Contact:
URL:
Whiteboard:
Keywords: api, performance
Depends on:
Blocks: 273981 276306
  Show dependency tree
 
Reported: 2009-04-28 08:48 EDT by Denis Roy CLA
Modified: 2010-02-08 13:59 EST (History)
11 users (show)

See Also:


Attachments
Patch to fix metadata generator in Ganymede R34x (5.76 KB, patch)
2009-05-15 01:11 EDT, Sean Flanigan CLA
no flags Details | Diff
Patch to fix metadata generator in Galileo (based on 3.5M7) (6.54 KB, patch)
2009-05-15 01:22 EDT, Sean Flanigan CLA
no flags Details | Diff
Error trying apply second patch (31.55 KB, image/png)
2009-06-01 14:22 EDT, Gabe O'Brien CLA
no flags Details
Patch to fix metadata generator in Galileo (based on HEAD) (6.54 KB, patch)
2009-06-01 23:54 EDT, Sean Flanigan CLA
no flags Details | Diff
Team project set which will check out the plugins affected by the patch (489 bytes, application/xml)
2009-06-01 23:56 EDT, Sean Flanigan CLA
no flags Details
org.eclipse.equinox.p2.artifactrepository.patch (11.55 KB, patch)
2009-10-20 23:31 EDT, Ian Bull CLA
no flags Details | Diff
mylyn/context/zip (61.25 KB, application/octet-stream)
2009-10-20 23:31 EDT, Ian Bull CLA
no flags Details
org.eclipse.equinox.p2.artifact.repository.patch (21.27 KB, patch)
2009-10-29 17:43 EDT, Ian Bull CLA
no flags Details | Diff
mylyn/context/zip (84.93 KB, application/octet-stream)
2009-10-29 17:43 EDT, Ian Bull CLA
no flags Details
updated patch (18.92 KB, patch)
2010-02-02 23:02 EST, Ian Bull CLA
no flags Details | Diff
updated patch (38.50 KB, patch)
2010-02-04 18:48 EST, Ian Bull CLA
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Denis Roy CLA 2009-04-28 08:48:05 EDT
I am unable to generate the p2 metadata when the plugins/ and features/ directories are present.  The process consumes 100% CPU time, artifacts.jar grows and returns 10 zero bytes and it simply never ends.

If I remove the plugins/ and features/ directories, it takes less than one second to do, but I'm not sure this is correct.

I have tried multiple Java VMs on Linux x86_64 and Linux ppc, using Eclipse R-3.4.2 and S-3.5M6 with the same results.

The actual command I'm using is:
eclipse/eclipse -application org.eclipse.equinox.p2.metadata.generator.EclipseGenerator -updateSite /home/babel-working/live/output/galileo/ -site file:/home/babel-working/live/output/galileo/site.xml -metadataRepositoryName "Babel language packs update site" -append -reusePack200Files -vmargs -Xmx256m
Comment 1 Denis Roy CLA 2009-04-28 08:50:41 EDT
It may be worthy to add that Babel's site.xml for Galileo is 178K, and there are 460 files in features/ and 26223 in plugins/
Comment 2 Antoine Toulmé CLA 2009-04-28 09:31:52 EDT
The plugins and features directories are needed.

The generator reads all the plugin manifests and the feature definitions, and push all the info into artifacts.jar and contents.jar.

I believe you should assign this bug to Pascal Rapicault, or at least ask him what next steps we should take regarding this problem.
Comment 3 Kit Lo CLA 2009-04-28 09:35:48 EDT
Pascal, could you please take a look and advise? Thanks!
Comment 4 Denis Roy CLA 2009-04-28 09:47:35 EDT
> artifacts.jar grows and returns 10 zero bytes and it simply never ends.

That should be: 

artifacts.jar grows and returns to zero bytes


cc'ing Gabe, as he was also investigating this.
Comment 5 Sean Flanigan CLA 2009-04-28 23:02:02 EDT
It looks like a scalability problem in the P2 metadata generator.  I've tried running the metadata generator as an Eclipse application, in debug mode, and discovered that it is saving the entire artifact repository once for each artifact, all 26000 of them.  It thus hits this breakpoint over and over:

Thread [main] (Suspended (breakpoint at line 877 in SimpleArtifactRepository))	
	SimpleArtifactRepository.save() line: 877	
	SimpleArtifactRepository.addDescriptor(IArtifactDescriptor) line: 309	
	Generator.publishArtifact(IArtifactDescriptor, File[], IArtifactRepository, boolean, File) line: 1344	
	Generator.publishArtifact(IArtifactDescriptor, File[], IArtifactRepository, boolean) line: 1333	
	Generator.generateBundleIUs(BundleDescription[], Generator$GeneratorResult, IArtifactRepository) line: 408	
	Generator.generate() line: 345	
	EclipseGeneratorApplication.generate(EclipseInstallGeneratorInfoProvider) line: 380	
	EclipseGeneratorApplication.run(EclipseInstallGeneratorInfoProvider) line: 364	
	EclipseGeneratorApplication.run(String[]) line: 332	
	EclipseGeneratorApplication.start(IApplicationContext) line: 386	
	EclipseAppHandle.run(Object) line: 194	
	EclipseAppLauncher.runApplication(Object) line: 110	
	EclipseAppLauncher.start(Object) line: 79	
	EclipseStarter.run(Object) line: 368	
	EclipseStarter.run(String[], Runnable) line: 179	
	NativeMethodAccessorImpl.invoke0(Method, Object, Object[]) line: not available [native method]	
	NativeMethodAccessorImpl.invoke(Object, Object[]) line: 57	
	DelegatingMethodAccessorImpl.invoke(Object, Object[]) line: 43	
	Method.invoke(Object, Object...) line: 616	
	Main.invokeFramework(String[], URL[]) line: 559	
	Main.basicRun(String[]) line: 514	
	Main.run(String[]) line: 1287	
	Main.main(String[]) line: 1263	


Each time I hit that breakpoint, there is one more artifact in the artifacts.xml file.  The for-loop which iterates through all the bundles is several stack levels higher, in Generator.generateBundleIUs().

Perhaps SimpleArtifactRepository.addDescriptor() should not perform an implicit save(), and Generator.generateBundleIUs() could tell the SimpleArtifactRepository to save() outside the for-loop.  But I'm not sure if IArtifactRepository exposes the save() method, so there might be more to it than that.  And of course it might not be safe to change the existing API.

In any case, I suggest this bug be assigned to the Equinox team.
Comment 6 Denis Roy CLA 2009-04-28 23:12:25 EDT
Reassigning to p2.
Comment 7 Denis Roy CLA 2009-04-29 11:27:16 EDT
Downgrading severity, as it was relative to Babel.

Sean: thanks for your analysis!

p2 team: comment 0 and comment 5 contain all the meat and gravy here.
Comment 8 Pascal Rapicault CLA 2009-04-29 22:25:46 EDT
Changing this is non trivial and will not happen in this release. Note also that this code has not changed in months so there may be another reason to your problems.
We also need to see if we have the same issue in the publisher (which I believe we do).
Comment 9 Gabe O'Brien CLA 2009-04-30 12:15:49 EDT
I spent a little bit of time trying to generate the meta-data on my local box.  When the plugins and features folders were full the process didn't complete.  So I tried reducing the size of the features folder to only one project (keeping all the different translation for it) and the process didn't complete.  Then I tried reducing the number of file in the plugins folder and the process did complete (it took about 30 seconds with 3,000 plugins files out of 30,000 or so).  This just confirms that there is an issue with the process of generating meta-data scaling, but we already knew this.

One observations I had was that we are probably doing allot of redundant work having one copy of each project in the features folder for each languages Babel defines.  It seems each of those files should be mostly the same?  I am not sure how to avoid the extra work, but it seemed worth noting.

Comment 10 Sean Flanigan CLA 2009-05-14 03:43:30 EDT
I have had a go at fixing this, although it might be better classed as a workaround.  It seems to work: 40 seconds to generate the Babel metadata from site.xml, rather than forever.

As an Eclipse plugin novice, I started by importing the relevant p2 plugins from the Plug-ins view "as source", inside my own Eclipse SDK instance.  It might take me a little while to work out how to make a clean patch from my modified source.  I'll tackle it tomorrow, but let me know if I'm missing some easy shortcut like "PDE/Generate patch against current platform".  That would be nice.
Comment 11 John Arthorne CLA 2009-05-14 09:56:57 EDT
>let me know if I'm missing some
>easy shortcut like "PDE/Generate patch against current platform".

There is. Right click on the project, Export > Deployable Plugins and Fragments, and then select "Install into host". This will create a patch and install it into the currently running platform.
Comment 12 Andrew Niefer CLA 2009-05-14 11:07:19 EDT
I think Sean meant a cvs patch, not a feature patch to install his changes :)

You will the real source from cvs:
dev.eclipse.org:/cvsroot/rt/org.eclipse.equinox/p2/bundles/*

You can use "Check Out As..." to check it out into a different project name.
I tried selecting both the imported project and the cvs project and comparing them, but it just showed whole files as changed because of whitespace.  Also, I couldn't see any way to make a patch that way.
Comment 14 Sean Flanigan CLA 2009-05-15 01:11:14 EDT
Created attachment 135911 [details]
Patch to fix metadata generator in Ganymede R34x

Thanks for the patch generation tips; fortunately I didn't get formatting changes in the .java files, although I had to ignore a lot of other changes.  This is the R34x version.
Comment 15 Sean Flanigan CLA 2009-05-15 01:22:23 EDT
Created attachment 135912 [details]
Patch to fix metadata generator in Galileo (based on 3.5M7)

I tried using CVS HEAD, but the plugins I was changing depended on an update to another plugin I couldn't locate in CVS.  So this is generated against 3.5M7; hopefully that will be pretty close.

I hope we can get this in (or something like it) before too long.  Babel langpacks don't work at all for me in 3.5, unless they come with P2 metadata.  Either this bug, or bug 276306, probably deserves a higher severity than normal.
Comment 16 Ian Bull CLA 2009-05-15 11:04:26 EDT
The approach looks promising. I have just looked at the patch (I didn't apply it, although I'm pretty sure it will apply) and it sets a property on the repository (IArtifactRepository.DISABLE_IMPLICIT_SAVE), and if this is set, the artifacts are not written out incrementally, but rather at the end.

However, this patch is written against the generator, I wonder if we should also see how this affects the publisher.  



Comment 17 Andrew Niefer CLA 2009-05-15 11:32:26 EDT
My First comment is that the patch is an API change, and we are too late in 3.5 for that.

Note that the explicitSave is not strictly necessary, SimpleArtifactRepository#setProperty() will do a save().  So this could instead work as follows:

1) setProperty(disableSave, true)
1b) implicit save() does not occur
2 Do stuff, save()'s don't happen
3) setProperty(disableSave, false)
3b) implicit save() does occur

Here, anybody setting the property should really do it in a try with a finally block that removes the property.

John, what do you think of this?
Comment 18 Ian Bull CLA 2009-05-15 11:41:11 EDT
Andrew, if we can do this without adding a new method, then +1. However, is this really an API change?  The docs say that extenders should extend the abstract repository (not the interface), so extenders would not have to change their code unless they wanted to handle the explicit save. (although maybe it turns out that most repo providers will have to implement this in practice).  
Comment 19 Ian Bull CLA 2009-05-15 11:44:43 EDT
I'm just asking out of curiosity. I think we should avoiding adding new methods to our interfaces if we can. (which appears to be the case as you suggest).
Comment 20 John Arthorne CLA 2009-05-15 12:01:53 EDT
> However, is this really an API change?

It is an API change, but not a breaking API change. We'd like to avoid or minimize even non-breaking API changes at this point in the release. Andrew's property-based suggestion sound promising though. The only new API in this case would be a new property constant.

Comment 21 Andrew Niefer CLA 2009-05-15 13:16:54 EDT
I missed the javadoc that says:
 This interface is not intended to be implemented by clients.  Artifact repository
 implementations must subclass {@link AbstractArtifactRepository} rather than 
 implementing this interface directly.

This would have been a bigger problem for people not extending AbstractArtifactRepository.
Comment 22 Jeff McAffer CLA 2009-05-15 15:29:30 EDT
API or not API aside, at this point minimizing change like this is "a good thing" if possible.  There needs to be good justification espeically in the face of alternatives.
Comment 23 Sean Flanigan CLA 2009-05-17 23:04:50 EDT
(In reply to comment #22)
> API or not API aside, at this point minimizing change like this is "a good
> thing" if possible.  There needs to be good justification espeically in the
> face of alternatives.

Is there an alternative way of generating p2 metadata which could be used by Babel?  Or are you saying that the metadata generator (and the publisher, I suppose) could use the existing API in a more scalable way?  

About the only possibility that springs to mind is using the addDescriptor() which takes an array and saves to disk after processing the array.  I suppose that might be workable, as long as the array doesn't get too big.  Or is there a completely different API which could be used?
Comment 24 Gabe O'Brien CLA 2009-05-20 13:49:05 EDT
I have been able to generate the metadata for the Babel project. I was reading the help docs on generating P2  metadata [1] and altered the command Denis was using in comment #1 to this:

eclipse/eclipse   -nosplash   -application org.eclipse.equinox.p2.metadata.generator.EclipseGenerator   -source /shared/technology/babel/test-updates/galileo/   -metadataRepository file:/shared/technology/babel/test-updates/galileo/repository/   -metadataRepositoryName "My Update Site"   -artifactRepository file:/shared/technology/babel/test-updates/galileo/repository/   -artifactRepositoryName "My Artifacts"   -publishArtifacts   -publishArtifactRepository   -compress   -noDefaultIUs   -vmargs -Xmx256m

While it took a long time to complete (6 1/2 hours) I was confident it would finish.  The reason I felt it would complete is I could see it coping all the plugins (30,951 of them) from the 'source' directory into the 'repository' directory. Once the process finished I had 2 new files artifacts.jar (1.3 megs) and content.jar (4.5 megs).  

I have given the files generated to Kit to take a look at.  If there is interest in looking at them I can attach them to this bug.

[1] http://help.eclipse.org/stable/index.jsp?topic=/org.eclipse.platform.doc.isv/guide/p2_metadata_generator.html
Comment 25 Denis Roy CLA 2009-05-27 13:09:41 EDT
> While it took a long time to complete (6 1/2 hours)

I tried your command on the last Babel weekly build (for Galileo) and after 14 hours of intense crunching, it was about 70% complete, judging by the number of files in the repository/plugins directory.

I ended up stopping it because this just generates tons of load on the build machine.
Comment 26 Pascal Rapicault CLA 2009-05-27 16:02:04 EDT
How many plug-ins and features do you have? I have a hard time understanding how this can take more time than doing large builds.
Comment 27 Denis Roy CLA 2009-05-27 16:08:40 EDT
942 features, 41292 plugins.
Comment 28 Gabe O'Brien CLA 2009-05-27 16:50:39 EDT
(In reply to comment #24)
> 
> I have given the files generated to Kit to take a look at.  If there is
> interest in looking at them I can attach them to this bug.

I tried to use the metadata that I was able to generate locally (using the method I described in comment #24).  After waiting for Eclipse to process the local repository for a few minutes, it came back saying there were no categories defined.  The 'site.xml' file has categories (and 'category-def') defined. 

So even though I was able to generate the metadata, it doesn't look like it working.  Any ideas?
Comment 29 Gabe O'Brien CLA 2009-06-01 14:22:26 EDT
Created attachment 137885 [details]
Error trying apply second patch

(In reply to comment #15)
> Created an attachment (id=135912) [details]
> Patch to fix metadata generator in Galileo (based on 3.5M7)
> 
> I tried using CVS HEAD, but the plugins I was changing depended on an update to
> another plugin I couldn't locate in CVS.  So this is generated against 3.5M7;
> hopefully that will be pretty close.
> 
> I hope we can get this in (or something like it) before too long.  Babel
> langpacks don't work at all for me in 3.5, unless they come with P2 metadata. 
> Either this bug, or bug 276306, probably deserves a higher severity than
> normal.
> 

I took a stab at applying the second patch today.  I checked out the P2 bundles (dev.eclipse.org:/cvsroot/rt/org.eclipse.equinox/p2/bundles/) into my workspace.  When I tried to apply the second patch to the code, I got a few errors about files missing (see attached screen shot).  So I was not able to apply the patch and test the generation of the metadata.
Comment 30 Sean Flanigan CLA 2009-06-01 23:54:58 EDT
Created attachment 137943 [details]
Patch to fix metadata generator in Galileo (based on HEAD)

I've just used a fresh workspace to check out org.eclipse.equinox.p2.artifact.repository and org.eclipse.equinox.p2.metadata.generator from CVS HEAD and applied the M7 patch.  (I'm running under RC3 now, so HEAD compiles okay.)

First time around, it didn't seem to recognise it as a workspace-rooted patch the first time (which meant it couldn't find any of the files), but it's not doing that now (?!).  (As a workaround, I applied the patch twice: once for each project, ignoring 1 leading path segment.)  Anyway, I've now re-generated the patch against HEAD, and Eclipse isn't having any trouble applying it, reversing it, etc.

According to the error message you got, the relevant projects weren't in your workspace.  Could they have been given different project names?

I'll add a team project set.
Comment 31 Sean Flanigan CLA 2009-06-01 23:56:37 EDT
Created attachment 137945 [details]
Team project set which will check out the plugins affected by the patch
Comment 32 Sean Flanigan CLA 2009-06-02 03:38:59 EDT
(In reply to comment #28)
> I tried to use the metadata that I was able to generate locally (using the
> method I described in comment #24).  After waiting for Eclipse to process the
> local repository for a few minutes, it came back saying there were no
> categories defined.  The 'site.xml' file has categories (and 'category-def')
> defined. 
> 
> So even though I was able to generate the metadata, it doesn't look like it
> working.  Any ideas?

Going on http://wiki.eclipse.org/Equinox_p2_Metadata_Generator#Arguments_describing_the_input , I'm not sure you will get categories unless you use the "-site" option.  Understanding how the command line options interact is a bit of a black art.  It is to me, anyway.
Comment 33 Gabe O'Brien CLA 2009-06-02 11:29:51 EDT
(In reply to comment #32)
> (In reply to comment #28)
> > I tried to use the metadata that I was able to generate locally (using the
> > method I described in comment #24).  After waiting for Eclipse to process the
> > local repository for a few minutes, it came back saying there were no
> > categories defined.  The 'site.xml' file has categories (and 'category-def')
> > defined. 
> > 
> > So even though I was able to generate the metadata, it doesn't look like it
> > working.  Any ideas?
> 
> Going on
> http://wiki.eclipse.org/Equinox_p2_Metadata_Generator#Arguments_describing_the_input
> , I'm not sure you will get categories unless you use the "-site" option. 
> Understanding how the command line options interact is a bit of a black art. 
> It is to me, anyway.
> 

Funny you should mention the -site option.  I was reading that same page a few days ago and I tried to generate the metadata  with this command (pointing site to the site.xml file):

eclipse/eclipse -nosplash -application org.eclipse.equinox.p2.metadata.generator.EclipseGenerator -source /shared/technology/babel/test-updates/galileo/ -site /shared/technology/babel/test-updates/galileo/site.xml -metadataRepository file:/shared/technology/babel/test-updates/galileo/repository/ -metadataRepositoryName "My Update Site" -artifactRepository file:/shared/technology/babel/test-updates/galileo/repository/ -artifactRepositoryName "My Artifacts" -publishArtifacts  -publishArtifactRepository -compress -noDefaultIUs  -vmargs -Xmx256m

But when I pointed P2 to my local update site it still complained about having no categories defined.  

Comment 34 Denis Roy CLA 2009-06-04 10:19:15 EDT
I must have been at the right place and the right time yesterday, because I intercepted this gem on the rt-pmc list:

"The RAP team was using the old metadata generator which does not have p2.inf support. They switched to the publisher (all the cool kids are doing it...) and are happy now."   [1]

The same p2 publisher is mentioned in a blog post today:
http://eclipsesource.com/blogs/2009/06/04/who-writes-your-metadata-me-i-use-a-publisher/

So ... In order for Babel to be cool kids, should we be using the cool Publisher for Babel, instead of the old, outdated metadata generator?  Does this bug become moot?



[1] http://dev.eclipse.org/mhonarc/lists/rt-pmc/msg00706.html
Comment 35 Ian Bull CLA 2009-06-04 15:24:47 EDT
(In reply to comment #8)
> We also need to see if we have the same issue in the publisher (which I believe
> we do).
> 

Good question Denis :-).  I imagine the problem exists there too. (This bug is even tagged [publisher]).  
Comment 36 Ian Bull CLA 2009-06-09 23:36:21 EDT
I was doing some testing and it appears this is still a problem in the publisher.  I'll mark this M1.
Comment 37 John Arthorne CLA 2009-09-16 13:46:37 EDT
Ian, I'm bumping these again to M3, but please review these bugs and pick a reasonable target (or remove the target if you have no immediate plans to work on them).
Comment 38 Ian Bull CLA 2009-10-15 15:08:28 EDT
I'm looking at this bug now, and I think we should look at how we handle the metadata, and do that for the artifacts.

In the publisher, we queue up all the IUs and write them all at once (at the end), so we only have one write operation. However, for the artifacts, we add them to the repo every time we find one.  Each time we add an artifact we reload the XML, so 10's of thousands of bundles takes a bit of time.

That approach may be a bit more involved than Sean's approach, but I think it fits with our existing API a bit better.

thoughts?
Comment 39 Pascal Rapicault CLA 2009-10-15 15:38:31 EDT
I don't think that metadata and artifacts can be handled the same way as artifacts have to carry actual bytes. Also I am worried about the explicit save model for the artifact repository and the impact this has on backward compatibility, robustness (e.g. what happens when the process is killed in the middle of a save).
Now stepping back from the problem, it seems to me that with the addition of this API we are palliating for a poorly scalable implementation and I'm wondering if we would not be better off fixing the implementation of the artifact repository.
Comment 40 Ian Bull CLA 2009-10-15 16:09:08 EDT
(In reply to comment #39)
> I don't think that metadata and artifacts can be handled the same way as
> artifacts have to carry actual bytes. Also I am worried about the explicit save
> model for the artifact repository and the impact this has on backward
> compatibility, robustness (e.g. what happens when the process is killed in the
> middle of a save).
> Now stepping back from the problem, it seems to me that with the addition of
> this API we are palliating for a poorly scalable implementation and I'm
> wondering if we would not be better off fixing the implementation of the
> artifact repository.

Any ideas on what to fix?  Simple artifact (and metadata) repositories invoke a save each time something is added to them.  This helps with transactions, doesn't scale.  To help scale, we suggest using the add(SometType[] ) method, were a bunch of things are added at once.  

Other than the two options discussed here:
1. Explicit save
2. Queue up everything and write them at once

Do you see others?

We have a patch for #1, but I agree, explicit save has problems. I'll take a stab at #2 and see what problems fall out.
Comment 41 Ian Bull CLA 2009-10-20 23:31:09 EDT
Created attachment 150051 [details]
org.eclipse.equinox.p2.artifactrepository.patch

Here is a patch that uses a runnable to execute all the artifact updates without saving until the end.

This patch is not done yet, but I'm putting it here so I don't loose it.  I will commit this to the branch this week.
Comment 42 Ian Bull CLA 2009-10-20 23:31:20 EDT
Created attachment 150052 [details]
mylyn/context/zip
Comment 43 Ian Bull CLA 2009-10-20 23:55:34 EDT
This patch improves the performance of the publisher, although I'm not seeing the 40 sec times that Sean was seeing.  Just generating the MD5 hashes for thousands of bundles takes a few minutes.  However, I notice that Sean's patch was against the generator, and maybe we weren't doing MD5 hashes then.

Here are some numbers (on about 4K bundles)
Before (with artifacts) 917 sec
After (with artifacts) 575 sec

Before (w/o artifacts) 755 sec
After (w/o artifacts) 329 sec
and if I remove the MD5 hashes I get this down to 122 sec.

Now for the actual patch.  It works by adding an #executeBatch(Runnable process)  method to IArtifactRepository. There are a few questions
1. Should this be on IRepository?
2. How do we handle errors
 a. Should save() happen in a finally block (should we always try to save, even if the runnable fails)
 b. If yes, what happens if save() (also) throws an exception (out of disk space for example)?  What exception should be propagated to the caller? (the one from original process, or the one from save())?
 
 For example
  try {
     throw new NullPointerException();
  } finally {
    throw new CastClassException();
  }

As a caller, what exception are you more interested in handling?    (or does it not matter).
Comment 44 John Arthorne CLA 2009-10-21 16:27:20 EDT
(In reply to comment #43)
>  b. If yes, what happens if save() (also) throws an exception (out of disk
> space for example)?  What exception should be propagated to the caller? (the
> one from original process, or the one from save())?

You never want a finally block to throw an exception, since this will cause any original exception to be completely lost. Often if there is a second exception in the finally block it is a side-effect of the original failure, so you don't want to lose that. I see a few options here:

1) Transactional semantics: only save if the runnable succeeds, and if there is an exception any change to the repository made by the runnable is discarded.

2) Best effort semantics: Always save, but catch & log any exception thrown by the save itself

3) Combined result: Catch exceptions from the runnable, and catch exceptions from the save, and return a MultiStatus with the combined result. I suggest using SafeRunner in this case to allow severe errors to propagate immediately.

I don't feel strongly but would lean towards 3)
Comment 45 Andrew Niefer CLA 2009-10-21 16:37:49 EDT
I would also lean towards (3).  If you've run a batch of several thousand adds, and one fails, it would be a big waste to not save the successes.
Comment 46 Ian Bull CLA 2009-10-21 16:57:47 EDT
(In reply to comment #44)

> You never want a finally block to throw an exception, since this will cause any
> original exception to be completely lost. Often if there is a second exception
> in the finally block it is a side-effect of the original failure, so you don't
> want to lose that. I see a few options here:
Thanks for the explanation John.  This makes perfect sense.

> 
> 1) Transactional semantics: only save if the runnable succeeds, and if there is
> an exception any change to the repository made by the runnable is discarded.

I considered this but it can't really be done. Since we don't know what was done during the runnable, it will be hard to undo it. In our current implementation if someone added a bunch of artifact descriptors and then something failed, these descriptors would still be "added". The next repository operation would likely invoke a save, and that might not be what's desired.  Essentially we left the artifact repo in an unstable state.

> 
> 2) Best effort semantics: Always save, but catch & log any exception thrown by
> the save itself
This is what I was thinking.

> 
> 3) Combined result: Catch exceptions from the runnable, and catch exceptions
> from the save, and return a MultiStatus with the combined result. I suggest
> using SafeRunner in this case to allow severe errors to propagate immediately.
> 
I hadn't thought of this, and now that you mention it, it seems ideal.

> I don't feel strongly but would lean towards 3)
I'll put together a solution for #3.

Thanks again for the input.
Comment 47 Ian Bull CLA 2009-10-29 17:26:14 EDT
A few more interesting questions.

1. I looked at using SafeRunner but I'm not really sure what that will accomplish?  Do you mean that the caller should pass in a SafeRunnable?  If so, then we really can't catch any exceptions and pass them  back in a status (since the caller will already receive these exceptions).  

2. Where should this code live?  When I started, I put it in SimpleArtifactRepository and CompositeAartifactRepository.  The AbstractArtifactRepository simply calls run on the runnable (it doesn't muck around with saving).  However, another option would be provide a  setSaveEnabled(boolean) and a save() method on the abstract class.  The code could then live in the AbstractArtifactRepository and clients would be free to implement save / setSaveEnabled(boolean) as they see fit. (by default, they do nothing).

3. What about a batch process on Metadata Repositories?
Comment 48 Ian Bull CLA 2009-10-29 17:43:36 EDT
Created attachment 150882 [details]
org.eclipse.equinox.p2.artifact.repository.patch

This patch wraps the calling of the runnable in a try / catch / finally, and returns any exceptions as an IStatus.

This patch also includes some test cases with repositories that fail in different ways.
Comment 49 Ian Bull CLA 2009-10-29 17:43:42 EDT
Created attachment 150883 [details]
mylyn/context/zip
Comment 50 Pascal Rapicault CLA 2009-10-31 22:47:43 EDT
- SimpleArtifactRepository#save(), we should short-circuit on the first line
- Wonder if we should not catch Throwable rather than just Exception when we execute the Runnable.
- Describe the restrictions on the Runnable (for example the Runnable should not execute any operations on the repository in another thread).
- Consider adding a lock at the file level since we are now opening a bigger window during which the repository can be modified. Maybe we want to treat this in another bug.


Tests
- Need a test that actually do several adds and remove and save
- Need a test that validates that the save flag is set back properly
Comment 51 Pascal Rapicault CLA 2009-11-01 14:06:25 EST
> 3. What about a batch process on Metadata Repositories?
  This could be a nice addition for sake of consistency.
Comment 52 Ian Bull CLA 2010-01-22 01:30:36 EST
I'm going to drop this in as soon as head re-opens after M5 is declared.
Comment 53 Ian Bull CLA 2010-02-02 23:02:12 EST
Created attachment 157998 [details]
updated patch

This patch is now updated wrt to head (after all the API changes).  I still need to add a few more test cases and documentation before we can release.
Comment 54 Ian Bull CLA 2010-02-04 18:48:20 EST
Created attachment 158248 [details]
updated patch

This patch addresses most of Pascal's concerns.
1. We now short circuit on the first line
2. Catch throwable
3. Document the method
4. Added a bunch of tests (including multiple adds, remove, with and without exceptions and checking the flag).

The only thing left is the file level lock.  We can open another bug for that as it's somewhat unrelated (I'm not sure if the actual file open time is any longer, we still only open/write/close at the end).
Comment 55 Ian Bull CLA 2010-02-04 19:02:36 EST
Released in head.  I've opened Bug 301902  to add the API to metadata repo and bug 301903 to consider the file level lock.
Comment 56 Kit Lo CLA 2010-02-08 12:33:51 EST
Babel update site has 45,714 plugins (language fragments). Babel UpdateSitePublisher generation time now reduced from 46230 seconds (~13 hours) to 75 seconds, unbelievable!

Tested Babel Galileo update site with Eclipse 3.5.2 RC2, everything is fine! Thanks!
Comment 57 Ian Bull CLA 2010-02-08 12:40:03 EST
(In reply to comment #56)
> Babel update site has 45,714 plugins (language fragments). Babel
> UpdateSitePublisher generation time now reduced from 46230 seconds (~13 hours)
> to 75 seconds, unbelievable!
> 
Wow!  Ok, that's just cool.  Thanks for letting me know about that.
Comment 58 Denis Roy CLA 2010-02-08 13:59:56 EST
Woah, that is awesome.