Bug 567760 - [eclipse-repository] Defer dependencies *download* to get them only when mojo request local artifacts
Status: RESOLVED FIXED
Alias: None
Product: z_Archived
Classification: Eclipse Foundation
Component: Tycho
Version: unspecified
Hardware: All All
Importance: P3 enhancement
Target Milestone: ---
Assignee: Mickael Istria CLA
QA Contact:
URL:
Whiteboard:
Keywords: noteworthy, performance
Duplicates: 375111
Depends on:
Blocks: 500769 564181
Reported: 2020-10-09 10:16 EDT by Mickael Istria CLA
Modified: 2021-04-28 16:52 EDT (History)
CC List: 2 users

See Also:


Description Mickael Istria CLA 2020-10-09 10:16:30 EDT
Currently, Tycho resolves and *downloads* dependencies very early in the build.
While the former (resolution from p2 metadata) isn't easy to avoid and doesn't actually take much time, *downloading* the artifacts can take very long and is not useful in some cases (e.g. `mvn validate` or even just `mvn compile`, which don't need all transitive deps).
Tycho should investigate lazy download of the artifacts.

Note: this issue is different from bug 353889: this one is only about lazily downloading artifacts, while bug 353889 is about changing the orchestration of dependency resolution.
Comment 1 Eclipse Genie CLA 2020-10-23 15:44:31 EDT
New Gerrit change created: https://git.eclipse.org/r/c/tycho/org.eclipse.tycho/+/171206
Comment 2 Mickael Istria CLA 2020-10-23 15:49:32 EDT
So it appears this one is extremely difficult for me.
It's relatively easy to tweak the p2-based dependency resolution to avoid an early download of artifacts, so we can delay the download until a bit later.
However, a 2nd pass of resolution takes place just after, and this one uses the EquinoxResolver and OsgiBundleProject.getClasspath() elements, which do need the artifacts locally because this pass reads MANIFEST.MF. I've tried to delay the computation of the classpath a bit, but it still happens early.
Overall, what I'm not getting at the moment is why there is a 2nd pass of resolution at all. What's missing from the 1st p2-based pass that requires reading MANIFEST.MF for all target platform elements so early?

I've put my experiments at https://git.eclipse.org/r/c/tycho/org.eclipse.tycho/+/171206 . What needs to be investigated is whether the whole build path and 2nd resolution pass have to happen very early.
However, I think this work begins to look like a duplicate of bug 353889, and Tobias' comments on that other bug are totally in sync with what would be expected to happen here (1st: compute a build order, then start building all modules with extra resolution if necessary, but not so early).
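
The "compute a build order first" step can in principle be done from metadata alone. Below is a minimal sketch in Java, not Tycho code: `BuildOrderSketch`, `buildOrder` and the module ids are hypothetical names, and the ordering is a plain topological sort (Kahn's algorithm) over a map from each reactor module to the ids it requires.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these names are Tycho API.
final class BuildOrderSketch {

    /**
     * Orders modules so that each one comes after the reactor modules it requires.
     * The input only contains ids, i.e. information available from metadata.
     */
    static List<String> buildOrder(Map<String, List<String>> requires) {
        // Count pending prerequisites per module and record reverse edges.
        Map<String, Integer> pending = new LinkedHashMap<>();
        Map<String, List<String>> dependents = new LinkedHashMap<>();
        for (String module : requires.keySet()) {
            pending.put(module, 0);
            dependents.put(module, new ArrayList<>());
        }
        for (Map.Entry<String, List<String>> entry : requires.entrySet()) {
            for (String dependency : entry.getValue()) {
                if (!requires.containsKey(dependency)) {
                    continue; // external unit, not part of the reactor
                }
                pending.merge(entry.getKey(), 1, Integer::sum);
                dependents.get(dependency).add(entry.getKey());
            }
        }
        // Kahn's algorithm: repeatedly emit modules with no pending prerequisite.
        Deque<String> ready = new ArrayDeque<>();
        pending.forEach((module, count) -> {
            if (count == 0) {
                ready.add(module);
            }
        });
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String module = ready.remove();
            order.add(module);
            for (String dependent : dependents.get(module)) {
                if (pending.merge(dependent, -1, Integer::sum) == 0) {
                    ready.add(dependent);
                }
            }
        }
        if (order.size() != requires.size()) {
            throw new IllegalStateException("Dependency cycle among reactor modules");
        }
        return order;
    }

    public static void main(String[] args) {
        Map<String, List<String>> requires = new LinkedHashMap<>();
        requires.put("repository", List.of("feature"));
        requires.put("feature", List.of("bundle"));
        requires.put("bundle", List.of("org.eclipse.osgi")); // external unit
        System.out.println(buildOrder(requires)); // prints [bundle, feature, repository]
    }
}
```

No file is opened anywhere in this sketch, which is the point: the order only depends on ids. Whether Tycho's 2nd pass can be reduced to this is exactly the open question above.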
Comment 3 Christoph Laeubrich CLA 2020-10-24 03:29:16 EDT
P2 offers only *runtime* dependency resolution, while the 2nd pass provides *compile-time* dependencies. For example, it is possible to place jars or even pre-compiled classes inside a bundle.

I'm a bit confused about that verify/compile thing, because I would assume that, at least at that stage, the full classpath has to be available. So I wonder in what phase one would expect the download?

This might also be a limitation of Maven itself: in my experience, Maven starts to "download the internet" very early whenever I try to compile a new project.

What would be crucial from my point of view for investigating this further is to have one simple case (preferably just one bundle) that shows one specific build command where an artifact is downloaded "too early", together with an idea of why it should not be necessary to download it.
Then it would be possible to debug why/when it is downloaded and whether this might really be optional. For example, if it is downloaded just for metadata that is already available via p2, we might need to insert a link in between to transfer/convert the data into the necessary formats.
Comment 4 Christoph Laeubrich CLA 2020-10-24 03:43:59 EDT
(In reply to Mickael Istria from comment #2)
> So it appears this one is extremely difficult to me.

Just wondering: even though I can understand the purpose of this, is it really such a big problem? I can hardly think of use cases where the artifacts are not needed (besides the usual 'mvn clean' use case).
Comment 5 Mickael Istria CLA 2020-10-24 03:55:05 EDT
(In reply to Christoph Laeubrich from comment #4)
> (In reply to Mickael Istria from comment #2)
> > So it appears this one is extremely difficult to me.
> 
> Just wondering, even if I can understand the purpose for this, is this
> really such a big problem as I can hardly think about use cases where the
> artifacts are not needed (beside the usual 'mvn clean' usecase).

See the bug marked as dependent: at the moment we cannot verify that a build plan is satisfiable without downloading all artifacts, so the feedback about whether the plan is satisfiable comes too late.

For the case of an eclipse-plugin I'd expect the dependencies to be downloaded during compile.
For the case of an eclipse-repository, I'd expect the dependencies to be downloaded when aggregating. In case of SimRel, it would be `mvn validate` verifying the dependency resolution, without downloading any artifact.
Basically, Tycho should keep a reference to the InstallableUnit as long as possible and only fetch the underlying artifact when an operation actually needs the file.

If we can just enable this for eclipse-repository, it could actually be enough for the main use case (SimRel). The submitted patch may actually already do it. I'll need to try later.
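
To illustrate the "keep a reference to the InstallableUnit, fetch the file only when needed" idea, here is a minimal sketch in Java. It is not Tycho or p2 code: `LazyArtifact`, the `fetcher` function and the cache path are hypothetical, and a plain `String` id stands in for the InstallableUnit reference.

```java
import java.io.File;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical sketch: LazyArtifact is not a Tycho or p2 type.
final class LazyArtifact {
    private final String unitId;                  // stands in for an InstallableUnit reference
    private final Function<String, File> fetcher; // performs the actual download
    private File location;                        // null until the file is first requested

    LazyArtifact(String unitId, Function<String, File> fetcher) {
        this.unitId = unitId;
        this.fetcher = fetcher;
    }

    /** Metadata access never triggers a download. */
    String getUnitId() {
        return unitId;
    }

    synchronized boolean isFetched() {
        return location != null;
    }

    /** Fetches on the first call, then returns the cached location. */
    synchronized File getLocation() {
        if (location == null) {
            location = fetcher.apply(unitId);
        }
        return location;
    }

    public static void main(String[] args) {
        AtomicInteger downloads = new AtomicInteger();
        LazyArtifact artifact = new LazyArtifact("org.example.bundle", id -> {
            downloads.incrementAndGet();           // a real fetcher would download here
            return new File("cache", id + ".jar"); // hypothetical cache layout
        });
        // Resolution-style code only touches metadata, so nothing is fetched yet.
        System.out.println(artifact.getUnitId() + " fetched=" + artifact.isFetched());
        // A mojo that really needs the file asks for it; repeated calls reuse the result.
        artifact.getLocation();
        artifact.getLocation();
        System.out.println("downloads=" + downloads.get());
    }
}
```

With such a handle, code running at `mvn validate` would only ever call `getUnitId()`, while something like the aggregation step would be the first caller of `getLocation()`.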
Comment 6 Christoph Laeubrich CLA 2020-10-26 14:04:29 EDT
(In reply to Mickael Istria from comment #5)
> See the bug marked as dependent: we cannot at the moment verify that a build
> plan is satisfiable without downloading all artifacts; so the feedback about
> satisfiable plan does happen too late.
> For the case of an eclipse-plugin I'd expect the dependencies to be
> downloaded during compile.

'mvn validate' should not download and 'mvn compile' should do it, right?

Is there an easy reproducer for this or anything special or can I simply use any project that contains at least one eclipse-plugin module?

> Basically, Tycho should keep a reference to the InstallableUnit as long as
> possible and only fetch the underlying artifact when an operation actually
> needs the file.

I think all the information is already there, we just need to use it. I'll also try to take a look at this, I'm just a bit short on time atm.
Comment 7 Mickael Istria CLA 2020-10-26 18:09:11 EDT
(In reply to Christoph Laeubrich from comment #6)
> 'mvn validate' should not download and 'maven compile' should do it right?

Yes, although at the moment I think we can restrict the use case to eclipse-repository: dependency resolution still happens early ("afterProjectRead"), but the download of necessary deps would only happen during execution of the assemble-repository mojo.
Note that this late download may also allow skipping downloads of useless transitive dependencies in some cases.
> 
> Is there an easy reproducer for this or anything special or can I simply use
> any project that contains at least one eclipse-plugin module?

No simple reproducer yet. I'll work on it this Thursday if everything remains as expected.

> I think all information is already there we just need to use it,

Indeed, my current experiments align with that: the data is already there and this change doesn't require anything more. It's only a matter of replacing some direct accesses to dependency files in the early lifecycle with some strategy to get the file only when necessary.


> I'll try to
> also take a look at this I just a bit short of time atm.


Thanks, but no need for you to worry or rush too much about it, as I think I know how to achieve the goal here and I have a sufficient "time budget" for it.
Comment 8 Mickael Istria CLA 2020-11-02 06:19:00 EST
Bug 372780 seems to be a prerequisite: currently the dependencies need to be fetched and resolved early to be added as dependencies in the Maven model (like <dependency>). Allowing downloads to be delayed means that we need to delay or avoid the creation of those dependencies unless necessary. If we stop relying on them in Tycho-specific goals, that would make things much easier.
Comment 9 Christoph Laeubrich CLA 2020-11-02 06:28:17 EST
I wonder if this is even supported by Maven. If I start with an empty local repository and run "mvn clean", will the artifact of a dependency be downloaded? What about the metadata (pom/md5)? That mostly resembles what P2 provides.
Comment 10 Mickael Istria CLA 2020-11-02 06:37:05 EST
(In reply to Christoph Laeubrich from comment #9)
> If I start with an empty local
> repository and perform "mvn clean" will the artifact of a dependency be
> downloaded? What about the meta-data (pom/md5)? That resembles mostly what
> P2 provides.

Metadata would be downloaded (as it's necessary for dependency resolution to compute the build graph) and cached for further reuse.
Artifacts wouldn't be downloaded until necessary (i.e. until assemble-repository is called).
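
As a rough illustration of what a metadata-only check could look like, here is a minimal sketch in Java. It is an assumption-heavy simplification, not p2's resolver: `PlanCheckSketch`, `UnitMetadata` and `unsatisfied` are hypothetical names, and capabilities are plain strings with no version ranges, filters or optional requirements.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: a much simplified stand-in for p2 metadata and resolution.
final class PlanCheckSketch {

    /** What one unit provides and requires, as read from repository metadata. */
    static final class UnitMetadata {
        final String id;
        final List<String> provides;
        final List<String> requires;

        UnitMetadata(String id, List<String> provides, List<String> requires) {
            this.id = id;
            this.provides = provides;
            this.requires = requires;
        }
    }

    /** Returns one message per requirement that no unit in the plan provides. */
    static List<String> unsatisfied(List<UnitMetadata> plan) {
        Set<String> provided = new HashSet<>();
        for (UnitMetadata unit : plan) {
            provided.addAll(unit.provides);
        }
        List<String> missing = new ArrayList<>();
        for (UnitMetadata unit : plan) {
            for (String requirement : unit.requires) {
                if (!provided.contains(requirement)) {
                    missing.add(unit.id + " requires " + requirement);
                }
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<UnitMetadata> plan = List.of(
                new UnitMetadata("feature", List.of("feature"), List.of("bundle.a", "bundle.b")),
                new UnitMetadata("bundle.a", List.of("bundle.a"), List.of("org.eclipse.osgi")));
        // Only ids and capabilities are inspected; no artifact file is needed.
        unsatisfied(plan).forEach(System.out::println);
    }
}
```

A check of this shape is what would let `mvn validate` report an unsatisfiable plan early, with the artifact download left to assemble-repository.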
Comment 13 Mickael Istria CLA 2020-11-23 14:18:58 EST
*** Bug 375111 has been marked as a duplicate of this bug. ***