Skip to main content

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] [List Home]
Re: [platform-ui-dev] Smart Import concerns



2016-05-15 16:54 GMT+02:00 Mickael Istria <mistria@xxxxxxxxxx>:
Hi Stefan,

Thank you for your feedback. I'm glad to tell you that most of your concerns are already handled by the Smart Importer. However, I admit the approach is very disruptive for contributors compared to existing import scenarios, it's not trivial to understand so it requires some (a lot of) explanations.

Thanks for you explanations Mickael! Please see my comments inline :) 

 

The first problem is that this assumes that there is a 1:1 mapping from physical folders to Eclipse projects.
This is actually a (fair) limitation from the Resource model: 1 project maps to a single folder and vice-versa. The workspace doesn't allow you to have multiple projects at the same location, nor to have a project with multiple locations.

The Eclipse resource model has linked resources. So you can create Eclipse projects that link resources from a different location into their resource tree. What we would do is roughly this:

myGradleProject
  src/main/java
  src/test/java
  .eclipseProjects
     myLib
       src/main/java -> myGradleProject/src/main/java
     myTestSuite
       src/test/java -> myGradleProject/src/test/java
 
The .eclipse folder and all its contents would be auto-generated by Buildship. They do not exist at the time the Smart Importer runs. I guess this also answers the question below.


The ProjectConfigurator API still makes it possible to have 1 given directory resolve to multiple project proposals. See for example the Maven import, or a hierarchy of Eclipse projects: the input is a single directory and the output is a set of projects.
I'll go into more concrete explanation trying to answer you questions step by step. Make sure you have the code of ProjectConfigurator ( http://git.eclipse.org/c/platform/eclipse.platform.ui.git/tree/bundles/org.eclipse.ui.ide/src/org/eclipse/ui/wizards/datatransfer/ProjectConfigurator.java ) nearby to look at it while I reference some parts.
Also, to determine which sub-projects are contained in a Gradle build, you need to load the model, which can be a long-running operation (e.g. you might have to download the right Gradle distro first).
The Gradle configurator, implementing ProjectConfigurator, can load this Gradle model for the given input directory in the findConfigurableLocations method. This can be a long running one and it's the one used in order to populate the proposals.
An semi implementation for Gradle would look like (I imagined some API that I suppose have equivalent in BuildShip):
  public Set<File> findConfigurableLocations(File root, IProgressMonitor monitor) {
    SubMonitor subMonitor = SubMonitor.convert(monitor...);
    GradleModelBuilder gradleModelBuilder = GradleModelBuilder.getInstance(subMonitor.split(...));
    Set<File> gradleFiles = findChildrenWithName("build.gradle", subMonitor.split(...));
    Set<File> allGradleComponents = new HashSet<>();
    for (File gradleFile : gradleFiles) {
      GradleModel gradleModel = gradleModelBuilder.buildModel(gradleFile, subMonitor.split(...));
      allGradleComponents.addAll(gradleModel.getResolvedComponentLocations());
    }
    return allGradleComponents;
  }

Software components are not physically nested folders. A single physical Gradle project can be logically divided into different components that have different compile/runtime classpaths. So the approach proposed above would only work if we actively create the folders during this loop, which would be weird, because the user hasn't even selected which he wants yet :)

     
The next problem comes when the Smart Importer goes and imports all those projects itself. As far as I've seen it'll just use the folder name. This won't work for any larger project. It is very common to have a physical structure like this
root
 |- sub1
    |- api
    |- impl
 |- sub2
   |- api
   |-impl
If you try to import these, you'll get an exception because of duplicate names. Both Gradle and Maven handle this properly in their own importers, because they actually use a richer model. For Gradle there is a de-duplication algorithm, for Maven the artifactIds are used (they are unique within a multi-module project)
This one is indeed an issue, but the importer will manage to import the example structures. Although project in sub2/api will be likely to be called api_(0) and sub2/impl likely to be called impl_(0). http://git.eclipse.org/c/platform/eclipse.platform.ui.git/tree/bundles/org.eclipse.ui.ide/src/org/eclipse/ui/internal/wizards/datatransfer/SmartImportJob.java#n554

Fair enough, but still Buildship has to rename all of them, which creates additional churn and auto build events, which could be completely avoided by using the Buildship wizard. 


In cases of user diverging from the "mainstream" configurations, then we can easily expect users to tweak their projects via the Project > Properties menu.

This might work when we're talking about 1 or 2 projects. But real world projects out there have 100s, sometimes 1000s of sub-modules. You can't expect the user to click through those and change configurations.
 
Another concern is performance. I can only speak for Gradle here, but I assume the Maven guys have also done some heavy optimizations. Buildship fetches the model for a root project *once* and then does all the configuration within a single transaction. It suppresses both resource change notifications as well as JDT notifications until the import is complete. As a result, Buildship can import 500 sub-projects in just 10 seconds! Without these optimizations the same can take up to 5 minutes.
For Maven, the strategy that was used is to run Maven configuration asynchronously. So the modules get imported by the importer, and the Maven configurator populates a set of IProject to process for Maven configuration.
The actual configuration is performed asynchronously and in a non-blocking way, resulting in a decent user experience. See https://github.com/eclipse/m2e-core/blob/master/org.eclipse.m2e.importer/src/org/eclipse/m2e/importer/internal/MavenProjectConfigurator.java#L258 .

Honestly, I wouldn't call this a decent user experience. The projects are unusable until all configuration is there. Doing it in the background just improves the perceived performance a little bit, but as a user I still have to wait. Try importing a build with 500 sub-projects. It takes ages if you configure them one by one, because JDT will run lots of auto builds, the different views in Eclipse will update themselves constantly etc. It creates a lot of pressure on the UI thread, making the IDE almost freeze. You can't really use anything until the background job has settled down. Buildship does this in 10s thanks to its transactional approach. 

So the absolute minimum you need is doing this stuff synchronously, in a workspace transaction and allowing contributors to add additional transactional semantics. That way, e.g. we can also suppress JDT notifications until the import is completely done. Otherwise this will be way too slow for large projects.
 
Finally, it seems to me that all configurators will be run. This will lead to very awkward situations for many users. For instance, if you are doing a build system migration, you will usually have build files for both build systems in your projects for the transition period. Let's say, you might have both pom.xml and build.gradle files. With the current approach, this would lead to a project that is configured by both the Maven and Gradle configurator, which would definitely lead to a completely broken project.
Can you please provide an example?

Maven and Gradle both expect full control over the project. They have a classpath container, they manage the runtime classpath, they configure all kinds of settings. A project with both Maven and Gradle nature is completely broken.
 
Most natures work well together. A project that has both pom.xml and build.gradle is simultaneously a Maven and a Gradle project. It's totally consistent to import it with both natures enabled in the IDE.

It is not, it is *either* a Maven project *or* a Gradle project, depending on which tool you want to work with right now. 
 
Then users should be able to disable the nature they don't want (Gradle or Maven) later, while using the IDE.

As I said, this is impractical for large amounts of projects.
 
The "regular" import wizards would still be available for such tricky case when user need an exclusive import strategy.
 

Also, here are a few questions to consider about the mvn vs gradle issue at import:
* how would a user on a new project know whether it's best to import with Maven or Gradle?

I guess you are coming from a different viewpoint here - importing small open source projects written by someone else. In those small projects, there is usually only one build system, so there would be no choice to make. I'm more concerned about large enterprise builds and there the developers know very well which build system is the "current" one and which is the "one we're migrating to".
 
* should it be the role of the user to select one or another?
 
Absolutely. 

* since Maven and Gradle work together on CLI, why aren't they working together in the IDE.

They don't work together. You either run a Gradle build or a Maven build, not both at once.
 
Buildship is not far from being able to say "just use smart import". The remaining steps seems to be to Implement the project detection mentioned higher in this mail, and (depending on actual performances) to Implement configuration asynchronously like Maven did.

I hope my explanations have clarified a bit more how we are not that close yet. Let me propose another idea that moves more responsibility to the contributors, while still keeping one unified wizard:

1. User selects a root dir
2. Contributors are asked if they can provide a list of project names
3. User chooses a contriburor
4. The names provided by that contributor are shown for selection
5. The contributor is told to import those that the user selected

Now it is very important to note that those names are just identifiers for the contributor, not necessarily physical folders. So the Smart Importer does nothing itself, the contributor takes full control of how to actually imports the chosen projects. This reduces the amount of duplicated code and maximizes the possible performance.

Cheers,
Stefan

Back to the top