Bug 207093 - Perf: adding a new top-level package is slow if many source files exist
Status: VERIFIED FIXED
Alias: None
Product: JDT
Classification: Eclipse Project
Component: Core
Version: 3.3.1
Hardware: PC Windows XP
Importance: P3 normal
Target Milestone: 3.5 M4
Assignee: Kent Johnson CLA
QA Contact:
URL:
Whiteboard:
Keywords: performance
Depends on:
Blocks:
 
Reported: 2007-10-22 17:11 EDT by Jess Garms CLA
Modified: 2008-12-09 12:23 EST

Description Jess Garms CLA 2007-10-22 17:11:37 EDT
I have a project with 12,000 source files, all in a package "test.foo". Everything compiles fine with no errors.

If I add a new source file, "org/example/Bar.java", it triggers a full rebuild which takes several minutes. Note that this is in the same source root.

There are no dependencies between test.foo and anything in org.example in either direction. Seems like it should be possible to short-circuit compilation and only compile the new files.

The source files in question are proprietary, so I can't attach them to this bug, but I can provide them separately by email. Thanks much!
Comment 1 Kent Johnson CLA 2007-10-23 11:18:36 EDT
Jess, please turn on the builder trace so we can see why so many files are being recompiled:

# Turn on debug tracing for org.eclipse.jdt.core plugin
org.eclipse.jdt.core/debug=true
# Reports incremental builder activity
org.eclipse.jdt.core/debug/builder=true

Send the trace by email if you do not want to attach it here

thx
Comment 2 Kent Johnson CLA 2007-10-23 13:59:42 EDT
Actually - are you only seeing this if you add the new source file to a NEW top-level package "org", "java", or "com" ?

If so then I think you're being hit with an optimization to save space with our reference recording - see ReferenceCollection.WellKnownQualifiedNames
Comment 3 Jess Garms CLA 2007-10-23 14:18:29 EDT
Yes, adding a new source file to a top-level org package. 
Comment 4 Jess Garms CLA 2007-10-23 14:24:14 EDT
To be more specific, I am adding a source file, "org.example.Bar" which creates a new top-level package "org", with the source file inside the package "org.example".
Comment 5 Kent Johnson CLA 2007-10-23 14:34:25 EDT
Adding a new top-level package named "org", "java" or "com" is hit by this optimization. It usually happens earlier in development when there are not so many source files under different top-level package names.

Every existing source file in your project 'references' these 3 top-level 'well-known' package names so they are all recompiled.


We save too much space by not physically recording references to these 3 package names, so I don't think we'll change this.
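
As a rough illustration of the trade-off described in this comment, here is a minimal sketch. It is a hypothetical simplification and not the actual ReferenceCollection code: the class name ReferenceSketch, the method mayReference, and the exact contents of the well-known set are assumptions taken only from the three names mentioned in this thread. The idea is that references to well-known names are never stored, so when one of those names changes the builder has to assume that every file refers to it.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical simplification for illustration; not the real JDT class.
final class ReferenceSketch {
    // Assumed: the three top-level names named in this thread.
    static final Set<String> WELL_KNOWN = Set.of("org", "java", "com");

    // Names that this compilation unit is recorded as referencing.
    private final Set<String> recorded = new HashSet<>();

    ReferenceSketch(List<String> referencedNames) {
        for (String name : referencedNames) {
            // Space saving: well-known names are skipped instead of stored.
            if (!WELL_KNOWN.contains(name)) {
                recorded.add(name);
            }
        }
    }

    // Since well-known names are never recorded, a change to one of them
    // must be treated as affecting every compilation unit.
    boolean mayReference(String changedName) {
        return WELL_KNOWN.contains(changedName) || recorded.contains(changedName);
    }

    public static void main(String[] args) {
        ReferenceSketch unit = new ReferenceSketch(List.of("test", "test.foo"));
        System.out.println(unit.mayReference("org")); // true: "org" is well known
        System.out.println(unit.mayReference("net")); // false: never recorded
    }
}
```

Under this model, a unit that only mentions "test" and "test.foo" is still reported as affected when a top-level "org" package appears, which matches the full rebuild seen in the description.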
Comment 6 Tim Hanson CLA 2007-10-23 16:08:12 EDT
I think you are right to not record references to org, com, and java. 

But I think that implies that if one of these packages is added, you can safely do nothing since these already exist. 

Comment 7 Kent Johnson CLA 2007-10-23 16:20:02 EDT
No, there is no guarantee that they exist in the project yet, nor on its classpath.

In most cases, they do - but a simple project that does NOT depend on rt.jar starts with no packages at all.
Comment 8 Tim Hanson CLA 2007-10-23 16:23:34 EDT
If rt.jar is not on the classpath, nothing would compile. Without String, Object, etc., Java code can't even begin to compile.

In fact, I believe the builder doesn't even run. It bails out early with a message about the classpath being misconfigured.
Comment 9 Kent Johnson CLA 2007-10-23 16:36:48 EDT
No, we compile fine if Object, String etc. are provided as source files (obviously in the correct packages).

So "java" and "com" are not the best examples to use in this case.



But "org" is not always available and we cannot rely on it existing in all cases.
Comment 10 Tim Hanson CLA 2007-10-23 17:19:05 EDT
The thing is, almost no compilation units need to be recompiled with the addition of a top-level "org" package.

The only compilation unit whose problems would change as a result is one that has a broken import of "org.*". Any other compilation unit would see no change in problems as a result of the package addition.

I'm arguing that you keep too much info in the build state. Almost nothing should depend on org. For example, a typical import would be:

import org.w3c.dom.Document;

If the file compiles, the only thing that can affect the compilation unit with that import (or that QName reference in the source code) is a change to Document.class (or .java) itself.

Only if the "org" package does not exist, does the compilation unit have a dependency on the "org" package.

The only other case is the on-demand import, which only depends on the deepest level. For example:

import org.w3c.dom.*;

The only thing that can affect this compilation unit is an addition to or removal from package "org.w3c.dom".


In summary, I don't understand why everything has a dependency on the top-level packages, when in reality almost nothing does. This is causing a lot of pain for us, because we do expensive rebuilding that is unnecessary.
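
The dependency rule argued for in this comment can be sketched as follows. This is only an illustration of the proposal, not how the JDT builder records references: the class ImportDependencySketch and the method dependencyOf are hypothetical names, static imports are ignored, and the set of existing packages is passed in by hand rather than taken from a real classpath.

```java
import java.util.Set;

// Hypothetical sketch of the proposed rule; not JDT code.
final class ImportDependencySketch {

    // Returns the one name an import depends on under the proposed rule:
    // the first missing package if the import cannot resolve, otherwise the
    // type itself (single-type import) or the deepest package (on-demand).
    static String dependencyOf(String importDeclaration, Set<String> existingPackages) {
        String name = strip(importDeclaration);
        boolean onDemand = name.endsWith(".*");
        if (onDemand) {
            name = name.substring(0, name.length() - 2);
        }
        int lastDot = name.lastIndexOf('.');
        if (!onDemand && lastDot < 0) {
            return name; // no package part to check
        }
        String packageName = onDemand ? name : name.substring(0, lastDot);

        // Walk "org", "org.w3c", "org.w3c.dom" and stop at the first gap.
        StringBuilder prefix = new StringBuilder();
        for (String segment : packageName.split("\\.")) {
            if (prefix.length() > 0) {
                prefix.append('.');
            }
            prefix.append(segment);
            if (!existingPackages.contains(prefix.toString())) {
                return prefix.toString();
            }
        }
        return name;
    }

    // Removes the "import" keyword and the trailing semicolon.
    private static String strip(String declaration) {
        String s = declaration.trim();
        if (s.startsWith("import ")) {
            s = s.substring("import ".length()).trim();
        }
        if (s.endsWith(";")) {
            s = s.substring(0, s.length() - 1).trim();
        }
        return s;
    }

    public static void main(String[] args) {
        Set<String> packages = Set.of("org", "org.w3c", "org.w3c.dom");
        System.out.println(dependencyOf("import org.w3c.dom.Document;", packages));
        System.out.println(dependencyOf("import org.w3c.dom.*;", packages));
        System.out.println(dependencyOf("import org.w3c.dom.Document;", Set.of()));
    }
}
```

With all three packages present, the single-type import depends on org.w3c.dom.Document and the on-demand import on org.w3c.dom; only when no "org" package exists does the dependency fall back to "org" itself.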
Comment 11 Kent Johnson CLA 2007-10-24 10:16:03 EDT
"In summary, I don't understand why everything has a dependency on the top-level packages, when in reality is almost nothing does. This is causing a lot of pain for us to do expensive rebuilding that is unnecessary."


We disagree.

You're using your code as proof - but it proves nothing.

Not only are import statements affected but any qualified reference to 'org.A', plus every error message that references 'org.???'.

How do you know that every package name that starts with 'org' is followed by another subpackage? You don't.


And when it really comes down to it, how many projects have ever encountered this problem AFTER defining 1000 source files, let alone 100?

After 6 years, you're the only one.

This is not a case we need to special case. It's a one-time hit and is far from an everyday issue.

In fact, it's a once-in-5-years problem that causes a single full build.
Comment 12 Tim Hanson CLA 2007-10-24 11:40:05 EDT
"We disagree." That means YOU disagree.

"Not only are import statements affected but any qualified reference to 'org.A',
plus every error message that references 'org.???'." Show me an example.

"After 6 years, you're the only one.

This is not a case we need to special case. It's a one-time hit and is far from
an everyday issue."

That's not really a good argument. I expect your average customer knows little of the details of Java compilation. Not to mention that APT is in fact a bit of a special case. If working together isn't important to you, just let us know.
Comment 13 Kent Johnson CLA 2007-10-24 11:55:03 EDT
Where's the typical case?

What example do you have that shows an average user will see this problem once a week/month/year?


You want us to consider a performance problem but won't show us how it could be happening with any frequency.
Comment 14 Walter Harley CLA 2007-10-25 19:22:03 EDT
(In reply to comment #13)
> Where's the typical case?
> 
> What example do you have that shows an average user will see this problem once
> a week/month/year?

The typical situation is that a user has a large Java project, and decides to wrap it with some web services.  The first thing they do, naturally, is some experimentation.  Unfortunately the default namespace for the WSDL Java wrapper generation examples is in org.example.  

Now that we see what's going on, there is an easy workaround: specify a non-org namespace.  That would not have been obvious to the customer or tech supporters, though - how could they realize that their performance would be vastly improved by choosing a different namespace for their WSDL?  Instead, the conclusion they draw is more like "Eclipse's WSDL support has poor performance, based on initial investigation."

I don't know how common this situation is, but it was important enough to at least one paying customer of a commercial product based on Eclipse to get promoted all the way through tech support to dev; hence this bug report, if I understand correctly. At least this customer reported the problem; others may simply choose to use a different product.
Comment 15 Frederic Fusier CLA 2007-10-29 07:46:22 EDT
Reset target as the discussion seems not really closed on this bug...
Comment 16 Philipe Mulet CLA 2007-10-30 13:57:23 EDT
Reopening to see if there is really nothing we can do... (no promise, but we will try harder).
Comment 17 Kent Johnson CLA 2008-11-27 15:35:01 EST
Not quite a duplicate of bug 252948, but that fix takes care of the case in comment 0
Comment 18 Olivier Thomann CLA 2008-12-09 12:23:46 EST
Verified for 3.5M4 using I20081208-1800