Bug 122885 - [builder] Project build states should not store the access restrictions templates
Summary: [builder] Project build states should not store the access restrictions templ...
Status: VERIFIED FIXED
Alias: None
Product: JDT
Classification: Eclipse Project
Component: Core (show other bugs)
Version: 3.2   Edit
Hardware: PC Linux
: P3 normal (vote)
Target Milestone: 3.4 M3   Edit
Assignee: Maxime Daniel CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2006-01-06 05:17 EST by Maxime Daniel CLA
Modified: 2007-10-29 10:09 EDT (History)
3 users (show)

See Also:
maxime_daniel: review? (kent_johnson)


Attachments
Suppress templates from comparison and state serialization (exploratory) (2.73 KB, patch)
2006-01-12 06:42 EST, Maxime Daniel CLA
no flags Details | Diff
Fix + tests cases - designed after option 3 above (48.37 KB, patch)
2007-04-27 01:49 EDT, Maxime Daniel CLA
no flags Details | Diff
Fix + test case (53.15 KB, patch)
2007-07-26 08:21 EDT, Maxime Daniel CLA
no flags Details | Diff
Fix + test cases (51.30 KB, patch)
2007-10-09 10:09 EDT, Maxime Daniel CLA
no flags Details | Diff
Patch for org.eclipse.jdt.compiler.tool (1.98 KB, patch)
2007-10-09 10:12 EDT, Maxime Daniel CLA
no flags Details | Diff
Patch for org.eclipse.jdt.compiler.apt (1.99 KB, patch)
2007-10-09 10:14 EDT, Maxime Daniel CLA
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Maxime Daniel CLA 2006-01-06 05:17:50 EST
Code related, org.eclipse.jdt.internal.core.builder.State 1.53 and related classes.
Following discussions about bug 76266 and bug 122763, we (Philippe and I) believe that storing access restrictions templates into the project build state is not needed, hence should be avoided. 
This would call for their removal for the sake of optimization.
Moreover, Philippe pointed out that storing them is due to trigger a full build when the user changes his locale, which is not intended.
Note: need to check that the build state comparison is aligned as well.
Comment 1 Philipe Mulet CLA 2006-01-06 07:55:29 EST
I suspect the bug was introduced a while ago, we may need to backport fix to 3.1 maintenance stream.
I also suspect storing msg templates can make the state unnecessarily bigger than it should (PDE is creating lots of access rules in full source workspace).
Comment 2 Kent Johnson CLA 2006-01-11 14:25:52 EST
So if I release changes to the access rules, how does your workspace know to do a full build?
Comment 3 Maxime Daniel CLA 2006-01-12 06:13:31 EST
Templates are not a structural part of AccessRuleSet. In fact, AccessRuleSet carries them so that we can emit appropriate messages only. I believe that this is  not the best possible design, but fixing it would be another topic. So the combination of the classpath entry own members with the AccessRuleSet access rules should be enough to decide whether we must kick a new build or not.
So we could consider the option to ignore templates in State#readRestriction, State#writeRestriction *and* AccessRuleSet#equals. I'll do a test along these lines.
Unfortunately, this taints AccessRuleSet#equals (two instances being deemed equal even if they have different message templates). I'll explore the possibility to get rid of this issue by managing the error messages elaboration by other means and simply suppressing the templates from AccessRuleSet instances.
Comment 4 Maxime Daniel CLA 2006-01-12 06:42:27 EST
Created attachment 32903 [details]
Suppress templates from comparison and state serialization (exploratory)

The simple scenario that follows went OK:
- have projects P1 and P2;
- P2 has P1 on its build path, with access rule discouraged X;
- type P1/X;
- type P2/Y extends X;
- build (no autobuild set) -> correctly, warning on Y about discouraged access 
  to X;
- edit P2 build path, change discouraged X to discouraged Z;
- rebuild -> correctly, detects that P2 needs to be rebuilt (and the warning
  goes away).
Comment 5 Philipe Mulet CLA 2006-01-12 06:54:29 EST
AccessRule comparison should remain unchanged; i.e. equals should consider the template (as it could denote a different library).

The classpath check must then compare them differently, i.e. check for what it cares about only instead of calling AccessRuleSet#equals(...).
Comment 6 Philipe Mulet CLA 2006-01-12 07:00:17 EST
It feels like the builder should reach the AccessRules directly and compare them equal, rather than comparing AccessRuleSets.
Comment 7 Maxime Daniel CLA 2006-01-12 07:31:15 EST
(In reply to comment #5)
> The classpath check must then compare them differently, i.e. check for what it
> cares about only instead of calling AccessRuleSet#equals(...).
BTW, AccessRulesSet#equals is called from ClasspathJar#equals and ClasspathDirectory#equals, which are used to compare build states. That is, either  all the types involved into build states comparison provide a specialized comparison method (named something else than equals), or we realign their structure so that equals can endorse both responsibilities (and then build states comparison can continue to leverage it).

Comment 8 Maxime Daniel CLA 2006-01-12 10:53:33 EST
I think we could move message templates to AccessRestriction. The batch compiler uses simple message templates that are hard coded (see current AccessRuleSet constructor). For other cases, ClassPathEntry should be able to build the appropriate (AccessRule, message templates) couple into an AccessRestriction on demand (since it builds its message templates by itself).
Tell me what you think and I'll dig deeper into it if needed.
Comment 9 Kent Johnson CLA 2006-01-12 12:47:48 EST
It makes sense that ClasspathLocations are compared using their own method - not #equals().
Comment 10 Maxime Daniel CLA 2007-04-26 01:45:46 EDT
I've been playing around for a while along the following lines:
- we have two problems (access restriction and discouraged access) that take nuanced messages along two axes: a) the type of classpath entry that the restriction applies to; b) the type of language element the restriction applies to (amongst type, field, constructor, method); to keep the number of problems as low as possible while getting the problem reporter recognize and leverage nuances, I implemented a solution that lets the problem reporter combine an error message that contains a single placeholder (aka {0}) with a message elaboration that can take several parameters; message elaborations exploit the fact that problem ids have a category and a number to encode the nuances applicable to a given class of problems; combining the problem id and the elaboration id allows us to get the 24 needed message templates using 14 entries in messages.properties; note that the general form of a message that takes an elaboration should be 'Overall error explanation: {0}', the message elaborations being more complete, self-standing explanations of the details of the issue; the mechanism might be leveraged for other errors than access restrictions; I have put the problem ID in the higher bytes so as to sort elaborations according to the problem ID (the elaboration id being confined to the lower order byte);
- AccessRuleSet bears the needed information to decide which type of classpath entry it applies to (a byte), and a user-readable name for the classpath entry; this is a minimal touch approach to reducing the volume of memory needed to elaborate the desired messages; it achieves the expected effect without restructuring the code in depth;
- messages elaborations get stored in the same messages.properties file as error messages; if this proved to be a problem (say, we want the batch compiler to manage four message elaborations and the builder to manage the height remainders), we could get the caller to present the problem reporter with someone to delegate messages elaboration management to;
- the current prototype still stores a byte for the classpath entry type and a string for the classpath entry name; I am investigating means to get rid of those as well; note that the current gain is already significant, since 'resolved' templates count 987 characters over the classpath entry name for each access rules set, which means roughly 1Kb per classpath entry that bears access rules.
Comment 11 Maxime Daniel CLA 2007-04-26 02:13:08 EDT
From the code of NameEnvironment, we see that ClasspathLocation.forBinaryFolder is called for IClasspathEntry.CPE_PROJECT and IClasspathEntry.CPE_LIBRARY entries. However, the isOutputFolder field is true for IClasspathEntry.CPE_PROJECT only. Libraries should always be libraries. Hence State should be able to deduce the classpath entry type from the classpath location, without storing it explicitly as a part of access rules sets.
Only the IDE builder types of entries are concerned by this analysis though (there is an extra type for the batch compiler), hence I'll keep the type upon AccessRuleSet as far as RAM is concerned.
Comment 12 Maxime Daniel CLA 2007-04-26 03:48:34 EDT
As far as names are concerned, the current messages report names derived from the raw classpath entries, while the state stores resolved classpath locations. Considering the classpath variables case, I would contend that we need to store their name on one of the classpath location itself or the access rules set. The minimal touch approach would be to keep them upon the access rules set.
The other cases could give more opportunity to derive the classpath entry name from the classpath location.
For AccessRestriction#COMMAND_LINE type, we do not care because we have no build state.
For PROJECT, we should be able to extract the project name from the path. The API to do so does not exist yet though. It could be added to ClasspathLocation.
For LIBRARY, either a name is stored and we keep it, or else we should be able to use the full path.
The algorithm would then be:
- at write time:
  - if we have a PROJECT type, write no name;
  - else if we have an absolution path, write no name;
  - else write name;
- at read time:
  - if we have no name:
    - if the classpath location is an output folder, type is PROJECT and name is the name of the project (to be extracted from the classpath location via new API getProjectPath);
    - else type is LIBRARY and name is full path of the classpath location (that must be a ClasspathJar);
  - else type is LIBRARY and name is read from state.

Cost/benefit analysis:
- we can get rid of the type on disk, but this is only one byte per access rules set;
- we can get rid of the name on disk, but only for projects and external libs;
- the change is more risky than what I have prepared so far.

Philippe, what do you think?
Comment 13 Philipe Mulet CLA 2007-04-26 04:15:03 EDT
Pls handle this with Kent. I would make the decision on safety and relative size saving (if not noticeable, then why bother?).
Comment 14 Maxime Daniel CLA 2007-04-26 04:19:06 EDT
Possible choices are:

1 - do nothing;
2 - reconsider design of what has been done, then proceed;
3 - harden what has been done and release (for 3.3 or post 3.3);
4 - explore further the opportunities to save more space.

My preference would go to 3 because 4 is more risky and will result into more
complex code for mitigated benefits.

What I mean is that 4 has mitigated benefits over 3.
Benefits of 3 over 1 is about 1K per access rules set.
Comment 15 Maxime Daniel CLA 2007-04-27 01:49:19 EDT
Created attachment 65148 [details]
Fix + tests cases - designed after option 3 above

Would storing elaboration templates with the problem messages be a problem, we'd have to consider ways to inject elaborations from above, or else to present the problem reporter with an injection resolver. Keeping problem messages and elaborations together is simpler. Delegating back the elaborations resolution is more powerful but more complex.
Comment 16 Maxime Daniel CLA 2007-05-21 07:07:52 EDT
Targeting 3.4 as it seems to be too late in 3.3 game.
Comment 17 Maxime Daniel CLA 2007-06-19 08:15:59 EDT
Kent, would you please let me know what you think?
Comment 18 Maxime Daniel CLA 2007-07-26 08:21:17 EDT
Created attachment 74675 [details]
Fix + test case

HEAD catchup + fix of a side effect upon completion
Comment 19 Maxime Daniel CLA 2007-07-26 08:22:24 EDT
Kent, I'll release this when I'm back from vacations unless you veto it.
Comment 20 Kent Johnson CLA 2007-08-13 10:56:25 EDT
Hold off for a few days - its a very large patch.
Comment 21 Maxime Daniel CLA 2007-08-16 09:35:42 EDT
NP. "My other Summer assignment" keeps me rather busy anyway. Will revisit this round mid-September.
Comment 22 Kent Johnson CLA 2007-10-02 11:14:07 EDT
I'll reject the patch for now.

The changes are too extensive - we need more data to justify them.

Can we set up a time to go over them ?
Comment 23 Maxime Daniel CLA 2007-10-02 12:41:35 EDT
Next week would be better for me. Would Thursday afternoon (my time) be OK?
Comment 24 Maxime Daniel CLA 2007-10-09 10:09:12 EDT
Created attachment 79952 [details]
Fix + test cases

HEAD catch-up + suppressed some cosmetic differences (so as to make the review more efficient).
Currently under test.
Comment 25 Maxime Daniel CLA 2007-10-09 10:12:32 EDT
Created attachment 79953 [details]
Patch for org.eclipse.jdt.compiler.tool

The suggested fix breaks some clients (of internal APIs). This is a fix for the compiler tools. I ran the associated tests without any problem, but since the tests series is rather short, I'd be happy to hear from Olivier on the topic.
Comment 26 Maxime Daniel CLA 2007-10-09 10:14:21 EDT
Created attachment 79955 [details]
Patch for org.eclipse.jdt.compiler.apt

Same topic as for org.eclipse.jdt.compiler.tool. I ran no tests on this one yet though. Walter, Olivier, comments are welcome.
Comment 27 Kent Johnson CLA 2007-10-09 10:21:33 EDT
Where is the justification for why we should do this at all? The code has been in place for several years now.

If this is just a performance improvement, then how much faster/smaller is it?

If its more than that, then what is the argument for changing internal API calls? Is there not another approach?
Comment 28 Olivier Thomann CLA 2007-10-09 10:44:40 EDT
(In reply to comment #25)
> The suggested fix breaks some clients (of internal APIs). This is a fix for the
> compiler tools. I ran the associated tests without any problem, but since the
> tests series is rather short, I'd be happy to hear from Olivier on the topic.
Besides Kent's comments, this is fine for me. I'd like to get a better picture of why this is required.
Do you have numbers/stats about potential optimization?
Comment 29 Maxime Daniel CLA 2007-10-09 11:11:17 EDT
The justification is (leaving aesthetics aside) essentially to minimize the build state size. I'll seek stats if absolutely required, rough evaluation being a division by 4 of access rules related space in the build state.
Comment 30 Kent Johnson CLA 2007-10-09 11:16:34 EDT
So are we talking 10K, 100K or 1Mb ?

How many projects/plugins will be affected ?

Changing internal API methods to create problems is a big change for 10K.
Comment 31 Walter Harley CLA 2007-10-09 18:19:03 EDT
(In reply to comment #26)
The code in compiler.apt is just a duplicate of the code in compiler.tools (see bug 201327 for comments on why that is), so what is right in compiler.tools should be right in compiler.apt as well.
Comment 32 Maxime Daniel CLA 2007-10-10 02:49:02 EDT
Thinking more about it, the size benefit cannot be the reason to do this. I dare hope that the build state size is dwarfed by the size of class files, hence the size benefit is due to be marginal, and I won't spend even a second measuring it.

A bit more of history. In the course of fixing bug 76266, I noticed that we were storing message templates upon the access rules sets, all the way down to the build state, which is a unique case in our code, and ugly at best. Given the timing of the fix, we (Philippe and I) decided to deliberately worsen the matter by keeping the same design, but adding three more templates to the single one used so far, deferring the correction of the design to a later time. 
Hence this bug, for which the true motivation was:
a) get rid of a design flaw;
b) as a side benefit, lower the build state size to slightly better than what it was before bug 76266 was fixed.

My current proposal (details to be discussed):
- removes the design flaw we have and know of;
- does that with a slight benefit on space, a slight benefit on time as long as we have no restriction violations, at worst a potential slight loss on time when we report restriction violations;
- introduces a pattern (messages elaborations) that I believe will be needed for other scenarios.
Admittedly, we do so at the cost of a couple of changes to 'internal APIs' if I dare say (all my changes are within org.eclipse.jdt.core.internal* packages). But I thought that this period of the release cycle was the right one to promote that kind of changes.

Given the fact that the elaboration of the current proposal are sunk costs, I have no problem closing this as WONTFIX if this is what is decided. Design matters yield judgment calls.

In the opposite case, I'll be more than happy to discuss the details so that we get something better.
Comment 33 Maxime Daniel CLA 2007-10-12 09:07:25 EDT
Verified today upon a full source workspace for build I20071010-1200 that no other Eclipse plugin than org.eclipse.jdt.compiler.tool or org.eclipse.jdt.compiler.apt   gets compiler errors when applying the suggested changes. Note though that components that derive DefaultProblemFactory should nevertheless check that the provided implementation for the new method:
	CategorizedProblem createProblem(
		char[] originatingFileName,
		int problemId,
		String[] problemArguments,
		int elaborationId,
		String[] messageArguments, 
		int severity,
		int startPosition,
		int endPosition,
		int lineNumber,
		int columnNumber)
matches their need, and override it if needed.
Comment 34 Maxime Daniel CLA 2007-10-15 13:46:26 EDT
Released for 3.4M3
Committed the relevant changes to compiler tool and compiler apt as well. (Olivier, would you please ensure that these changes get into the next I-Build?)
Comment 35 Olivier Thomann CLA 2007-10-15 15:38:21 EDT
I updated the code released to remove unused messages also from org.eclipse.jdt.internal.core.util.Messages.
Also I remove the usage of String.replaceAll(String, String) since this is a 1.4 method and the compiler needs to be built using only Foundation 1.0.
Let me know if you have any concerns with the latest changes.
Comment 36 Frederic Fusier CLA 2007-10-29 10:09:08 EDT
Verified (code) for 3.4M3 using build I20071029-0010 build.