Bug 356620 - Make it possible to provide indexes for defined libraries
Summary: Make it possible to provide indexes for defined libraries
Status: VERIFIED FIXED
Alias: None
Product: JDT
Classification: Eclipse Project
Component: Core (show other bugs)
Version: 3.7   Edit
Hardware: PC Windows 7
: P3 normal (vote)
Target Milestone: 3.8 M5   Edit
Assignee: Satyam Kandula CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: 364287 364290 365446
  Show dependency tree
 
Reported: 2011-09-02 12:04 EDT by Olivier Thomann CLA
Modified: 2019-11-08 14:03 EST (History)
11 users (show)

See Also:
srikanth_sankaran: review+


Attachments
First cut (47.78 KB, patch)
2011-10-14 23:18 EDT, Satyam Kandula CLA
no flags Details | Diff
Proposed patch (128.09 KB, patch)
2011-11-17 11:37 EST, Satyam Kandula CLA
no flags Details | Diff
Patch with core changes (without index in jar support) (74.63 KB, text/plain)
2011-12-02 10:11 EST, Satyam Kandula CLA
no flags Details
Patch for supporting the index file in a jar (68.51 KB, patch)
2011-12-07 23:47 EST, Satyam Kandula CLA
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Olivier Thomann CLA 2011-09-02 12:04:08 EDT
In some cases, indexing takes a significant amount of time. There is not much that can be done to significantly improve the performance except for having a pre-built index.
Comment 1 Olivier Thomann CLA 2011-09-02 12:04:32 EDT
Satyam,

Please investigate.
Comment 2 Satyam Kandula CLA 2011-10-14 22:39:48 EDT
I will not be able to finish it by M3, so moving it to M4.
Comment 3 Satyam Kandula CLA 2011-10-14 23:11:13 EDT
Here is the overall approach I am taking for this in JDT/Core.

. This solution is targeted to have indexes for only jars, which includes plugin jars. It will not be targeted for class folders or source folders. Source folders and class folders are likely to change and hence tracking could be difficult.

. The index file can be generated either through an ant task or direct function call or an headless application. The index file can be generated only for a jar. To make it simple, all of these just take two parameters, a jar file to index and the path to the index file that needs to be written.

. The path to the index file can be provided through the classpath attribute. It should be like a URL string, so that the path to the indexFile can be easily mentioned even if it is in a jar. Here is the javadoc for attribute in IClasspathAttribute
#######
/**
 * Constant for the name of the index location attribute.
 * 
 * <p>The value for this attribute has to be the string representation of a URL.
 * It should point to an existing index file in a folder or a jar.</p>
 * 
 * @since 3.8
 */
String INDEX_LOCATION_ATTRIBUTE_NAME = "index_location"; //$NON-NLS-1$
#######

. If the index file specified is not present or if the indexer thinks it is corrupt, the library is reindexed. I haven't planned yet to see if the index file is really in sync with the jar. If there are multiple projects depending on the same jar, but if their index_location classpath attributes are different, the last one that sets wins. I am assuming that this will generally not happen and hence it is not worth taking care of this. 

Please let me know if you have any alternative view on the above points.
Comment 4 Satyam Kandula CLA 2011-10-14 23:18:25 EDT
Created attachment 205241 [details]
First cut

This patch is not complete. I have marked TODOs for all the incomplete tasks.
Comment 5 Martin Oberhuber CLA 2011-10-17 08:59:37 EDT
(In reply to comment #3)
> The path to the index file can be provided through the classpath attribute.
> It should be like a URL string

Would that be an absolute URL, or a relative one?

When sharing project(s) in a team, it can happen that different team members have the project (and thus the index) in a different physical location.

So the "index_location" should either support specifying the location relative (eg relative to the plugin root, or relative to the JAR being indexed; Javadocs would need to specify what it is relative to) or some sort of variable substitution mechanism is needed.
Comment 6 Satyam Kandula CLA 2011-10-18 10:20:25 EDT
(In reply to comment #5)
Thanks for this comment. This should better support both an absolute and a relative path. Do you think variable substitution mechanism is needed? I don't plan to.
I will update the Javadoc accordingly. Thanks.
Comment 7 Grant Taylor CLA 2011-11-04 16:04:55 EDT
I think this sounds good.  In our case, we have a WST server target which defines many jars.  If these jars can be pre-indexed, we'll get a big boost in performance.  It sounds like we should be able to:
- generate these index files before shipping our product
- ensure that the classpath containers point to the index files (which would be plugin resource URLs).  Is this true?  Can I set up index file references for a classpath container?
If the above is all true, then I'm good with the design.
Comment 8 Srikanth Sankaran CLA 2011-11-10 01:17:28 EST
(In reply to comment #7)

> If the above is all true, then I'm good with the design.

Satyam, for the record, please document where we stand with
respect to Grant's observations.

Jay, please review when a close to final form version as indicated
by Satyam becomes available.
Comment 9 Satyam Kandula CLA 2011-11-10 01:32:25 EST
(In reply to comment #7)
Grant, Thanks for your comments.
You should be able to specify the index file from a classpath container.
Comment 10 Satyam Kandula CLA 2011-11-17 11:37:52 EST
Created attachment 207155 [details]
Proposed patch
Comment 11 Satyam Kandula CLA 2011-11-17 11:39:36 EST
(In reply to comment #10)
I still have to improve in the Javadocs and comments. Jay, please review.
Comment 12 Satyam Kandula CLA 2011-11-17 11:42:40 EST
(In reply to comment #5)
Here is the new javadoc for attribute in IClasspathAttribute. Hope this is good enough. I wanted to refer to the javadoc of the platform spec but couldn't find a javadoc. 
#######
/**
 * Constant for the name of the index location attribute.
 * 
 * <p>The value for this attribute has to be the string representation of a
URL.
 * It should point to an existing index file in a folder or a jar. 
 * The URL can also be of platform protocol.</p>
 * 
 * @since 3.8
 */
String INDEX_LOCATION_ATTRIBUTE_NAME = "index_location"; //$NON-NLS-1$
#######
Comment 13 Srikanth Sankaran CLA 2011-11-17 19:31:28 EST
(In reply to comment #11)
> (In reply to comment #10)
> I still have to improve in the Javadocs and comments. Jay, please review.

All stake holders, you are welcome for an early test drive even as this patch
is being reviewed and tested from our side. Let us know of any issues asap, TIA.
Comment 14 Satyam Kandula CLA 2011-11-21 06:04:11 EST
While discussing with Srikanth, he has pointed out that I haven't taken care of reporting a build error/warning/info when the index file attribute is wrong or not present or corrupt.
Should the classpath validator report a warning or should there be a different API to tell whether the index file looks good?
Comment 15 Deepak Azad CLA 2011-11-21 06:56:01 EST
Satyam, a couple of nit picks about usage of the command line application which you specified as

"
Usage: eclipse -application org.eclipse.jdt.core.JavaCodeGenerateIndex [ OPTIONS ] -output <indexFile> <jarfile>

   -output <indexFile> The index file to be generated.
   <jarfile>   Jar file for which index needs to be generated.
"

1. Since the output file is specified via -output option, I assumed that there would be a default behavior in case this is not specified, i.e. an index file would be created in the same location as the jar file. Is it possible to have this default behavior?

2. There is no success message once the application is run. If I specify verbose option I get a 'Generating Index .... " however a similar message should also be there when verbose option is not specified.

3. When an index file already exists, it is 'silently' overwritten. I suppose this is deliberate?
Comment 16 Deepak Azad CLA 2011-11-21 07:10:24 EST
(In reply to comment #7)
> I think this sounds good.  In our case, we have a WST server target which
> defines many jars.  If these jars can be pre-indexed, we'll get a big boost in
> performance.  It sounds like we should be able to:
> - generate these index files before shipping our product
> - ensure that the classpath containers point to the index files (which would be
> plugin resource URLs).  Is this true?  Can I set up index file references for a
> classpath container?

(In reply to comment #9)
> (In reply to comment #7)
> Grant, Thanks for your comments.
> You should be able to specify the index file from a classpath container.

Wouldn't a better/right solution be to have an extension point similar to the ones for Javadoc and source attachments
- http://help.eclipse.org/indigo/index.jsp?topic=%2Forg.eclipse.pde.doc.user%2Freference%2Fextension-points%2Forg_eclipse_pde_core_javadoc.html
- http://help.eclipse.org/indigo/index.jsp?topic=%2Forg.eclipse.pde.doc.user%2Freference%2Fextension-points%2Forg_eclipse_pde_core_source.html
Comment 17 Dani Megert CLA 2011-11-21 08:16:37 EST
(In reply to comment #14)
> While discussing with Srikanth, he has pointed out that I haven't taken care of
> reporting a build error/warning/info when the index file attribute is wrong or
> not present or corrupt.
> Should the classpath validator report a warning or should there be a different
> API to tell whether the index file looks good?

A wrong index results in very bad behavior of the whole IDE and hence needs to be reported to the user. I'd says this should be reported as build path problem.
Comment 18 Dani Megert CLA 2011-11-21 08:42:25 EST
> Wouldn't a better/right solution be to have an extension point similar to the
> ones for Javadoc and source attachments
> -
> http://help.eclipse.org/indigo/index.jsp?topic=%2Forg.eclipse.pde.doc.user%2Freference%2Fextension-points%2Forg_eclipse_pde_core_javadoc.html
> -
> http://help.eclipse.org/indigo/index.jsp?topic=%2Forg.eclipse.pde.doc.user%2Freference%2Fextension-points%2Forg_eclipse_pde_core_source.html

I *think* this is the old way for providing the source and no longer in use. We now normally ship the source in a bundle called <bundleName> + ".source." + <version> + ".jar" and the manifest of the source bundle has something like:

Eclipse-SourceBundle: <bundleName>;version="<version>";roots:="."

Curtis, please correct me if I'm wrong. PDE needs to provide a similar mechanism for the index.
Comment 19 Curtis Windatt CLA 2011-11-21 13:44:05 EST
(In reply to comment #18)
> I *think* this is the old way for providing the source and no longer in use. We
> now normally ship the source in a bundle called <bundleName> + ".source." +
> <version> + ".jar" and the manifest of the source bundle has something like:
> 
> Eclipse-SourceBundle: <bundleName>;version="<version>";roots:="."
> 
> Curtis, please correct me if I'm wrong. PDE needs to provide a similar
> mechanism for the index.

Yes, source is now provided by generated source bundles which have an OSGi bundle structure but use that header to specify the source roots and what they provide source for.

Bug 364290 covers adding support for this in the PDE classpath container, but the discussion on how to contribute the index belongs here.  Couple of questions:

- Does the ant task produce individual files for each jar?  Is there any way an index file will belong to more than one jar?

- Are we expecting index files to typically be included inside the binary jars, in the source jars or independently?

PDE wants to address our side of this in M5.  As there won't be PDE Build support for index generation or creating a custom manifest header I'm not sure where the process for contributing will be specified.  If a custom header is used, when we read the bundles of the target platform we will store the information and later add it to the classpath entry similar to how we specify source or javadoc locations.
Comment 20 Satyam Kandula CLA 2011-11-21 23:05:42 EST
> 1. Since the output file is specified via -output option, I assumed that there
> would be a default behavior in case this is not specified, i.e. an index file
> would be created in the same location as the jar file. Is it possible to have
> this default behavior?
Users would have to do something with the generated index file and hence I thought it will be better to force them to specify it. Otherwise, I am fine to do this. If we do that I think it can be location of the jar file itself and may be the jarname appended with index could do. 
> 
> 2. There is no success message once the application is run. If I specify
> verbose option I get a 'Generating Index .... " however a similar message
> should also be there when verbose option is not specified.
I think this should behave something like javac and hence it better not generate. At the same time, I think verbose should be improved. 
> 
> 3. When an index file already exists, it is 'silently' overwritten. I suppose
> this is deliberate?
Yes and I think this is OK, again taking inspiration from javac.
Comment 21 Satyam Kandula CLA 2011-11-21 23:07:38 EST
(In reply to comment #17)
> (In reply to comment #14)
> > While discussing with Srikanth, he has pointed out that I haven't taken care of
> > reporting a build error/warning/info when the index file attribute is wrong or
> > not present or corrupt.
> > Should the classpath validator report a warning or should there be a different
> > API to tell whether the index file looks good?
> 
> A wrong index results in very bad behavior of the whole IDE and hence needs to
> be reported to the user. I'd says this should be reported as build path
> problem.
Sorry, I didn't mention that in this case the indexer will generate the index file and use it and hence there will be loss of performance but no change in search results.
Comment 22 Satyam Kandula CLA 2011-11-21 23:42:35 EST
> - Does the ant task produce individual files for each jar?  Is there any way an
> index file will belong to more than one jar?

Yes, this provides individual files for each jar. No, there is 1-1 mapping between a jar and the index file. 
> 
> - Are we expecting index files to typically be included inside the binary jars,
> in the source jars or independently?
I think this could go in the source jar or be independent but not in the binary jar. 

I haven't done yet but I am contemplating putting some signature in the index file which could tell if the index file is generated from a different jar file. If this is done, the index files cannot be part of the binary jar.  

> 
> PDE wants to address our side of this in M5.  As there won't be PDE Build
> support for index generation or creating a custom manifest header I'm not sure
> where the process for contributing will be specified.  If a custom header is
> used, when we read the bundles of the target platform we will store the
> information and later add it to the classpath entry similar to how we specify
> source or javadoc locations.
Comment 23 Srikanth Sankaran CLA 2011-11-23 04:12:53 EST
I am part way through the review and even as I go through the
rest of the files, here are some review comments for you to
chew on:

I. The changes in

org.eclipse.jdt.core.tests.model/src/org/eclipse/jdt/core/tests/model/JavaSearchScopeTests.java
org.eclipse.jdt.core.tests.model/src/org/eclipse/jdt/core/tests/model/RunJavaSearchTests.java
org.eclipse.jdt.core/META-INF/MANIFEST.MF
org.eclipse.jdt.core/antadapter/org/eclipse/jdt/internal/antadapter/messages.properties
org.eclipse.jdt.core/build.properties
org.eclipse.jdt.core/plugin.xml
org.eclipse.jdt.core/model/org/eclipse/jdt/core/IPackageFragmentRoot.java
org.eclipse.jdt.core/model/org/eclipse/jdt/core/IClasspathAttribute.java

look ok.

II. Here are some comments:

org.eclipse.jdt.core.tests.compiler/src/org/eclipse/jdt/core/tests/util/Util.java:

(1)	The current call stacks are using external paths only, but should we
conservatively include the call to path.replace('\\', '/') as done in other
Util.zipFiles methods ?

(2) What explains the absence of try-finally in this method compared to
org.eclipse.jdt.core.tests.util.Util.zip(File, String) ? 

org.eclipse.jdt.core/antadapter/org/eclipse/jdt/core/BuildJarIndex.java

(3) javadoc: introduce a blank line above @since.
(4) what is the setProject(getProject()); doing ? Is this a cut & paste issue ?

org.eclipse.jdt.core/search/org/eclipse/jdt/core/index/JavaIndexer.java
(5) Provide Javadoc (as acknowledged in TODO)

org.eclipse.jdt.core/search/org/eclipse/jdt/internal/core/search/indexing/DefaultJavaIndexer.java

(6) FileNotFoundException ctor call missing white spaces.
(7) Is the condition in if below always true ? 

	if (org.eclipse.jdt.internal.compiler.util.Util.isClassFileName(entryDocument.getPath())) {
		new BinaryIndexer(entryDocument).indexDocument();
	}
I could not see how it can be false.
(8) Also why is one isClassFileName call fully qualified while the other is not ? 
(9) Is the field JAR_SEPARATOR really needed ?

org.eclipse.jdt.core/search/org/eclipse/jdt/core/index/JavaIndexerApplication.java
(10) Fix javadoc (as acknowledged in TODO)
(11) org.eclipse.jdt.core.index.JavaIndexerApplication.displayHelp() would
print wrong application name ?
(12) There is a mix of System.err.println and System.out.println - 
is this intentional ?
(13) Messages.CommandLineProcessing: arguments are passed in the wrong order.
Do we have tests to elicit the various messages ?
(14) CommandLineOnlyOutputError is better renamed CommandLineOnlyOneOutputError
(15) If the jar file does not exist, we will print a message but will
continue to attempt to index it ? 
(16) if (this.indexFile != null || index == argCount) {
	displayError(Messages.bind(Messages.CommandLineOnlyOutputError));
	return false;
     }
block would generate "Only one jar file need to be specified" if
-output is the last token in the command line ?

org.eclipse.jdt.core/search/org/eclipse/jdt/core/index/messages.properties
(17) "need to be specified" should be "needs to be specified" (in 2 places)

org.eclipse.jdt.core/model/org/eclipse/jdt/internal/core/PackageFragmentRoot.java
(18) org.eclipse.jdt.core.IPackageFragmentRoot.getIndexPath() is declared to
throw JME, while org.eclipse.jdt.internal.core.PackageFragmentRoot.getIndexPath()
deliberately drops the exception. Is this intentional ?

org.eclipse.jdt.core/model/org/eclipse/jdt/internal/core/ClasspathEntry.java
(19) extra attrributes could be null ? I see some of the old code guarding for
null and at other places not - so may be it is safe.
(20) getLibraryIndexLocation: Can this silently swallow malformed urls ?
(21) Please get rid of the noise change at the end of the file.

III. I am yet to review the following files:

org.eclipse.jdt.core/model/org/eclipse/jdt/core/IJavaElementDelta.java
org.eclipse.jdt.core/model/org/eclipse/jdt/internal/core/ClasspathChange.java
org.eclipse.jdt.core/model/org/eclipse/jdt/internal/core/DeltaProcessor.java
org.eclipse.jdt.core/model/org/eclipse/jdt/internal/core/JavaElementDelta.java
org.eclipse.jdt.core/search/org/eclipse/jdt/core/search/SearchParticipant.java
org.eclipse.jdt.core/search/org/eclipse/jdt/internal/core/index/DiskIndex.java
org.eclipse.jdt.core/search/org/eclipse/jdt/internal/core/index/FlatFileIndex.java
org.eclipse.jdt.core/search/org/eclipse/jdt/internal/core/index/Index.java
org.eclipse.jdt.core/search/org/eclipse/jdt/internal/core/index/IndexLocation.java
org.eclipse.jdt.core/search/org/eclipse/jdt/internal/core/index/JarEntryIndex.java
org.eclipse.jdt.core/search/org/eclipse/jdt/internal/core/search/IndexSelector.java
org.eclipse.jdt.core/search/org/eclipse/jdt/internal/core/search/JavaSearchParticipant.java
org.eclipse.jdt.core/search/org/eclipse/jdt/internal/core/search/PatternSearchJob.java
org.eclipse.jdt.core/search/org/eclipse/jdt/internal/core/search/indexing/AddJarFileToIndex.java
org.eclipse.jdt.core/search/org/eclipse/jdt/internal/core/search/indexing/IndexAllProject.java
org.eclipse.jdt.core/search/org/eclipse/jdt/internal/core/search/indexing/IndexBinaryFolder.java
org.eclipse.jdt.core/search/org/eclipse/jdt/internal/core/search/indexing/IndexManager.java
org.eclipse.jdt.core.tests.model/src/org/eclipse/jdt/core/tests/model/JavaIndexTests.java

After I finish all, I'll share overall comments.
Comment 24 Srikanth Sankaran CLA 2011-11-24 04:57:29 EST
Hi Satyam,

Here is another of comments:

These files:
org.eclipse.jdt.core/search/org/eclipse/jdt/internal/core/search/indexing/IndexBinaryFolder.java
org.eclipse.jdt.core/search/org/eclipse/jdt/internal/core/search/indexing/IndexAllProject.java
org.eclipse.jdt.core/search/org/eclipse/jdt/internal/core/index/IndexLocation.java
org.eclipse.jdt.core/search/org/eclipse/jdt/internal/core/index/JarEntryIndex.java
org.eclipse.jdt.core/search/org/eclipse/jdt/internal/core/index/FlatFileIndex.java
org.eclipse.jdt.core/search/org/eclipse/jdt/core/search/SearchParticipant.java
org.eclipse.jdt.core/search/org/eclipse/jdt/internal/core/search/JavaSearchParticipant.java
org.eclipse.jdt.core/search/org/eclipse/jdt/internal/core/search/IndexSelector.java

look ok.


Comments:
---------
org.eclipse.jdt.core/search/org/eclipse/jdt/internal/core/search/indexing/AddJarFileToIndex.java
(22) org.eclipse.jdt.internal.core.search.indexing.AddJarFileToIndex.execute(IProgressMonitor)
when we fall back and regenerate the index, we do it silently, should be report/log here ?

org.eclipse.jdt.core/model/org/eclipse/jdt/core/IJavaElementDelta.java
org.eclipse.jdt.core/model/org/eclipse/jdt/internal/core/JavaElementDelta.java
(23) Do we need these new deltas at all ? Who would be the listeners to
these "index attached" and "index detached" deltas and what do we think
they will do with these ? 


I still need to review the following:

org.eclipse.jdt.core/model/org/eclipse/jdt/internal/core/ClasspathChange.java
org.eclipse.jdt.core/model/org/eclipse/jdt/internal/core/DeltaProcessor.java

org.eclipse.jdt.core/search/org/eclipse/jdt/internal/core/index/DiskIndex.java
org.eclipse.jdt.core/search/org/eclipse/jdt/internal/core/index/Index.java

org.eclipse.jdt.core/search/org/eclipse/jdt/internal/core/search/PatternSearchJob.java
org.eclipse.jdt.core/search/org/eclipse/jdt/internal/core/search/indexing/IndexManager.java
org.eclipse.jdt.core.tests.model/src/org/eclipse/jdt/core/tests/model/JavaIndexTests.java

(a) I am still trying to wrap my hands around this functionality and don't
have a full picture yet. After I finish with the remaining 7 files, I may
be in a better position (the caveat still being there that I am not familiar
with this area of JDT/Core)

(b) I think without delay, we should work towards validating the performance
objectives - Can you work with the requesting project so that you can gather
some numbers on their application so we know where we stand.
Comment 25 Satyam Kandula CLA 2011-11-24 05:21:57 EST
(In reply to comment #24)
> (b) I think without delay, we should work towards validating the performance
> objectives - Can you work with the requesting project so that you can gather
> some numbers on their application so we know where we stand.
Grant, is it possible for you to use this patch and see it works and the performance is good?
Comment 26 Srikanth Sankaran CLA 2011-11-24 06:17:03 EST
(In reply to comment #25)
> (In reply to comment #24)
> > (b) I think without delay, we should work towards validating the performance
> > objectives - Can you work with the requesting project so that you can gather
> > some numbers on their application so we know where we stand.
> Grant, is it possible for you to use this patch and see it works and the
> performance is good?

Please ask for any help you may need, Grant, Thanks for your help with this
exercise.

Independently, Satyam and I will put together some benchmarking scenarios
to see what the performance numbers look like. In particular, the current
implementation supports multiple packaging/shipping/distribution models.
While that may offer convenience to consumers of this capability, it is
not clear that the performance characteristics would be appealing across
the solutions: For example, we can support a zip file of index files, but
does it make sense from a performance point of view which after all is the
raison_d'être for this exercise, or are we better off generating the indexes
in the usual way. So, before we go any further, we need some data collected.
Stay tuned.
Comment 27 Srikanth Sankaran CLA 2011-11-25 07:23:57 EST
Here is the next batch of line level review comments:

org.eclipse.jdt.core/search/org/eclipse/jdt/internal/core/index/Index.java
Looks ok.

org.eclipse.jdt.core/model/org/eclipse/jdt/internal/core/DeltaProcessor.java

(24) Third diff in this file reads:

URL indexLocation = ((ClasspathEntry)entries[j]).getLibraryIndexLocation();
if (indexLocation != null) { // force reindexing, this could be faster rather than maintaining the list
	this.manager.indexManager.indexLibrary(entryPath, project.getProject(), ((ClasspathEntry)entries[j]).getLibraryIndexLocation());
}

Eliminate the latter call to getLibraryIndexLocation by reusing the result of the earlier call.

(25) 5th diff - Remove extra white spaces.

org.eclipse.jdt.core/model/org/eclipse/jdt/internal/core/ClasspathChange.java

(26) Second diff marked "TODO SATYAM: Shouldn't the check be opposite?"
I agree this existing code looks suspicious. Please work with Jay to
understand this issue and raise a separate defects as needed.

(27) It is better to extract a local for the expression
(ClasspathEntry)newResolvedClasspath[i]) to avoid repeared array indexing
and dynamic casts.

org.eclipse.jdt.core/search/org/eclipse/jdt/internal/core/search/PatternSearchJob.java
(28) Fix white space: length =paths.length; should be length = paths.length;

org.eclipse.jdt.core/search/org/eclipse/jdt/internal/core/index/DiskIndex.java
(29) org.eclipse.jdt.internal.core.index.DiskIndex.readHeaderInfo(InputStream)
Line 802 changed from 
	if (size > fileLength) {
to Line 808 after the patch which reads:
	if (length != -1 && this.numberOfChunks > length) {

This certainly looks like a bug. (should have been (size > length))

org.eclipse.jdt.core/search/org/eclipse/jdt/internal/core/search/indexing/IndexManager.java
(30) Line 258 reads: 

    if (currentIndexState != REBUILDING_STATE || currentIndexState != REUSE_STATE) { // rebuild index if existing file is corrupt, unless the index is already being rebuilt

This looks wrong, as one or the other constituent conditions will always be true
Should this be:

    if (currentIndexState != REBUILDING_STATE && currentIndexState != REUSE_STATE) { // rebuild index if existing file is corrupt, unless the index is already being rebuilt

(31) In org.eclipse.jdt.internal.core.search.indexing.IndexManager.removeIndexPath(IPath)
a call to indexFile.exists() seems to have been deleted. Is this inadvertant or intentional ?

(32) The last change in org.eclipse.jdt.internal.core.search.indexing.IndexManager.removeIndexPath(IPath)
looks structurally different from the version prior to patch - is this intentional ?

(33) org.eclipse.jdt.internal.core.search.indexing.IndexManager.scheduleDocumentIndexing(SearchDocument, IPath, IndexLocation, SearchParticipant)
Fold the two lines:

Index index;
index = getIndex(this.containerPath, indexLocation, true, /*reuse index file*/ true /*create if none*/);

into one line as before.
Comment 28 Srikanth Sankaran CLA 2011-11-25 07:26:13 EST
I have only one file JavaIndexTests.java which has all the new tests
for this feature left. After I finish with that early next week, I'll
summarize my overall views/observations.
Comment 29 Deepak Azad CLA 2011-11-25 07:35:18 EST
(In reply to comment #27)
> (25) 5th diff - Remove extra white spaces.
> 
> org.eclipse.jdt.core/model/org/eclipse/jdt/internal/core/ClasspathChange.java
> (33)
> org.eclipse.jdt.internal.core.search.indexing.IndexManager.scheduleDocumentIndexing(SearchDocument,
> IPath, IndexLocation, SearchParticipant)
> Fold the two lines:
> 
> Index index;
> index = getIndex(this.containerPath, indexLocation, true, /*reuse index file*/
> true /*create if none*/);
> 
> into one line as before.

Don't you guys use common formatter settings and have Save Actions enabled to auto format edited lines? Just saying...
Comment 30 Satyam Kandula CLA 2011-11-25 11:41:58 EST
Srikanth, 
Thanks for your detailed review. I agree with many of the comments. 
(In reply to comment #23)
> (9) Is the field JAR_SEPARATOR really needed ?
It is not required in this path, but I think it is better to keep.


> (12) There is a mix of System.err.println and System.out.println - 
> is this intentional ?
 What should be the right thing to do? Should error's go in System.err and messages go in System.out? Should everything go in System.out?


> (13) Messages.CommandLineProcessing: arguments are passed in the wrong order.
Actually this is correct except the order is confusing. I have fixed this though. 


> (18) org.eclipse.jdt.core.IPackageFragmentRoot.getIndexPath() is declared to
> throw JME, while
> org.eclipse.jdt.internal.core.PackageFragmentRoot.getIndexPath()
> deliberately drops the exception. Is this intentional ?
Not intentional, changed at one place but forgot the other place. 

> (20) getLibraryIndexLocation: Can this silently swallow malformed urls ?
This is internal function and it better swallow. Will try to add some verbose. 

> (23) Do we need these new deltas at all ? Who would be the listeners to
> these "index attached" and "index detached" deltas and what do we think
> they will do with these ? 
We probably don't need the deltas. Removing them as of now. Will add when we feel necessary. 

> 
> (31) In
> org.eclipse.jdt.internal.core.search.indexing.IndexManager.removeIndexPath(IPath)
> a call to indexFile.exists() seems to have been deleted. Is this inadvertant or
> intentional ?
Yes it is intentional, because exists() is costly for the one in jar. and probably not necessary too. 

> 
> (32) The last change in
> org.eclipse.jdt.internal.core.search.indexing.IndexManager.removeIndexPath(IPath)
> looks structurally different from the version prior to patch - is this
> intentional ?
Yes, the earlier code is not correct and hence.
Comment 31 Satyam Kandula CLA 2011-11-25 11:45:18 EST
(In reply to comment #26)
> Independently, Satyam and I will put together some benchmarking scenarios
> to see what the performance numbers look like. In particular, the current
> implementation supports multiple packaging/shipping/distribution models.
> While that may offer convenience to consumers of this capability, it is
> not clear that the performance characteristics would be appealing across
> the solutions: For example, we can support a zip file of index files, but
> does it make sense from a performance point of view which after all is the
> raison_d'être for this exercise, or are we better off generating the indexes
> in the usual way. So, before we go any further, we need some data collected.
> Stay tuned.
I have tried some tests which seem to tell that there is no performance impact. I still need to run some more tests to confirm that.
Comment 32 Srikanth Sankaran CLA 2011-11-25 18:36:20 EST
(In reply to comment #30)

> > (12) There is a mix of System.err.println and System.out.println - 
> > is this intentional ?
>  What should be the right thing to do? Should error's go in System.err and
> messages go in System.out? Should everything go in System.out?

Any reasoned approach (may be the current one already is) as opposed to 
inadvertent mixed usage is fine.
Comment 33 Srikanth Sankaran CLA 2011-11-25 18:42:53 EST
(In reply to comment #31)

[...]

> I have tried some tests which seem to tell that there is no performance impact.
> I still need to run some more tests to confirm that.

While initial experiments show that there is no search performance
degradation while using a jar of index files, just to convince
ourselves we are not violating some laws of thermodynamics or laws
against time travel and such, we will work with the requesting
project to get some voluminous jars and devise some searches that
would use index data from the beginning, middle and end of the zipped
stream to gauge the best case, average case and worst case times and
compare the times against the search done against a plain vanilla index
file.
Comment 34 Srikanth Sankaran CLA 2011-11-25 18:53:29 EST
Deepak, I presume some high level operations in the UI such as refactoring,
result in multiple searches being performed for a single high level operation.

Can you suggest some suitable operation that would result in a stream of
searches being done against a binary package fragment root, so that it
can be used as a benchmarking scenario ?
Comment 35 Srikanth Sankaran CLA 2011-11-25 19:22:40 EST
(In reply to comment #5)

> So the "index_location" should either support specifying the location relative
> (eg relative to the plugin root, or relative to the JAR being indexed; Javadocs
> would need to specify what it is relative to) or some sort of variable
> substitution mechanism is needed.

Satyam, was this part addressed in the patch ? The javadoc isn't for sure and
I don't recall whether the implementation supports relative location specification.
Comment 36 Srikanth Sankaran CLA 2011-11-25 19:43:34 EST
(In reply to comment #31)

> I have tried some tests which seem to tell that there is no performance impact.
> I still need to run some more tests to confirm that.

I was under the assumption that jar file automatically implies compressed data,
turns out this is not true. So in these experiments, was there compression involved ? In our experiments, let us make sure we include both the compressed
and uncompressed archives.
Comment 37 Srikanth Sankaran CLA 2011-11-27 23:14:41 EST
The current plan is to complete the review, testing and performance
characterizations over the next two weeks and release this for the
first integration build past M4. 

Changing target milestone accordingly.
Comment 38 Srikanth Sankaran CLA 2011-11-27 23:59:14 EST
Comments on JavaIndexTests:

(34) Comment above testDeleteIndexedFile() is not clear about what it
is supposed to test.
(35) testNonExistentIndex() : In these kind of scenarios, what should
be our strategy ? should we issue a warning/info message that the index
file referenced by the class path entry is missing ? Should we log this ?
Currently we are silently recovering - is this what we want ? 
(36) Fix comment above testDeleteProject()
(37) In this file and in all files in general, complete TODO items.

That completes the line level review.
Comment 39 Srikanth Sankaran CLA 2011-11-28 01:44:47 EST
Overall, I didn't find any major issues as such, here are some observations:

(1) We should make very careful performance characterizations for the case
where the index file appears as a compressed entity in a jar file to make sure
that search performance is not degraded. We should devise experiments against
large, real life jars (Grant is helping here - Thanks) that "touch" various
parts of the jar to make assessments of average and worst case measurements.

(2) The experiments should also involve high level operations that translate
into a stream of consults against the compressed index file and preferably
touch different parts of the index file.

(3) It appears the model where the index files appear as entries in a 
folder would be the best case win-win situation from a indexing and searching
point of view. But supporting only this would mean we don't enable some
natural use cases such as index files being packaged inside the source jar
or the binary jar itself. I would like to hear what the stake holders think
about supporting such a folder only model. (Of course this becomes a moot
point if experiments show there is no major degradation)

(4) The code change is fairly complex with the meat of the implementation
interspersed with "supporting changes" which resulted from some types being
refactored (IPath -> IndexLocation, FileInputStream -> InputStream). These
supporting changes are so pervasive that the signal to noise ratio is adversely
impacted and it is difficult to form a clear opinion about the correctness
and completeness about the implementation.

(5) It appears much of the implementation complexity arises from supporting
index files inside jars. If we do go ahead with the full scope of what has
been prototyped, I would recommend splitting the patch into core + supporting
changes + tests so that just the core set of changes can be reviewed quickly
in another pass.

(6) Should the class path validation step validate the existence of the
index file and issue info/warning if something is amiss.

(7) What about the cases where the index file generation/version information
is different from the one supported by JDT/Core ?
Comment 40 Satyam Kandula CLA 2011-11-29 06:20:43 EST
(In reply to comment #35)
> (In reply to comment #5)
> 
> > So the "index_location" should either support specifying the location relative
> > (eg relative to the plugin root, or relative to the JAR being indexed; Javadocs
> > would need to specify what it is relative to) or some sort of variable
> > substitution mechanism is needed.
> 
> Satyam, was this part addressed in the patch ? The javadoc isn't for sure and
> I don't recall whether the implementation supports relative location
> specification.
As it supports the platform url specification, relative to plugin root and  relative to project will be supported, but not relative to an external jar.
Comment 41 Srikanth Sankaran CLA 2011-11-29 21:45:23 EST
(In reply to comment #39)

> (3) It appears the model where the index files appear as entries in a 
> folder would be the best case win-win situation from a indexing and searching
> point of view. But supporting only this would mean we don't enable some
> natural use cases such as index files being packaged inside the source jar
> or the binary jar itself. I would like to hear what the stake holders think
> about supporting such a folder only model. (Of course this becomes a moot
> point if experiments show there is no major degradation)

[...]

> (5) It appears much of the implementation complexity arises from supporting
> index files inside jars.

[...]

In informal chat with the requesting project (represented by Grant), it emerged
that for their project, they would only be shipping index files as folder
entries i.e not inside jars.

Grant, for posterity's sake, please confirm/record here your project's preferences
and the rationale.

Dani, Markus, Curtis (even as we wait for performance data), any strong
feelings , for/against jars with compressed index files as constituents ?

We need to finalize on this by this week preferably.
Comment 42 Satyam Kandula CLA 2011-11-30 01:06:28 EST
(In reply to comment #33)
> While initial experiments show that there is no search performance
> degradation while using a jar of index files, just to convince
> ourselves we are not violating some laws of thermodynamics or laws
> against time travel and such, we will work with the requesting
> project to get some voluminous jars and devise some searches that
> would use index data from the beginning, middle and end of the zipped
> stream to gauge the best case, average case and worst case times and
> compare the times against the search done against a plain vanilla index
> file.
I have run the tests with some big jars and could not see any degradation or improvement by using the jar for index files. While trying to understand the rationale behind this, I found that the compressed jars will need lesser IO and hence reading could be faster. Well, I could not see a performance benefit either :(. However, this explains that there should not be a loss in performance.
Comment 43 Satyam Kandula CLA 2011-11-30 01:08:00 EST
As we think we can keep the performance issue at bay, I could see one more point to consider the decision. 
 Shipping Indexes in jar could be easier and could take lesser space. For eg: An index file for a big jar takes around 12MB, when compressed in a jar takes only 5MB. This probably is not an issue as generally the whole product will be zipped.
Comment 44 Srikanth Sankaran CLA 2011-11-30 01:40:09 EST
(In reply to comment #42)
> (In reply to comment #33)
> > While initial experiments show that there is no search performance
> > degradation while using a jar of index files, just to convince
> > ourselves we are not violating some laws of thermodynamics or laws
> > against time travel and such, we will work with the requesting
> > project to get some voluminous jars and devise some searches that
> > would use index data from the beginning, middle and end of the zipped
> > stream to gauge the best case, average case and worst case times and
> > compare the times against the search done against a plain vanilla index
> > file.
> I have run the tests with some big jars and could not see any degradation or
> improvement by using the jar for index files. While trying to understand the
> rationale behind this, I found that the compressed jars will need lesser IO and
> hence reading could be faster. Well, I could not see a performance benefit
> either :(. However, this explains that there should not be a loss in
> performance.

There seems to be quite a bit of anecdotal evidence if you go by various
forums, community discussion boards etc to suggest that compressed jars need
not imply performance degradation and based on the CPU, I/O and other
configuration parameters may actually translate into performance improvement.
See the thread http://dev.eclipse.org/mhonarc/lists/platform-core-dev/msg00691.html for some historic discussion on this topic in eclipse'
very own mailing lists. Will dig up the references cited there and see what
comes up.
Comment 45 Dani Megert CLA 2011-11-30 04:39:46 EST
> Dani, Markus, Curtis (even as we wait for performance data), any strong
> feelings , for/against jars with compressed index files as constituents ?

I think we should support to ship it as archive. If it's easier for us to process a folder, we can still extract it to a folder on first access.
Comment 46 Srikanth Sankaran CLA 2011-11-30 06:21:17 EST
(In reply to comment #44)
> (In reply to comment #42)
> > (In reply to comment #33)

[...]

> > hence reading could be faster. Well, I could not see a performance benefit
> > either :(. However, this explains that there should not be a loss in
> > performance.

[...]

> very own mailing lists. Will dig up the references cited there and see what
> comes up.

I dug up my copy of "Java Performance Tuning" by Jack Shirazi- there isn't a
whole lot more than what we have discussed here: basically improvements result
from the need for several open/close operations being obviated.

Also the point Satyam made earlier about volume of I/O being reduced due to
compression also helps some cases based on configuration.

In summary, our experiments so far show that compressed indexes inside jars
can be viably supported.
Comment 47 Grant Taylor CLA 2011-11-30 09:15:39 EST
>>Grant, for posterity's sake, please confirm/record here your project's preferences and the rationale.

Our product primarily wants to use this feature for our server target classpath container.  The jars for this container come from the a different team than the Eclipse tooling.  We have two cases as well: the server target classpath container may point to a real local server, or "stubs" that represent the APIs of a server.  In both cases, we don't want to change the actual jars.  For the stubs, the reason is mainly organizational.  A different team creates these and they are not responsible for things like tooling performance.  For a stand-alone server, packing indexes in the Jars would cause every server install to be larger, even if it's not being pointed to by our tooling.

Hence, our usage of this feature will be to pre-create the indexes for the known targets (stubs, server v1.0, server, v1.1, etc) and add register the index files with the classpath container.
Comment 48 Satyam Kandula CLA 2011-11-30 09:38:17 EST
(In reply to comment #47)
> >>Grant, for posterity's sake, please confirm/record here your project's preferences and the rationale.
> 
> Our product primarily wants to use this feature for our server target classpath
> container.  The jars for this container come from the a different team than the
> Eclipse tooling.  We have two cases as well: the server target classpath
> container may point to a real local server, or "stubs" that represent the APIs
> of a server.  In both cases, we don't want to change the actual jars.  For the
> stubs, the reason is mainly organizational.  A different team creates these and
> they are not responsible for things like tooling performance.  For a
> stand-alone server, packing indexes in the Jars would cause every server
> install to be larger, even if it's not being pointed to by our tooling.
> 
> Hence, our usage of this feature will be to pre-create the indexes for the
> known targets (stubs, server v1.0, server, v1.1, etc) and add register the
> index files with the classpath container.
Grant, Thanks for your reply. I think there is a slight misunderstanding. To make sure we are on in the same page, we don't mean the index should go in to the actual jar, but we do mean that the indexes could be packaged into any jar, may be a special jar which could just include a group of indexes.
Comment 49 Grant Taylor CLA 2011-11-30 09:50:26 EST
Ok, I understand.  So my comment, obviously, was that we don't want the indexes packaged with the jars that contain the indexes classes.

As for flat files vs jars for separately packaged indexes, I would say that ideally we'd package the flat files in a jar'd plugin.  I'm not sure if this would be supported.  If not, I don't think it's a big deal for us to ship a new, unjar'd plugin that only contains the index files (jar'd or flat).
Comment 50 Satyam Kandula CLA 2011-12-02 10:11:08 EST
Created attachment 207847 [details]
Patch with core changes (without index in jar support)

Srikanth, Here is the first part of the patch. This patch is without the support for the index in a jar.
  - Incorporated much of the review components
  - Removed the getIndexPath() from IPackageFragmentRoot. Basically there is no API as of now. I will incorporate it anybody wants it.
  - This patch doesn't validate the class path as such. Filed bug 365446 to take care of this part.
Comment 51 Srikanth Sankaran CLA 2011-12-04 23:40:27 EST
(In reply to comment #13)
> (In reply to comment #11)
> > (In reply to comment #10)
> > I still have to improve in the Javadocs and comments. Jay, please review.
> 
> All stake holders, you are welcome for an early test drive even as this patch
> is being reviewed and tested from our side. Let us know of any issues asap,
> TIA.

We have confirmation from the requesting team that they have tested the patch
in both the indexes-in-a-folder-mode and compressed-indexes-in-a-jar mode
and that performance improvements are inline with expectations.

So this feature looks ready for release for next week's integration build
after M4 is out of the way.
Comment 52 Satyam Kandula CLA 2011-12-07 23:47:08 EST
Created attachment 208084 [details]
Patch for supporting the index file in a jar

Here is the second part of the jar which has support for the index file to be in the jar.
Comment 53 Srikanth Sankaran CLA 2011-12-08 08:26:35 EST
(In reply to comment #50)
> Created attachment 207847 [details]
> Patch with core changes (without index in jar support)
> 
> Srikanth, Here is the first part of the patch. This patch is without the
> support for the index in a jar.
>   - Incorporated much of the review components
>   - Removed the getIndexPath() from IPackageFragmentRoot. Basically there is no
> API as of now. I will incorporate it anybody wants it.
>   - This patch doesn't validate the class path as such. Filed bug 365446 to
> take care of this part.

Satyam, thanks a lot for breaking down the patches - it made the job of
reviewing these core set of changes a lot easier.

The changes look good.

One comment: Could you restore your first cut of changes to
org.eclipse.jdt.internal.core.ClasspathChange.requestIndexing() from
the https://bugs.eclipse.org/bugs/attachment.cgi?id=207155 rather
than go with the revised version you have in this patch.

Your first cut changes to this method are much more readable
and provable to a reviewer's satisfaction than this latest
version.

(Remember to get rid of the TODO of course.)
Comment 54 Srikanth Sankaran CLA 2011-12-08 22:51:12 EST
(In reply to comment #51)
> (In reply to comment #13)
> > (In reply to comment #11)
> > > (In reply to comment #10)
> > > I still have to improve in the Javadocs and comments. Jay, please review.
> > 
> > All stake holders, you are welcome for an early test drive even as this patch
> > is being reviewed and tested from our side. Let us know of any issues asap,
> > TIA.
> 
> We have confirmation from the requesting team that they have tested the patch
> in both the indexes-in-a-folder-mode and compressed-indexes-in-a-jar mode
> and that performance improvements are inline with expectations.
> 
> So this feature looks ready for release for next week's integration build
> after M4 is out of the way.

Satyam, we should ask the requesting project to not tear down any setup they 
have invested in to validate this, as we will ask for a revalidation after it
released and an integration build becomes available.
Comment 55 Srikanth Sankaran CLA 2011-12-08 22:57:22 EST
(In reply to comment #53)

> The changes look good.

A further editorial comment that we should not piggy back fixes for unconnected
issues discovered during implementation.

The current patch has 3-4 such fixes. It is terrific that these issues are being
discovered, but let us keep the separate concerns separate in future to simplify
the review process as well future exercise in software archaeology.
Comment 56 Srikanth Sankaran CLA 2011-12-09 08:17:15 EST
(In reply to comment #52)
> Created attachment 208084 [details]
> Patch for supporting the index file in a jar
> 
> Here is the second part of the jar which has support for the index file to be
> in the jar.

Thanks, this patch looks good. After incorporating the comment about 
requestIndexing (see comment#53) and retesting, let us target to release
it early next week.
Comment 57 Satyam Kandula CLA 2011-12-12 07:29:21 EST
Released on HEAD through commit 47da1e1f50b1360a7b560d74dfe1c60303fec552
Comment 58 Dani Megert CLA 2012-01-12 04:04:38 EST
Do we have some example JARs and indices somewhere? If not, please create some and attach them here so that the UI can be tested.
Comment 59 Dani Megert CLA 2012-01-12 04:05:30 EST
(In reply to comment #58)
> Do we have some example JARs and indices somewhere? If not, please create some
> and attach them here so that the UI can be tested.

Sorry, this was meant for bug 364287.
Comment 60 Ayushman Jain CLA 2012-01-25 08:37:44 EST
Verified for 3.8M5 using build I20120122-2000
Comment 61 Srikanth Sankaran CLA 2012-01-31 05:39:17 EST
Grant et al, this feature has been released in 3.8 M5 available
at http://download.eclipse.org/eclipse/downloads/drops/S-3.8M5-201201251800/index.php.

Thanks for testing this from an end user standpoint. Please raise
fresh defects for any issues found. Thanks.