Bug 302850 - 13 failures in JavaModel tests for the N20100214-2000 Mac OS X - Cocoa test machine
Summary: 13 failures in JavaModel tests for the N20100214-2000 Mac OS X - Cocoa test m...
Status: VERIFIED FIXED
Alias: None
Product: JDT
Classification: Eclipse Project
Component: Core (show other bugs)
Version: 3.6   Edit
Hardware: Macintosh Mac OS X
: P3 normal (vote)
Target Milestone: 3.7.2   Edit
Assignee: Satyam Kandula CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-02-15 07:43 EST by Frederic Fusier CLA
Modified: 2012-02-18 08:05 EST (History)
7 users (show)

See Also:
stephan.herrmann: review+


Attachments
Test patch (2.51 KB, text/plain)
2010-04-07 06:42 EDT, Satyam Kandula CLA
no flags Details
A Patch (941 bytes, patch)
2010-10-26 10:51 EDT, Satyam Kandula CLA
no flags Details | Diff
more test harness (2.70 KB, patch)
2011-03-31 15:16 EDT, Stephan Herrmann CLA
no flags Details | Diff
slightly improved test harness (3.61 KB, patch)
2011-04-03 15:22 EDT, Stephan Herrmann CLA
no flags Details | Diff
console log (45.21 KB, text/plain)
2011-12-15 08:36 EST, Stephan Herrmann CLA
no flags Details
Patch (15.93 KB, patch)
2011-12-20 09:10 EST, Satyam Kandula CLA
no flags Details | Diff
Patch for 3.7 maintenance (17.77 KB, patch)
2012-01-05 05:48 EST, Satyam Kandula CLA
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Frederic Fusier CLA 2010-02-15 07:43:19 EST
There are 13 failures in JavaModel tests for the N20100214-2000 Mac OS X - Cocoa test machine.

All these failures are due to a change in the JavaCore options but the first test on which it occurs does not change them...!?
Comment 1 Stephan Herrmann CLA 2010-02-15 08:12:53 EST
I've seen this sort of error intermittently running the tests on Linux.
While it's good for me to see that it occurs in official builds, too,
I unfortunately have no clues to contribute about what makes this
failure happen. Only: any time the problem occurred it vanished on
a subsequent test run without any source modifications.

I remember seeing changes wrt compliance levels, which is different
from the failures you are seeing, but still the overall symptom is
the same.

Weird, indeed.
Comment 2 Frederic Fusier CLA 2010-02-15 11:57:05 EST
(In reply to comment #1)
> I've seen this sort of error intermittently running the tests on Linux.
> While it's good for me to see that it occurs in official builds, too,
> I unfortunately have no clues to contribute about what makes this
> failure happen. Only: any time the problem occurred it vanished on
> a subsequent test run without any source modifications.
> 
> I remember seeing changes wrt compliance levels, which is different
> from the failures you are seeing, but still the overall symptom is
> the same.
> 
> Weird, indeed.

Thanks for the feedback Stephen. Did you notice whether the first of the failures you got occurred on the same offending test (i.e. testBug286379c() of JavaSearchBugTests) or not?
Comment 3 Stephan Herrmann CLA 2010-02-15 13:24:32 EST
(In reply to comment #2)
> Thanks for the feedback Stephen. Did you notice whether the first of the
> failures you got occurred on the same offending test (i.e. testBug286379c() of
> JavaSearchBugTests) or not?

Sorry, I have no notes besides my small little brain. But I can't recall
this particular test to play a role in this. I've also seen this with
much bigger effect like a whole test suite failing with just this one
error. To me it felt more like some kind of force majeure. I could never
recognise a pattern behind it.
Sometimes it occurred several times in a row, but if anything that I did
fixed the problem it was like in a completely remote corner of the code.
Some build weirdness or that kind. It's probably Fantômas :)

I never reported this because I had no idea how to ever reproduce.

Do you have the logs of that failed test run? Build log? Workbench log?
Comment 4 Olivier Thomann CLA 2010-03-03 10:08:01 EST
Occurred again in I20100302-0800.
Comment 5 Frederic Fusier CLA 2010-03-03 10:32:39 EST
(In reply to comment #4)
> Occurred again in I20100302-0800.

The failures occurred at the same test (with which we already got troubles to stabilize...). I'm investigating...
Comment 6 Satyam Kandula CLA 2010-04-07 06:42:53 EDT
Created attachment 164028 [details]
Test patch

The test is failing some times and not be able to reproduce nor able to find any clues. Hence, this patch is to print some messages to enable some progress.
Comment 7 Satyam Kandula CLA 2010-04-07 06:45:19 EDT
Jay, 
Can you apply this patch to enable making some progress on this test failure?
Comment 8 Frederic Fusier CLA 2010-06-28 09:15:06 EDT
Occurred again in N20100627-2000. The additional output does not show any changes in the JavaCore options along the test:
=========================================================================
Starting test JavaSearchBugTests.testBug286379c()...
	- Options at test start:
		+ Task tags:           TODO,FIXME,XXX
		+ Task priorities:     NORMAL,HIGH,NORMAL
		+ Forbidden reference: error
	- Options before first exit:
		+ Task tags:           TODO,FIXME,XXX
		+ Task priorities:     NORMAL,HIGH,NORMAL
		+ Forbidden reference: error
	- Options after first restart:
		+ Task tags:           TODO,FIXME,XXX
		+ Task priorities:     NORMAL,HIGH,NORMAL
		+ Forbidden reference: error
	- Options before second exit:
		+ Task tags:           TODO,FIXME,XXX
		+ Task priorities:     NORMAL,HIGH,NORMAL
		+ Forbidden reference: error
	- Options after second restart:
		+ Task tags:           TODO,FIXME,XXX
		+ Task priorities:     NORMAL,HIGH,NORMAL
		+ Forbidden reference: error
	- Options after search:
		+ Task tags:           TODO,FIXME,XXX
		+ Task priorities:     NORMAL,HIGH,NORMAL
		+ Forbidden reference: error
	- Options at test end:
		+ Task tags:           TODO,FIXME,XXX
		+ Task priorities:     NORMAL,HIGH,NORMAL
		+ Forbidden reference: error
=========================================================================

Hence, I'm still unable to figure out why the test of options during the tear down fails... :-(

I'm adding some other debug outputs to try to catch the origin of this issue...
Comment 9 Stephan Herrmann CLA 2010-07-10 08:04:17 EDT
I saw it again in our build. Unfortunately I'm still using v_A58 of the
tests so I didn't have your additional harness mentioned in comment 8.

However, inspecting the code I wonder: maybe JavaCore.getOption(String)
is right all the time but only JavaCore.getOptions() is at fault.
Note that a single option is always retrieved from the preference
service, wheras getOptions() may just use this.optionsCache.
Could that be the culprit, i.e., the optionsCache being corrupt?

E.g., JavaModelManager.setOptions(Hashtable) could see a
BackingStoreException in instancePreferences.flush(). At that point
the preferences are already changed but the optionsCache is not.
For now I'll just add e.printStackTrace() to setOptions, let's see.
Comment 10 Olivier Thomann CLA 2010-07-21 09:56:19 EDT
(In reply to comment #9)
> E.g., JavaModelManager.setOptions(Hashtable) could see a
> BackingStoreException in instancePreferences.flush(). At that point
> the preferences are already changed but the optionsCache is not.
> For now I'll just add e.printStackTrace() to setOptions, let's see.
Indeed, if there is a BackingStoreException, the cached value is not in synch anymore.
I'll use the try/catch statement only around the statements that can throw such exception.
Let's see if we get it again.
Comment 11 Stephan Herrmann CLA 2010-07-24 18:00:40 EDT
Here's *some* data at last: I changed the debug print statements from
  JavaCore.getOption(JavaCore.COMPILER_TASK_TAGS)
to
  JavaCore.getOptions().get(JavaCore.COMPILER_TASK_TAGS)
in order to test the optionsCache. Resulting output 
(w/o the additional prints from comment 8, since I'm in v_A58):

Starting test JavaSearchBugTests.testBug286379c()...
        - Options at test start:
                + Task tags:           TODO,FIXME,XXX
                + Task priorities:     NORMAL,HIGH,NORMAL
                + Forbidden reference: error
        - Options before first exit:
                + Task tags:           TODO,FIXME,XXX
                + Task priorities:     NORMAL,HIGH,NORMAL
                + Forbidden reference: error
        - Options after first restart:
                + Task tags:           TODO,FIXME,XXX
                + Task priorities:     NORMAL,HIGH,NORMAL
                + Forbidden reference: error
        - Options before second exit:
                + Task tags:           TODO,FIXME,XXX
                + Task priorities:     NORMAL,HIGH,NORMAL
                + Forbidden reference: error
        - Options after second restart:
                + Task tags:           null
                + Task priorities:     null
                + Forbidden reference: null
        - Options after search:
                + Task tags:           null
                + Task priorities:     null
                + Forbidden reference: null
        - Options at test end:
                + Task tags:           null
                + Task priorities:     null
                + Forbidden reference: null

At least we see when it happens, but it's the same simulated exit-start
sequence that a few lines earlier actually succeeded.

So what could possibly break getOptions() that was not detectable when
using getOption(String)?? After I suspected setOptions(Hashtable) to bail out
I added e.printStacktrace() statements for all relevant exceptions,
which however were not triggered in this test run.

So, I don't really have a theory at this point. Since I'm seeing the
error fairly often these days I might as well insert more debug statements.
Any suggestions what and where? At least we can now focus on this sequence:
  		simulateExit();
		simulateRestart();
Any idea what is suspecious here?
Comment 12 Satyam Kandula CLA 2010-10-26 03:58:43 EDT
This happened again with build I20101025-1300.
Comment 13 Stephan Herrmann CLA 2010-10-26 06:42:52 EDT
(In reply to comment #12)
> This happened again with build I20101025-1300.

Do we have the console output from the test run (where should I look for it?)?

Here's a new theory: it seems that JavaModelManager.optionsCache is 
corrupted somehow, but we could find neither an exception nor a concurrency
issue that would trigger corruption. All access to this field ensure:
 - it is either null or a fully initialized Hashtable
 - it is never passed directly to clients, only copies are.
One exception is in these lines:
	if (!Platform.isRunning()) {
		return this.optionsCache = getDefaultOptionsNoInitialization();
	}

Should we change this to 
	if (!Platform.isRunning()) {
		this.optionsCache = getDefaultOptionsNoInitialization();
		return new Hashtable(this.optionsCache);
	}
?

I'm not sure whether tests can run into the !Platform.isRunning() branch.
Comment 14 Satyam Kandula CLA 2010-10-26 09:37:24 EDT
(In reply to comment #13)
> Do we have the console output from the test run (where should I look for it?)?

The console output is http://fullmoon.ottawa.ibm.com/downloads/drops/I20101025-1300/testresults/macosx.cocoa.x86_5.0/org.eclipse.jdt.core.tests.model.AllJavaModelTests.txt
 
> Here's a new theory: it seems that JavaModelManager.optionsCache is 
> corrupted somehow, but we could find neither an exception nor a concurrency
> issue that would trigger corruption. All access to this field ensure:
>  - it is either null or a fully initialized Hashtable
>  - it is never passed directly to clients, only copies are.
> One exception is in these lines:
>     if (!Platform.isRunning()) {
>         return this.optionsCache = getDefaultOptionsNoInitialization();
>     }
> 
> Should we change this to 
>     if (!Platform.isRunning()) {
>         this.optionsCache = getDefaultOptionsNoInitialization();
>         return new Hashtable(this.optionsCache);
>     }
> ?
> 
> I'm not sure whether tests can run into the !Platform.isRunning() branch.
I think this should fix the problem. I couldn't think of anything either.
Comment 15 Satyam Kandula CLA 2010-10-26 09:40:12 EDT
(In reply to comment #13)
> One exception is in these lines:
>     if (!Platform.isRunning()) {
>         return this.optionsCache = getDefaultOptionsNoInitialization();
>     }
> 
> Should we change this to 
>     if (!Platform.isRunning()) {
>         this.optionsCache = getDefaultOptionsNoInitialization();
>         return new Hashtable(this.optionsCache);
>     }
Olivier, 
I believe this change could help fix the test. This change is harmless. Can we try this in M3?
Comment 16 Olivier Thomann CLA 2010-10-26 09:44:03 EDT
(In reply to comment #15)
> Olivier, 
> I believe this change could help fix the test. This change is harmless. Can we
> try this in M3?
Sure we release this for consistency, but it won't change anything as the tests are always run with the running platform.
Please proceed and release that change for M3.
We will recontribute anyway.
Comment 17 Satyam Kandula CLA 2010-10-26 10:51:05 EDT
Created attachment 181736 [details]
A Patch

Put Stephen's code snippet as specified in comment 13 in this patch.
Comment 18 Satyam Kandula CLA 2010-10-26 10:51:54 EDT
(In reply to comment #17)
> Created an attachment (id=181736) [details]
> A Patch 
> 
> Put Stephen's code snippet as specified in comment 13 in this patch.
Released this patch on HEAD for 3.7M3. 
Not resolving this bug as we are not sure this fixes.
Comment 19 Frederic Fusier CLA 2010-12-27 04:51:29 EST
Satyam,

Unfortunately, I won't have time to look at this issue, hence I assign it to you...
Comment 20 Satyam Kandula CLA 2011-03-17 01:18:39 EDT
(In reply to comment #18)
> Released this patch on HEAD for 3.7M3. 
> Not resolving this bug as we are not sure this fixes.
This patch hasn't fixed the problem. The test failed again on N20110315-2000 build.
Comment 21 Olivier Thomann CLA 2011-03-28 12:02:16 EDT
Failed again in N20110327-2000.
We need to understand what is going on with these failures. Why on Mac OS only ?
Seems to be related to a timing issue, but looking at the code I don't see anything obvious.
Comment 22 Stephan Herrmann CLA 2011-03-31 15:16:37 EDT
Created attachment 192313 [details]
more test harness

I'd like to propose some more test harness inside 
JavaModelManager.getOptions() to find out what's going on here.


What do we know?
 - after the second simulated restart we observe an inconsistency
   between getOptions() and getOption(String)
   The former has no task tags configured, whereas the second still has.
 - both above methods may use the same strategy to look up the option
   from the preferences service, the main difference I can see being
   that getOptions() first checks this.optionsCache
 - test harness in setOptions(Hashtable) never got triggered

We don't know for sure:
 - is optionsCache *really* used in the failing calls?
 - how could optionsCache become inconsistent?

Regarding concurrency:
 - the main activity I could see in the debugger is that simulateRestart()
   causes the index manager thread to be started.
 - later when indexing is actually performed creating a SourceElementParser
   triggers initialization of the options. However, I have the feeling that
   this actually happens *after* things already got inconsistent.
   (indexing seems to happen only during waitUntilIndexesReady() whereas
   I previously observed the inconsistency directly after the restart
   and before waitUntilIndexesReady() - current test harness doesn't tell
   because at this step we don't print getOptions()).

The test harness should give us fine grained tracing of getOptions() and
also tell us when exactly task tags are observed being unconfigured.

I could think of the following suspect:
 - in theory the effect could be caused by incomplete values in
   JavaModelManager.optionNames
 - optionNames are initialized from 
     JavaCorePreferenceInitializer.initializeDefaultPreferences()
 - from simulateRestart() we first start the index manager and only after
   that we invoke the JavaCorePreferenceInitializer
 - should at that point the index manager actually perform any work we would
   indeed have a race condition between reading and initializing optionNames

The test harness should also tell us if optionNames are inconsistent.

Should I commit this test harness for now?
Comment 23 Olivier Thomann CLA 2011-04-01 10:23:49 EDT
(In reply to comment #22)
> Should I commit this test harness for now?
Yes please. We need to understand what is going on there.
Thanks for your investigation.
Comment 24 Stephan Herrmann CLA 2011-04-03 15:22:21 EDT
Created attachment 192427 [details]
slightly improved test harness

This version of the test harness I committed to HEAD and BETA_JAVA7.
(update: check task tags not just for null but also for "").
Comment 25 Stephan Herrmann CLA 2011-11-15 11:50:22 EST
Right now I'm looking at something that looks very similar to this bug:
a test failing with "Workspace options should be back to their default"

However, in this case, differences are not in task tags etc. but in
JDK levels and "inline JSR".

This time I could detect an unexpected call to JMM.setOptions() as the
culprit along this call chain:

		+ org.eclipse.jdt.internal.core.JavaModelManager.setOptions(JavaModelManager.java:4846)
		+ org.eclipse.jdt.core.JavaCore.setOptions(JavaCore.java:5391)
		+ org.eclipse.jdt.launching.JavaRuntime.updateCompliance(JavaRuntime.java:2774)
		+ org.eclipse.jdt.launching.JavaRuntime.initializeVMs(JavaRuntime.java:2737)
		+ org.eclipse.jdt.launching.JavaRuntime.getVMInstallTypes(JavaRuntime.java:523)
		+ org.eclipse.jdt.internal.launching.JREPreferenceModifyListener.preApply(JREPreferenceModifyListener.java:127)
		+ org.eclipse.core.internal.preferences.PreferencesService$4.run(PreferencesService.java:448)
		+ org.eclipse.core.runtime.SafeRunner.run(SafeRunner.java:42)
		+ org.eclipse.core.internal.preferences.PreferencesService.firePreApplyEvent(PreferencesService.java:451)
		+ org.eclipse.core.internal.preferences.PreferencesService.applyPreferences(PreferencesService.java:121)
		+ org.eclipse.core.internal.resources.ProjectPreferences.read(ProjectPreferences.java:206)
		+ org.eclipse.core.internal.resources.ProjectPreferences.updatePreferences(ProjectPreferences.java:274)
		+ org.eclipse.core.internal.resources.File.updateMetadataFiles(File.java:410)
		+ org.eclipse.core.internal.resources.File.internalSetContents(File.java:322)
		+ org.eclipse.core.internal.resources.File.appendContents(File.java:53)


The last line was explicitly called from my test, file was
".settings/org.eclipse.jdt.core.prefs"

When this resource change is observed, JavaRuntime.initializeVMs() can't
seem to find a valid VM definition and then aligns the JavaCore compliance
options with the host VM.


Despite the similarity I can't see how a similar update could occur
for JavaCore.COMPILER_TASK_TAGS.
Comment 26 Stephan Herrmann CLA 2011-12-15 08:36:55 EST
Created attachment 208432 [details]
console log

Here's slightly more information from another occurrence while running
all JDT/Core tests in the IDE.

I extracted the part from the logs that starts with the last OK output
(indicated e.g. by "+ Task tags:           TODO,FIXME,XXX")

Next we see things go awry:

optionsCache was null
Options initialized from preferences
	+ Task tags:           null

Next we have some new information:
 - option names: [ ...
 - Call stack: ...

It seems that finally the test harness from comment 22 fired

(In reply to comment #22)
> The test harness should give us fine grained tracing of getOptions() and
> also tell us when exactly task tags are observed being unconfigured.
> 
> I could think of the following suspect:
>  - in theory the effect could be caused by incomplete values in
>    JavaModelManager.optionNames

Yes, we have very incomplete optionNames, only formatter options are mentioned

>  - optionNames are initialized from 
>      JavaCorePreferenceInitializer.initializeDefaultPreferences()
>  - from simulateRestart() we first start the index manager and only after
>    that we invoke the JavaCorePreferenceInitializer
>  - should at that point the index manager actually perform any work we would
>    indeed have a race condition between reading and initializing optionNames

we actually see evidence of concurrent access to optionNames further down:

java.util.ConcurrentModificationException
	at java.util.HashMap$HashIterator.nextEntry(HashMap.java:806)
	at java.util.HashMap$KeyIterator.next(HashMap.java:841)
	at java.util.AbstractCollection.toString(AbstractCollection.java:449)
	at java.lang.String.valueOf(String.java:2902)
	at java.lang.StringBuffer.append(StringBuffer.java:232)
	at org.eclipse.jdt.internal.core.JavaModelManager.checkTaskTags(JavaModelManager.java:2221)
	at org.eclipse.jdt.internal.core.JavaModelManager.getOptions(JavaModelManager.java:2209)

The exception is raised in the test harness, so it will never occur in RL,
but it proves that someone is modifying optionNames while we use it to
retrieve options (JMM.getOptions()).

We also see a new trigger in the picture:
+ org.eclipse.jdt.internal.core.JavaModelManager.getOptions(JavaModelManager.java:2180)
+ org.eclipse.jdt.core.JavaCore.getOptions(JavaCore.java:3710)
+ org.eclipse.jdt.internal.core.JavaProject.getOptions(JavaProject.java:1664)
+ org.eclipse.jdt.internal.core.search.indexing.IndexManager.getSourceElementParser(IndexManager.java:188)
+ org.eclipse.jdt.internal.core.DeltaProcessor.getSourceElementParser(DeltaProcessor.java:1413)
+ org.eclipse.jdt.internal.core.DeltaProcessor.updateIndex(DeltaProcessor.java:2741)
...
+ org.eclipse.core.internal.resources.CharsetDeltaJob.run(CharsetDeltaJob.java:177)

What's that? A charset delta fires initialization of options before
optionNames are fully initialized?

Yea, and maybe this is the accomplice in this coup:

+ org.eclipse.jdt.internal.core.JavaModelManager.getOptions(JavaModelManager.java:2158)
+ org.eclipse.jdt.core.JavaCore.getOptions(JavaCore.java:3710)
+ org.eclipse.jdt.internal.core.JavaProject.getOptions(JavaProject.java:1664)
+ org.eclipse.jdt.internal.core.search.indexing.IndexManager.getSourceElementParser(IndexManager.java:188)
+ org.eclipse.jdt.internal.core.search.indexing.IndexAllProject.execute(IndexAllProject.java:203)

This guy seems to see the full set of optionNames, but this is too late
for the other (concurrent?) guy.

At this point I'll stop analyzing for now, but I think we have some
interesting clues now.
Comment 27 Stephan Herrmann CLA 2011-12-17 16:07:26 EST
FINALLY I succeeded to reproduce!

HOWTO:

+ Set a breakpoint in
  JavaCorePreferenceInitializer.initializeDefaultPreferences()
  on the next statement after
  optionNames.add(JavaCore.CORE_ENCODING);
  Set the breakpoints "Hit count" to 3

+ Set a breakpoint inside
  org.eclipse.core.runtime.jobs.Job.schedule(long)
  with this condition:
    if (this instanceof org.eclipse.core.internal.resources.CharsetDeltaJob)
	delay += 9000;
    System.out.println("schedule: "+delay);
    return false;
  This delays every CharsetDeltaJob by additional 9 sec.
  without suspending the VM during debug

+ Run JavaSearchBugsTests in the debugger, disabling everything before
  the testBug286379 group of tests

+ Watch the JUnit view (don't intervene)
  We'll enter testBug286379c, then pause for 9.5 secs,
  Then we'll stop at the first breakpoint, wait for 10 more secs,

+ Now look at the Console: at the bottom you should see three blocks 
  each starting with:

	+ Task tags:           null
	- option names: [org.eclipse.jdt.core.encoding]
	- Call stack:
		+ org.eclipse.jdt.internal.core.JavaModelManager.checkTaskTags(JavaModelManager.java:2223)
		+ org.eclipse.jdt.internal.core.JavaModelManager.getOptions(JavaModelManager.java:2180)

+ Now hit F8 for resume and watch the dreaded 13 failures.

Explanation:
============
The following call inside the test
  javaContentType.removeFileSpec("torem", IContentType.FILE_EXTENSION_SPEC)
causes scheduling of a CharsetDeltaJob, normally with 500 ms delay, however,
with my trickery I forced this job to execute exactly at the bad time:

JavaCorePreferenceInitializer.initializeDefaultPreferences() line: 67	
JavaSearchBugsTests(AbstractJavaModelTests).simulateRestart() line: 2834	
JavaSearchBugsTests.testBug286379c() line: 11704	

simulateRestart() has just created a new JavaModelManager, however,
we've just started to fill JavaModelManager.optionNames. To witness:
we already added JavaCore.CORE_ENCODING, which is the only one you'll
see in the console (see above after "option names").

If at this bad time the CharsetDeltaJob fires, it will through some
indirection call IndexManager.getSourceElementParser(..). In that
method we'll call project.getOptions(true), which will initialize all
options that are found in JMM.optionNames, which is: ONLY ONE.
All options BUT JavaCore.CORE_ENCODING will not get their defaults set.

=> hence the failures from here on.

The trigger condition had nothing to do with the OS just with the speed
of the machine which had to pass a certain length of code in exactly 500ms.
Comment 28 Stephan Herrmann CLA 2011-12-17 16:56:04 EST
What can we learn from comment 27?

The test could easily be made robust simply be adding a sleep(500).

More interesting: could the error condition also happen in real life?

The sequence 
   simulateExit();
   simulateRestart();
is fully artificial, this will never happen in RL.
Actually, someone would need to call JavaModelManager.doNotUse()
which speaks for itself :)

The question is: can some client request the options before
JavaCorePreferenceInitializer.initializeDefaultPreferences has run??
If that's possible, the bug can be reproduced in RL, too.

Normally, access to the options is protected, because class loading
triggers JavaCore.start(..) which invokes JMM.startup(), from where
JMM.initializePreferences() is called. Inside, the access
  DefaultScope.INSTANCE.getNode(JavaCore.PLUGIN_ID)
triggers the desired initialization
(JavaCorePreferenceInitializer.initializeDefaultPreferences).

In order to break this guarantee, the DefaultScope.INSTANCE must already
contain a node "org.eclipse.jdt.core" before the jdt.core plugin is
initialized. Only then, will the getNode(..) access fail to trigger
initialization.

I think we can conclude that in normal operation this behavior is only
possible by some form of misbehaving code wrongly claiming responsibility
for the "org.eclipse.jdt.core" preferences.

The tests took a different strategy for breaking the guarantee: the
DefaultScope.INSTANCE retained its data across the simulated restart,
so the second start found the jdt.core preferences already in place.

Conclusion:
-----------
If you are convinced by the above argumentation it should be safe to
silence the failures by inserting sleep(500) after addFileSpec() and
after removeFileSpec(). Alternatively, we could check if more complete
reset (also wiping DefaultScope.INSTANCE) would achieve the same effect
in a more systematic way.
Comment 29 Satyam Kandula CLA 2011-12-19 09:28:14 EST
Very nice analysis. This analysis looks convincing. There is a wait for resource change listener after addFileSpec. I believe we should just use that and also call waitUntilIndexesReady() before calling simulateRestart(). I should have done this long time ago:(. 
Thanks a lot.
Comment 30 Satyam Kandula CLA 2011-12-19 09:31:08 EST
(In reply to comment #29)
Stephan, if you don't have a patch removing all the debug statements, I can create a patch.
Comment 31 Satyam Kandula CLA 2011-12-20 09:10:44 EST
Created attachment 208613 [details]
Patch

Patch according to Stephen's analysis.
Stephen, Please look at this and see if this should be good.
Comment 32 Satyam Kandula CLA 2012-01-02 03:06:15 EST
Stephan, Can you please review the patch?
Comment 33 Stephan Herrmann CLA 2012-01-02 15:44:36 EST
(In reply to comment #29)
> [...] There is a wait for resource change listener after addFileSpec.
> I believe we should just use that

Ok, this is effective, because NotifyManager.notify(..) invokes 5 listeners,
including:
 [2] DeltaProcessingState
 [4] JavaSearchBugsTests$1$TestResourceChangeListener
Here [2] is the bad guy and [4] is what we are waiting for,
hence when we receive the event the bad guy is already gone
and we're good to proceed.
Assuming the order of listeners is fix this solution is good.

> and also call waitUntilIndexesReady() before calling simulateRestart().

This doesn't seem to be needed. But for symmetry we can leave it in.


Indeed, with the patch I can no longer force the failures.
Comment 34 Satyam Kandula CLA 2012-01-04 01:18:58 EST
(In reply to comment #33)
Stephan, Thanks for the review.
Comment 35 Satyam Kandula CLA 2012-01-04 01:20:07 EST
Released on master via commit 8e69d36fa08db2d61d2c40fecd2494cfd19819f5
Comment 36 Satyam Kandula CLA 2012-01-05 05:23:12 EST
Reopening to fix this on 3.7
Comment 37 Satyam Kandula CLA 2012-01-05 05:48:26 EST
Created attachment 209063 [details]
Patch for 3.7 maintenance
Comment 38 Srikanth Sankaran CLA 2012-01-05 06:16:53 EST
(In reply to comment #36)
> Reopening to fix this on 3.7

Thanks for backporting this Satyam. This will avoid situations where this
sporadic problems shows up in the future after our memories have faded and
stealing cycles from us retriaging all over again.

In future, my request would be to avoid reformatting code to the extent
possible so that the delta looks straightforward.
Comment 39 Satyam Kandula CLA 2012-01-05 06:19:31 EST
Thanks Srikanth,
Released the 3.7 patch via commit 6569a500f187288b6d216a4cbcd415355bab10b8
Comment 40 Satyam Kandula CLA 2012-01-05 06:20:08 EST
(In reply to comment #39)
> Thanks Srikanth,
> Released the 3.7 patch via commit 6569a500f187288b6d216a4cbcd415355bab10b8
I meant I released this on 3.7 maintenance branch
Comment 41 Srikanth Sankaran CLA 2012-01-19 07:02:43 EST
Verified by code inspection that only logging code
was removed by the commit 6569a500f187288b6d216a4cbcd415355bab10b8.