Community
Participate
Working Groups
There are 13 failures in JavaModel tests for the N20100214-2000 Mac OS X - Cocoa test machine. All these failures are due to a change in the JavaCore options but the first test on which it occurs does not change them...!?
I've seen this sort of error intermittently running the tests on Linux. While it's good for me to see that it occurs in official builds, too, I unfortunately have no clues to contribute about what makes this failure happen. Only: any time the problem occurred it vanished on a subsequent test run without any source modifications. I remember seeing changes wrt compliance levels, which is different from the failures you are seeing, but still the overall symptom is the same. Weird, indeed.
(In reply to comment #1) > I've seen this sort of error intermittently running the tests on Linux. > While it's good for me to see that it occurs in official builds, too, > I unfortunately have no clues to contribute about what makes this > failure happen. Only: any time the problem occurred it vanished on > a subsequent test run without any source modifications. > > I remember seeing changes wrt compliance levels, which is different > from the failures you are seeing, but still the overall symptom is > the same. > > Weird, indeed. Thanks for the feedback Stephen. Did you notice whether the first of the failures you got occurred on the same offending test (i.e. testBug286379c() of JavaSearchBugTests) or not?
(In reply to comment #2) > Thanks for the feedback Stephen. Did you notice whether the first of the > failures you got occurred on the same offending test (i.e. testBug286379c() of > JavaSearchBugTests) or not? Sorry, I have no notes besides my small little brain. But I can't recall this particular test to play a role in this. I've also seen this with much bigger effect like a whole test suite failing with just this one error. To me it felt more like some kind of force majeure. I could never recognise a pattern behind it. Sometimes it occurred several times in a row, but if anything that I did fixed the problem it was like in a completely remote corner of the code. Some build weirdness or that kind. It's probably Fantômas :) I never reported this because I had no idea how to ever reproduce. Do you have the logs of that failed test run? Build log? Workbench log?
Occurred again in I20100302-0800.
(In reply to comment #4) > Occurred again in I20100302-0800. The failures occurred at the same test (with which we already got troubles to stabilize...). I'm investigating...
Created attachment 164028 [details] Test patch The test is failing some times and not be able to reproduce nor able to find any clues. Hence, this patch is to print some messages to enable some progress.
Jay, Can you apply this patch to enable making some progress on this test failure?
Occurred again in N20100627-2000. The additional output does not show any changes in the JavaCore options along the test: ========================================================================= Starting test JavaSearchBugTests.testBug286379c()... - Options at test start: + Task tags: TODO,FIXME,XXX + Task priorities: NORMAL,HIGH,NORMAL + Forbidden reference: error - Options before first exit: + Task tags: TODO,FIXME,XXX + Task priorities: NORMAL,HIGH,NORMAL + Forbidden reference: error - Options after first restart: + Task tags: TODO,FIXME,XXX + Task priorities: NORMAL,HIGH,NORMAL + Forbidden reference: error - Options before second exit: + Task tags: TODO,FIXME,XXX + Task priorities: NORMAL,HIGH,NORMAL + Forbidden reference: error - Options after second restart: + Task tags: TODO,FIXME,XXX + Task priorities: NORMAL,HIGH,NORMAL + Forbidden reference: error - Options after search: + Task tags: TODO,FIXME,XXX + Task priorities: NORMAL,HIGH,NORMAL + Forbidden reference: error - Options at test end: + Task tags: TODO,FIXME,XXX + Task priorities: NORMAL,HIGH,NORMAL + Forbidden reference: error ========================================================================= Hence, I'm still unable to figure out why the test of options during the tear down fails... :-( I'm adding some other debug outputs to try to catch the origin of this issue...
I saw it again in our build. Unfortunately I'm still using v_A58 of the tests so I didn't have your additional harness mentioned in comment 8. However, inspecting the code I wonder: maybe JavaCore.getOption(String) is right all the time but only JavaCore.getOptions() is at fault. Note that a single option is always retrieved from the preference service, wheras getOptions() may just use this.optionsCache. Could that be the culprit, i.e., the optionsCache being corrupt? E.g., JavaModelManager.setOptions(Hashtable) could see a BackingStoreException in instancePreferences.flush(). At that point the preferences are already changed but the optionsCache is not. For now I'll just add e.printStackTrace() to setOptions, let's see.
(In reply to comment #9) > E.g., JavaModelManager.setOptions(Hashtable) could see a > BackingStoreException in instancePreferences.flush(). At that point > the preferences are already changed but the optionsCache is not. > For now I'll just add e.printStackTrace() to setOptions, let's see. Indeed, if there is a BackingStoreException, the cached value is not in synch anymore. I'll use the try/catch statement only around the statements that can throw such exception. Let's see if we get it again.
Here's *some* data at last: I changed the debug print statements from JavaCore.getOption(JavaCore.COMPILER_TASK_TAGS) to JavaCore.getOptions().get(JavaCore.COMPILER_TASK_TAGS) in order to test the optionsCache. Resulting output (w/o the additional prints from comment 8, since I'm in v_A58): Starting test JavaSearchBugTests.testBug286379c()... - Options at test start: + Task tags: TODO,FIXME,XXX + Task priorities: NORMAL,HIGH,NORMAL + Forbidden reference: error - Options before first exit: + Task tags: TODO,FIXME,XXX + Task priorities: NORMAL,HIGH,NORMAL + Forbidden reference: error - Options after first restart: + Task tags: TODO,FIXME,XXX + Task priorities: NORMAL,HIGH,NORMAL + Forbidden reference: error - Options before second exit: + Task tags: TODO,FIXME,XXX + Task priorities: NORMAL,HIGH,NORMAL + Forbidden reference: error - Options after second restart: + Task tags: null + Task priorities: null + Forbidden reference: null - Options after search: + Task tags: null + Task priorities: null + Forbidden reference: null - Options at test end: + Task tags: null + Task priorities: null + Forbidden reference: null At least we see when it happens, but it's the same simulated exit-start sequence that a few lines earlier actually succeeded. So what could possibly break getOptions() that was not detectable when using getOption(String)?? After I suspected setOptions(Hashtable) to bail out I added e.printStacktrace() statements for all relevant exceptions, which however were not triggered in this test run. So, I don't really have a theory at this point. Since I'm seeing the error fairly often these days I might as well insert more debug statements. Any suggestions what and where? At least we can now focus on this sequence: simulateExit(); simulateRestart(); Any idea what is suspecious here?
This happened again with build I20101025-1300.
(In reply to comment #12) > This happened again with build I20101025-1300. Do we have the console output from the test run (where should I look for it?)? Here's a new theory: it seems that JavaModelManager.optionsCache is corrupted somehow, but we could find neither an exception nor a concurrency issue that would trigger corruption. All access to this field ensure: - it is either null or a fully initialized Hashtable - it is never passed directly to clients, only copies are. One exception is in these lines: if (!Platform.isRunning()) { return this.optionsCache = getDefaultOptionsNoInitialization(); } Should we change this to if (!Platform.isRunning()) { this.optionsCache = getDefaultOptionsNoInitialization(); return new Hashtable(this.optionsCache); } ? I'm not sure whether tests can run into the !Platform.isRunning() branch.
(In reply to comment #13) > Do we have the console output from the test run (where should I look for it?)? The console output is http://fullmoon.ottawa.ibm.com/downloads/drops/I20101025-1300/testresults/macosx.cocoa.x86_5.0/org.eclipse.jdt.core.tests.model.AllJavaModelTests.txt > Here's a new theory: it seems that JavaModelManager.optionsCache is > corrupted somehow, but we could find neither an exception nor a concurrency > issue that would trigger corruption. All access to this field ensure: > - it is either null or a fully initialized Hashtable > - it is never passed directly to clients, only copies are. > One exception is in these lines: > if (!Platform.isRunning()) { > return this.optionsCache = getDefaultOptionsNoInitialization(); > } > > Should we change this to > if (!Platform.isRunning()) { > this.optionsCache = getDefaultOptionsNoInitialization(); > return new Hashtable(this.optionsCache); > } > ? > > I'm not sure whether tests can run into the !Platform.isRunning() branch. I think this should fix the problem. I couldn't think of anything either.
(In reply to comment #13) > One exception is in these lines: > if (!Platform.isRunning()) { > return this.optionsCache = getDefaultOptionsNoInitialization(); > } > > Should we change this to > if (!Platform.isRunning()) { > this.optionsCache = getDefaultOptionsNoInitialization(); > return new Hashtable(this.optionsCache); > } Olivier, I believe this change could help fix the test. This change is harmless. Can we try this in M3?
(In reply to comment #15) > Olivier, > I believe this change could help fix the test. This change is harmless. Can we > try this in M3? Sure we release this for consistency, but it won't change anything as the tests are always run with the running platform. Please proceed and release that change for M3. We will recontribute anyway.
Created attachment 181736 [details] A Patch Put Stephen's code snippet as specified in comment 13 in this patch.
(In reply to comment #17) > Created an attachment (id=181736) [details] > A Patch > > Put Stephen's code snippet as specified in comment 13 in this patch. Released this patch on HEAD for 3.7M3. Not resolving this bug as we are not sure this fixes.
Satyam, Unfortunately, I won't have time to look at this issue, hence I assign it to you...
(In reply to comment #18) > Released this patch on HEAD for 3.7M3. > Not resolving this bug as we are not sure this fixes. This patch hasn't fixed the problem. The test failed again on N20110315-2000 build.
Failed again in N20110327-2000. We need to understand what is going on with these failures. Why on Mac OS only ? Seems to be related to a timing issue, but looking at the code I don't see anything obvious.
Created attachment 192313 [details] more test harness I'd like to propose some more test harness inside JavaModelManager.getOptions() to find out what's going on here. What do we know? - after the second simulated restart we observe an inconsistency between getOptions() and getOption(String) The former has no task tags configured, whereas the second still has. - both above methods may use the same strategy to look up the option from the preferences service, the main difference I can see being that getOptions() first checks this.optionsCache - test harness in setOptions(Hashtable) never got triggered We don't know for sure: - is optionsCache *really* used in the failing calls? - how could optionsCache become inconsistent? Regarding concurrency: - the main activity I could see in the debugger is that simulateRestart() causes the index manager thread to be started. - later when indexing is actually performed creating a SourceElementParser triggers initialization of the options. However, I have the feeling that this actually happens *after* things already got inconsistent. (indexing seems to happen only during waitUntilIndexesReady() whereas I previously observed the inconsistency directly after the restart and before waitUntilIndexesReady() - current test harness doesn't tell because at this step we don't print getOptions()). The test harness should give us fine grained tracing of getOptions() and also tell us when exactly task tags are observed being unconfigured. I could think of the following suspect: - in theory the effect could be caused by incomplete values in JavaModelManager.optionNames - optionNames are initialized from JavaCorePreferenceInitializer.initializeDefaultPreferences() - from simulateRestart() we first start the index manager and only after that we invoke the JavaCorePreferenceInitializer - should at that point the index manager actually perform any work we would indeed have a race condition between reading and initializing optionNames The test harness should also tell us if optionNames are inconsistent. Should I commit this test harness for now?
(In reply to comment #22) > Should I commit this test harness for now? Yes please. We need to understand what is going on there. Thanks for your investigation.
Created attachment 192427 [details] slightly improved test harness This version of the test harness I committed to HEAD and BETA_JAVA7. (update: check task tags not just for null but also for "").
Right now I'm looking at something that looks very similar to this bug: a test failing with "Workspace options should be back to their default" However, in this case, differences are not in task tags etc. but in JDK levels and "inline JSR". This time I could detect an unexpected call to JMM.setOptions() as the culprit along this call chain: + org.eclipse.jdt.internal.core.JavaModelManager.setOptions(JavaModelManager.java:4846) + org.eclipse.jdt.core.JavaCore.setOptions(JavaCore.java:5391) + org.eclipse.jdt.launching.JavaRuntime.updateCompliance(JavaRuntime.java:2774) + org.eclipse.jdt.launching.JavaRuntime.initializeVMs(JavaRuntime.java:2737) + org.eclipse.jdt.launching.JavaRuntime.getVMInstallTypes(JavaRuntime.java:523) + org.eclipse.jdt.internal.launching.JREPreferenceModifyListener.preApply(JREPreferenceModifyListener.java:127) + org.eclipse.core.internal.preferences.PreferencesService$4.run(PreferencesService.java:448) + org.eclipse.core.runtime.SafeRunner.run(SafeRunner.java:42) + org.eclipse.core.internal.preferences.PreferencesService.firePreApplyEvent(PreferencesService.java:451) + org.eclipse.core.internal.preferences.PreferencesService.applyPreferences(PreferencesService.java:121) + org.eclipse.core.internal.resources.ProjectPreferences.read(ProjectPreferences.java:206) + org.eclipse.core.internal.resources.ProjectPreferences.updatePreferences(ProjectPreferences.java:274) + org.eclipse.core.internal.resources.File.updateMetadataFiles(File.java:410) + org.eclipse.core.internal.resources.File.internalSetContents(File.java:322) + org.eclipse.core.internal.resources.File.appendContents(File.java:53) The last line was explicitly called from my test, file was ".settings/org.eclipse.jdt.core.prefs" When this resource change is observed, JavaRuntime.initializeVMs() can't seem to find a valid VM definition and then aligns the JavaCore compliance options with the host VM. Despite the similarity I can't see how a similar update could occur for JavaCore.COMPILER_TASK_TAGS.
Created attachment 208432 [details] console log Here's slightly more information from another occurrence while running all JDT/Core tests in the IDE. I extracted the part from the logs that starts with the last OK output (indicated e.g. by "+ Task tags: TODO,FIXME,XXX") Next we see things go awry: optionsCache was null Options initialized from preferences + Task tags: null Next we have some new information: - option names: [ ... - Call stack: ... It seems that finally the test harness from comment 22 fired (In reply to comment #22) > The test harness should give us fine grained tracing of getOptions() and > also tell us when exactly task tags are observed being unconfigured. > > I could think of the following suspect: > - in theory the effect could be caused by incomplete values in > JavaModelManager.optionNames Yes, we have very incomplete optionNames, only formatter options are mentioned > - optionNames are initialized from > JavaCorePreferenceInitializer.initializeDefaultPreferences() > - from simulateRestart() we first start the index manager and only after > that we invoke the JavaCorePreferenceInitializer > - should at that point the index manager actually perform any work we would > indeed have a race condition between reading and initializing optionNames we actually see evidence of concurrent access to optionNames further down: java.util.ConcurrentModificationException at java.util.HashMap$HashIterator.nextEntry(HashMap.java:806) at java.util.HashMap$KeyIterator.next(HashMap.java:841) at java.util.AbstractCollection.toString(AbstractCollection.java:449) at java.lang.String.valueOf(String.java:2902) at java.lang.StringBuffer.append(StringBuffer.java:232) at org.eclipse.jdt.internal.core.JavaModelManager.checkTaskTags(JavaModelManager.java:2221) at org.eclipse.jdt.internal.core.JavaModelManager.getOptions(JavaModelManager.java:2209) The exception is raised in the test harness, so it will never occur in RL, but it proves that someone is modifying optionNames while we use it to retrieve options (JMM.getOptions()). We also see a new trigger in the picture: + org.eclipse.jdt.internal.core.JavaModelManager.getOptions(JavaModelManager.java:2180) + org.eclipse.jdt.core.JavaCore.getOptions(JavaCore.java:3710) + org.eclipse.jdt.internal.core.JavaProject.getOptions(JavaProject.java:1664) + org.eclipse.jdt.internal.core.search.indexing.IndexManager.getSourceElementParser(IndexManager.java:188) + org.eclipse.jdt.internal.core.DeltaProcessor.getSourceElementParser(DeltaProcessor.java:1413) + org.eclipse.jdt.internal.core.DeltaProcessor.updateIndex(DeltaProcessor.java:2741) ... + org.eclipse.core.internal.resources.CharsetDeltaJob.run(CharsetDeltaJob.java:177) What's that? A charset delta fires initialization of options before optionNames are fully initialized? Yea, and maybe this is the accomplice in this coup: + org.eclipse.jdt.internal.core.JavaModelManager.getOptions(JavaModelManager.java:2158) + org.eclipse.jdt.core.JavaCore.getOptions(JavaCore.java:3710) + org.eclipse.jdt.internal.core.JavaProject.getOptions(JavaProject.java:1664) + org.eclipse.jdt.internal.core.search.indexing.IndexManager.getSourceElementParser(IndexManager.java:188) + org.eclipse.jdt.internal.core.search.indexing.IndexAllProject.execute(IndexAllProject.java:203) This guy seems to see the full set of optionNames, but this is too late for the other (concurrent?) guy. At this point I'll stop analyzing for now, but I think we have some interesting clues now.
FINALLY I succeeded to reproduce! HOWTO: + Set a breakpoint in JavaCorePreferenceInitializer.initializeDefaultPreferences() on the next statement after optionNames.add(JavaCore.CORE_ENCODING); Set the breakpoints "Hit count" to 3 + Set a breakpoint inside org.eclipse.core.runtime.jobs.Job.schedule(long) with this condition: if (this instanceof org.eclipse.core.internal.resources.CharsetDeltaJob) delay += 9000; System.out.println("schedule: "+delay); return false; This delays every CharsetDeltaJob by additional 9 sec. without suspending the VM during debug + Run JavaSearchBugsTests in the debugger, disabling everything before the testBug286379 group of tests + Watch the JUnit view (don't intervene) We'll enter testBug286379c, then pause for 9.5 secs, Then we'll stop at the first breakpoint, wait for 10 more secs, + Now look at the Console: at the bottom you should see three blocks each starting with: + Task tags: null - option names: [org.eclipse.jdt.core.encoding] - Call stack: + org.eclipse.jdt.internal.core.JavaModelManager.checkTaskTags(JavaModelManager.java:2223) + org.eclipse.jdt.internal.core.JavaModelManager.getOptions(JavaModelManager.java:2180) + Now hit F8 for resume and watch the dreaded 13 failures. Explanation: ============ The following call inside the test javaContentType.removeFileSpec("torem", IContentType.FILE_EXTENSION_SPEC) causes scheduling of a CharsetDeltaJob, normally with 500 ms delay, however, with my trickery I forced this job to execute exactly at the bad time: JavaCorePreferenceInitializer.initializeDefaultPreferences() line: 67 JavaSearchBugsTests(AbstractJavaModelTests).simulateRestart() line: 2834 JavaSearchBugsTests.testBug286379c() line: 11704 simulateRestart() has just created a new JavaModelManager, however, we've just started to fill JavaModelManager.optionNames. To witness: we already added JavaCore.CORE_ENCODING, which is the only one you'll see in the console (see above after "option names"). If at this bad time the CharsetDeltaJob fires, it will through some indirection call IndexManager.getSourceElementParser(..). In that method we'll call project.getOptions(true), which will initialize all options that are found in JMM.optionNames, which is: ONLY ONE. All options BUT JavaCore.CORE_ENCODING will not get their defaults set. => hence the failures from here on. The trigger condition had nothing to do with the OS just with the speed of the machine which had to pass a certain length of code in exactly 500ms.
What can we learn from comment 27? The test could easily be made robust simply be adding a sleep(500). More interesting: could the error condition also happen in real life? The sequence simulateExit(); simulateRestart(); is fully artificial, this will never happen in RL. Actually, someone would need to call JavaModelManager.doNotUse() which speaks for itself :) The question is: can some client request the options before JavaCorePreferenceInitializer.initializeDefaultPreferences has run?? If that's possible, the bug can be reproduced in RL, too. Normally, access to the options is protected, because class loading triggers JavaCore.start(..) which invokes JMM.startup(), from where JMM.initializePreferences() is called. Inside, the access DefaultScope.INSTANCE.getNode(JavaCore.PLUGIN_ID) triggers the desired initialization (JavaCorePreferenceInitializer.initializeDefaultPreferences). In order to break this guarantee, the DefaultScope.INSTANCE must already contain a node "org.eclipse.jdt.core" before the jdt.core plugin is initialized. Only then, will the getNode(..) access fail to trigger initialization. I think we can conclude that in normal operation this behavior is only possible by some form of misbehaving code wrongly claiming responsibility for the "org.eclipse.jdt.core" preferences. The tests took a different strategy for breaking the guarantee: the DefaultScope.INSTANCE retained its data across the simulated restart, so the second start found the jdt.core preferences already in place. Conclusion: ----------- If you are convinced by the above argumentation it should be safe to silence the failures by inserting sleep(500) after addFileSpec() and after removeFileSpec(). Alternatively, we could check if more complete reset (also wiping DefaultScope.INSTANCE) would achieve the same effect in a more systematic way.
Very nice analysis. This analysis looks convincing. There is a wait for resource change listener after addFileSpec. I believe we should just use that and also call waitUntilIndexesReady() before calling simulateRestart(). I should have done this long time ago:(. Thanks a lot.
(In reply to comment #29) Stephan, if you don't have a patch removing all the debug statements, I can create a patch.
Created attachment 208613 [details] Patch Patch according to Stephen's analysis. Stephen, Please look at this and see if this should be good.
Stephan, Can you please review the patch?
(In reply to comment #29) > [...] There is a wait for resource change listener after addFileSpec. > I believe we should just use that Ok, this is effective, because NotifyManager.notify(..) invokes 5 listeners, including: [2] DeltaProcessingState [4] JavaSearchBugsTests$1$TestResourceChangeListener Here [2] is the bad guy and [4] is what we are waiting for, hence when we receive the event the bad guy is already gone and we're good to proceed. Assuming the order of listeners is fix this solution is good. > and also call waitUntilIndexesReady() before calling simulateRestart(). This doesn't seem to be needed. But for symmetry we can leave it in. Indeed, with the patch I can no longer force the failures.
(In reply to comment #33) Stephan, Thanks for the review.
Released on master via commit 8e69d36fa08db2d61d2c40fecd2494cfd19819f5
Reopening to fix this on 3.7
Created attachment 209063 [details] Patch for 3.7 maintenance
(In reply to comment #36) > Reopening to fix this on 3.7 Thanks for backporting this Satyam. This will avoid situations where this sporadic problems shows up in the future after our memories have faded and stealing cycles from us retriaging all over again. In future, my request would be to avoid reformatting code to the extent possible so that the delta looks straightforward.
Thanks Srikanth, Released the 3.7 patch via commit 6569a500f187288b6d216a4cbcd415355bab10b8
(In reply to comment #39) > Thanks Srikanth, > Released the 3.7 patch via commit 6569a500f187288b6d216a4cbcd415355bab10b8 I meant I released this on 3.7 maintenance branch
Verified by code inspection that only logging code was removed by the commit 6569a500f187288b6d216a4cbcd415355bab10b8.