JDT Core plug-in version: 3.8 M5 (org.eclipse.jdt.core_3.8.1.v20120125-1544).

The compiler reports an error in N20120222-2000 which is not expected:
http://download.eclipse.org/eclipse/downloads/drops/N20120222-2000/compilelogs/plugins/org.eclipse.ui.tests.harness_1.2.0.N20120222-2000/@dot.html

This is the code which caused the unexpected error:

public static void pressKeyCodeCombination(Display display, int[] keyCodes, boolean runEventQueue) {
    for (int i = 0; i < keyCodes.length; i++)
        keyCodeDown(display, keyCodes[i], runEventQueue);
    for (int i = keyCodes.length - 1; i >= 0; i--)
        keyCodeUp(display, keyCodes[i], runEventQueue);
}

For the compiler options used and other details, see:
http://download.eclipse.org/eclipse/downloads/drops/N20120222-2000/compilelogs/plugins/org.eclipse.ui.tests.harness_1.2.0.N20120222-2000/@dot.xml

I tried to reproduce it with 3.8 M5 and N20120222-2000, but no "luck". What worries me is that no error was reported on the same source in previous N-builds.
Worrisome, indeed. When I first saw this I couldn't find which version of JDT/Core was used, so ...

> JDT Core plug-in version: 3.8 M5 (org.eclipse.jdt.core_3.8.1.v20120125-1544).

... that would be 3.8 M5. But using that I have no "luck" either. Even using the exact preferences from
http://git.eclipse.org/c/platform/eclipse.platform.ui.git/plain/tests/org.eclipse.ui.tests.harness/.settings/org.eclipse.jdt.core.prefs
didn't help. "hardware failure" ;-/
We're seeing the same sorts of issues with the WTP build, along with some CodeCannotBeReached "errors". In every instance, it is a false positive, and does not repeat in the next build. Some examples (which will be deleted in a few days):

http://build.eclipse.org/webtools/committers/wtp4x-R3.4.0-I/20120301032607/I-3.4.0-20120301032607/testcompilelogs/plugins/org.eclipse.wtp.releng.tools_1.2.1.v201103080400/wtpRelengTools.jar.bin.html

1. ERROR: UninitializedLocalVariable
   The local variable amountRead may not have been initialized
   ResultsSummaryGenerator.java : 580 : while (amountRead != -1);

The lines of code in question begin with:

int amountRead = -1;

And:

http://build.eclipse.org/webtools/committers/wtp4x-R3.4.0-I/20120301163247/I-3.4.0-20120301163247/compilelogs/plugins/org.eclipse.wst.validation_1.2.400.v201202090400/@dot.html

1. ERROR: CodeCannotBeReached
   Unreachable code
   ValidationConfiguration.java : 95 : return buffer.toString();

That is a very short method, with that line being the only return, no throws, and only a simple for loop before it.

We were using Eclipse 3.8 M4, and are now using Eclipse 3.8 I20120228-0800, but we still encounter these random issues.
Random compiler errors that disappear on the next build without relevant source changes are very irritating. I haven't seen anything like that.

If there is non-determinism, that would likely hint at a concurrency issue. The relevant information is stored in instances of UnconditionalFlowInfo. Normally, these instances are short-lived and never stored in any fields, so visibility across threads is unlikely. An exception was recently introduced: bug 359334 introduced the field BlockScope.finallyInfo, which means that this one flow info is now visible for the lifetime of a BlockScope, which in turn is tied to the lifetime of an AbstractMethodDeclaration. Ergo, should an AbstractMethodDeclaration ever be shared across threads, this could possibly affect this one flow info, too. But even if we get that far, concurrency-related havoc on flow info still seems much more unlikely than on any other object inside the compiler. I have no idea how the same AST would be accessed concurrently. Can this ever be triggered?

Other potential sources for semi-non-determinism:
- memory addresses contributing to hashCode? I don't see any hashtables in the neighborhood of flow analysis, so this isn't a very hot track either.
- JIT??

That kind of issue has been observed with 3.8 M4, M5 and the most recent integration build, where the most recent symptom is (related but) slightly different. Maybe we should also collect data on which JVM the bug was observed...
Satyam, could you study this in detail and identify some scenarios/evolve some hypotheses in conjunction with Stephan? This is very worrisome.
The first instance of the problem seems to have been with N20120110-2000, where there were some bogus unreachable code diagnostics from http://git.eclipse.org/c/equinox/rt.equinox.p2.git/tree/bundles/org.eclipse.equinox.p2.repository.tools/src/org/eclipse/equinox/p2/internal/repository/comparator/java/Disassembler.java
(In reply to comment #3)

It is interesting to see that the spurious errors in both WTP and SDK are similar. Stephan, nice explanation of the possible causes of the errors. Here are some of my comments on this:

- Concurrency-related problems: I believe only the compiler component is used in both cases, and if so there could hardly be any concurrency issues. I will need to confirm this.
- Hashtable: I don't understand this.
- JIT: This does seem to fit. As the errors are probably happening at the same place in WTP and SDK, this is a likely possibility.

Kim, can you confirm whether the build would have used Java 7 in N20120110-2000 and N20120222-2000? Do we know the versions in this case?
Carl, can you confirm whether your build uses Java 7 and, if so, which version?
(In reply to comment #6)
> Kim, Can you confirm if the build would have used Java 7 in N20120110-2000 and
> N20120222-2000? Do we know the versions in this case?
> Carl, Can you confirm if your build uses Java 7 and if so, which version.

Just curious - why are you zeroing in on JVM7?
WTP (and, I think, Platform) do use "parallelCompilation", so I think it could be a thread-related problem. In one of our build.properties files, we have (complete with original comments :):

# Properties introduced in 3.5 M3 (PDE base builder R35_M2). See
# http://aniefer.blogspot.com/2008/10/sorting-bundles-and-parallel.html
flattenDependencies=true
parallelCompilation=true

We (WTP) don't use VM7 to run the compiler ... we do use an IBM VM ... I'd have to dig to figure out whether it's 5 or 6 (I think 6) and which version ... but my money's on the concurrency :) FYI, we do use the 32-bit VM even though we run on the Eclipse Foundation's 64-bit hardware, in case that would matter.
Thanks David, this at least rules out JIT problems in Java 7. Will try to see if there could be some issues with parallel compilation.
Thanks David, if you happen to remember any recent (last 3-4 months) build environment or test environment changes, let us know.
Created attachment 211937 [details]
example of parallel compilation file

This file would not literally have been from the build that had a failure (we don't keep them around), but I assume it would be identical to it. One odd failure we saw was in org.eclipse.wtp.releng.tools (line 21). The file shows threadsPerProcessor='3' (we don't change the default) but a fairly large number of parallel groups. I add this file just as a concrete illustration ... I doubt it's of any practical value in reproducing or tracking down the problem. I don't recall any changes to hardware or environment, but Carl might remember better than I ... I'm just commenting since I'm up late, to take advantage of time zone differences :)
(In reply to comment #6) > - Hashtable > I don't understand this. The semi-non-deterministic behavior could potentially be caused, e.g., by iterating over a hashtable, in which case the memory addresses of contained objects could - via hashCode() - influence the order of elements. Just a theory.
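To make that theory concrete, here is a small sketch (the Key type is hypothetical, not taken from the compiler) of how iteration order over a hash-based collection can depend on identity hash codes, which are typically derived from memory addresses, whenever hashCode() is not overridden:

```java
import java.util.HashSet;
import java.util.Set;

class HashOrderDemo {
    // Key deliberately inherits Object.hashCode(), i.e. an identity-based
    // hash, so its bucket placement can differ from one JVM run to the next.
    static class Key {
        final int id;
        Key(int id) { this.id = id; }
        @Override public String toString() { return "Key" + this.id; }
    }

    static String iterationOrder() {
        Set<Key> set = new HashSet<>();
        for (int i = 0; i < 5; i++) {
            set.add(new Key(i));
        }
        StringBuilder sb = new StringBuilder();
        for (Key k : set) {   // order follows the identity hashes, not insertion
            sb.append(k).append(' ');
        }
        return sb.toString().trim();
    }
}
```

All five keys are always present; only the order in which they come out can vary between runs, which is the quasi-non-determinism described above.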
(In reply to comment #12) > The semi-non-deterministic behavior I guess "quasi-non-deterministic" is the term I was searching for :)
Following up on the concurrency hypothesis:

- Isn't each Compiler instance uniquely connected to one ProcessTaskManager? => Is there any chance for any instances (of any relevant type) to be shared among several concurrent ProcessTaskManagers??
- Is there any static field in the entire compiler that could leak relevant state to another thread?

Off the top of my head I'd answer "no" to both questions. But if concurrency were the culprit, we'd need to find at least one "yes", hmm...
Created attachment 212011 [details]
Cleanup for field finallyInfo

The field BlockScope.finallyInfo mentioned in comment 3 could indeed use some cleanup. The patch simply nulls the field when it is no longer needed. Should concurrent access to any FlowInfo be involved in this bug, the patch could indeed help to confine those objects to a very short life cycle. Still, we don't have any specific theory of how this could happen.

On another track: since we observe an inconsistency/data loss between flowInfo.markAsDefinitelyAssigned(localBinding) and flowInfo.isDefinitelyAssigned(localBinding), any confusion of localBinding (or its .id) could also cause this kind of havoc.
(In reply to comment #15)
> Still we don't have any specific theory, how this could happen.

I'm not exactly sure what you mean by this ("no theory"), and I have not looked at the complete code (just the patch snippet), but I'm wondering ... if you are saying multiple threads can read/write this field, it might be a case where the "volatile" keyword is needed. I've heard conflicting things about "volatile": that it's usually not required, usually doesn't do what you think it would, and usually just slows things down ... but ... it does seem required in some threading issues?

http://www.javamex.com/tutorials/synchronization_volatile.shtml

I myself know just enough about threading to be dangerous (to quote John D. :)
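For illustration only (this is not compiler code), a minimal sketch of the visibility guarantee that "volatile" provides: a worker polling a plain boolean field might, in principle, never observe the writer's update, because the JIT is allowed to hoist the read out of the loop.

```java
class StopFlag {
    // Without 'volatile', the worker's read of this field could be cached
    // and the loop might spin forever after requestStop() is called.
    private volatile boolean stop;

    void runUntilStopped() {
        while (!stop) {
            Thread.yield();   // busy-wait until the updated flag becomes visible
        }
    }

    void requestStop() {
        stop = true;
    }
}
```

Whether this applies to the bug at hand is doubtful, though: volatile only matters once objects are actually shared between threads, and no such sharing has been identified so far.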
(In reply to comment #12) > (In reply to comment #6) > > - Hashtable > > I don't understand this. > > The semi-non-deterministic behavior could potentially be caused, e.g., by > iterating over a hashtable, in which case the memory addresses of contained > objects could - via hashCode() - influence the order of elements. > Just a theory. Thanks for the info.
(In reply to comment #15) > Created attachment 212011 [details] > Cleanup for field finallyInfo. I doubt if this could help, but the changes look good.
(In reply to comment #16)
> (In reply to comment #15)
> > Still we don't have any specific theory, how this could happen.
> I'm not exactly sure what you mean by this ("no theory"), and have not looked
> at the complete code (just the patch snippet), but, I'm wondering ... if you
> are saying multiple threads can read/write this field,

I meant to say: I have no idea how *any* objects in the neighborhood of the problem could ever be shared among threads. But *even if* so, any FlowInfo instances are among the most short-lived and never-stored (with this single exception) objects.

> if it might be a case where the "volatile" keyword is needed.

Well, I recently had one of those "impossible bugs" where synchronization looked complete - until you read about the Java memory model. But that's a different story: you know about concurrency, you have protected your code via "synchronized", and still field access gives funny results. In this bug we have to find the trick by which sharing objects across threads could become possible in the first place. If we don't find that, we have no chance of understanding the problem as a concurrency issue.
(In reply to comment #8)
> # Properties introduced in 3.5 M3 (PDE base builder R35_M2). See
> # http://aniefer.blogspot.com/2008/10/sorting-bundles-and-parallel.html
>
> flattenDependencies=true
> parallelCompilation=true

In order to track this down, it would be good to know whether this is a new issue in the compiler or an old issue that has only become visible because of these new settings during compilation. Would it be possible to patch the build infrastructure to use an old compiler with these settings?
(In reply to comment #18)
> (In reply to comment #15)
> > Created attachment 212011 [details]
> > Cleanup for field finallyInfo.
> I doubt if this could help, but the changes look good.

I agree. If anything, making this field thread-local would have been a more "appropriate" fix.

Kim, we are going to need some help on this from you - could you pin down the exact build environment compiler version used to build N20120222-2000 and N20120110-2000? Likewise David and Carl, we need the exact version of the build environment compiler with which these problems were observed. Until we pin that down the picture could be confusing, but here is some analysis.

In the disassembler case, we can make some assertions regarding some intermediate steps that "must have happened". Here is the client code concerned:

switch (c) {
    case '\b' :
        buffer.append("\\b"); // <<---- ALL is well here
        break;                // <<<< Unreachable code

Here is an extract of the relevant code from SwitchStatement#analyseCode:

for (int i = 0, max = this.statements.length; i < max; i++) {
    Statement statement = this.statements[i];
    ...
    if ((complaintLevel = statement.complainIfUnreachable(caseInits, this.scope, complaintLevel, true)) < Statement.COMPLAINED_UNREACHABLE) {
        caseInits = statement.analyseCode(this.scope, switchContext, caseInits);
        ...
    }
}

The only way we would have reported an "Unreachable code" error at the "break;" statement is if the analyseCode invocation on the message send marked its returned flowInfo value as UNREACHABLE. Now MessageSend#analyseCode returns its argument flowInfo (which is known not to be UNREACHABLE, as otherwise we would have complained on the message send itself), after possibly modifying it at 4 places:

(1) flowInfo = this.receiver.analyseCode(currentScope, flowContext, flowInfo, nonStatic).unconditionalInits();
(2) flowInfo = flowInfo.mergedWith(assertInfo.nullInfoLessUnconditionalCopy()).addInitializationsFrom(assertWhenTrueInfo.discardInitializationInfo());
(3) flowInfo = this.arguments[i].analyseCode(currentScope, flowContext, flowInfo).unconditionalInits();
(4) flowInfo = FakedTrackingVariable.markPassedToOutside(currentScope, argument, flowInfo, false);

Of these 4 modifications to the input flowInfo, (1) and (3) are very old and can be discounted. Likewise, (2) was introduced very recently, well after the problems were observed. So it would appear the smoking gun points to (4).

While I say this "must have happened", looking at HEAD as well as at some past revisions of the files concerned, it is unclear to me how "this which must have happened could have happened". Hence it is absolutely imperative that we reconstruct the build environment compiler as a source project and then ask how it could have come to pass that the flowInfo at a certain point described it as unreachable while analysing a message send.
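As an aside, the thread-local alternative mentioned above could be sketched roughly like this (FlowState is a hypothetical stand-in for UnconditionalFlowInfo; this is not the actual JDT code):

```java
class BlockScopeSketch {
    // Hypothetical stand-in for UnconditionalFlowInfo.
    static class FlowState {
        boolean reachable = true;
    }

    // Each thread sees its own FlowState, so even if a BlockScope were ever
    // shared across compile threads, this field could not leak analysis
    // state from one thread to another.
    private final ThreadLocal<FlowState> finallyInfo =
            ThreadLocal.withInitial(FlowState::new);

    FlowState finallyInfo() {
        return finallyInfo.get();
    }

    // Mirrors the attachment's cleanup: drop the state once no longer needed.
    void clearFinallyInfo() {
        finallyInfo.remove();
    }
}
```

The attachment's simpler approach (nulling the field) achieves much of the same confinement without the per-access ThreadLocal lookup cost.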
(In reply to comment #21) > Of these 4 modifications to the input flowInfo, (1) and (3) are > very old and can be discounted. Likewise (2) was introduced very recently > well after the problems were observed. > > So it would appear the smoking gun points to (4). A more open line of inquiry would be to just eliminate (2) and ask how (1) or (3) or (4) could have resulted in an unreachable status for the break statement and what non-deterministic agent could have caused it.
The uninitialized local variable was built with Eclipse 3.8 M4. The unreachable code error was built with Eclipse 3.8 I20120228-0800. Is that specific enough? Or do you need specific plugin versions or something else?
(In reply to comment #23)
> Is that specific enough? Or do you need specific plugin versions or something
> else?

Thanks, this is good enough - sorry, I see that you have already provided this in comment#2. Would it be possible to attach ResultsSummaryGenerator.java and ValidationConfiguration.java here? TIA.
Created attachment 212154 [details] ResultsSummaryGenerator
Created attachment 212156 [details] ValidationConfiguration
(In reply to comment #26)
> Created attachment 212156 [details]
> ValidationConfiguration

Thanks. Very, very interestingly, we have the same buffer.append method call here as in the disassembler case - so all the preliminary analysis done in comment#21 should hold true. Here is the method:

public static String getEnabledElementsAsString(ValidatorMetaData[] elements) {
    StringBuffer buffer = new StringBuffer();
    for (ValidatorMetaData vmd : elements) {
        buffer.append(vmd.getValidatorUniqueName());
        buffer.append(ConfigurationConstants.ELEMENT_SEPARATOR);
    }
    return buffer.toString(); // <<<---- UNREACHABLE CODE ERROR HERE
}

Carl, one more question: what is the type of ConfigurationConstants.ELEMENT_SEPARATOR? (In the disassembler case the argument to the append call was a string literal.)
(In reply to comment #21) > Now MessageSend#analyzeCode returns its argument flowInfo (which > is known not to be UNREACHABLE as otherwise we would have complained > on the message send itself), after possibly modifying itself at 4 > places. That ignores the places where the object is address exposed, so we need to look at all places where the argument flowinfo was passed around. Still this looks like a promising line of inquiry to me. Consider that before the invocation of MessageSend.analyzeCode, things were just fine and were broken just after ...
(In reply to comment #27)
> Carl, one more question: what is the type of
> ConfigurationConstants.ELEMENT_SEPARATOR ? (in the disassembler case the
> argument to append call was a string literal)

public interface ConfigurationConstants {
    String ELEMENT_SEPARATOR = ";"; //$NON-NLS-1$
(In reply to comment #28)
> Consider that before the invocation of MessageSend.analyzeCode,
> things were just fine and were broken just after ...

I stepped through the code trying to imagine what could go wrong and drew a blank. This one is going to be a tough nut to crack.

Even as we consider various approaches, we should also look into whether we want to instrument the compiler to gather some data - the unreachable-code problem looks way more amenable than the uninitialized-variable problem. Given that something goes wrong in MessageSend.analyseCode, we can check after every use of flowInfo (13 of them) to see whether it has been polluted already (is DEAD_END) and, if so, log/print a message. What other data would we want? We need to think about it carefully - we have limited time to enable this and have it running before we will have to remove it for M7 or so.
*** Bug 372944 has been marked as a duplicate of this bug. ***
(In reply to comment #31)
> *** Bug 372944 has been marked as a duplicate of this bug. ***

Umm.. one more uninitialized-variable bug, again coming from a for loop. While Srikanth is exploring instrumentation from the 'dead code' angle, I will explore instrumentation from this point of view.
@Override
public String toString() {
    StringBuffer buf= new StringBuffer();
    for (int i= 0; i < fParts.length; i++) {
        buf.append(fParts[i].length());
        buf.append(',');
        buf.append(fParts[i]);
    }
    return buf.toString();
}

is the method concerned from bug 372944. So here too, we have a StringBuffer and an append call; just after the append call something has gone wrong!
(In reply to comment #33) >[..] > So here too, we have StringBuffer and append call, Just after the append call > sonmething has gone wrong !!!!! The point (4) in comment 21 looks closer to the culprit now, since StringBuffer is a resource, I guess.
(In reply to comment #34) > (In reply to comment #33) > >[..] > > So here too, we have StringBuffer and append call, Just after the append call > > sonmething has gone wrong !!!!! > > The point (4) in comment 21 looks closer to the culprit now, since StringBuffer > is a resource, I guess. Actually it is not a resource and also (4) from comment#21 gets short circuited internally for the most part for some of the failing cases.
(In reply to comment #33)
> @Override
> public String toString() {
> StringBuffer buf= new StringBuffer();
> for (int i= 0; i < fParts.length; i++) {
> buf.append(fParts[i].length());
> buf.append(',');
> buf.append(fParts[i]);

Since we don't complain about the use of i in buf.append(fParts[i]), it tells us that the flowInfo was kosher at the time of the call to

flowInfo = this.arguments[i].analyseCode(currentScope, flowContext, flowInfo).unconditionalInits();

in MessageSend.analyseCode. Either this call "corrupted" it after it made use of the kosher data, thereby polluting the flowInfo lhs variable, or some subsequent use of flowInfo in MessageSend.analyseCode did it. Subsequent uses:

(1) flowInfo = FakedTrackingVariable.markPassedToOutside(currentScope, this.arguments[i], flowInfo, false);

I cannot see how this can cause a problem, as most of the code is short-circuited and is effective only for single name references or allocation expressions - fParts[i] is neither.

(2) analyseArguments(currentScope, flowContext, flowInfo, this.binding, this.arguments);

This should have a bearing only if the method parameters are annotated with null annotations - the method is from the JRE and is not annotated.

(3) manageSyntheticAccessIfNecessary(currentScope, flowInfo);

Again, I cannot see how this can have any bearing.

On another note: both this test case and the one from ResultsSummaryGenerator exhibit another interesting commonality: the last line of the loop contains a reference which we don't complain about, but we complain about the loop-terminating check references. Perhaps there is something wrong in the code that merges flow data - though what would explain the non-determinism is beyond comprehension at the moment.
(In reply to comment #36)
> On another note: both this test case and the one from Results
> SummaryGenerator exhibit another interesting commonality: the
> last line of the loop contains a reference which we don't
> complain about, but we complain about loop terminating check
> references. Perhaps there is something wrong in the code that
> merges flow data - though what would explain the non-determinism
> is beyond comprehension at the moment.

i.e. in

int amountRead = -1;
do {
    int amountRequested = Math.max(stream.available(), DEFAULT_READING_SIZE);
    if (contentsLength + amountRequested > contents.length) {
        System.arraycopy(contents, 0, contents = new byte[contentsLength + amountRequested], 0, contentsLength);
    }
    amountRead = stream.read(contents, contentsLength, amountRequested);
    if (amountRead > 0) {
        // remember length of contents
        contentsLength += amountRead;
    }
} while (amountRead != -1);

we don't complain about the last use of amountRead inside the loop, but we complain about the use inside the while check.
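For reference, here is that pattern distilled into a self-contained, compilable form (the enclosing method, class and the value of DEFAULT_READING_SIZE are reconstructed here, not copied from the WTP source): amountRead is definitely assigned on every path before the while check, so a healthy compiler accepts this without complaint.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

class ReadAllSketch {
    static final int DEFAULT_READING_SIZE = 8192; // reconstructed constant

    static byte[] readAll(InputStream stream) throws IOException {
        byte[] contents = new byte[0];
        int contentsLength = 0;
        int amountRead = -1; // definitely assigned before the while check below
        do {
            int amountRequested = Math.max(stream.available(), DEFAULT_READING_SIZE);
            if (contentsLength + amountRequested > contents.length) {
                // grow the buffer, preserving what was read so far
                System.arraycopy(contents, 0,
                        contents = new byte[contentsLength + amountRequested], 0, contentsLength);
            }
            amountRead = stream.read(contents, contentsLength, amountRequested);
            if (amountRead > 0) {
                contentsLength += amountRead; // remember length of contents
            }
        } while (amountRead != -1);
        return Arrays.copyOf(contents, contentsLength);
    }
}
```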
(In reply to comment #8)
> WTP (and, I think platform) do use "parallelCompilation" so think it could be a
> thread-related problem. In one of our build.properties files, we have (complete
> with original comments :):
>
> # Properties introduced in 3.5 M3 (PDE base builder R35_M2). See
> # http://aniefer.blogspot.com/2008/10/sorting-bundles-and-parallel.html
>
> flattenDependencies=true
> parallelCompilation=true

Does somebody know what exactly these settings do? My naive understanding is that these flags enable Ant to launch parallel builds; if so, are there multiple instances of the compiler running on separate VMs? That would not raise the sort of spurious data-modification problems we are seeing, would it?

> but, my
> money's on the concurrency :)

So is mine - only we are stumped at the moment trying to come up with a scenario where the data would have an unanticipated agent modifying it. Continuing to investigate ...
Sorry, I don't have much time to investigate these days, but my gut feeling says: if concurrency managed to creep into the compiler, the bug patterns could be much more diverse than what we are seeing. The internal code is not designed for thread safety, so every single object in the compiler is vulnerable as soon as some of these are shared between threads. Why only hit the flowInfo?

I'd like to put some money on JIT. We don't have much info in this category, only that David reported an IBM JVM 5 or 6. That doesn't sound like sufficient data to rule out a JIT bug. A JIT bug would fit the pattern that perhaps one specific method (some flowInfo manipulation) is affected - iff it is JIT-optimized. Perhaps some project that observes the bug "frequently" could disable JIT for the JVM running the compiler for a while?
(In reply to comment #39)
> Sorry, I don't have much time to investigate these days, but my gut feeling
> says: if concurrency succeeds to creep into the compiler the bug patterns could
> be much more diverse than what we are seeing. The internal code is not designed
> for thread safety so every single object in the compiler is vulnerable as soon
> as some of these are shared between threads. Why only hit on the flowInfo?

Indeed, none of the methods in LookupEnvironment is synchronized or otherwise written to ensure serial access. AND, for the record, we have a single thread running compiler processing tasks, so these data structures that are getting stomped on are not being accessed in a concurrent fashion at all.

> A JIT bug would fit the pattern that perhaps a specific method (some flowInfo
> manipulation) is affected - iff JIT-optimized. Perhaps some project who
> observes the bug "frequently" could disable JIT for the JVM running the
> compiler for a while?

What would this show? Absence of a sighting is not going to be good enough, is it? Alternately, is there a way to force the VM to always JIT-optimize a class or set of classes? That way we could force the bug to manifest.
(In reply to comment #8) > We (WTP) don't use VM7 to run the compiler ... we do use an IBM VM ... I'd have > to dig to figure out which if 5 or 6 (I think 6) and which version ... but, my > money's on the concurrency :) FYI we do use the 32 bit VM even though running > on the Eclipse Foundation's 64 bit hardware, in case that'd matter. David, could you please dig up this information for us ? The exact version of the VM + some deployment history (most recent change) would help.
Kim, likewise, could you post here the exact version of the VM used to run the builds (not the tests) and its deployment date ? TIA,
(In reply to comment #39) > A JIT bug would fit the pattern that perhaps a specific method (some flowInfo > manipulation) is affected - iff JIT-optimized. I guess there are multiple (at least more than one) translations involved here too - translate and optimize based on how hot the code is - need to check with the VM documentation.
A strategy to trigger a JIT bug could be to compile a file that already failed once over and over again. JITting is usually triggered by heavy usage of a method. For an Oracle VM, you could run with -XX:+PrintCompilation to see the JITting: http://www.oracle.com/technetwork/java/javase/tech/vmoptions-jsp-140102.html
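The skeleton of that strategy could look like the following self-contained sketch (the hot method here is a placeholder, not actual flow-analysis code): run it with -XX:+PrintCompilation on a HotSpot VM and the hot method shows up in the JIT log once it crosses the compile threshold, while its results must stay identical before and after JITting.

```java
class JitWarmupSketch {
    // Placeholder for the suspect computation; any deterministic method works.
    static int analyse(int[] data) {
        int acc = 0;
        for (int d : data) {
            acc += d;
        }
        return acc;
    }

    public static void main(String[] args) {
        int[] data = new int[1000];
        java.util.Arrays.fill(data, 1);
        int expected = analyse(data);
        // Tens of thousands of invocations comfortably exceed typical HotSpot
        // compile thresholds, so later iterations execute JIT-compiled code.
        for (int i = 0; i < 50_000; i++) {
            if (analyse(data) != expected) {
                throw new AssertionError("non-deterministic result at iteration " + i);
            }
        }
        System.out.println("stable across 50000 runs");
    }
}
```

A real harness would replace analyse() with one full compilation of the failing file per iteration, comparing the emitted diagnostics each time.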
(In reply to comment #41)
> (In reply to comment #8)
> > We (WTP) don't use VM7 to run the compiler ... we do use an IBM VM ... I'd have
> > to dig to figure out which if 5 or 6 (I think 6) and which version ... but, my
> > money's on the concurrency :) FYI we do use the 32 bit VM even though running
> > on the Eclipse Foundation's 64 bit hardware, in case that'd matter.
>
> David, could you please dig up this information for us ? The exact version
> of the VM + some deployment history (most recent change) would help.

I'm not David, but I am the WTP build guy, and I can answer this question. The Java that we use can be found at

/shared/webtools/apps/ibm-java2-i386-50-SR12FP5

and -fullversion reports:

java full version "J2RE 1.5.0 IBM Linux build pxi32devifx-20110627 (SR12 FP5 )"

(Note that the BREE for the plugins containing both of these files is 1.5.) We updated our Java levels on 2011-08-13.
(In reply to comment #38)
> Does somebody know what exactly these setting do ? My naive understanding
> is that these flags enable ANT to launch parallel builds, if so there are
> multiple instances of the compiler running on separate VMs ? This would
> not raise the sort of spurious data modification problems we are seeing ?

No, not separate VMs, at least from Ant's point of view. Same VM. I think they control how PDE creates the compile.<featurename> files like the one I attached in comment 11. And ... now that I actually read about the 'parallel' Ant task ...

http://ant.apache.org/manual/Tasks/parallel.html

it makes it sound like "compiling" is not something that should be parallelized?!! But that depends a little on the "build.jar" target. It might launch a separate VM, but I doubt it ... compiling would take forever then! Maybe we've just "been lucky" all these years, until some recent changes broke what little thread safety you had?

Even if separate instances of the Compiler class are created to compile each file (which would be my guess), from some snippets of your code it sounds like you use some class (static) variables or methods? Such as in comment 21, you have

flowInfo = FakedTrackingVariable.markPassedToOutside(currentScope, argument, flowInfo, false);

Is FakedTrackingVariable.markPassedToOutside a static method? It sounds like it, from the naming convention. If so, even multiple instances of the compiler class (in the same VM) would interfere with each other. That wouldn't happen to be your first use of static methods, would it?
For what it's worth, I've skim-read the "build.jar" PDE task and it doesn't appear to create a new VM, at least in no obvious way. It appears to just (eventually) call the build.xml file it creates for each bundle ... which compiles the source of that bundle with an Ant task similar to:

<!-- compile the source code -->
<javac destdir="${build.result.folder}/@dot" failonerror="${javacFailOnError}" verbose="${javacVerbose}" debug="${javacDebugInfo}" includeAntRuntime="no" bootclasspath="${bundleBootClasspath}" source="${bundleJavacSource}" target="${bundleJavacTarget}" errorProperty="compilation.error.occured">
    <compilerarg line="${compilerArg}" compiler="${build.compiler}"/>
    <classpath refid="@dot.classpath" />
    <src path="src-catalog/" />
    <src path="src-contentmodel/" />
    <src path="src-validation/" />
    <src path="src/" />
    <src path="src-emfModelSynch/" />
    <src path="src-search/" />
    <compilerarg value="@${basedir}/javaCompiler...args" compiler="org.eclipse.jdt.core.JDTCompilerAdapter" />
    <compilerarg line="-log '${build.result.folder}/@dot${logExtension}'" compiler="org.eclipse.jdt.core.JDTCompilerAdapter" />
</javac>

(This happens to be for our xml.core bundle ... not one that's had any problems.) I paste it in here to show how it all (sort of) comes down to

compiler="org.eclipse.jdt.core.JDTCompilerAdapter"

I figured you might have first-hand knowledge of that class? :) I assume it creates an _instance_ of the compiler to use for that source? So, pretty sure we have multiple instances of the compiler class, each running in its own thread. So, not sure "instances" have to be thread-safe ... but ... if this is your first use of static methods, such as FakedTrackingVariable.markPassedToOutside, then that might explain everything? I'm obviously just making guesses based on names starting with a capital letter. If you have always done that sort of thing ... maybe we have just been lucky for years?
(In reply to comment #47)

Thanks, David, for the build command. JDTCompilerAdapter is the entry point into ecj instead of javac. The fork attribute is not set, which means the compilation happens in the same process, and hence there could be parallel threads. FakedTrackingVariable.markPassedToOutside is a static method but doesn't cause any sharing of data: there are no static, non-final fields.
(In reply to comment #44) > A strategy to trigger a JIT bug could be to compile a file that already failed > once over and over again. JITting is usually triggered by heavy usage of a > method. Good point. Very conveniently for us, the utility class org.eclipse.jdt.launching/launching/org/eclipse/jdt/internal/launching/CompositeId.java referenced in https://bugs.eclipse.org/bugs/show_bug.cgi?id=372944 can be compiled all by itself with no dependencies etc. Satyam, could you create a test based on this file where we would repeatedly compile this file a few thousand times and see if anything shows up ? We need to use the identical VM used by bug 372944, I have asked for it there. With a bit of effort, we can isolate a smaller case for Disassembler.java too. Satyam, could you look into this ? Thanks.
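The repeated-compile test could be organized roughly as below; compileOnce is a placeholder for one invocation of the batch compiler capturing its diagnostics as a string (the actual hookup to the compiler is an assumption and not shown here).

```java
import java.util.function.Supplier;

class CompileStressSketch {
    // Runs the same compile task repeatedly and counts runs whose
    // diagnostics differ from the first run's - any non-zero result
    // would be evidence of the non-determinism described in this bug.
    static int countMismatches(int iterations, Supplier<String> compileOnce) {
        String expected = compileOnce.get();
        int mismatches = 0;
        for (int i = 1; i < iterations; i++) {
            if (!expected.equals(compileOnce.get())) {
                mismatches++;
            }
        }
        return mismatches;
    }
}
```

Keeping the harness oblivious to how a "compile" is performed makes it easy to run the same loop against different VMs or with JIT disabled.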
(In reply to comment #49) > Satyam, could you create a test based on this file where we would repeatedly > compile this file a few thousand times and see if anything shows up ? This could be a better idea. I am trying to set up the complete build and am having problems. I will try this out. Thanks.
Just to answer a question David asked: There haven't been any hardware changes to build.eclipse.org since the original PPC box failed and was replaced. -M.
Hi David/Carl, Can you please let us know the jvm options that you think the build uses? Thanks.
The 3.8 build doesn't actually run on Hudson. It still runs on IBM hardware. We are working on getting that running on foundation hardware. The bootstrap script that actually starts the build is org.eclipse.releng.eclipsebuilder\bootstrap.sh in /cvsroot/eclipse. You can see that there are many JDKs that we use in the build to compile. These are the execution environments we use:

linuxJdkArchive=jdks/jdk-1_5_0_22-fcs-bin-b03-linux-i586-09_oct_2009.zip
linuxJdkArchive14=jdks/jdk-1_4_2_14-fcs-bin-b05-linux-i586-14_mar_2007.zip
linuxJdkArchive16=jdks/1.6/jdk-6u17-fcs-bin-b04-linux-i586-11_oct_2009.zip
linuxppcJdkArchive=jdks/IBMJava2-SDK-1.4.2-10.0.ppc.tgz
linuxppcJdkArchive15=jdks/ibm-java2-sdk-5.0-6.0-linux-ppc.tgz
windowsJreArchive=jdks/jdk-1_4_2_16-fcs-bin-b05-windows-i586-16_sep_2007.zip
windows15JdkArchive=jdks/jdk-1_5_0_22-fcs-bin-b03-windows-i586-09_oct_2009.zip
windows16JdkArchive=jdks/1.6/jdk-6u17-fcs-bin-b04-windows-i586-11_oct_2009.zip
windows10FoundationArchive=jdks/weme-win-x86-foundation10_6.1.0.20060317-111429.zip
windows11FoundationArchive=jdks/weme-win-x86-ppro11_6.1.1.20061110-161633.zip

The bootclasspaths are as follows:

bootclasspath="$builderDir/jdk/win32/jdk1.4.2_16/jre/lib/rt.jar:$builderDir/jdk/win32/jdk1.4.2_16/jre/lib/jsse.jar:$builderDir/jdk/win32/jdk1.4.2_16/jre/lib/jce.jar"
bootclasspath_15="$builderDir/jdk/win32_15/jdk1.5.0_22/jre/lib/rt.jar:$builderDir/jdk/win32_15/jdk1.5.0_22/jre/lib/jsse.jar:$builderDir/jdk/win32_15/jdk1.5.0_22/jre/lib/jce.jar"
bootclasspath_16="$builderDir/jdk/win32_16/jdk6_17/jre/lib/rt.jar:$builderDir/jdk/win32_16/jdk6_17/jre/lib/jsse.jar:$builderDir/jdk/win32_16/jdk6_17/jre/lib/jce.jar"
bootclasspath_foundation="$builderDir/jdk/win32_foundation/lib/jclFoundation10/classes.zip"
bootclasspath_foundation11="$builderDir/jdk/win32_foundation11/lib/jclFoundation11/classes.zip"

The build itself is actually invoked with this VM: jdk/linux16/jdk6_17/jre/bin/java

And the execution environments are as follows when the build
is invoked, see J2SE-1.5= for example $builderDir/jdk/linux16/jdk6_17/jre/bin/java -Xmx500m -Declipse.p2.MD5Check=false -Dorg.eclipse.update.jarprocessor.pack200=$builderDir/jdk/linux/jdk1.5.0_22/bin -jar ../org.eclipse.releng.basebuilder/plugins/org.eclipse.equinox.launcher.jar -Dosgi.os=linux -Dosgi.ws=gtk -Dosgi.arch=ppc -application org.eclipse.ant.core.antRunner -Declipse.p2.MD5Check=false" -q -buildfile buildAll.xml $mail $testBuild $compareMaps -DmapVersionTag=$mapVersionTag -DpostingDirectory=$postingDirectory -Dbootclasspath=$bootclasspath -DbuildType=$buildType -D$buildType=true -DbuildId=$buildId -Dbuildid=$buildId -DbuildLabel=$buildLabel -Dtimestamp=$timestamp -DmapCvsRoot=:ext:kmoir@dev.eclipse.org:/cvsroot/eclipse $skipPerf $skipTest $skipPack $tagMaps -DJ2SE-1.5=$bootclasspath_15 -DJ2SE-1.4=$bootclasspath -DCDC-1.0/Foundation-1.0=$bootclasspath_foundation -DCDC-1.1/Foundation-1.1=$bootclasspath_foundation11 -DOSGi/Minimum-1.2=$bootclasspath_foundation11 -DJavaSE-1.6=$bootclasspath_16 -DlogExtension=.xml $javadoc $updateSite $sign $repoCache -DgenerateFeatureVersionSuffix=true -Djava15home=$builderDir/jdk/linux/jdk1.5.0_22/jre -listener org.eclipse.releng.build.listeners.EclipseBuildListener"
Kim, thank you for the info.
Created attachment 212432 [details] static non-final fields in jdt.core (In reply to comment #48) > FakedTrackingVariable.markPassedToOutside is a static method but doesn't cause > any sharing of any data. They are no static,non-final fields. This is correct for the immediate context of FakedTrackingVariable. Out of curiosity I assembled a list of all static non-final fields in JDT/Core (attached, fields from "Messages" classes are omitted). None of these fields looked suspicious to me in terms of this bug, most are not part of the compiler actually. The closest I could see is UnconditionalFlowInfo.CoverageTestId. However, all access to this field is actually dead code (protected by the constant COVERAGE_TEST_FLAG). On a lazy day we may want to check some of these, and make them final wherever possible. Given the design of a single processing thread and also given the lack of static non-final fields in the vicinity of the bug, we don't have a theory how concurrency could possibly affect the compiler, right?
To follow up on the hypothesis of a JIT bug, I started thinking about what kind of code change could possibly produce the observed behavior. I was flabbergasted to find two almost identical methods in class Disassembler: escapeString and decodeStringValue, where the *second* occurrence triggered the bug, not the first. If JIT is the culprit, a translation must have happened right while compiling this class (Disassembler.java). Or the minimal difference between both methods must have contributed to the bug. Either way, there are really some odds at work!
http://dev.eclipse.org/mhonarc/lists/platform-releng-dev/msg19736.html In I20120311-2000 there is a compile error in o.e.jdt.ui, but everything is fine in my workspaces (I checked in two distinct workspaces on 2 different machines). Also I do not see anything that has changed in this area for the last couple of days. Is this another instance of this bug, something else?
(In reply to comment #57) > http://dev.eclipse.org/mhonarc/lists/platform-releng-dev/msg19736.html [...] > Is this another instance of this bug, something else? For the record, this was a problem in build input and not another instance of this problem.
For completeness I followed the Hashtable track: The compiler contains a few locations that may iterate over a Set or a Map, in which case the order may be quasi non-deterministic. E.g.:
- CompilationResult.getClassFiles() converts a map to an array.
- StackMapFrameCodeStream.getExceptionMarkers() iterates over a set - but then sorts the result.
However, both of these seem to be way too late in the process to influence the flow analysis. Here's another one closer to the scene of the crime:
- FakedTrackingVariable.pickVarForReporting(Set, BlockScope, boolean)
Here we pick the first element from a Set. The method was added on 2012-01-15, i.e., later than N20120110-2000 when the bug was observed for the first time. Additionally, even this method is called too late (at the end of analyzing a block, or *after* reporting unreachable) to influence the analysis towards this bug, I should say. I scanned all Sets/Maps I could find in the compiler, but none looked like a hot track to me. => Still looking for a root cause of the observed non-determinism.
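As a small illustration of the iteration-order concern (hypothetical demo, not compiler code): a HashSet is traversed in hashing order, not insertion order, so "pick the first element" is only as deterministic as the hash codes involved, while sorting a snapshot of the set - as getExceptionMarkers() does - makes downstream results stable:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class IterationOrderDemo {
    // Iteration order of a HashSet need not match insertion order;
    // it depends on hash codes and the table size.
    static List<String> rawOrder() {
        Set<String> set = new HashSet<>(Arrays.asList("delta", "alpha", "charlie", "bravo"));
        return new ArrayList<>(set);
    }

    // Sorting the snapshot makes downstream processing deterministic,
    // regardless of how elements were bucketed.
    static List<String> sortedOrder() {
        List<String> list = rawOrder();
        Collections.sort(list);
        return list;
    }

    public static void main(String[] args) {
        System.out.println(rawOrder());
        System.out.println(sortedOrder());
    }
}
```

With identity hash codes (Object.hashCode) in the set, rawOrder() could even differ between runs of the same VM, which is why "first element of a Set" is a candidate source of non-determinism.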
Bug 372944 comment 11 mentions that the error occurred as a flood of errors actually, which could be seen as an indication towards a JIT bug. Shouldn't we ask people affected by this bug to add "-proceedOnError" to their compiler command lines to check if this will produce the same kind of flood? Is there also an ant setting controlling whether compilation should stop after the first error?
(In reply to comment #60) > Shouldn't we ask people affected by this bug to add "-proceedOnError" to their > compiler command lines to check if this will produce the same kind of flood? Satyam, please check the options used for SDK build to see what is the setting for this ? We would expect it to be not in effect. Turning it on to see if we see a flood of errors would help.
While attempting to reproduce locally against an Oracle VM, we should try both the -server mode and the -client mode. Some optimizations kick in only in server mode, so some bugs show up only in server mode: e.g. http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=5091921.
WTP just got hit with this in another build. In the past, it was one error. This time, it was 111,862 errors - it appears that we got into this state at the beginning, and that carried through to many, many compiles, across plugins and features. I see:
UninitializedLocalVariable
UninitializedBlankFinalField
for the vast majority of these errors, and then other errors for files/plugins that build on top of these. This is still using I20120228-0800 to build.
(In reply to comment #63) > WTP just got hit with this in another build. In the past, it was one error. > This time, it was 111,862 errors- it appears that we got into this state at the > beginning, and that carried through to many, many compiles, across plugins and > features. I see: > UninitializedLocalVariable > UninitializedBlankFinalField > for the vast majority of these errors, and then other errors for files/plugins > that build on top of these. This is still using I20120228-0800 to build. This perfectly matches the observation from bug 372944 comment 11. A JIT bug could nicely explain this: once a given method is badly "optimized", it will consistently reproduce the same bug over and over. Could you please check bug 372944 comment 12 for requests for additional info that might possibly help us? Thanks.
> > This perfectly matches the observation from bug 372944 comment 11. > > A JIT bug could nicely explain this: once a given method is badly "optimized", > it will consistently reproduce the same bug over and over. > > Could you please check bug 372944 comment 12 for requests for additional info > that might possibly help us? Thanks. Comment #45 has the JVM information that we are still using. This is an IBM JVM, so I have not set the flags in question. And our builds occur on the build.eclipse.org machine - I don't have its hardware specifications readily available, but Denis Roy could probably fill in the details.
Build.eclipse.org: 2 Intel Xeon e5540 Quad-core @ 2.53GHz 24 GB RAM 2 Internal HDD 2 Gigabit NICs If you want something more specific I'll have to go digging. -M.
Interesting- the next build only had one UninitializedLocalVariable. The line is different from others being evaluated here: for (int i=0; i < files.length; i++) { It says that the i++ is the uninitialized local variable.
Stumbled across this to diagnose JIT failures - http://publib.boulder.ibm.com/infocenter/javasdk/v5r0/topic/com.ibm.java.doc.diagnostics.50/diag/tools/jitpd.html
Satyam, have you collected a list of bug patterns in this field? So far I can see: (a) static final field with initializer: uninitialized blank final field (b) increment in an old-style for loop: uninitialized local (c) accessing any local var / arg, perhaps only after a throw/return (see bug 372944 comment 18): uninitialized local (d) return or break statement: unreachable code (not sure what is the distinguishing feature in these cases..) Can we match these patterns against VMs or anything?
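To make the patterns concrete, here is a hypothetical sample class (not the actual WTP sources) containing shapes (a)-(d); each compiles and runs cleanly, so any diagnostic of the kinds listed above against such code would be a false positive:

```java
public class PatternSamples {
    // (a) static final field with initializer: definitely assigned,
    // so "uninitialized blank final field" would be spurious.
    static final int LIMIT = 10;

    // (b) increment in an old-style for loop: i is initialized in the
    // loop header, so "uninitialized local" on i++ would be spurious.
    static int countUp(int n) {
        int count = 0;
        for (int i = 0; i < n; i++) {
            count++;
        }
        return count;
    }

    // (c) local variable read in a loop condition: amountRead is
    // initialized before the do-while, matching the WTP report shape.
    static int drain(int[] chunks) {
        int amountRead = -1;
        int total = 0;
        int pos = 0;
        do {
            amountRead = pos < chunks.length ? chunks[pos++] : -1;
            if (amountRead != -1) total += amountRead;
        } while (amountRead != -1);
        return total;
    }

    // (d) a sole return after a simple for loop: clearly reachable,
    // matching the ValidationConfiguration report shape.
    static String join(String[] parts) {
        StringBuilder buffer = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            buffer.append(parts[i]);
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        System.out.println(countUp(LIMIT));
        System.out.println(drain(new int[] {3, 4}));
        System.out.println(join(new String[] {"a", "b"}));
    }
}
```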
(In reply to comment #69) > Satyam, have you collected a list of bug patterns in this field? Just a quick note to inform stake holders that we are busy shipping M6 this week. Once that is off the plate, this bug will be #1 priority item for us and we should see faster progress on it.
http://publib.boulder.ibm.com/infocenter/javasdk/v5r0/index.jsp?topic=%2Fcom.ibm.java.doc.diagnostics.50%2Fdiag%2Ftools%2Fjitpd.html It would help if experimental builds of WTP could be undertaken with ecj run on an IBM VM invoked as: java -Xjit:verbose,optLevel=scorching and another time with: java -Xjit:verbose,optLevel=noOpt If this is a translator/optimizer problem, we would expect the scorching setting to make the failures deterministic and the noOpt setting to make the failures go away. Thanks for any help you are able to offer with this exercise. We will also undertake similar experiments locally once M6 is cooked and out of the door.
(In reply to comment #71) > java -Xjit:verbose,optLevel=noOpt I don't know if the noOpt setting is the same as the -Xint option documented here: http://publib.boulder.ibm.com/infocenter/javasdk/v5r0/index.jsp?topic=%2Fcom.ibm.java.doc.diagnostics.50%2Fdiag%2Ftools%2Fjitpd.html In any case, the scorching setting is more important.
Created attachment 212620 [details] Table of the problems Hi Stephan, Here is the table of the problems we have been seeing.
The fact that this problem occurs on IBM and Oracle VMs doesn't really speak for a JIT bug. AFAIK, the JIT compilers of these VMs are completely separate implementations, so it's unlikely that they share the same bug. I think you should have another look at hashCode()/equals() implementations in lookup environments (flow analysis, bindings, ...). A possible scenario that could lead to the observed effects: - a hashCode() implementation is not completely deterministic (e.g. depends on Object#hashCode()) - an equals() implementation is buggy - the equals() is only used for HashMap/Set lookups => Now, in rare cases where the (hashCode() % mapSize) collide, objects end up in the same bucket and can be confused. If equals() is not often used outside of HashMaps, this can go undetected for a long time. A way to make this more likely to occur is to make hashCode() return a constant. That is legal but will kill performance by turning all O(1) operations into O(n).
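A sketch of the suggested experiment (hypothetical Key class, not a JDT type): a constant hashCode() is contract-legal because equal objects still agree on their hash, but it funnels every entry into one bucket, so each lookup degrades to a linear equals() scan - which is exactly what makes bucket collisions, and any equals() bug they would expose, happen on every access:

```java
import java.util.HashMap;
import java.util.Map;

public class ConstantHashDemo {
    // Hypothetical key: correct equals(), degenerate hashCode().
    static final class Key {
        final String name;
        Key(String name) { this.name = name; }

        @Override public boolean equals(Object o) {
            return o instanceof Key && ((Key) o).name.equals(this.name);
        }

        // Legal per the contract: equal objects still return equal hash
        // codes. All entries collide into one bucket, so every lookup
        // becomes an O(n) scan driven entirely by equals().
        @Override public int hashCode() { return 0; }
    }

    static boolean lookupsStillWork() {
        Map<Key, Integer> map = new HashMap<>();
        for (int i = 0; i < 100; i++) {
            map.put(new Key("k" + i), i);
        }
        // Every get() must still find its entry via the equals() chain.
        for (int i = 0; i < 100; i++) {
            if (!Integer.valueOf(i).equals(map.get(new Key("k" + i)))) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(lookupsStillWork());
    }
}
```

If a buggy equals() were lurking, this setup would exercise it on every single lookup instead of only on rare hash collisions.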
(In reply to comment #74) > The fact that this problem occurs on IBM and Oracle VMs doesn't really speak > for a JIT bug. AFAIK, the JIT compilers of these VMs are completely separate > implementations, so it's unlikely that they share the same bug. I agree that it calls for too much of a coincidence for a very similar bug to show up in two VMs, but stranger things have been known to happen. But while we continue on this line of inquiry, we will not close our minds to other parallel tracks that we can explore. Both on this bug and its "duplicate" we have asked for some specific experiments to be done which we feel will decisively rule out the JIT angle if that is indeed not the culprit. > I think you should have another look at hashCode()/equals() implementations in > lookup environments (flow analysis, bindings, ...). A possible scenario that > could lead to the observed effects: Stephan/Ayush, does this ring a bell as far as the flow analysis code that underwent changes for 3.8 is concerned ? In my debugging exercises, I found the window in which the flip from good to bad happens to be extremely narrow, and the object that gets corrupted all of a sudden is simply being passed around to various methods (i.e. is not stored and retrieved) - nevertheless, what you outline is a worthy hypothesis to check out. It would also be a worthwhile exercise to turn off resource leaks analysis and see if the problem still shows up (assuming that all significant code chunks have been protected by an option-check). (3.8 changed the flow analysis in two major ways: (a) null annotations support which is default off and (b) leaks analysis which is default on) - I would wait for the jit thesis to be disproved first though. We do have the catch that not sighting the problem is proof of the problem being fixed - given how sporadic it is proving to be. 
The other point is we also need to explain the flood of error messages that we get in some of the reports - These spurious problems are reported across several different files, several different methods. If the flow info data for a method gets corrupted when we are analyzing well inside a method, I would not expect cascading errors in completely different methods or files: This particular behavior would nicely be explained by a JIT bug - once some heavily used method got badly optimized, most/many/all subsequent uses of the method could trigger an additional instance of the problem. All that said, the skepticism you express is healthy and justified.
(In reply to comment #75) > thesis to be disproved first though. We do have the catch that not sighting > the problem is proof of the problem being fixed - given how sporadic it > is proving to be. Obviously I meant: "not sighting the problem is NOT proof of the problem being fixed".
I totally agree with comment 74 and comment 75. (In reply to comment #75) > > I think you should have another look at hashCode()/equals() implementations in > > lookup environments (flow analysis, bindings, ...). A possible scenario that > > could lead to the observed effects: This is a very good point. > Stephan/Ayush, does this ring a bell as far as the flow analysis code > that underwent changes foe 3.8 are concerned ? No hashCode() implementations have been added lately. A Map and a Set have been introduced for the resource leak analysis, but both are only used for reporting a problem after it has been detected. This is not a good match for the observed behavior, but it must be investigated anyway... > It would also be a worthwhile exercise to turn off resource leaks analysis > and see if the problem still shows up (assuming that all significant code > chunks have been protected by an option-check). I'd be much relieved to *know* that this part is innocent. Unfortunately, as you say, we can never prove absence of this error. I guess we're at the point where "likeliness" isn't a useful hint any more. Unfortunately, for my part a systematic investigation of some 200 occurrences of Set/Map and implementations of hashCode() inside the compiler will have to wait till after EclipseCon.
BTW: while torturing WTP's JVM in an Object Teams build I got this in the JIT log:

....
+ (no-opt) java/util/IdentityHashMap.access$200(Ljava/util/IdentityHashMap;)I @ AF7C1F10-AF7C1F2E
+ (no-opt) java/util/IdentityHashMap$IdentityHashMapIterator.hasNext()Z @ AF7C1F50-AF7C200E
+ (no-opt) java/util/IdentityHashMap$KeyIterator.next()Ljava/lang/Object; @ AF7C2030-AF7C209C
+ (no-opt) java/util/IdentityHashMap$IdentityHashMapIterator.nextIndex()I @ AF7C20B0-AF7C21F3
+ (no-opt) java/util/IdentityHashMap.access$600(Ljava/lang/Object;)Ljava/lang/Object; @ AF7C2210-AF7C2237
+ (no-opt) java/util/IdentityHashMap.unmaskNull(Ljava/lang/Object;)Ljava/lang/Object; @ AF7C2250-AF7C2294
+ (no-opt) java/util/logging/LogManager.reset()V @ AF7C22B0-AF7C2508
+ (no-opt) java/util/logging/LogManager.getLoggerNames()Ljava/util/Enumeration; @ AF7C2530-AF7C2594
+ (no-opt) java/util/logging/LogManager.resetLogger(Ljava/lang/String;)V @ AF7C25B0-AF7C2753
+ (no-opt) java/util/logging/Logger.getHandlers()[Ljava/util/logging/Handler; @ AF7C2770-AF7C27EC
+ (no-opt) java/util/IdentityHashMap$KeySet.iterator()Ljava/util/Iterator; @ AF7C2810-AF7C289B
+ (no-opt) java/util/IdentityHashMap$KeyIterator.<init>(Ljava/util/IdentityHashMap;Ljava/util/IdentityHashMap$1;)V @ AF7C28B0-AF7C28DE
+ (no-opt) java/util/IdentityHashMap$KeyIterator.<init>(Ljava/util/IdentityHashMap;)V @ AF7C28F0-AF7C2928
+ (no-opt) java/util/IdentityHashMap$IdentityHashMapIterator.<init>(Ljava/util/IdentityHashMap;Ljava/util/IdentityHashMap$1;)V @ AF7C2950-AF7C297E
! java/util/IdentityHashMap$IdentityHashMapIterator.<init>(Ljava/util/IdentityHashMap;)V Error code=FFFFFFFF

This happened with -Xjit:verbose,vlog=ibmjvm.log,count=1,optLevel=noOpt and disappeared with -Xint.
Satyam, please attach the test projects, ant scripts etc. you are experimenting with here - this will allow Stephan to critique them as well as play around with them. (Let us err on the side of over-communicating/documenting rather than under-communicating.)
(In reply to comment #78) > BTW: while torturing WTP's JVM in an Object Teams build I got this in the JIT > log: [...] > java/util/IdentityHashMap$IdentityHashMapIterator.<init>(Ljava/util/IdentityHashMap;)V > Error code=FFFFFFFF What exactly is this showing, Stephan ? > This happened with -Xjit:verbose,vlog=ibmjvm.log,count=1,optLevel=noOpt and > disappeared with -Xint. This shows noOpt != Xint (see comment#72). (In reply to comment #77) [...] > > It would also be a worthwhile exercise to turn off resource leaks analysis > > and see if the problem still shows up (assuming that all significant code > > chunks have been protected by an option-check). > > I'd be much relieved to *know* that this part is innocent. Unfortunately, as > you say, we can never prove absence of this error. There is one outcome that could clarify matters - if resource analysis is turned off and the problem still shows up, we will know the problem is certainly elsewhere (assuming all code is protected). We can ask for this experiment to be done after the current set of requests have been attended to.
(In reply to comment #80) > > java/util/IdentityHashMap$IdentityHashMapIterator.<init>(Ljava/util/IdentityHashMap;)V > > Error code=FFFFFFFF > > What exactly is this showing, Stephan ? I haven't the slightest. Except that JIT wasn't happy (and code failed to run - a class could not be loaded). > > This happened with -Xjit:verbose,vlog=ibmjvm.log,count=1,optLevel=noOpt and > > disappeared with -Xint. Using a slightly less drastic setting (count=9) only moved the error to: ! java/util/logging/Logger.getHandlers()[Ljava/util/logging/Handler; Error code=FFFFFFFF
Created attachment 212729 [details] Test project being used to reproduce the problem Here is the project that I am trying to run on some linux machines to reproduce. The build.cmd is the command line that I am running to reproduce the problem. Note that the VM referenced in build.cmd is the previous VM. Now, I am running with the b147 Sun JDK7. Please let me know if you want to get some more code added.
(In reply to comment #82) > Please let me know if you want to get some more code added. We may have to use full source files as opposed to skeleton versions. Eliminating fields from a class shifts the local slot ids, I think. Likewise, there could be other side effects if we try to work with simpler cases to begin with. Can we start with the offending source files in their entirety and pull in their dependencies one by one to form the minimal but complete file set ?
(In reply to comment #83) I agree. As of now CompositeId.java is in full but not the other files. Sure, I will try to add that stuff. Thanks.
Wow, this VM (ibm-java2-i386-50-SR12FP5) is a fragile beast; playing a bit more with its jit options I frequently see the VM die like this:

[java] +0x48c99
[java] +0x1ec60
[java] /opt/public/webtools/apps/ibm-java2-i386-50-SR12FP5/jre/bin/libj9jit23.so(+0x1612fb
[java] +0x1612fb
[java] +0x161222
[java] +0x16117c
[java] +0x161033
[java] +0x160f3c
[java] +0x13f0f4
[java] +0x13e638
[java] +0x13e0e8
[java] +0x86384
[java] +0x861a2
[java] +0x4b554
[java] +0x4b1c7
[java] +0x1e50a
[java] +0x4a996
[java] +0x49a2b
[java] +0x1e50a
[java] +0x4980e
[java] +0x3365
[java] +0x67b9
[java] clone+0x5e
[java] Unhandled exception
[java] Type=Segmentation error vmState=0x000507ff
[java] J9Generic_Signal_Number=00000004 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000001
[java] Handler1=F74EACFE Handler2=F74ABB69 InaccessibleAddress=00000014
[java] EDI=BCBA2740 ESI=BCBA25EC EAX=00000000 EBX=BCBA2740
[java] ECX=BC5DA128 EDX=BCBA25EC
[java] EIP=F70D4B9F ES=002B DS=002B ESP=BCC88920
[java] EFlags=00210246 CS=0023 SS=002B EBP=BCC88968
[java] Module=/opt/public/webtools/apps/ibm-java2-i386-50-SR12FP5/jre/bin/libj9jit23.so
[java] Module_base_address=F6F73000
[java]
[java] Method_being_compiled=org/eclipse/jdt/internal/compiler/lookup/BlockScope.getBinding([[CILorg/eclipse/jdt/internal/compiler/lookup/InvocationSite;Z)Lorg/eclipse/jdt/internal/compiler/lookup/Binding;
[java] Target=2_30_20110623_85457_lHdSMr (Linux 2.6.32.45-0.3-default)

It'll be hard to force JITing in the desired way if the VM simply responds with segv.
(In reply to comment #85) > It'll be hard to force JITing in the desired way if the VM simply responds with > segv. Just so, we are on the same page, could you show the full command line and settings ? I assume you are using the scorching mode in a blanket fashion ? The documentation from comment#68, has some information on how to set optlevel for specific methods IIRC - so we should be able to opt out of the minefields - though it could get cumbersome.
(In reply to comment #74) [...] > I think you should have another look at hashCode()/equals() implementations in > lookup environments (flow analysis, bindings, ...). A possible scenario that > could lead to the observed effects: I investigated this trail some more and here are some observations. Test setup:
- Start with the I20120228-0800 build.
- Import JDT/Core as a source project.
- Create a Java application launch configuration with org.eclipse.jdt.internal.compiler.batch.Main as the main class and -source 6.0 c:\CompositeId.java as arguments. [Need to check what source level I should actually be using]
Between CompilationUnitDeclaration.analyseCode() and CompilationUnitDeclaration.generateCode() (i.e. the window in which the problem surfaces) I had a breakpoint in all equals methods including Object's, and in all hashCode methods except Object's - I couldn't plant a method entry breakpoint in the native method Object.hashCode for some reason, but since Object.equals had a breakpoint, we should be good. I could see equals & hashCode called in the following call stacks only.
UnresolvedReferenceBinding(ReferenceBinding).hashCode() line: 873
SimpleLookupTable.get(Object) line: 68
LookupEnvironment.updateCaches(UnresolvedReferenceBinding, ReferenceBinding) line: 1501
UnresolvedReferenceBinding.setResolvedType(ReferenceBinding, LookupEnvironment) line: 84
LookupEnvironment.createBinaryTypeFrom(IBinaryType, PackageBinding, boolean, AccessRestriction) line: 686
LookupEnvironment.createBinaryTypeFrom(IBinaryType, PackageBinding, AccessRestriction) line: 676
Compiler.accept(IBinaryType, PackageBinding, AccessRestriction) line: 298
LookupEnvironment.askForType(char[][]) line: 127
UnresolvedReferenceBinding.resolve(LookupEnvironment, boolean) line: 54
BinaryTypeBinding.resolveType(TypeBinding, LookupEnvironment, boolean) line: 131
BinaryTypeBinding.superInterfaces() line: 1329
BinaryTypeBinding.hasTypeBit(int) line: 998
ParameterizedTypeBinding.hasTypeBit(int) line: 636
FakedTrackingVariable.isAnyCloseable(TypeBinding) line: 502
LocalDeclaration.analyseCode(BlockScope, FlowContext, FlowInfo) line: 82
MethodDeclaration.analyseCode(ClassScope, FlowContext, FlowInfo) line: 110
TypeDeclaration.internalAnalyseCode(FlowContext, FlowInfo) line: 710
TypeDeclaration.analyseCode(CompilationUnitScope) line: 255
CompilationUnitDeclaration.analyseCode() line: 111
Compiler.process(CompilationUnitDeclaration, int) line: 776
ProcessTaskManager.run() line: 137
Thread.run() line: 736

BinaryTypeBinding(Object).equals(Object) line: 58
SimpleLookupTable.get(Object) line: 71
LookupEnvironment.updateCaches(UnresolvedReferenceBinding, ReferenceBinding) line: 1501
UnresolvedReferenceBinding.setResolvedType(ReferenceBinding, LookupEnvironment) line: 84
LookupEnvironment.createBinaryTypeFrom(IBinaryType, PackageBinding, boolean, AccessRestriction) line: 686
LookupEnvironment.createBinaryTypeFrom(IBinaryType, PackageBinding, AccessRestriction) line: 676
Compiler.accept(IBinaryType, PackageBinding, AccessRestriction) line: 298
LookupEnvironment.askForType(char[][]) line: 127
UnresolvedReferenceBinding.resolve(LookupEnvironment, boolean) line: 54
BinaryTypeBinding.resolveType(TypeBinding, LookupEnvironment, boolean) line: 131
BinaryTypeBinding.superInterfaces() line: 1329
BinaryTypeBinding.hasTypeBit(int) line: 998
ParameterizedTypeBinding.hasTypeBit(int) line: 636
FakedTrackingVariable.isAnyCloseable(TypeBinding) line: 502
LocalDeclaration.analyseCode(BlockScope, FlowContext, FlowInfo) line: 82
MethodDeclaration.analyseCode(ClassScope, FlowContext, FlowInfo) line: 110
TypeDeclaration.internalAnalyseCode(FlowContext, FlowInfo) line: 710
TypeDeclaration.analyseCode(CompilationUnitScope) line: 255
CompilationUnitDeclaration.analyseCode() line: 111
Compiler.process(CompilationUnitDeclaration, int) line: 776
ProcessTaskManager.run() line: 137
Thread.run() line: 736

UnresolvedReferenceBinding(ReferenceBinding).hashCode() line: 873
SimpleLookupTable.get(Object) line: 68
LookupEnvironment.updateCaches(UnresolvedReferenceBinding, ReferenceBinding) line: 1501
UnresolvedReferenceBinding.setResolvedType(ReferenceBinding, LookupEnvironment) line: 84
LookupEnvironment.createBinaryTypeFrom(IBinaryType, PackageBinding, boolean, AccessRestriction) line: 686
LookupEnvironment.createBinaryTypeFrom(IBinaryType, PackageBinding, AccessRestriction) line: 676
Compiler.accept(IBinaryType, PackageBinding, AccessRestriction) line: 298
LookupEnvironment.askForType(char[][]) line: 127
UnresolvedReferenceBinding.resolve(LookupEnvironment, boolean) line: 54
BinaryTypeBinding.resolveType(TypeBinding, LookupEnvironment, boolean) line: 131
BinaryTypeBinding.superclass() line: 1303
BinaryTypeBinding(ReferenceBinding).isUncheckedException(boolean) line: 1264
LoopingFlowContext(FlowContext).checkExceptionHandlers(TypeBinding[], ASTNode, FlowInfo, BlockScope) line: 328
MessageSend.analyseCode(BlockScope, FlowContext, FlowInfo) line: 117
MessageSend(Expression).analyseCode(BlockScope, FlowContext, FlowInfo, boolean) line: 227
MessageSend.analyseCode(BlockScope, FlowContext, FlowInfo) line: 68
LocalDeclaration.analyseCode(BlockScope, FlowContext, FlowInfo) line: 92
Block.analyseCode(BlockScope, FlowContext, FlowInfo) line: 39
WhileStatement.analyseCode(BlockScope, FlowContext, FlowInfo) line: 119
MethodDeclaration.analyseCode(ClassScope, FlowContext, FlowInfo) line: 110
TypeDeclaration.internalAnalyseCode(FlowContext, FlowInfo) line: 710
TypeDeclaration.analyseCode(CompilationUnitScope) line: 255
CompilationUnitDeclaration.analyseCode() line: 111
Compiler.process(CompilationUnitDeclaration, int) line: 776
ProcessTaskManager.run() line: 137
Thread.run() line: 736

UnresolvedReferenceBinding(Object).equals(Object) line: 58
SimpleLookupTable.get(Object) line: 71
LookupEnvironment.updateCaches(UnresolvedReferenceBinding, ReferenceBinding) line: 1519
UnresolvedReferenceBinding.setResolvedType(ReferenceBinding, LookupEnvironment) line: 84
LookupEnvironment.createBinaryTypeFrom(IBinaryType, PackageBinding, boolean, AccessRestriction) line: 686
LookupEnvironment.createBinaryTypeFrom(IBinaryType, PackageBinding, AccessRestriction) line: 676
Compiler.accept(IBinaryType, PackageBinding, AccessRestriction) line: 298
LookupEnvironment.askForType(char[][]) line: 127
UnresolvedReferenceBinding.resolve(LookupEnvironment, boolean) line: 54
BinaryTypeBinding.resolveType(TypeBinding, LookupEnvironment, boolean) line: 131
BinaryTypeBinding.superclass() line: 1303
BinaryTypeBinding(ReferenceBinding).isUncheckedException(boolean) line: 1264
LoopingFlowContext(FlowContext).checkExceptionHandlers(TypeBinding[], ASTNode, FlowInfo, BlockScope) line: 328
MessageSend.analyseCode(BlockScope, FlowContext, FlowInfo) line: 117
MessageSend(Expression).analyseCode(BlockScope, FlowContext, FlowInfo, boolean) line: 227
MessageSend.analyseCode(BlockScope, FlowContext, FlowInfo) line: 68
LocalDeclaration.analyseCode(BlockScope, FlowContext, FlowInfo) line: 92
Block.analyseCode(BlockScope, FlowContext, FlowInfo) line: 39
WhileStatement.analyseCode(BlockScope, FlowContext, FlowInfo) line: 119
MethodDeclaration.analyseCode(ClassScope, FlowContext, FlowInfo) line: 110
TypeDeclaration.internalAnalyseCode(FlowContext, FlowInfo) line: 710
TypeDeclaration.analyseCode(CompilationUnitScope) line: 255
CompilationUnitDeclaration.analyseCode() line: 111
Compiler.process(CompilationUnitDeclaration, int) line: 776
ProcessTaskManager.run() line: 137
Thread.run() line: 736

Nothing stands out at a first glance - but a few observations/questions:
(1) So if the hashCode/equals hypothesis from comment#74 were to hold true, then the data structure concerned has to be the SimpleLookupTables maintained by LookupEnvironment and updated at LookupEnvironment.updateCaches.
(2) The comments in LookupEnvironment.updateCaches say
// hashCode is based on compoundName so this works - cannot be raw since type of parameterized type
This is blatantly not so - see ReferenceBinding.hashCode() - though I don't see this contributing to a problem.
(3) Short-circuiting leak analysis on option: This is not fully effective. In several places, there is code of the form:
boolean shouldAnalyseResource = this.binding != null
    && flowInfo.reachMode() == FlowInfo.REACHABLE
    && FakedTrackingVariable.isAnyCloseable(this.initialization.resolvedType)
    && currentScope.compilerOptions().analyseResourceLeaks;
The isAnyCloseable call potentially executes a lot of code, walks the type hierarchy, causes various updates to the LookupEnvironment etc.
(4) A bunch of types override equals without overriding hashCode, but nothing obviously amiss: BinaryField, BinaryType, CompilationUnit, ExternalJavaProject, ImportContainer, ImportDeclaration, JarIndexLocation, JavaModel, SourceRefElement, SourceType, StackMapFrameCodeStream$ExceptionMarker
hashCode/equals of JavaElement and subtypes look OK to me. Only PackageFragmentRoot#equals(Object) is a bit hard to understand, since it just does an "instanceof PackageFragmentRoot" check and then proceeds even if o is of a subtype of PackageFragmentRoot. But I think it's safe, since only resource==null could yield a conflict, but this can only happen in subtypes, and both subtypes override equals(..). For bindings, ArrayBinding#hashCode() and WildcardBinding#hashCode() look problematic, since their #swapUnresolved(..) methods change the hashCode() during the lifetime of the object in some cases. However, I don't know enough about the call sequences there to judge whether that's relevant.
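To make the hazard concrete: the sketch below (hypothetical MutableKey type, not actual JDT code) shows what goes wrong when hashCode() changes while an object is in use as a hash-table key - the situation #swapUnresolved(..) could create for ArrayBinding and WildcardBinding. The entry remains in the bucket chosen for the old hash, so a later lookup with the same key silently misses.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical key whose hashCode() can change after construction,
// mimicking what #swapUnresolved(..) could do to a binding's hash.
class MutableKey {
    int id;
    MutableKey(int id) { this.id = id; }
    @Override public int hashCode() { return this.id; }
    @Override public boolean equals(Object o) {
        return o instanceof MutableKey && ((MutableKey) o).id == this.id;
    }
}

public class HashCodeMutationDemo {
    public static void main(String[] args) {
        Map<MutableKey, String> table = new HashMap<>();
        MutableKey key = new MutableKey(1);
        table.put(key, "binding");
        System.out.println(table.get(key)); // prints binding
        key.id = 2;                         // hash changes while key is in the map
        System.out.println(table.get(key)); // prints null: entry stranded in old bucket
    }
}
```

Whether this actually bites depends on whether a mutated binding is ever used as a key after the mutation, which is exactly the call-sequence question raised above.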
(In reply to comment #87)
> (1) So if the hashCode/equals hypothesis from comment#74 were to hold
> true, then the data structure concerned has to be the SimpleLookupTables
> maintained by LookupEnvironment and updated at LookupEnvironment.updateCaches
Not so. The above is true as far as Object's method overrides are concerned, but we also have hashCode and equals methods that differ in signature, such as CharOperation.hashCode(char[]), that are referenced in custom hash tables and need to be studied too.
Created attachment 212822 [details]
Patch to force hashCode methods to return 0

(In reply to comment #74)
> A way to make this more likely to occur is to make hashCode() return a
> constant. That is legal but will kill performance by turning all O(1)
> operations into O(n).

This patch forces all hashCode methods to return 0. I ran the JDT/Core tests with it - it triggered a bunch of "failures" where basically the order of expected messages and actual messages differs due to the different discovery order resulting from hashCode collisions - none of these is a real issue. No flow analysis related anomaly was triggered - However the test suite is really made of numerous independent small tests and is not a good data point for the type of exercise at hand.

We would like the SDK built with a compiler that has these changes. Kim/David, how can we help you and make it easy for you to carry out this experiment for us? Some alternatives:
(1) Releng can apply this patch, build a compiler and use that to build the SDK on an experimental basis (not the official builds)
(2) We can release this change into master, wait for a nightly build to happen and then roll back the changes - Releng can then use the nightly build bits to build the SDK.
(3) We can build ecj.jar with these changes and hand over the batch compiler that you can run and report back with
(4) Any other alternatives?
Please let us know what would work best for you -
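For what it's worth, the legality claim above can be demonstrated in isolation. The sketch below (hypothetical key class, not part of the patch) shows that a constant hashCode() keeps hash tables fully correct - it merely collapses all keys into one bucket, degrading O(1) lookups to linear scans and perturbing any order that depends on hash-based discovery.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical key with a constant hashCode(): legal per the
// equals/hashCode contract, but every key lands in the same bucket.
class ConstantHashKey {
    final String name;
    ConstantHashKey(String name) { this.name = name; }
    @Override public int hashCode() { return 0; } // the experiment: all keys collide
    @Override public boolean equals(Object o) {
        return o instanceof ConstantHashKey && ((ConstantHashKey) o).name.equals(this.name);
    }
}

public class ConstantHashDemo {
    public static void main(String[] args) {
        Map<ConstantHashKey, Integer> table = new HashMap<>();
        for (int i = 0; i < 100; i++)
            table.put(new ConstantHashKey("k" + i), i);
        // correctness is preserved; only lookup cost and iteration order change
        System.out.println(table.get(new ConstantHashKey("k42"))); // prints 42
    }
}
```

This is why the experiment only surfaces bugs that depend on collision handling or iteration order, rather than breaking anything outright.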
(In reply to comment #90) > No flow analysis related anomaly was triggered - However the test suite > is really made of numerous independent small tests and is not a good data > point for the type of exercise at hand. > > We would like the SDK built with a compiler that has these changes. > ... Why bother releng with this exercise? With constant hashCode() aren't we now looking for a deterministic, platform independent variant of the bug? Anyway, I applied the patch and compiled a SDK-like set of projects in a runtime workbench. None of our suspects were sighted. Am I missing anything?
(In reply to comment #91)
> Why bother releng with this exercise? With constant hashCode() aren't we now
> looking for a deterministic, platform independent variant of the bug?
Yes, but there could still be some pattern in the SDK code base that triggers it. Every one of the reports has come from Eclipse SDK builds, plain vanilla or with additional functionality.
> Anyway, I applied the patch and compiled a SDK-like set of projects in a
> runtime workbench. None of our suspects were sighted.
I don't know what is in this SDK-like set of projects - given that two sightings were from vanilla SDK builds, undertaking one full build with the constant-hash-code compiler would allow us to validate some theories. (I realize that even a full good build does not necessarily rule out the hashCode/equals problem hypothesis - the constant hashCode only makes it much more likely for the bug to show up, it does not make it a certainty.) I agree such requests are burdensome on another project - but we have had no success in the past recreating a full SDK build locally.
(In reply to comment #82) I also tried building the full plugin org.eclipse.equinox.p2.updatesite (errors popped up in this plugin in bug 372944 comment 11) around 100,000 times without any luck in reproducing the problem.
We have found a concurrency bug in the resource leak analysis functionality that explains most of the problems we have been seeing. Or to put it conservatively, at least one major cause for all the weird error messages has been uncovered; whether there are other similar bugs is something to be investigated.

Looking at the code of FakedTrackingVariable.getNullStatusAggressively, there is a chunk that looks like this:

private int getNullStatusAggressively(LocalVariableBinding local, FlowInfo flowInfo) {
    int reachMode = flowInfo.reachMode();
    int status = 0;
    try {
        // unreachable flowInfo is too shy in reporting null-issues, temporarily forget reachability:
        if (reachMode != FlowInfo.REACHABLE)
            flowInfo.tagBits &= ~FlowInfo.UNREACHABLE;
        status = flowInfo.nullStatus(local);
    } finally {
        // reset
        flowInfo.tagBits |= reachMode;
    }

If the flowInfo parameter happens to refer to the static object FlowInfo.DEAD_END, then in the window of time between when we "temporarily forget reachability" and when we "reset" it in the finally block, other threads would observe the static object FlowInfo.DEAD_END in a bad state. org.eclipse.jdt.core.tests.compiler.regression.ResourceLeakTests.test056e() is one test that shows that the parameter can indeed refer to the static object DEAD_END.

By initializing the static object DEAD_END incorrectly, i.e. instead of

//---- right stuff
public static final UnconditionalFlowInfo DEAD_END;
static {
    DEAD_END = new UnconditionalFlowInfo();
    DEAD_END.tagBits = UNREACHABLE;
}

if we now deliberately initialize

// ---- deliberate incorrect initialization
public static final UnconditionalFlowInfo DEAD_END;
static {
    DEAD_END = new UnconditionalFlowInfo();
    DEAD_END.tagBits = REACHABLE; // WRONG INIT !!!
}

we are able to force most of the problems to manifest themselves.
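The race described above can be sketched in miniature with hypothetical stand-in types (Info is not the real FlowInfo; the field and constant names merely mirror it). One thread temporarily clears the reachability bit of a shared singleton and restores it in finally, while a concurrent reader can observe the singleton mid-window:

```java
// Miniature sketch of the race: Info is a hypothetical stand-in for
// FlowInfo, DEAD_END for the shared singleton. Not the real JDT code.
public class DeadEndRaceSketch {
    static final int UNREACHABLE = 1;

    static class Info {
        volatile int tagBits;
    }

    static final Info DEAD_END = new Info();
    static { DEAD_END.tagBits = UNREACHABLE; }

    // mirrors the shape of getNullStatusAggressively: clear the bit,
    // do some work, restore it in finally
    static void analyseAggressively(Info flowInfo) {
        int reachMode = flowInfo.tagBits & UNREACHABLE;
        try {
            flowInfo.tagBits &= ~UNREACHABLE; // "temporarily forget reachability"
            Thread.sleep(5);                  // widen the window
        } catch (InterruptedException e) {
            /* nop */
        } finally {
            flowInfo.tagBits |= reachMode;    // "reset"
        }
    }

    public static void main(String[] args) throws Exception {
        Thread mutator = new Thread(() -> analyseAggressively(DEAD_END));
        mutator.start();
        Thread.sleep(2); // a concurrent compile thread reading DEAD_END mid-window:
        System.out.println("seen as reachable mid-window: "
                + ((DEAD_END.tagBits & UNREACHABLE) == 0)); // usually true (timing-dependent)
        mutator.join();
        System.out.println("restored after join: "
                + ((DEAD_END.tagBits & UNREACHABLE) != 0)); // prints true
    }
}
```

Note that within a single thread the mutate-and-restore dance is harmless; only a concurrent observer of the shared object ever sees the bad state, which matches the symptoms below.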
Satyam will shortly document specific cases for each of the four classes of bogus errors we have seen: (1) unreachable code, (2) uninitialized local, (3) blank final already initialized, (4) blank final not initialized. For this bug to manifest, there must be parallel compiles going on in the same VM. JDT/Core by itself does not parallelize compiles, but when compilation happens under the control of Ant or other such build manager tools, the concurrency issue could act up (see comment#8 and comment#46). Otherwise there are no observers for the bad data. This explains why most users are not hitting this bug and why all the reports have come from build masters/releng czars. This code was originally introduced via commit 1bf30b93f1c2f17c02d0cecfa43e877f00d01800 on 2011-09-29 and then got refactored into its present place - from a timeline perspective this spans the entire window, from the very first bug referred to in comment#5 all the way to the most recent incident reports. Satyam, please document how to reproduce each class of wrong diagnostics -- TIA. Stephan, please prepare a patch - it would be great if we could deliver it for this week's I build -- TIA.
(In reply to comment #94)
> Stephan, please prepare a patch - it would be great if we could
> deliver it for this week's I build -- TIA.
While we are at it, could we also make sure that all leak analysis code is under the option check - I noticed quite a few places around the use of FakedTrackingVariable.isAnyCloseable that cause a non-trivial amount of code to be executed checking whether an object is a closeable, only to subsequently check the option, e.g. analyseCode in LocalDeclaration and AllocationExpression. (Such meticulous guarding will help if and when we want to disable an optional feature to see whether a certain problem goes away, and thus provides an escape hatch.)
(In reply to comment #55)
> Created attachment 212432 [details]
> static non-final fields in jdt.core
> Out of curiosity I assembled a list of all static non-final fields in JDT/Core
> (attached, fields from "Messages" classes are omitted). None of these fields
> looked suspicious to me in terms of this bug, most are not part of the compiler
Satyam had closely scrutinized this whole list - the catch turns out to be that FlowInfo.DEAD_END **is** a static final field. It is the referent that is getting corrupted, not the reference, and since there is no way in Java to specify that an object is immutable, there is no compiler-offered guarantee - that also complicated the analysis, and the bug easily slipped through the cracks.
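A tiny illustration of that point - final freezes the reference, not the referent (hypothetical types, not JDT code):

```java
// "final" freezes the reference, not the referent: nothing stops a
// caller from mutating the object that everyone treats as immutable.
public class FinalReferentDemo {
    static class Flags { int tagBits; }

    static final Flags SHARED = new Flags(); // the reference is final ...
    static { SHARED.tagBits = 1; }

    public static void main(String[] args) {
        // SHARED = new Flags(); // would be a compile error
        SHARED.tagBits = 0;      // ... but the referent is freely mutable
        System.out.println(SHARED.tagBits); // prints 0
    }
}
```

So a scan for static non-final fields cannot catch this category at all; only the fields of the referenced object would give it away.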
Bug 374651 shows another case of unexpected compilation errors. In this case there are several static final String constants that the compiler thinks may not have been initialized in this class: http://git.eclipse.org/c/equinox/rt.equinox.p2.git/tree/bundles/org.eclipse.equinox.p2.reconciler.dropins/src/org/eclipse/equinox/internal/p2/reconciler/dropins/Activator.java
(In reply to comment #94)
- All bugs except the 'Unreachable code' bugs could be reproduced with this change that Srikanth mentioned:
######
> By initializing the static object DEAD_END incorrectly, i.e instead of
>
> //---- right stuff
> public static final UnconditionalFlowInfo DEAD_END;
> static {
>     DEAD_END = new UnconditionalFlowInfo();
>     DEAD_END.tagBits = UNREACHABLE;
> }
>
> if we now deliberately initialize
>
> // ---- deliberate incorrect initialization
> public static final UnconditionalFlowInfo DEAD_END;
> static {
>     DEAD_END = new UnconditionalFlowInfo();
>     DEAD_END.tagBits = REACHABLE; // WRONG INIT !!!
> }
>
#######
- 'Unreachable code' in the switch-case block could be reproduced using the following mechanism:
i) Debug the compilation of 'Disassembler.java' through the debugger.
ii) Stop at SwitchStatement.java:90 (caseInits = caseInits.mergedWith(flowInfo.unconditionalInits());)
iii) Change the value of caseInits.tagBits to 0.
iv) Stop at SwitchStatement.java:105 for the break statement.
v) Change the value of caseInits.tagBits back to 3.
This will help to reproduce the case in comment 5.
*** Bug 374651 has been marked as a duplicate of this bug. ***
I looked at all locations modifying
org.eclipse.jdt.internal.compiler.flow.FlowInfo.tagBits
org.eclipse.jdt.internal.compiler.flow.UnconditionalFlowInfo.definiteInits
org.eclipse.jdt.internal.compiler.flow.UnconditionalFlowInfo.potentialInits
The former drives the 'Unreachable code' message, while the latter two drive the (a) uninitialized local, (b) blank final not initialized, and (c) blank final already initialized diagnostics (well, (a), (b) and (c) for the first 64 combined fields + locals, but most of the cases we have been hitting have been tiny methods really).
The only post 3.7 modifier of
org.eclipse.jdt.internal.compiler.flow.FlowInfo.tagBits is
FakedTrackingVariable.getNullStatusAggressively.
The only post 3.7 modifiers of
org.eclipse.jdt.internal.compiler.flow.UnconditionalFlowInfo.definiteInits and
org.eclipse.jdt.internal.compiler.flow.UnconditionalFlowInfo.potentialInits are
org.eclipse.jdt.internal.compiler.flow.UnconditionalFlowInfo.addInfoFrom(FlowInfo, boolean)
org.eclipse.jdt.internal.compiler.flow.UnconditionalFlowInfo.resetAssignmentInfo(int)
Ayush, please review these two methods, to the extent that they change these fields, to make sure all is well. Should the resetAssignmentInfo method also be discarding nullness information?
(In reply to comment #94) > We have found a concurrency bug in the resource leak analysis functionality > that explains most of the problems we have been seeing. Srikanth, congratulations for this truly great catch!! And a big sorry for putting in the bug in the first place. > Stephan, please prepare a patch - it would be great if we could > deliver it for this week's I build -- TIA. I'll pick up work on this in an hour or so ... (In reply to comment #96) > Satyam had closely scrutinized this whole list - the catch turns out to be > that FlowInfo.DEAD_END **is** a static final field. It is the referent that > is getting corrupted and not the reference and since there is no way to > specify in Java that an object is immutable, there is no compiler offered > guarantee - that also complicated analysis and it easily slipped through the > cracks. Very true. When I saw your analysis I started to think of how to simulate immutable objects in Java.
Created attachment 212874 [details]
proposed fix

Here's the tiny proposed fix for this big bug. If flowInfo == DEAD_END, the whole code section has no effect, because DEAD_END carries no null information. A little "if (flowInfo != FlowInfo.DEAD_END)" seems to be all it takes. I'm currently running the tests. Next I will dig through the previous comments and try to construct a meaningful test.
(In reply to comment #100)
> I looked at all locations modifying
>
> org.eclipse.jdt.internal.compiler.flow.FlowInfo.tagBits
>
> [...]
>
> The only post 3.7 modifier of
> org.eclipse.jdt.internal.compiler.flow.FlowInfo.tagBits is
> FakedTrackingVariable.getNullStatusAggressively.
Most modifiers of tagBits closely follow the pattern of exiting early if (this == DEAD_END) or ((this.tagBits & UNREACHABLE_OR_DEAD) != 0). I found only one exception: UFI.unconditionalFieldLessCopy() has no such checks, which looks inconsistent to me. Secondly, in class NullInfoRegistry the check "if (this != DEAD_END)" is actually pointless: DEAD_END is of type UnconditionalFlowInfo, *not* NullInfoRegistry :) I don't think either of these causes harm in the sense of this bug, maybe just wasting a few CPU cycles.
(In reply to comment #102) > Created attachment 212874 [details] > proposed fix > > Here's the tiny proposed fix for this big bug. > [...] > I'm currently running the tests. RunJdtCoreTests answered: PASS
I've also reverted the compiler for tonight's build to the one from M5.
(In reply to comment #105) > I've also reverted the compiler for tonight's build to the one from M5. I'm afraid for safely avoiding this bug you'd have to go all the way back to M2. Not sure if that will trigger other issues (e.g., warnings suppressed in the code which aren't even raised -> spurious compile warnings re unused @SuppressWarnings - or code relying on some recent bug fixes?). In order to upgrade to a compiler *with* this fix, we need a successful I-Build first? Or would it make sense to "sneak in" a hand-made ecj with this fix?
Created attachment 212880 [details]
improved fix

(In reply to comment #95)
> While we are at it, could we also make sure that all leak analysis
> code is under the option check - I noticed quite a few places
> around the use of FakedTrackingVariable.isAnyCloseable that cause
> non-trivial amount of code to be executed checking whether an
> object is a closeable only to subsequently check the option.
> e.g : analyseCode in LocalDeclaration and AllocationExpression.
>
> (Such meticulous guarding will help if and when we want to disable
> an optional feature to see if a certain problem goes away and to
> provide an escape hatch thus.)
In bug 370639 I ensured that with resource analysis disabled, not a single instance of FakedTrackingVariable would ever be created. As you correctly note, the static method isAnyCloseable was not fully covered by checks. I fixed this by simply swapping the order of checks in three locations. With these changes I ran all JDT/Core tests with resource analysis disabled by default and with breakpoints in all constructors and static methods of FakedTrackingVariable. Only one of these breakpoints was ever triggered: FTV.cleanupAfterAssignment(). In this case, adding another check looks like overkill to me. Also, the actual fix has been changed to a simple early exit, to improve readability.
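The effect of swapping the order of checks can be illustrated with a minimal stand-in (hypothetical names; the real guards live in LocalDeclaration and friends): with &&, putting the cheap option flag first short-circuits the expensive closeable check entirely when the analysis is off.

```java
// Hypothetical stand-ins: isAnyCloseable() represents the expensive
// hierarchy walk, analyseResourceLeaks the cheap option flag.
public class GuardOrderDemo {
    static boolean analyseResourceLeaks = false;
    static int expensiveCalls = 0;

    static boolean isAnyCloseable() {
        expensiveCalls++; // counts how often the costly path is taken
        return false;
    }

    public static void main(String[] args) {
        boolean before = isAnyCloseable() && analyseResourceLeaks; // expensive check runs anyway
        boolean after = analyseResourceLeaks && isAnyCloseable();  // && short-circuits: no call
        System.out.println("expensive calls: " + expensiveCalls);  // prints 1
    }
}
```

Beyond performance, this ordering is what makes the option a true escape hatch: with the flag first, disabling the feature guarantees none of its code runs.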
Created attachment 212884 [details]
same patch with a regression test

This patch includes a regression test that triggers one instance of the actual bug in its concurrency-based version. In order to increase the probability I added a guarded sleep(5) into the critical section. With this delay, and with the fix disabled, the bug seems to show deterministically on my machine. I even saw it once without the delay, but for a good regression test this didn't look reliable to me. Let me know if s.o. sees an issue in this kind of test harness in the code:

if (TEST_372319) { // see https://bugs.eclipse.org/372319
    try {
        Thread.sleep(5); // increase probability of concurrency bug
    } catch (InterruptedException e) {
        /* nop */
    }
}
(In reply to comment #106)
> (In reply to comment #105)
> > I've also reverted the compiler for tonight's build to the one from M5.
>
> I'm afraid for safely avoiding this bug you'd have to go all the way back to
> M2.
Stephan, could you dig up the command line switch used to disable the resource analysis and document it here? It can be a workaround until a fix is released and makes its way into the build environment.
(In reply to comment #109)
> (In reply to comment #106)
> > (In reply to comment #105)
> > > I've also reverted the compiler for tonight's build to the one from M5.
> >
> > I'm afraid for safely avoiding this bug you'd have to go all the way back to
> > M2.
>
> Stephan, could you dig up the command line switch used to disable
> the resource analysis and document it here?
Just say: -warn:-resource
> It can be a workaround
> until a fix is released and makes its way into the build environment.
I was going to release the current version of the fix after another run of all tests, to be sure it's included in the next build. Any objections?
I have released the fix via http://git.eclipse.org/c/jdt/eclipse.jdt.core.git/commit/?id=8654e35d625c39f9786ab81699fccb6057b0bc72 Please let me know if more should be done.
(In reply to comment #110)
> > It can be a workaround
> > until a fix is released and makes its way into the build environment.
>
> I was going to release the current version of the fix after another run of all
> tests, to be sure it's included in the next build. Any objections?
I looked through the changes and they look good. Yes, please release them so they get included in today's I build.
Question: New test uses buildUniqueComplianceTestSuite 1.6, but TestAll adds it since 1.5 - Is this an issue? (No idea)
I ran the test a few times on Windoze - it passed with the fix and failed without, consistently, so we seem to be good.
(In reply to comment #108)
> Let me know if s.o. sees an issue in this kind of test harness in the code:
We have precedent for such "testability improvement" code in JDT/Core, so I don't see an issue - see the aptly named org.eclipse.jdt.internal.core.JavaModelManager.doNotUse() :)
Thanks a lot for the quick turnaround on the patch, Stephan.
(In reply to comment #111) > I have released the fix via > http://git.eclipse.org/c/jdt/eclipse.jdt.core.git/commit/?id=8654e35d625c39f9786ab81699fccb6057b0bc72 This patch looks good.
(In reply to comment #112) > I looked through the changes and they look good. Yes, please release them > so they get included for I build today. Jay, FYI.
(In reply to comment #112)
> Question: New test uses buildUniqueComplianceTestSuite 1.6, but
> TestAll adds it since 1.5 - Is this an issue ? (No idea)
Oops, I mostly copied stuff from BatchCompilerTests and only experimented with switching to 1.6 once. But on a closer look the two locations you mention seem to be independent: when run from TestAll it gets invoked for all levels >= 1.5, and when run as a single test in the IDE only 1.6 is picked. That actually looks OK to me.
> Thanks a lot for the quick turn around on the patch Stephan.
Don't mention it. I owe you a big beer or two for this.
(In reply to comment #100) > The only post 3.7 modifiers of > org.eclipse.jdt.internal.compiler.flow.UnconditionalFlowInfo.definiteInits > org.eclipse.jdt.internal.compiler.flow.UnconditionalFlowInfo.potentialInits. > > are > > org.eclipse.jdt.internal.compiler.flow.UnconditionalFlowInfo.addInfoFrom(FlowInfo, > boolean) > org.eclipse.jdt.internal.compiler.flow.UnconditionalFlowInfo.resetAssignmentInfo(int) > > Ayush, please review these two methods to the extent they change these > fields to make sure all is well. Should resetAssignmentInfo method be > also be discarding nullness information ? All looks ok here. The resetAssignmentInfo discards the info corresponding to a loop counter after the loop, so yes both nullness info and inits info can be discarded for that particular variable.
(In reply to comment #116)
> All looks ok here. The resetAssignmentInfo discards the info corresponding to a
> loop counter after the loop, so yes both nullness info and inits info can be
> discarded for that particular variable.
OK. So given that it doesn't discard the nullness info today - could this lead to some trouble somehow?
(In reply to comment #117) > OK. So that it doesn't discard the nullness info today - could this lead > to some trouble somehow ? No, it won't, since no other local is assigned the same id.
We are starting to get "good builds" on our build.eclipse.org 4.2 primary builds. I expect us to have our first "public" build this week (in the next few days) - that is, one we feel comfortable putting on downloads.eclipse.org, even though unit tests won't be running yet. I've opened a "releng" bug to update the compiler in the base builder we are using, but want to wait until after that "first" new I-build. Can one of you JDT Core members comment in bug 376450 when you think "yes, the compiler from I-build xyz is ready for I-build production use"?
(In reply to comment #119) > Can one of you JDT core members comment in bug 376450 when you think "yes, the > compiler from the I-build xyz is ready for I-build production use"? Sure David.
The builds haven't shown these errors for a while, hence marking this as verified. Verified for 3.8 M7.