229092 – [batch compiler] error reporting is non-deterministic

Bug 229092 - [batch compiler] error reporting is non-deterministic

Summary: [batch compiler] error reporting is non-deterministic

Status:	VERIFIED FIXED

Alias:	None

Product:	JDT
Classification:	Eclipse Project
Component:	Core (show other bugs)
Version:	3.4
Hardware:	Other Linux

Importance:	P3 normal (vote)
Target Milestone:	3.5 M1
Assignee:	Olivier Thomann
QA Contact:

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2008-04-28 11:35 EDT by Stephan Herrmann
Modified:	2008-08-06 12:44 EDT (History)
CC List:	3 users (show)

See Also:

Attachments
Proposed patch for (1) (2.02 KB, patch) 2008-07-01 12:14 EDT, Philipe Mulet	no flags	Details \| Diff
View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Stephan Herrmann

2008-04-28 11:35:08 EDT

this is in v_853:

When compiling more than one compilation unit the batch
compiler may report errors in all units or only the first.

This non-determinism is caused by a racing between
Main.new ICompilerRequester().acceptResult (line 2689)
and the Compiler.
The "normal" case seems to be reporting errors only of the
first CUD, so here is how you see errors in more than one unit:

Given 2 simple classes

public class C1 {
  void bar(C2 other) {
    other.missing();
  }
}

and

public class C2 {
  void foo() {
    this.wrong();
  }
}

and debug the batch compiler with "-classpath . C1.java".
I have two breakpoints:
a) ProcessTaskManager line 144 (last statement in run())
b) Main line 2705 (in acceptResult of the anonymous requester,
   the line where queuedUnits is defined and assigned)

1. Launch
2. See a hit of a) and in this thread do one F6
3. See the other thread hit b) and do six F6's to see
	queuedUnit contains now C2.java
4. One F8 in the TaskManager's thread compiles C2
5. F8 on the main thread reports 2 errors.

This may look contrived but it actually occurred as the difference
between running the same code on two different machines.

Which one is the intended behavior?

Comment 1 Kent Johnson

2008-04-28 16:30:55 EDT

What's your command line ?

And you're sure that C2.class is deleted prior to each test ?

Comment 2 Stephan Herrmann

2008-04-28 17:06:26 EDT

(In reply to comment #1)
> What's your command line ?

From my .launch:
<stringAttribute key="org.eclipse.jdt.launching.MAIN_TYPE" value="org.eclipse.jdt.internal.compiler.batch.Main"/>
<stringAttribute key="org.eclipse.jdt.launching.PROGRAM_ARGUMENTS" value="-classpath . C1.java"/>

> And you're sure that C2.class is deleted prior to each test ?

Yes, actually no class files were ever generated (no -proceedOnError).
If it would still exist, seeing only one error would of course be correct. 

The second error can actually be missed in two ways:

1. Thread TaskProcessManager being quicker than main and emptying
   the list of units to process before logging will pick them up,

2. After picking up the queuedUnits thread main could be too quick
   such as to find the compilationResult still empty when reporting.

Comment 3 Kent Johnson

2008-04-29 16:11:04 EDT

Errors for multiple units can be reported because of the change for bug 123476.

Even using our single threaded batch compiler, which removes the race condition, I still only get the error for the first unit.

Need to look into it more.

Comment 4 Kent Johnson

2008-04-29 16:27:39 EDT

Ok - this is the trace from the batch compiler before we added the multi-threaded support.

[parsing    C1.java - #1/1]
[reading    java/lang/Object.class]
[analyzing  C1.java - #1/1]
[parsing    D:\temp6\.\C2.java - #2/2]
----------
1. ERROR in C1.java (at line 3)
        other.missing();
              ^^^^^^^
The method missing() is undefined for the type C2
----------
1 problem (1 error)


Since C1 only needs the type C2 to exist, C2.java is not completely processed before C1 reports its error.

Now with the latest batch compiler, I get :

[parsing    C1.java - #1/1]
[reading    java/lang/Object.class]
[analyzing  C1.java - #1/1]
[parsing    D:\temp6\.\C2.java - #2/2]
[analyzing  D:\temp6\.\C2.java - #2/2]
----------
1. ERROR in C1.java (at line 3)
        other.missing();
              ^^^^^^^
The method missing() is undefined for the type C2
----------
1 problem (1 error)

And if my machine was quicker, its reasonable to expect C2.java to be finished processing so it had errors to report before C1.java reported its errors.

I'll look into protecting the unitsToProcess during accept, but with the change for bug 123476, I cannot see how we can guarantee that errors for other processed units will not be reported with the errors for the first unit.

If it matters to you that the errors are ALWAYS repeatable in this case, then you can always disable the multi-threaded compiler by setting the system property jdt.compiler.useSingleThreaded=true

Comment 5 Kent Johnson

2008-05-01 13:43:11 EDT

So with the multi-threaded compiler, we cannot guarantee that the reported errors will always be the same.

Different machines & hard drives will process units at different speeds, so as long as proceedOnError is false, the number of errors reported may not be the same.

If its important that you get the same errors each time, you can :

- enable proceedOnError so the compiler deals with every source file

- or, disable the multi-threaded compiler

Is that ok ?

Comment 6 Stephan Herrmann

2008-05-05 12:40:56 EDT

(In reply to comment #5)

> [...] If its important that you get the same errors each time, you can :
> 
> - enable proceedOnError so the compiler deals with every source file
> 
> - or, disable the multi-threaded compiler
> 
> Is that ok ?

Both tricks work and either one is OK for me.
Thanks!

Will the provisional system property still be supported in 3.4 final?
BTW.: it's 
	-Djdt.compiler.useSingleThread=true
not
	-Djdt.compiler.useSingleThreaded=true
;-)

Since non-deterministic behavior of a compiler may not be the most
normal thing to expect, where would one search for documentation
of this behavior (ie., reporting secondary errors that happen to be
detected before the compiler shuts down)?

Comment 7 Kent Johnson

2008-05-07 15:21:12 EDT

Yes the system property (-Djdt.compiler.useSingleThread=true) will be supported in 3.4.

Philippe - do we have any doc on the error reporting behaviour of the batch compiler?

Comment 8 Maxime Daniel

2008-05-13 06:16:26 EDT

(In reply to comment #5)
> So with the multi-threaded compiler, we cannot guarantee that the reported
> errors will always be the same.
Personally, I'd love to get the same errors, at least if we decided to run our regression tests on various hardware with multi-threaded operations, expecting the same results as we got before?

Comment 9 Philipe Mulet

2008-05-13 09:37:45 EDT

The problem only arises when not using -proceedOnError... I believe our tests do use the resilient mode.

Comment 10 Maxime Daniel

2008-05-13 10:28:33 EDT

I admit that they tend to. However, this is a manual option for batch tests (some may have escaped), and I have seen quite a few policy classes that return false for this (not looked how much they were used though).
Question: will the errors be guaranteed to be in the same order in case we use proceedOnError (but not useSingleThread)?

Comment 11 Philipe Mulet

2008-05-13 11:31:41 EDT

Yes, normally they should, since units are always processed in same order.
If you let the compiler consumes the entire queue of units, then you should get errors in exact same order.

If the compiler aborts because it is not proceeding on errors, then it will look for queued units, and dump the errors for these as well (as they may contain important primary error info). The problem is that because of background processing, you may have more or less units in the queue when abort decision is made. When proceed on error is enabled, then no abort decision is made, and the queue is exhausted entirely.

Once we start processing units in parallel, we may see ordering issues again, but this is beyond 3.4.

Comment 12 Maxime Daniel

2008-05-14 02:02:25 EDT

If I get you well, using proceedOnError is expected to buy us reproducibility in 3.4 timeframe, but the option is mandated, and we may change this in 3.5. Last but not least, while we have a workaround, we admit that in at least one known scenario (which is the default by the way) we will emit a random subset of the errors that we'd have got with proceedOnError.
Considering the situation, I would contend that we'd better keep the bug open for 3.5 (preferred), or else resolve it as WONTFIX.

Comment 13 Philipe Mulet

2008-05-16 03:14:29 EDT

I would keep it open for later investigation.

Even if our behavior is tolerable for 3.4 given the workaround, I believe we have a real issue which we need to tackle at some point.

Reopening but not for 3.4

Comment 14 Kent Johnson

2008-05-16 10:03:45 EDT

2 issues:

1. Our default behaviour (-proceedOnError is not enabled) should potentially change to report all errors for all source files that were compiled. So for example, if the command line has only X.java, but we request Y.java & Z.java to find types Y & Z, then we should report all errors found in X, Y & Z.

This would remove the race condition we have now where we check to see what units have problems to report, while accepting the first problem unit.


2. When the multi-threaded compiler starts to process units in several threads & as a result, cannot predict the order that units are processed, we could keep all problems associated with each unit & dump them in a sorted order by default (say using the full path of the unit).

We can no longer predict what order the additional source units (those not included on the command line) will be requested/found. On some machines, Y.java may be the 2nd unit found/processed and on another machine it may be the 3rd. So using the requested id to sort is not a solution.

Comment 15 Philipe Mulet

2008-07-01 12:06:16 EDT

Re: 1. Problem is located in Main#getBatchRequestor(...) where it treats proceedOnError = false mode as a condition to abort on first file with errors.
This is stricter than it should be. It should only record the fact there were some errors, and let it continue. I also checked that the IErrorHandlingPolicy used by batch compiler allows to accumulate all errors as well.

Comment 16 Philipe Mulet

2008-07-01 12:14:12 EDT

Created attachment 106243 [details]
Proposed patch for (1)

This allows the batch compiler to process all files in queue, and only exit at the very end; instead of in the middle, with more or less errors detected up to that point.

Comment 17 Philipe Mulet

2008-07-01 12:15:53 EDT

Stephan - could you apply the patch to 3.4.0 and assess whether this does you some good ? 

I believe (2) would be a separate request if we were to support parallel unit processing, which isn't the case right now (i.e. this bug could be closed independantly)

Comment 18 Stephan Herrmann

2008-07-04 16:56:30 EDT

(In reply to comment #17)
> Stephan - could you apply the patch to 3.4.0 and assess whether this does you
> some good ?

Yes, judging from one run of our test-suite without the single thread switch
all expected errors are actually reported.

Comment 19 Philipe Mulet

2008-07-05 05:29:23 EDT

Wondering if the simple patch couldn't be released for 3.4.1.
Any client to the batch compiler may be seeing this issue as a regression over 3.3.

Comment 20 Philipe Mulet

2008-07-05 05:30:06 EDT

Candidating for 3.4.1

Comment 21 Philipe Mulet

2008-07-08 12:02:29 EDT

Talking with Kent, we may not want to inject it in 3.4 maintenance at this point.

Comment 22 Philipe Mulet

2008-07-08 12:11:29 EDT

Olivier - Could you please release this patch into HEAD (3.5) with regression tests ?

Comment 23 Olivier Thomann

2008-07-22 15:12:18 EDT

I cannot write a regression test as is since the modified code is not run when inside the test harness because we don't want to exit once the compiler is done.
So I'll simply release the patch.
Verification should be done by looking at the source code.

Comment 24 Kent Johnson

2008-08-06 12:44:10 EDT

Verified for 3.5M1 using I20080805-1307