Bug 368152 - ConcurrentModificationException on startup in ExternalFoldersManager.createPendingFolders
Summary: ConcurrentModificationException on startup in ExternalFoldersManager.createPe...
Status: VERIFIED FIXED
Alias: None
Product: JDT
Classification: Eclipse Project
Component: Core (show other bugs)
Version: 3.7.1   Edit
Hardware: PC Windows 7
: P3 normal (vote)
Target Milestone: 3.8 M6   Edit
Assignee: Jay Arthanareeswaran CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-01-09 08:10 EST by Martin Oberhuber CLA
Modified: 2017-11-27 14:43 EST (History)
8 users (show)

See Also:
satyam.kandula: review+
stephan.herrmann: review+


Attachments
Proposed fix (1.58 KB, patch)
2012-01-09 09:52 EST, Jay Arthanareeswaran CLA
no flags Details | Diff
Proposed patch (4.22 KB, patch)
2012-01-20 22:47 EST, Jay Arthanareeswaran CLA
no flags Details | Diff
Updated patch (2.64 KB, patch)
2012-01-21 01:15 EST, Jay Arthanareeswaran CLA
no flags Details | Diff
new fix (4.62 KB, patch)
2012-02-14 07:47 EST, Jay Arthanareeswaran CLA
jarthana: review?
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Martin Oberhuber CLA 2012-01-09 08:10:20 EST
Build ID: 3.7.1 M20110909-1335

When launching Eclipse with an existing workspace, I got an error dialog

   An internal error occurred during "Initialize Java Tooling"

and following backtrace was listed in the error log:

java.util.ConcurrentModificationException
	at java.util.HashMap$HashIterator.nextEntry(HashMap.java:793)
	at java.util.HashMap$KeyIterator.next(HashMap.java:828)
	at org.eclipse.jdt.internal.core.ExternalFoldersManager.createPendingFolders(ExternalFoldersManager.java:171)
	at org.eclipse.jdt.core.JavaCore.initializeAfterLoad(JavaCore.java:3614)
	at org.eclipse.jdt.internal.ui.InitializeAfterLoadJob$RealJob.run(InitializeAfterLoadJob.java:36)
	at org.eclipse.core.internal.jobs.Worker.run(Worker.java:54)
Comment 1 Jay Arthanareeswaran CLA 2012-01-09 08:54:07 EST
Looks like it's unlikely to be the same thread that is modifying while iterating. Access to pendingFolders might have to be synchronized. I will work on it.
Comment 2 Jay Arthanareeswaran CLA 2012-01-09 09:52:21 EST
Created attachment 209211 [details]
Proposed fix

One of the possible scenario that could have caused this is, classpath getting recomputed, thus forcing a folder being removed or added, while initializAfterLoad is running.

Martin, can you tell me if something changed between eclipse sessions that may have kicked off classpath recomputation, such as like workspace being moved etc.? 

The patch attached should be good enough for the concurrent modifications issues.
Comment 3 Jay Arthanareeswaran CLA 2012-01-09 09:53:13 EST
Satyam, could you spare some time to review this. Please note that I have not added any tests.
Comment 4 Martin Oberhuber CLA 2012-01-09 12:06:30 EST
(In reply to comment #2)
> Martin, can you tell me if something changed between eclipse sessions that may
> have kicked off classpath recomputation, such as like workspace being moved
> etc.? 

Hm... I don't think so. I definitely 
  - didn't move my workspace, 
  - didn't update any files in my workspace since the last launch
and I'm pretty sure I also didn't change any configuration (like installed host or target platform or JRE).
Comment 5 Satyam Kandula CLA 2012-01-12 03:51:13 EST
(In reply to comment #3)
> Satyam, could you spare some time to review this. Please note that I have not
> added any tests.
The changes look good.
Comment 6 Jay Arthanareeswaran CLA 2012-01-20 00:11:08 EST
Thanks, Satyam, I have released the fix in master.

http://git.eclipse.org/c/jdt/eclipse.jdt.core.git/commit/?id=de08ae3ac42ace8070739ebb7715adc6c4ed1304
Comment 7 Markus Keller CLA 2012-01-20 13:54:26 EST
This caused a deadlock, see bug 369251. Please revert for M5.
Comment 8 Stephan Herrmann CLA 2012-01-20 15:45:18 EST
From locking at the stack dump in bug 369251 and this bug's patch I suppose instead of synchronizing the entire loop we should just fetch a copy of pendingFolders for iteration, wipe the original set and release the lock before starting the actual loop, no?

Alternatively, we would have to acquire all necessary scheduling rules before entering the synchronized block, but that would be more difficult to implement (if it's possible) because the resources are created only inside the loop AFAICS.

Unfortunately, I wouldn't immediately know how to test such a change.
Comment 9 Stephan Herrmann CLA 2012-01-20 15:53:32 EST
Illustration of comment 8:

	Iterator iterator;
	synchronized (this.pendingFolders) {
		iterator = new ArrayList(this.pendingFolders).iterator();
		this.pendingFolders.clear();
	}
	while (iterator.hasNext()) {
		Object folderPath = iterator.next();
		try {
			createLinkFolder((IPath) folderPath, false, externalFoldersProject, monitor);
		} catch (CoreException e) {
			Util.log(e, "Error while creating a link for external folder :" + folderPath); //$NON-NLS-1$
		}
	}
Comment 10 Jay Arthanareeswaran CLA 2012-01-20 22:47:54 EST
Created attachment 209869 [details]
Proposed patch

This patch is to revert the previous change and add a new schedule rule while creating the pending folders from JavaCore. The result of this is as follows:

1. Obviously, bug 369251 and the bug reported here get fixed.
2. There is one more scenario where the race condition (similar to the one reported here) might occur - from the JavaProject#addToResult(). However, it happens only in when the user tries to change the classpath when the pending folders get created. 

Keeping the pendingFolders as a synchronizedSet or introducing a schedule rule in the thread executing addToResult() could have handled it (2), but there are two reasons for not doing that:
   a) It's a rare scenario and is different than what is reported here. We can create a new bug for that.
   b) The JavaProject@addToResult() is often associated with a UI thread, which means it can't be blocked, which can happen with either of the above mentioned approaches.

Stephan, can you please review this patch? Please also note that I have moved the addFolder and createPendingFolders calls together in JavaCore for simplicity sake. This shouldn't affect the earlier fix made in that area.
Comment 11 Jay Arthanareeswaran CLA 2012-01-20 22:56:45 EST
(In reply to comment #10)
> Created attachment 209869 [details]
> Proposed patch

Another important side effect that I forgot to mention is this: if there is any job that obtains a lock on workspace root, then initializeAfterLoadJob must wait for that job and vice versa. However, the lock itself is obtained for a very short time and not during the entire job. This would address the many concurrent thread issues we faced in this area in the past in a better way.
Comment 12 Srikanth Sankaran CLA 2012-01-21 00:25:59 EST
(In reply to comment #10)
> Created attachment 209869 [details]
> Proposed patch
> 
> This patch is to revert the previous change and add a new schedule rule while
> creating the pending folders from JavaCore.

Thanks for putting together an alternate patch quickly.

Just so this is an informed/considered decision, a few questions regarding
the problem outlined in comment#0:

- Is this a ship stopper for M5 ? Or is it something
  that can wait ?
- When did the root cause for the faulty behavior get introduced ?
  Is it a regression, if so which fix caused it ? when ?
  What is the earliest version that shows the faulty behavior ?
- Is there a crisp scenario description for triggering the bug ?
- Are there any workarounds (to prevent triggering) ? How reasonable 
  are they ?
- How likely is the bug's manifestation (per educated guess) ?

If the comment#0  problem is not due to recent behavior change
it looks like a no-brainer decision to me to opt for a roll back
of the problematic fix and consider the alternate patch for inclusion
in early M6.
Comment 13 Srikanth Sankaran CLA 2012-01-21 00:29:48 EST
(In reply to comment #12)

> If the comment#0  problem is not due to recent behavior change
> it looks like a no-brainer decision to me to opt for a roll back
> of the problematic fix and consider the alternate patch for inclusion
> in early M6.

In any case, if we don't have clear agreement on the new fix
by the time of the Sunday warm up build, we should revert the
problematic fix so as not to block partner project's testing.
Comment 14 Jay Arthanareeswaran CLA 2012-01-21 00:47:12 EST
(In reply to comment #12)
> - Is this a ship stopper for M5 ? Or is it something
>   that can wait ?
    The bug itself is not a show-stopper
> - When did the root cause for the faulty behavior get introduced ?
>   Is it a regression, if so which fix caused it ? when ?
>   What is the earliest version that shows the faulty behavior ?

    The code causing the bug was introduced as part of the fix to bug 320618, which was released in 3.7.

> - Is there a crisp scenario description for triggering the bug ?
> - Are there any workarounds (to prevent triggering) ? How reasonable 
>   are they ?
> - How likely is the bug's manifestation (per educated guess) ?

    It's a timing issue and not always reproducible. To reproduce one may have to suspend thread and such. One of the workarounds I can think of is closing and opening the projects again. The bug is likely to manifest when working with large workspace, esp. when something has changed in the workspace between sessions, something like bug 369251, comment #0.
Comment 15 Jay Arthanareeswaran CLA 2012-01-21 01:15:35 EST
Created attachment 209871 [details]
Updated patch

I have split the previous patch into two parts:

1. Reverting the changes made in the first page; I have already released this under bug 369251 just so that HEAD is clean.
2. This patch, an alternate fix to the bug reported here. Stephan, please review this patch. TIA.
Comment 16 Srikanth Sankaran CLA 2012-01-21 06:37:28 EST
(In reply to comment #14)


>     The code causing the bug was introduced as part of the fix to bug 320618,
> which was released in 3.7.

For the record, that was for 3.7 M2. So this bug has been lying
dormant for 16+ months. That makes me think if there is any question
at all around the new fix as it is being reviewed, it can safely
be moved out to early M6.
Comment 17 Stephan Herrmann CLA 2012-01-21 11:27:52 EST
(In reply to comment #10)
> Created attachment 209869 [details]
> Proposed patch
> ...
> Stephan, can you please review this patch? 

Honestly, *if this is for M5* I don't see how I could quickly establish sufficient confidence that the latest patch solves the original problem without causing new issues. Nothing I can judge by a quick look at the code. I'd definitely want to test the patch back to back with the previous state, but for this I would need to reproduce the bug at least *sometimes* in the debugger.

If, however, reverting the previous patch is good enough for M5 and the latest patch shall go into M6 I'll do my best to collaborate with Jay in understanding how to trigger the problem and based on that understanding: how to fix.

As an aside: should we succeed in deterministically forcing the bug using breakpoints in the debugger, I should be able to create a regression test: I recently made experiments how OT/J can force a given execution order of concurrent executions. Even if that's not directly deployable to the SDK build, we could run such regression tests locally at certain points in time.
Comment 18 Dani Megert CLA 2012-01-21 11:57:44 EST
Let's not hurry here.
Comment 19 Srikanth Sankaran CLA 2012-01-31 05:01:28 EST
(In reply to comment #17)
> (In reply to comment #10)
> > Created attachment 209869 [details]
> > Proposed patch
> > ...
> > Stephan, can you please review this patch? 
> 
> Honestly, *if this is for M5* I don't see how I could quickly establish
> sufficient confidence that the latest patch solves the original problem without
> causing new issues.

Stephan, could you look at this now for inclusion in early M6 ? Thanks.
Comment 20 Srikanth Sankaran CLA 2012-02-03 01:40:48 EST
Satyam, could you also review this please ? TIA.
Comment 21 Srikanth Sankaran CLA 2012-02-07 02:16:53 EST
Satyam and Stephan, I know you are guys are very busy, sorry for nagging
you on this: We should try and get this in early in the mile stone period,
we have already seen two I builds go by. Seeking your indulgence, TIA.
Comment 22 Satyam Kandula CLA 2012-02-08 09:39:15 EST
(In reply to comment #10)
>    b) The JavaProject@addToResult() is often associated with a UI thread, which
> means it can't be blocked, which can happen with either of the above mentioned
> approaches.
I see that JavaProject@addToResult() is called from JavaProject#resolveClasspath(). Looking at the call stack, I believe this can be also called from IJavaProject#getResolvedClasspath(...).

I think the logic could be improved to lock only if they are some folders that really need to be created. IMO, this will the most prevalent case.
Comment 23 Jay Arthanareeswaran CLA 2012-02-14 07:47:35 EST
Created attachment 210971 [details]
new fix

(In reply to comment #22)
> I see that JavaProject@addToResult() is called from
> JavaProject#resolveClasspath(). Looking at the call stack, I believe this can
> be also called from IJavaProject#getResolvedClasspath(...).

Satyam, I can't confirm that plugins cannot request for classpath resolution before JavaCore#initializeAfterLoad(), even though I haven't found such a scenario. So, for hypothetical scenarios, I admit we have to consider the other scenario also. The new fix does the following:

1. Obviously the new schedule rule introduced in the previous patch is not mandatory. It could be useful, but not in the context of this bug esp. with the following two changes.
2. The elements of the pendingFolders map are copied to a temporary array and processed and the map is cleared right after copying the elements into the array. This is what Stephan had suggested in his earlier comment.
3. The createLinkFolder calls have been moved out of the synchronized block to keep away the deadlock issue.
Comment 24 Satyam Kandula CLA 2012-02-15 01:39:21 EST
(In reply to comment #23)
> Created attachment 210971 [details]
> new fix
This patch looks good for me. It will fix the current scenario and will also not run into a deadlock.
Comment 25 Jay Arthanareeswaran CLA 2012-02-15 06:26:24 EST
(In reply to comment #24)
> This patch looks good for me. It will fix the current scenario and will also
> not run into a deadlock.

Stephan, if you have time, take a look at this patch. I know you are busy and Satyam has already reviewed the patch. Besides the patch is a straight forward one. So, if you don't have time, that's okay.
Comment 26 Jay Arthanareeswaran CLA 2012-02-16 22:54:06 EST
Thanks, Satyam. Released the fix here: 

http://git.eclipse.org/c/jdt/eclipse.jdt.core.git/commit/?id=e4d4ff5013f9df9fd57cdef7bafc039c1cd3f062
Comment 27 Srikanth Sankaran CLA 2012-02-16 23:11:19 EST
(In reply to comment #25)
> (In reply to comment #24)
> > This patch looks good for me. It will fix the current scenario and will also
> > not run into a deadlock.
> 
> Stephan, if you have time, take a look at this patch. 

Stephan, despite the resolution you are welcome to look the patch
if you manage to find time for it.
Comment 28 Satyam Kandula CLA 2012-03-14 06:53:40 EDT
Verified for 3.8M6 using build I20120312-1300
Comment 29 Jörg Thönnes CLA 2012-04-27 10:09:27 EDT
Hello, is this also fixed for 3.7.2?

Currently, I get this error on every startup and have no idea to workaround it.
Does it do any harm?
Comment 30 Dani Megert CLA 2012-04-27 11:26:58 EDT
(In reply to comment #29)
> Hello, is this also fixed for 3.7.2?

Take a look at the 'target milestone' to get the answer.
Comment 31 Stephan Herrmann CLA 2015-09-15 14:25:47 EDT
Setting a requested +1 flag that was more than overdue :/