Bug 141522 - [compiler][batch] ClassFile#buildAllDirectoriesInto should protect itself against concurrent directory creation
Summary: [compiler][batch] ClassFile#buildAllDirectoriesInto should protect itself aga...
Status: VERIFIED FIXED
Alias: None
Product: JDT
Classification: Eclipse Project
Component: Core (show other bugs)
Version: 3.2   Edit
Hardware: PC All
: P3 normal (vote)
Target Milestone: 3.2.1   Edit
Assignee: Maxime Daniel CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2006-05-12 07:47 EDT by Maxime Daniel CLA
Modified: 2006-09-12 03:55 EDT (History)
1 user (show)

See Also:


Attachments
Patch plus partial test case (8.37 KB, patch)
2006-05-12 10:19 EDT, Maxime Daniel CLA
no flags Details | Diff
Fix with test cases (8.40 KB, patch)
2006-05-15 04:52 EDT, Maxime Daniel CLA
no flags Details | Diff
Fix + test cases (10.54 KB, patch)
2006-05-26 09:23 EDT, Maxime Daniel CLA
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Maxime Daniel CLA 2006-05-12 07:47:40 EDT
Eclipse 3.2 RC4 (source v_666)
The method cited above uses (twice) the following pattern to create new directories:
File f = new File(expectedPath);
if (!f.exists()) {
  if (!f.mkdir()) { // breakpoint
    Throw new IOException(...);
  }
}
The problem with this is that a concurrent process may create f between the calls to exists and to mkdir. A way to confirm this is to call the batch compiler with an explicit output directory (to avoid the reuse of the source directories) under the control of the debugger, having set a breakpoint where shown above. Start two compilations of a given class defined in a deep package. Wait their reaching the breakpoint. Complete one of the compilations. Restart the other one: it fails, saying something like:
'No .class file created for file a/b/c/d/e/f/g/h1/X.class in /tmp/test/output1 because of an IOException: The output directory 'a' is not a valid directory name. The directory cannot be created'.
The automation of the test case is kind of problematic (needs to instrument the code), but the scenario above shows that the problem may surface.

A simple fix would be to test again for existence before complaining, like:
File f = new File(expectedPath);
if (!f.exists()) {
  if (!f.mkdir() && !f.exists()) { // f may have been created by another process
    Throw new IOException(...);
  }
}
This only costs in case mkdir fails.

Note that in case of deletion, the end result should be an error anyway. Hence we only need to fix the concurrent successful compilation case.

The example cited above is clearly contrived, but this may be a realistic problem in environments using massive parallelism to build large/complex libraries.
Comment 1 Maxime Daniel CLA 2006-05-12 07:48:57 EDT
Discussed with Philippe: we'll fix that in 3.2.1 timeframe.
Comment 2 Maxime Daniel CLA 2006-05-12 10:19:42 EDT
Created attachment 41311 [details]
Patch plus partial test case

This patch:
- proposes a slightly more sophisticated fix than the one described above;
- tweaks output directory related messages;
- adds a couple of test cases in BatchCompilerTest to illustrate those errors
  that were not present in test cases yet.
However, given the nature of the problem, no test case is provided that would guard us against a regression.
Full tests under progress so far.
Comment 3 Maxime Daniel CLA 2006-05-15 04:34:03 EDT
Tests still being polished to accomodate platform differences.

Another important consideration is that the proposed solution could be optimized for the 'no problem creation' scenario. Before this can be done, the following should be evaluated:
- distribution of creation demands, weighting each type of subdirectories path (typically empty vs non empty, average depth, extreme depth, etc.), and multiple demands for a given path; this will drive the optimization effort (a radically different approach should be taken if multiple demands are the norm);
- relative cost of single call to mkdirs and on-demand repeated calls to mkdir for significant distributions of paths and existing paths.
The whole idea is to super optimize the no problem case in the context of the typical usage distribution, then add any needed diagnostic refinements to the failure case (in other words, if the creation fails, then try to determine why - at extra cost if needed).
Comment 4 Maxime Daniel CLA 2006-05-15 04:52:33 EDT
Created attachment 41431 [details]
Fix with test cases

Same as previous but fixes tests for Win* file systems.
Could still optimize the no problem scenario.
Comment 5 Maxime Daniel CLA 2006-05-16 07:51:16 EDT
Independant tests show a 4x time advantage to a straight mkdirs in the no problem case (that is, we neither get a file masking a directory nor a permission denied exception).
I would contend that a no frills mkdir be performed first, then error diagnostics be completed if/as needed in the faulty cases.
Another point of interest is that we may consider normalizing paths once and
for all at some point (despite the potential need for original name bookeeping for reporting errors).

Olivier, what do you think?
Comment 6 Olivier Thomann CLA 2006-05-16 10:29:10 EDT
Looks good.
Comment 7 Maxime Daniel CLA 2006-05-26 09:23:24 EDT
Created attachment 42701 [details]
Fix + test cases

This patch implements the strategy suggested above: go first for no-frills mkdirs; if it fails, do step by step creation attempts so as to refine diagnostics.
I would still suggest that we consider refactoring a bit around paths management. Jerome told me about the way we do that for external jars in the model area. May be worthwhile to align our policies (basics is that we are tolerant to various user input, but we standardize as soon as possible on the current platform conventions - aka \ for Win*, / for unices - and we report errors with normalized names - i.e. we accept to present the user with paths that are not the ones he entered, but that are conformant to the platform conventions, and point to the same locations as the paths he entered; we may also want to consider canonical paths as well - or else expose the rationale for not doing so).
Comment 8 Maxime Daniel CLA 2006-05-26 09:29:52 EDT
Fixed and released in TARGET_321.
Comment 9 Frederic Fusier CLA 2006-06-12 05:18:26 EDT
Released for 3.2.1
Released for 3.3 M1 while merging TARGET_321 in HEAD
Comment 10 Frederic Fusier CLA 2006-08-07 12:33:41 EDT
Verified for 3.3 M1 using build I20060807-0010 (only code review).
Comment 11 Frederic Fusier CLA 2006-09-12 03:55:06 EDT
Verified for 3.2.1 using build M20060908-1655 (only code review)