Bug 161996 - [compiler][batch][options] ecj can't cope with [] brackets in classpath names
Summary: [compiler][batch][options] ecj can't cope with [] brackets in classpath names
Status: VERIFIED FIXED
Alias: None
Product: JDT
Classification: Eclipse Project
Component: Core (show other bugs)
Version: 3.2.1   Edit
Hardware: PC Linux
: P3 normal (vote)
Target Milestone: 3.3 M5   Edit
Assignee: Maxime Daniel CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: 147461
  Show dependency tree
 
Reported: 2006-10-23 15:18 EDT by Ben Konrath CLA
Modified: 2007-02-07 14:22 EST (History)
3 users (show)

See Also:


Attachments
patch against 3.2.1 to fix this problem (5.88 KB, patch)
2006-10-23 15:19 EDT, Ben Konrath CLA
no flags Details | Diff
diagram for new state machine (30.69 KB, image/png)
2006-10-23 15:21 EDT, Ben Konrath CLA
no flags Details
updated patch for 3.2.1 with junit test (7.31 KB, patch)
2007-01-27 00:48 EST, Ben Konrath CLA
no flags Details | Diff
fixed small problem in unit test (7.15 KB, patch)
2007-01-27 00:59 EST, Ben Konrath CLA
no flags Details | Diff
patch for v_732 (N20070127-0010) with and updated test (10.54 KB, patch)
2007-01-29 00:11 EST, Ben Konrath CLA
no flags Details | Diff
diagram for new state machine in v_732 patch (50.90 KB, image/png)
2007-01-29 00:12 EST, Ben Konrath CLA
no flags Details
.dia (Dia diagram) file for 3.2.1 png (2.45 KB, application/octet-stream)
2007-01-29 13:30 EST, Ben Konrath CLA
no flags Details
.dia (Dia diagram) file for 3.3 png (3.41 KB, application/octet-stream)
2007-01-29 13:31 EST, Ben Konrath CLA
no flags Details
Suggested fix + test cases (7.48 KB, patch)
2007-02-01 07:33 EST, Maxime Daniel CLA
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Ben Konrath CLA 2006-10-23 15:18:34 EDT
The jdt.core batch compiler can't deal with [] brackets in classpath names. Square brackets are used for symlinks in jpackage.org packaging conventions. See this bug for some more details: 

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=199961

I filed the bug against 3.2.1 but I think it's a problem in 3.3 as well.
Comment 1 Ben Konrath CLA 2006-10-23 15:19:57 EDT
Created attachment 52549 [details]
patch against 3.2.1 to fix this problem

We will be carrying this patch in Fedora Eclipse.
Comment 2 Ben Konrath CLA 2006-10-23 15:21:08 EDT
Created attachment 52550 [details]
diagram for new state machine 

I whipped up a little diagram of the new state machine.
Comment 3 Olivier Thomann CLA 2006-10-23 16:22:36 EDT
This code has changed quite a bit with the addition og the output folder per path entry (usage of [-d ...] at the end of the source path entries).
I'll try to build up test cases and fix this in HEAD.

Philippe,

Candidate for 3.2.2?

Comment 4 Philipe Mulet CLA 2006-10-24 05:01:19 EDT
Ben - can you add a comment with a sample of the offending command line ?

Olivier - I think this would qualify for 3.2.2, need to see the patch first before blessing.
Comment 5 Olivier Thomann CLA 2006-10-24 10:07:59 EDT
What happened if the brackets are used to qualify a folder name on the command line?
..... test/[folder] ....

How can we find out that this is not an encoding or an access rule? When you have an extension after the bracket, we can make the distintion, but in this case I don't see.
Comment 6 Maxime Daniel CLA 2006-10-25 03:05:51 EDT
This is a problem that exists at the shell level as well for other characters (aka parenthesis for a bash). The way the shell copes with this is to quote such characters. 
Full example:
touch 'ty('
  here '' tells the shell not to interpret the embedded (
cat ty( 
  -> syntax error
cat 'ty('
cat ty\(
cat ty.
  all ok (no contents)
Having said that, I do not believe we could mandate the quoting of square brackets in file names in general, and this would defeat the purpose of quoting when it is needed to cope with the shell itself anyway (like in ecj -cp'/opt/a whitespace/anything.jar').
One option we could explore would be to have an alternative way of specifying encoding character sets (and access rules and output directories) that would not rely upon square brackets, and a specific option to tell ecj not to interpret square brackets at all.
Comment 7 Olivier Thomann CLA 2006-10-31 14:24:36 EST
(In reply to comment #5)
> What happened if the brackets are used to qualify a folder name on the command
> line?
> ..... test/[folder] ....
> 
> How can we find out that this is not an encoding or an access rule? When you
> have an extension after the bracket, we can make the distintion, but in this
> case I don't see.
In order to remove the ambiguity, can we say that a file separator needs to be added after the folder name in case the folder name is within brackets and you want to specify either an encoding or access rules for it or an output folder?

Ben, would this be fine for you?
Comment 8 Maxime Daniel CLA 2006-11-01 05:33:24 EST
What about the src/[utf-8] source folder? Or the src[utf-8] source folder?
Comment 9 Olivier Thomann CLA 2006-11-01 15:32:45 EST
If we have a trailing file separator, we can always make the distinction between encoding, access rules or output folder.
src/[utf-8] would be equivalent to the actual src[utf-8].
If the folder name is not between brackets, then the file separator should not be mandatory.
If you have a folder called "[utf-8]" inside the folder src, then its entry on the source path or classpath would be:
src/[utf-8]/

Any thoughts?

 
Comment 10 Maxime Daniel CLA 2006-11-02 01:38:39 EST
Seems ok to me. Users will need to twist directory lists generators to add the / at the end but this should not be a big deal.
Another alternative to comment #6 would be the following. Let -lsb be a new option, that takes a value of 0 or more (lsb for litteral square bracket). The default would be 0, matching the current behavior. With a value of 1, the following square brackets would be considered as regular characters, while sequences of two square brackets would be considered as a rule separator.
Examples:
ecj src[utf-8]
ecj -lsb 0 src[utf-8]
source directory src encoded with utf-8
ecj -lsb 1 src[utf-8][[-**/impl/*]]
source directory src[utf-8] with access rule -**/impl/*
Comment 11 Ben Konrath CLA 2006-11-03 16:09:35 EST
(In reply to comment #4)
> Ben - can you add a comment with a sample of the offending command line ?

Sorry for the delay, here are examples of the two forms of classpaths that have been problematic for us:

% java -jar eclipse-3.2.1/plugins/org.eclipse.jdt.core_3.2.1.v_677_R32x.jar -cp "[bork].jar" HelloWorld.java
incorrect classpath: [bork].jar

% java -jar eclipse-3.2.1/plugins/org.eclipse.jdt.core_3.2.1.v_677_R32x.jar -cp "[bork][blah].jar" HelloWorld.java
incorrect classpath: [bork][blah].jar
Comment 12 Olivier Thomann CLA 2006-11-03 16:11:58 EST
Would our proposals fix your issues ?
Comment 13 Ben Konrath CLA 2006-11-03 18:00:19 EST
Hi Olivier and Maxime, thanks for looking into this problem.

(In reply to comment #12)
> Would our proposals fix your issues ?

The problem I have with the solutions proposed in comment #7 and comment #10 is that in some cases users would be required to use different command line arguments for the jdt.core batch compiler and a proprietary javac. And this isn't desirable because we would like to use jdt.core batch compiler as a straight replacement for a proprietary javac.

What about splitting up the classpath line by pathSeparator and checking if the file exist before it is sent through the state machine. If the file exits it can be removed from the classpath line and set at that point. The state machine in the patch I provided would have to be modified a little to take the above into account.

This solution would still have a problem if files 'path' and 'path[rule]' existed and a user specified '-cp path[rule]'. In this case there is no way to tell if the user wanted the file 'path' with 'rule' on the classpath or wanted the file 'path[rule]' on the classpath. I think we would need to add the -lsb option proposed in comment #10 to get around this issue. That way, it's at least possible for a user to choose which behaviour they would like. While this still is not exactly the same as a proprietary javac, it would be encountered so infrequently that it shouldn't matter too much. 

Any thoughts on this solution? If this proposal seems acceptable, I'll whip up a new patch. Thanks.
Comment 14 Maxime Daniel CLA 2006-11-06 03:05:12 EST
I wonder if the 'straight replacement' requirement would not deserve a bug of its own. This may drive a need for more non-command line ways to direct the batch compiler (aka enrich environment values and/or profiles à la .your_command_of_choice_rc). At least the 'can we do the straight replacement trick as of today?' question should be asked, this bug possibly being only one of the current naysayers.
Comment 15 Ben Konrath CLA 2007-01-27 00:48:35 EST
Created attachment 57640 [details]
updated patch for 3.2.1 with junit test

I looked at this problem again and tried solve it with the following constraints:

(1) solution works with HEAD and 3.2.x 
(2) solution works with [-].jar and [-][-].jar on the classpath without extra command line flags
(3) solution will make it possible to work with any file on the classpath containing '[' or ']'

I couldn't find a solution that would meet these requirements so I think I'm just going come up with a different solution for 3.3 and drop constraint #3. I don't expect you to take these patches but I'll continue to post what I have to get comments. Thanks.
Comment 16 Ben Konrath CLA 2007-01-27 00:59:17 EST
Created attachment 57641 [details]
fixed small problem in unit test
Comment 17 Ben Konrath CLA 2007-01-29 00:11:32 EST
Created attachment 57669 [details]
patch for v_732 (N20070127-0010) with and updated test

I used the same 'parsing backwards' strategy as the patch to 3.2.1 and again only fixed the problem for the Jpackage.org conventions: [square][bracket].jar and [squarebraket].jar. By parsing the classpath in reverse order, the state machine code can ignore the square  brackets in the path after it encounters the '.jar' section of the path. The tests in BatchCompilerTest (including the tests I added) successfully finish with this patch. Comments are appreciated, thanks.
Comment 18 Ben Konrath CLA 2007-01-29 00:12:58 EST
Created attachment 57670 [details]
diagram for new state machine in v_732 patch
Comment 19 Ben Konrath CLA 2007-01-29 13:30:40 EST
Created attachment 57722 [details]
.dia (Dia diagram) file for 3.2.1 png 

I'm attaching these .dia files so that I don't have to keep them around in my homedir.
Comment 20 Ben Konrath CLA 2007-01-29 13:31:28 EST
Created attachment 57723 [details]
.dia (Dia diagram) file for 3.3 png
Comment 21 Olivier Thomann CLA 2007-01-29 13:43:36 EST
Maxime,

Since you wrote the initial decoding with destination path, I'll let you have a look at the patch from Ben.
Too late for 3.2.2, but could be done for 3.3.
Comment 22 Maxime Daniel CLA 2007-02-01 02:19:10 EST
(In reply to comment #14)
> I wonder if the 'straight replacement' requirement would not deserve a bug of
> its own. This may drive a need for more non-command line ways to direct the
> batch compiler (aka enrich environment values and/or profiles à la
> .your_command_of_choice_rc). At least the 'can we do the straight replacement
> trick as of today?' question should be asked, this bug possibly being only one
> of the current naysayers.
> 
I definitely think this deserves a separate bug because, as of today, we cannot claim to be compatible without breaking our existing clients (-cp dir[+p/q] would be adding 'dir[+p/q]' classpath entry for javac, while it would be adding 'dir' classpath entry with explicit positive access to 'p/q' for ecj, and adding an explicit option to tell ecj to behave as javac just adds a non javac compatible option...). Opened bug 172423 to this effect.

In the context of this bug, I will thus focus on extending our support to cases in which brackets are used in file or directory names of classpath entries (not destination directories), including the specific cases of comment #15, item 2.

First part of the work will be to extend our test cases in that area (we obviously miss some - see below).

Ben, while the reverse state machine idea is quite clever, I'll give a try to a forward automaton on my side and see where I can get before adopting yours. 
If I did not mess things up when testing your patch for 3.3, it reverses classpath entries (aka -cp a;b becomes -cp b;a). I have not tried your patch for 3.2, but it may have the same issue. We definitely need to augment our tests coverage here.
Comment 23 Maxime Daniel CLA 2007-02-01 07:23:12 EST
Released more tests in BatchCompilerTest (some inactive). Added a new helper to specifically test classpath analysis.
Comment 24 Maxime Daniel CLA 2007-02-01 07:33:13 EST
Created attachment 58003 [details]
Suggested fix + test cases

This fix uses a forward automaton which memorizes the opening bracket position until it knows what to do with it. Brackets that are part of a name simply get concatenated in the current classpath name. When the automaton reaches the end of the classpath entry or a path delimiter with an active brackets, it rewinds to the first bracket position and takes the rulesStart state, then completes its job using the same states machine as before.
Olivier, could you please review this patch and let me know what you think?
Comment 25 Maxime Daniel CLA 2007-02-01 10:39:29 EST
Released for 3.3 M5.
Comment 26 Ben Konrath CLA 2007-02-01 10:50:24 EST
(In reply to comment #22)
> Ben, while the reverse state machine idea is quite clever, I'll give a try to a
> forward automaton on my side and see where I can get before adopting yours. 
> If I did not mess things up when testing your patch for 3.3, it reverses
> classpath entries (aka -cp a;b becomes -cp b;a). I have not tried your patch
> for 3.2, but it may have the same issue. We definitely need to augment our
> tests coverage here.

Yep, you're right the 3.2 patch has the same problem. I'll backport your patch to 3.2 so that I can include it with our releases. Thanks for looking into this. 

Comment 27 Maxime Daniel CLA 2007-02-01 13:21:15 EST
You're welcome.
Comment 28 David Audel CLA 2007-02-06 06:21:16 EST
Verified for 3.3 M5 using build I20070206-0010
Comment 29 Olivier Thomann CLA 2007-02-07 11:17:00 EST
Ben,

Is this classpath a possible classpath?
-classpath D:\tests_sources;D:\[test.jar]

I think it should be. This fails right now.

Does the fix improve your cases?
Comment 30 Ben Konrath CLA 2007-02-07 14:22:30 EST
(In reply to comment #29)
> Ben,
> 
> Is this classpath a possible classpath?
> -classpath D:\tests_sources;D:\[test.jar]

Jpackage.org only uses '[name].jar' and '[name1][name2].jar' so we don't see that classpath on our systems by default. I think it would be difficult to fix the classpath you have posted becuase the state machine sees 'test.jar' as a rule. The implementation that I posted didn't fix this problem either. 

If this case is important, perhaps the code could validate the rules before accepting them and check if the classpath is a file it contains rules that are not valid. But even this would not work correctly in every case.