Bug 40498 - StringMatcher.java could not be read (build all fails)
Status: RESOLVED WORKSFORME
Alias: None
Product: JDT
Classification: Eclipse Project
Component: Core
Version: 3.0
Hardware: PC Linux-GTK
Importance: P3 major
Target Milestone: 3.0 M8
Assignee: Platform-VCM-Inbox CLA
 
Reported: 2003-07-18 14:52 EDT by Douglas Pollock CLA
Modified: 2004-02-17 13:04 EST (History)

Attachments
Eclipse Log (53.37 KB, text/plain)
2003-07-24 10:50 EDT, Douglas Pollock CLA
Patch for org.eclipse.ui.views (1.15 KB, patch)
2003-07-28 09:19 EDT, Douglas Pollock CLA
Patch for org.eclipse.ui.workbench (1.17 KB, patch)
2003-07-28 09:19 EDT, Douglas Pollock CLA

Description Douglas Pollock CLA 2003-07-18 14:52:35 EDT
The machine is a recently built RH9 box, running KDE, and Eclipse using GTK. 
Trying to build platform-ui from CVS (importing the other pieces as binary
projects), I get a failure that complains:

"The project was not built since the source file
/org.eclipse.ui.workbench/Eclipse
UI/org/eclipse/ui/internal/misc/StringMatcher.java could not be read."

When I first tried to open the file, it complained that it was not valid UTF-8.
 I switched to ASCII, and now it opens fine.  The build still fails.


STEPS TO REPRODUCE:
1.) Install the latest I20030717 build on a RH9 box under KDE.
2.) Open Eclipse, add a CVS perspective, and add a CVS resource pointing to
dev.eclipse.org (anonymous)
3.) Checkout platform-ui
4.) Import all the other Eclipse stuff as binary projects.
5.) Rebuild all.

OBSERVED RESULTS:
29 problems (24 errors).  The errors are all traced back to the error mentioned
above.  Opening StringMatcher at this point should cause problems.
Comment 1 Douglas Pollock CLA 2003-07-22 14:17:49 EDT
Found this in the log.  There are multiple entries, all the same.  Destroying
the project and checking it out again does nothing.  Neither does updating.  I
am now using the M2 build; problem is still present.

!STACK 1
org.eclipse.core.internal.resources.ResourceException: Resource is out of sync with the file system: /org.eclipse.ui.workbench/Eclipse UI/org/eclipse/ui/CVS/Root.
	at java.lang.Throwable.<init>(Throwable.java)
	at java.lang.Throwable.<init>(Throwable.java)
	at org.eclipse.core.runtime.CoreException.<init>(CoreException.java:35)
	at org.eclipse.core.internal.resources.ResourceException.<init>(ResourceException.java:30)
	at org.eclipse.core.internal.localstore.FileSystemResourceManager.read(FileSystemResourceManager.java:406)
	at org.eclipse.core.internal.resources.File.getContents(File.java:214)
	at org.eclipse.core.internal.resources.File.getContents(File.java:204)
	at org.eclipse.team.internal.ccvs.core.util.SyncFileWriter.readFirstLine(SyncFileWriter.java:398)
	at org.eclipse.team.internal.ccvs.core.util.SyncFileWriter.readFolderSync(SyncFileWriter.java:171)
	at org.eclipse.team.internal.ccvs.core.resources.EclipseSynchronizer.cacheFolderSync(EclipseSynchronizer.java)
	at org.eclipse.team.internal.ccvs.core.resources.EclipseSynchronizer.getFolderSync(EclipseSynchronizer.java)
	at org.eclipse.team.internal.ccvs.core.resources.EclipseFolder.isCVSFolder(EclipseFolder.java)
	at org.eclipse.team.internal.ccvs.core.resources.EclipseFolder.isIgnored(EclipseFolder.java)
	at org.eclipse.team.internal.ccvs.core.resources.EclipseFolder.members(EclipseFolder.java)
	at org.eclipse.team.internal.ccvs.core.resources.EclipseFolder.calculateAndSaveChildModificationStates(EclipseFolder.java:390)
	at org.eclipse.team.internal.ccvs.core.resources.EclipseFolder.isModified(EclipseFolder.java:359)
	at org.eclipse.team.internal.ccvs.core.resources.EclipseFolder.calculateAndSaveChildModificationStates(EclipseFolder.java:394)
	at org.eclipse.team.internal.ccvs.core.resources.EclipseFolder.isModified(EclipseFolder.java:359)
	at org.eclipse.team.internal.ccvs.core.resources.EclipseFolder.calculateAndSaveChildModificationStates(EclipseFolder.java:394)
	at org.eclipse.team.internal.ccvs.core.resources.EclipseFolder.isModified(EclipseFolder.java:359)
	at org.eclipse.team.internal.ccvs.core.resources.EclipseFolder.calculateAndSaveChildModificationStates(EclipseFolder.java:394)
	at org.eclipse.team.internal.ccvs.core.resources.EclipseFolder.isModified(EclipseFolder.java:359)
	at org.eclipse.team.internal.ccvs.ui.CVSLightweightDecorator.isDirty(CVSLightweightDecorator.java:99)
	at org.eclipse.team.internal.ccvs.ui.CVSLightweightDecorator.isDirty(CVSLightweightDecorator.java:112)
	at org.eclipse.team.internal.ccvs.ui.CVSLightweightDecorator.decorate(CVSLightweightDecorator.java:189)
	at org.eclipse.ui.internal.decorators.LightweightDecoratorDefinition.decorate(LightweightDecoratorDefinition.java:158)
	at org.eclipse.ui.internal.decorators.LightweightDecoratorManager$LightweightRunnable.run(LightweightDecoratorManager.java:54)
	at org.eclipse.core.internal.runtime.InternalPlatform.run(InternalPlatform.java)
	at org.eclipse.core.runtime.Platform.run(Platform.java)
	at org.eclipse.ui.internal.decorators.LightweightDecoratorManager.decorate(LightweightDecoratorManager.java)
	at org.eclipse.ui.internal.decorators.LightweightDecoratorManager.getDecorations(LightweightDecoratorManager.java)
	at org.eclipse.ui.internal.decorators.DecorationScheduler$1.run(DecorationScheduler.java)
	at org.eclipse.core.internal.jobs.Worker.run(Worker.java:58)
!ENTRY org.eclipse.core.resources 4 274 Jul 21, 2003 09:49:58.584
!MESSAGE Resource is out of sync with the file system: /org.eclipse.ui.workbench/Eclipse UI/org/eclipse/ui/CVS/Root.
Comment 2 Douglas Pollock CLA 2003-07-24 10:50:57 EDT
Created attachment 5547 [details]
Eclipse Log

A log file showing the actual UTF8 conversion failure.
Comment 3 Douglas Pollock CLA 2003-07-24 11:26:17 EDT
The StringMatcher.java file contains the hexadecimal values 0x91 and 0x92 in
multiple positions.  I don't believe these to be valid UTF-8 encoded characters.
For example, the following sequence of bytes can be seen in vi:

         * pattern which may contain <91>*<92> for 0 and many characters and
         * <91>?<92> for exactly one character.
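
As an illustration of the observation above, here is a minimal Java sketch (not
taken from Eclipse; the class and method names are hypothetical) that asks a
strict UTF-8 decoder whether a byte sequence is well-formed. In UTF-8 the bytes
0x80 through 0xBF can only appear as continuation bytes after a lead byte, so a
lone 0x91 or 0x92 is rejected.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
final class Utf8Check {
    /** Returns true only if the whole byte array decodes as well-formed UTF-8. */
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            // Thrown by the strict decoder on the first malformed sequence.
            return false;
        }
    }

    public static void main(String[] args) {
        // The comment fragment quoted above: 0x91 '*' 0x92
        byte[] fragment = {(byte) 0x91, '*', (byte) 0x92};
        System.out.println(isValidUtf8(fragment)); // false

        byte[] ascii = "'*'".getBytes(StandardCharsets.US_ASCII);
        System.out.println(isValidUtf8(ascii));    // true
    }
}
```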
Comment 4 Douglas Pollock CLA 2003-07-24 11:30:45 EDT
From bash, executing "rm StringMatcher.java; cvs update -d -C
StringMatcher.java" still leaves the strange hexadecimal values in the file.
Comment 5 Douglas Pollock CLA 2003-07-24 12:27:17 EDT
I've confirmed this on a Debian box.  This file displays this way under Linux. 
The characters appear as left and right quotes under Windows, as well as in the
Mozilla browser on Linux.  However, on a Linux terminal it displays as an
invalid UTF-8 character (both uxterm and xterm).  In Eclipse, it complains that
it is not valid UTF-8.

The last person to edit this file must have used Windows and inserted these
characters, which Windows happens to encode as 0x91 and 0x92.  However, 0x91 and
0x92 are not valid UTF-8 characters, and hence Linux complains.  Why does
Mozilla display it properly?  (font?  special handler code?)
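
One likely explanation, sketched here under the assumption that the file was
saved on Windows in the windows-1252 code page: in that code page 0x91 and 0x92
are the left and right single quotation marks (U+2018 and U+2019), whereas
ISO-8859-1 maps the same bytes to the C1 control characters U+0091 and U+0092,
which have no visible glyph. The class name is hypothetical, and windows-1252
is looked up by name because the Java platform does not guarantee that every
JVM supports it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
final class QuoteBytes {
    /** Decodes the bytes with the given charset and lists the resulting code points. */
    static String codePoints(byte[] bytes, Charset charset) {
        String text = new String(bytes, charset);
        StringBuilder out = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(String.format("U+%04X", cp));
        });
        return out.toString();
    }

    public static void main(String[] args) {
        byte[] quotes = {(byte) 0x91, (byte) 0x92};
        // Assumption: this JVM ships the windows-1252 charset (most do).
        System.out.println(codePoints(quotes, Charset.forName("windows-1252"))); // U+2018 U+2019
        System.out.println(codePoints(quotes, StandardCharsets.ISO_8859_1));     // U+0091 U+0092
    }
}
```

Browsers commonly decode pages labelled as Latin-1 using windows-1252, which
may be why Mozilla shows quotes while a UTF-8 terminal reports invalid input.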

This is really two problems.  CVS seems to contain a file that is not valid
UTF-8.  The eclipse core should escape those bytes before storing them to the
file system.  (But wouldn't eclipse core use Java's IO libraries to do this anyway?)

You could probably also point a finger at Linux' UTF-8 locale implementation,
but it does seem to match the specification.
Comment 6 Dani Megert CLA 2003-07-25 04:23:06 EDT
Moving to Platform UI since they own that particular copy of the string matcher.
Comment 7 Douglas Pollock CLA 2003-07-28 09:18:35 EDT
The problem occurs in a second (duplicate?) StringMatcher class located in
"org.eclipse.ui.views".  I'm supplying patches for both projects.  Note that
this does not fix the problem of how 0x91 and 0x92 ended up in CVS in the first
place.
Comment 8 Douglas Pollock CLA 2003-07-28 09:19:30 EDT
Created attachment 5564 [details]
Patch for org.eclipse.ui.views
Comment 9 Douglas Pollock CLA 2003-07-28 09:19:51 EDT
Created attachment 5565 [details]
Patch for org.eclipse.ui.workbench
Comment 10 Dani Megert CLA 2003-07-28 09:24:51 EDT
Note: There are several more instances of the StringMatcher class with different
owners.
Comment 11 Douglas Pollock CLA 2003-07-28 09:28:18 EDT
As a note, it looks like the code generating patches is also affected.  Text
from the original is not included in the patch file starting at the first
offending character.  It looks like the patch generator doesn't like including
unrecognized characters, and doesn't recover as well as it might from such an
error.  (arg!)
Comment 12 Nick Edgar CLA 2003-07-29 21:17:22 EDT
Moving to VCM.
Comment 13 Dani Megert CLA 2003-07-30 04:03:59 EDT
Maybe this is a VM problem? Did you try using another VM?
Comment 14 Douglas Pollock CLA 2003-07-30 09:50:27 EDT
Under Sun's 1.4.2 VM, the code will compile.  When the source is viewed in an
editor, it will display, but missing the 0x91 and 0x92 characters.  Editing the
file and then saving it will overwrite the 0x91 and 0x92 characters with their
UTF-8 equivalents.

So, there are still files in CVS that are not valid UTF-8.  Sun's VM is tolerant
of these oddities, but IBM's VM that I was using is not (pj9xia32131-20030714a).
 Somehow, invalid UTF-8 can be written to a CVS repository using Eclipse.  It
wasn't Sun's 1.4.2 VM that wrote them to CVS (see above).
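
The difference between a tolerant and a strict reader can be illustrated with
the standard Java API. This is only a sketch of lenient decoding in general,
not a claim about what either VM does internally: new String(bytes, charset)
replaces each malformed sequence with U+FFFD instead of failing, so the text
loads, and saving it again as UTF-8 writes the replacement character rather
than the original byte. The class name is hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
final class LenientDecode {
    /** Decodes as UTF-8 without failing: malformed input becomes U+FFFD. */
    static String decodeLeniently(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] original = {(byte) 0x91, '*', (byte) 0x92};
        String text = decodeLeniently(original);

        // Re-encoding stores U+FFFD (EF BF BD) where 0x91 and 0x92 used to be.
        byte[] saved = text.getBytes(StandardCharsets.UTF_8);

        System.out.println(text.charAt(0) == '\uFFFD'); // true
        System.out.println(saved.length);               // 7
    }
}
```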

Further testing with other VMs?
Comment 15 Dani Megert CLA 2003-07-30 10:02:17 EDT
I'm not a VM guy, so I don't know if there's a spec for this i.e. which VM
behavior is the one we can expect.

Comment 16 Jean-Michel Lemieux CLA 2003-07-30 13:25:32 EDT
The CVS plugin transfers bytes to the server and is agnostic about the encoding
used in the platform. The stack trace relates to the CVS decorators and the fact
that some of the projects in your workspace were out of sync with the file
system. This is not related to the Java builder not compiling the class.

Eclipse uses the default OS encoding or uses the overridden setting under
Preferences > Workbench > Editors. There shouldn't be a plugin that assumes
UTF-8 as the default.
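
The fallback described above can be sketched as follows. This is a hypothetical
helper, not Eclipse's actual API: it returns an explicit override when one is
set and otherwise the JVM's default charset, which on the VMs discussed here is
typically derived from the OS locale or the file.encoding system property.

```java
import java.nio.charset.Charset;

// Hypothetical helper, for illustration only; not an Eclipse class.
final class EffectiveEncoding {
    /** Returns the override if one is given, otherwise the JVM's default charset. */
    static Charset resolve(String overrideName) {
        if (overrideName == null || overrideName.isEmpty()) {
            return Charset.defaultCharset();
        }
        return Charset.forName(overrideName);
    }

    public static void main(String[] args) {
        System.out.println(resolve(null));         // platform dependent
        System.out.println(resolve("ISO-8859-1")); // ISO-8859-1
    }
}
```

Two developers who both leave the override unset can therefore read the same
bytes with different charsets, which is the situation this report describes.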

To conclude, this is not a CVS problem but a problem with the java compiler.
However I'm not sure what encoding scheme it should use to parse the source
files when two developers are using different OS encodings and committing the
files to CVS.
Comment 17 Knut Radloff CLA 2003-07-31 08:25:40 EDT
On Windows I can't open the files in question either when selecting UTF-8 
encoding. The default encoding was 8859 anyway so this wasn't a problem. With 
the default encoding the questionable characters are not shown at all. I.e., 
they are 0 length characters.
The build works fine because the Java compiler still uses 8859. The build would 
probably fail on my Windows box as well if I specified UTF-8 encoding on the 
command line (file.encoding property).
Not sure where these bogus characters come from.

Removed the offending characters in the three Platform UI StringMatcher files. 
Suggest Team, Search and JDT Debug and JDT UI teams do the same.
Comment 18 Douglas Pollock CLA 2003-07-31 09:27:15 EDT
See Sections 3.1 and 3.3 of the Java Language Specification.
("http://java.sun.com/docs/books/jls/second_edition/html/lexical.doc.html#95413"
[Section 3.1]).

"Programs are written using the Unicode character set."

It's not a valid Java program if it isn't written in Unicode.
Comment 19 Dani Megert CLA 2003-07-31 11:20:31 EDT
Fixed for Search and JDT UI.
Comment 20 Philipe Mulet CLA 2003-09-29 18:05:05 EDT
Jean-Michel,

What makes you think there is a Java compiler bug here? If the specified 
encoding is incorrect, then how could we process it without any errors?
Comment 21 Jean-Michel Lemieux CLA 2003-10-03 09:18:12 EDT
Let me take that back. What I was trying to say is that if the java spec says 
that Java source files must be encoded as Unicode then either the VM (as Doug 
has observed) or the JDT Java Editor is not ensuring that the file is written 
as UTF-8?

BTW, I've also fixed the StringMatcher in Team/CVS.

Comment 22 Douglas Pollock CLA 2003-10-03 10:18:19 EDT
Many apologies, but I don't think that I read it closely enough the first 
time.  There is an "except".  Any character (e.g., 0x91) is allowed in 
comments, string/character literals and identifiers.  Only keywords, 
separators, and operators need to be in low ASCII (or escaped using "\uXXXX" 
sequences).
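
For source text that must stay in low ASCII, the \uXXXX form mentioned above
can be produced mechanically. A minimal sketch, with a hypothetical class name,
that escapes every character above 0x7F (it works per UTF-16 code unit, so a
supplementary character would become two escapes):

```java
// Hypothetical helper, for illustration only.
final class UnicodeEscaper {
    /** Replaces every char outside 7-bit ASCII with a Java Unicode escape. */
    static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 0x80) {
                out.append(c);
            } else {
                // Backslash, 'u', then the code unit as four hex digits.
                out.append(String.format("\\u%04X", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // U+2018 and U+2019 are the curly quotes that windows-1252 stores as 0x91 and 0x92.
        String comment = "\u2018*\u2019 for 0 and many characters";
        System.out.println(escapeNonAscii(comment));
    }
}
```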

There is no problem using Sun's JDK 1.4.2.  I'm beginning to think this is a 
VM bug.
Comment 23 Chris McLaren CLA 2004-02-09 16:41:03 EST
This is late in the game, but I encountered this problem today on a new Linux
install with the IBM 1.4.1 VM. The problem disappeared without any other
changes using the Sun 1.4.2 VM.
Comment 24 Philipe Mulet CLA 2004-02-17 13:04:17 EST
Closing as JRE issue.