186984 – IFile.getCharset should throw out of sync

Bug 186984 - IFile.getCharset should throw out of sync

Summary: IFile.getCharset should throw out of sync

Status:	VERIFIED FIXED

Alias:	None

Product:	Platform
Classification:	Eclipse Project
Component:	Resources (show other bugs)
Version:	3.3
Hardware:	PC Windows XP

Importance:	P3 normal (vote)
Target Milestone:	3.4 M3
Assignee:	Szymon Brandys
QA Contact:

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2007-05-15 07:27 EDT by Dani Megert
Modified:	2011-03-30 06:52 EDT (History)
CC List:	2 users (show)

See Also:

Flags:	john.arthorne: review+

Attachments
Fix (3.16 KB, patch) 2007-10-12 09:29 EDT, Szymon Brandys	no flags	Details \| Diff
Test (1.75 KB, patch) 2007-10-12 12:24 EDT, Szymon Brandys	no flags	Details \| Diff
View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Dani Megert

2007-05-15 07:27:25 EDT

I20070515-0010.

Most IFile methods throw an IResourceStatus.RESOURCE_NOT_FOUND CoreException when the file no longer exists but getCharset methods don't and hence strange messages are shown to the user, e.g.:

Replacing getCharset() and using file.getContentDescription().getCharset() doesn't help either, even though getContentDescription is spec'ed to handle does not exist:
	 * <li> This resource does not exist.</li>

A general problem with all those methods is that they specify when a core exception is thrown but they do not specify the status code and hence clients have no clue what happened.

Comment 1 John Arthorne

2007-05-15 11:01:31 EDT

Investigate for 3.4.

Comment 2 Dani Megert

2007-05-15 11:03:13 EDT

Thanks. It would also be good to specify the status codes in the throws sections.

Comment 3 Szymon Brandys

2007-10-12 05:55:08 EDT

This is how it looks from my point of view.
Looking at javadoc of IFile#getCharset(boolean) we can see:

If checkImplicit is true, this method uses the following algorithm to determine the charset to be returned: 

- the charset defined by calling #setCharset, if any, and this file exists, or 
- the charset automatically discovered based on this file's contents, if one can be    determined, or 
- the default encoding for this file's parent (as defined by IContainer#getDefaultCharset). 

So for me it means that even if the file doesn't exist, its encoding will be retrieved from the parent. But on the other side in the javadoc we can read:

Throws: 
CoreException if this method fails. Reasons include: 
- This resource could not be read. 
- This resource is not local. 
- The corresponding location in the local file system is occupied by a directory. 

But those exceptions shouldn't affect the alghoritm. So even if we can't read the file for any of these reasons, we should continue without exception.

So I can suggest two ways:

1. Avoid throwing exceptions if the file can't be read
2. Throw exceptions always when file doesn't exist (including the case when exists in WS and doesn't in FS) and modify the desc of the alghoritm in javadoc.

What do you think?

Comment 4 Dani Megert

2007-10-12 06:48:15 EDT

Mmmh. I probably was a bit confused when I filed this bug. The problem is not about the existence (i.e. IResourceStatus.RESOURCE_NOT_FOUND) but about what is done when the resource exists but is out of sync. In the case where the resource's content is accessed to determine the charset I would expect that a OUT_OF_SYNC_LOCAL core exception is thrown and that this is spec'ed in Javadoc.

Comment 5 Szymon Brandys

2007-10-12 06:58:41 EDT

I'm referring to the javadoc of getCharset() too. In my opinion neither RESOURCE_NOT_FOUND nor OUT_OF_SYNC_LOCAL should be thrown or the algorithm described in the javadoc should be modified.

That's why I wrote "So even if we can't read the file for any of these reasons, we should continue without exception". "Can't read" means that we can't read the file for any reasons, what includes both RESOURCE_NOT_FOUND and OUT_OF_SYNC exceptions.

Comment 6 Dani Megert

2007-10-12 07:04:06 EDT

>"Can't read" means that we can't read the file for any reasons, what includes >both RESOURCE_NOT_FOUND and OUT_OF_SYNC.
Not really. For the RESOURCE_NOT_FOUND case the Javadoc cleary spec's what happens i.e. either return 'null' or go to the parent. That's good.

But for the read case there can be two cases:
- it fails because the file is out of sync
- it is in sync but reading fails

In both of those cases the code should not be smart, ignore the error and do something else.

Comment 7 Szymon Brandys

2007-10-12 08:35:15 EDT

I had a discussion with Dani. The main issue in my opinion is which part of javadoc is more important. The one that states that we can determine the file charset even if it doesn't exist (or we have problems while reading it) or the one that states what kind of exceptions can be thrown :-)

Dani states that if we choose the first way (the comment #3) we will break current behavior. But I double check that this method doesn't throw exceptions when file doesn't exist. The only problem is when file is not in sync (doesn't exist in FS). In this case it doesn't follow the algorithm and an exception is thrown.

// Ways:
// 1. Avoid throwing exceptions if the file can't be read
// 2. Throw exceptions always when file doesn't exist (including the case when
// exists in WS and doesn't in FS) and modify the desc of the alghoritm in
// javadoc.

Dani votes for the second way, but I don't think that the way number 1 breaks current behavior. In my opinion exception codes in javadoc are there by mistake and the way number 1 is better. But I will wait for third vote :-)

Comment 8 Dani Megert

2007-10-12 09:19:13 EDT

>The only problem is when file is not in sync (doesn't
>exist in FS). In this case it doesn't follow the algorithm and an exception is
>thrown.
Not is sync does not mean the file is not in the FS it means it got modified outside Eclipse.

>I double check that this method doesn't throw exceptions
>when file doesn't exist.
Yes and that is expected. I am talking about the case where the file exists on the FS but the resource info is out of sync and the content determined from the file.

>Dani votes for the second way, but I don't think that the way number 1 breaks
>current behavior. 
Mmh, so I wasn't convincing in the discussion ;-) Let's try again: your choice can cause data corruption because you return the wrong (parents) encoding instead of telling the client of the API that something went wrong.

Here's a test case that shows this bug:
1. in Eclipse create a plugin.xml file with this header:
   <?xml version="1.0" encoding="utf-8"?>
2. go to the FS and copy that file to plugin2.xml
3. in Eclipse refresh the project
4. select the file and open its properties ==> encoding is UTF-8 (correct)
5. outside Eclipse go and edit the file: change header to:
   <?xml version="1.0" encoding="ascii"?>
6. save the file (outside Eclipse)
7. in Eclipse select the file and open its properties
   ==> BUG: encoding is said to be UTF-8 determined from content which is clearly not true.

Using the incorrect encoding can lead to data corruption once a wrong encoding is blindly used to write a file.

Comment 9 Szymon Brandys

2007-10-12 09:29:29 EDT

Created attachment 80224 [details]
Fix

Comment 10 Szymon Brandys

2007-10-12 09:43:02 EDT

Thanks Dani :-) I think I understand now

Comment 11 John Arthorne

2007-10-12 10:39:13 EDT

Patch looks good. We need a test for this in CharsetTest to make sure we remain consistent with the spec.

Comment 12 Szymon Brandys

2007-10-12 12:24:54 EDT

Created attachment 80244 [details]
Test

Comment 13 John Arthorne

2007-10-12 13:10:42 EDT

Thanks Szymon and Dani, patch released.

Comment 14 John Arthorne

2007-11-16 11:49:16 EST

I have backed out of the change to File.getCharset due to performance problems (see bug 209167). I have updated the spec of IFile.getCharset to indicate that it returns a cached value of the encoding.  

I have left IFile.getContentDescription() with the sync check, for clients who want to be sure they are getting an encoding that reflects the current file state. This also matches the spec which states it is equivalent to IContentTypeManager.getDescriptionFor(getContents(), getName(), IContentDescription.ALL), which would throw OUT_OF_SYNC_LOCAL on getContents() when out of sync.

Comment 15 James Blackburn

2011-03-30 06:52:56 EDT

(In reply to comment #8)

As part of Bug 303517 I've beefed up the test to match Dani's description in comment 8.  The test now changes the content type and asserts we notice the change, rather than just checking sync state.  It also asserts that #getContentDescription() does the right thing whether lightweight refresh is on or off.