Bug 303511 - Allow to specify encoding for source attachments
Summary: Allow to specify encoding for source attachments
Status: VERIFIED FIXED
Alias: None
Product: JDT
Classification: Eclipse Project
Component: Core (show other bugs)
Version: 3.4.2   Edit
Hardware: All All
: P3 enhancement with 1 vote (vote)
Target Milestone: 3.7 M3   Edit
Assignee: Jay Arthanareeswaran CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-02-22 11:34 EST by Liangliang Zhang CLA
Modified: 2011-10-19 05:29 EDT (History)
6 users (show)

See Also:
srikanth_sankaran: review+


Attachments
Proposed patch (4.93 KB, patch)
2010-10-20 14:09 EDT, Jay Arthanareeswaran CLA
no flags Details | Diff
Updated patch (11.46 KB, patch)
2010-10-21 03:40 EDT, Jay Arthanareeswaran CLA
no flags Details | Diff
Modified test project (17.71 KB, application/zip)
2010-10-21 03:41 EDT, Jay Arthanareeswaran CLA
no flags Details
Updated patch (13.72 KB, patch)
2010-10-21 10:35 EDT, Jay Arthanareeswaran CLA
no flags Details | Diff
Updated patch (14.76 KB, patch)
2010-10-22 06:28 EDT, Jay Arthanareeswaran CLA
no flags Details | Diff
Fixes for tests (9.37 KB, patch)
2010-10-24 15:51 EDT, Jay Arthanareeswaran CLA
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Liangliang Zhang CLA 2010-02-22 11:34:11 EST
My OS's default encoding is GBK, while my java project "HelloWorldPrj"'s encoding is UTF-8 which uses one lib named "terasoluna-thin-server.jar". This lib's source attachment "terasoluna-server4jweb-src_2.0.1.0.zip"'s source code's encoding is Shift-JIS. 

   There is only one way to view this lib's source properly: 
Windows -> Preferences -> General -> Workspace, set "Text file encoding" to "Shift-JIS".
   Set either my project, or "terasoluna-thin-server.jar", or "terasoluna-server4jweb-src_2.0.1.0.zip", or all above's property "Text file encoding" to "Shift-JIS" has no effect.
   
   How to view all soucre attachement properly when add a soucre attachement's encoding is "Big5"'s lib?

   Hope this is reported to a right place!
Comment 1 Jay Arthanareeswaran CLA 2010-10-20 14:09:55 EDT
Created attachment 181323 [details]
Proposed patch

This patch takes care the following scenarios:

1. The source attachment is a Zip file from the workspace. In this scenario, the encoding that is set for the Zip file is used for displaying the source. In the absence of encoding for the ZIP, inherited encoding from the project is used.

2. The source attachment is a source folder from the workspace. Here, the encoding set for the specific source file (java etc.) is used. If there is no encoding set for the individual file explicitly, then the inherited encoding is applied.

For instance, if ProjectA uses a folder or a ZIP from ProjectB as a source attachment, then in the absence of encoding for the source file/ZIP, ProjectB's encoding will be used.

3. The source attachment is an external folder or a archive - the encoding used will be that of the project (explicit or inherited) that contains the library concerned. 

Regressions tests to be added.
Comment 2 Jay Arthanareeswaran CLA 2010-10-21 03:40:01 EDT
Created attachment 181364 [details]
Updated patch

Same patch with minor changes and tests. The tests need some changes to one of the test projects, which I will be attaching.
Comment 3 Jay Arthanareeswaran CLA 2010-10-21 03:41:51 EDT
Created attachment 181365 [details]
Modified test project

This ZIP to extracted to \org.eclipse.jdt.core.tests.model\workspace. The project files need to be overwritten.
Comment 4 Jay Arthanareeswaran CLA 2010-10-21 03:42:20 EDT
Srikanth, can you review this for 3.7M3 please?
Comment 5 Dani Megert CLA 2010-10-21 05:25:51 EDT
(In reply to comment #1)
1. and 2. look good but 3. is not OK: the encoding for external libraries must be taken from the workspace because it is not deterministic which project resolves e.g. the rt.jar (the first one wins).
Comment 6 Jay Arthanareeswaran CLA 2010-10-21 06:20:48 EDT
(In reply to comment #5)
> (In reply to comment #1)
> 1. and 2. look good but 3. is not OK: the encoding for external libraries must
> be taken from the workspace because it is not deterministic which project
> resolves e.g. the rt.jar (the first one wins).

Well, I thought it would allow the user to use a different encoding at a project level, if he/she wants to, for external source attachments. Of course, if it doesn't have to be different, one can simply do nothing, as the inherited encoding from the workspace will be used. With this, we can address scenarios where the user generally wants UTF-8 for the workspace, but one particular external source attachment requires a different encoding, say Shift-JIS.

I agree it's unlikely that the user would want to use different encoding in different projects for the same ZIP file. However, the advantage is the user doesn't have to touch the workspace encoding when the external archive is used in only one project.

Let me know if I have overlooked something here.
Comment 7 Dani Megert CLA 2010-10-21 06:44:32 EDT
Shared external JARs, like rt.jar, are attached to a single project (its parent). This parent is not guaranteed to be the same all the time and there is no clear indication in the UI which project this is. Therefore it's not possible to use the project encoding for external JARs.
Comment 8 Markus Keller CLA 2010-10-21 07:11:01 EDT
I agree with Dani that the workspace encoding should be used for external source attachments.

> With this, we can address scenarios
> where the user generally wants UTF-8 for the workspace, but one particular
> external source attachment requires a different encoding, say Shift-JIS.

In that case, the user can either copy the source into the workspace, or he can create a link to the external source and set the encoding on the link.
Comment 9 Jay Arthanareeswaran CLA 2010-10-21 10:35:57 EDT
Created attachment 181399 [details]
Updated patch

Agreed on the points raised by Dani and Markus.

I have removed the code that looks for the project's encoding. Now, in the case of external resources, the workspace's default encoding is used. Tests have been modified to reflect this.
Comment 10 Srikanth Sankaran CLA 2010-10-22 05:43:05 EDT
Looks good to me.
Comment 11 Jay Arthanareeswaran CLA 2010-10-22 06:28:06 EDT
Created attachment 181484 [details]
Updated patch

Same fix with one more test added as per Srikanth's suggestion.
Comment 12 Jay Arthanareeswaran CLA 2010-10-22 06:49:34 EDT
Released in HEAD for 3.7M3.
Comment 13 Dani Megert CLA 2010-10-23 07:13:42 EDT
This causes JUnit failures in the latest build. Please fix for the next one.
Comment 14 Jay Arthanareeswaran CLA 2010-10-24 15:51:52 EDT
Created attachment 181601 [details]
Fixes for tests

The failures were caused by weird '\r' in the an archive used by the tests. The tests have been fixed now with this patch.
Comment 15 Jay Arthanareeswaran CLA 2010-10-25 03:00:35 EDT
(In reply to comment #14)
> Created an attachment (id=181601) [details]
> Fixes for tests
> 
> The failures were caused by weird '\r' in the an archive used by the tests. The
> tests have been fixed now with this patch.

Between, the tests were all passing on my local machine before commit. Something went wrong with the CVS commit that caused the '\r' problem. Now I am removing all '\r' in the tests themselves in order to have a reliable comparison of sources.
Comment 16 Olivier Thomann CLA 2010-10-26 15:06:24 EDT
Verified for 3.7M3 using I20101025-1602 (4.1 build)
Comment 17 Olivier Thomann CLA 2010-10-26 15:17:17 EDT
For external libraries, could it be possible to add an extra attribute to the classpath entry?
If not present, use the workspace encoding and if present, use it.
This would make it possible to point to external libraries using different encodings.
Comment 18 Dani Megert CLA 2010-10-27 02:42:43 EDT
(In reply to comment #17)
> For external libraries, could it be possible to add an extra attribute to the
> classpath entry?
> If not present, use the workspace encoding and if present, use it.
> This would make it possible to point to external libraries using different
> encodings.
We could do that if someone really reports that the currently solution is not sufficient.
Comment 19 Dani Megert CLA 2011-10-19 05:29:46 EDT
*** Bug 361356 has been marked as a duplicate of this bug. ***