Bug 361356 - Allow to specify encoding for source attachments
Summary: Allow to specify encoding for source attachments
Status: VERIFIED FIXED
Alias: None
Product: JDT
Classification: Eclipse Project
Component: Core (show other bugs)
Version: 3.8   Edit
Hardware: All All
: P3 enhancement (vote)
Target Milestone: 3.8 M5   Edit
Assignee: Jay Arthanareeswaran CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: 368015
  Show dependency tree
 
Reported: 2011-10-19 04:59 EDT by Victor Homyakov CLA
Modified: 2012-07-18 06:33 EDT (History)
5 users (show)

See Also:


Attachments
Proposed patch (15.50 KB, patch)
2012-01-05 11:47 EST, Jay Arthanareeswaran CLA
no flags Details | Diff
Updated patch (21.62 KB, patch)
2012-01-19 23:08 EST, Jay Arthanareeswaran CLA
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Victor Homyakov CLA 2011-10-19 04:59:47 EDT
Eclipse uses encoding of current project to display source/javadoc of attached libraries despite the fact that the encoding used in the library may be different.

Steps to reproduce:
1. Create project in UTF-8 encoding.
2. Attach library with source/javadoc written in another (Cp1251) encoding.
3. Try to view source or open Javadoc view of attached library from main project - you will see question signs instead of national symbols.

As an enhancement I would like to propose possibility to set encoding for each used library manually (like for other project resources).
Comment 1 Dani Megert CLA 2011-10-19 05:17:46 EDT
The library provider has two choices:
1. it offers Javadoc for each platform
2. it generates the Javadoc so that the 
<META http-equiv="Content-Type" content="text/html; charset=UTF-8">
directive is inside the Javadoc.

Eclipse does nothing but display the given HTML in the browser widget i.e. I would assume that the same Javadoc will also not render correctly when you open it with your browser outside Eclipse.
Comment 2 Victor Homyakov CLA 2011-10-19 05:22:52 EDT
I was not talking about HTML. I was talking about Javadoc in source codes.
Comment 3 Dani Megert CLA 2011-10-19 05:29:46 EDT
(In reply to comment #2)
> I was not talking about HTML. I was talking about Javadoc in source codes.

I see. We've added the support to specify the encoding in the 3.7 release.

*** This bug has been marked as a duplicate of bug 303511 ***
Comment 4 Victor Homyakov CLA 2011-10-19 05:51:07 EDT
> I see. We've added the support to specify the encoding in the 3.7 release.

I am using the 3.7:
Eclipse Java EE IDE for Web Developers.
Version: Indigo Service Release 1
Build id: 20110916-0149

But I still don't see the way to specify the encoding for one particular library and make sources readable.

> *** This bug has been marked as a duplicate of bug 303511 ***

I've read bug 303511 and see that my case is falling under scenario 3 (The source attachment is an external folder or archive), and this case was not fixed in patch - Eclipse uses project encoding, exactly as I described.
Comment 5 Dani Megert CLA 2011-10-19 06:57:11 EDT
> I've read bug 303511 and see that my case is falling under scenario 3 (The
> source attachment is an external folder or archive), and this case was not
> fixed in patch - Eclipse uses project encoding, exactly as I described.

It should use the workspace encoding for external JARs, hence as a workaround you can set the workspace encoding to Cp1251 or put the attachments into a folder in your workspace which as the Cp1251 encoding.
Comment 6 Victor Homyakov CLA 2011-10-19 08:47:26 EDT
(In reply to comment #5)
> It should use the workspace encoding for external JARs, hence as a workaround
> you can set the workspace encoding to Cp1251

I cannot set the workspace encoding to more than one value (UTF-8 and Cp1251) simultaneously. So either JAR with UTF-8 source or JAR with Cp1251 source will be unreadable. As I wrote, the preferred way (for me) is to assign encoding to each used source manually.

> or put the attachments into a
> folder in your workspace which as the Cp1251 encoding.

This works great for manually attached libraries. And how to deal with Maven dependencies (sources are attached by Maven plugin from ~/.m2/repository/)?
Comment 7 Dani Megert CLA 2011-10-20 03:24:09 EDT
I do see your point. We can reopen this bug as a feature request if you want.
Comment 8 Victor Homyakov CLA 2011-10-20 04:14:38 EDT
(In reply to comment #7)
> We can reopen this bug as a feature request if you want.
Reopen, please.

Thanks for your help and patience!
Comment 9 Ayushman Jain CLA 2011-10-20 04:19:30 EDT
Jay, please take a look. Thanks!
Comment 10 Dani Megert CLA 2011-10-20 04:26:20 EDT
This is a follow up of enhancement 303511. We'll need to see whether we have time to do this.
Comment 11 Srikanth Sankaran CLA 2011-11-07 00:42:57 EST
Will investigate/consider for M4.
Comment 12 Srikanth Sankaran CLA 2011-11-30 00:17:29 EST
Since Jay had to go on unavoidable unplanned time off for several days,
this is not ready: retargetting to 3.8 M5.
Comment 13 Jay Arthanareeswaran CLA 2012-01-05 11:47:30 EST
Created attachment 209089 [details]
Proposed patch

I have introduced a new classpath attribute for storing the source attachment encoding in the .classpath file. Now the order of encoding selection will be in the following order:

1. Encoding set on the resource representing the source file (.java or .zip)
2. Encoding specified for the corresponding classpath entry
3. Encoding of the project enclosing the source, if a member
4. Workspace encoding

I will release this fix after some more testing.
Is there a bug to handle the UI part already?
Comment 14 Dani Megert CLA 2012-01-06 02:21:28 EST
> I will release this fix after some more testing.
Please wait until we have verified it through the UI part.

> Is there a bug to handle the UI part already?
Not that I'm aware of.
Comment 15 Jay Arthanareeswaran CLA 2012-01-06 05:36:16 EST
Raised bug 368015 for the UI part.
Comment 16 Deepak Azad CLA 2012-01-18 00:44:16 EST
(In reply to comment #13)
> Now the order of encoding selection will be in
> the following order:
> 
> 1. Encoding set on the resource representing the source file (.java or .zip)
> 2. Encoding specified for the corresponding classpath entry
> 3. Encoding of the project enclosing the source, if a member
> 4. Workspace encoding

This looks quite wrong to me. Take the case when the source attachment is a source folder in the workspace
- You will almost never have the encoding set on individual .java files, hence you move to step 2
- If an encoding is specified on the classpath it will be used.
- However, the canonical encoding in this case is the encoding on the container (project or workspace), but it is never used.

In short, for source attachments from the workspace the new attribute should not have any meaning. Ideally, we should not even allow to set the attribute for attachments from workspace, but I guess it is also OK if we silently ignore it.
Comment 17 Jay Arthanareeswaran CLA 2012-01-18 00:58:11 EST
(In reply to comment #16)
> This looks quite wrong to me. Take the case when the source attachment is a
> source folder in the workspace
> - You will almost never have the encoding set on individual .java files, hence
> you move to step 2

Though I agree that it's unlikely, we can't not support that case.

> - If an encoding is specified on the classpath it will be used.
> - However, the canonical encoding in this case is the encoding on the container
> (project or workspace), but it is never used.
> 
> In short, for source attachments from the workspace the new attribute should
> not have any meaning. Ideally, we should not even allow to set the attribute
> for attachments from workspace, but I guess it is also OK if we silently ignore
> it.

If you look at comment #4, one of the requirements is to be able to support encoding at classpath entry (or library) level. If the user knows that the source attachment is inside the project/workspace and fine with the project/workspace encoding being used, he can very well do that by not setting the encoding for the library.
Comment 18 Deepak Azad CLA 2012-01-18 01:20:24 EST
(In reply to comment #17)
> If you look at comment #4, one of the requirements is to be able to support
> encoding at classpath entry (or library) level. If the user knows that the
> source attachment is inside the project/workspace and fine with the
> project/workspace encoding being used, he can very well do that by not setting
> the encoding for the library.

By that logic the algorithm should be
1. Encoding specified for the corresponding classpath entry
2 a. Encoding set on the resource representing the source file (.java or .zip)
  b. Encoding of the project enclosing the source, if a member
  c. Workspace encoding

2 a,b,c constitute a well defined algorithm to compute the encoding for an artifact in the workspace. I don't think we should break that. 

In any case I still think that it would be better to not use the new classpath attribute for source attachments from workspace.
Comment 19 Jay Arthanareeswaran CLA 2012-01-18 01:44:20 EST
(In reply to comment #18)
> By that logic the algorithm should be
> 1. Encoding specified for the corresponding classpath entry
> 2 a. Encoding set on the resource representing the source file (.java or .zip)
>   b. Encoding of the project enclosing the source, if a member
>   c. Workspace encoding
> 
> 2 a,b,c constitute a well defined algorithm to compute the encoding for an
> artifact in the workspace. I don't think we should break that. 
> 
> In any case I still think that it would be better to not use the new classpath
> attribute for source attachments from workspace.

(b) and (c) are not explicitly set by the user or the author of the file. It's an algorithm used by the tool and we shouldn't really let it dictate terms to the user. If there is an encoding set explicitly, be it a resource, or a classpath entry, it must be given a higher priority than a default.

Think about this scenario: User has a source attachment, for which he wants to use a particular encoding. And let's say a particular file X.java in the source attachment has a different encoding than other files. How else can we support this?
Comment 20 Deepak Azad CLA 2012-01-18 02:01:13 EST
(In reply to comment #19)
> (b) and (c) are not explicitly set by the user or the author of the file. It's
> an algorithm used by the tool and we shouldn't really let it dictate terms to
> the user. If there is an encoding set explicitly, be it a resource, or a
> classpath entry, it must be given a higher priority than a default.
> 
> Think about this scenario: User has a source attachment, for which he wants to
> use a particular encoding. And let's say a particular file X.java in the source
> attachment has a different encoding than other files. How else can we support
> this?

... so you are saying that the encoding for the source attachment will be different from the encoding for the same resource accessed from its original location in the workspace?
Comment 21 Jay Arthanareeswaran CLA 2012-01-18 02:13:39 EST
(In reply to comment #20)
> ... so you are saying that the encoding for the source attachment will be
> different from the encoding for the same resource accessed from its original
> location in the workspace?

Yes, only if the user has opted for a different encoding than the project's encoding and the individual resource doesn't have any encoding set on it. And I would assume that the user knows what he is doing. When the resource is directly accessed, the content may not be displayed correctly, which is the case even today.
Comment 22 Deepak Azad CLA 2012-01-18 04:09:13 EST
(In reply to comment #21)
> And I
> would assume that the user knows what he is doing. When the resource is
> directly accessed, the content may not be displayed correctly, which is the
> case even today.

The idea should be to specify one property in only once place, and for source attachments from workspace I think the encoding is best defined in the original location, something we should not change.

As part of bug 368015 we can also prevent users from setting the encoding for attachments from workspace.

Also, the algorithm to compute the encoding is not trivial it needs to be documented along with the new API org.eclipse.jdt.core.IClasspathAttribute.SOURCE_ATTACHMENT_ENCODING.
Comment 23 Jay Arthanareeswaran CLA 2012-01-18 04:36:32 EST
(In reply to comment #22)
> The idea should be to specify one property in only once place, and for source
> attachments from workspace I think the encoding is best defined in the original
> location, something we should not change.
>
> As part of bug 368015 we can also prevent users from setting the encoding for
> attachments from workspace.

Are you saying that the user will have to set the encoding on the library's source attachment for external folders and on the resource if the source attachment is part of the workspace? The user will be confused if he doesn't see the encoding for some attachments, won't he?

For consistency sake, I would prefer it to be provided for all libraries in one place - where we attach the source. Perhaps, we can warn the user if it's an internal resource and if there is already an encoding set on the resource.
 
> Also, the algorithm to compute the encoding is not trivial it needs to be
> documented along with the new API
> org.eclipse.jdt.core.IClasspathAttribute.SOURCE_ATTACHMENT_ENCODING.

I agree, the document should be updated.
Comment 24 Dani Megert CLA 2012-01-18 05:49:13 EST
The encoding property must only be defined and editable in the UI for case 3 from bug 303511 comment 1. If the property is not set, the workspace encoding is used.

If possible JDT Core should also define and implement the property like that. However, I'm OK, if only the UI imposes the restriction.
Comment 25 Deepak Azad CLA 2012-01-19 05:00:56 EST
(In reply to comment #24)
> the UI imposes the restriction.
I have imposed the restriction in bug 368015.

Jay, your patch is OK for me once you update the doc.
(In reply to comment #23)
> > Also, the algorithm to compute the encoding is not trivial it needs to be
> > documented along with the new API
> > org.eclipse.jdt.core.IClasspathAttribute.SOURCE_ATTACHMENT_ENCODING.
> 
> I agree, the document should be updated.
Comment 26 Jay Arthanareeswaran CLA 2012-01-19 05:06:08 EST
(In reply to comment #25)
> (In reply to comment #24)
> > the UI imposes the restriction.
> I have imposed the restriction in bug 368015.
> 
> Jay, your patch is OK for me once you update the doc.

Thanks, Deepak. I have a patch with the updated doc. I will also test this fix with the UI and after that I will post/release the patch. Will do that after completing the 3.7.2 testing.
Comment 27 Dani Megert CLA 2012-01-19 11:01:17 EST
(In reply to comment #14)
> > I will release this fix after some more testing.
> Please wait until we have verified it through the UI part.

You're good to go now.
Comment 28 Jay Arthanareeswaran CLA 2012-01-19 23:08:22 EST
Created attachment 209785 [details]
Updated patch

Same fix, with update documentation for the new classpath attribute.
Comment 29 Jay Arthanareeswaran CLA 2012-01-20 01:39:18 EST
(In reply to comment #28)
> Created attachment 209785 [details]
> Updated patch
> 
> Same fix, with update documentation for the new classpath attribute.

Tested the fix with the UI portion (bug 368015) and things look good, though I could test only the external libraries through the UI. Released with few cosmetic changes on the doc + additional tests for external libraries.

http://git.eclipse.org/c/jdt/eclipse.jdt.core.git/commit/?id=a74cc2e7acd1e0aec09bded5c981fd784a55b04c
Comment 30 Deepak Azad CLA 2012-01-20 01:42:05 EST
I got a mid-air collision while trying to say...

"Hold on with releasing the patch. Looks like some more work needs to be done to
support jars in a container e.g. rt.jar. I am currently investigating this."
Comment 31 Jay Arthanareeswaran CLA 2012-01-20 01:54:01 EST
(In reply to comment #30)
> I got a mid-air collision while trying to say...
> 
> "Hold on with releasing the patch. Looks like some more work needs to be done
> to
> support jars in a container e.g. rt.jar. I am currently investigating this."

I did play around with containers and they looked to be alright. Let me know your findings. We still have time for the warm-up build.
Comment 32 Deepak Azad CLA 2012-01-20 01:59:28 EST
(In reply to comment #31)
> I did play around with containers and they looked to be alright.
Really? Can you view and edit the source attachment encoding for rt.jar ?
Comment 33 Jay Arthanareeswaran CLA 2012-01-20 02:09:40 EST
(In reply to comment #32)
> Really? Can you view and edit the source attachment encoding for rt.jar ?

Arrg, I see the value entered doesn't get stored if I come back again! But it's okay for individual jars.
Comment 34 Dani Megert CLA 2012-01-20 03:19:10 EST
(In reply to comment #33)
> (In reply to comment #32)
> > Really? Can you view and edit the source attachment encoding for rt.jar ?
> 
> Arrg, I see the value entered doesn't get stored if I come back again! But it's
> okay for individual jars.

Reopening then.
Comment 35 Srikanth Sankaran CLA 2012-01-21 00:42:05 EST
(In reply to comment #32)
> (In reply to comment #31)
> > I did play around with containers and they looked to be alright.
> Really? Can you view and edit the source attachment encoding for rt.jar ?

Do we have a consensus as to whether the JDT/Core side of things
is incorrect or merely incomplete (or is complete and is not really
contributing to the problem scenario) ?
Comment 36 Deepak Azad CLA 2012-01-21 01:37:22 EST
This bug can be closed now. (See also bug 368015 comment 17)
Comment 37 Satyam Kandula CLA 2012-01-24 09:36:45 EST
Verified for 3.8M5 using build I20120123-1800
Comment 38 Victor Homyakov CLA 2012-07-17 12:28:59 EDT
Does not work in 3.8 (Version: 3.8.0 Build id: M20120626-2030) for sources downloaded by Maven plugin (m2e) - encoding is always "Default (UTF-8)".

Steps to reproduce:
1. Install M2E - Maven Integration for Eclipse (version 1.1.0.20120530-0009 for Eclipse 3.8).
2. Window -> Preferences -> Maven -> check "Download Artifact Sources" option.
3. Create Maven project or enable Maven dependency management on existing project (right-click on project -> Configure -> Convert to Maven Project).
4. Add to project dependency with sources in non-UTF-8 encoding.
5. Open Project Explorer, go to project -> Maven Dependencies, right click on dependency, change encoding, apply changes. After reopening encoding will be again "Default (UTF-8)".
Comment 39 Dani Megert CLA 2012-07-18 02:23:37 EDT
(In reply to comment #38)
> Does not work in 3.8 (Version: 3.8.0 Build id: M20120626-2030) for sources
> downloaded by Maven plugin (m2e) - encoding is always "Default (UTF-8)".
> 
> Steps to reproduce:
> 1. Install M2E - Maven Integration for Eclipse (version 1.1.0.20120530-0009 for
> Eclipse 3.8).
> 2. Window -> Preferences -> Maven -> check "Download Artifact Sources" option.
> 3. Create Maven project or enable Maven dependency management on existing
> project (right-click on project -> Configure -> Convert to Maven Project).
> 4. Add to project dependency with sources in non-UTF-8 encoding.
> 5. Open Project Explorer, go to project -> Maven Dependencies, right click on
> dependency, change encoding, apply changes. After reopening encoding will be
> again "Default (UTF-8)".

Please file a new bug report with steps to reproduce the problem using a plain Eclipse SDK:
http://download.eclipse.org/eclipse/downloads/drops4/R-4.2-201206081400/
to rule out Maven specific issue.
Comment 40 Victor Homyakov CLA 2012-07-18 06:33:26 EDT
(In reply to comment #39)
> Please file a new bug report with steps to reproduce the problem using a plain
> Eclipse SDK:
> http://download.eclipse.org/eclipse/downloads/drops4/R-4.2-201206081400/
> to rule out Maven specific issue.

OK