Bug 106832 - [content type] Add known content types
Status: NEW
Alias: None
Product: Platform
Classification: Eclipse Project
Component: Resources
Version: 3.1
Hardware: PC Windows XP
Importance: P3 enhancement
Target Milestone: ---
Assignee: Platform-Resources-Inbox CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Duplicates: 106856 229897
Depends on: 519534 155323
Blocks:
Reported: 2005-08-12 08:34 EDT by Dani Megert CLA
Modified: 2017-07-12 05:22 EDT
CC List: 8 users

See Also:


Description Dani Megert CLA 2005-08-12 08:34:35 EDT
R3.1

It would be a good thing to define known content types like JAR in the Platform
(see bug 106830).
Comment 1 John Arthorne CLA 2005-08-12 16:46:32 EDT
*** Bug 106856 has been marked as a duplicate of this bug. ***
Comment 2 Rafael Chaves CLA 2005-08-15 13:46:42 EDT
I agree the SDK could be contributing more content types, but I think we should
restrain ourselves to things the SDK deals with. That could include JAR files,
but I am not sure about graphical image files.

There is a tension here between the extensible nature of Eclipse (we don't want
to push too much stuff onto our clients) and the need to provide a good out of
the box experience for end users of the SDK.

Right now I don't know how to address it. Suggestions are welcome.
Comment 3 Martin Olsson CLA 2005-08-16 13:10:12 EDT
Yeah sure, but .png and .gif need to be added somewhere. If not in Platform,
maybe in JDT or something like that.
Comment 4 Dani Megert CLA 2006-01-31 02:59:11 EST
>but I think we should restrain ourselves to things the SDK deals with.
In my opinion it would make sense to at least divide known content types into text and binary (i.e. non-text). See also bug 125639, bug 106856 and bug 106830.
Comment 5 David Williams CLA 2006-01-31 03:26:06 EST
I'd second the motion to have a generic parent for binary content types. I think this would give two concrete advantages. 1. The platform (at some level) could classify "obvious" binary files, like jpeg and gif, as 'binary', but if an add-on provider really provided special function for that content type, say for image editing, then the provider could define an "image content type" and the more specific definition would be used (so the platform is not overstepping its scoped mission).

And, 2. currently there is no way for a user to specify something as "binary" via preference pages (and hence to be avoided in searches, indexing, etc.) ... I know I've got a few spreadsheets in some test projects and I'd prefer to just add .xls to a "binary" content type to avoid searching and indexing them.
Comment 6 Rafael Chaves CLA 2006-03-23 11:16:05 EST
Re: comment 0.

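According to the ZIP specification, a ZIP file starts with a 4-byte local file header signature (0x04034b50, stored on disk as the bytes 50 4B 03 04); the value 0x02014b50 is the central file header signature, which appears later in the archive. To support a ZIP content type, it would be just a matter of declaring it associated to the "zip" extension and using a BinarySignatureContentDescriber with the given signature.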

A JAR content type could be declared (by JDT?) as a sub content type of the ZIP content type contributed by Runtime that just associates it to the "jar" extension. This would preclude JAR files that have the "zip" extension, but I think this might be good enough. Doing better than that would require further analysis of the ZIP contents, a price the content description mechanism can't afford to pay (not to mention manifests are optional, so JAR files are basically ZIP files that *we know* contain Java classes and resources in them).
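For illustration only, the signature check and the extension-based ZIP/JAR split described in this comment can be sketched in plain Java. This is a rough model, not the Eclipse content type API: the class ZipSignatureSketch and its methods are hypothetical names, the type names "zip", "jar" and "unknown" are placeholders, and the sketch assumes a non-empty archive that begins with the local file header signature 0x04034b50.

```java
import java.util.Arrays;
import java.util.Locale;

/** Hypothetical helper for illustration; not part of any Eclipse API. */
class ZipSignatureSketch {
    // Local file header signature 0x04034b50, as stored on disk (little-endian).
    private static final byte[] LOCAL_FILE_HEADER = {0x50, 0x4B, 0x03, 0x04};

    /** Returns true if the given leading bytes start with the ZIP signature. */
    static boolean startsWithZipSignature(byte[] content) {
        if (content.length < LOCAL_FILE_HEADER.length) {
            return false;
        }
        byte[] head = Arrays.copyOf(content, LOCAL_FILE_HEADER.length);
        return Arrays.equals(head, LOCAL_FILE_HEADER);
    }

    /**
     * Classifies by extension first, then by signature. A JAR stored with
     * the "zip" extension is reported as "zip", matching the limitation
     * accepted in the comment above.
     */
    static String classify(String fileName, byte[] content) {
        String name = fileName.toLowerCase(Locale.ROOT);
        boolean jarName = name.endsWith(".jar");
        boolean zipName = name.endsWith(".zip");
        if (!(jarName || zipName) || !startsWithZipSignature(content)) {
            return "unknown";
        }
        return jarName ? "jar" : "zip";
    }

    public static void main(String[] args) {
        byte[] zipLike = {0x50, 0x4B, 0x03, 0x04, 0x14, 0x00};
        System.out.println(classify("lib.jar", zipLike));
        System.out.println(classify("archive.zip", zipLike));
    }
}
```

Only the first four bytes are inspected, so the check stays cheap; nothing inside the archive (such as a manifest) is read.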

Note though that IIRC, these changes are considered API changes (and the API is frozen for 3.2).
Comment 7 Rafael Chaves CLA 2006-03-23 11:19:19 EST
Re: comment 4 - the intended way for clients to tell text and binary content types apart is:

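import org.eclipse.core.runtime.Platform;
import org.eclipse.core.runtime.content.IContentType;
import org.eclipse.core.runtime.content.IContentTypeManager;

// Base text content type declared by the runtime.
IContentType text =
  Platform.getContentTypeManager().getContentType(
    IContentTypeManager.CT_TEXT);
// True when myContentType is the text type or descends from it.
boolean isMyContentTypeText = myContentType.isKindOf(text);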
Comment 8 Dani Megert CLA 2006-03-24 11:06:24 EST
The important part of my comment 4 is "known content types" (I should have written "commonly known"), i.e. we both know that an ".html" or ".htm" file is 'text', but currently the Eclipse SDK doesn't recognize it as text when using the code from comment 7 (I knew about that code ;-). That's exactly the point of this PR: unless the commonly known content types are provided, the code from comment 7 isn't worth much. For example, text actions such as converting line delimiters don't operate on HTML files unless a plug-in is installed that declares said content type.
Comment 9 Rafael Chaves CLA 2006-03-24 11:26:10 EST
I understand now. One strategy to deal with that could be: for all popular text formats for which we don't have a content type in the SDK, "someone" associates the corresponding file extensions to either the basic text, xml or properties content types (declared by runtime). For instance, the "htm" and "html" extensions could be associated to the Text content type. In the Eclipse SDK, *.html files will be treated as plain text files. In WTP, they will belong to the HTML content type. The key is that if two content types accept the same file name, the most specific wins (HTML extends Text). This should work.
Comment 10 Rafael Chaves CLA 2006-03-24 11:45:09 EST
See "Additional file associations": 

http://help.eclipse.org/help31/topic/org.eclipse.platform.doc.isv/guide/runtime_content_contributing.htm
Comment 11 David Williams CLA 2006-03-24 23:04:28 EST
Seems like a good idea to me to add commonly known text formats as Text.
But if there is no generic "binary" one, there is no way for a user
to add some file extension as plain "binary". Put another way, in addition to the many commonly known Text types ... there are many commonly known Binary types ... such as GIF, JPG.
Comment 12 David Williams CLA 2006-03-25 18:41:41 EST
Just to dwell on this, if it's not obvious, another way to think of the issue is that, to me, there are known text types, and known binary types, and unknown types. And, yes, the best assumption to make about unknown types is that they are binary, but that assumption is a matter of policy, and I can imagine cases where "known binary" and "assumed binary" might want to be treated differently. For example, a code repository system might want to verify with a user that that's the intent, if it was "unknown".

Perhaps I do not understand the resistance to this idea. Is there some performance or storage implication I'm not aware of?
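As a rough illustration of the three-way distinction argued for in this comment (known text, known binary, unknown), here is a minimal plain-Java sketch. Everything in it is hypothetical: the class KnownKindSketch, the Kind enum and the small extension table are invented for the example and do not correspond to anything in the Eclipse content type framework.

```java
import java.util.Locale;
import java.util.Map;

/** Hypothetical model for illustration; not an Eclipse API. */
class KnownKindSketch {
    enum Kind { TEXT, BINARY, UNKNOWN }

    // Illustrative table only; a real registry would be contributed by plug-ins.
    private static final Map<String, Kind> KNOWN = Map.of(
        "txt", Kind.TEXT, "html", Kind.TEXT, "htm", Kind.TEXT,
        "gif", Kind.BINARY, "jpg", Kind.BINARY, "xls", Kind.BINARY);

    /** Looks up the file extension; anything not in the table is UNKNOWN. */
    static Kind kindOf(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return Kind.UNKNOWN;
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return KNOWN.getOrDefault(ext, Kind.UNKNOWN);
    }

    /**
     * Example policy: UNKNOWN is assumed binary, but a client such as a
     * repository provider may want to confirm that with the user first.
     */
    static boolean needsUserConfirmation(String fileName) {
        return kindOf(fileName) == Kind.UNKNOWN;
    }

    public static void main(String[] args) {
        System.out.println(kindOf("index.html"));
        System.out.println(kindOf("sheet.xls"));
        System.out.println(needsUserConfirmation("data.xyz"));
    }
}
```

Keeping UNKNOWN separate from BINARY is what lets the "assumed binary" policy live in the client rather than in the classification itself.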
Comment 13 Rafael Chaves CLA 2006-03-27 19:05:45 EST
David, my role here is just advisory, as the original designer (along with Andre Weinand) and maintainer of the content type framework, as I am no longer part of the Platform/Core team (since August last year). I am in no special position to veto or implement any solution.

My opposition to what you suggest is based on the following:

First, there is no interesting property that is shared by all binary content types, as there are for text content types, so having a root content type for binary content types creates an artificial concept, only as a placeholder for .

Second, that design decision (having content types extending Text be considered text content types and everything else considered binary) is widespread throughout the API and implementation. Changing that now would be really hard, at least from the perspective of backward compatibility. For instance, what do we do with existing binary content types?

About your scenario: what CVS does is to present all files it does not recognize (they all default to binary) to the user, who can then change those that should be considered text ('ascii', in CVS lingo).

It seems all you want is some place to keep the knowledge about file extensions that the user has decided to mark as binary, so he does not have to be asked again. Is that right? The other thing, having 'text' as the default for files of unknown content types and relying on the user to mark the exceptions as binary files, sounds dangerous to me (see duplicates).
Comment 14 Rafael Chaves CLA 2006-03-29 11:33:31 EST
Talked to Jeem about this and his take is that the change suggested in comment 9 is safe at this point as it is not a breaking change. Pascal, I could look into making that happen (a new PR would be warranted).

The suggestion I made in comment 6 would be a breaking change (not being appropriate at this time) as there is a possibility that binary content types defined for the same extension already exist.
Comment 15 David Williams CLA 2006-04-06 13:59:06 EDT
(In reply to comment #13)

> 
> It seems all you want is some place where the knowledge about file extensions
> that user has decided to mark as binary. 
> 

Well, mostly, but to me there is a fundamental difference between "unknown" and "binary". Myself, I would prefer an architecture, thought out in advance, for capturing that distinction, or else each component will be tempted to come up with its own schemes, heuristics and policies, which will be impossible to change later ... oh, maybe we're already there :)

I do think adding known text content types to text ... the original topic of this bug ... would be a big improvement ... so, I'll try to stop whining about it for a while :)



Comment 16 Rafael Chaves CLA 2006-04-07 12:47:10 EDT
Unfortunately, the suggestion I made in comment 9 does not work (amazingly I forgot the content type resolution rules myself). I can't think of any solution that does not require a significant amount of code/API changes.

For 3.3, I would suggest the content type aliasing mechanism to be extended to support aliases based on file names/extensions. See "Content type aliasing":

http://help.eclipse.org/help31/index.jsp?topic=/org.eclipse.platform.doc.isv/guide/runtime_content_contributing.htm
Comment 17 David Williams CLA 2006-04-07 12:54:06 EDT
(In reply to comment #16)
> Unfortunately, the suggestion I made in comment 9 does not work (amazingly I
> forgot the content type resolution rules myself). 

What is it that doesn't work? And why? (I'm surprised too, so I want to make sure I stay educated :)

Comment 18 Rafael Chaves CLA 2006-04-07 13:17:31 EDT
Well, it almost works...

Imagine we associate *.htm and *.html to the basic text content type, and the user has plug-ins that contribute a real HTML content type (WTP) and HTML editor associated to it. When two content types are associated to the same file name, and one extends the other, the chosen content type will be:
1) the one that better matches the contents
2) if both deem the contents as VALID, the most specific one wins
3) if both deem the contents as INDETERMINATE, the most general one wins

So my suggestion does not always work because of case 3. Text-based content types are expected to return INDETERMINATE when the file is empty, or incomplete. In that case, the Text content type will win, so the user will be forced to use a text editor and manually guarantee the file contents are VALID for the HTML content type (let's say, writing "<html></html>", closing the text editor, and opening the file again). For advanced users, that would be fine. For the majority of users, I would say that this behavior would not be acceptable.
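The three resolution rules listed in this comment can be modelled in a few lines of plain Java. This is only a sketch of the rules as stated here, not the framework's actual implementation: the class ResolutionSketch, the Match enum and the choose method are hypothetical, and the result for the case where both types consider the contents invalid is an assumption, since the comment does not cover it.

```java
/** Hypothetical model of the rules above; not the Eclipse implementation. */
class ResolutionSketch {
    // Ordered from worst to best match so that compareTo ranks them.
    enum Match { INVALID, INDETERMINATE, VALID }

    /**
     * Picks between a general content type and a more specific one that
     * extends it, when both are associated with the same file name.
     */
    static String choose(String general, Match generalMatch,
                         String specific, Match specificMatch) {
        // Rule 1: the type that better matches the contents wins.
        int cmp = specificMatch.compareTo(generalMatch);
        if (cmp > 0) {
            return specific;
        }
        if (cmp < 0) {
            return general;
        }
        // Rule 2: both VALID, so the most specific type wins.
        if (specificMatch == Match.VALID) {
            return specific;
        }
        // Rule 3: both INDETERMINATE, so the most general type wins.
        if (specificMatch == Match.INDETERMINATE) {
            return general;
        }
        // Assumption: both INVALID means neither type applies.
        return null;
    }

    public static void main(String[] args) {
        // An empty *.html file: both describers are INDETERMINATE.
        System.out.println(
            choose("text", Match.INDETERMINATE, "html", Match.INDETERMINATE));
    }
}
```

The empty-file call in main exercises case 3: the general Text type is chosen over HTML, which is the behavior the comment identifies as the problem.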
Comment 19 David Williams CLA 2006-04-07 14:03:46 EDT
Ah ... the empty file case ... hm, I think we return valid when it's empty!?
Is that a critical part of the behavior? Or would it just be an API change?

I see what you mean. Typically, though, when a content type is provided by a downstream component, users create new files with wizards that have some amount of valid content ... so, IMHO, it would not be a show stopper.

But ... maybe it gives more weight to my "binary" vs. "text" vs. "undefined" argument :)
(Ooohh, that wasn't very long at all, before I started whining again :)



Comment 20 John Arthorne CLA 2007-01-18 16:29:46 EST
Moving content type bugs to the resources component.
Comment 21 John Arthorne CLA 2007-01-18 16:37:49 EST
I'm truly sorry for all the spam.  Apparently I don't know how to use the "change several bugs at once" feature.
Comment 22 Szymon Ptaszkiewicz CLA 2012-12-10 11:08:50 EST
*** Bug 229897 has been marked as a duplicate of this bug. ***
Comment 23 Mickael Istria CLA 2017-07-12 05:21:35 EDT
So we have images in place (bug 155323) and I've opened bug 519534 to track archive/zip.
Bug #191525 is also quite interesting for making the Platform better able to detect text-based files as the Text content type (which conversely means that it would detect binary files as files that are not text).