<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>Central content type catalog for Eclipse</title>
<link rel="stylesheet" href="../default_style.css" type="text/css">
</head>
<body text="#000000" bgcolor="#ffffff">
<h1>A central content type catalog for Eclipse</h1>
<p><font size="-1">Last modified: May 12th, 2004</font> </p>
<p><cite><strong>Plan item description:</strong> Content-type-based editor lookup.
The choice of editor is currently based on file name pattern. This is not very
flexible, and breaks down when fundamentally different types of content are
found in files with undistinguished file names or internal formats. For example,
many different models with specialized editors get stored in XML format files
named *.xml. Eclipse should support a notion of content type for files and resources,
and use these to drive decisions like which editor to use. This feature would
also be used by team providers when doing comparisons based on file type. The
several existing file-type registries in Eclipse should be consolidated. [Platform
Core, Platform UI] [Theme: User experience] (bug <a
href="http://bugs.eclipse.org/bugs/show_bug.cgi?id=37668">37668</a>, <a
href="http://dev.eclipse.org/bugs/show_bug.cgi?id=51791">51791</a>, <a href="https://bugs.eclipse.org/bugs/show_bug.cgi?id=52784">52784</a>)
</cite></p>
<p>This plan item covers two important features: </p>
<ol>
<li>a single content type repository, provided by Eclipse, on which content
type related features contributed by any plug-in can be built, and</li>
<li>a mechanism for automatically determining the content type of a file given
its name and/or its contents.</li>
</ol>
<h2>Driving forces</h2>
<ul>
<li>the catalog must be extensible: plug-ins must be able to contribute new
content types;</li>
<li>content types must have an identity, a unique identifier by which they can
be unambiguously retrieved from the catalog;</li>
<li>content types should be hierarchical: new content types are very often specializations
of existing ones (for example, Ant scripts and plug-in manifests are kinds of XML
documents, and XML is a kind of text document), so it should be possible for new
content types to inherit interesting properties from existing ones (see <a href="#FAQ-hierarchy">FAQ</a>);</li>
<li>content types have either a predominantly binary or a predominantly text nature;</li>
<li>content types are associated with specific file names/extensions;</li>
<li>some level of automatic content/name based type discovery must be provided;</li>
<li>existing plug-in-specific content type registries could be replaced by, or built
upon, the central catalog;</li>
<li>encoding determination is a strongly related concern and should be taken into
consideration when sketching a solution.</li>
</ul>
<h3>On automatic content type detection</h3>
<p>Content types determine many properties and actions related to files, such as
encoding, associated editors, etc. Automatic content type determination enables
content type specific actions without requiring the user to manually define
the content type for a given file. Content type detection is based on:</p>
<ul>
<li>file selection specifications</li>
<li>file contents</li>
</ul>
<p>Content type determination based on file name/extension (&quot;file selection
specs&quot;) is the easiest to compute. Each content type has a set of file
selection specs associated with it. Determining the content type corresponding
to a file selection spec is done by a simple lookup in the catalog. </p>
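<p>A minimal sketch of the name-based lookup, assuming a simplified in-memory catalog. <code>FileSpecCatalog</code> and its methods are hypothetical, not the <code>org.eclipse.core.runtime.content</code> API, and <code>example.pluginManifest</code> is an illustrative id. Following the FAQ below, whole-file-name specs are consulted before file extension specs.</p>

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified catalog mapping file selection specs
// (whole file names and file extensions) to content type ids.
// This is NOT the org.eclipse.core.runtime.content API.
class FileSpecCatalog {
    private final Map<String, List<String>> byFileName = new HashMap<>();
    private final Map<String, List<String>> byExtension = new HashMap<>();

    void addFileName(String contentTypeId, String fileName) {
        byFileName.computeIfAbsent(fileName, k -> new ArrayList<>()).add(contentTypeId);
    }

    void addExtension(String contentTypeId, String extension) {
        byExtension.computeIfAbsent(extension, k -> new ArrayList<>()).add(contentTypeId);
    }

    // Candidate content type ids for a file name: types whose file name spec
    // matches come first, then types whose extension spec matches.
    List<String> findContentTypesFor(String fileName) {
        List<String> result = new ArrayList<>(
                byFileName.getOrDefault(fileName, Collections.emptyList()));
        int dot = fileName.lastIndexOf('.');
        if (dot >= 0 && dot < fileName.length() - 1) {
            String extension = fileName.substring(dot + 1);
            for (String id : byExtension.getOrDefault(extension, Collections.emptyList())) {
                if (!result.contains(id)) {
                    result.add(id);
                }
            }
        }
        return result;
    }
}

class FileSpecCatalogDemo {
    public static void main(String[] args) {
        FileSpecCatalog catalog = new FileSpecCatalog();
        catalog.addExtension("org.eclipse.core.runtime.xml", "xml");
        catalog.addFileName("example.pluginManifest", "plugin.xml");
        // prints [example.pluginManifest, org.eclipse.core.runtime.xml]
        System.out.println(catalog.findContentTypesFor("plugin.xml"));
        // prints [org.eclipse.core.runtime.xml]
        System.out.println(catalog.findContentTypesFor("build.xml"));
    }
}
```

<p>Both lookups are plain hash-map accesses, so their cost does not grow with the number of registered content types. This is what makes name-based detection the cheap first step.</p>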
<p>Content type determination based on file contents is more complex, and requires
examining the contents. Since we are talking about an open set of possible content
types, this examination implies delegating to content type detectors contributed
by other plug-ins (content describers).</p>
<h2>Solution</h2>
<h3>The proposed API</h3>
<p>The proposed API contains four new interfaces in a new package called <code>org.eclipse.core.runtime.content</code>:</p>
<ul>
<li><code><a href="#IContentType">org.eclipse.core.runtime.content.IContentType</a></code></li>
<li><code><a href="#IContentTypeManager">org.eclipse.core.runtime.content.IContentTypeManager</a></code></li>
<li><code><a href="#IContentDescription">org.eclipse.core.runtime.content.IContentDescription</a></code></li>
<li><code><a href="#IContentDescriber">org.eclipse.core.runtime.content.IContentDescriber</a></code></li>
</ul>
<p>A brief description of each follows. </p>
<h4><code><a name="IContentType"></a>org.eclipse.core.runtime.content.IContentType</code></h4>
<p>Represents a content type in the platform. <code>IContentType</code> instances
are provided by the platform, built from extensions to the <code>org.eclipse.core.runtime.contentTypes</code>
extension point. Relevant properties of <code>IContentType</code> are:</p>
<ul>
<li>unique id (example: org.eclipse.core.runtime.xml), which is based on the
plug-in's unique identifier (the registry namespace);</li>
<li>user-friendly name (examples: Text, XML document, ZIP file);</li>
<li>file selection spec: comma-separated lists of associated file names and
extensions;</li>
<li>default charset (example: ISO-8859-1, for Java properties files);</li>
<li>content describer (see <code><a href="#IContentDescriber">org.eclipse.core.runtime.content.IContentDescriber</a></code>),
a class that knows how to recognize whether a given stream of bytes contains data
compatible with the content type, and how to extract other content type specific
information from the stream.</li>
</ul>
<p>Also, <code>IContentType</code> provides methods that check whether a given
file name is matched by this content type's file selection spec, and whether a
content type is a subtype of another content type.</p>
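<p>The subtype check, together with the property inheritance it enables (see the <a href="#FAQ-hierarchy">FAQ</a>), can be sketched as a walk up the chain of base types. <code>SimpleContentType</code> and its method names are hypothetical and chosen for illustration. The <code>example.*</code> ids are made up; only <code>org.eclipse.core.runtime.text</code> and <code>org.eclipse.core.runtime.xml</code> come from this document.</p>

```java
// Hypothetical, simplified content type with at most one base type.
// Not the real IContentType; names are chosen for illustration.
final class SimpleContentType {
    private final String id;
    private final SimpleContentType baseType;  // null for a root type
    private final String defaultCharset;       // null means "inherit from ancestors"

    SimpleContentType(String id, SimpleContentType baseType, String defaultCharset) {
        this.id = id;
        this.baseType = baseType;
        this.defaultCharset = defaultCharset;
    }

    String getId() {
        return id;
    }

    // True if this type is 'other' itself or one of its descendants.
    boolean isKindOf(SimpleContentType other) {
        for (SimpleContentType t = this; t != null; t = t.baseType) {
            if (t == other) {
                return true;
            }
        }
        return false;
    }

    // Walks up the hierarchy until some ancestor declares a charset.
    String getDefaultCharset() {
        for (SimpleContentType t = this; t != null; t = t.baseType) {
            if (t.defaultCharset != null) {
                return t.defaultCharset;
            }
        }
        return null;
    }
}

class SimpleContentTypeDemo {
    public static void main(String[] args) {
        SimpleContentType text = new SimpleContentType("org.eclipse.core.runtime.text", null, null);
        SimpleContentType xml = new SimpleContentType("org.eclipse.core.runtime.xml", text, null);
        SimpleContentType antScript = new SimpleContentType("example.antScript", xml, null);
        SimpleContentType properties =
                new SimpleContentType("example.javaProperties", text, "ISO-8859-1");
        SimpleContentType customProperties =
                new SimpleContentType("example.customProperties", properties, null);
        // "Is this a text file?" becomes a subtype check against the text type.
        System.out.println(antScript.isKindOf(text));              // prints true
        System.out.println(antScript.isKindOf(properties));        // prints false
        System.out.println(customProperties.getDefaultCharset());  // prints ISO-8859-1
    }
}
```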
<h4><a name="IContentTypeManager"></a><code>org.eclipse.core.runtime.content.IContentTypeManager</code></h4>
<p>Represents the content type registry. Provides methods for obtaining the content
type associated with a file name, and for discovering the corresponding content
type for a stream of bytes. <code>IContentTypeManager</code> allows clients
to:</p>
<ul>
<li>retrieve the content type for a given id;</li>
<li>retrieve the set of content types associated with a given file name;</li>
<li>discover which content types recognize a given stream as a valid sample
of the corresponding file format;</li>
<li>obtain a description for a stream of bytes, including platform (such as
encoding) and custom (content type specific) properties.</li>
</ul>
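<p>The third operation in the list above, content-based discovery, can be sketched as follows, assuming each candidate type carries an optional describer. <code>SampleDescriber</code>, <code>Candidate</code> and <code>ContentDetector</code> are hypothetical stand-ins. The ordering rule (deeper, more specific types first) is taken from the conflict resolution discussion below. Every describer consumes the stream it is given, so the sketch buffers the sample once and hands each describer its own <code>ByteArrayInputStream</code>.</p>

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical describer stand-in; the constant values are arbitrary here.
interface SampleDescriber {
    int INVALID = 0;
    int INDETERMINATE = 1;
    int VALID = 2;

    int describe(InputStream contents) throws IOException;
}

// Hypothetical candidate: id, depth in the hierarchy (0 = root), optional describer.
final class Candidate {
    final String id;
    final int depth;
    final SampleDescriber describer; // null if the type contributes none

    Candidate(String id, int depth, SampleDescriber describer) {
        this.id = id;
        this.depth = depth;
        this.describer = describer;
    }
}

final class ContentDetector {
    // Ids of the candidates whose describer deems the sample VALID,
    // deeper (more specific) types first; ties keep registration order.
    static List<String> findContentTypesFor(byte[] sample, List<Candidate> candidates) {
        List<Candidate> valid = new ArrayList<>();
        for (Candidate c : candidates) {
            if (c.describer == null) {
                continue; // nothing to run: cannot be matched by contents
            }
            try {
                // Each describer reads its own stream over the buffered sample.
                if (c.describer.describe(new ByteArrayInputStream(sample)) == SampleDescriber.VALID) {
                    valid.add(c);
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        valid.sort(Comparator.comparingInt((Candidate c) -> c.depth).reversed());
        List<String> ids = new ArrayList<>();
        for (Candidate c : valid) {
            ids.add(c.id);
        }
        return ids;
    }
}

class ContentDetectorDemo {
    public static void main(String[] args) {
        // Toy describer: accepts anything that starts with '<'.
        SampleDescriber startsWithAngle =
                in -> in.read() == '<' ? SampleDescriber.VALID : SampleDescriber.INVALID;
        List<Candidate> candidates = List.of(
                new Candidate("org.eclipse.core.runtime.xml", 1, startsWithAngle),
                new Candidate("example.antScript", 2, startsWithAngle),
                new Candidate("org.eclipse.core.runtime.text", 0, null));
        byte[] sample = "<project/>".getBytes(java.nio.charset.StandardCharsets.US_ASCII);
        // prints [example.antScript, org.eclipse.core.runtime.xml]
        System.out.println(ContentDetector.findContentTypesFor(sample, candidates));
    }
}
```

<p><code>List.sort</code> is stable, so sibling types with equal depth keep their registration order. This matches the statement below that no particular ordering is enforced between siblings.</p>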
<h4><code><a name="IContentDescriber"></a>org.eclipse.core.runtime.content.IContentDescriber</code></h4>
<p>Content-based content type detection and content description rely on specialized
content detectors associated with content types. When a content type is contributed
to the platform, a content describer class may be provided. Content describers
are able to detect whether a given stream of bytes conforms to the content type's
file format, and may also be able to extract important properties from the contents,
such as the charset used to encode the contents (for text files) and any
content type specific information that may be required.</p>
<p>The main method in <code>IContentDescriber</code> is:</p>
<p><code>int describe(InputStream contents, IContentDescription description, int
optionsMask) throws IOException;</code></p>
<p>The first thing implementations of this method must do is check whether the
contents represent a valid sample of their corresponding content type's file
format. If not (or if this cannot be determined), the method should exit immediately,
returning <code>IContentDescription.INVALID</code> or <code>IContentDescription.INDETERMINATE</code>,
depending on how strict the file format is. Otherwise, the method should return
<code>IContentDescription.VALID</code>, but only after trying to provide all
required information (according to the specified options, if any) by reading
the contents and filling in the <a href="#IContentDescription">content description</a>
provided.</p>
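<p>The following is a minimal sketch of a describer for XML files that follows this contract by looking only at the XML declaration. It does not implement the real <code>IContentDescriber</code> interface. <code>XmlDeclarationDescriber</code>, <code>SimpleDescription</code>, the <code>CHARSET</code> option bit and the constant values are hypothetical stand-ins, and byte order marks and UTF-16 input are ignored. A missing declaration does not make a file invalid XML, so that case is reported as indeterminate rather than invalid.</p>

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for a content description that describers fill in.
final class SimpleDescription {
    static final int CHARSET = 1; // hypothetical option bit: "charset requested"
    String charset;               // stays null unless a describer sets it
}

// Sketch of a describer keyed on the XML declaration.
final class XmlDeclarationDescriber {
    static final int INVALID = 0;
    static final int INDETERMINATE = 1;
    static final int VALID = 2;

    private static final int MAX_HEADER = 1024; // bytes inspected (assumption)
    private static final Pattern ENCODING =
            Pattern.compile("encoding\\s*=\\s*[\"']([A-Za-z0-9._:-]+)[\"']");

    int describe(InputStream contents, SimpleDescription description, int optionsMask)
            throws IOException {
        // Read a bounded prefix only; describers should not consume whole files.
        byte[] buffer = new byte[MAX_HEADER];
        int length = 0;
        int n;
        while (length < buffer.length
                && (n = contents.read(buffer, length, buffer.length - length)) > 0) {
            length += n;
        }
        // ISO-8859-1 maps each byte to one char, so an ASCII declaration stays readable.
        String header = new String(buffer, 0, length, StandardCharsets.ISO_8859_1);

        // Step 1: validity check, exiting early.
        if (!header.startsWith("<?xml")) {
            return INDETERMINATE; // a declaration is optional in XML
        }
        int end = header.indexOf("?>");
        if (end < 0) {
            return INVALID; // declaration not closed within the inspected prefix
        }
        // Step 2: fill in only what the caller asked for.
        if (description != null && (optionsMask & SimpleDescription.CHARSET) != 0) {
            Matcher m = ENCODING.matcher(header.substring(0, end));
            // Without an encoding pseudo-attribute (and no BOM), XML defaults to UTF-8.
            description.charset = m.find() ? m.group(1) : "UTF-8";
        }
        return VALID;
    }
}

class XmlDeclarationDescriberDemo {
    public static void main(String[] args) throws IOException {
        byte[] xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><doc/>"
                .getBytes(StandardCharsets.US_ASCII);
        SimpleDescription description = new SimpleDescription();
        int result = new XmlDeclarationDescriber()
                .describe(new ByteArrayInputStream(xml), description, SimpleDescription.CHARSET);
        // prints 2 ISO-8859-1
        System.out.println(result + " " + description.charset);
    }
}
```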
<p><strong><em>Note</em></strong><em>: for this mechanism to work properly, it is
essential that the execution of content describers by the platform does not
cause the activation of the plug-ins providing them. In the Eclipse
3.0 runtime, plug-ins that have built-in bundle manifests will be able to selectively
enable/disable auto-activation on a per-package basis (for more information,
see <a href="https://bugs.eclipse.org/bugs/show_bug.cgi?id=52393">bug 52393</a>).
Although this will not be enforced by the platform, content describers <strong>must</strong>
be self-contained and not trigger auto-activation.</em></p>
<h4><code><a name="IContentDescription"></a>org.eclipse.core.runtime.content.IContentDescription</code></h4>
<p>Content descriptions are obtained by calling the <code>IContentTypeManager#getDescriptionFor</code>
method. A content description contains interesting information (such as encoding)
about an arbitrary stream of bytes. This information is filled in partially by
the platform and partially by the selected content describer (if any).</p>
<h3>Conflict resolution</h3>
<p>Content types are managed by the platform, but plug-ins
are in charge of contributing content types. While this provides good flexibility,
it also opens opportunities for conflicts. There are a few scenarios where conflicts
may arise:</p>
<ol>
<li><em>two content types provided by independently developed plug-ins intended
for the same file selection specification</em> - this may happen for popular
file types that are not provided by the platform. In this case, one of the
content types will automatically be made into an alias for the other. The
intended effect is that content types and code depending on the content type turned
into an alias should still work, acting on the alias target instead. Content
type priorities help decide which content type should be preserved and
which ones should be made into aliases. If a plug-in provider feels it should be the
official provider of a content type, a high priority should be assigned to
it. Usually, &quot;normal&quot; priorities (the default value) should be used,
meaning that if a similar high priority content type is available, it should
be picked instead. </li>
<li><em>two related (but different) content types that share the same file name/extension
specification</em> - think of general XML documents and Ant build scripts
(a subtype of XML document, inheriting its file selection specifications).
This differs from the previous case in that at most one content type will
be contributing file specs (the more basic type). In this scenario, for APIs
that return content types based exclusively on file name, the ancestor will
appear first in the returned array, since it is more general. Note that
when a general type specifies a file extension for it to be associated with,
and a subtype specifies a file name that has the same file extension, the
more specific type will appear before the general one. For APIs that do content-based
analysis, if both content type describers deem the contents as valid, the
more specific content type will also appear first. For two sibling content
types that deem the same set of bytes as valid, no ordering between them will
be enforced (aliasing is not done since those types do not explicitly specify
file specs; they inherit them instead).</li>
<li><em>two completely unrelated content types that share the same file name/extension
specification</em> - this is the least likely scenario (imagine an image
file format and a word-processor file format sharing the same file extension).
In this case, aliasing is not desirable (since the content types are fundamentally
different). User intervention is required (for instance, by disabling one of the content
types and manually associating any editors, etc. with the other
content type) to avoid incorrect aliasing.</li>
</ol>
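<p>For the first scenario, the election can be sketched as follows. The sketch assumes declarations are grouped by the file specs each type declares itself. Inherited specs are not listed, so the second scenario never triggers aliasing. <code>Priority</code>, <code>Declaration</code> and <code>AliasResolver</code> are hypothetical. The platform may choose arbitrarily between equal priorities; this sketch breaks ties by id only to stay deterministic. The demo also shows the placeholder technique from the FAQ below: a low priority placeholder becomes an alias when the official type is present.</p>

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

enum Priority { LOW, NORMAL, HIGH }

// Hypothetical declaration: id, priority, and the file specs the type
// declares itself (inherited specs are deliberately not listed).
final class Declaration {
    final String id;
    final Priority priority;
    final List<String> fileSpecs;

    Declaration(String id, Priority priority, List<String> fileSpecs) {
        this.id = id;
        this.priority = priority;
        this.fileSpecs = List.copyOf(fileSpecs);
    }
}

final class AliasResolver {
    private final Map<String, String> aliasTarget = new HashMap<>();

    AliasResolver(List<Declaration> declarations) {
        // Group declarations by file spec; TreeMap keeps processing order stable.
        Map<String, List<Declaration>> bySpec = new TreeMap<>();
        for (Declaration d : declarations) {
            for (String spec : d.fileSpecs) {
                bySpec.computeIfAbsent(spec, k -> new ArrayList<>()).add(d);
            }
        }
        // Highest priority first, then smallest id (deterministic tie-break).
        Comparator<Declaration> rank = Comparator
                .comparing((Declaration d) -> d.priority).reversed()
                .thenComparing(d -> d.id);
        for (List<Declaration> group : bySpec.values()) {
            group.sort(rank);
            Declaration winner = group.get(0);
            for (Declaration loser : group.subList(1, group.size())) {
                if (!loser.id.equals(winner.id)) {
                    // If a type loses in several groups, the last group processed decides.
                    aliasTarget.put(loser.id, winner.id);
                }
            }
        }
    }

    // Follows alias links to the content type that is actually used.
    String resolve(String id) {
        String current = id;
        while (aliasTarget.containsKey(current)) {
            current = aliasTarget.get(current);
        }
        return current;
    }
}

class AliasResolverDemo {
    public static void main(String[] args) {
        List<Declaration> declarations = List.of(
                new Declaration("org.eclipse.core.runtime.xml", Priority.NORMAL, List.of("xml")),
                new Declaration("example.xmlPlaceholder", Priority.LOW, List.of("xml")));
        AliasResolver resolver = new AliasResolver(declarations);
        // prints org.eclipse.core.runtime.xml
        System.out.println(resolver.resolve("example.xmlPlaceholder"));
    }
}
```

<p>Every alias link points to a type that ranks strictly higher in the same total order, so <code>resolve</code> always terminates.</p>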
<h3>Frequently Asked Questions</h3>
<ol>
<li> How will plug-in providers benefit from a central content type catalog?
<p> Generally by using the same content type registry and sharing the same
concept of content/file type. Other examples are:</p>
<ul>
<li>a builder could use its well-known content type to filter out files
whose names don't match the content type's file selection spec;</li>
<li>the user interface could know what editors to offer for a given
selected file (associations between editors and content types should be kept
separately from the content type catalog).</li>
</ul>
</li>
<li><a name="FAQ-hierarchy"></a>Why are content types hierarchical?
<p>To allow important properties to be inherited by new specialized content
types:</p>
<ul>
<li>the default charset</li>
<li>text/binary nature</li>
<li>content description</li>
<li>associations defined externally by plug-ins (for instance, any editors
associated with an ancestor should work with any descendants)</li>
</ul>
</li>
<li> What happens if the base type for a new content type is not present in
the platform (the plug-in that provides it is not available)?
<p>The content type (and consequently any descendants) will be deemed
invalid and ignored.</p>
</li>
<li><a name="FAQ-MIME"></a>Do Eclipse's content types have anything to do with
IANA's MIME Media Types?
<p>Not so far.</p>
</li>
<li>How can users customize the way content types are chosen?
<p>By:</p>
<ul>
<li>associating additional file specs with existing content types</li>
<li>defining a content type as the default one for a given file spec (not
supported yet)</li>
<li>overriding attributes defined by a content type, such as the default encoding
(not supported yet)</li>
</ul>
</li>
<li>Can plug-ins override the content type describer for an existing content
type?
<p>Not so far. It is up to the plug-in provider to determine whether a content
type describer will be provided.</p>
</li>
<li>Can two completely unrelated content types be associated with the same file
spec?
<p>If that happens, only one of the content types will be enabled. If only one
of them is declared as high priority, it will be picked; otherwise, the platform
will choose one arbitrarily. In either case, the others will be made into aliases
for the elected content type.</p>
</li>
<!--li>How are conflicts (two different content types associated to the same file)
prevented?
<p>They are not. At least, not automatically. It is up to clients to decide
what to do when more than one content type is offered by the platform. A
client that does not care about which one is picked, will randomly choose
one of them. A client that cares but does not know which one to choose may
refuse to use any of them. User-guided code may ask the user what should
be done. The content type chosen may be marked as the default one for the
file spec, or the user may want to mark one as an alias for the other.</p>
</li-->
<li>What if a given file name is matched by two different file specs provided
by two completely unrelated content types?
<p> As seen above, the only way this can happen is when two <em>different</em>
file specs (for instance, a file name and a file extension) accept the same
file name (for instance, one content type is associated with the &quot;xml&quot;
file extension, and another with the &quot;plugin.xml&quot; file name).
File name specs have priority over file extension specs (so plugin.xml
is a plug-in manifest before being an XML document). The normal case is that
the content type that defines a file name spec is based on the content type
that defines a file extension spec (a plug-in manifest is a kind of XML document).
This ensures that actions applicable to general XML documents will be applicable
to a plug-in manifest.</p>
</li>
<li>What are content type aliases?
<p>When a content type is marked as an alias for another content type (due
to a file spec conflict), all of its properties are ignored, and any associations
with it will actually be made on the target type.</p></li>
<li>What are aliases for?
<p>They are a mechanism to prevent conflicts. When multiple plug-ins contribute
content types associated with the same file specs, we have a conflict. Conflicts
are bad because they introduce ambiguity (which one is the right content type?).
Most of the time when such conflicts arise, it is a case of independently developed
plug-ins trying to contribute the same content type (semantically speaking).
Aliases between conflicting content types will be created automatically
when necessary.</p>
</li>
<li>How do I prevent my specialized content type from being disabled if its
parent is not available?
<p> Sometimes a plug-in A does not depend on plug-in B, but declares a content
type which is intended to be a specialization of another content type declared
by B. To prevent the content type declared by A from being disabled: </p>
<ol>
<li>declare a low priority content type associated with the same file specs
the intended parent is usually associated with (a placeholder);</li>
<li>make your specialized content type have this placeholder as its base
type.</li>
</ol>
<p>If the originally intended base type is available, your placeholder will
be marked as just an alias, and your specialized content type will be properly
attached to the official content type. Otherwise, the placeholder will be
elected, and although things might not be as good as intended (actions
associated with the original content type will not be available), your content
type will still be enabled.</p>
</li>
<li>When should a file association be contributed instead of declaring a new
(derived) content type?
<p>New content types should be created only if there is no existing content
type with the required semantics. Otherwise, when only additional file specs
must be provided, file associations are the way to go.</p>
</li>
<li>Are file specs inherited?
<p> Only if none are specified in the subtype. </p>
</li>
<li>How does a client figure out whether a given file is a text file or not?
<p>The proposed approach is to check whether the file's content type is a kind
of the &quot;org.eclipse.core.runtime.text&quot; content type, which is
intended to be the ancestor of all text-oriented content types. If this turns
out to be a very frequent idiom, we might consider providing a convenience
API for it.</p>
</li>
<li>Do content types have to contribute content describers?
<p> No, although if the file has an identifiable signature/format, it is recommended,
because it improves the overall quality of content-based content type lookups.</p>
</li>
</ol>
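<p>The answer to &quot;Are file specs inherited?&quot; amounts to a simple recursive rule. It is sketched here with a hypothetical <code>SpecNode</code> class, which is not platform API. A type's effective file specs are its own, or, if it declares none, those of its base type.</p>

```java
import java.util.List;

// Hypothetical type node: its own file specs plus an optional base type.
final class SpecNode {
    final List<String> ownFileSpecs;
    final SpecNode baseType; // null for a root type

    SpecNode(List<String> ownFileSpecs, SpecNode baseType) {
        this.ownFileSpecs = List.copyOf(ownFileSpecs);
        this.baseType = baseType;
    }

    // A subtype inherits file specs only when it declares none itself.
    List<String> effectiveFileSpecs() {
        if (!ownFileSpecs.isEmpty() || baseType == null) {
            return ownFileSpecs;
        }
        return baseType.effectiveFileSpecs();
    }
}

class SpecNodeDemo {
    public static void main(String[] args) {
        SpecNode xml = new SpecNode(List.of("xml"), null);
        SpecNode antScript = new SpecNode(List.of(), xml);                 // declares nothing
        SpecNode pluginManifest = new SpecNode(List.of("plugin.xml"), xml); // declares its own
        System.out.println(antScript.effectiveFileSpecs());      // prints [xml]
        System.out.println(pluginManifest.effectiveFileSpecs()); // prints [plugin.xml]
    }
}
```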
<p> <em>Note: comments are encouraged. Any questions/concerns not addressed here
should be discussed on the platform-core-dev mailing list, or in bug <a
href="http://bugs.eclipse.org/bugs/show_bug.cgi?id=37668">37668</a>. </em></p>
<h2>Addendum: issues to be addressed in the 3.1 cycle</h2>
<p><font size="-1">Last modified: January 17th, 2005</font></p>
<p>The solution described above was implemented and was relatively successful. Some
components took advantage of the new content type infrastructure, but
in many cases file association is still done in an ad hoc manner. Also, no UI
was provided for customizing content types (such as changing the default encoding
or adding associations with files), so the user has no control over how the content
type detection mechanism works. Thus, the main issues to be addressed in the
3.1 cycle are:</p>
<ul>
<li>ensure the content type framework works for clients that have not adopted
it yet, or at least understand why it is not (and cannot be made) suitable
for them. Examples are Platform/UI, Platform/CVS, and products built on top
of Eclipse.</li>
<li>ensure users have appropriate means to customize the behavior
of content type detection, so that things just work for them.</li>
<li>ensure content type resolution works (or can be made to work) appropriately
in setups where multiple independently developed products contribute conflicting
content types. </li>
</ul>
<h3>Support more use cases</h3>
<p>Ensure the content type support works for the SDK plug-ins and for products
built on top of Eclipse.</p>
<p>TBD</p>
<p><strong>Platform/UI - file/editor association</strong></p>
<p>Content type-editor association is definitely the most important use case for
the content type support. The basic idea is that, for a given file or stream
of data, the UI should be able to:</p>
<ol>
<li>provide a default editor to be used when opening the file</li>
<li>present any other editors that are also able to handle the file</li>
<li>allow the user to pick a completely unrelated editor not initially suggested</li>
<li>remember the user's decision (if the user wishes) so it becomes the default
for that file</li>
</ol>
<p> <strong>1</strong>, <strong>2</strong> and <strong>4</strong> are currently
supported by the existing file-editor association mechanism. <strong>3</strong>
is being requested by users, and it is orthogonal (as 4 is) to the content type
support provided by the runtime.</p>
<p>Content types add a level of indirection between files and editors. At first
glance, there is no reason why changing the default editor should affect what
content type is assigned to a file, so users should be able to pick any editor
without affecting content type detection.</p>
<p><strong>Platform/Team - binary vs ASCII files</strong></p>
<p>The Team plug-in keeps a catalog of file extensions and their expected content
type (either binary or ASCII). If content types were broadly adopted throughout the rest
of Eclipse (so that most files users deal with have a content type), couldn't the Team
plug-in use content type based encoding determination to figure out a good default for this
setting?</p>
<h3>Give users more power</h3>
<p>Ensure users have the means to customize how content type detection works for
them. Provide UI for content types. Possibly provide some way of showing related
objects for a given content type (editors, views, comparators, etc.). Users cannot
provide content type detection code, so user-defined content types would be
useful only for cases where content type detection is file name based (such as
unformatted text files like source files, configuration files, etc.).
</p>
<h3>Improve content type determination</h3>
<p>Ensure content type detection works (or can be made to work) appropriately
when incompatible products are deployed together.</p>
<ol>
<li><strong>nature-specific content types</strong> - to reduce the chances of collisions,
there should be a way of constraining which content types are available on
a per-project basis. One possible solution would be to have content type-nature
associations. Natures could declare some preferred content types, and those
preferred content types would always win.</li>
<li><strong>user-defined content type determination</strong> - two plug-ins
provide completely unrelated content types A and B for files with extension
&quot;.foo&quot;. The current implementation will choose one of them (by priority,
depth, or arbitrarily if they have the same priority and depth) and mark the
other one as an alias, so if they have sub-content types, the two trees will
be combined under the chosen one. If A is automatically chosen, the user might
want to choose B instead as the default content type. </li>
<li><strong>fine-grained user-defined content type determination</strong> -
if for some reason we have problems detecting the right content type for a
file, the user should be able to fix that by saying: I want <em><strong>this</strong></em>
content type for this file. This is similar to allowing users to specify encoding
on a per-file basis. One possible implementation would be to use persistent properties
for that. Another would be to use project preferences (because they are shared).</li>
<li><strong>product defined policies for overriding content types</strong> -
a product may want to override some of the existing content type definitions.
Products should be able to do that in a way that circumvents the regular
conflict resolution.</li>
</ol>
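<p>As an illustration only, the sketch below combines the first and third items. It assumes an explicit per-file user choice is consulted before nature-preferred types, and that those in turn beat the result of regular detection. <code>ProposedResolver</code> and this precedence order are hypothetical.</p>

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical resolver for two of the proposals above: a per-file user
// override and content types preferred by the project's natures.
final class ProposedResolver {
    private final Map<String, String> perFileOverride; // file path -> content type id
    private final Set<String> naturePreferred;         // ids preferred by project natures

    ProposedResolver(Map<String, String> perFileOverride, Set<String> naturePreferred) {
        this.perFileOverride = Map.copyOf(perFileOverride);
        this.naturePreferred = Set.copyOf(naturePreferred);
    }

    // 'detected' is the result of regular detection, best candidate first.
    Optional<String> resolve(String filePath, List<String> detected) {
        String override = perFileOverride.get(filePath);
        if (override != null) {
            return Optional.of(override); // assumed: an explicit user choice wins
        }
        for (String id : detected) {
            if (naturePreferred.contains(id)) {
                return Optional.of(id);   // a nature-preferred type beats the others
            }
        }
        return detected.isEmpty() ? Optional.empty() : Optional.of(detected.get(0));
    }
}

class ProposedResolverDemo {
    public static void main(String[] args) {
        ProposedResolver resolver = new ProposedResolver(
                Map.of("/project/special.foo", "example.B"),
                Set.of("example.C"));
        List<String> detected = List.of("example.A", "example.C");
        // prints Optional[example.B]
        System.out.println(resolver.resolve("/project/special.foo", detected));
        // prints Optional[example.C]
        System.out.println(resolver.resolve("/project/other.foo", detected));
    }
}
```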
<p>See also the corresponding PR <a href="https://bugs.eclipse.org/bugs/show_bug.cgi?id=78654">78654</a>
- content type should be used universally.</p>
</body></html>