platform-core-home/documents/content_types.html
Revision 1.5
| 1 : | rchaves | 1.1 | <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> |
| 2 : | <html> | ||
| 3 : | <head> | ||
| 4 : | <title>Central content type catalog for Eclipse</title> | ||
| 5 : | dj | 1.4 | <link rel="stylesheet" href="../default_style.css" type="text/css"> |
| 6 : | rchaves | 1.1 | </head> |
| 7 : | <body text="#000000" bgcolor="#ffffff"> | ||
| 8 : | <h1>A central content type catalog for Eclipse</h1> | ||
| 9 : | rchaves | 1.3 | <p><font size="-1">Last modified: May 12th, 2004</font> </p> |
| 10 : | rchaves | 1.1 | <p><cite><strong>Plan item description:</strong> Content-type-based editor lookup. |
| 11 : | The choice of editor is currently based on file name pattern. This is not very | ||
| 12 : | flexible, and breaks down when fundamentally different types of content are | ||
| 13 : | found in files with undistinguished file names or internal formats. For example, | ||
| 14 : | many different models with specialized editors get stored in XML format files | ||
| 15 : | named *.xml. Eclipse should support a notion of content type for files and resources, | ||
| 16 : | and use these to drive decisions like which editor to use. This feature would | ||
| 17 : | also be used by team providers when doing comparisons based on file type. The | ||
| 18 : | several existing file-type registries in Eclipse should be consolidated. [Platform | ||
| 19 : | Core, Platform UI] [Theme: User experience] (bug <a | ||
| 20 : | href="http://bugs.eclipse.org/bugs/show_bug.cgi?id=37668">37668</a>, <a | ||
| 21 : | href="http://dev.eclipse.org/bugs/show_bug.cgi?id=51791">51791</a>, <a href="https://bugs.eclipse.org/bugs/show_bug.cgi?id=52784">52784</a>) | ||
| 22 : | </cite></p> | ||
| 23 : | <p>This plan item is about two important features: </p> | ||
| 24 : | <ol> | ||
| 25 : | <li>a single content-type repository provided by Eclipse, on top of which | ||
| 26 : | content-type related features provided by any plug-in can be built, and</li> | ||
| 27 : | <li>a mechanism for automatically determining the content type given a file name and/or its | ||
| 28 : | contents.</li> | ||
| 29 : | </ol> | ||
| 30 : | <h2>Driving forces</h2> | ||
| 31 : | <ul> | ||
| 32 : | <li>the catalog must be extensible: plug-ins must be able to contribute new | ||
| 33 : | content types;</li> | ||
| 34 : | <li>content types must have an identity, a unique identifier by which they can | ||
| 35 : | be unambiguously retrieved from the catalog;</li> | ||
| 36 : | <li>content types should be hierarchical: new content types are very often specializations | ||
| 37 : | of existing ones (for example, Ant scripts and plug-in manifests are kinds of XML | ||
| 38 : | documents, XML is a kind of text document), so it should be possible for new | ||
| 39 : | content types to inherit interesting properties from existing ones (see <a href="#FAQ-hierarchy">FAQ</a>);</li> | ||
| 40 : | <li>content types have either a predominantly binary or text nature;</li> | ||
| 41 : | <li>content types are associated with specific file names/extensions;</li> | ||
| 42 : | <li>some level of automatic content/name based type discovery must be provided;</li> | ||
| 43 : | <li>existing plug-in-specific content type registries could be replaced by, or built | ||
| 44 : | upon, the central catalog;</li> | ||
| 45 : | <li>encoding determination is a strongly related concern and should be taken into | ||
| 46 : | consideration when sketching a solution.</li> | ||
| 47 : | </ul> | ||
| 48 : | <h3>On automatic content type detection</h3> | ||
| 49 : | <p>Content types determine many properties and actions related to files such as | ||
| 50 : | encoding, associated editors, etc. Automatic content type determination allows | ||
| 51 : | content type specific actions without requiring the user to manually define | ||
| 52 : | the content type for a given file. Content type detection is based on:</p> | ||
| 53 : | <ul> | ||
| 54 : | <li>file selection specifications</li> | ||
| 55 : | <li>file contents</li> | ||
| 56 : | </ul> | ||
| 57 : | <p>Content type determination based on file name/extension ("file selection | ||
| 58 : | specs") is the easiest one to compute. Each content type has a set of file | ||
| 59 : | selection specs associated with it. Determining the content type corresponding | ||
| 60 : | to a file selection spec is done by a simple lookup on the catalog. </p> | ||
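<p>The name-based lookup described above can be sketched roughly as follows. This is a minimal,
self-contained illustration and not the platform implementation: the class <code>FileSpecCatalog</code>
and its methods are hypothetical, as is the <code>org.example.pluginManifest</code> id used in the usage note.
It assumes that exact file name matches are listed ahead of file extension matches, as the FAQ in this document states.</p>

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for the catalog; not part of any Eclipse API.
class FileSpecCatalog {
    // exact file name -> ids of the content types associated with it
    private final Map<String, List<String>> byName = new HashMap<String, List<String>>();
    // file extension (without the dot, lower case) -> ids of associated content types
    private final Map<String, List<String>> byExtension = new HashMap<String, List<String>>();

    void addFileName(String contentTypeId, String fileName) {
        register(byName, fileName, contentTypeId);
    }

    void addFileExtension(String contentTypeId, String extension) {
        register(byExtension, extension.toLowerCase(Locale.ROOT), contentTypeId);
    }

    private static void register(Map<String, List<String>> index, String key, String id) {
        List<String> ids = index.get(key);
        if (ids == null) {
            ids = new ArrayList<String>();
            index.put(key, ids);
        }
        ids.add(id);
    }

    // Returns candidate ids: exact-name matches first, then extension matches.
    List<String> findContentTypesFor(String fileName) {
        List<String> result = new ArrayList<String>();
        List<String> named = byName.get(fileName);
        if (named != null) {
            result.addAll(named);
        }
        int dot = fileName.lastIndexOf('.');
        if (dot >= 0 && dot < fileName.length() - 1) {
            String extension = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
            List<String> matched = byExtension.get(extension);
            if (matched != null) {
                for (String id : matched) {
                    if (!result.contains(id)) {
                        result.add(id); // skip ids already found by name
                    }
                }
            }
        }
        return result;
    }
}
```

<p>With <code>org.eclipse.core.runtime.xml</code> registered for the <code>xml</code> extension and a hypothetical
<code>org.example.pluginManifest</code> registered for the name <code>plugin.xml</code>, looking up
<code>plugin.xml</code> yields the manifest type followed by the XML type, while <code>build.xml</code> yields only the XML type.</p>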
| 61 : | <p>Content type determination based on file contents is more complex, and requires | ||
| 62 : | examining the contents. Since we are talking about an open set of possible content | ||
| 63 : | types, this examination implies delegation to content type detectors contributed | ||
| 64 : | by other plug-ins (content describers).</p> | ||
| 65 : | <h2>Solution</h2> | ||
| 66 : | <h3>The proposed API</h3> | ||
| 67 : | <p>The proposed API contains 4 new interfaces in a new package called <code>org.eclipse.core.runtime.content</code>:</p> | ||
| 68 : | <ul> | ||
| 69 : | <li><code><a href="#IContentType">org.eclipse.core.runtime.content.IContentType</a></code></li> | ||
| 70 : | <li><code><a href="#IContentTypeManager">org.eclipse.core.runtime.content.IContentTypeManager</a></code></li> | ||
| 71 : | <li><code><a href="#IContentDescription">org.eclipse.core.runtime.content.IContentDescription</a></code></li> | ||
| 72 : | <li><code><a href="#IContentDescriber">org.eclipse.core.runtime.content.IContentDescriber</a></code></li> | ||
| 73 : | </ul> | ||
| 74 : | <p>Following is a brief description for each of them. </p> | ||
| 75 : | <h4><code><a name="IContentType"></a>org.eclipse.core.runtime.content.IContentType</code></h4> | ||
| 76 : | <p>Represents a content type in the platform. <code>IContentType</code> instances | ||
| 77 : | are provided by the platform, built from extensions to the <code>org.eclipse.core.runtime.contentTypes</code> | ||
| 78 : | extension point. Relevant properties for <code>IContentType</code> are:</p> | ||
| 79 : | <ul> | ||
| 80 : | <li>unique id (example: org.eclipse.core.runtime.xml), which is based on the | ||
| 81 : | plug-in's unique identifier (the registry namespace);</li> | ||
| 82 : | <li>user-friendly name (example: Text, or XML document, or ZIP file);</li> | ||
| 83 : | <li>file selection spec - comma-separated lists of associated file names and | ||
| 84 : | extensions;</li> | ||
| 85 : | <li>default charset (example: ISO-8859-1, for Java properties files);</li> | ||
| 86 : | <li>content describer (see <code><a href="#IContentDescriber">org.eclipse.core.runtime.content.IContentDescriber</a></code>), | ||
| 87 : | a class that knows how to recognize whether a given stream of bytes contains data compatible | ||
| 88 : | with the content type, and how to extract other content-type specific information | ||
| 89 : | from the stream.</li> | ||
| 90 : | </ul> | ||
| 91 : | <p>Also, <code>IContentType</code> provides methods that check whether the given | ||
| 92 : | file name is matched by this content type's file selection spec, or whether a | ||
| 93 : | content type is a subtype of another content type.</p> | ||
| 94 : | <h4><a name="IContentTypeManager"></a><code>org.eclipse.core.runtime.content.IContentTypeManager</code></h4> | ||
| 95 : | <p>Represents the content type registry. Provides methods for obtaining the content | ||
| 96 : | type associated to a file name, and for discovering the corresponding content | ||
| 97 : | type for a stream of bytes. <code>IContentTypeManager</code> allows clients | ||
| 98 : | to:</p> | ||
| 99 : | <ul> | ||
| 100 : | <li>retrieve the content type for a given id;</li> | ||
| 101 : | <li>retrieve a set of content types associated to a given file name;</li> | ||
| 102 : | <li>discover which content types recognize a given stream as a valid sample | ||
| 103 : | for the corresponding file format;</li> | ||
| 104 : | <li>obtain a description for a stream of bytes, including platform (such as | ||
| 105 : | encoding) and custom (content type specific) properties.</li> | ||
| 106 : | </ul> | ||
| 107 : | <h4><code><a name="IContentDescriber"></a>org.eclipse.core.runtime.content.IContentDescriber</code></h4> | ||
| 108 : | <p>Content-based content type detection and content description rely on specialized | ||
| 109 : | content detectors associated to content types. When a content type is contributed | ||
| 110 : | to the platform, a content describer class may be provided. Content describers | ||
| 111 : | are able to detect if a given stream of bytes is conformant to the content type | ||
| 112 : | file format, and may also be able to extract important properties from the contents, | ||
| 113 : | such as what charset was used to encode the contents (for text files), and any | ||
| 114 : | content type specific information that may be required.</p> | ||
| 115 : | <p>The main method in <code>IContentDescriber</code> is:</p> | ||
| 116 : | <p><code>int describe(InputStream contents, IContentDescription description, int | ||
| 117 : | optionsMask) throws IOException;</code></p> | ||
| 118 : | <p>The first thing implementations for this method must do is to check if the | ||
| 119 : | contents represent a valid sample for their corresponding content type file | ||
| 120 : | format. If not (or if it cannot be determined), this method should exit immediately, | ||
| 121 : | returning <code>IContentDescription.INVALID</code> or <code>IContentDescription.INDETERMINATE</code>, | ||
| 122 : | depending on how strict the file format is. Otherwise, this method should return | ||
| 123 : | <code>IContentDescription.VALID</code>, but only after trying to provide all | ||
| 124 : | required information (according to the specified options, if any) by reading | ||
| 125 : | the contents and filling the <a href="#IContentDescription">content description</a> | ||
| 126 : | provided.</p> | ||
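<p>The contract of <code>describe</code> can be illustrated with a small sketch. To keep it compilable on its own,
it declares simplified stand-ins (<code>SketchDescription</code>, <code>SketchDescriber</code>) instead of the proposed
interfaces. The numeric constant values, the <code>CHARSET</code> option bit, the property map and the
<code>#!demo</code> signature are all assumptions made for illustration only.</p>

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for the proposed IContentDescription (values are assumptions).
class SketchDescription {
    static final int INVALID = 0;
    static final int INDETERMINATE = 1;
    static final int VALID = 2;
    final Map<String, Object> properties = new HashMap<String, Object>();
}

// Simplified stand-in mirroring the describe signature quoted above.
interface SketchDescriber {
    int describe(InputStream contents, SketchDescription description, int optionsMask)
            throws IOException;
}

// Hypothetical describer for a text format whose files must start with "#!demo".
class DemoDescriber implements SketchDescriber {
    static final int CHARSET = 1; // hypothetical option bit: caller wants the charset
    private static final byte[] SIGNATURE = "#!demo".getBytes(StandardCharsets.US_ASCII);

    public int describe(InputStream contents, SketchDescription description, int optionsMask)
            throws IOException {
        // Step 1: check validity before doing anything else.
        byte[] head = new byte[SIGNATURE.length];
        int read = 0;
        while (read < head.length) {
            int n = contents.read(head, read, head.length - read);
            if (n < 0) {
                break; // end of stream
            }
            read += n;
        }
        if (read < head.length) {
            return SketchDescription.INDETERMINATE; // too few bytes to decide
        }
        if (!Arrays.equals(head, SIGNATURE)) {
            return SketchDescription.INVALID; // strict format: signature is mandatory
        }
        // Step 2: fill in only what the options ask for, then report success.
        if ((optionsMask & CHARSET) != 0) {
            description.properties.put("charset", "US-ASCII");
        }
        return SketchDescription.VALID;
    }
}
```

<p>The sketch uses only JDK classes, in line with the requirement below that describers be self-contained.</p>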
| 127 : | <p><strong><em>Note</em></strong><em>: for this mechanism to work well, it is | ||
| 128 : | essential that the execution of content describers by the platform | ||
| 129 : | does not cause the activation of the plug-ins providing them. In the Eclipse | ||
| 130 : | 3.0 runtime, plug-ins that have built-in bundle manifests will be able to selectively | ||
| 131 : | enable/disable auto-activation on a per-package basis (for more information, | ||
| 132 : | see <a href="https://bugs.eclipse.org/bugs/show_bug.cgi?id=52393">bug 52393</a>). | ||
| 133 : | Although this will not be enforced by the platform, content describers <strong>must</strong> | ||
| 134 : | be self-contained and not trigger auto-activation.</em></p> | ||
| 135 : | <h4><code><a name="IContentDescription"></a>org.eclipse.core.runtime.content.IContentDescription</code></h4> | ||
| 136 : | <p>Content descriptions are obtained by calling the <code>IContentTypeManager#getDescriptionFor</code> | ||
| 137 : | method. A content description contains interesting information (such as encoding) | ||
| 138 : | about an arbitrary stream of bytes. This information is filled in partly by | ||
| 139 : | the platform and partly by the elected content describer (if any).</p> | ||
| 140 : | <h3>Conflict resolution</h3> | ||
| 141 : | rchaves | 1.3 | <p>Content types are managed by the platform but plug-ins |
| 142 : | rchaves | 1.1 | are in charge of contributing content types. While this provides good flexibility, |
| 143 : | it also opens opportunities for conflicts. There are a few scenarios where conflicts | ||
| 144 : | may arise:</p> | ||
| 145 : | <ol> | ||
| 146 : | <li><em>two content types provided by independently developed plug-ins intended | ||
| 147 : | for the same file selection specification</em> - this may happen for popular | ||
| 148 : | file types that are not provided by the platform. In this case, one of the | ||
| 149 : | content types will be automatically made into an alias for the other. The | ||
| 150 : | intended effect is that content types/code depending on the content type transformed | ||
| 151 : | into an alias should still work, acting on the alias target instead. Content | ||
| 152 : | type priorities help decide which content type should be preserved, and | ||
| 153 : | which ones made into aliases. If a plug-in provider feels it should be the | ||
| 154 : | official provider of a content type, a high priority should be assigned to | ||
| 155 : | it. Usually, "normal" priorities (the default value) should be used, | ||
| 156 : | meaning that if a similar high priority content type is available, it should | ||
| 157 : | be picked instead. </li> | ||
| 158 : | <li><em>two related (but different) content types that share the same file name/extension | ||
| 159 : | specification</em> - think of general XML documents and Ant build scripts | ||
| 160 : | (a sub-type of XML document, inheriting its file selection specifications). | ||
| 161 : | This differs from the previous case because here at most one content type will | ||
| 162 : | be contributing file specs (the more basic type). In this scenario, for APIs | ||
| 163 : | that return content types based exclusively on file name, the ancestor will | ||
| 164 : | appear first in the returned array, since it is more general. Note that | ||
| 165 : | when a general type specifies a file extension for it to be associated with, | ||
| 166 : | and a subtype specifies a file name that has the same file extension, the | ||
| 167 : | more specific type will appear before the general one. For APIs that do content-based | ||
| 168 : | analysis, if both content type describers deem the contents as valid, the | ||
| 169 : | more specific content type will also appear first. For two sibling content | ||
| 170 : | types that deem the same set of bytes as valid, no ordering between them will | ||
| 171 : | be enforced (aliasing is not done since those types do not explicitly specify | ||
| 172 : | file specs - they inherit them instead).</li> | ||
| 173 : | <li><em>two completely unrelated content types that share the same file name/extension | ||
| 174 : | specification</em> - this is the least likely scenario (imagine an image | ||
| 175 : | file format and a word-processor file format sharing the same file extension). | ||
| 176 : | In this case, aliasing is not desirable (since the content types are fundamentally | ||
| 177 : | different). User intervention is required (by disabling one of the content | ||
| 178 : | types, for instance, and manually associating any editors, etc. to the other | ||
| 179 : | content type) to avoid incorrect aliasing.</li> | ||
| 180 : | </ol> | ||
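<p>The first scenario (independent plug-ins claiming the same file spec) can be sketched as follows.
This is only an illustration under stated assumptions: <code>AliasResolver</code>, <code>Contribution</code> and the
three-level <code>Priority</code> enum are hypothetical names, and breaking ties in favour of the first contribution
is just one possible way of making the arbitrary choice. Only types that explicitly declare a file spec are
passed in, so subtypes that merely inherit their specs are never aliased.</p>

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; not part of the proposed API.
class AliasResolver {
    enum Priority { LOW, NORMAL, HIGH }

    // One content type explicitly declaring one file spec.
    static final class Contribution {
        final String id;
        final String fileSpec;
        final Priority priority;

        Contribution(String id, String fileSpec, Priority priority) {
            this.id = id;
            this.fileSpec = fileSpec;
            this.priority = priority;
        }
    }

    // Returns a map from each aliased content type id to the id of its target.
    static Map<String, String> resolveAliases(List<Contribution> contributions) {
        // file spec -> contribution elected so far
        Map<String, Contribution> elected = new HashMap<String, Contribution>();
        for (Contribution candidate : contributions) {
            Contribution current = elected.get(candidate.fileSpec);
            // A strictly higher priority wins; on a tie the earlier one is kept.
            if (current == null || candidate.priority.compareTo(current.priority) > 0) {
                elected.put(candidate.fileSpec, candidate);
            }
        }
        Map<String, String> aliases = new HashMap<String, String>();
        for (Contribution candidate : contributions) {
            Contribution winner = elected.get(candidate.fileSpec);
            if (winner != candidate) {
                aliases.put(candidate.id, winner.id); // loser becomes an alias
            }
        }
        return aliases;
    }
}
```

<p>Code that looks up an aliased id would then be redirected to the target id, which is the intended effect described in the first scenario.</p>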
| 181 : | <h3>Frequently Asked Questions</h3> | ||
| 182 : | <ol> | ||
| 183 : | <li> How will plugin providers benefit from a central content type catalog? | ||
| 184 : | <p> Generally by using the same content type registry and sharing the same | ||
| 185 : | rchaves | 1.2 | concept of content/file type. Other examples are: |
| 186 : | rchaves | 1.1 | <ul> |
| 187 : | <li>a builder could use its well-known content type to filter out files | ||
| 188 : | whose names don't match the content type file selection spec;</li> | ||
| 189 : | <li>the user interface could know what editors to offer for a given file | ||
| 190 : | selected (associations between editors and content types should be kept | ||
| 191 : | separately from the content type catalog);</li> | ||
| 192 : | rchaves | 1.2 | </ul></p> |
| 193 : | rchaves | 1.1 | </li> |
| 194 : | <li><a name="FAQ-hierarchy"></a>Why are content types hierarchical? | ||
| 195 : | <p>To allow important properties to be inherited by new specialized content | ||
| 196 : | types: | ||
| 197 : | <ul> | ||
| 198 : | <li>the default charset</li> | ||
| 199 : | <li>text/binary nature</li> | ||
| 200 : | <li>content description</li> | ||
| 201 : | <li>associations defined externally by plug-ins (for instance, any editors | ||
| 202 : | associated with an ancestor should work with any descendants)</li> | ||
| 203 : | </ul></p> | ||
| 204 : | </li> | ||
| 205 : | <li> What happens if the base type for a new content type is not present in | ||
| 206 : | the platform (the plug-in that provides it is not available)? | ||
| 207 : | <p>The content type (and consequently any descendants) will be deemed | ||
| 208 : | invalid and ignored.</p> | ||
| 209 : | </li> | ||
| 210 : | <li><a name="FAQ-MIME"></a>Do Eclipse's content types have anything to do with | ||
| 211 : | IANA's MIME Media Types? | ||
| 212 : | <p>Not so far.</p> | ||
| 213 : | </li> | ||
| 214 : | <li>How can users customize the way content types are chosen? | ||
| 215 : | rchaves | 1.2 | <p>By: |
| 216 : | rchaves | 1.1 | <ul> |
| 217 : | <li>associating additional file specs to existing content types</li> | ||
| 218 : | <li>defining a content type as the default one for a given file spec (not | ||
| 219 : | supported yet)</li> | ||
| 220 : | <li>overriding content type defined attributes, such as default encoding | ||
| 221 : | (not supported yet)</li> | ||
| 222 : | rchaves | 1.2 | </ul></p> |
| 223 : | rchaves | 1.1 | </li> |
| 224 : | <li>Can plug-ins override the content type describer for an existing content | ||
| 225 : | type? | ||
| 226 : | <p>Not so far. It is up to the plugin provider to determine whether a content | ||
| 227 : | type describer will be provided.</p> | ||
| 228 : | </li> | ||
| 229 : | <li>Can two completely unrelated content types be associated with the same file | ||
| 230 : | spec? | ||
| 231 : | <p>What happens is that only one of the content types | ||
| 232 : | will be enabled. If only one of them is declared as high priority, it will | ||
| 233 : | be picked. Otherwise, one will be arbitrarily chosen by the platform. In | ||
| 234 : | either case, the others will be made into aliases for the elected content | ||
| 235 : | type.</p> | ||
| 236 : | </li> | ||
| 237 : | <!--li>How are conflicts (two different content types associated to the same file) | ||
| 238 : | prevented? | ||
| 239 : | <p>They are not. At least, not automatically. It is up to clients to decide | ||
| 240 : | what to do when more than one content type is offered by the platform. A | ||
| 241 : | client that does not care about which one is picked, will randomly choose | ||
| 242 : | one of them. A client that cares but does not know which one to choose may | ||
| 243 : | refuse to use any of them. User-guided code may ask the user what should | ||
| 244 : | be done. The content type chosen may be marked as the default one for the | ||
| 245 : | file spec, or the user may want to mark one as an alias for the other.</p> | ||
| 246 : | </li--> | ||
| 247 : | <li>What if a given file name is matched by two different file specs provided | ||
| 248 : | by two completely unrelated content types? | ||
| 249 : | <p> As seen above, the only way this can happen is when two <em>different</em> | ||
| 250 : | file specs (for instance, a file name and a file extension) accept the same | ||
| 251 : | file name (for instance, one content type is associated with an "xml" | ||
| 252 : | file extension, and another is associated with a "plugin.xml" file name). | ||
| 253 : | File name specs have priority over file extension specs (so plugin.xml | ||
| 254 : | is a plugin manifest before being an XML document). The normal case is that | ||
| 255 : | the content type that defines a file name spec is based on the file type | ||
| 256 : | that defines a file extension spec (a plugin manifest is a kind of XML document). | ||
| 257 : | This ensures that actions applicable to general XML documents will be applicable | ||
| 258 : | to a plugin manifest.</p> | ||
| 259 : | </li> | ||
| 260 : | <li>What are content type aliases? | ||
| 261 : | <p>When a content type is marked as an alias for another content type (due | ||
| 262 : | to a file spec conflict), all of its properties are ignored, and any associations | ||
| 263 : | rchaves | 1.2 | with it will actually be made on the target type.</p></li> |
| 264 : | rchaves | 1.1 | <li>What are aliases for? |
| 265 : | <p>They are a mechanism to prevent conflicts. When multiple plugins contribute | ||
| 266 : | content types associated with the same file specs, we have a conflict. Conflicts | ||
| 267 : | are bad because they introduce ambiguity (which one is the right content type?). | ||
| 268 : | Most of the time when such conflicts arise, it is a case of independently developed | ||
| 269 : | plugins trying to contribute the same content type (semantically speaking). | ||
| 270 : | Aliasing between conflicting content types will be automatically created | ||
| 271 : | when necessary.</p> | ||
| 272 : | </li> | ||
| 273 : | <li>How do I prevent my specialized content type from being disabled when its | ||
| 274 : | parent is not available? | ||
| 275 : | <p> Sometimes a plugin A does not depend on plugin B, but declares a content | ||
| 276 : | type which is intended to be a specialization of another content type declared | ||
| 277 : | by B. To prevent the content type declared by A from being disabled: </p> | ||
| 278 : | <ol> | ||
| 279 : | <li>declare a low priority content type associated with the same file specs | ||
| 280 : | the intended parent is usually associated with (a placeholder);</li> | ||
| 281 : | <li>make your specialized content type have this placeholder as its base | ||
| 282 : | type;</li> | ||
| 283 : | </ol> | ||
| 284 : | <p>If the originally intended base type is available, your base type will | ||
| 285 : | be marked as just an alias, and your specialized content type will be properly | ||
| 286 : | attached to the official content type. Otherwise, the placeholder will be | ||
| 287 : | elected, and although things might not be as great as intended (actions | ||
| 288 : | associated to the original content type will not be available), your content | ||
| 289 : | type will still be enabled.</p> | ||
| 290 : | </li> | ||
| 291 : | <li>When should a file-association be contributed instead of declaring a new | ||
| 292 : | (derived) content type? | ||
| 293 : | <p>New content types should be created only if there is no existing content | ||
| 294 : | type with the semantics required. Otherwise, when only additional file specs | ||
| 295 : | must be provided, file associations are the way to go.</p> | ||
| 296 : | </li> | ||
| 297 : | <li>Are file specs inherited? | ||
| 298 : | <p> Only if none is specified in the sub type. </p> | ||
| 299 : | </li> | ||
| 300 : | <li>How does a client figure out whether a given file is a text file or not? | ||
| 301 : | <p>The proposed approach is to check if the file's content type is a kind | ||
| 302 : | rchaves | 1.2 | of the "org.eclipse.core.runtime.text" content type, which is |
| 303 : | intended to be the ancestor for all text oriented content types. If it turns | ||
| 304 : | out to be a very frequent idiom, we might consider providing a convenience | ||
| 305 : | API to do that.</p> | ||
| 306 : | rchaves | 1.1 | </li> |
| 307 : | <li>Do content types have to contribute content describers? | ||
| 308 : | <p> No, although if the file has an identifiable signature/format, it is recommended, | ||
| 309 : | because it improves the overall quality of content-based content type lookups.</p> | ||
| 310 : | </li> | ||
| 311 : | </ol> | ||
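<p>Two of the answers above (a content type whose base type is missing is ignored, and a file is text if its
content type is a kind of <code>org.eclipse.core.runtime.text</code>) can be sketched together. The class
<code>TypeHierarchy</code> and its methods are hypothetical and only model the hierarchy as a map from a type id to
its base type id. The <code>org.example.*</code> ids in the usage note are made up for illustration.</p>

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the content type hierarchy; not the proposed API.
class TypeHierarchy {
    static final String TEXT = "org.eclipse.core.runtime.text";

    // content type id -> base type id (null for a root type)
    private final Map<String, String> baseOf = new HashMap<String, String>();

    void add(String id, String baseId) {
        baseOf.put(id, baseId);
    }

    // A type is valid only if it and every ancestor up to a root are present.
    boolean isValid(String id) {
        int hops = 0;
        String current = id;
        while (current != null) {
            if (!baseOf.containsKey(current)) {
                return false; // missing base type invalidates all descendants
            }
            current = baseOf.get(current);
            if (++hops > baseOf.size()) {
                return false; // guard against accidental cycles
            }
        }
        return true;
    }

    // True if the type is the given ancestor or one of its descendants.
    boolean isKindOf(String id, String ancestorId) {
        if (!isValid(id)) {
            return false;
        }
        for (String current = id; current != null; current = baseOf.get(current)) {
            if (current.equals(ancestorId)) {
                return true;
            }
        }
        return false;
    }

    // The "is this a text file?" idiom from the FAQ.
    boolean isText(String id) {
        return isKindOf(id, TEXT);
    }
}
```

<p>For instance, with the text type as a root, <code>org.eclipse.core.runtime.xml</code> based on it, and a hypothetical
<code>org.example.antScript</code> based on XML, <code>isText</code> answers true for the Ant script type, while a type
whose declared base was never registered is reported as invalid.</p>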
| 312 : | rchaves | 1.5 | <p> <em>Note: comments are encouraged. Any questions/concerns not addressed here |
| 313 : | should be discussed in the platform-core-dev list, or bug <a | ||
| 314 : | href="http://bugs.eclipse.org/bugs/show_bug.cgi?id=37668">37668</a>. </em></p> | ||
| 315 : | <h2>Addendum: issues to be addressed in the 3.1 cycle</h2> | ||
| 316 : | <p><font size="-1">Last modified: October 15th, 2004</font></p> | ||
| 317 : | <p>The solution described above was implemented and was relatively successful. Some | ||
| 318 : | components took advantage of the new content type infrastructure, but still | ||
| 319 : | in many cases file-association is being done in an ad-hoc manner. Also, no UI | ||
| 320 : | was provided for customizing content types (such as changing the default encoding, | ||
| 321 : | adding associations with files) so the user has no control over how the content | ||
| 322 : | type detection mechanism works. Thus, the main issues to be addressed in the | ||
| 323 : | 3.1 cycle are:</p> | ||
| 324 : | <ul> | ||
| 325 : | <li>ensure the content type framework works for clients that have not adopted | ||
| 326 : | it yet, or at least we understand why it is not (cannot be made) suitable | ||
| 327 : | to them. Examples are: Platform/UI, Platform/CVS, and products built on top | ||
| 328 : | of Eclipse.</li> | ||
| 329 : | <li>ensure users are granted appropriate means so they can customize the behavior | ||
| 330 : | of content type detection so things just work for them.</li> | ||
| 331 : | <li>ensure content type resolution works (or can be made to work) appropriately | ||
| 332 : | in setups where multiple independently developed products contribute conflicting | ||
| 333 : | content types. </li> | ||
| 334 : | </ul> | ||
| 335 : | <h3>Support more use cases</h3> | ||
| 336 : | <p>Ensure the content type framework works for the SDK plug-ins and for products built on | ||
| 337 : | top of Eclipse.</p> | ||
| 338 : | <p><strong>Platform/UI - file/editor association</strong></p> | ||
| 339 : | <p><strong>Platform/Team - binary vs ascii files</strong></p> | ||
| 340 : | <h3>Give users more power</h3> | ||
| 341 : | <p>Ensure users have means to customize how the content type detection works | ||
| 342 : | for them.</p> | ||
| 343 : | <h3>Improve conflict handling</h3> | ||
| 344 : | <p>Ensure content type detection works (or can be made to work) appropriately | ||
| 345 : | when incompatible products are deployed together.</p> | ||
| 346 : | rchaves | 1.1 | </body></html> |
