platform-core-home/documents/content_types.html
Revision 1.6 -
Fri Oct 15 17:53:54 2004 UTC (5 years, 1 month ago) by rchaves
Branch: MAIN
Changes since 1.5: +5 -0 lines
added TBDs
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> <html> <head> <title>Central content type catalog for Eclipse</title> <link rel="stylesheet" href="../default_style.css" type="text/css"> </head> <body text="#000000" bgcolor="#ffffff"> <h1>A central content type catalog for Eclipse</h1> <p><font size="-1">Last modified: May 12th, 2004</font> </p> <p><cite><strong>Plan item description:</strong> Content-type-based editor lookup. The choice of editor is currently based on file name pattern. This is not very flexible, and breaks down when fundamentally different types of content are found in files with undistinguished file names or internal formats. For example, many different models with specialized editors get stored in XML format files named *.xml. Eclipse should support a notion of content type for files and resources, and use these to drive decisions like which editor to use. This feature would also be used by team providers when doing comparisons based on file type. The several existing file-type registries in Eclipse should be consolidated. 
[Platform Core, Platform UI] [Theme: User experience] (bug <a href="http://bugs.eclipse.org/bugs/show_bug.cgi?id=37668">37668</a>, <a href="http://dev.eclipse.org/bugs/show_bug.cgi?id=51791">51791</a>, <a href="https://bugs.eclipse.org/bugs/show_bug.cgi?id=52784">52784</a>) </cite></p> <p>This plan item is about two important features: </p> <ol> <li>a single content-type repository provided by Eclipse, on top of which any plug-in can build content-type related features, and</li> <li>a mechanism for automatically determining the content type given a file name and/or its contents.</li> </ol> <h2>Driving forces</h2> <ul> <li>the catalog must be extensible: plug-ins must be able to contribute new content types;</li> <li>content types must have an identity, a unique identifier by which they can be unambiguously retrieved from the catalog;</li> <li>content types should be hierarchical: new content types are very often specializations of existing ones (for example, Ant scripts and plug-in manifests are kinds of XML documents, and XML is a kind of text document), so it should be possible for new content types to inherit interesting properties from existing ones (see <a href="#FAQ-hierarchy">FAQ</a>);</li> <li>content types have either a predominantly binary or text nature;</li> <li>content types are associated with specific file names/extensions;</li> <li>some level of automatic content/name based type discovery must be provided;</li> <li>existing plug-in-specific content type registries could be replaced by, or built upon, the central catalog;</li> <li>encoding determination is a strongly related concern and should be taken into consideration when sketching a solution.</li> </ul> <h3>On automatic content type detection</h3> <p>Content types determine many properties and actions related to files such as encoding, associated editors, etc. 
Automatic content type determination allows content type specific actions without requiring the user to manually define the content type for a given file. Content type detection is based on:</p> <ul> <li>file selection specifications</li> <li>file contents</li> </ul> <p>Content type determination based on file name/extension ("file selection specs") is the easiest one to compute. Each content type has a set of file selection specs associated with it. Determining the content type corresponding to a file selection spec is done by a simple lookup in the catalog. </p> <p>Content type determination based on file contents is more complex, and requires examining the contents. Since we are talking about an open set of possible content types, this examination implies delegation to content type detectors contributed by other plug-ins (content describers).</p> <h2>Solution</h2> <h3>The proposed API</h3> <p>The proposed API contains four new interfaces in a new package called <code>org.eclipse.core.runtime.content</code>:</p> <ul> <li><code><a href="#IContentType">org.eclipse.core.runtime.content.IContentType</a></code></li> <li><code><a href="#IContentTypeManager">org.eclipse.core.runtime.content.IContentTypeManager</a></code></li> <li><code><a href="#IContentDescription">org.eclipse.core.runtime.content.IContentDescription</a></code></li> <li><code><a href="#IContentDescriber">org.eclipse.core.runtime.content.IContentDescriber</a></code></li> </ul> <p>A brief description of each of them follows. </p> <h4><code><a name="IContentType"></a>org.eclipse.core.runtime.content.IContentType</code></h4> <p>Represents a content type in the platform. <code>IContentType</code> instances are provided by the platform, built from extensions to the <code>org.eclipse.core.runtime.contentTypes</code> extension point. 
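</p> <p>As a rough illustration, a plug-in might contribute a specialized XML content type along the following lines. This is only a sketch: the element and attribute names (<code>content-type</code>, <code>base-type</code>, <code>file-names</code>, <code>priority</code>, <code>default-charset</code>, <code>describer</code>) are assumptions that this document does not define, and the content type, file name and class names are hypothetical.</p>
<pre>
&lt;!-- Hypothetical plugin.xml fragment; element and attribute names are assumed. --&gt;
&lt;extension point="org.eclipse.core.runtime.contentTypes"&gt;
   &lt;!-- The short id is assumed to be qualified by the contributing plug-in's identifier. --&gt;
   &lt;content-type
      id="buildScript"
      name="Example build script"
      base-type="org.eclipse.core.runtime.xml"
      file-names="example-build.xml"
      priority="normal"
      default-charset="UTF-8"
      describer="com.example.build.BuildScriptDescriber"/&gt;
&lt;/extension&gt;
</pre>
<p>In this sketch the new type names the XML content type as its base type, contributes a file name spec of its own, and points at a describer class that would implement <code>IContentDescriber</code>.</p> <p>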
Relevant properties for <code>IContentType</code> are:</p> <ul> <li>unique id (example: org.eclipse.core.runtime.xml), which is based on the plug-in's unique identifier (the registry namespace);</li> <li>user-friendly name (example: Text, or XML document, or ZIP file);</li> <li>file selection spec - comma-separated lists of associated file names and extensions;</li> <li>default charset (example: ISO-8859-1, for Java properties files);</li> <li>content describer (see <code><a href="#IContentDescriber">org.eclipse.core.runtime.content.IContentDescriber</a></code>), a class that knows how to recognize whether a given stream of bytes contains data compatible with the content type, and how to extract other content-type specific information from the stream.</li> </ul> <p>Also, <code>IContentType</code> provides methods that check whether a given file name is matched by this content type's file selection spec, or whether a content type is a subtype of another content type.</p> <h4><a name="IContentTypeManager"></a><code>org.eclipse.core.runtime.content.IContentTypeManager</code></h4> <p>Represents the content type registry. Provides methods for obtaining the content type associated with a file name, and for discovering the corresponding content type for a stream of bytes. <code>IContentTypeManager</code> allows clients to:</p> <ul> <li>retrieve the content type for a given id;</li> <li>retrieve a set of content types associated with a given file name;</li> <li>discover which content types recognize a given stream as a valid sample for the corresponding file format;</li> <li>obtain a description for a stream of bytes, including platform (such as encoding) and custom (content type specific) properties.</li> </ul> <h4><code><a name="IContentDescriber"></a>org.eclipse.core.runtime.content.IContentDescriber</code></h4> <p>Content-based content type detection and content description rely on specialized content detectors associated with content types. 
When a content type is contributed to the platform, a content describer class may be provided. Content describers are able to detect whether a given stream of bytes conforms to the content type's file format, and may also be able to extract important properties from the contents, such as what charset was used to encode the contents (for text files), and any content type specific information that may be required.</p> <p>The main method in <code>IContentDescriber</code> is:</p> <p><code>int describe(InputStream contents, IContentDescription description, int optionsMask) throws IOException;</code></p> <p>The first thing implementations of this method must do is check whether the contents represent a valid sample for their corresponding content type file format. If not (or if it cannot be determined), this method should exit immediately, returning <code>IContentDescription.INVALID</code> or <code>IContentDescription.INDETERMINATE</code>, depending on how strict the file format is. Otherwise, this method should return <code>IContentDescription.VALID</code>, but only after trying to provide all required information (according to the specified options, if any) by reading the contents and filling the <a href="#IContentDescription">content description</a> provided.</p> <p><strong><em>Note</em></strong><em>: for this mechanism to work in a suitable manner, it is essential that the execution of content describers by the platform does not cause the activation of the plug-ins providing them. In the Eclipse 3.0 runtime, plug-ins that have built-in bundle manifests will be able to selectively enable/disable auto-activation on a per-package basis (for more information, see <a href="https://bugs.eclipse.org/bugs/show_bug.cgi?id=52393">bug 52393</a>). 
Although this will not be enforced by the platform, content describers <strong>must</strong> be self-contained and not trigger auto-activation.</em></p> <h4><code><a name="IContentDescription"></a>org.eclipse.core.runtime.content.IContentDescription</code></h4> <p>Content descriptions are obtained by calling the <code>IContentTypeManager#getDescriptionFor</code> method. A content description contains interesting information (such as encoding) about an arbitrary stream of bytes. This information is filled in partially by the platform and partially by the content describer elected (if any).</p> <h3>Conflict resolution</h3> <p>Content types are managed by the platform but plug-ins are in charge of contributing content types. While this provides good flexibility, it also opens opportunities for conflicts. There are a few scenarios where conflicts may arise:</p> <ol> <li><em>two content types provided by independently developed plug-ins intended for the same file selection specification</em> - this may happen for popular file types that are not provided by the platform. In this case, one of the content types will be automatically made into an alias for the other. The intended effect is that content types/code depending on the content type transformed into an alias should still work, acting on the alias target instead. Content type priorities help decide which content type should be preserved, and which ones made into aliases. If a plug-in provider feels it should be the official provider of a content type, a high priority should be assigned to it. Usually, "normal" priorities (the default value) should be used, meaning that if a similar high priority content type is available, it should be picked instead. </li> <li><em>two related (but different) content types that share the same file name/extension specification</em> - think of general XML documents and Ant build scripts (a sub-type of XML document, inheriting its file selection specifications). 
This is different from the previous case: here at most one content type will be contributing file specs (the more basic type). In this scenario, for APIs that return content types based exclusively on file name, the ancestor will appear first in the returned array, since it is more general. Note that when a general type specifies a file extension for it to be associated with, and a subtype specifies a file name that has the same file extension, the more specific type will appear before the general one. For APIs that do content-based analysis, if both content type describers deem the contents as valid, the more specific content type will also appear first. For two sibling content types that deem the same set of bytes as valid, no ordering between them will be enforced (aliasing is not done since those types do not explicitly specify file specs - they inherit them instead).</li> <li><em>two completely unrelated content types that share the same file name/extension specification</em> - this is the most unlikely scenario (imagine an image file format and a word-processor file format sharing the same file extension). In this case, aliasing is not desirable (since the content types are fundamentally different). User intervention is required (by disabling one of the content types, for instance, and manually associating any editors, etc. to the other content type) to avoid incorrect aliasing.</li> </ol> <h3>Frequently Asked Questions</h3> <ol> <li> How will plug-in providers benefit from a central content type catalog? <p> Generally by using the same content type registry and sharing the same concept of content/file type. 
Other examples are: <ul> <li>a builder could use its well-known content type to filter out files whose names don't match the content type's file selection spec;</li> <li>the user interface could know what editors to offer for a selected file (associations between editors and content types should be kept separately from the content type catalog);</li> </ul></p> </li> <li><a name="FAQ-hierarchy"></a>Why are content types hierarchical? <p>To allow important properties to be inherited by new specialized content types: <ul> <li>the default charset</li> <li>text/binary nature</li> <li>content description</li> <li>associations defined externally by plug-ins (for instance, any editors associated with an ancestor should work with any descendants)</li> </ul></p> </li> <li> What happens if the base type for a new content type is not present in the platform (the plug-in that provides it is not available)? <p>The content type (and consequently any descendants) will be deemed invalid and ignored.</p> </li> <li><a name="FAQ-MIME"></a>Do Eclipse's content types have anything to do with IANA's MIME Media Types? <p>Not so far.</p> </li> <li>How can users customize the way content types are chosen? <p>By: <ul> <li>associating additional file specs to existing content types</li> <li>defining a content type as the default one for a given file spec (not supported yet)</li> <li>overriding content type defined attributes, such as default encoding (not supported yet)</li> </ul></p> </li> <li>Can plug-ins override the content type describer for an existing content type? <p>Not so far. It is up to the plug-in provider to determine whether a content type describer will be provided.</p> </li> <li>Can two completely unrelated content types be associated with the same file spec? <p>What happens is that only one of the content types will be enabled. If only one of them is declared as high priority, it will be picked. 
Otherwise, one will be arbitrarily chosen by the platform. In either case, the others will be made into aliases for the elected content type.</p> </li> <!--li>How are conflicts (two different content types associated to the same file) prevented? <p>They are not. At least, not automatically. It is up to clients to decide what to do when more than one content type is offered by the platform. A client that does not care about which one is picked, will randomly choose one of them. A client that cares but does not know which one to choose may refuse to use any of them. User-guided code may ask the user what should be done. The content type chosen may be marked as the default one for the file spec, or the user may want to mark one as an alias for the other.</p> </li--> <li>What if a given file name is matched by two different file specs provided by two completely unrelated content types? <p> As seen above, the only way this can happen is when two <em>different</em> file specs (for instance, a file name and a file extension) accept the same file name (for instance, one content type is associated with the "xml" file extension and another with the "plugin.xml" file name). File name specs have priority over file extension specs (so plugin.xml is a plug-in manifest before being an XML document). The normal case is that the content type that defines a file name spec is based on the content type that defines a file extension spec (a plug-in manifest is a kind of XML document). This ensures that actions applicable to general XML documents will be applicable to a plug-in manifest.</p> </li> <li>What are content type aliases? <p>When a content type is marked as an alias for another content type (due to a file spec conflict), all of its properties are ignored, and any associations with it will actually be made on the target type.</p></li> <li>What are aliases for? <p>They are a mechanism to prevent conflicts. 
When multiple plug-ins contribute content types associated with the same file specs, we have a conflict. Conflicts are bad because they introduce ambiguity (which one is the right content type?). Most of the time, when such conflicts arise, it is a case of independently developed plug-ins trying to contribute the same content type (semantically speaking). Aliases between conflicting content types will be created automatically when necessary.</p> </li> <li>How do I prevent my specialized content type from being disabled even if its parent is not available? <p> Sometimes a plug-in A does not depend on plug-in B, but declares a content type which is intended to be a specialization of another content type declared by B. To prevent the content type declared by A from being disabled: </p> <ol> <li>declare a low priority content type associated with the same file specs the intended parent is usually associated with (a placeholder);</li> <li>make your specialized content type have this placeholder as its base type.</li> </ol> <p>If the originally intended base type is available, your base type will be marked as just an alias, and your specialized content type will be properly attached to the official content type. Otherwise, the placeholder will be elected, and although things might not be as great as intended (actions associated with the original content type will not be available), your content type will still be enabled.</p> </li> <li>When should a file association be contributed instead of declaring a new (derived) content type? <p>New content types should be created only if there is no existing content type with the semantics required. Otherwise, when only additional file specs must be provided, file associations are the way to go.</p> </li> <li>Are file specs inherited? <p> Only if none is specified in the sub type. </p> </li> <li>How does a client figure out whether a given file is a text file or not? 
<p>The proposed approach is to check if the file's content type is a kind of the "org.eclipse.core.runtime.text" content type, which is intended to be the ancestor for all text oriented content types. If it turns out to be a very frequent idiom, we might consider providing a convenience API to do that.</p> </li> <li>Do content types have to contribute content describers? <p> No, although if the file has an identifiable signature/format, it is recommended, because it improves the overall quality of content-based content type lookups.</p> </li> </ol> <p> <em>Note: comments are encouraged. Any questions/concerns not addressed here should be discussed in the platform-core-dev list, or bug <a href="http://bugs.eclipse.org/bugs/show_bug.cgi?id=37668">37668</a>. </em></p> <h2>Addendum: issues to be addressed in the 3.1 cycle</h2> <p><font size="-1">Last modified: October 15th, 2004</font></p> <p>The solution described above was implemented and was relatively successful. Some components took advantage of the new content type infrastructure, but in many cases file association is still being done in an ad-hoc manner. Also, no UI was provided for customizing content types (such as changing the default encoding, adding associations with files) so the user has no control over how the content type detection mechanism works. Thus, the main issues to be addressed in the 3.1 cycle are:</p> <ul> <li>ensure the content type framework works for clients that have not adopted it yet, or at least that we understand why it is not (or cannot be made) suitable for them. Examples are: Platform/UI, Platform/CVS, and products built on top of Eclipse.</li> <li>ensure users are granted appropriate means to customize the behavior of content type detection so things just work for them.</li> <li>ensure content type resolution works (or can be made to work) appropriately in setups where multiple independently developed products contribute conflicting content types. 
</li> </ul> <h3>Support more use cases</h3> <p>Ensure the content type framework works for the SDK plug-ins and for products built on top of Eclipse.</p> <p>TBD</p> <p><strong>Platform/UI - file/editor association</strong></p> <p>TBD</p> <p><strong>Platform/Team - binary vs ASCII files</strong></p> <p>TBD</p> <h3>Give users more power</h3> <p>Ensure users have means to customize how the content type detection works for them.</p> <p>TBD</p> <h3>Improve conflict handling</h3> <p>Ensure content type detection works (or can be made to work) appropriately when incompatible products are deployed together.</p> <p>TBD</p> </body></html>
