platform-core-home/documents/content_types.html

Parent Directory Parent Directory | Revision Log Revision Log


Revision 1.10 - (download) (as text) (annotate)
Thu Feb 17 22:35:15 2005 UTC (4 years, 9 months ago) by rchaves
Branch: MAIN
Changes since 1.9: +2 -1 lines
fixed typo
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
  <title>Central content type catalog for Eclipse</title>
  <link rel="stylesheet" href="../default_style.css" type="text/css">
</head>
<body text="#000000" bgcolor="#ffffff">
<h1>A central content type catalog for Eclipse</h1>
<p><font size="-1">Last modified: Feb 16, 2005</font></p>
<p><font size="-1"><strong>Note: this document has been updated/extended to depict 
  the solution actually implemented in the 3.0 release and changes made during 
  the 3.1 cycle. The <a href="http://dev.eclipse.org/viewcvs/index.cgi/%7Echeckout%7E/platform-core-home/documents/content_types.html?rev=1.4"> 
  original proposal
  </a> written during the 3.0 development cycle, as well as <a href="http://dev.eclipse.org/viewcvs/index.cgi/platform-core-home/documents/content_types.html"> 
  all versions of this document</a> are also available</strong> (<img src="../images/changed.gif" width="12" height="12"> 
  marks interesting changes since the original proposal).</font></p>
<p><cite><strong>Plan item description:</strong> Content-type-based editor lookup. 
  The choice of editor is currently based on file name pattern. This is not very 
  flexible, and breaks down when fundamentally different types of content are 
  found in files with undistinguished file names or internal formats. For example, 
  many different models with specialized editors get stored in XML format files 
  named *.xml. Eclipse should support a notion of content type for files and resources, 
  and use these to drive decisions like which editor to use. This feature would 
  also be used by team providers when doing comparisons based on file type. The 
  several existing file-type registries in Eclipse should be consolidated. [Platform 
  Core, Platform UI] [Theme: User experience] (bug <a
 href="http://bugs.eclipse.org/bugs/show_bug.cgi?id=37668">37668</a>, <a
 href="http://dev.eclipse.org/bugs/show_bug.cgi?id=51791">51791</a>, <a href="https://bugs.eclipse.org/bugs/show_bug.cgi?id=52784">52784</a>) 
  </cite></p>
<p>This plan item is about two important features: </p>
<ol>
  <li>a single content-type repository to be provided by Eclipse on top of which
  content-type related features provided by any plugins could be built upon, and</li>
  <li>a mechanism for automatically determining the content type given a file name and/or its 
  contents.</li>
</ol>
<h2>Driving forces</h2>
<ul>
  <li>the catalog must be extensible: plug-ins must be able to contribute new 
    content types;</li>
  <li>content types must have an identity, a unique identifier by which they can 
    be unambiguously retrieved from the catalog;</li>
  <li>content types should be hierarchical: new content types are very often specializations 
    of existing ones (example, Ant Scripts and Plugin manifests are kinds of XML 
    documents, XML is a kind of text document), so it should be possible for new 
    content types to inherit interesting properties from existing ones (see <a href="#FAQ-hierarchy">FAQ</a>);</li>
  <li>content types have either a predominantly binary or text nature;</li>
  <li>content types are associated with specific file names/extensions;</li>
  <li>some level of automatic content/name based type discovery must be provided;</li>
  <li>existing plug-in-specific content type registry could be replaced/built 
    upon the central catalog;</li>
  <li>encoding determination is strongly related concern and should be taken into 
    consideration when sketching a solution.</li>
</ul>
<h3>On automatic content type detection</h3>
<p>Content types determine many properties and actions related to files such as 
  encoding, associated editors, etc. Automatic content type determination allows 
  content type specific actions without requiring the user to manually define 
  the content type for a given file. Content type detection is based on:</p>
<ul>
  <li>file selection specifications</li>
  <li>file contents</li>
</ul>
<p>Content type determination based on file name/extension (&quot;file selection 
  specs&quot;) is the easiest one to compute. Each content type has a set of file 
  selection specs associated to it. Determining the content type corresponding 
  to a file selection spec is done by a simple lookup on the catalog. </p>
<p>Content type determination based on file contents is more complex, and requires 
  examining the contents. Since we are talking about an open set of possible content 
  types, this examination implies in delegation to content type detectors contributed 
  by other plug-ins (content describers).</p>
<h2>Solution</h2>
<h3>The proposed API</h3>
<p>The proposed API contains 4 new interfaces in a new package called <code>org.eclipse.core.runtime.content</code>:</p>
<ul>
  <li><code><a href="#IContentType">org.eclipse.core.runtime.content.IContentType</a></code></li>
  <li><code><a href="#IContentTypeManager">org.eclipse.core.runtime.content.IContentTypeManager</a></code></li>
  <li><code><a href="#IContentDescription">org.eclipse.core.runtime.content.IContentDescription</a></code></li>
  <li><code><a href="#IContentDescriber">org.eclipse.core.runtime.content.IContentDescriber</a></code></li>
</ul>
<p>Following is a brief description for each of them. </p>
<h4><code><a name="IContentType"></a>org.eclipse.core.runtime.content.IContentType</code></h4>
<p>Represents a content type in the platform. <code>IContentType</code> instances 
  are provided by the platform, built from extensions to the <code>org.eclipse.core.runtime.contentTypes</code> 
  extension point. Relevant properties for <code>IContentType</code> are:</p>
<ul>
  <li>unique id (example: org.eclipse.core.runtime.xml), which is based on the 
    plug-in's unique identifier (the registry namespace);</li>
  <li>user-friendly name (example: Text, or XML document, or ZIP file, );</li>
  <li>file selection spec - comma-separated lists of associated file names and 
    extensions;</li>
  <li>default charset (example: ISO-8859-1, for Java properties files);</li>
  <li>content describer (see <code><a href="#IContentDescriber">org.eclipse.core.runtime.content.IContentDescriber</a></code>), 
    a class that knows how to recognize if a given stream of bytes contains compatible 
    to the content type, and how to extract other content-type specific information 
    from the stream.</li>
</ul>
<p>Also, <code>IContentType</code> provides methods that check whether the given 
  file name is matched by this content type file selection spec, or whether a 
  content type is a subtype of another content type.</p>
<h4><a name="IContentTypeManager"></a><code>org.eclipse.core.runtime.content.IContentTypeManager</code></h4>
<p>Represents the content type registry. Provides methods for obtaining the content 
  type associated to a file name, and for discovering the corresponding content 
  type for a stream of bytes. <code>IContentTypeManager</code> allows clients 
  to:</p>
<ul>
  <li>retrieve the content type for a given id;</li>
  <li>retrieve a set of content types associated to a given file name;</li>
  <li>discover which content types recognize a given stream as a valid sample 
    for the corresponding file format;</li>
  <li>obtain a description for a stream of bytes, including platform (such as 
    encoding) and custom (content type specific) properties.</li>
</ul>
<h4><code><a name="IContentDescriber"></a>org.eclipse.core.runtime.content.IContentDescriber</code></h4>
<p>Content-based content type detection and content description rely on specialized 
  content detectors associated to content types. When a content type is contributed 
  to the platform, a content describer class may be provided. Content describers 
  are able to detect if a given stream of bytes is conformant to the content type 
  file format, and may also be able to extract important properties from the contents, 
  such as what charset was used to encode the contents (for text files), and any 
  content type specific information that may be required.</p>
<p>The main method in <code>IContentDescriber</code> is:</p>
<p><code>int describe(InputStream contents, IContentDescription description, QualifiedName[] 
  options) throws IOException;</code></p>
<p>The first thing implementations for this method must do is to check if the 
  contents represent a valid sample for their corresponding content type file 
  format. If not (or if cannot be determined), this method should exit immediately, 
  returning <code>IContentDescription.INVALID</code> or <code>IContentDescription.INDETERMINATE</code>, 
  depending on how strict the file format is. Otherwise, this method should return 
  <code>IContentDescription.VALID</code>, but only after trying to provide all 
  required information (according to the specified options, if any) by reading 
  the contents and filling the <a href="#IContentDescription">content description</a> 
  provided.</p>
<p><strong><em>Note</em></strong><em>: it is essential that for this mechanism 
  to work in a suitable manner, the execution of content describers by the platform 
  should not cause the activation of the plugins providing them. In the Eclipse 
  3.0 runtime, plug-ins that have built-in bundle manifests will be able to selectively 
  enable/disable auto-activation on a per-package basis (for more information, 
  see <a href="https://bugs.eclipse.org/bugs/show_bug.cgi?id=52393">bug 52393</a>). 
  Although this will not be enforced by the platform, content describers <strong>must</strong> 
  be self-contained and not trigger auto-activation.</em></p>
<h4><code><a name="IContentDescription"></a>org.eclipse.core.runtime.content.IContentDescription</code></h4>
<p>Content descriptions are obtained by calling <code>IContentTypeManager#getDescriptionFor</code> 
  method. A content description contains interesting information (such as encoding) 
  about an arbitrary stream of bytes. These information are filled partially by 
  the platform and partially by the content describer elected (if any).</p>
<h3>Conflict resolution</h3>
<p>Content types are managed by the platform but plug-ins are in charge of contributing 
  content types. While this provides good flexibility, it also opens oportunities 
  for conflicts. There are a few scenarios where conflicts may arise:</p>
<ol>
  <li><em>two content types provided by independently developed plug-ins intended 
    for the same file selection specification</em> - this may happen for popular 
    file types that are not provided by the platform. In this case, one of the 
    content types will be ellected arbitrarily. User intervention is required 
    (by disabling one of the content types, for instance) if automatic determination 
    does not choose the preferred content type If multiple content types can be 
    returned by the API, both will appear in the list.<img src="../images/changed.gif" width="12" height="12"> (see 
    bug <a href="https://bugs.eclipse.org/bugs/show_bug.cgi?id=83004">83004</a>) <font size="-1">
    </font></li>
  <li><em>two related (but different) content types that share a same file name/extension 
    specification</em> - think of general XML documents and Ant build scripts 
    (a sub-type of XML document, inheriting its file selection specifications). 
    This is different than the previous case here at most one content type will 
    be contributing file specs (the more basic type). In this scenario, for APIs 
    that return content-types based exclusively on file name, the ancestor will 
    be appear first in the returned array, since it is more general. Note that 
    when a general type specifies a file extension for it to be associated with, 
    and a subtype specifies a file name that has the same file extension, the 
    more specific type will appear before the general one. For APIs that do content-based 
    analysis, if both content type describers deem the contents as valid, the 
    more specific content type will also appear first. For two sibling content 
    types that deem a same set of bytes as valid, an arbitrary order will be chosen.</li>
  <li><em>two completely unrelated content types that share a same file name/extension 
    specification</em> - this is the more unlikely scenario (imagine an image 
    file format and a word-processor file format sharing the same file extension). 
    In this case, one of the content types will be ellected arbitrarily. User 
    intervention is required (by disabling one of the content types, for instance) 
    if automatic determination does not choose the preferred content type.<img src="../images/changed.gif" width="12" height="12"> (see 
    bug <a href="https://bugs.eclipse.org/bugs/show_bug.cgi?id=83004">83004</a>) <font size="-1"></li>
</ol>
<h3>Frequently Asked Questions</h3>
<ol>
  <li> How will plugin providers benefit from a central content type catalog? 
    <p> Generally by using the same content type registry and sharing the same 
      concept of content/file type. Other examples are: 
    <ul>
      <li>a builder could use its well-known content type to filter out files 
        whose names don't match with the content type file section spec;</li>
      <li>the user interface could know what editors to offer for a given file 
        selected (associations between editors and content types should be kept 
        separately from the content type catalog);</li>
    </ul></p>
    </li>
  <li><a name="FAQ-hierarchy"></a>Why are content types hierarchical? 
    <p>To allow important properties to be inherited by new specialized content 
      types: 
    <ul>
      <li>the default charset</li>
      <li>text/binary nature</li>
      <li>content description</li>
      <li>associations defined externally by plug-ins (for instance, any editors 
        associated with an ancestor should work with any descendants)</li>
    </ul></p>
    </li>
  <li> What happens if the base type for a new content type is not present in 
    the platform (the plug-in that provides it is not available)? 
    <p>The content type (and consequently any descendants) will be deemed invalid 
      and ignored.</p>
  </li>
  <li><a name="FAQ-MIME"></a>Do Eclipse's content types have anything to do with 
    IANA's MIME Media Types? 
    <p>Not so far. The main reason is that not every file format with a content 
      type in Eclipse will have a MIME type, so we could not use it as the main 
      association mechanism between content types and applications. We considered 
      keeping MIME-types as an optional property for each content type, and provide 
      a method findContentTypesByMIMEType (or something like that), but decided 
      removing it since there was no sound use case for that (and the idea in 
      the initial proposal was to keep only the essential stuff to avoid distractions).<font size="-1"><strong><img src="../images/changed.gif" width="12" height="12"></strong></font></p>
  </li>
  <li>How can users customize the way content types are chosen? 
    <p>By: 
    <ul>
      <li>associating additional file specs to existing content types</li>
      <li>defining a content type as the default one for a given file spec (not 
        supported yet)</li>
      <li>overriding content type defined attributes, such as default encoding 
        (not supported yet)</li>
    </ul></p>
    </li>
  <li>Can plug-ins override the content type describer for an existing content 
    type? 
    <p>Not so far. It is up to the plugin provider to determine whether a content 
      type describer will be provided.</p>
  </li>
  <li>Can two completely unrelated content types be associated with the same file 
    spec? 
    <p>One of the content types (arbitrarily selected) will be chosen as the preferred 
      one (the other will also be associated). Priorities are also taken into 
      account.</p>
  </li>
  <!--li>How are conflicts (two different content types associated to the same file) 
    prevented? 
    <p>They are not. At least, not automatically. It is up to clients to decide 
      what to do when more than one content type is offered by the platform. A 
      client that does not care about which one is picked, will randomly choose 
      one of them. A client that cares but does not know which one to choose may 
      refuse to use any of them. User-guided code may ask the user what should 
      be done. The content type chosen may be marked as the default one for the 
      file spec, or the user may want to mark one as an alias for the other.</p>
  </li-->
  <li>What if a given file name is matched by two different file specs provided 
    by two completely unrelated content types? 
    <p> As seem above, the only way this can happen is when two <em>different</em> 
      file specs (for instance, a file name and a file extension) accept the same 
      file name (for instance, one content type is associated with a &quot;xml&quot; 
      file extension, other is associated with a &quot;plugin.xml&quot; file name. 
      ) File name specs have priority over file extension specs (so plugin.xml 
      is a plugin manifest before being a XML document). The normal case is that 
      the content type that defines a file name spec is based on the file type 
      that defines a file extension spec (a plugin manifest is a kind of XML document). 
      This ensures that actions applicable to general XML documents will be applicable 
      to a plugin manifest.</p>
  </li>
  <li>What are aliases for? <font size="-1"><strong><img src="../images/changed.gif" width="12" height="12"></strong></font> 
    (see bug <a href="https://bugs.eclipse.org/bugs/show_bug.cgi?id=83004">83004</a>) 
    <p>It is a mechanism to prevent conflicts. When multiple plugins contribute 
      content types associated with the same file specs, we have a conflict. Conflicts 
      are bad because introduce ambiguity (what is the right content type?). Most 
      of times when such conflicts arise, it is a case of independently developed 
      plugins trying to contribute the same content type (semantically speaking). 
      Imagine a plug-in com.examples.foo that wants to be associated to the Java 
      Source content typ (org.eclipse.jdt.core.javaSource)e, provided by org.eclipse.jdt.core, 
      but that does not require org.eclipse.jdt.core to be present to work. Such 
      plug-in can contribute its own Java Source content type (com.examples.foo.java), 
      and mark it as an alias for the content type provided by JDT/Core. If JDT/Core 
      is present, the com.examples.foo.java will be omitted from the content type 
      registry, and all references to it will be automatically interpreted as 
      references to org.eclipse.jdt.core.javaSource instead.<font size="-1"></font></p>
  </li>
  <li>How do I prevent my specialized content-type to be disabled even if its 
    parent is not available? <font size="-1"><strong><img src="../images/changed.gif" width="12" height="12"></strong></font> 
    (see bug <a href="https://bugs.eclipse.org/bugs/show_bug.cgi?id=83004">83004</a>) 
    <p> Sometimes a plugin A does not depend on plugin B, but declares a content 
      type which is intended to be a specialization of another content type declared 
      by B. To prevent the content type declared by A to be disabled: </p>
    <ol>
      <li>declare an alias content type having the intended parent as its target 
        (a placeholder);</li>
      <li>make your specialized content type have this placeholder as its base 
        type;</li>
    </ol>
    <p>If the originally intended base type is available, your base type will 
      be marked as just an alias, and your specialized content type will be properly 
      attached to the intended content type. Otherwise, the placeholder will be 
      enabled, and although things might not be as great as intended (actions 
      associated to the original content type will not be available), your content 
      type will still be enabled.<font size="-1"></font></p>
  </li>
  <li>When should a file-association be contributed instead of declaring a new 
    (derived) content type? 
    <p>New content types should be created only if there is no existing content 
      type with the semantics required. Otherwise, when only additional file specs 
      must be provided, file associations are the way to go.</p>
  </li>
  <li>Are file specs inherited? 
    <p> Only if none is specified in the sub type. </p>
  </li>
  <li>How does a client figure out whether a given file is a text file or not? 
    <p>The proposed approach is to check if the file's content type is a kind 
      of the &quot;org.eclipse.core.runtime.text&quot; content type, which is 
      intended to be the ancestor for all text oriented content types. If it turns 
      out to be a very frequent idiom, we might consider proving a convenience 
      API to do that.</p>
  </li>
  <li>Do content types have to contribute content describers? 
    <p> No, although if the file has a identifiable signature/format, it is recommended, 
      because improves the overall quality of content-based content type lookups.</p>
  </li>
</ol>
<p> <em>Note: comments are encouraged. Any questions/concerns not addressed here 
  should be discussed in the platform-core-dev list, or bug <a
 href="http://bugs.eclipse.org/bugs/show_bug.cgi?id=37668">37668</a>. </em></p>
<h2>Addendum: issues to be addressed in the 3.1 cycle<font size="-1"><strong><img src="../images/changed.gif" width="12" height="12"></strong></font></h2>
<p>The solution described above was implemented and relatively succesful. Some 
  components took advantage of the new content type infrastructure, but still 
  in many cases file-association is being done in an ad-hoc manner. Also, no UI 
  was provided for customizing content types (such as changing the default encoding, 
  adding associations with files) so the user has no control on how the content 
  type detection mechanism works. Thus, the main issues to be addressed in the 
  3.1 cycle are:</p>
<ul>
  <li>ensure the content type framework works for clients that have not adopted 
    it yet, or at least we understand why it is not (cannot be made) suitable 
    to them. Examples are: Platform/UI, Platform/CVS, and products built on top 
    of Eclipse.</li>
  <li>ensure users are granted appropriate means so they can customize the behavior 
    of content type detection so things just work for them.</li>
  <li>ensure content type resolution works (or can be made to work) appropriately 
    in setups where multiple independently developed products contribute conflicting 
    content types. </li>
</ul>
<h3>Support more use cases</h3>
<p>Ensure the content type support works for the SDK plug-ins and for products 
  built on top of Eclipse (see bug <a href="https://bugs.eclipse.org/bugs/show_bug.cgi?id=78654">78654</a>).</p>
<p><strong>Platform/UI - file/editor association</strong></p>
<p>Content type-editor association is definitely the most important use case for 
  the content type support. The basic idea is that for a given file or stream 
  of data, the UI should be able to:</p>
<ol>
  <li>provide a default editor to be used when opening the file</li>
  <li>present any other editors that are also able to handle the file</li>
  <li>allow the user to pick a completely unrelated file not initially suggested</li>
  <li>remember the user decision (if the user wants so) so it becomes the default 
    for that file</li>
</ol>
<p> <strong>1</strong>, <strong>2</strong> and <strong>4</strong> are currently 
  supported by the existing file-editor association mechanism. <strong>3</strong> 
  is being requested by users, and it is orthogonal (as 4 is) to the content type 
  support provided by runtime.</p>
<p>Content types add a level of indirection between files and editors. At a first 
  glance, there is no reason why changing the default editor would affect what 
  content type is assigned to a file, so users should be able to pick up any editors 
  without affecting content type detection.</p>
<p><strong>Platform/Team - binary vs ascii files</strong></p>
<p>The Team plug-in keeps a catalog of file extensions and their expected content 
  type (either binary or ASCII). If content types were broadly adopted throughout 
  the rest of Eclipse (so that most files dealt with by users have a content type), 
  couldn't the Team plug-in use content type based encoding determination to figure 
  out a good default for this setting? (see bug <a href="https://bugs.eclipse.org/bugs/show_bug.cgi?id=85490">85490</a>)</p>
<h3>Give users more power</h3>
<p>Ensure users have means to customize how the content type detection works for 
  them. Provide UI for content types. May provide some way of showing related 
  objects for a given content type (editors, views, comparators, etc). Users cannot 
  provide content type detection code, so user-defined content types would be 
  useful only for cases where content type detection is file name based (like 
  for non-formatted text files, such as source files, configuration files, etc).</p>
<p>A simple plug-in that allows users to add/remove file associations is available 
  in the <a href="../downloads.html">downloads page</a>.</p>
<h3>Improve content type determination</h3>
<p>Ensure content type detection works (or can be made to work) appropriately 
  when incompatible products are deployed together.</p>
<ol>
  <li><strong>project-specific content type resolution</strong> - to reduce chances 
    for collisions, there should be a way of constraining which content types 
    are available on a project basis. One possible solution would be to have content 
    type-nature associations. Natures could declare some preferred content types, 
    and those preferred content types would always win (see bug <a href="https://bugs.eclipse.org/bugs/show_bug.cgi?id=69640">69640</a>)</li>
  <li><strong>user-defined content type determination - </strong>two plug-ins 
    provide completely unrelated content types A and B for files with extension 
    &quot;.foo&quot;. The current implementation will choose one of them (by priority, 
    depth, or arbitrarily if they have the same priority and depth). If A is automatically 
    chosen, the user might want to choose B instead as the default content type. 
    Instead of doing that on a file basis, it seems better to do it on a coarser 
    granularity, such as at the project level or at the workspace level (see bug 
    <a href="https://bugs.eclipse.org/bugs/show_bug.cgi?id=85765">85765</a>). 
  </li>
  <li><strong>product defined policies for overriding content types</strong> - 
    a product may want to override some of the existing content type definitions. 
    Products should be able to do that in a way that would circumvent the regular 
    conflict resolution.</li>
</ol>
<p>See also corresponding PR <a href="https://bugs.eclipse.org/bugs/show_bug.cgi?id=78654">78654</a> 
  - content type should be used universally</p>
</body></html>