78652 – content type should be bound hierarchically

Status:	RESOLVED FIXED

Alias:	None

Product:	Platform
Classification:	Eclipse Project
Component:	Compare
Version:	3.1
Hardware:	PC Windows XP

Importance:	P3 normal
Target Milestone:	3.1 M7
Assignee:	Andre Weinand
QA Contact:

URL:
Whiteboard:
Keywords:

Depends on:	86862
Blocks:

Reported:	2004-11-15 14:35 EST by Kim Letkeman
Modified:	2005-05-08 10:19 EDT
CC List:	3 users

See Also:

Description Kim Letkeman

2004-11-15 14:35:30 EST

When we define a hierarchy of types, e.g. text->xml->EMF->EMX for Aurora 
models, we would like to be able to bind a default EMF merge facility in order 
to prevent any merge from happening in text mode, which is usually fatal to 
EMF models. We have a more sophisticated editor for our EMX files, which allows 
for diagram merging etc. But if we happen to start a merge of an EMX artifact 
in a product like RAD, which includes EMF merge but not EMX merge support, 
then we would like the content type hierarchy to automatically load EMF merge 
instead of text merge, which is what happens today.

Comment 1 Rafael Chaves

2005-01-12 13:47:33 EST

Is this the same use case as for bug 78653?

Anyway, if EMF cannot be made to know about the .emx extension, the only way to
fix this would be by user intervention (currently broken, see bug 72796).

This is something that is also applicable to the editor case: if the user wants
to manipulate a file, the user should be able to manually select any of the
existing tools s/he knows can do the job (even if Eclipse does not know
about it). The user should be able to choose whether that selection should be
remembered for all files with that name pattern, for that file only, or not
remembered at all. 

Kim H, this is something to have in mind during the new content type work in UI
land.

Comment 2 Kim Letkeman

2005-01-12 14:55:38 EST

   >Is this the same use case as for bug 78653?

No. That one was about getting a content-type in the first place using 
inspection (potentially) instead of name patterns. This one is about using the 
hierarchy of content-types in a useful way.

   >Anyway, if EMF cannot be made to know about the 
   >.emx extension, the only way to fix this would 
   >be by user intervention (currently broken, see 
   >bug 72796). <snip the next paragraph too>

None of this is relevant to this bug report as far as I can tell.

We have a content-type called EMF. We base it on XML/XMI, which itself is 
based on TEXT. This makes total sense when you look at the progressive 
refinement of the file contents of each. When we define a new model file type 
(e.g. *.emx as a UML model) we then add a content-type called EMX, which is 
based on EMF. Again, this makes perfect sense when you look at the files.

What we want to be able to do:

Create a plugin set that binds all of our known content types to EMF and 
include them in the base platform. Bind default compare support to EMF type.

Create specialized client plugins for products that are based on the 
platform, with specialized compare support that exists only in that product. 
These plugins redefine the relevant content-type definition for an artifact 
(e.g. *.EMX) -- in this case to EMX type.

Dynamically process the content type hierarchy such that the same *.EMX 
artifact is considered EMF type in products that do not contain the 
specialization plugins and EMX type in products that do contain the 
specialization.

Why do we need this?

EMF files are notoriously easy to corrupt when performing a text merge. EMF 
cannot open corrupted files, and the complexity of the EMF structures 
eliminates the ability of most people to "fix up" the merge failures as is 
frequently required and so easily accomplished in Java.

Instead, we have a basic layer of EMF compare support that can handle almost 
any meta-model through EMF reflection and provides a great deal of anti-
corruption protection. This basic level of handling can be easily bound to any 
EMF artifact by defining it as EMF content-type. This content-type is the 
mandatory minimum for *all* modeling artifacts in all products that use the 
Aurora platform. Thus, we theoretically cannot corrupt any modeling artifact 
in an Aurora-based product.

But ... the reality today is that a modeling artifact in a product that does 
not contain the specific support for that type *does not* drop down to EMF 
content-type bindings, but rather jumps straight into text merge, which is 
immediately fatal to the artifact.

So we've had to simply fail the merges when the wrong product is installed and 
a merge is triggered. This prevents the text merges from corrupting the 
artifacts.

Summarizing: Since the content-types form a hierarchy, it would make a lot of 
sense to provide a consistent binding mechanism that respected the hierarchy 
and allowed binding of different levels of compare support (and later, 
editors) at each level. This would really help those of us who work on complex 
hierarchies of data formats that *must never* be loaded into a text 
compare/merge.

I hope I documented that clearly enough. It's quite a complex topic.

Comment 3 Rafael Chaves

2005-01-12 17:37:01 EST

The solution (applicable to editors/views/etc) to me looks somewhat like this:

1) the EMF project provides an EMF content type
2) RAD provides a compare editor associated to the base EMF content type
3) the Aurora product provides an EMX content type, which is a specialization
of the basic content type provided by EMF
4) the Aurora product provides a compare editor for the EMX content type
5) A RAD user gets an EMX file in their workspace. No plug-in installed has ever
heard of EMX. The default for an unknown type is the Text compare editor. The
user *knows* what an EMX file is, so they ask Eclipse to associate the .emx file
extension with the existing EMF content type. The user tries to compare the file
again, and now the basic EMF compare editor is picked instead.

The missing pieces here: 
- EMF does not contribute any content types
- there is no UI for associating file name patterns to existing content types
- bug 72796 should be fixed

Comments?

Comment 4 Kim Letkeman

2005-01-12 18:25:08 EST

A couple of details:

1) Aurora provides all EMF compare support at this time. Thus, we contribute 
our base assembly to RAD et al. The specialization is done with client (of the 
Aurora base compare support) plugins that sit inside the product (RSA, RSM) 
assemblies.
2) Aurora base assembly defines the EMF content type today.
3) RSA/RSM product assembly defines EMX content type today as a specialization 
of EMF content type.
4) We bind our default compare viewers to EMF type.
5) We bind our specialized UML2 compare viewers to EMX type (I think we 
actually call the EMX type "modeler", but I don't want to confuse the issues.)

When RSA sees any artifact, it can handle them perfectly because it is a 
superset of RSM and RAD and therefore has all the bindings.

When RSM sees artifacts created by RAD, it can do nothing with them because it 
does not contain the specialized plugins for RAD artifacts.

When RAD sees artifacts created by RSM, it can do nothing with them as well.

Ideally, we need a way to bind a file spec (by content token or file pattern) 
to both the EMF and the EMX content types. Obviously, when present, the EMX 
binding takes precedence because it is the specialization. Basically, we need 
the content-type hierarchy to behave polymorphically.

Important: the user cannot be relied upon to handle this reliably. Because any 
error is fatal to the data, we must make this automagic. So I would like to 
see this handled in the registration XML code and the binding search mechanism.

I agree that 72796 is a key component of this requirement, but I would want to 
ensure that this fix was integrated with the registration code in plugin.xml. 
I.e. I don't want to write code somewhere to do this unless absolutely 
necessary. Also, we need the "add filespecs" capability for any content-type to 
also allow "add filespecs" for more than one content-type if they are in the 
same inheritance tree. This allows the polymorphism I am asking for.

Comment 5 Rafael Chaves

2005-02-24 18:22:13 EST

Ok, first of all, sorry for taking so long to get back to you on this. I thought
it was more complicated than it actually is, but it seems that what you want is
already supported. The content type matching mechanisms are already "polymorphic".

There are a few solutions to the problem ("1" being the most recommended one):

1) move the EMX type to the Aurora layer. The compare/merge support provided by
Aurora which is associated to the EMF content type should be automatically picked.

2) (M5 only) define a trimmed-down EMX content type in Aurora that extends the
EMF type, and make it an alias for the actual EMX type (provided by RSM?). Note
that this does not introduce any dependency from Aurora to RSM. This can be
useful if you cannot just change the id for the EMX content type (content type
ids implicitly derive from the namespace where they are declared). For an example
of how to write an alias content type, check the
org.eclipse.core.runtime.properties content type.

3) associate the EMF content type to *.emx as well. This is the worst solution
since it will make any other EMF subtypes compete for *.emx files (although
they would probably refuse those files during the content analysis stage).

Comment 6 Kim Letkeman

2005-02-24 22:46:14 EST

Ok ... so let's assume I choose 1 (it is the only solution that looks 
plausible anyway.) I assume what you mean is:

1) Define the EMF content type in our base Aurora (or use the one that the EMF 
team will no doubt define.)
2) Also define the specialized EMX content type in our base Aurora.
3) Bind our default merge support against both. 
4) In the modeler layer, bind a new set of advanced editors against EMX 
content type.

This is in fact exactly what we want to be able to do.

Now ... my question is whether the newer compare support bindings against EMX 
type will override the default bindings in the lower layer?

Comment 7 Rafael Chaves

2005-02-25 10:20:00 EST

> 3) Bind our default merge support against both

Actually, you could bind it only to the EMF content type.

> Now ... my question is whether the newer compare support
> bindings against EMX type will override the default 
> bindings in the lower layer?

Andre, this is a question for you: given two content types A and B (B extends
A), and two extensions to any content-type aware extension point provided by
compare C1 (associated to A) and C2 (associated to B). For an A file, it is
clear that C1 is the only one eligible. But for a B file, which one is chosen (I
would expect C2)?

Comment 8 Rafael Chaves

2005-02-28 13:35:33 EST

Since every client of the content type infrastructure that keeps its own
registry of related objects (editors, compare mergers/viewers/..., etc) will
have to do the same work (content type hierarchy traversal), I believe this
warrants new API from Core. I opened an enhancement request for that (bug
86862), will mark the current one as depending on it.

Comment 9 Rafael Chaves

2005-03-22 14:23:19 EST

Please ignore comment 8 above. There is no compelling reason at the moment for
publishing that API in runtime, so the work done in bug 86862 is going to be
reverted. Implementing the same behavior in Compare would be trivial.

Comment 10 Rafael Chaves

2005-03-28 11:49:56 EST

Andre, are there plans to address this in the 3.1 timeframe?

Comment 11 Andre Weinand

2005-03-28 15:18:52 EST

Rafael, can you provide a short summary what I should implement for Compare?
Thanks.

Comment 12 Rafael Chaves

2005-03-28 15:50:19 EST

This might not even be an issue. You have to tell me if it is. The following is
what I asked in comment 7:

Consider two compare editors C1 and C2, C1 is associated to content type A, and
C2 is associated to content type B. B is based on A. For a file whose content
type is A, there is no issue, C1 is the only eligible compare editor. For a file
whose content type is B, both C2 and C1 are eligible. But C2 should be the one
picked by your compare editor resolution, since it is directly associated to B.

To ensure that, the algorithm I suggest for figuring out what is the best
compare editor for a given file would be:
1) the associated compare editor taking into account the legacy file name based
associations
2) if none could be found, the compare editor *directly* associated with the
file content type (C2 in the example)
3) if none could be found, the associated compare editor taking into
account the legacy file extension based associations
4) if none could be found, editors *indirectly* associated with the content type
- for this, you would have to walk up the content type tree until you can find
one content type for which there is an associated compare editor.
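
The four-step lookup order above can be sketched as follows. This is a minimal, language-neutral model in Python, not the Compare plug-in's code: the function name `resolve_compare_editor` and the dictionary-based registries (`base_type`, `by_type`, `by_name`, `by_ext`) are hypothetical stand-ins for whatever registries Compare actually keeps.

```python
from typing import Dict, Optional


def resolve_compare_editor(
    file_name: str,
    content_type: str,
    base_type: Dict[str, Optional[str]],
    by_type: Dict[str, str],
    by_name: Optional[Dict[str, str]] = None,
    by_ext: Optional[Dict[str, str]] = None,
) -> Optional[str]:
    """Return the compare editor id for a file, or None if nothing matches.

    All registries are hypothetical plain dictionaries:
      base_type: content type -> its base (parent) type, None at the root
      by_type:   content type -> compare editor bound to it
      by_name:   legacy full-file-name associations
      by_ext:    legacy file-extension associations
    """
    by_name = by_name or {}
    by_ext = by_ext or {}

    # 1) Legacy association keyed on the full file name.
    if file_name in by_name:
        return by_name[file_name]

    # 2) Editor bound *directly* to the file's own content type.
    if content_type in by_type:
        return by_type[content_type]

    # 3) Legacy association keyed on the file extension.
    ext = file_name.rpartition(".")[2] if "." in file_name else ""
    if ext in by_ext:
        return by_ext[ext]

    # 4) Editors bound *indirectly*: walk up the base types until one
    #    of them has an editor registered.
    current = base_type.get(content_type)
    while current is not None:
        if current in by_type:
            return by_type[current]
        current = base_type.get(current)
    return None


# B is based on A, as in the example above.
BASE_TYPE = {"A": None, "B": "A"}

print(resolve_compare_editor("model.b", "B", BASE_TYPE, {"A": "C1", "B": "C2"}))  # C2
print(resolve_compare_editor("model.b", "B", BASE_TYPE, {"A": "C1"}))             # C1
```

With both editors registered, the B file gets C2 because step 2 fires before the tree walk; with only C1 registered, step 4 falls back to the editor of the base type.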

Comment 13 Kim Letkeman

2005-03-28 17:56:54 EST

We are experimenting with the content-type hierarchy and have noticed some 
behaviors that make things kind of complex for us. Examples follow: 

A) Bindings in the same hierarchy with a more specialized version:

text
 |
 |---XML
      |
      |---EMF  --> *.dnx
           |
           |--- MODEL  --> *.emx, *.dnx

By binding the dnx files (model diagrams) at both levels, we get basic tree-
merge support in our core assembly and advanced diagram based support in the 
RSx product line where the advanced presentation layer exists.

What we are finding is that the decisions are being made in favour of the more 
generic layer. This is the opposite of what we want, where the more 
specialized merge support and editors should be used if present. 

B) Bindings in separate hierarchies:

text
 |
 |---XML ----------------------------------- Version Picker  --> *.dnx (older)
      |
      |---EMF  --> *.dnx
           |
           |--- MODEL  --> *.emx, *.dnx

We have an older version of the dnx format and were hoping to use file 
inspection to distinguish between them. Here, we wanted to use file inspection 
to determine when the older format was at hand so we could version pick it (we 
can't really process it safely in our current support.)

So our preference for an algorithm would be something like this:

1) Search the registry for all content-types that match the filename pattern 
(I assume that this is done today.)
2) Take the most specialized content-type IN EACH SEPARATE HIERARCHY. In 
example A, we are left with MODEL, which is correct outright. In example B, we 
are left with MODEL and Version Picker.
3) If we still have more than 1, use file inspection (e.g. by calling an API 
in the description class?) to take the first "yes". A well designed set of 
content types will answer yes only once. This also provides a near fool-proof 
method for distinguishing between content-types provided by different people 
against the same extensions.
4) If none answer yes on inspection, search up the tree IN EACH HIERARCHY and 
try all of those .... keep going until you get a yes or there is nothing 
explicitly defined.
5) Note that any content-type that does not bind the inspection capability 
would have to be considered a perfect match and should be taken as a "yes". 
This opens the door to ambiguity, but that is no surprise with the content-
type area anyway.
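
The proposed selection steps can be sketched as below. This is only an illustrative model in Python, not Eclipse API: `pick_content_type`, the `PARENT` and `PATTERNS` tables, and the `v1`/`v2` markers used by the inspectors are all hypothetical. It also assumes one reading of step 4, namely that "searching up the tree" only considers ancestors that explicitly bind the file pattern.

```python
from fnmatch import fnmatch

# Hypothetical model of example B: child -> parent, and explicit file patterns.
PARENT = {"text": None, "XML": "text", "EMF": "XML",
          "MODEL": "EMF", "VersionPicker": "XML"}
PATTERNS = {"EMF": ["*.dnx"], "MODEL": ["*.emx", "*.dnx"],
            "VersionPicker": ["*.dnx"]}
# Hypothetical inspectors; EMF has none, so it always answers "yes" (step 5).
INSPECTORS = {"MODEL": lambda text: text.startswith("v2"),
              "VersionPicker": lambda text: text.startswith("v1")}


def ancestors(ct, parent):
    """Return the chain of base types above ct, nearest first."""
    chain = []
    ct = parent.get(ct)
    while ct is not None:
        chain.append(ct)
        ct = parent.get(ct)
    return chain


def pick_content_type(file_name, contents, parent, patterns, inspectors):
    # Step 1: every content type that explicitly binds the file name pattern.
    matching = [ct for ct, pats in patterns.items()
                if any(fnmatch(file_name, p) for p in pats)]
    # Step 2: keep the most specialized type on each branch, i.e. drop any
    # candidate that is an ancestor of another candidate.
    shadowed = {a for ct in matching for a in ancestors(ct, parent)}
    frontier = [ct for ct in matching if ct not in shadowed]
    if len(frontier) == 1:
        return frontier[0]

    seen = set()
    while frontier:
        # Steps 3 and 5: first "yes" wins; no inspector counts as "yes".
        for ct in frontier:
            inspect = inspectors.get(ct)
            if inspect is None or inspect(contents):
                return ct
        # Step 4: move each branch up to its nearest explicitly bound ancestor.
        seen.update(frontier)
        next_frontier = []
        for ct in frontier:
            for anc in ancestors(ct, parent):
                if anc in matching:
                    if anc not in seen and anc not in next_frontier:
                        next_frontier.append(anc)
                    break
        frontier = next_frontier
    return None
```

In this model a `.dnx` file starting with `v1` resolves to `VersionPicker`, one starting with `v2` resolves to `MODEL`, and one that neither inspector accepts falls back to `EMF`, which has no inspector and is therefore taken as a match.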

Some suggestions: 

1) I believe that we need to call a file inspection interface (if bound) for 
all content types. I.e. this should be an explicit interface denoted in the 
content-type binding and should always be called. If the answer is "no", then 
simple processing up the tree will likely arrive at text at some point and 
take care of the problem. If it does not, then signal to the user that the 
content-type is unknown. If a content type exists for the file pattern that 
does not have a bound inspection class, then the file is deemed to be that 
content type notwithstanding the other processing mentioned here.

2) I think we need a way to specify that a content-type is structured data and 
MAY NOT be processed as text. I would suggest the ability to denote processing 
of EXPLICIT content types only ... similar to the EMF and MODEL types for dnx 
files in both examples above. This is a fundamental requirement for EMF based 
content types as text merges are fatal very often.

3) I think we need to process explicit content-types differently from implicit 
ones. I.e. we need to allow for explicit content types anywhere in the content 
type hierarchy and try all of them first. Then move up the content type tree 
and find a better match. We certainly would find this very convenient to 
provide different levels of protection and processing for our structured types.

Comment 14 Rafael Chaves

2005-03-28 18:24:29 EST

Kim, we need to be sure whether we are talking about content type matching
problems (which would belong to the Platform/Runtime component) and Compare
editor selection problem in a scenario like I describe in comment 12 (what this
issue is actually about).

There is also the possibility of you misunderstanding how the content type
matching support works. If you need any clarifications on that, feel free to
send a message to me or to the platform-core-dev list. For instance, you should
make sure to check in your tests what the inferred content type is (right click
on file, properties..., info page) and whether it is what you expect. If not, it
might be a bug in content type determination algorithm or a problem in your
content type definition.

Comment 15 Rafael Chaves

2005-04-18 14:23:27 EDT

Kim, is this still an issue for you? If so, have you read my comments (comment
14) above? Before this can be addressed, we need to understand what the problem
you are seeing is (and whether it is a real bug in Core/Compare or just a user error).

Comment 16 Rafael Chaves

2005-04-21 14:30:08 EDT

Re: comment 13 - a post in the newsgroups helped identify a problem that may
be the cause for what you are seeing. See bug 92270.

Comment 17 Kim Letkeman

2005-04-21 15:00:15 EDT

Yes, we found that one too ... in fact, we have been considering getting 
together for a discussion on the whole hierarchy processing issue. But we have 
found workarounds for most of our stuff, so it is not urgent. 92270 should 
definitely be addressed though, because defaulting to the less specialized 
type's viewers is counter-intuitive. We really feel that the hierarchy should 
be processed from the bottom (more specialized) up, selecting the viewers as 
they are found.

If that is fixed, then most of this issue disappears, but there is still the 
problem where we bind our content type into two spots in the content-type 
hierarchy that are not on the same branch. When this happens, it appears that 
the level within the trees factors in, even though there is no relationship to 
level within branch at all.

Comment 18 Rafael Chaves

2005-04-21 15:10:34 EDT

Deciding between unrelated branches is indeed an issue. Please open a PR against
Platform/Runtime for that issue describing your scenario.

Comment 19 Andre Weinand

2005-05-08 10:19:28 EDT

I've verified that the content type algorithm (after some cleanup) behaves as described in comment #2.

As an example I've created a new plugin that defines a new contenttype "FooBar" with the extension 
"foobar". FooBar is based on the java Property contenttype. The plugin does not define a FooBar 
compare.

If I compare two newly created files "f1.foobar" and "f2.foobar", I get the Java Property compare.
If I compare "f1.foobar" and a Java property file "p.properties", I get the Java Property compare.
If I compare "f1.foobar" and a plain text file, I get Text compare.

If I add another plugin that registers a specific compare editor for FooBar files,
comparing "f1.foobar" and "f2.foobar" opens FooBar compare.
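
One way to reproduce the results listed above is to take the most specific content type shared by both inputs and then walk up from there until a compare viewer is registered. The Python sketch below is only a model under that assumption; the thread does not show Compare's actual implementation, and the names `common_type`, `viewer_for_pair`, `PARENT`, and the viewer labels are hypothetical.

```python
# Hypothetical model of the test setup: FooBar is based on the Java
# properties type, which is based on text.
PARENT = {"text": None, "properties": "text", "foobar": "properties"}


def lineage(ct, parent):
    """Return ct followed by all of its base types, nearest first."""
    chain = []
    while ct is not None:
        chain.append(ct)
        ct = parent.get(ct)
    return chain


def common_type(a, b, parent):
    """Most specific content type that both a and b are a kind of."""
    b_line = set(lineage(b, parent))
    for ct in lineage(a, parent):
        if ct in b_line:
            return ct
    return None


def viewer_for_pair(a, b, parent, viewers):
    """Pick a viewer for comparing inputs of content types a and b."""
    ct = common_type(a, b, parent)
    # Walk up from the shared type until some viewer is bound.
    while ct is not None:
        if ct in viewers:
            return viewers[ct]
        ct = parent.get(ct)
    return None


base_viewers = {"text": "Text compare", "properties": "Java Property compare"}
print(viewer_for_pair("foobar", "foobar", PARENT, base_viewers))      # Java Property compare
print(viewer_for_pair("foobar", "properties", PARENT, base_viewers))  # Java Property compare
print(viewer_for_pair("foobar", "text", PARENT, base_viewers))        # Text compare

# A second plugin registers a viewer directly for FooBar.
with_foobar = dict(base_viewers, foobar="FooBar compare")
print(viewer_for_pair("foobar", "foobar", PARENT, with_foobar))       # FooBar compare
```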