[imp-dev] An acceptible semantics for value annotations

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] [List Home]

[imp-dev] An acceptible semantics for value annotations

From: Jurgen Vinju <Jurgen.Vinju@xxxxxx>
Date: Mon, 19 Jan 2009 15:35:54 +0100
Delivered-to: imp-dev@xxxxxxxxxxx
List-archive: <https://dev.eclipse.org/mailman/private/imp-dev>
List-help: <mailto:imp-dev-request@eclipse.org?subject=help>
List-subscribe: <https://dev.eclipse.org/mailman/listinfo/imp-dev>, <mailto:imp-dev-request@eclipse.org?subject=subscribe>
List-unsubscribe: <https://dev.eclipse.org/mailman/listinfo/imp-dev>, <mailto:imp-dev-request@eclipse.org?subject=unsubscribe>

Hi Imps,

...and then came the details of annotations... The PDB value API supports arbitrary values to be annotated
with arbitrary other values. The intention is to allow independent/orthogonal extensions to a data-structure that
represents source code. At CWI we have good experience with such a system (ATerm library), allowing tools to
share data without interference - but not without some issues.

In the context of the PDB these issues are more serious and therefore I am restricting the use of annotations
severely, namely INode (typed trees). Here is the motivation, for your reference. Don't read it unless you are interested.

These are observations (O), problematic implications (P), and choices (C):
O: an annotated value is different from the otherwise equal non-annotated value.
P: there are two notions of equality: with and without annotations (which can be confusing in itself)
P: a choice must be made which notion of equality guides the semantics of set, rel and map operators (e.g. union/intersection).
C: we may choose for full equality (not ignoring annotations)
     P: this can be implemented efficiently
     P: annotations are no longer orthogonal. One has to know that annotations are present to decide how to use union/intersection/equality.
     O: This is confusing and contradicting the requirements.
     O: Annotations may be embedded deeply in a set element, changing its identity at the top level.
     P: this may lead to subtle bugs.
C: we may choose for partial equality (ignoring annotations)
     P: this would require more time to compute than the previous.
   P: the algebraic properties of set/rel/map operators will change (union would not be commutative!) I.e. {1} union {1@x} = {1}, but {1@x} union {1} = {1}?
     C: we may try to "union annotations" for elements when trying to compute union or intersection
        O: Annotations may be embedded deeply in a set element, changing its identity at the top level.
        P: this would be too slow (I tried)
     O: Annotations may be embedded deeply in a set element, changing its identity at the top level.
     P: this prohibits straightforward reasoning about set and relation computations and may therefore introduce subtle bugs.
   O: since IValues are immutable, annotations dissappear as soon as any operator is applied
   P: for sets, lists, maps, etc, this means that annotations will not last long unless the programmer "propagates" annotations manually
   P: this may lead to subtle bugs in which an annotation is expected, but has dissappeared due to some unforseen/new/external intermediate computation.
   O: trees have few mutation-like operators, and the existing ones can easily propagate annotations automatically (like setChild).

***In short: annotations introduce a second notion of equality which implies (as far as I can see) hard to trace semantics for pdb.values.
Some would say it implies a serious loss of "declarativeness". It could also imply serious efficiency loss, depending on which choices one makes.
Furthermore, annotations dissappear quickly on anything but a tree, which is mainly due to immutability of values.

Therefore I have limited the notion of annotations to ITree and INode:
Opinion: Like XML attributes identify XML nodes, it would not be surprising that annotations would change
       the identity of a tree node in pdb.values. This is unlike integers, for which it is very surprising if the annotated number 1 is not equal to the not-annotated number 1.
       Similarly for annotated doubles, sets, maps and relations identity changes due to annotations are surprising, but not for trees.
Observation: It is (still) the case that annotations need to be declared before use. I.e. annotations are part of the contract of a tree data-structure and are therefore
       visible and tractable in some manner. This mitigates the problem of having "hidden annotations" which change the identity of a tree. Therefore I am not allowing annotations
       on any ITree, just on INodes (typed trees), such that annotations are always part of a contract (schema) for a certain data-type.
C: I have chosen for full equality (not ignoring annotations)
      P: this can be implemented efficiently
      P: this introduces the need for a "equal modulo annotations" method on ITree and a recursive "remove all annotations" method.
      P: this keeps the algebraic properties of union and intersection and friends, which is the main reason to chose for this option.
C: I have NOT chosen for partial equality (ignore annotations)
      P: this would have hindered efficiency
      P: this would have allowed annotation obliviousness (which is good, but tough luck)
      P: this would have changed the algebraic properties of union and intersection and friends (i.e. not commutative anymore)
           which is the main reason not to chose for this option.
O: trees can be embedded in other "container" structures, such as tuples, sets, maps, and relations.
O: and, identity is transitive: a container that contains an embedded annotated tree is different from the container that contains the not-annotated tree
P: this may be hard to trace, but it seems that this is the price to pay for having annotations at all: annotations simply change identity.

A prime example of annotated ITree's would be AST's annotated with type or source code location information. Another example would be graph nodes annotated with
visual attributes such that they can be rendered nicely. Such applications can easily be serviced using the current limitation.

Last note: to program safely the "pragmatic programmer" that uses ITree and INode should always assume that annotations may be present on the tree data that
his code consumes.

I will commit the changes shortly.

Cheers,

Jurgen

Prev by Date: Re: [imp-dev] BigIntegers for pdb.values
Next by Date: [imp-dev] Fwd: PDBPlugin.getLanguageID()
Previous by thread: [imp-dev] BigIntegers for pdb.values
Next by thread: [imp-dev] Fwd: PDBPlugin.getLanguageID()
Index(es):
- Date
- Thread

Breadcrumbs