[cme-dev] question on the expected mutual consistency of project classpaths


Hi All,

As most (if not all) of you know, we're in the middle of redesigning the Conman loaders.  One issue affecting the redesign (and the design of an appropriate Conman model) is the question of the mutual consistency of classpaths in an Eclipse workspace.  (For purposes of building a concern model I believe we can focus just on the classpaths that are associated with particular projects, ignoring runtime classpaths, classpaths that may be embedded in particular files, etc.)

The key consistency issue is whether references to classes will be resolved in the same way in all projects, i.e., according to all classpaths in the workspace.  There is a more formal and precise statement of what this means below, but a basic manifestation of the issue is whether a reference from a class A in one project to a class B in another project will be resolved to the same declaration of class B according to the classpaths in two different projects.

This is an important issue that we need to understand well, the sooner the better.

The classpath-consistency condition has an effect on how we model relationships in Conman.  If all references are resolved in a uniform way across the workspace, then we can put all of the relationships for all projects into one big pool without worrying about sorting them by project (or classpath).  This is what we do now, by putting all relationships into the concern space directly.  If we can do this, it will save time in computing the relationships and space in storing them.  It also allows us to present a relatively simplified view of the workspace to the user (the view we present now).  This condition is assumed, in effect, by the current loaders and query mechanisms, and it probably affords some simplification (although perhaps minor) in the implementation of these components.

On the other hand, this condition is not required by Eclipse, we don't know whether typical users will assume that this condition holds or will take advantage of the more general Eclipse semantics, and we have no idea whether our current workspace (or typical workspaces) will observe the condition.  We would also have to create some means of verifying the condition (which we expect would operate mainly as a stand-alone utility).

If we cannot safely assume that the condition holds (or if we just want to model the more general Eclipse semantics), then we would have to compute a set of relationships relative to the classpath for each project and keep those separately (or be able to sort them out).  This has a higher cost in terms of computation and storage.  It could complicate the views of the workspace and the formulation of queries against it (which would have to accommodate an element of project relativity).  The impact on the loaders would be minor (less than the impact of changes that we've incorporated on a regular basis); my understanding is that future work on query implementations could accommodate this change readily but that existing implementations would have to be re-engineered to some extent.  On the other hand, if it is not safe to assume the condition, and we do not contextualize the relationships, then the information and views we provide for a workspace will be wrong in general and the utility of the environment will be compromised.
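
To make the trade-off between the two storage options concrete, here is a minimal sketch in Python. It is an illustration only, not the Conman loader design: the function names (resolve, load_contextualized, load_pooled), the project and library names, and the representation of a classpath as a list of (container, name) pairs are all assumptions made for the example.

```python
def resolve(classpath, name):
    """Return the first (container, name) entry on the classpath with this name."""
    for entry in classpath:
        if entry[1] == name:
            return entry
    return None


def load_contextualized(classpaths, refs):
    """Compute relationships separately for each project's classpath.

    classpaths: project name -> ordered list of (container, name) entries
    refs: (container, name) -> set of names that artifact refers to
    Returns: project name -> set of (source artifact, target artifact) pairs.
    """
    relationships = {}
    for project, classpath in classpaths.items():
        relationships[project] = {
            (artifact, resolve(classpath, name))
            for artifact in classpath
            for name in refs.get(artifact, ())
        }
    return relationships


def load_pooled(classpaths, refs):
    """Put all relationships into one workspace-wide pool (hypothetical)."""
    pool = set()
    for project_relationships in load_contextualized(classpaths, refs).values():
        pool |= project_relationships
    return pool


# Hypothetical workspace: class A refers to B, but the two projects
# see different declarations of B on their classpaths.
classpaths = {
    "P1": [("projA", "A"), ("lib1", "B")],
    "P2": [("projA", "A"), ("lib2", "B")],
}
refs = {("projA", "A"): {"B"}}

contextualized = load_contextualized(classpaths, refs)
pooled = load_pooled(classpaths, refs)
print(contextualized["P1"])  # {(('projA', 'A'), ('lib1', 'B'))}
print(contextualized["P2"])  # {(('projA', 'A'), ('lib2', 'B'))}
print(len(pooled))           # 2
```

In this example the pool holds two conflicting targets for the same reference and no longer records which project each one belongs to. If both projects resolved B identically, the pool would hold a single relationship, while the per-project form would store the same relationship twice.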

Two additional notes:
  • Even if the classpath-consistency condition is a safe assumption in many cases, we probably still need the capability to subdivide a workspace (or our views of the workspace) into some form of working sets.  Even people who habitually work with sets of Eclipse projects that have consistent classpaths may still work with several such sets in a single workspace.  Accommodating multiple working sets is likely to raise some of the issues that contextualizing relationships would raise, e.g., the need to accommodate contextualized views and queries.  However, it would probably not entail some others, such as the performance and storage costs for multiple alternative sets of relationships.
  • Most of the changes we might make to the loaders now can probably accommodate contextualization later without too much additional work, e.g., it should not be difficult to change where the loaders store relationships.  However, we need to be careful that other work we may do does not depend too heavily on this assumption if the assumption is not expected to hold in the future.  Also, for the sake of correctness and usefulness of the environment, we need to have some idea of whether classpath consistency holds now.

So, we're wondering whether classpath consistency reflects reality in the perception and use of Eclipse and whether it would be natural or onerous to require it of our users.  We don't have the breadth of perspective to judge this by ourselves.  Please let us know what you think!

Thanks,

Stan

P. S.  Here is a formulation of the classpath-consistency condition that Harold Ossher worked up after talking with Bill Harrison:
  • An artifact is uniquely identified by a name pair, (d, n), where d is a "disambiguator" and n is the artifact's selfIdentifyingName.
    • There are many possibilities for the disambiguator, and the rest of this analysis is neutral with respect to them. E.g.:
      • a container, like an Eclipse project or special kind of concern, whose contents are guaranteed to have unique (non-duplicated) selfIdentifyingNames.
      • location (disk address, canonical path, ...)
    • It is likely true that disambiguators are, or have associated with them, mappings from names to locations; i.e., given a disambiguator and a name, a unique disk address or whatever can be found for an artifact with that selfIdentifyingName.
  • A classpath P can be considered to be a sequence of (d, n) pairs.
    • It is usually expressed at a higher level, such as a sequence of directories or jar files (perhaps a sequence of d's), but since names are unique within these elements, order within them doesn't matter, so it boils down to a sequence of (d, n) pairs without loss of generality.
    • A classpath implies a function: P(n) = the first (d', n') in P such that n' = n
  • An artifact (d, n) contains names of things it refers to. The set of these names is denoted ref(d, n).
    • It is assumed that all relationships loaded by Conman loaders are determined by examining the ref sets for loaded artifacts (there might be complex computation involved, but this is the root source of the information).
  • Given these definitions and assumptions, the following is a statement of the "compatible classpath" restriction we have talked about for some time:
          forall P1, P2, d, n:
                  (d, n) in P1 and (d, n) in P2 implies           // the same artifact on two classpaths implies
                          forall n' in ref(d, n): P1(n') = P2(n')      // that all names referenced in that artifact are resolved to the same artifact in both classpaths
  • Checking the above condition requires detailed, expensive examination of all the artifacts. A conservative approximation is given by the following:
          forall P1, P2, d, n1, n2:
                  (d, n1) in P1 and (d, n2) in P2 implies                                                         // d has some presence in both P1 and P2 implies
                          forall n' occurring after the first occurrence of d in both P1 and P2: P1(n') = P2(n')       // any later-occurring name is resolved uniformly
    This check can be performed by examining the names of all classes on each classpath but not their contents.
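
As a rough illustration of how the two checks above might be computed, here is a sketch in Python. It is not an existing Conman utility. The function names are hypothetical, a classpath is modelled directly as a list of (d, n) pairs, and ref is modelled as a dictionary from (d, n) to a set of names. The approximate check reads "occurring after the first occurrence of d in both P1 and P2" as "names that appear after d's first occurrence in each of the two classpaths"; that reading is an assumption.

```python
from itertools import combinations


def resolve(classpath, name):
    """P(n): the first (d', n') in the classpath such that n' == name, else None."""
    for pair in classpath:
        if pair[1] == name:
            return pair
    return None


def exact_check(classpaths, refs):
    """Exact condition: every name referenced by an artifact that appears on
    two classpaths must resolve to the same artifact on both."""
    for p1, p2 in combinations(classpaths, 2):
        for artifact in set(p1) & set(p2):
            for name in refs.get(artifact, ()):
                if resolve(p1, name) != resolve(p2, name):
                    return False
    return True


def names_after_first(classpath, d):
    """Names occurring after the first entry whose disambiguator is d."""
    for i, (d2, _) in enumerate(classpath):
        if d2 == d:
            return {n for _, n in classpath[i + 1:]}
    return set()


def conservative_check(classpaths):
    """Approximate condition: uses only the names on each classpath,
    never the contents (ref sets) of the artifacts."""
    for p1, p2 in combinations(classpaths, 2):
        shared_ds = {d for d, _ in p1} & {d for d, _ in p2}
        for d in shared_ds:
            later = names_after_first(p1, d) & names_after_first(p2, d)
            for name in later:
                if resolve(p1, name) != resolve(p2, name):
                    return False
    return True


# Hypothetical example: A only refers to B, which resolves identically,
# but an unreferenced name X would resolve differently.
p1 = [("projA", "A"), ("lib1", "B"), ("lib1", "X")]
p2 = [("projA", "A"), ("lib1", "B"), ("lib2", "X")]
refs = {("projA", "A"): {"B"}}

print(exact_check([p1, p2], refs))   # True
print(conservative_check([p1, p2]))  # False
```

The example shows the sense in which the approximation is conservative: it rejects a pair of classpaths that the exact condition accepts, because it cannot see that nothing actually refers to X.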

