
[cme-dev] Thoughts on rearchitecting the Ant loader (and other XML-artifact loaders)


A. S.  Apologies for the long message.  There are some conclusions at the end.  Details in the middle.

Hi All,

At OOPSLA last week Andy and I had a conversation about loaders.  He said he'd be curious to see whether the new loader architecture would work with the Ant loader.  (It currently works on Eclipse core resources, JDT resources, CIT type universes, java.io.File entities, and combinations thereof.)  Andy wondered in particular whether the new architecture might provide a way to improve performance.  Conveniently, I was on the verge of looking at the Ant loader anyway, so I've done a little of that.

Before I get into technical details, I should warn you that I have little background in parsers and have never written an implementation of the Visitor pattern.  Nevertheless, I think I have a general grasp of what goes on in the Ant loader.

Unfortunately, I don't think there's a natural fit between the Ant loader (and XML/parser-based loaders in general) and the new "resource-oriented" loader architecture (for lack of a better term).  For that reason, I don't believe that the new loader architecture offers any quick fixes for improving performance in the Ant loader.  However, since both loaders are made of software, there are things we can do to adapt parser-based loaders to the resource-oriented model, and it may be possible to derive some benefits from that.  How and what (and whether) remain to be determined.

Regarding the resource-oriented architecture:  This was devised to introduce a systematic architecture into a subset of loaders that lacked one.  The architecture seems fairly natural to me:  It takes advantage of OO hierarchies, it provides particular types of loaders for particular types of artifacts, and it has a number of good software engineering qualities:  modularity, understandability, reconfigurability, extensibility, flexibility, etc.  A significant characteristic of the resource-oriented loaders is that the artifacts they load belong to name spaces:  instances of most of the types of artifacts can be requested directly by name, or belong to some artifact that can be requested by name, and these names can often be obtained from outside of the artifact model, without necessarily loading or traversing the model (e.g., pathnames in a file system, or space-qualified type names in CIT).  These loaders also take advantage of the fact that (at least within a particular artifact type model) it is usually possible to navigate readily from one element to related elements of various sorts.
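A minimal Java sketch of the resource-oriented idea, assuming a toy name space of path-like strings in which a trailing "/" marks a directory. `ArtifactLoader`, `DirectoryLoader`, `FileLoader`, and `ConcernElement` are hypothetical names for illustration, not the CME's actual classes. The point is that each artifact type gets its own loader, and an element can be requested directly by a name obtained outside the artifact model, without traversing anything else.

```java
import java.util.List;

public class ResourceLoaderSketch {
    /** Hypothetical concern-model element: the artifact's name and its kind. */
    record ConcernElement(String name, String kind) {}

    /** Hypothetical loader hierarchy: one loader per artifact type. */
    interface ArtifactLoader {
        boolean canLoad(String name);
        ConcernElement load(String name);
    }

    /** Assumption: names ending in "/" denote directories. */
    static final class DirectoryLoader implements ArtifactLoader {
        public boolean canLoad(String name) { return name.endsWith("/"); }
        public ConcernElement load(String name) { return new ConcernElement(name, "directory"); }
    }

    /** Everything else is treated as a plain file. */
    static final class FileLoader implements ArtifactLoader {
        public boolean canLoad(String name) { return !name.endsWith("/"); }
        public ConcernElement load(String name) { return new ConcernElement(name, "file"); }
    }

    static List<ArtifactLoader> defaultLoaders() {
        return List.of(new DirectoryLoader(), new FileLoader());
    }

    /** Loads one element directly by name; no other part of the model is traversed. */
    static ConcernElement loadByName(List<ArtifactLoader> loaders, String name) {
        for (ArtifactLoader loader : loaders) {
            if (loader.canLoad(name)) {
                return loader.load(name);
            }
        }
        throw new IllegalArgumentException("no loader accepts " + name);
    }

    public static void main(String[] args) {
        List<ArtifactLoader> loaders = defaultLoaders();
        System.out.println(loadByName(loaders, "src/"));
        System.out.println(loadByName(loaders, "src/Main.java"));
    }
}
```

Supporting a new artifact type would mean adding another `ArtifactLoader` class rather than changing the driver.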

I can't really compare the performance of the resource-oriented loaders with the original loaders, as the new loaders generally don't load relationships yet (and how best to do that is still an open issue, tied to the question of how to treat relationships in CIT).  In any case, perhaps the most significant opportunity for performance enhancement with the new loaders lies in the flexibility they afford to load elements selectively.  This is not a big advantage if you want to load a whole model at once, but potentially a great advantage if you want to work with just selected elements (or selected kinds of elements).
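To illustrate the difference between whole-model and selective loading, here is a sketch that counts how many concern-model elements each approach would create. The in-memory `NAMESPACE` map is a hypothetical stand-in for a real name space (a file system, a JDT workspace, or a CIT type universe), and incrementing a counter stands in for actually building an element.

```java
import java.util.Collection;
import java.util.List;
import java.util.Map;

public class SelectiveLoadingSketch {
    /** Hypothetical name space: each element name maps to the names of its children. */
    static final Map<String, List<String>> NAMESPACE = Map.of(
            "proj", List.of("proj/a", "proj/b"),
            "proj/a", List.of("proj/a/X.java", "proj/a/Y.java"),
            "proj/b", List.of("proj/b/Z.java"),
            "proj/a/X.java", List.of(),
            "proj/a/Y.java", List.of(),
            "proj/b/Z.java", List.of());

    /** Whole-model load: walks everything under root; returns the number of elements created. */
    static int loadAll(String root) {
        int created = 1; // the element for root itself
        for (String child : NAMESPACE.get(root)) {
            created += loadAll(child);
        }
        return created;
    }

    /** Selective load: creates only the requested elements, looked up directly by name. */
    static int loadSelected(Collection<String> names) {
        int created = 0;
        for (String name : names) {
            if (!NAMESPACE.containsKey(name)) {
                throw new IllegalArgumentException("unknown element: " + name);
            }
            created++; // a real loader would build the concern-model element here
        }
        return created;
    }

    public static void main(String[] args) {
        System.out.println("whole model: " + loadAll("proj"));                        // 6
        System.out.println("selected:    " + loadSelected(List.of("proj/b/Z.java"))); // 1
    }
}
```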

Regarding the Ant loader (and XML/parser-style loaders):  The Ant loader has a reasonable architecture that seems systematic and appropriate for an XML-based language.  Thus there seems to be no purely architectural reason to replace it.  Ant loading involves parsing the input (Ant/XML) file, then visiting the nodes in the parsed structure and creating concern-model elements for them.  As far as I know, the elements in the input file, or in the structure parsed from it, are not natively addressable by name, although you can navigate up and down in the parsed structure (i.e., the DOM or XML tree).  So the parser-style loader seems naturally suited to loading a whole artifact model top-down, but not to loading arbitrary elements from the middle of a model.  Also, there seems to be less incentive (and perhaps less sense) for having specialized types of loaders for specialized types of elements, e.g., one for top-level Ant files, one for Ant targets, one for Ant tasks, etc.  I think this is so because a) all types of Ant artifact are represented as elements in an XML file (i.e., they're all made out of the same stuff), and b) I don't think people are likely to want to load, say, Ant tasks outside of the loading of an Ant file, so there would be little call for an Ant-task loader that could be reused in other sorts of loaders.  (Although I could be wrong about any of this.)
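The parse-then-visit flow described above might look roughly like the following, using the JDK's standard DOM parser (`javax.xml.parsers.DocumentBuilderFactory`). This is a simplified sketch, not the actual CME Ant loader; `ConcernElement` is a hypothetical placeholder for a concern-model element. Note that the whole file is parsed before the first element is visited, and that nothing below the root can be reached except by walking down from it.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class ParserStyleLoaderSketch {
    static final String SAMPLE_BUILD =
            "<project name=\"demo\">"
          + "<target name=\"compile\"><javac srcdir=\"src\"/></target>"
          + "<target name=\"jar\" depends=\"compile\"><jar destfile=\"demo.jar\"/></target>"
          + "</project>";

    /** Hypothetical concern-model element: tag, "name" attribute ("" if absent), depth. */
    record ConcernElement(String tag, String name, int depth) {}

    /** Step 1: parse the entire file into a DOM tree before anything is loaded. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("cannot parse build file", e);
        }
    }

    /** Step 2: visit every element top-down and create one concern element per node. */
    static void visit(Element element, int depth, List<ConcernElement> out) {
        out.add(new ConcernElement(element.getTagName(), element.getAttribute("name"), depth));
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) { // skip whitespace text nodes
                visit((Element) child, depth + 1, out);
            }
        }
    }

    static List<ConcernElement> load(String xml) {
        List<ConcernElement> out = new ArrayList<>();
        visit(parse(xml).getDocumentElement(), 0, out);
        return out;
    }

    public static void main(String[] args) {
        load(SAMPLE_BUILD).forEach(System.out::println);
    }
}
```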

On the other hand, it might be the case that we would want particular sorts of Visitor for particular sorts of node, depending on what we wanted to do with the nodes in the concern model.  (Actually, we can consider the parser to be an artifact-model constructor, the traverser to be a loader driver, and the visitors to be artifact-element-to-concern-model-element mappers.)
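The division of labor in the parenthetical (parser as artifact-model constructor, traverser as loader driver, visitors as element mappers) can be sketched with a table of per-tag mappers. The mapper output strings such as `Project(...)`, the class name, and the choice to treat every unrecognized tag as a task are assumptions made for illustration only.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.SAXException;

public class MapperDispatchSketch {
    /** Hypothetical mappers keyed by element tag: artifact element -> concern-model element. */
    static final Map<String, Function<Element, String>> MAPPERS = Map.of(
            "project", e -> "Project(" + e.getAttribute("name") + ")",
            "target", e -> "Target(" + e.getAttribute("name") + ")");

    /** Fallback for tags without a specialized mapper (assumed here to be tasks). */
    static final Function<Element, String> TASK_MAPPER = e -> "Task(" + e.getTagName() + ")";

    /** Artifact-model constructor: the XML parser builds the DOM. */
    static Element construct(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("cannot parse build file", e);
        }
    }

    /** Loader driver: traverses the DOM and hands each element to the mapper for its tag. */
    static void drive(Element element, List<String> out) {
        out.add(MAPPERS.getOrDefault(element.getTagName(), TASK_MAPPER).apply(element));
        for (Node n = element.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element child) {
                drive(child, out);
            }
        }
    }

    static List<String> load(String xml) {
        List<String> out = new ArrayList<>();
        drive(construct(xml), out);
        return out;
    }

    public static void main(String[] args) {
        String build = "<project name=\"demo\">"
                + "<target name=\"compile\"><javac srcdir=\"src\"/></target>"
                + "</project>";
        System.out.println(load(build)); // [Project(demo), Target(compile), Task(javac)]
    }
}
```

Specializing the handling of a new kind of node then means registering another mapper, while the parser and driver stay the same.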

Peri has speculated that the main performance drag for the Ant loader is probably the parsing, which is the part that is most outside of our purview.  I wonder if, for Ant files, it's much of a problem in practice.  That is, it seems to me that there are relatively few Ant files and that they're of small to moderate size, so that the loading of Ant files may take up only a small fraction of our overall load time for a workspace in any case.  (I suppose we could try to measure this.)  So we may not want to worry about this for the sake of Ant files specifically, but performance may be an issue as we incorporate more types of XML artifact and larger XML artifacts.
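One rough way to do that measurement is to time the parse and the traversal separately. In the sketch below, the synthetic build file and the element-counting "visitor" are hypothetical stand-ins (a real mapper would do more work per node), and the parse time also includes creating the parser. A single cold run like this is noisy because of JIT warm-up and class loading, so meaningful numbers would need repeated runs on actual workspace Ant files.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.SAXException;

public class AntLoadTimingSketch {
    /** Elapsed time of each phase, plus the number of elements visited. */
    record Timing(long parseNanos, long visitNanos, int elements) {}

    /** Hypothetical test input: a flat build file with the given number of targets. */
    static String syntheticBuildFile(int targets) {
        StringBuilder sb = new StringBuilder("<project name=\"synthetic\">");
        for (int i = 0; i < targets; i++) {
            sb.append("<target name=\"t").append(i).append("\"><echo message=\"hi\"/></target>");
        }
        return sb.append("</project>").toString();
    }

    /** Stand-in for the visitor phase: only counts elements, so it underestimates real mapping cost. */
    static int countElements(Element element) {
        int count = 1;
        for (Node n = element.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element child) {
                count += countElements(child);
            }
        }
        return count;
    }

    static Timing measure(String xml) {
        try {
            long t0 = System.nanoTime();
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            long t1 = System.nanoTime();
            int elements = countElements(doc.getDocumentElement());
            long t2 = System.nanoTime();
            return new Timing(t1 - t0, t2 - t1, elements);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("cannot parse build file", e);
        }
    }

    public static void main(String[] args) {
        Timing t = measure(syntheticBuildFile(1000));
        System.out.printf("parse %.2f ms, visit %.2f ms, %d elements%n",
                t.parseNanos() / 1e6, t.visitNanos() / 1e6, t.elements());
    }
}
```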

Peri also pointed out to me something that I hadn't been aware of, namely, that there is (or is planned) a "CIT-mini" (or "mini-CIT"?) that (either alone or by specialization) would accommodate Ant and (possibly) other XML-based languages.  Anyway, the general idea is that we could put Ant (and other XML-based) languages behind a CIT interface.  That would hide the parsing of the XML file and presumably require that some form of direct access by something (like a name) be supported.  If that were done, then the new, resource-oriented loader architecture would apply naturally.  This would not obviate the need for parsing.  It might provide greater flexibility in the construction of concern models, which might enhance performance when loading can be managed in stages or selectively.  This, I believe, and the other advantages that might accrue from being behind a CIT interface, would be the main reasons to try to shift Ant (and XML) loading in the direction of the new loader architecture.
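As a rough illustration of what putting Ant behind a name-based interface could mean, the sketch below parses a build file once and then exposes its targets by name. `AntModel` and its methods are hypothetical and are not the CIT API; the `depends` attribute, however, is standard Ant syntax (a comma-separated list of target names). As noted above, this does not remove the parse; it only hides it behind the interface.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class NamedAntModelSketch {
    /** Hypothetical CIT-style facade: clients see target names, never XML nodes. */
    interface AntModel {
        Set<String> targetNames();
        Optional<List<String>> dependencies(String targetName);
    }

    /** Parses once, then indexes every <target> by its name attribute. */
    static AntModel open(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("cannot parse build file", e);
        }
        Map<String, List<String>> index = new LinkedHashMap<>(); // keeps document order
        NodeList targets = doc.getElementsByTagName("target");
        for (int i = 0; i < targets.getLength(); i++) {
            Element target = (Element) targets.item(i);
            List<String> deps = new ArrayList<>();
            // Ant's "depends" attribute is a comma-separated list of target names.
            for (String dep : target.getAttribute("depends").split(",")) {
                if (!dep.isBlank()) {
                    deps.add(dep.trim());
                }
            }
            index.put(target.getAttribute("name"), List.copyOf(deps));
        }
        return new AntModel() {
            @Override public Set<String> targetNames() {
                return Collections.unmodifiableSet(index.keySet());
            }
            @Override public Optional<List<String>> dependencies(String targetName) {
                return Optional.ofNullable(index.get(targetName));
            }
        };
    }

    public static void main(String[] args) {
        AntModel model = open("<project name=\"demo\">"
                + "<target name=\"compile\"/>"
                + "<target name=\"jar\" depends=\"compile\"/>"
                + "</project>");
        System.out.println(model.targetNames());       // [compile, jar]
        System.out.println(model.dependencies("jar")); // Optional[[compile]]
    }
}
```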

(I see that there was at least a start at creating what looks like a CIT representation for Ant, but that seems to have been abandoned several months ago.  I don't know the story behind this.  Also, note that providing an Ant implementation for CIT doesn't imply a commitment to any particular loader architecture; there may be multiple styles of CIT loader.)

By the way, the Ant loader works as-is when the workspace is otherwise loaded by loaders with the new architecture, so compatibility of the loader architectures isn't an issue.

We should appreciate that the parse step is something that at least some of the other sorts of artifacts we load don't have to undergo.  We may think of the Ant file as an artifact, but from the perspective of the concern model the Ant file is really a package or archive that contains the real artifacts (tasks, targets, etc.).  Parsing is needed to expose those artifacts, which can then be loaded.  For CIT loaders on Jikes or Shrike something similar happens, although the granularity of "parsing" in those cases may vary.  In contrast, when loading through java.io.File, the artifacts of interest to loaders (files and directories) are directly at hand.  And when loading Eclipse core or JDT resources, Eclipse takes care of the loading of the models, so again the artifacts of interest to loaders are directly at hand.  In these cases we don't have to do any parsing preparatory to loading.

I'm not sure where this leaves us, but I wanted to start a discussion.  My conclusions so far (tentative as they may be):
  • I don't think that there is a compelling reason to change the Ant loader specifically at this time (but someone should correct me if I'm wrong)
  • We should begin now to worry about problems with performance and/or flexibility for the loading of XML-based artifacts in general, in order to smooth the way for people who may want to extend the CME by adding such artifacts
  • If there are good reasons to put Ant (or a generic XML loader?) behind a CIT interface, then we might want to try that (it could be an informative experience)
  • Switching the architecture of XML loaders to the new "resource-oriented" style is, I believe, unlikely to improve performance, except in special cases of staged or selective loading (the need for which remains to be established)
  • If we're concerned about the performance of parser-style XML loaders, anything that we can do to improve that in the current parser-style loader architecture is probably a good idea (e.g., can we make them amenable to staged or selective loading?)
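On the last point, one possible direction for staged or selective loading within the parser-style architecture is a streaming parser such as the JDK's StAX API (`javax.xml.stream`), which avoids building a full DOM. In the sketch below, stage 1 collects only target names, and stage 2 loads the tasks of one named target and then stops pulling events. The class and method names are hypothetical, and whether this actually pays off for Ant files would have to be measured.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StagedAntLoadingSketch {
    static XMLStreamReader open(String xml) throws XMLStreamException {
        return XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
    }

    /** Stage 1: collect target names only; no DOM tree is built. */
    static List<String> targetNames(String xml) {
        List<String> names = new ArrayList<>();
        try {
            XMLStreamReader r = open(xml);
            try {
                while (r.hasNext()) {
                    if (r.next() == XMLStreamConstants.START_ELEMENT
                            && r.getLocalName().equals("target")) {
                        names.add(r.getAttributeValue(null, "name"));
                    }
                }
            } finally {
                r.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("cannot read build file", e);
        }
        return names;
    }

    /** Stage 2: load the direct child elements (tasks) of one named target, then stop. */
    static List<String> tasksOf(String xml, String targetName) {
        List<String> tasks = new ArrayList<>();
        try {
            XMLStreamReader r = open(xml);
            try {
                int depth = 0; // 0 = outside the requested target; 1 = directly inside it
                while (r.hasNext()) {
                    int event = r.next();
                    if (event == XMLStreamConstants.START_ELEMENT) {
                        if (depth > 0) {
                            if (depth == 1) {
                                tasks.add(r.getLocalName()); // direct child of the target
                            }
                            depth++;
                        } else if (r.getLocalName().equals("target")
                                && targetName.equals(r.getAttributeValue(null, "name"))) {
                            depth = 1;
                        }
                    } else if (event == XMLStreamConstants.END_ELEMENT && depth > 0) {
                        depth--;
                        if (depth == 0) {
                            break; // requested target finished; stop pulling events
                        }
                    }
                }
            } finally {
                r.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("cannot read build file", e);
        }
        return tasks;
    }

    public static void main(String[] args) {
        String build = "<project name=\"demo\">"
                + "<target name=\"compile\"><javac srcdir=\"src\"/></target>"
                + "<target name=\"jar\" depends=\"compile\"><jar destfile=\"demo.jar\"/></target>"
                + "</project>";
        System.out.println(targetNames(build));        // [compile, jar]
        System.out.println(tasksOf(build, "compile")); // [javac]
    }
}
```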

Comments and questions welcomed!

Thanks,

Stan



