[news.eclipse.technology.ldt] Re: Beyond textual represenations...

Mike Kaufman wrote:
Hi Guillaume -

I think the reason the primary representation of programs is text files is that text is simple and readable by humans.

I also don't understand what benefit a binary storage form would have in making it easier to bring new languages online in eclipse.

I don't necessarily mean a binary storage form. XML is quite suitable for representing graphs when accompanied by a linking dialect (like XLink). It's human-readable, and there are good databases and query languages for XML.
But let's forget XML (which still is a notation) and just say that I want the basic substrate for a computational system's 'source code' to be an Abstract Syntax Tree/Graph.
Textual form is limiting in that it needs to be parsed (the compiler will eventually need an AST anyway). Parsing becomes an obstacle when the languages become complex, and can impose artificial limitations on what can be expressed in the language.
For instance, the Maya language (http://www.cs.utah.edu/~jbaker/maya/) is a kind of meta-language: it is all of Java plus a means to dynamically and selectively redefine the language. For example, you can declare that, in a given scope, the '->' character sequence is a lexical token, and assign a meaning to that token. Or you can redefine the semantics of existing syntactic constructs. The issue is that Maya's author had to invent new parsing techniques to deal with this, even for things conceptually as simple as matching braces in the context of a dynamically defined grammar. And there are cases that are simply not feasible. This kind of problem is absent in an AST: given an adequate authoring tool, the developer can specify the bounds of a code 'subgraph' directly.
Moreover, a tree or a graph is the more natural form of source code: this is well illustrated by the extremely frequent use of tools that *simulate* graph views of textual source code: package explorer, class hierarchy view, method overriding view...
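
To make the point about simulated graph views concrete, here is a minimal sketch in Java. It assumes a hypothetical in-memory node type (`Node`, `superType` and `directSubclasses` are names invented for this illustration, not part of LDT or Eclipse). When the source is already stored as a graph, a "class hierarchy view" is just a query over existing edges, with no parsing involved.

```java
import java.util.ArrayList;
import java.util.List;

public class AsgSketch {
    /** One node of a hypothetical abstract syntax graph (all names are illustrative). */
    static final class Node {
        final String kind;                              // e.g. "package", "class"
        final String name;
        final List<Node> children = new ArrayList<>();  // containment (tree) edges
        Node superType;                                 // cross-reference (graph) edge; null if none

        Node(String kind, String name) {
            this.kind = kind;
            this.name = name;
        }

        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    /** A "class hierarchy view": names of classes whose superType edge points at base. */
    static List<String> directSubclasses(Node root, Node base) {
        List<String> result = new ArrayList<>();
        collect(root, base, result);
        return result;
    }

    private static void collect(Node node, Node base, List<String> out) {
        if (node.kind.equals("class") && node.superType == base) {
            out.add(node.name);
        }
        for (Node child : node.children) {
            collect(child, base, out);
        }
    }

    public static void main(String[] args) {
        Node shape = new Node("class", "Shape");
        Node circle = new Node("class", "Circle");
        Node square = new Node("class", "Square");
        circle.superType = shape;
        square.superType = shape;
        Node pkg = new Node("package", "geometry").add(shape).add(circle).add(square);

        System.out.println(directSubclasses(pkg, shape)); // prints [Circle, Square]
    }
}
```

A real substrate would of course need persistence, identity and many more edge types; the sketch only shows that the view is a traversal rather than a reconstruction from text.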


So, back to your point: what would be the benefits of a framework that provides a substrate for graph-based source code when it comes to implementing tooling for new languages?
- All languages can be translated into an AST/G (well, I think so...). Having a common substrate for all possible languages is certainly an interesting feature when it comes to tool interoperability. Same philosophy as in TPTP with the Common Base Event.
- Basing the framework on AST/Gs instead of text would force us to rethink how to specify the means of interaction with the source code (think autocompletion, source generation like getters/setters...). This means that it would be necessary to define "languages designed to express languages and their IDE tooling", like Chris Daly mentioned. Of course it means more work for ldt, but then much, much less for new language implementors.
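
The two points above can be sketched as follows. This is only an illustration under stated assumptions: the `Node` type, the shared kinds "scope" and "declaration", and the `complete` function are all hypothetical and do not correspond to any existing LDT API. The idea is that a tool such as autocompletion is written once against the common substrate and then works for any language that has been mapped onto it.

```java
import java.util.ArrayList;
import java.util.List;

public class CommonSubstrate {
    /** Language-neutral node; the language tag is the only language-specific part here. */
    static final class Node {
        final String language;  // e.g. "java", "python" (informational only)
        final String kind;      // shared vocabulary, e.g. "scope", "declaration"
        final String name;
        final List<Node> children = new ArrayList<>();

        Node(String language, String kind, String name) {
            this.language = language;
            this.kind = kind;
            this.name = name;
        }

        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    /** Completion written once: declarations directly inside scope whose name starts with prefix. */
    static List<String> complete(Node scope, String prefix) {
        List<String> proposals = new ArrayList<>();
        for (Node child : scope.children) {
            if (child.kind.equals("declaration") && child.name.startsWith(prefix)) {
                proposals.add(child.name);
            }
        }
        return proposals;
    }

    public static void main(String[] args) {
        Node javaClass = new Node("java", "scope", "Person")
                .add(new Node("java", "declaration", "getName"))
                .add(new Node("java", "declaration", "getAge"))
                .add(new Node("java", "declaration", "setName"));
        Node pyModule = new Node("python", "scope", "person")
                .add(new Node("python", "declaration", "get_name"))
                .add(new Node("python", "declaration", "run"));

        // The same tool serves both languages without knowing their concrete syntax.
        System.out.println(complete(javaClass, "get")); // prints [getName, getAge]
        System.out.println(complete(pyModule, "get"));  // prints [get_name]
    }
}
```

In this sketch the language implementor only supplies the mapping into nodes; the completion logic itself is never rewritten.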


As a side effect, I think that it would foster the development of radically new languages, languages that are completely impractical to develop now because of the lack of this infrastructure.

Regards,
Guillaume


Mike Kaufman BEA Systems, Inc.


"Guillaume Pothier" <gpothier@xxxxxxxxx> wrote in message news:d1fm9d$93j$1@xxxxxxxxxxxxxxxxxx


I am really glad that the LDT project has been created.
Many of the posts to this newsgroup discuss ASTs and parsers, assuming that a programming language's primary representation will always be its textual form.


As far as I am concerned, I anticipate the day when the primary form of a programming language is its AST or, even better, its Abstract Syntax Graph. Given the computing power of the machines developers use today, I think it is no luxury to start thinking of replacing the traditional bunch of source files stored in a hierarchical filesystem with a graph of objects backed by some database system.

There are a few obvious benefits to doing so, even in the simple case of Java-only development:
- No more source code formatting hell and obnoxious merge conflicts caused by formatting differences between developers.
- Elimination of the parsing step from compilation.
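
The formatting benefit can be sketched in a few lines of Java. The `Node` type, `sameStructure` and `render` are hypothetical names introduced for this illustration. The stored artifact is the tree; indentation is only a per-developer rendering choice, so two developers with different formatting preferences still share one identical stored form.

```java
import java.util.Arrays;
import java.util.List;

public class FormattingFree {
    /** A minimal tree node; the stored form carries no whitespace or layout information. */
    static final class Node {
        final String label;
        final List<Node> children;

        Node(String label, Node... children) {
            this.label = label;
            this.children = Arrays.asList(children);
        }
    }

    /** Structural comparison: what a merge tool would compare instead of text lines. */
    static boolean sameStructure(Node a, Node b) {
        if (!a.label.equals(b.label) || a.children.size() != b.children.size()) {
            return false;
        }
        for (int i = 0; i < a.children.size(); i++) {
            if (!sameStructure(a.children.get(i), b.children.get(i))) {
                return false;
            }
        }
        return true;
    }

    /** Renders the tree as text using the viewer's own indentation unit. */
    static String render(Node node, String indentUnit) {
        StringBuilder sb = new StringBuilder();
        render(node, indentUnit, 0, sb);
        return sb.toString();
    }

    private static void render(Node node, String indentUnit, int depth, StringBuilder sb) {
        for (int i = 0; i < depth; i++) {
            sb.append(indentUnit);
        }
        sb.append(node.label).append('\n');
        for (Node child : node.children) {
            render(child, indentUnit, depth + 1, sb);
        }
    }

    public static void main(String[] args) {
        Node tree = new Node("if", new Node("condition"), new Node("then-block"));
        String spaces = render(tree, "  ");
        String tabs = render(tree, "\t");
        System.out.println(spaces.equals(tabs)); // prints false: the two views differ as text
        System.out.println(sameStructure(tree, tree)); // prints true: the stored form is the same
    }
}
```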


I think the first thing LDT should provide is a new storage metaphor based on ASTs (or ASGs). Of course, this representation issue does not solve all the problems LDT proposes to deal with, but it would be a great foundation for implementing new language support.

Regards,
Guillaume Pothier