[news.eclipse.technology.ldt] Re: Beyond textual represenations...

Hi Guillaume -

I agree with Chris Daly's comment on another post that this idea is out of 
scope for LDT.

But continuing the discussion...  Maybe I'm unimaginative, or maybe I'm 
really dense, but I still don't see why this would ever be useful or 
desirable.

You claim that the benefits of storing an AST on disk instead of source code 
are the following (correct me if I'm wrong):

1.  The machine would no longer have to parse the source to get an AST, 
since the AST would already exist on disk.  You claim this is useful because 
many logical views over the source code operate on an AST instead of source.

2.  There is a class of languages for which parsing can be problematic; you 
cite Maya as an example.

3.  Having a common substrate for all possible languages increases tool 
interoperability.

4.  Storing programs as ASTs "...would force us to rethink how to specify 
the means of interaction with the source code...", implying "...that it 
would be necessary to define 'languages designed to express languages and 
their IDE tooling'"...


Let me address each of these four points in turn.

Regarding 1,  who cares?  We already know how to parse source code into an 
AST.  Moreover, if you ever want to read the source code, you would need to 
translate the AST back into source, where, presumably, someone would edit 
the text, at which point you would need to parse to build the AST again.  So 
persisting an AST doesn't free you from the need to parse.
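
To make the round trip concrete, here is a minimal sketch in Python (3.9 or 
later, for ast.unparse).  It is only an illustration: Python's standard ast 
module stands in for whatever parser a language plugin would supply, and 
ast.dump stands in for a hypothetical on-disk AST format.  Neither is 
anything LDT has proposed.

```python
import ast

source = "total = price * quantity\n"

# Parse once and "persist" the tree. ast.dump is only a stand-in for a
# hypothetical serialized AST format.
tree = ast.parse(source)
persisted = ast.dump(tree)

# To show the program to a person, the tree must be turned back into text.
readable = ast.unparse(tree)

# The person edits that text...
edited = readable.replace("quantity", "count")

# ...and the only way back to a tree is to parse again.
new_tree = ast.parse(edited)
new_persisted = ast.dump(new_tree)

print(readable)
print(persisted != new_persisted)
```

The second ast.parse call is the point: as soon as the text is edited, the 
parser is needed again, whatever form was stored on disk.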

Regarding 2,  I'll admit that dynamically redefining grammars in specific 
contexts sounds like an interesting idea.  But what problem does it solve? 
How does Maya let one express something more elegantly or efficiently than 
vanilla Java?  Perhaps more to the point, it sounds like the new parsing 
techniques necessary to make Maya work have already been figured out, while 
it isn't at all clear that the user interface for tools operating directly 
on an AST is well-defined.  And if the user ever changes source code, you 
would need to parse to rebuild your AST.  So you still have to write a 
parser.

Regarding 3,  this common substrate is what the LDT is going to try to 
provide.  But the plan is to address it at the API level, not with a 
serialized on-disk format.

Regarding 4,  I think having a high-level language to express how tooling 
should be generated is a good and interesting idea.  However, I'm not clear 
on how persisting ASTs forces this to happen.

Now, let's consider what you would lose if you were to persist ASTs.  First, 
you would be forcing users either to use the Eclipse toolset or to 
understand the serialized AST format.  Anyone who wants to read the 
persisted format outside of Eclipse needs to understand it, and I don't 
think it is fair to ask people to do that.  So think about how frustrated 
people will be when they want to use another IDE, or emacs/vi/notepad.  
Also, what about diff tools?  Source control tools?  Command-line 
text-processing tools such as perl, wc, grep,...?  What about integrating 
projects into pre-existing build systems?
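
As a rough illustration of the grep case, the Python sketch below (3.9 or 
later) runs the same line-oriented search over a small source file and over 
a serialized form of its AST.  The serialized form is just the output of 
ast.dump, used as a stand-in for a hypothetical persisted format; a real 
format would look different, but a line-based tool would face the same kind 
of mismatch.

```python
import ast
import re

source = "def cost(price, quantity):\n    return price * quantity\n"

# Stand-in for a hypothetical on-disk AST format (not an LDT format).
serialized = ast.dump(ast.parse(source), indent=2)

# A grep-style search for the expression as a programmer would type it.
pattern = re.compile(r"price \* quantity")

hits_in_text = [line for line in source.splitlines() if pattern.search(line)]
hits_in_ast = [line for line in serialized.splitlines() if pattern.search(line)]

print(len(source.splitlines()), "lines of source,", len(hits_in_text), "match")
print(len(serialized.splitlines()), "lines of serialized AST,", len(hits_in_ast), "matches")
```

The search finds the expression in the text but not in the serialized tree, 
where the operands and the operator end up on separate lines.  Any tool that 
wants to search the persisted form has to be taught that format first.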

I guess the point is that storing programs as text is really useful and 
really *simple*.  This is probably an interesting research idea, but in my 
personal opinion, I don't think it is appropriate in the LDT.


Mike Kaufman
BEA Systems, Inc.



"Guillaume Pothier" <gpothier@xxxxxxx> wrote in message 
news:d1gkac$ppn$1@xxxxxxxxxxxxxxxxxx
> Mike Kaufman wrote:
>> Hi Guillaume -
>>
>> I think that the reason the primary representation of programs is text 
>> files is that they are simple and readable by humans.
>>
>> I also don't understand what benefit a binary storage form would have in 
>> making it easier to bring new languages online in Eclipse.
>
> I don't necessarily mean a binary storage form. XML is quite suitable to 
> represent graphs when accompanied by some linking dialects (like XLink). 
> It's human readable, and there are good databases and query languages for 
> XML.
> But let's forget XML (which still is a notation) and just say that I want 
> the basic substrate for a computational system's 'source code' to be an 
> Abstract Syntax Tree/Graph.
> Textual form is limiting in that it needs to be parsed (the compiler will 
> eventually need an AST anyway).




> Parsing becomes an obstacle when the languages become complex, and can 
> impose artificial limitations on what can be expressed in the language.
> For instance, the Maya language (http://www.cs.utah.edu/~jbaker/maya/) is 
> a kind of meta language: it is all of Java plus a means to dynamically and 
> selectively redefine the language. For instance, you can express that in a 
> given scope, the '->' character sequence is a lexical token, and assign a 
> meaning to that token. Or you can redefine the semantics of existing 
> syntactic constructs. The issue is that Maya's author had to invent new 
> parsing techniques to deal with that stuff, for things conceptually as 
> simple as matching braces in the context of a dynamically defined grammar. 
> And there are cases that are simply not feasible. This kind of problem is 
> absent in an AST: given an adequate authoring tool, the developer can 
> specify the bounds of a code 'subgraph'.
> Moreover, a tree or a graph is more the natural form of source code: this 
> is well illustrated by the extremely frequent usage of tools that 
> *simulate* graph views of textual source code: package explorer, class 
> hierarchy view, method overriding view...
>
> So, back to your point: what would be the benefits of a framework that 
> provides a substrate for graph-based source code when it comes to 
> implementing tooling for new languages?
> - All languages can be translated into an AST/G (well, I think so...) 
> Having a common substrate for all possible languages is certainly an 
> interesting feature when it comes to tools interoperability. Same 
> philosophy as in TPTP with the Common Base Event.
> - Basing the framework on AST/Gs instead of text would force us to rethink 
> how to specify the means of interaction with the source code (think 
> autocompletion, source generation like getters/setters...). This means 
> that it would be necessary to define "languages designed to express 
> languages and their IDE tooling", like Chris Daly mentioned. Of course it 
> means more work for ldt, but then much, much less for new language 
> implementors.
>
> As a side effect, I think that it would foster the development of 
> radically new languages, languages that are completely impractical to 
> develop now because of the lack of this infrastructure.
>
> Regards,
> Guillaume
>
>>
>> Mike Kaufman
>> BEA Systems, Inc.
>>
>>
>> "Guillaume Pothier" <gpothier@xxxxxxxxx> wrote in message 
>> news:d1fm9d$93j$1@xxxxxxxxxxxxxxxxxx
>>
>>>I am really glad that the LDT project has been created.
>>>Many of the posts to this newsgroup discuss ASTs and parsers, assuming 
>>>that a programming language's primary representation will always be its 
>>>textual form.
>>>
>>>As far as I am concerned, I anticipate the day when the primary form of a 
>>>programming language is its AST, or even better, Abstract Syntax Graph. 
>>>Given the currently available computing power of the machines used by 
>>>developers today, I think it is no luxury to start thinking of replacing 
>>>the traditional bunch of source files stored in a hierarchical filesystem 
>>>by a graph of objects backed by some database system.
>>>
>>>There are a few obvious benefits to doing so, even in the simple case of 
>>>Java-only development:
>>>- No more source code formatting hell and obnoxious merge conflicts 
>>>caused by formatting differences between developers.
>>>- Elimination of the parsing step from the compilation
>>>
>>>I think the first thing LDT should provide is a new storage metaphor 
>>>based on ASTs (or ASGs). Of course, this representation issue does not 
>>>solve all the problems LDT proposes to deal with, but it would be a great 
>>>foundation for implementing new language support.
>>>
>>>Regards,
>>>Guillaume Pothier
>>
>>