Bug 225994

Summary:	Add an extension point and support for contributing parsers for specific partitions
Product:	[WebTools] WTP Source Editing	Reporter:	David Carver <d_a_carver>
Component:	wst.sse	Assignee:	wst.sse <wst.sse-inbox>
Status:	NEW ---	QA Contact:	Nick Sandonato <nsand.dev>
Severity:	enhancement
Priority:	P3	CC:	david_williams, gregory.amerson, jin.phd, raghunathan.srinivasan, thatnitind, zulus
Version:	3.0	Keywords:	helpwanted, investigate
Target Milestone:	Future
Hardware:	PC
OS:	All
Whiteboard:

Description David Carver

2008-04-07 12:44:30 EDT

3.0M6
Eclipse 3.4M6

Currently, the way the Structured Source Editor seems to be set up, the tokenizers need to know the complete language in order to handle and edit the document and to provide the necessary region identifiers. It would be nice if we could generalize this through an extension point for SSE that would indicate, by content type, which partitions should get which parsers, so that the regions can be contributed by those specific parsers.
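
To make the proposal concrete, here is a minimal sketch of the lookup such an extension point might feed, assuming each contribution names a content type id, a partition type, and a parser. PartitionParser, PartitionParserRegistry, and the ids used with them are hypothetical; nothing here is existing SSE API.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical contract for a contributed parser; not an SSE interface. */
interface PartitionParser {
    /** Returns the region type identifiers found in one partition's text. */
    List<String> regionTypes(String partitionText);
}

/**
 * Hypothetical registry that an extension point could populate:
 * content type id -> partition type -> contributed parser.
 */
final class PartitionParserRegistry {
    private final Map<String, Map<String, PartitionParser>> byContentType = new HashMap<>();

    void register(String contentTypeId, String partitionType, PartitionParser parser) {
        byContentType
                .computeIfAbsent(contentTypeId, id -> new HashMap<>())
                .put(partitionType, parser);
    }

    /** Finds the parser contributed for this partition type of this content type, if any. */
    Optional<PartitionParser> lookup(String contentTypeId, String partitionType) {
        Map<String, PartitionParser> byPartition = byContentType.get(contentTypeId);
        return byPartition == null
                ? Optional.empty()
                : Optional.ofNullable(byPartition.get(partitionType));
    }
}
```

An editor would then ask the registry for each partition instead of hard-coding every sub-language into its own tokenizer.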

Let's take an example editor, like the HTML editor. Unless I'm totally missing how this works (and that could be the case), the HTML editor must understand the HTML, CSS, and Script partitions and have a parser internally defined to handle these partitions and generate the appropriate region/node information. However, this ties the implementation directly to that particular editor.

What if special handling were needed for a new microformat that has no grammar, and particular regions within that microformat needed special handling of their own? It's not uncommon nowadays for an HTML file to contain a mixture of HTML, microformats, CSS, scripts, etc.

The XML editor is another example. We have grammar-based content assistance contributed through the XML catalog and DTDs, but for XML grammars like XInclude, the grammar covers only a portion of the functionality. Specific functionality and parsing for XPointer or XPath needs to be provided as well: the XPath expression has to be parsed on its own, just as a script tag in HTML that contains JavaScript needs special handling.

The idea here would be to have parsing take place based on content type or a user-specified class. This would allow an editor like an XQuery editor, whose documents contain XQuery syntax but also need XML support, to provide functionality for both, and an adopter could potentially add parser-specific functionality and content assistance as well.

If this functionality is already there, it doesn't seem to be documented in a clear enough fashion. Maybe an article needs to be written for Eclipse Corner that shows how to do it with the existing API and extension points?

I don't expect this for 3.0M7 or 3.0, but it would make my life much simpler in XSL Tooling.

Comment 1 Nitin Dahyabhai

2008-04-07 17:33:20 EDT

Actually, it works the other way around for us.  The tokenizers parse through
the source exactly once and everything else is built on top of their output.
The partitions created by the partitioner aren't part of an in-memory model that
stays updated; they're instead created on-the-fly based on the text region
information (this causes the results from
StructuredTextPartitioner*.getPartition(int) to not fully comply with the
contract, a problem for which we don't yet have a low-cost solution).
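
A minimal sketch of that arrangement, assuming a simplified region list: the partitioner stores no partitions of its own and answers each query by finding the tokenizer region that covers the offset and mapping that region's type to a partition type. TextRegion and OnTheFlyPartitioner are hypothetical stand-ins, not the StructuredTextPartitioner classes, and the linear scan is only for brevity.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for one region produced by the tokenizer. */
final class TextRegion {
    final int start;
    final int length;
    final String type;

    TextRegion(int start, int length, String type) {
        this.start = start;
        this.length = length;
        this.type = type;
    }

    boolean covers(int offset) {
        return offset >= start && offset < start + length;
    }
}

/** Keeps no partition model; every answer is derived from the regions on demand. */
final class OnTheFlyPartitioner {
    private final List<TextRegion> regions;              // tokenizer output, document order
    private final Map<String, String> partitionByRegion; // region type -> partition type
    private final String defaultPartition;

    OnTheFlyPartitioner(List<TextRegion> regions,
                        Map<String, String> partitionByRegion,
                        String defaultPartition) {
        this.regions = regions;
        this.partitionByRegion = partitionByRegion;
        this.defaultPartition = defaultPartition;
    }

    /** Looks up the covering region and maps its type; nothing is cached between calls. */
    String getPartitionType(int offset) {
        for (TextRegion region : regions) {
            if (region.covers(offset)) {
                return partitionByRegion.getOrDefault(region.type, defaultPartition);
            }
        }
        return defaultPartition;
    }
}
```

Because nothing is stored, there is no partition model to keep in sync while editing, but each query pays for a region lookup.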

A partitioner along these lines would still need information about what
constitutes an edge between two different partition types, and would need an
appropriate state table to transition between them at the right
times, essentially duplicating what we have already done with the source
parser.  And then you'd have to solve the problems that come with
incrementally reparsing that document as it's edited.
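
To illustrate the duplication being described, here is a deliberately tiny state table, assuming only one kind of edge (an HTML script element) and ignoring attribute values that contain '>', case differences, and incremental reparsing. EdgeScanner is hypothetical; the real source parser handles far more.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical two-state table that only knows about <script> edges. */
final class EdgeScanner {
    enum State { MARKUP, SCRIPT }

    /** A contiguous run of text [start, end) scanned in one state. */
    static final class Span {
        final int start;
        final int end;
        final State state;

        Span(int start, int end, State state) {
            this.start = start;
            this.end = end;
            this.state = state;
        }
    }

    static List<Span> scan(String text) {
        List<Span> spans = new ArrayList<>();
        State state = State.MARKUP;
        int spanStart = 0;
        int i = 0;
        while (i < text.length()) {
            if (state == State.MARKUP && text.startsWith("<script", i)) {
                int close = text.indexOf('>', i);
                if (close < 0) {
                    break; // unterminated start tag: the rest stays markup
                }
                i = close + 1;                         // the start tag itself is markup
                addIfNonEmpty(spans, spanStart, i, State.MARKUP);
                spanStart = i;
                state = State.SCRIPT;                  // edge: markup -> script
            } else if (state == State.SCRIPT && text.startsWith("</script", i)) {
                addIfNonEmpty(spans, spanStart, i, State.SCRIPT);
                spanStart = i;
                state = State.MARKUP;                  // edge: script -> markup
                i += "</script".length();
            } else {
                i++;
            }
        }
        addIfNonEmpty(spans, spanStart, text.length(), state);
        return spans;
    }

    private static void addIfNonEmpty(List<Span> spans, int start, int end, State state) {
        if (end > start) {
            spans.add(new Span(start, end, state));
        }
    }
}
```

Every additional edge type needs its own states and transitions, and after each edit the scan would have to restart from some safe point, which is where the incremental reparsing problem comes in.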

For script tags, the HTML partitioner is smart enough to recognize the script
tag, read its type and language values, and generate the partition's type based
on those values.  You can see the XML partitioner doing something similar in
StructuredTextPartitionerForXML.getPartitionType(ITextRegion, int) so that the
partition type of a Processing Instruction's content varies with the specified
target.
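
A minimal sketch of choosing a partition type from a script tag's attributes, assuming the type attribute wins over language and that a missing value means JavaScript. The partition type strings and ScriptPartitionTypes are hypothetical and are not the ids the HTML partitioner actually uses.

```java
import java.util.Locale;
import java.util.Map;

/** Hypothetical mapping from a script tag's attributes to a partition type. */
final class ScriptPartitionTypes {
    // Hypothetical ids; not the partition types WTP actually defines.
    static final String JAVASCRIPT = "example.SCRIPT.javascript";
    static final String SCRIPT_PREFIX = "example.SCRIPT.";

    /** Attribute names are assumed to be lower-cased already. */
    static String forScriptTag(Map<String, String> attributes) {
        String value = attributes.get("type");
        if (value == null || value.trim().isEmpty()) {
            value = attributes.get("language");        // older attribute, consulted second
        }
        if (value == null || value.trim().isEmpty()) {
            return JAVASCRIPT;                          // no declaration: assume JavaScript
        }
        String normalized = value.trim().toLowerCase(Locale.ROOT);
        switch (normalized) {
            case "text/javascript":
            case "application/javascript":
            case "javascript":
                return JAVASCRIPT;
            default:
                return SCRIPT_PREFIX + normalized;      // one partition type per declared language
        }
    }
}
```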

I suppose we could try something like allowing the region factory to call
another tokenizer (or whatever) on the text contained by regions of specific
contexts and return different implementation classes when needed.  This skips
over the more complicated issues with optimizing the reparsing.  An example
would be taking the text of an attribute and running it through another
(generated) parser to detect XPath expressions, and when one was found, return
a subclass of AttributeValueRegion encapsulating more information than normal
(even if it's just a boolean saying "hey! there's an expression here!").  A
partitioner could then make use of this information.  The interaction between
the tokenizer and region factory amounts to one line of code, so it would be
one place to start.
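
A sketch of that region factory idea, assuming hypothetical stand-ins AttrValueRegion and ExpressionAttrValueRegion in place of SSE's AttributeValueRegion and the proposed subclass. Instead of a generated XPath parser, a crude hand-written check for XSLT attribute value templates ("{...}", with "{{" as an escaped brace, and no handling of '}' inside quoted strings) supplies the "there's an expression here" flag.

```java
/** Hypothetical stand-in for SSE's AttributeValueRegion. */
class AttrValueRegion {
    final int start;
    final String text;

    AttrValueRegion(int start, String text) {
        this.start = start;
        this.text = text;
    }

    /** Plain attribute values carry no extra information. */
    boolean containsExpression() {
        return false;
    }
}

/** Hypothetical subclass returned when a nested expression was detected. */
final class ExpressionAttrValueRegion extends AttrValueRegion {
    ExpressionAttrValueRegion(int start, String text) {
        super(start, text);
    }

    @Override
    boolean containsExpression() {
        return true; // "hey! there's an expression here!"
    }
}

/** Hypothetical region factory that runs a second, tiny scanner over attribute text. */
final class RegionFactory {
    static AttrValueRegion createAttributeValue(int start, String text) {
        return containsValueTemplate(text)
                ? new ExpressionAttrValueRegion(start, text)
                : new AttrValueRegion(start, text);
    }

    /** Crude check for an attribute value template: a non-empty {...}, where "{{" is an escaped brace. */
    static boolean containsValueTemplate(String text) {
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) != '{') {
                continue;
            }
            if (i + 1 < text.length() && text.charAt(i + 1) == '{') {
                i++; // skip the escaped "{{"
                continue;
            }
            int close = text.indexOf('}', i + 1);
            if (close > i + 1) {
                return true;
            }
        }
        return false;
    }
}
```

A partitioner could then check containsExpression() on the region it is handed, without the tokenizer itself knowing anything about XPath.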

You know, in theory.

Comment 2 David Carver

2008-04-07 17:51:46 EDT

Yeah...unfortunately, I learned this weekend that it was parsing..regions...partitions... and that the parsing was hard coded to a particular editor. It gets even trickier with XML parsing, in that you may have multiple namespaces, each of which may need a parser that handles specific content beyond what the base XML parser provides. XSL's XPath is one example; XInclude and XPointer are another.

For XSL, there are only three types of attributes I want the XSL partition to appear in, but currently, when the partitions are set, I haven't found a way to get the namespace that the particular XML tag or attribute resides in. Consider that for XSL it might also be nice to have CSS and script editing support based on the content type of a particular region. If it could just reuse portions of the existing editors through some sort of extension point, an adopter could get a very powerful and feature-rich editor with little extra work.
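
A minimal sketch of the namespace lookup being asked for, assuming the partitioner can see the element's qualified name and the in-scope prefix bindings. The XSLT namespace URI is the real one; the choice of select, test, and match as the three attributes, the partition ids, and XslPartitionResolver are assumptions for illustration.

```java
import java.util.Map;
import java.util.Set;

/** Hypothetical namespace-aware choice of partition type for an attribute value. */
final class XslPartitionResolver {
    static final String XSLT_NS = "http://www.w3.org/1999/XSL/Transform";
    // Assumption: these stand in for the three XPath-bearing attributes.
    static final Set<String> XPATH_ATTRIBUTES = Set.of("select", "test", "match");
    // Hypothetical partition type ids.
    static final String XPATH_PARTITION = "example.XPATH";
    static final String DEFAULT_ATTR_PARTITION = "example.XML_ATTRIBUTE_VALUE";

    /** Resolves a qualified element name's namespace; "" is the default-namespace prefix. */
    static String namespaceOf(String qName, Map<String, String> inScope) {
        int colon = qName.indexOf(':');
        String prefix = colon < 0 ? "" : qName.substring(0, colon);
        return inScope.get(prefix); // null when the prefix is unbound
    }

    /** Unprefixed attributes are interpreted by their element, so the element's namespace decides. */
    static String partitionForAttributeValue(String elementQName, String attributeName,
                                             Map<String, String> inScope) {
        boolean xslElement = XSLT_NS.equals(namespaceOf(elementQName, inScope));
        boolean unprefixed = attributeName.indexOf(':') < 0;
        if (xslElement && unprefixed && XPATH_ATTRIBUTES.contains(attributeName)) {
            return XPATH_PARTITION;
        }
        return DEFAULT_ATTR_PARTITION;
    }
}
```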

Comment 3 Nitin Dahyabhai

2011-06-07 16:46:51 EDT

Just for those stumbling across this bug...

(In reply to comment #2)
> Yeah...unfortunately, I learned this weekend that it was
> parsing..regions...partitions... and that the parsing was hard coded to a
> particular editor.

It's actually tied to an org.eclipse.core.contenttype.contentTypes extension, as driven by the model loader that's associated with it.