Bug 42253 - [plan][dom/ast] Make AST more robust against syntax errors
Summary: [plan][dom/ast] Make AST more robust against syntax errors
Status: VERIFIED FIXED
Alias: None
Product: JDT
Classification: Eclipse Project
Component: Core (show other bugs)
Version: 3.0   Edit
Hardware: PC Windows XP
: P3 enhancement (vote)
Target Milestone: 3.2 M5   Edit
Assignee: David Audel CLA
QA Contact:
URL:
Whiteboard:
Keywords:
: 44983 97820 110322 120870 (view as bug list)
Depends on:
Blocks: 44964 53243 93141
  Show dependency tree
 
Reported: 2003-08-29 10:10 EDT by Martin Aeschlimann CLA
Modified: 2006-02-15 12:04 EST (History)
11 users (show)

See Also:


Attachments
statements recovery (199.31 KB, patch)
2005-12-16 12:15 EST, David Audel CLA
no flags Details | Diff
Released code (122.75 KB, patch)
2006-01-17 13:44 EST, David Audel CLA
no flags Details | Diff
Released code (50.04 KB, patch)
2006-02-08 07:50 EST, David Audel CLA
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Martin Aeschlimann CLA 2003-08-29 10:10:38 EDT
20030829

Altough the AST is designed to only guarantee completness on compilable source
Quick fixes and other code generation functionality gain a lot when the AST is
robust against some smaller errors, similar to code assist.

The following cases would nice if handled better:

	private void foo(String c) {
		fo(null,);
		c.toString();
	}
The comma in the method invocation blocks the generation of all other statements.

	private void foo(String c) {
		fo(null)
		c.toString();
	}
The missing semicolon also prevents all other statements to be in the AST.
Comment 1 Olivier Thomann CLA 2003-09-02 12:34:40 EDT
The parser jumps over methods that contain syntax error in order to be able to
generate statements for other methods. We don't have any recovery at the
statement level for now.
Comment 2 Olivier Thomann CLA 2003-10-16 07:53:39 EDT
*** Bug 44983 has been marked as a duplicate of this bug. ***
Comment 3 Philipe Mulet CLA 2004-05-13 07:53:47 EDT
Post 3.0
Comment 4 Philipe Mulet CLA 2004-07-22 12:06:13 EDT
reopening
Comment 5 Philipe Mulet CLA 2004-07-22 12:07:18 EDT
*** Bug 53243 has been marked as a duplicate of this bug. ***
Comment 6 Philipe Mulet CLA 2004-11-09 12:25:06 EST
This is an optional item for M4.
We should investigate some statement recovery.
Comment 7 Philipe Mulet CLA 2005-03-17 03:39:14 EST
Deferring.
Comment 8 Philipe Mulet CLA 2005-03-17 03:40:02 EST
David - pls document where you ended up and attach patch.
Comment 9 Philipe Mulet CLA 2005-04-07 08:01:43 EDT
Deferring post 3.1
Comment 10 David Audel CLA 2005-05-02 09:06:54 EDT
*** Bug 93141 has been marked as a duplicate of this bug. ***
Comment 11 David Audel CLA 2005-06-01 09:01:05 EDT
*** Bug 97820 has been marked as a duplicate of this bug. ***
Comment 12 Philipe Mulet CLA 2005-09-22 10:45:29 EDT
Reopening for consideration within 3.2 cycle.
Comment 13 Dirk Baeumer CLA 2005-09-22 10:54:46 EDT
JDT/UI scenarios:

- statement recovery
  
  foo() ==> foo();

- method invocation recovery

  foo(10, ==> foo(10) or foo(10, null, ...) where null is the default value for
  the missing parameters.

  Having this recovery allows selecting 10 and extract it to a local or method
  without completing the invocation.

- anonymous class recovery

Runnable runnable= new Runnable() {
   ru<Ctrl+Space>
}

  should still return node and binding for anonymous class
Comment 14 Dirk Baeumer CLA 2005-09-22 10:54:47 EDT
*** Bug 110322 has been marked as a duplicate of this bug. ***
Comment 15 Markus Keller CLA 2005-09-28 11:55:18 EDT
Other cases where I often miss an AST:

- Expression that is not a valid ExpressionStatement, e.g.:
            array[1];
            "Hello" + "World";
            fField;
Especially for the first two examples, it would be nice to have an AST in order
to apply the "Assign to new local variable" quick fix.

- missing conditions:
        if () { // -> recover to "if (false)"
            System.out.println("hello");
        }
        for() { -> recover to "for(;;)"
            System.out.println(i + " is " + array[i]);
        }
Comment 16 Martin Aeschlimann CLA 2005-11-23 09:15:50 EST
A scenario that I often have:
I started typing something, but realize that I have to do something else e.g. a line above. The code fragment (here 'file.get') is still around and prevents that errors are shown on the line above (getCanonicalFile would return a 'File') and and that I can use a quick fix that would fix the variable type.

As a result I first have to fix the fragment (remove it or complete it) before I can use the quick fix.

Here it would make sense to hava heuristic that treat 'file.get' as a complete statement, for example because it is on its own line.

public void foo() {
  String file= new File("x.txt").getCanonicalFile();
  file.get

  Map map= new HashMap();
  return map;
}
Comment 17 Ismael Juma CLA 2005-11-23 09:32:28 EST
The scenario on comment #16 is also very common for me.
Comment 18 David Audel CLA 2005-12-16 12:15:10 EST
Created attachment 31884 [details]
statements recovery

I have an implementation of statements recovery. This implementation isn't complete and i must test it more deeply to be sure of his reliability.



The current steps to parse a file are
1 - parse top level structure (diet parse)
2 - if there was a syntax error perform a recovery parse of the top level structure
3 - for each methods/initiliazers body (method parse):
    A - parse statements of the method/initiliazer
    B - if there was a syntax error in the method/initiliazer stop to parse this method

The new steps to parse a file are
1 - parse top level structure (diet parse)
2 - if there is a syntax error perform a recovery parse of the top level structure
3 - for each methods/initiliazers body (method parse):
    A - parse statements of the method/initiliazer
    B - if there was a syntax error in the method/initiliazer perform a recovery parse similar to the top level structure recovery parse
    C - if there was a recovery parse perform a statments recovery parse

The new steps are B and C.
B recover only local types and members of local types.
C recover only statements and not local types and members of local types.
This two step are seperate to be able to keep the same type structure as the diet parse

B use the same mechanism as our current recovery
C use the result of the syntax diagnosis to add or remove suggested tokens.
The suggested non terminals symbols are replaced by a predefined set of terminal tokens (eg Block is replaced by '{' '}').
Comment 19 David Audel CLA 2006-01-17 13:43:49 EST
I released the statement recovery algorythm in HEAD. Currently this new behavior is disabled, so the option JavaCore.COMPILER_STATEMENTS_RECOVERY must be set to ENABLED to use it (It's a temporary option).

As said in comment 18, the statement recovery use the result of the syntax error diagnosis and apply the suggestion of the diagnosis.
In the released code all the diagnosis are not applied because the mapping between suggested non-terminal symbols and a list of terminal symbols is not complete.

I tested performances of the Parser. It is near 0% slower than before when there is no syntax error. And it is at worst 300% slower when there is a lot of errors and statements.
it is 300ms to parse a type with one method which contain 20000 simple statements and 100 syntax errors instead of 100ms before.
Comment 20 David Audel CLA 2006-01-17 13:44:19 EST
Created attachment 33154 [details]
Released code
Comment 21 David Audel CLA 2006-01-18 07:19:27 EST
I added a new API for DOM ASTNode flag to identify recovered nodes: ASTNode.RECOVERED
It is a flag constant indicating that a node or a part of this node is recovered from source that contains a syntax error detected in the vicinity.

class X extends Y{
  void foo() {
    bar()
  }
}

In this example the following nodes will be flag as RECOVERED
MethodInvocation:bar
MethodDeclaration:foo
TypeDeclaration:X
CompilationUnit
Comment 22 Martin Aeschlimann CLA 2006-01-18 08:12:26 EST
In the ASTRewrite we need to know which nodes are recovered, as rewriting these nodes might be special.
I would therefore prefer the flags only on the really recovered nodes. To ask a parent if it cotains some recovered node is interesting, but it should be possible to know if its the node that is recovered, or just one of the children.

It also seems to me that we have to be careful not to break clients that assume that every node in the AST is complete. What about having a flag in the ASTParser to enable recovered nodes? That's also important performance wise. Some clients are maybe only interested in correct ASTs, and as they would never use a recovered AST, they don't want to pay any performance penalty.
Comment 23 David Audel CLA 2006-01-18 10:42:29 EST
reply to comment 22

The reason why i flag as RECOVERED a node and his enclosing nodes is because when a a part of source code is syntaxically incorrect all enclosing code parts are also syntaxically incorrect. So if a part of a node is recovered, the node itself is recovered.
But i understand your need to recognize the deeper recovered part of a node. I investigate how to improve this flag.
I could:
   - change nothing =)
or - add a different flag for enclosing nodes and "really" recovered node.
or - flag only the "really" recovered node and let the client visit the DOM AST to recognize enclosing recovered nodes.


Client already have to manipulate 'incorrect' node because we give an ast even if there is a syntax error in a type declaration or a method declaration. By 'incorrect' i mean a node that does not really represent the content of the source file. Another point is that all nodes created by statement recovery are complete but they can have an empty string as name. And your are right, performance could be a problem.
Is your suggestion to add ASTParser.setStatementRecovery(boolean enabled) and disable statement recovery by default? if i do that clients won't have recovered statements for free but i am agree that we will avoid to potentially harm a client.
Comment 24 David Audel CLA 2006-01-20 07:09:43 EST
I added the API method ASTParser#setStatementsRecovery(boolean enabled) to enable statements recovery in ASTParser. It's disabled by default.
Comment 25 David Audel CLA 2006-01-23 05:07:38 EST
I added the API method ICompilationUnit#reconcile(int astLevel, boolean forceProblemDetection, boolean enableStatementsRecovery, WorkingCopyOwner owner, IProgressMonitor monitor) to enable/disable statements recovery during reconcile.
Comment 26 Dani Megert CLA 2006-01-23 05:15:40 EST
see also bug 124662.
Comment 27 David Audel CLA 2006-02-08 07:50:08 EST
I changed the way to flag nodes as RECOVERED. 

Now only nodes which contains added/removed/replaced tokens are flagged. A parent of these kind of node isn't flagged.

eg.

class X extends Y{
  void foo() {
    bar() // thw missing part is ";"
  }
}

Only the ExpressionStatement node is flagged.

Sometimes our heuristic can't recognize the really recovered node, so in this case all potentially recovered nodes are flagged.

eg.

class X extends Y{
  void foo() {
    for(int i // the missing part is ";;);"
  }
}

Only the last EmptyStatement and the ForStatement should be flagged but our heuristic fail to find them so Block, ForStatement, VariableDeclarationStatement, VariableDeclarationFragment and EmptyStatement are flagged. I will try to improve our heuristic.

This behavior is caused by the fact that only our Scanner know what tokens are recovered not our Parser. Make our Parser aware of these recovered tokens would be too heavy (in term of changes and perfomance).
Comment 28 David Audel CLA 2006-02-08 07:50:48 EST
Created attachment 34338 [details]
Released code
Comment 29 David Audel CLA 2006-02-08 09:15:53 EST
I think we can consider this bug as fixed.
If problems are found, they should be write in their own bug report.

Close as FIXED
Comment 30 David Audel CLA 2006-02-08 11:27:39 EST
*** Bug 120870 has been marked as a duplicate of this bug. ***
Comment 31 Frederic Fusier CLA 2006-02-15 12:04:09 EST
Verified for 3.2 M5 using build I20060215-0010.

I've reassigned 3 JDT/UI bugs as they were wrongly set as duplicate of this one although they were depending on:
  - bug 44964
  - bug 53243
  - bug 93141

I've also noticed a sample where robust AST could be improved:
  - comment 15 - missing conditions:
    the ForStatement is empty although it should not