Bug 509519 - [PERFO] Seems like Tree content creation can get into a n^n (or similar) complexity in some cases
Summary: [PERFO] Seems like Tree content creation can get into a n^n (or similar) comp...
Status: REOPENED
Alias: None
Product: Sirius
Classification: Modeling
Component: Tree (show other bugs)
Version: 4.1.0   Edit
Hardware: PC Linux
: P3 normal (vote)
Target Milestone: Next   Edit
Assignee: Pierre-Charles David CLA
QA Contact: Florian Barbin CLA
URL:
Whiteboard: backport
Keywords: performance, triaged
Depends on: 530557
Blocks: 510519 510507 510508 510520
  Show dependency tree
 
Reported: 2016-12-20 06:15 EST by Cedric Brun CLA
Modified: 2018-09-07 10:38 EDT (History)
2 users (show)

See Also:


Attachments
projects to import to reproduce the problem (9.55 KB, application/zip)
2016-12-20 06:15 EST, Cedric Brun CLA
no flags Details
Updated sample project to reproduce the problem (13.09 KB, application/zip)
2018-02-09 10:54 EST, Pierre-Charles David CLA
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Cedric Brun CLA 2016-12-20 06:15:30 EST
Created attachment 265972 [details]
projects to import to reproduce the problem

- Import the attached projects (there is a Viewpoint Specification Projet and a Modeling project with a single empty Ecore)
- Make sure the "Perfo Tree" viewpoint is enabled and create a new Tree representation on the EPackage
- Right click in the tree on "Create Many EClasses"
-> the code which is launched is in perfo.tree.design.Services.createMany(EPackage) and basically create 20K new EClasses each having 10 attributes, this should end up being a tree with 20K*10=>200K tree items.
-> the UI thread is blocked and CPU goes crazy for a fairly long time, I gave up after 5 minutes

No doubt there is n^n algorithm working its magic here. 

First hint: after a bit of digging, commenting out: 

willBeExpandedOnSelection(RefreshPlan) to always return false makes the whole tree to be created in about 10 seconds
Comment 1 Pierre-Charles David CLA 2016-12-30 09:38:58 EST
I've started to analyze this further. Results so far:
* CreatedTreeItem.willBeExpandedOnSelection() will invoke InternalTransaction.getChangeDescription().getObjectsToDetach() once for every CreatedTreeItem, even though the result is global to the whole transaction and its computation known to be relatively costly (see https://bugs.eclipse.org/bugs/show_bug.cgi?id=460206#c4). Its not a complex algorithm by itself, but it re-creates copies of the relevant collections on each invocation, so thousands of invocations which each will copy thousands of items start to accumulate. I have a patch to cache this result at the level of the refresh's GlobalContext so it is computed only once.
* TreeUIUpdater.analyseExpansionStateOfCreatedChildren() will be invoked on every newly created element, and is recursive. I'm not sure *why* I saw it appear as much in the example, which is not very deep in terms of containment, but it looks like we can easily avoid some uneeded recursive calls on already handled elements by using the result of toExpands.add(dTreeItem). I'll post the patch too, but this one is more risky and would require more analysis to make sure this does not change the behavior.

With the two patches above, and changing the action to create "only" 5000 classes with 5 attritubes, I was able to get a YourKit trace in reasonable time (around 1min I believe). The overall picture, focusing on the CreateToolItemAction.run() method:
* 58% of the total time is spent in pre-commit. As expected ~99% of this is in DTreeRefresh.refresh(). The good news is that most of the time (~80%) is spent in expression evaluation, probably the "aql:self.eAllContents()->select(c | c.abstract)" semanticCandidateExpression. 1) it's out of scope here, and 2) using eAllContents() instead of eClassifiers creates a lot of churn building a collections with EClasses and their EAttributes, even though we don't care at all about EAttributes here.
* 37% of the total time is spent in post-commit. Most of it is spent in SelectDRepresentationElementsListener.resourceSetChanged(ResourceSetChangeEvent). About 2/3 of that is spent in EcoreUtil.isAncestor() called from isViewWithNewSemanticTarget(). I'll have to analyze further if/how this can be optimized and report here my findings. Note that this part is not tree-specific, so any improvement here will benefit other dialects.
Comment 2 Pierre-Charles David CLA 2016-12-30 10:04:32 EST
(In reply to Pierre-Charles David from comment #1)
> * 58% of the total time is spent in pre-commit. As expected ~99% of this is
> in DTreeRefresh.refresh(). The good news is that most of the time (~80%) is
> spent in expression evaluation, probably the
> "aql:self.eAllContents()->select(c | c.abstract)"
> semanticCandidateExpression. 1) it's out of scope here, and 2) using
> eAllContents() instead of eClassifiers creates a lot of churn building a
> collections with EClasses and their EAttributes, even though we don't care
> at all about EAttributes here.

Just to confirm this hypothesis: on a scaled-down version of the example (5K EClasses × 5 EAttributes):
* aql:self.eAllContents()->select(c | c.abstract): 2 943ms
* aql:self.eClassifiers->select(c | c.abstract): 4ms

I'll be using the optimized version for the rest of the analysis to avoid polluting the measures by unrelated overheads.
Comment 3 Eclipse Genie CLA 2017-01-02 05:50:39 EST
New Gerrit change created: https://git.eclipse.org/r/87865
Comment 4 Eclipse Genie CLA 2017-01-02 05:50:41 EST
New Gerrit change created: https://git.eclipse.org/r/87864
Comment 5 Eclipse Genie CLA 2017-01-02 05:50:43 EST
New Gerrit change created: https://git.eclipse.org/r/87866
Comment 6 Pierre-Charles David CLA 2017-01-02 05:58:14 EST
I've pushed 3 separate patches on Gerrit. Here are the performance improvements of each.

The baseline is current master (0448c14e4), using the attached sample project with 2 modifications:
* Semantic Candidate Expression changed to: aql:self.eClassifiers->select(c | c.abstract)
* Model size scaled down to 5k EClass with 5 EAttributes each.

Using this, current master takes ~110s (1min 50s), measured manually from the time of the action invocation to the time where all the new elements are visible in the tree *and* the new root elements (the EClasses) are selected.

* With https://git.eclipse.org/r/#/c/87864/ "Make sure getObjectsToDetach() is called only once": 38s
* With https://git.eclipse.org/r/#/c/87865/ "Avoid recursive calls for already handled elements in TreeUIUpdater": 31s
* With https://git.eclipse.org/r/#/c/87866/ "Invert costly test to improve performance in common cases": 26s

So overall a speedup of 75% in this case. 26s in the last case is still a lot. I'll make a new YourKit analysis of where time goes once all patches are applied.
Comment 7 Pierre-Charles David CLA 2017-01-02 06:00:26 EST
I'm not sure if this has any significant impact, but note that the measure above were made with "Link with Editor" disabled in the "Model Explorer" view.
Comment 8 Pierre-Charles David CLA 2017-01-02 07:39:39 EST
With all the optimizations mentioned above enabled, the CreateToolImtemAction.run() time is spent like this:
* 39% in the actual TreeDialectServices.refresh(), which happens in pre-commit. It looks like the way we handle refreshStyle() could be optimized for newly created elements: we first create and attach a blank style, and then refresh it. Because it's blank, the refresh will touch many of its features, and because it's already attached to the model, we pay the cost of the ChangeRecorder recording all of these changes. Not sure how much this would help overall.
* 27% in SelectDRepresentationElementsListener.analyseNotifications(), most of which (18% of the total CreateToolImtemAction.run() time) is spent in EcoreUtil.isAncestor().

However CreateToolImtemAction.run() now only takes 26% of the total time of the operation (I'm ignoring the time spent by the UI in Thread.sleep()).
* 24% is spent in AbstractTreeViewer.creatChildren(). This is JFace code called by us from TreeUIIpdaterRunnable.run() and there's probably not much we can do to make this faster.
* 15% is spent in JFace code for the Model Content view (calls end up in SiriusCommonContentProvider).

Calls to EcoreUtil.isAncestor() in SelectDRepresentationElementsListener.analyseNotifications() are now the single largest time wasters that we have some control on, but they only represent 6% of the total time of the operation. The picture may be different on other OSes where SWT/JFace are faster (I'm under Linux), but in the end creating dozens of thousands of SWT tree items will probably always take significant time. Going significantly faster would probably mean not creating/expanding/selecting the tree items corresponding to the newly created elements.
Comment 10 Pierre-Charles David CLA 2017-05-21 14:50:23 EDT
Closing, even though not all candidate patches have been merged. The one that has been already significantly improves the situation.
Comment 11 Florian Barbin CLA 2017-05-22 09:01:34 EDT
The initial scenario is still KO with the 5.0.0.201705191401 Sirius version.
After having performed the "Create Many EClasses" action, the UI Thread is blocked. I waited more than 5 minutes and I finally killed the process.
Comment 12 Pierre-Charles David CLA 2017-08-31 10:00:53 EDT
https://bugs.eclipse.org/bugs/show_bug.cgi?id=493241 has a patch that may be interesting for this. Or maybe not; I'm linking it for reference.
Comment 13 Pierre-Charles David CLA 2018-02-09 10:54:02 EST
Created attachment 272613 [details]
Updated sample project to reproduce the problem