Bug 456324 - Reduce the scope of the cross-references maintained by Sirius
Summary: Reduce the scope of the cross-references maintained by Sirius
Status: NEW
Alias: None
Product: Sirius
Classification: Modeling
Component: Core (show other bugs)
Version: 1.0.0   Edit
Hardware: All All
: P2 normal (vote)
Target Milestone: ---   Edit
Assignee: Project inbox CLA
QA Contact:
URL:
Whiteboard:
Keywords: triaged
Depends on:
Blocks: 456350
  Show dependency tree
 
Reported: 2014-12-30 05:06 EST by Pierre-Charles David CLA
Modified: 2015-12-15 04:11 EST (History)
3 users (show)

See Also:


Attachments
Traces of inverse reference requests during the Sirius automated tests (25.54 KB, text/x-log)
2015-02-23 11:59 EST, Pierre-Charles David CLA
no flags Details
Traces containing classes of keys added into the inverse reference map during the Sirius automated tests (9.34 KB, text/x-log)
2015-03-19 05:14 EDT, Mickael LANOE CLA
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Pierre-Charles David CLA 2014-12-30 05:06:13 EST
Currently Sirius installs an ECrossReferencesAdapter on all the models in the session (technically a LazyCrossReferenceAdapter but it inherits from ECrossReferencesAdapter). This is costly in several ways:
- ECrossReferencesAdapter installs itself on whole hierarchies of containment, which prevents partial loading even if the storage backend supports it.
- It maintains potentially huge map(s) of EObjects in memory.
- It is yet another adapter installed on each EObject, which is notified of every model change.

We can certainly do better by analyzing the actual requirements and providing an optimized implementation for these and only these. We may need to change the contract to allow for more efficient operation in the common cases, with explicit opt-in needed for more complex (and costly) services.
Comment 1 Pierre-Charles David CLA 2014-12-30 11:39:29 EST
A first observation: at the highest level, what we actually need is: a fast (O(1)) way to answer the question: "give me all the elements in the session (or another smaller scope) who reference this specific EObject".

A second observation is that in a Sirius session we can distinguish 3 kinds of resources:
1. Representation resources (aird).
2. VSM resources referenced by the representations (odesign).
3. Semantic resources (user-controlled).

We mostly control the metamodels of the first two (with a few exception: feature extensions, GMF Notation models, contributed dialects), so we should be able to specify exactly where this kind of information is needed, and either:
* add EOpposites in the metamodels to make these inverse references navigable and maintained directly by EMF. This has often bad consequences in terms of modularity so it should be a last resort.
* customize the cross-reference adapter to only maintain inverse references for the cases which are actually needed.

In the case of VSMs, maybe a further optimisation could be done by noting that they are immutable when loaded in a session. They only "change" when a new version is re-loaded from disk, so maybe a post-load computation is possible without the cost of adding adapters to every element.

In the case of semantic models, we need full coverage by default as we know nothing about them and need the information to ensure consistency. However we should provide APIs/extensions points to allow client code to provide custom implementation optimized for their case.
Comment 2 Pierre-Charles David CLA 2015-01-12 08:09:41 EST
See https://git.eclipse.org/r/39390 for a first draft of a debug action to analyze and report what kinds of inter-resource references exist in a session.
Comment 3 Pierre-Charles David CLA 2015-02-23 03:29:34 EST
Running the full JUnit test suite (which may not cover all or our code) with an instrumented version of the session's ECrossReferenceAdater give these numbers:
* calls concern 85 distinct types (EClass) of arguments (i.e. the types of elements "x" when asking "who points to x"), only considering arguments from Sirius's own metamodels and ignoring calls on semantic elements.
* the results of these calls concern 103 distinct EReferences from the Sirius metamodels (again, ignoring the EReferences from the semantic metamodels).
* a quick grep on our ecores gives an estimation of a little more than 500 EReferences defined in our metamodels.

More analysis is required, but it looks like we could avoid maintaining inverse references for a large part of the references in the models from Sirius itself. We'll also need to do some memory usage measures to evaluate how much this would allow us to gain.
Comment 4 Pierre-Charles David CLA 2015-02-23 11:59:10 EST
Created attachment 251030 [details]
Traces of inverse reference requests during the Sirius automated tests

The attached log was produced by instrumenting SessionLazyCrossReferencer.getInverseReferences(EObject object, boolean resolve) and then executing the complete Sirius automated test suites. The raw log show 949648 calls to the method, so the attached file is a reduced version produced by:

  cat *.log | grep "^### " | sort -u | grep -Ev '\((uml|ecore|component|migrationmodeler|interactions|docbook)'

The grep part ignores the call to getInverseReferences() where the "eobject" parameter is from one of the metamodels used as semantic models by our tests. Here we are just concerned with request for inverse references which point *to* elements from the VSMs and/or aird.

The resulting summary shows 246 distinct signatures, for example:

  ### getInverseReferences(description.TableCreationDescription): description.RepresentationElementMapping::detailDescriptions tool.ToolSection::ownedTools

Which means there was a request for references towards an instance of TableCreationDescription, which return references to it from two distinct EReferenced.

Further analysis:

  % cat xrefs_traces.log | cut -d':' -f1 | sort -u | wc -l
  69

So it looks like only 69 distinct types (from Sirius, in the context of our tests) are ever subject to a getInverseReferences() query.

Further,

  % cat xrefs_traces.log | cut -d':' -f2- | sed -e 's/ /\n/g' | sort -u | wc -l
  58

Shows that there are only 58 distinct kinds of references pointing to these in the results.

Obviously this is just a first guess based on what is executed in our tests, but in this particular scope it looks like we could get away with keeping track of only these ~60 references (in the VSMs and aird) and produce the exact same results.
Comment 5 Mickael LANOE CLA 2015-03-19 05:14:03 EDT
Created attachment 251731 [details]
Traces containing classes of keys added into the inverse reference map during the Sirius automated tests

The file "inverse-map-key-classes.log" contains classes of keys added into the inverse reference map during the Sirius automated tests.

There are thow parts into the file:

- The first part contains unused classes.
- The second part contains used classes tagged by the call to methods getNonNavigableInverseReferences(EObject, boolean) and getInverseReferences(EObject, boolean).

Some classes from "description" packages of Sirius meta-models are not used and may not be added into the inverse reference map.
Comment 6 Eclipse Genie CLA 2015-03-19 11:33:35 EDT
New Gerrit change created: https://git.eclipse.org/r/44185
Comment 8 Cedric Brun CLA 2015-03-20 04:36:10 EDT
> * customize the cross-reference adapter to only maintain inverse references
> for the cases which are actually needed.
> 

What about a dedicated EAnnotation on the EReference to express it shoul not be cross-referenced ? That way it could be used for Semantic Models too without having to introduce an explicit coupling to Sirius. BTW I could tool that support directly in EcoreTools quite easily, all I need is a constant key.
Comment 9 Eclipse Genie CLA 2015-03-20 04:59:33 EDT
New Gerrit change created: https://git.eclipse.org/r/44227
Comment 10 Pierre-Charles David CLA 2015-03-20 05:10:16 EDT
(In reply to Cedric Brun from comment #8)
> > * customize the cross-reference adapter to only maintain inverse references
> > for the cases which are actually needed.
> > 
> 
> What about a dedicated EAnnotation on the EReference to express it shoul not
> be cross-referenced ? That way it could be used for Semantic Models too
> without having to introduce an explicit coupling to Sirius. BTW I could tool
> that support directly in EcoreTools quite easily, all I need is a constant
> key.

I'd rather have a runtime-configurable blacklist of EReferences, instead of a static hard-coded version. Probably relying on a specific session option (see bug #456326).

The current implicit contract of the semantic cross-referencer is that is cross-references everything, so even if we prove that Sirius itself does not need some EReferences to be maintained, there may be existing client code (e.g. Capella) which relies on this information to be available in their context, so we can not hard-code the fact that we remove information that existed before.

Now that is not incompatible with putting information in EAnnotations, which could be used to determine the default value of the blacklist, but it should be configurable so that clients which need inverse cross-references we don't do not have to install their own separate ECrossReferenceAdapter to get them.

So the idea would be:
* Determine which EReferences in our metamodels take significant space in the ECrossReferenceAdapter *and* are not needed by Sirius itself.
* Mark these with a new EAnnotation indicating they are not to be cross-referenced by default, and document the EAnnotation so that clients can also use it if they want.
* Provide an API to compute the default "blacklist" using this statically defined information.
* Make our ECrossReferenceAdapter configurable using such a Set<EReference> as a blacklist of EReferences to ignore.
* Add a session option so that clients can specify an explicit blacklist. If none is provided, the default value used would be computed using the static information from the API mentioned above. Otherwise client code can use that API to get what would be the default, and add/remove other stuff if they want.
Comment 12 Mickael LANOE CLA 2015-03-20 05:58:20 EDT
Here below are unused map entries (EClass) for which no code requested inverse references for this big modeling projects:

https://github.com/cbrun/org.eclipse.ecoretools.performance/tree/master/plugins/org.eclipse.emf.ecoretools.design.tests.perf/projects/reverse-sirius-10times

Unused map entries sorted by number of references:

org.eclipse.sirius.diagram.description.AdditionalLayer <- 21484 refs
org.eclipse.sirius.diagram.description.NodeMapping <- 13013 refs
org.eclipse.sirius.diagram.description.DiagramDescription <- 10743 refs
org.eclipse.sirius.diagram.description.Layer <- 10742 refs
org.eclipse.sirius.diagram.description.style.WorkspaceImageDescription <- 10739 refs
org.eclipse.sirius.diagram.description.style.BundledImageDescription <- 2262 refs
org.eclipse.sirius.diagram.description.ContainerMapping <- 696 refs
org.eclipse.sirius.diagram.description.style.FlatContainerStyleDescription <- 653 refs
org.eclipse.sirius.viewpoint.description.SemanticBasedDecoration <- 605 refs
org.eclipse.sirius.diagram.description.style.EdgeStyleDescription <- 471 refs
org.eclipse.sirius.diagram.description.EdgeMapping <- 467 refs
org.eclipse.sirius.viewpoint.description.SystemColor <- 90 refs
org.eclipse.sirius.viewpoint.description.UserFixedColor <- 24 refs
org.eclipse.sirius.diagram.description.tool.DirectEditLabel <- 15 refs
org.eclipse.sirius.diagram.description.tool.ContainerDropDescription <- 11 refs
org.eclipse.sirius.diagram.description.style.CenterLabelStyleDescription <- 6 refs
org.eclipse.sirius.diagram.description.tool.DeleteElementDescription <- 3 refs
org.eclipse.sirius.table.metamodel.table.description.ElementColumnMapping <- 3 refs
org.eclipse.sirius.table.metamodel.table.description.CrossTableDescription <- 2 refs
org.eclipse.sirius.table.metamodel.table.description.FeatureColumnMapping <- 2 refs
org.eclipse.sirius.viewpoint.description.tool.PasteDescription <- 2 refs
org.eclipse.sirius.viewpoint.description.style.LabelBorderStyleDescription <- 2 refs
org.eclipse.sirius.diagram.description.style.BeginLabelStyleDescription <- 2 refs
org.eclipse.sirius.viewpoint.description.Viewpoint <- 2 refs
org.eclipse.sirius.table.metamodel.table.description.EditionTableDescription <- 2 refs
org.eclipse.sirius.table.metamodel.table.description.IntersectionMapping <- 2 refs
org.eclipse.sirius.viewpoint.DRepresentationContainer <- 2 refs
org.eclipse.sirius.viewpoint.description.SytemColorsPalette <- 1 refs

28 map entries and 72046 references to these entries are stored and unused in the cross referencer for this big modeling project.

Warning: the trace available here https://bugs.eclipse.org/bugs/attachment.cgi?id=251731 was used to determine that an entry is used or not. 
This trace was generated from Sirius automated tests which offers a good code coverage, but does not guarantee that an entry will never be used.
Comment 13 Pierre-Charles David CLA 2015-04-20 09:13:01 EDT
More measures and analyses would be required to evaluate precisely the effective gain we can expect. We would also need to add an "compatibility mode" for existing client code which may assume that the cross-referencer from Sirius indexes everything. Moving to 3.1.
Comment 14 Pierre-Charles David CLA 2015-04-29 11:06:11 EDT
FWIW, the patch used to produce the traces attached in comment 4 is at https://git.eclipse.org/r/#/c/46679/
Comment 15 Pierre-Charles David CLA 2015-06-23 10:07:01 EDT
From the initial work on this, the benefits look less important than expected, so moving defering to later with reduced priority.
Comment 16 Pierre-Charles David CLA 2015-12-15 04:11:01 EST
Moving out of the 4.0 scope for now, along with all the other issues which were there "by default". This does not mean some of these will not be re-integrated at some point, but for now these issues are not part of the roadmap for 4.0.

If you feel strongly about this removal from 4.0 and/or are ready to sponsor the corresponding work, feel free to comment.