Bug 457758 - Investigate making transient some EReferences in the DTable model
Summary: Investigate making transient some EReferences in the DTable model
Status: NEW
Alias: None
Product: Sirius
Classification: Modeling
Component: Table
Version: 2.0.0
Hardware: All All
Importance: P3 normal
Target Milestone: ---
Assignee: Project Inbox CLA
QA Contact:
URL:
Whiteboard:
Keywords: performance, triaged
Depends on:
Blocks:
 
Reported: 2015-01-16 16:07 EST by Cedric Brun CLA
Modified: 2018-03-02 04:52 EST

See Also:


Attachments
yourkit analyse (22.82 KB, image/png)
2018-03-02 04:52 EST, Pierre Guilet CLA

Description Cedric Brun CLA 2015-01-16 16:07:39 EST
The DTable model uses a lot of disk space in an .aird file when it is serialized: easily 10 MB for a fairly big table. Furthermore, growth is not linear: as columns and lines are added, up to columns × lines cells might be added too (even though tables can be sparse, many tables don't have that many empty cells once their content is computed).

This ticket is not about rethinking the table model with this in mind, but more about seeing if we can fairly easily bring some improvements in the existing one.

A DCell holds no "user-provided" information. It is not like a diagram node, for instance, where the position, size and other information live only in the representation model.

A DLine/DColumn, on the other hand, has visibility flags and might have a user-defined width.

The DCell then looks like a nice candidate for being transient all the time. 
This is quite easy from a development perspective and would lead to the following changes:
- when opening a table in the editor, it has to be refreshed anyway, even if "refresh on opening/automatic refresh" is disabled, or all the cells will be missing. As the cells would be transient, this would not be a problem.
- cross-referencing results when searching for DCells will depend on whether the table has been refreshed. A DCell representing an intersection might not be found, as displayed by the table, until the table actually gets opened. We could mitigate that by, for instance, maintaining a list of "displayedElements" instead, but we would lose part of the gain obtained by making the cells transient. The only places I can think of which might need this information are the representation decorators and navigation menus. We should probably check whether they really use the DCells.
- client code using this part of the model might be surprised to find that there is no DCell in the table because it did not call the refresh.
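
As a rough illustration of the trade-off in the list above, the sketch below uses plain Java serialization, whose `transient` keyword behaves analogously to a transient EMF feature: the field is skipped on save and has to be recomputed after load. This is not Sirius or EMF code; `Line`, `Cell` and `refresh()` are hypothetical stand-ins for DLine, DCell and the table refresh.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

class TransientCellsSketch {
    /** Hypothetical stand-in for a DCell: purely derived, never persisted. */
    static class Cell {
        final String label;
        Cell(String label) { this.label = label; }
    }

    /** Hypothetical stand-in for a DLine. */
    static class Line implements Serializable {
        private static final long serialVersionUID = 1L;
        boolean visible = true;       // user-provided state, persisted
        final String semanticName;    // stand-in for the semantic target, persisted
        transient List<Cell> cells = new ArrayList<>(); // derived, skipped on save

        Line(String semanticName) { this.semanticName = semanticName; }

        /** Recomputes the derived cells from the persisted state. */
        void refresh() {
            cells = new ArrayList<>();
            cells.add(new Cell(semanticName));
        }
    }

    /** Serializes the line to memory and reads it back. */
    static Line saveAndReload(Line line) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(line);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Line) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

After `saveAndReload`, the `visible` flag survives but `cells` is unset until `refresh()` is called, which mirrors both the forced refresh on opening and the surprise for client code that reads cells without refreshing first.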

I tried making just the metamodel change:
As expected, the tool behavior feels identical when you're in "refresh when opening" mode. A global refresh is forced for all table manipulations anyway.

A table representing about 5000 EClasses and their content, showing just their names (so only one column), makes 20655 DCells.

The .aird file containing only this table before the change:
26.6 MB (and that's with RGBValues being a datatype)

After the metamodel change making all the EReferences targeting DCell transient: 15.5 MB file size on disk.
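
To put these two figures in perspective, here is a small back-of-the-envelope calculation. It assumes 1 MB = 1,000,000 bytes and that the whole difference is attributable to the 20655 DCells (including whatever they contain); both are assumptions, not measurements. The class and method names are made up for the example.

```java
class AirdSizeSketch {
    /** Fraction of the original size that was saved, e.g. 0.25 for a 25% reduction. */
    static double savedFraction(double beforeMb, double afterMb) {
        return (beforeMb - afterMb) / beforeMb;
    }

    /** Average bytes saved per cell, assuming 1 MB = 1,000,000 bytes. */
    static double bytesSavedPerCell(double beforeMb, double afterMb, int cellCount) {
        return (beforeMb - afterMb) * 1_000_000.0 / cellCount;
    }

    public static void main(String[] args) {
        // Figures reported above: 26.6 MB before, 15.5 MB after, 20655 DCells.
        System.out.println("saved fraction: " + savedFraction(26.6, 15.5));
        System.out.println("bytes per cell: " + bytesSavedPerCell(26.6, 15.5, 20655));
    }
}
```

With the figures from this ticket, that comes to a reduction of roughly 42%, or about 540 bytes of XMI per DCell.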
Comment 1 Eclipse Genie CLA 2015-03-03 09:34:36 EST
New Gerrit change created: https://git.eclipse.org/r/43080
Comment 2 Esteban DUGUEPEROUX CLA 2015-03-03 09:50:06 EST
This is the list of references to DCell:

- DLine.cells : containment reference
- DLine.orderedCells : already transient
- DColumn.cells
- DColumn.orderedCells : already transient

DLine.cells and DColumn.cells have an EOpposite, DCell.line/column, which must also be transient.
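
The rule above (every reference targeting DCell, plus its opposite) can be sketched as a small closure computation. This is plain Java, not the EMF API: `Ref` is a hypothetical stand-in for an EReference, and the assumption that the orderedCells references have no opposite is mine.

```java
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

class TransientClosureSketch {
    /** Hypothetical stand-in for an EReference; opposite may be null. */
    static class Ref {
        final String qualifiedName;
        final String targetType;
        final String opposite;

        Ref(String qualifiedName, String targetType, String opposite) {
            this.qualifiedName = qualifiedName;
            this.targetType = targetType;
            this.opposite = opposite;
        }
    }

    /** Names of the references targeting the given type, plus their opposites. */
    static SortedSet<String> mustBeTransient(List<Ref> refs, String targetType) {
        SortedSet<String> result = new TreeSet<>();
        for (Ref ref : refs) {
            if (ref.targetType.equals(targetType)) {
                result.add(ref.qualifiedName);
                if (ref.opposite != null) {
                    result.add(ref.opposite);
                }
            }
        }
        return result;
    }

    /** The references listed in this comment (opposites of orderedCells assumed absent). */
    static List<Ref> dTableRefs() {
        return Arrays.asList(
            new Ref("DLine.cells", "DCell", "DCell.line"),
            new Ref("DLine.orderedCells", "DCell", null),
            new Ref("DColumn.cells", "DCell", "DCell.column"),
            new Ref("DColumn.orderedCells", "DCell", null),
            new Ref("DCell.line", "DLine", "DLine.cells"),
            new Ref("DCell.column", "DColumn", "DColumn.cells"));
    }
}
```

Calling `mustBeTransient(dTableRefs(), "DCell")` yields the four references listed above plus DCell.line and DCell.column.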
Comment 3 Pierre-Charles David CLA 2015-03-23 12:22:09 EDT
Note that some client code currently relies on being able to "parse" the aird outside of the context of a Sirius session (or even EMF) to identify the cells style & content. I'm not sure if/how we could make this change optional to get the benefits for the vast majority of users who don't care without breaking this corner case.
Comment 4 Pierre-Charles David CLA 2015-06-23 10:27:54 EDT
Moving to 4.0. We may want to reformulate the problem as: "how to reduce the size of the serialization of a typical DTable?". Making the whole content transient is one ("all or nothing") approach, but has some drawbacks (the cost of "refresh on open", the fact that the information which used to be in the aird isn't anymore).

Another approach could be to reorganize the metamodel in a less "naive" (and less natural) way, and/or to introduce appropriate EDataTypes to reduce the overhead of the XMI encoding.
Comment 5 Pierre-Charles David CLA 2015-12-15 04:11:32 EST
Moving out of the 4.0 scope for now, along with all the other issues which were there "by default". This does not mean some of these will not be re-integrated at some point, but for now these issues are not part of the roadmap for 4.0.

If you feel strongly about this removal from 4.0 and/or are ready to sponsor the corresponding work, feel free to comment.
Comment 6 Pierre Guilet CLA 2018-03-02 04:51:13 EST
I ran some benchmarks to assess the benefit of this patch.

I have a table with 20,000 lines and 1 column.

The session opening takes:


22s with the patch
36s without the patch

The table opening takes:

66s with the patch
61s without the patch

The session saving takes:
25s with the patch
56s without the patch

The refresh done after modification of the semantic model outside of the Sirius session takes:

60s with the patch
60s without the patch


The refresh times seem a little long.
A YourKit analysis without filters, attached to this ticket, shows a Sirius method as a hot spot. But in fact that is not what takes the time: if we comment out this method, the refresh time is the same.

I suspect that it is the UI refresh of the table cells and/or the serialization of the semantic resource that takes time, but for unknown reasons it does not appear in YourKit.
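
Since the profiler does not attribute the time, one way to check this suspicion would be to time the suspected phases by hand. Below is a minimal sketch of such a helper using only `System.nanoTime()`; `PhaseTimer` and the phase labels in the usage note are hypothetical, not existing Sirius code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class PhaseTimer {
    private final Map<String, Long> nanosByPhase = new LinkedHashMap<>();

    /** Runs the body and adds its wall-clock duration to the total for that phase. */
    void time(String phase, Runnable body) {
        long start = System.nanoTime();
        try {
            body.run();
        } finally {
            nanosByPhase.merge(phase, System.nanoTime() - start, Long::sum);
        }
    }

    /** Accumulated time for the phase in milliseconds, or 0 if it never ran. */
    long millis(String phase) {
        return nanosByPhase.getOrDefault(phase, 0L) / 1_000_000;
    }

    /** Accumulated nanoseconds per phase, in first-use order. */
    Map<String, Long> nanosByPhase() {
        return nanosByPhase;
    }
}
```

Wrapping, say, the model refresh, the UI cell refresh and the semantic resource save in separate `time("...", ...)` calls and comparing `millis(...)` for each would show which phase accounts for the 60s.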
Comment 7 Pierre Guilet CLA 2018-03-02 04:52:32 EST
Created attachment 272972
yourkit analyse