Bug 564327 - Position handling in document is a bottle neck for large docs with many positions
Summary: Position handling in document is a bottle neck for large docs with many posit...
Status: NEW
Alias: None
Product: Platform
Classification: Eclipse Project
Component: Text (show other bugs)
Version: 4.16   Edit
Hardware: PC Mac OS X
: P3 normal (vote)
Target Milestone: ---   Edit
Assignee: Platform-Text-Inbox CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-06-16 05:34 EDT by Sebastian Zarnekow CLA
Modified: 2020-06-16 05:34 EDT (History)
0 users

See Also:


Attachments
Test case (9.82 KB, application/octet-stream)
2020-06-16 05:34 EDT, Sebastian Zarnekow CLA
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Sebastian Zarnekow CLA 2020-06-16 05:34:06 EDT
Created attachment 283296 [details]
Test case

Large documents that a large number of positions will suffer from poor editing experience due to the position handling in the document.

The document maintains two sorted lists (by start offset, by end offset) of positions what are populated one-by-one and the lookup has basically a fall-back to a complete scan of the lists.
While these lists / the algorithm serves its purpose, it imposes also some performance penalties that can be avoided. The attached test illustrates the runtime behaviour with large documents (numbers below).

I'd like to propose an extension to the IPositionUpdater that basically shifts the entire responsibility for storing / maintaining the positions of a given category into the updater. Implementations, that guarantee non-overlapping positions (e.g. the highlighter in JDT) can exploit more efficient data structures and algorithms to update the positions on document changes.
It would also be worthwhile to have IDocument.addPositions(String, List<Position>) and IDocument.removePositions respectively to allow to trigger bulk changes and avoid the unnecessary costs of repetitive insertion sorting into the positions lists.

I'm willing to work on this and provide a Gerrit, if you consider this interesting.

The promised numbers to illustrate the urgency of this.

#of positions|category|action
50000|highlighting|Add (ms): 31
50000|highlighting|Replace (ms): 9
50000|highlighting|Remove (ms): 256
290164|highlighting|Add (ms): 70
290164|highlighting|Replace (ms): 7
290164|highlighting|Remove (ms): 11497
50000|__dflt_position_category|Add (ms): 20
50000|__dflt_position_category|Replace (ms): 102
50000|__dflt_position_category|Remove (ms): 518
290164|__dflt_position_category|Add (ms): 58
290164|__dflt_position_category|Replace (ms): 4009
290164|__dflt_position_category|Remove (ms): 15886
50000|highlighting|Add (ms): 13
50000|highlighting|Replace (ms): 5
50000|highlighting|Re-Add (ms): 10
290164|highlighting|Add (ms): 100
290164|highlighting|Replace (ms): 3
290164|highlighting|Re-Add (ms): 32

Please note that the position update is triggered synchronously with the document event on the UI thread.
- Add is a loop that adds # positions.
- Replace is a document change that deletes 50% of the document content.
- Remove is a loop over all positions to remove the deleted positions from the document.
- Re-Add: same effect as "Remove", but remove the entire category and re-add only the positions, that are not deleted. I put this here to show the potential that even the most naive, custom position management has on the runtime.