Bug 535879

Summary: [performance] Loading a large tasklist.xml takes a long time
Product: z_Archived Reporter: Alexei Trebounskikh <alexei.trebounskikh>
Component: MylynAssignee: Alexei Trebounskikh <alexei.trebounskikh>
Status: RESOLVED FIXED QA Contact:
Severity: enhancement    
Priority: P1    
Version: 3.23   
Target Milestone: 3.24   
Hardware: All   
OS: All   
See Also: https://git.eclipse.org/r/124568
https://git.eclipse.org/c/mylyn/org.eclipse.mylyn.tasks.git/commit/?id=3e12a215f3e68d8cf11ccf84196afb42178f2cfa
https://git.eclipse.org/r/124753
https://git.eclipse.org/c/mylyn/org.eclipse.mylyn.tasks.git/commit/?id=9e86a1bfe0b4aca074e4c9cc30bea67209a1ceab
https://git.eclipse.org/r/124919
https://git.eclipse.org/c/mylyn/org.eclipse.mylyn.tasks.git/commit/?id=501815298751560cc2776f061f97cedbea102322
Whiteboard:

Description Alexei Trebounskikh CLA 2018-06-13 20:45:49 EDT
Loading a large (250MB+, 188K tasks) taskList.xml takes a very long time (~6 minutes), even when unmatched task list is fairly small in comparison (~9K tasks).

Loading gets progressively slower with larger data sets (50+ minutes)
Comment 1 Eclipse Genie CLA 2018-06-14 18:37:30 EDT
New Gerrit change created: https://git.eclipse.org/r/124568
Comment 2 Alexei Trebounskikh CLA 2018-06-14 18:46:23 EDT
Pushed a review. The loading time reduced from ~6 minutes down to 13 seconds
Comment 3 Alexei Trebounskikh CLA 2018-06-14 21:50:24 EDT
Background:

I profiled the loading of a large (~250MB, 188068 tasks, 496 query results, 8011 unmatched tasks) task list, and it turned out that nearly 95% (!) of the 6 minutes it took Mylyn to load it was spent inside AbstractTaskContainer.internalRemoveChild method. 

Stepping through SaxTaskBuilder in the debugger, turned out the following was happening:

1. Tasks were built and added to TaskList via TaskList.addTask(ITask), thus depositing them into UnmatchedTaskContainer
2. Task categories and queries were added to task list
3. Information about task-container relationships (e.g. subtasks, tasks belonging to queries) was stored inside SaxTaskBuilder. 
3. After the entire task list was loaded, task relationships were processed and tasks added to appropriate containers within task list by TaskList.addTask(ITask, AbstractTaskContainer)

The problem was, in this case TaskList.addTask(ITask, AbstractTaskContainer) needed to remove the task from the container it already belonged, UnmatchedTaskContainer - for every task that belonged to a query, or another task (usually, the majority of tasks). Removal from a container with many children is incredibly expensive, as it uses CopyOnWriteArrayList. 

This fix aims to significantly reduce the number of internalRemoveChild calls during the loading of XML, by effectively reversing the order in which tasks are added to the task list: tasks are added to containers first, and the rest of the tasks are added to unmatched container at the end - without altering the behavior of TaskList at runtime.

New class LazyTransferList implements ITransferList and delegates most calls to original TaskList, with exception of addTask(task) and addTask(task, container) methods. The former now only stores the task internally, while the latter delegates to TaskList.addTask(task, container) after making sure the container (if it is also a Task) is added to TaskList. LazyTransferList's commit() method called by SaxTaskBuilder at the end calls TaskList.addTask(task) for any unmatched tasks. 

NOTE: As task-subtask relationship is processed by SaxTaskBuilder first, tasks that have subtasks will still be added to unmatched container first, and will be removed later if they belong to other containers (e.g. query or task category) i.e. current behavior won't change for such tasks.
Comment 4 Sam Davis CLA 2018-06-15 13:00:50 EDT
Thanks for the very clear and detailed explanation!
Comment 6 Eclipse Genie CLA 2018-06-19 14:39:23 EDT
New Gerrit change created: https://git.eclipse.org/r/124753
Comment 8 Sam Davis CLA 2018-06-19 14:49:58 EDT
Thanks very much for the contribution, Alexei!
Comment 9 Eclipse Genie CLA 2018-06-22 15:53:32 EDT
New Gerrit change created: https://git.eclipse.org/r/124919