Bug 535879 - [performance] Loading a large tasklist.xml takes a long time
Summary: [performance] Loading a large tasklist.xml takes a long time
Status: RESOLVED FIXED
Alias: None
Product: z_Archived
Classification: Eclipse Foundation
Component: Mylyn (show other bugs)
Version: 3.23   Edit
Hardware: All All
: P1 enhancement (vote)
Target Milestone: 3.24   Edit
Assignee: Alexei Trebounskikh CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-06-13 20:45 EDT by Alexei Trebounskikh CLA
Modified: 2018-06-22 16:14 EDT (History)
0 users

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Alexei Trebounskikh CLA 2018-06-13 20:45:49 EDT
Loading a large (250MB+, 188K tasks) taskList.xml takes a very long time (~6 minutes), even when unmatched task list is fairly small in comparison (~9K tasks).

Loading gets progressively slower with larger data sets (50+ minutes)
Comment 1 Eclipse Genie CLA 2018-06-14 18:37:30 EDT
New Gerrit change created: https://git.eclipse.org/r/124568
Comment 2 Alexei Trebounskikh CLA 2018-06-14 18:46:23 EDT
Pushed a review. The loading time reduced from ~6 minutes down to 13 seconds
Comment 3 Alexei Trebounskikh CLA 2018-06-14 21:50:24 EDT
Background:

I profiled the loading of a large (~250MB, 188068 tasks, 496 query results, 8011 unmatched tasks) task list, and it turned out that nearly 95% (!) of the 6 minutes it took Mylyn to load it was spent inside AbstractTaskContainer.internalRemoveChild method. 

Stepping through SaxTaskBuilder in the debugger, turned out the following was happening:

1. Tasks were built and added to TaskList via TaskList.addTask(ITask), thus depositing them into UnmatchedTaskContainer
2. Task categories and queries were added to task list
3. Information about task-container relationships (e.g. subtasks, tasks belonging to queries) was stored inside SaxTaskBuilder. 
3. After the entire task list was loaded, task relationships were processed and tasks added to appropriate containers within task list by TaskList.addTask(ITask, AbstractTaskContainer)

The problem was, in this case TaskList.addTask(ITask, AbstractTaskContainer) needed to remove the task from the container it already belonged, UnmatchedTaskContainer - for every task that belonged to a query, or another task (usually, the majority of tasks). Removal from a container with many children is incredibly expensive, as it uses CopyOnWriteArrayList. 

This fix aims to significantly reduce the number of internalRemoveChild calls during the loading of XML, by effectively reversing the order in which tasks are added to the task list: tasks are added to containers first, and the rest of the tasks are added to unmatched container at the end - without altering the behavior of TaskList at runtime.

New class LazyTransferList implements ITransferList and delegates most calls to original TaskList, with exception of addTask(task) and addTask(task, container) methods. The former now only stores the task internally, while the latter delegates to TaskList.addTask(task, container) after making sure the container (if it is also a Task) is added to TaskList. LazyTransferList's commit() method called by SaxTaskBuilder at the end calls TaskList.addTask(task) for any unmatched tasks. 

NOTE: As task-subtask relationship is processed by SaxTaskBuilder first, tasks that have subtasks will still be added to unmatched container first, and will be removed later if they belong to other containers (e.g. query or task category) i.e. current behavior won't change for such tasks.
Comment 4 Sam Davis CLA 2018-06-15 13:00:50 EDT
Thanks for the very clear and detailed explanation!
Comment 6 Eclipse Genie CLA 2018-06-19 14:39:23 EDT
New Gerrit change created: https://git.eclipse.org/r/124753
Comment 8 Sam Davis CLA 2018-06-19 14:49:58 EDT
Thanks very much for the contribution, Alexei!
Comment 9 Eclipse Genie CLA 2018-06-22 15:53:32 EDT
New Gerrit change created: https://git.eclipse.org/r/124919