Bug 445802 - [Progress] [Performance] Heavy progress reporting degrades performance
Summary: [Progress] [Performance] Heavy progress reporting degrades performance
Status: RESOLVED FIXED
Alias: None
Product: Platform
Classification: Eclipse Project
Component: UI
Version: 4.4
Hardware: PC Mac OS X
Importance: P3 normal
Target Milestone: 4.7 M2
Assignee: Mikaël Barbero CLA
QA Contact:
URL:
Whiteboard:
Keywords: noteworthy
Duplicates: 489779 497529
Depends on: 475747
Blocks: 498042 501434 506562
 
Reported: 2014-10-02 12:12 EDT by Sebastian Zarnekow CLA
Modified: 2016-10-26 11:18 EDT (History)
CC List: 11 users

See Also:


Attachments
Spreadsheet comparing the performance of various patches (13.21 KB, application/vnd.oasis.opendocument.spreadsheet)
2016-08-25 02:49 EDT, Stefan Xenos CLA

Description Sebastian Zarnekow CLA 2014-10-02 12:12:54 EDT
This is a follow up for bug 438648.

In Xtext, we used to report every file name to the progress monitor that is passed into the Eclipse builder. This led to performance problems because our build started to wait for the UI to be updated.

What we did to overcome the issue can be observed in this file:
https://git.eclipse.org/r/#/c/30041/6/plugins/org.eclipse.xtext.builder/src/org/eclipse/xtext/builder/builderState/AbstractBuilderState.java

Instead of reporting every built file, we now provide (poorer) feedback by introducing an arbitrary notion of chunks and only reporting every 20th or 50th step.
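A minimal sketch of that chunked-reporting workaround (illustrative only, not the actual Xtext code; CHUNK_SIZE, BuildStep and the method names are made-up placeholders):

    import java.util.List;

    import org.eclipse.core.resources.IFile;
    import org.eclipse.core.runtime.IProgressMonitor;

    // Report to the monitor only every CHUNK_SIZE files instead of once per file,
    // so the build does not flood the UI with progress events.
    public class ChunkedReporting {
        private static final int CHUNK_SIZE = 50;

        public interface BuildStep {
            void build(IFile file);
        }

        public void buildAll(List<IFile> files, IProgressMonitor monitor, BuildStep step) {
            monitor.beginTask("Building", files.size());
            try {
                int built = 0;
                for (IFile file : files) {
                    step.build(file); // the actual per-file work (assumed)
                    if (++built % CHUNK_SIZE == 0) {
                        monitor.subTask("Built " + built + " of " + files.size() + " files");
                        monitor.worked(CHUNK_SIZE); // one coarse report per chunk
                    }
                }
            } finally {
                monitor.done();
            }
        }
    }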

I created a small isolated project on Github that illustrates the impact of heavy progress reporting:
https://github.com/szarnekow/monitor-demo

Of course that one is artificial, since it is unlikely that the Xtext build reported 100,000,000 events to the monitor; it rather uses a fairly deep hierarchy of submonitors with children etc. Still, the example illustrates that an operation that takes almost no time becomes significantly slower if the monitor is used to report fine-grained progress.

I tested this with 3.8 and 4.4 and I'm happy to see that the ProgressMonitorDialog in 4.4 is working much better. The Job view in 3.8 was faster than it is now, though. Maybe you can find something when profiling the example.

Hardware and OS flags are set correctly. This is probably different on Windows or Linux.
Comment 1 Lars Vogel CLA 2015-05-13 09:35:24 EDT
Sorry for repeatedly moving this around.
Comment 2 Stefan Xenos CLA 2015-08-24 17:30:31 EDT
Hey, the problem with heavy progress reporting is actually a very old (known) performance issue, and it was one of the reasons for the creation of SubMonitor.

SubMonitor adds optimizations for the worked(...) and newChild(...) methods that allow them to be invoked heavily without a performance penalty. I'll try to dig up the original bug with the benchmarks.

What it lacks is similar optimizations for isCanceled() and setTaskName(...). I notice that none of the attached test cases cover the use of worked(...) without isCanceled -- and I'd bet that that will perform quite well.

Sergey Prigogin and I have been discussing offline an approach to cancellation that would address the bottleneck in isCanceled. I think the solution is pretty good -- I'll post details later.

However, that would leave setTaskName.

Sebastian: would a fast isCanceled(), newChild(), and worked() be enough for you or do you require a fast setTaskName, too?
Comment 3 Stefan Xenos CLA 2015-08-24 17:33:36 EDT
bug 157124 has the benchmarks for SubMonitor.worked() and SubMonitor.newChild().
Comment 4 Stefan Xenos CLA 2015-08-24 18:43:54 EDT
I've recorded our thoughts for fixing the isCanceled problem in bug 475747.
Comment 5 Stefan Xenos CLA 2015-08-24 18:48:01 EDT
We can track the setTaskName problem here.

For setTaskName, it could work like this:

The root level progress monitor (in the Jobs framework) can record the time at which each setTaskName was invoked by a child monitor. If a child monitor attempts to call setTaskName more than once every N milliseconds (I'd say N=500 would work), it will suppress passing the call up to the UI. If two or more calls are made within that period, it will wait until N milliseconds have elapsed and then pass along the most recent task name to the UI.
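A rough sketch of that throttling idea (a hypothetical wrapper, not the actual Jobs-framework monitor; the constant and the use of a scheduler are assumptions, and thread-safety against the wrapped monitor is ignored for brevity):

    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.atomic.AtomicReference;

    import org.eclipse.core.runtime.IProgressMonitor;
    import org.eclipse.core.runtime.ProgressMonitorWrapper;

    // Pass setTaskName through at most once every THROTTLE_MS; suppressed calls
    // are coalesced and only the most recent name is delivered when the window ends.
    public class ThrottledTaskNameMonitor extends ProgressMonitorWrapper {
        private static final long THROTTLE_MS = 500;

        private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        private final AtomicReference<String> pending = new AtomicReference<>();
        private volatile long lastUpdate;

        public ThrottledTaskNameMonitor(IProgressMonitor wrapped) {
            super(wrapped);
        }

        @Override
        public void setTaskName(String name) {
            long now = System.currentTimeMillis();
            if (now - lastUpdate >= THROTTLE_MS) {
                forward(name); // outside the throttle window: pass it on immediately
            } else if (pending.getAndSet(name) == null) {
                // first suppressed call in this window: deliver the latest name later
                scheduler.schedule(this::flushPending, THROTTLE_MS, TimeUnit.MILLISECONDS);
            }
        }

        private void flushPending() {
            String latest = pending.getAndSet(null);
            if (latest != null) {
                forward(latest);
            }
        }

        private void forward(String name) {
            lastUpdate = System.currentTimeMillis();
            super.setTaskName(name);
        }
    }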
Comment 6 Lars Vogel CLA 2015-08-25 04:52:53 EDT
(In reply to Stefan Xenos from comment #3)
> bug 157124 has the benchmarks for SubMonitor.worked() and
> SubMonitor.newChild().

Reading this bug made me aware of SubMonitor (instead of SubProgressMonitor) for the first time. I guess I never saw the suggestion in the Javadoc of SubProgressMonitor. Created Bug 475767 to deprecate it and help people use the right thing.
Comment 7 Sebastian Zarnekow CLA 2015-08-25 11:08:25 EDT
(In reply to Stefan Xenos from comment #2)
> Sebastian: would a fast isCanceled(), newChild(), and worked() be enough for
> you or do you require a fast setTaskName, too?

I think all methods on the PM interface should be fast and I would never expect that proper progress reporting would have a serious impact on the performance of the app. So yes, please make isCanceled, newChild, worked and setTaskName all fast :)
Comment 8 Stefan Xenos CLA 2015-09-08 19:17:19 EDT
I spent the weekend looking at the isCanceled problem. Patch attached to bug 475747.

I've added a new split() method to SubMonitor which is similar to newChild(...) but it also performs periodic cancellation checks. Using the new split method in place of newChild + isCanceled makes the whole job run roughly twice as fast.

I also came across the need for bug 476924 while working on this.

Once that goes in, the only remaining slow operations on SubMonitor will be the ones that take strings.
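For reference, the two coding patterns look roughly like this (doWorkItem is an assumed placeholder for the real per-iteration work):

    import org.eclipse.core.runtime.IProgressMonitor;
    import org.eclipse.core.runtime.OperationCanceledException;
    import org.eclipse.core.runtime.SubMonitor;

    public class SplitExample {
        // Older pattern: an explicit cancellation check on every iteration.
        // Each isCanceled() call walks up to the root monitor, which is what made it slow.
        void oldPattern(IProgressMonitor monitor, int count) {
            SubMonitor sub = SubMonitor.convert(monitor, count);
            for (int i = 0; i < count; i++) {
                if (sub.isCanceled()) {
                    throw new OperationCanceledException();
                }
                doWorkItem(i, sub.newChild(1));
            }
        }

        // New pattern: split(1) behaves like newChild(1) but performs the cancellation
        // check itself, only periodically, and throws OperationCanceledException.
        void newPattern(IProgressMonitor monitor, int count) {
            SubMonitor sub = SubMonitor.convert(monitor, count);
            for (int i = 0; i < count; i++) {
                doWorkItem(i, sub.split(1));
            }
        }

        void doWorkItem(int index, IProgressMonitor itemMonitor) {
            // the real work would go here
        }
    }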
Comment 9 Sebastian Zarnekow CLA 2015-09-09 02:03:42 EDT
(In reply to Stefan Xenos from comment #8)
> I spent the weekend looking at the isCanceled problem. Patch attached to bug
> 475747.
> 
> I've added a new split() method to SubMonitor which is similar to
> newChild(...) but it also performs periodic cancellation checks. Using the
> new split method in place of newChild + isCanceled makes the whole job run
> roughly twice as fast.
> 

I understand that the normal coding pattern will still have an impact on the performance? Code that is supposed to run on older target platforms cannot use #split thus it wouldn't really help in our case.
Comment 10 Lars Vogel CLA 2015-09-09 03:21:50 EDT
(In reply to Sebastian Zarnekow from comment #9)
> I understand that the normal coding pattern will still have an impact on the
> performance? Code that is supposed to run on older target platforms cannot
> use #split thus it wouldn't really help in our case.

I think that is the case for all fixes: they only affect / work for the releases afterwards.
Comment 11 Sebastian Zarnekow CLA 2015-09-09 04:19:57 EDT
(In reply to Lars Vogel from comment #10)
> (In reply to Sebastian Zarnekow from comment #9)
> > I understand that the normal coding pattern will still have an impact on the
> > performance? Code that is supposed to run on older target platforms cannot
> > use #split thus it wouldn't really help in our case.
> 
> I think that is for all fixes the case, they only affect / work for the
> releases afterwards.

My point is that we cannot use the new method since the code would no longer run on older platforms. If the implementation had been fixed within the existing API, at least users with new Eclipse versions would benefit from that. Right now we will still need to use ugly workarounds and cannot call #split or whatever new methods will be introduced.
Comment 12 Lars Vogel CLA 2015-09-09 04:38:33 EDT
(In reply to Sebastian Zarnekow from comment #11)
> (In reply to Lars Vogel from comment #10)
> > (In reply to Sebastian Zarnekow from comment #9)
> > > I understand that the normal coding pattern will still have an impact on the
> > > performance? Code that is supposed to run on older target platforms cannot
> > > use #split thus it wouldn't really help in our case.
> > 
> > I think that is for all fixes the case, they only affect / work for the
> > releases afterwards.
> 
> My point is, that we cannot use the new method since the code would no
> longer run on older platforms. 

You could potentially use a code switch, something like: if (version of bundle > 4.6) { instantiate code that works there } else { use workaround code }.

Definitely not nice though.
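A rough sketch of such a switch (illustrative only: the bundle id, version threshold and class name are assumptions; it also still requires compiling against the newer API, which is part of what makes it "not nice"):

    import org.eclipse.core.runtime.IProgressMonitor;
    import org.eclipse.core.runtime.OperationCanceledException;
    import org.eclipse.core.runtime.Platform;
    import org.eclipse.core.runtime.SubMonitor;
    import org.osgi.framework.Bundle;
    import org.osgi.framework.Version;

    public class StepHelper {
        // SubMonitor lives in org.eclipse.equinox.common; 3.8.0 is assumed here to be
        // the first version that ships split().
        private static final boolean HAS_SPLIT = hasSplit();

        private static boolean hasSplit() {
            Bundle common = Platform.getBundle("org.eclipse.equinox.common");
            return common != null
                    && common.getVersion().compareTo(new Version(3, 8, 0)) >= 0;
        }

        public static IProgressMonitor nextStep(SubMonitor parent) {
            if (HAS_SPLIT) {
                return parent.split(1); // new platforms: built-in periodic cancellation check
            }
            if (parent.isCanceled()) {  // workaround path for older platforms
                throw new OperationCanceledException();
            }
            return parent.newChild(1);
        }
    }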
Comment 13 Stefan Xenos CLA 2015-09-09 04:45:09 EDT
> My point is, that we cannot use the new method since the code
> would no longer run on older platforms

Well, you can - sort of. You could do what we've done, which is copy the new version of SubMonitor into an internal package and use that internally until you're ready to drop support for the older platform.

Alternatively, we could change the behavior of newChild to include cancellation checks like split() currently does. I was concerned that someone might argue that this would be a breaking change and that it might be hard to get the patch in with this approach... but it's cleaner IMO.

I can rework the patch to do the checks in newChild and we can see what the Equinox folks think of it. If there's resistance, we can always fall back to the current patch and use a copy of SubMonitor internally.
Comment 14 Stefan Xenos CLA 2015-09-09 04:46:27 EDT
We should probably be having this discussion in bug 475747.
Comment 15 Lars Vogel CLA 2015-10-15 10:02:21 EDT
(In reply to Sebastian Zarnekow from comment #0)
> I created a small isolated project on Github that illustrates the impact of
> heavy progress reporting:
> https://github.com/szarnekow/monitor-demo

Fixed via Bug 475747. See for performance benchmarks https://bugs.eclipse.org/bugs/show_bug.cgi?id=475747#c10

*** This bug has been marked as a duplicate of bug 475747 ***
Comment 16 Stefan Xenos CLA 2015-10-15 11:40:51 EDT
Not a duplicate of bug 475747. The tests attached here demonstrate performance problems in both cancellation checks and in rapid calls to setTaskName/setSubtaskName.

Bug 475747 addresses the cancellation problem but rapid calls to setTaskName are still an issue.
Comment 17 Lars Vogel CLA 2015-10-23 15:53:46 EDT
Stefan, I assume you do not plan to fix that for M3 anymore. Please move back if I'm incorrect.
Comment 18 Stefan Xenos CLA 2015-10-23 20:45:45 EDT
Yeah, I'm not sure when I'll have a chance to come back to this. I'm travelling over the next month and won't have any free weekends to work on platform stuff.
Comment 19 Lars Vogel CLA 2016-01-25 04:55:59 EST
Mass move to M6
Comment 20 Lars Vogel CLA 2016-03-08 03:02:43 EST
Sergey, any update on this bug? Stefan assigned it to you but I have not seen any update since.
Comment 21 Stefan Xenos CLA 2016-03-10 01:31:25 EST
Back to me, I guess. :-)
Comment 22 Stefan Xenos CLA 2016-03-30 11:52:29 EDT
Related to bug 489779.
Comment 23 Lars Vogel CLA 2016-03-30 12:05:58 EDT
*** Bug 489779 has been marked as a duplicate of this bug. ***
Comment 24 Stefan Oehme CLA 2016-03-30 12:48:33 EDT
Regarding calls to #subTask: The throttling logic must be chosen such that the user will not see stale information if a subtask takes a long time.

For instance "Just drop subTask calls if the last call was displayed less then X ms ago" will not work. It could drop a subTask call which then takes very long to complete. While that subTask is running, the UI would still show the last displayed subTask, confusing the user about the source of the delay.

In Buildship we currently use an AtomicReference that stores the last logged subTask. Every 500ms another Thread fetches that message and pushes it to the UI.
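Roughly, that pattern looks like this (a sketch, not the actual Buildship code; pushToUi and the 500ms constant are assumptions):

    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.atomic.AtomicReference;

    // Worker threads only store the latest message; a polling task pushes it to
    // the UI every 500ms, so the UI always shows a recent value but is never flooded.
    public class PolledSubTaskReporter {
        private final AtomicReference<String> lastSubTask = new AtomicReference<>();
        private final ScheduledExecutorService poller = Executors.newSingleThreadScheduledExecutor();

        public PolledSubTaskReporter() {
            poller.scheduleAtFixedRate(() -> {
                String message = lastSubTask.getAndSet(null);
                if (message != null) {
                    pushToUi(message);
                }
            }, 500, 500, TimeUnit.MILLISECONDS);
        }

        // Called from the worker thread; cheap and never touches the UI directly.
        public void subTask(String message) {
            lastSubTask.set(message);
        }

        private void pushToUi(String message) {
            // e.g. Display.getDefault().asyncExec(() -> label.setText(message)); (assumed)
        }

        public void dispose() {
            poller.shutdownNow();
        }
    }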
Comment 25 Stefan Xenos CLA 2016-03-30 12:57:41 EDT
Our preferred algorithm would be "delay the update if the last update was within N milliseconds". When the time elapses, we'd display whatever the most recent message is. So some updates would get dropped but they would always be intermediate values and you could guarantee that whatever you're looking at was posted within the last N milliseconds.

However: that's contingent upon where the bottleneck is and whether or not we can find a way to fit this algorithm into the existing code without breakage. I suspect we can... but it's possible we may have to fall back to a less-desirable implementation like the one you mention in comment 24 if all else fails.
Comment 26 Stefan Xenos CLA 2016-03-30 13:05:38 EDT
> that's contingent upon where the bottleneck is 

Specifically: our algorithm requires the ability to schedule asynchronous work on the same thread at a later time, so it can only help us if the main bottleneck is on the UI thread. If there's a lot of slow stuff in the background thread (like synchronization in the jobs framework), we might not be able to use it.
Comment 27 Lars Vogel CLA 2016-03-30 15:23:04 EDT
I hope this can be done for M7, Stefan, please adjust target if that is not realistic.
Comment 28 Stefan Xenos CLA 2016-03-30 15:58:46 EDT
> I hope this can be done for M7, Stefan,

Yep! I'll move this up the list. At least, I'll investigate if the algorithm Sergey and I planned to use actually works.
Comment 29 Stefan Xenos CLA 2016-03-30 18:30:50 EDT
I did some quick profiling.

In the case of the UseMonitor test (the one that uses a ProgressMonitorDialog), it already runs pretty fast (6s) but we could speed it up by an additional 40% by using the algorithm I described above. That would eliminate the time spent in the UI thread, and cut the runtime down to about 4s.

In the case of UseJobMonitor (21s), the vast majority of time (17s) is going into this method:

ProgressManager$JobManager.setTaskName(), in the call to refreshJobInfo.

This is a much more important use-case, and our algorithm won't work for it since the bottleneck is in the background thread. I took a look at the code there and it's going to require some very careful rewriting which will take some time.

I had thought I'd try to fix this quickly today, but I think I'll have to allocate 2-3 days for it later in April.
Comment 30 Stefan Xenos CLA 2016-04-22 14:07:03 EDT
I'm going to look into this today. Either I'll squish the bug or move it to 4.7.
Comment 31 Stefan Xenos CLA 2016-04-22 23:24:57 EDT
Results of today's analysis. (I'm mainly recording this here so I don't forget later).

Whenever a job changes state, ProgressManager builds a JobInfo struct and fires it as an event to three IJobProgressManagerListener instances. The events are fired synchronously on the same thread that each modified job runs on, so each of these listeners is receiving events on multiple threads at once.

This means that each listener needs to handle its own thread synchronization, which they do using volatile members, synchronized collections, synchronized blocks, etc. That synchronization accounts for a lot of the runtime (I suspect it accounts for almost all of it)... and the fact that the events are all synchronous makes them hard to throttle.

Fortunately, none of this is API so it can all be discarded and rewritten without breakage. A much more efficient model would be the one used by Eclipse databinding -- where all events are fired asynchronously on the same thread. In that case, listeners can rely on this fact and don't need any internal synchronization.

There are exactly three listeners -- in AnimationManager, ProgressViewUpdater, and FinishedJobs. AnimationManager is the slowest one due to its use of a synchronized Set, and all it seems to be doing is computing whether or not there are any jobs running.

I believe it may be possible to have ProgressManager expose its mutable state using databinding observables and to rewrite the three listeners as ISideEffects. I'll write up more specifics once I've looked into it in more depth.

Any sort of fix here is going to mean significant change to (and possibly a full rewrite of) ProgressManager, so it will have some risk.
Comment 32 Stefan Xenos CLA 2016-04-25 11:58:33 EDT
Too high risk for 4.6 M7. Moving to 4.7.
Comment 33 Lars Vogel CLA 2016-08-17 05:35:15 EDT
FYI - From Bug 499791 I learned that we have AccumulatingProgressMonitor in JFace, maybe its code can be generalized?
Comment 34 Lars Vogel CLA 2016-08-17 07:24:06 EDT
*** Bug 497529 has been marked as a duplicate of this bug. ***
Comment 35 Stefan Xenos CLA 2016-08-17 12:48:08 EDT
Re: comment 33

AccumulatingProgressMonitor implements the algorithm from comment 25. It only works if the progress monitor it wraps is on the UI thread (IProgressMonitor is single-threaded).

Unfortunately, this won't work here since the progress monitor it wraps is on a different thread, for the reasons described in comment 29. As far as I can tell, we need to fix the root cause here rather than working around it by invoking setTaskName less frequently.

I see no reason why the root cause can't be fixed completely... but I've been trying to stay focused on my JDT core work until it's done.
Comment 36 Stefan Xenos CLA 2016-08-17 13:00:49 EDT
*** Bug 499791 has been marked as a duplicate of this bug. ***
Comment 37 Rob Stryker CLA 2016-08-17 16:55:09 EDT
From what I can tell, AccumulatingProgressMonitor is only used or instantiated by ModalContext. 

For example, ProgressMonitorDialog has:

	public void run(boolean fork, boolean cancelable,
			IRunnableWithProgress runnable) throws InvocationTargetException,
			InterruptedException {
		// etc etc sic
		ModalContext.run(runnable, fork, getProgressMonitor(), 
				getShell().getDisplay());
		// etc etc sic
	}


This makes sure to use the AccumulatingProgressMonitor. Since ProgressMonitorJobsDialog is a subclass, new ProgressMonitorJobsDialog(etc).run(etc) will also benefit from these improvements, and several UI classes (like CopyProjectAction) do make use of this. 

What doesn't make use of this is when ProgressManager simply calls dialog.show(job, shell). 

So, looking at IProgressService, the two methods have very different behavior.

run(boolean fork, boolean cancelable, IRunnableWithProgress runnable) creates a dialog of type ProgressMonitorJobsDialog, where the bug is not visible; however, the user cannot send the job to the background and must instead look at it. Only the subclass has the opportunity to send it to the background.

showInDialog instead uses ProgressMonitorFocusJobDialog  (note the 'focus') which doesn't seem (to me) to work with the run(etc) method, and instead requires use of the show method. 

tl;dr,  a workaround kinda exists if you have a runnable you just want to run, and that is fast. But if you want to launch the dialog for a pre-existing job, your options are limited.
Comment 38 Rob Stryker CLA 2016-08-17 17:18:34 EDT
It's worth noting that most dialogs have the new BlahDialog().run(fork, cancel, runnable) pattern working, but this pattern fails for ProgressMonitorFocusJobDialog because of the need for a Job to be passed in. In fact, it NPE's if you try to use that pattern. I understand it's not designed to be used that way, but, it could be. 

I've made a small example of a class extending ProgressMonitorFocusJobDialog with some small changes that could be made to make this pattern work. I've tested it, and it is fast, making use of the AccumulatingProgressMonitor in JFace.

https://paste.fedoraproject.org/409902/71468297/

It's 80 lines but really it's only like 6 lines of functional code. The basic addition you'll see is run(boolean, boolean, Job), which will link up the job's progress monitor with a new stub runnable that's created for this task and passed to the super-class implementations of run(bool bool runnable), which uses the accumulator. 

This code can't be compiled unless it is contributed to the progress packages. Otherwise, it needs to be done via reflection... which is kinda annoying ;) But that's life I guess.
Comment 39 Stefan Xenos CLA 2016-08-17 18:32:09 EDT
Rob, although that pattern is related to the performance of progress reporting, I'm using this bug to track, specifically, the problem related to rapid calls to setTaskName in Jobs (which is the last remaining problem in the benchmarks Sebastian originally attached to this bug).

Could I trouble you to file another bug to track your additional optimizations to ProgressMonitorDialog so that this doesn't get sidetracked?
Comment 40 Stefan Xenos CLA 2016-08-17 18:36:19 EDT
Re: comment 38

It would also be helpful if you could describe more specifically what problem your patch is meant to solve. From what I can tell, the patch wouldn't address the problems demonstrated in comment 0.
Comment 41 Lars Vogel CLA 2016-08-19 04:12:00 EDT
Changing assignee to Mikael, based on https://bugs.eclipse.org/bugs/show_bug.cgi?id=470175#c42
Comment 42 Rob Stryker CLA 2016-08-22 17:59:18 EDT
Stefan: 
   I did open an issue, but it was closed as duplicate.  https://bugs.eclipse.org/bugs/show_bug.cgi?id=499791


However, the small file I linked fixes the case of new ProgressMonitorFocusJobDialog().run(fork, cancel, runnable)   when that runnable makes heavy use of setTaskName. 

I will try to make a clearer Bugzilla entry with a clearer patch. I just need to find the Eclipse UI source to do so ;)
Comment 43 Stefan Xenos CLA 2016-08-22 18:08:03 EDT
Re: comment 42

Yes, bug 499791 seems to be a dupe of this one. I guess I just don't see how your class addresses these issues.

Here are the steps to set yourself up with the UI source code:

https://wiki.eclipse.org/Platform_UI/How_to_Contribute
Comment 44 Rob Stryker CLA 2016-08-22 19:06:48 EDT
Hey guys:  

First, let me say I don't think it's appropriate to open a new bug, because the issue my patch is trying to solve is exactly the one you are trying to solve: rapid calls to setTaskName making performance horrific. The thing is, this stuff was already fixed in ProgressMonitorDialog. I know, because I opened one of the bugs pointing it out, and I thought it was fixed until we found a slightly different workflow where it fails.

Here's a better patch to the actual ProgressMonitorFocusJobDialog.  I hope it's understandable. 

Diff:  https://paste.fedoraproject.org/412680/71906366/
Complete file:  https://paste.fedoraproject.org/412679/90633714/

What I've essentially done is try to convert the show(Job, shell) method to use the superclass run(etc) method. Why? Because the superclass run(etc) method already uses ModalContext, which uses the AccumulatingProgressMonitor, and is already extremely efficient.

I've left some sysout / printStackTrace calls in there, because I'm not sure how you'd like to handle them... but this patch basically stops trying to re-invent the wheel and uses what the superclass has already solved for us.

The most controversial part will be that I'm basically making a new runnable to use in the run(fork, cancel, runnable) call, and that runnable will join the job itself after setting a new wrapper monitor as the listener. 

The main difference here is that instead of the blocking progress monitor (which was used here before my patch) trying to decide when and how to update the UI with various asyncExec calls etc., we pass that off to the superclass and the very efficient run(bool, bool, runnable) method.

The wrapper prog-mon, in this case, isn't merely a listener. Its job is to listen to the job and report the same progress to the runnable's monitor.

I've tested this in a few situations and it works as expected. I see absolutely no reason why it should fail at all, but perhaps some of you are smarter than me and will find small issues (or  big ones) with it. 



Again, I don't think it's appropriate to open a new bug. This patch will fix the issue of excessive calls to subTask or setTaskName or ANY of the methods in the progress monitor.
Comment 45 Rob Stryker CLA 2016-08-22 19:09:15 EDT
I can make a gerrit request for it if you think that's more appropriate... but i wanted to let you see it first. Let me know what you prefer.
Comment 46 Stefan Xenos CLA 2016-08-22 20:24:37 EDT
I haven't run your patch yet, but I'm very skeptical of it:

1. It doesn't make any changes to ProgressManager, which is where the biggest bottleneck is.

2. It removes all the asyncExec calls, which appear to be necessary to respect the threading contracts on the input monitor. Without them, all the progress monitor methods would be invoked on the job's thread, which would violate its API contracts - unless there's something else there I'm missing which would cause progress monitor invocations to occur on the UI thread. If this works at all, I'm guessing it's because of issue 3.

3. It looks to me as though the call to join is occurring in the UI thread, which is effectively causing the UI to block on the background thread.

4. The use of job.join seems undesirable. We only have two threads of interest here - the UI thread and the background job being run - and neither of them should be blocking on the other.


So, I'd suggest testing this with the UseJobMonitor benchmark from comment 0. Please confirm that:
- progress is reported smoothly throughout the entire job
- the UI continues to paint and respond to user input while the job runs
- the root monitor is being invoked on the UI thread
- the Job is running in a background thread
- the UI is not blocked at any point
- the patch improves performance by at least 80% (my measurements in comment 29 predict we should be able to achieve this easily).

I suspect that multiple of the above items may fail. I think this patch may indeed make the job run faster by about 40%, but that it will do so by blocking the UI thread completely and suppressing all progress reporting. If you can confirm that you've tested the above, I'll try to reproduce your findings.

I don't think that patches that only apply to the progress dialog are likely to fix this bug. I'd recommend focusing on ProgressManager, which is where the profiler says the bottleneck is.
Comment 47 Stefan Xenos CLA 2016-08-22 20:29:37 EDT
Re: comment 44

Additional: Please explain which thread you intend to block with the call to job.join() and why you're blocking it.
Comment 48 Mikaël Barbero CLA 2016-08-23 04:52:39 EDT
FYI, I've started to work on the issue based on Stefan's analysis in comment #31. More on that later today.
Comment 49 Rob Stryker CLA 2016-08-23 10:35:33 EDT
Stefan:  I strongly suggest you try the patch. You will see a vast improvement in speed, and it is safe. I will address your points one by one. 

1) The bottleneck IS NOT IN ProgressManager. You think it is, but ProgressManager is not firing swt events, so it is not the bottleneck. ProgressManager.showInDialog(etc) simply opens ProgressMonitorFocusJobDialog. The rest of ProgressManager is simply adding listeners and moving information from one progress monitor to the other.  The bottleneck is in ProgressMonitorFocusJobDialog. 

2) re: the removal of asyncExec calls, My patch causes the dialog to make use of the superclass's run(bool, bool, runnable) call, which uses AccumulatingProgressMonitor, and AccumulatingProgressMonitor has the asyncExecs. And it does them more efficiently. 

3) Re: the job.join() being run in the UI thread, it is not. You can see that the diff has the following line:  

 ProgressMonitorFocusJobDialog.this.run(true, true, getRunnableForJob(job));

This means we are forking, and so it is not being run in the UI thread. A simple test and breakpoint will confirm this for you. 

4) Re: two threads of interest, in my patch we have three, because we are forking. The Job is the first, and it is running independently. The forked thread is spawned by dialog.run(bool, bool, runnable) via ModalContext, and the third is the UI thread. The runnable, a new third thread, simply joining the job, is safe, will not interfere here, will expire when the job is canceled, will not cause a deadlock (as far as I can tell), and simply acts as a way to keep the run(bool, bool, runnable) from completing too early. We want run(bool, bool, runnable) to stay until the job is complete, canceled, or errored. Joining the job is the best way to do it.  


I strongly suggest you at least test this patch. You will see a vast improvement and I really believe there are no problems with it. I'm sure you get requests to test crap patches all the time, and I wouldn't ask you to test it if I didn't already test it and feel comfortable that it's a good enough candidate for you to spend just a bit of time on.
Comment 50 Rob Stryker CLA 2016-08-23 10:37:59 EDT
I will work on testing the benchmark today and get back to you.
Comment 51 Stefan Xenos CLA 2016-08-23 11:13:42 EDT
> The bottleneck IS NOT IN ProgressManager. You think it is
> but ProgressManager is not firing swt events, so it is not the bottleneck

I think it is because jvisualvm said it is. It also said that SWT events didn't account for much of the runtime... but I'm open to the possibility that I misinterpreted the output. If your patch behaves as well as you claim, that would be pretty strong evidence of this.

> AccumulatingProgressMonitor has the asyncExecs.

Sounds reasonable.

> Re: two threads of interest, in my patch we have three, because we
> are forking.

I guess by threads of interest, I meant that there's only two threads that need to be there. We shouldn't be creating and intentionally blocking a third thread just so that we can have the side-effect of creating an AccumulatingProgressMonitor. If we believe that AccumulatingProgressMonitor is a good approach here, we should do that directly rather than relying on an undocumented side-effect of creating a third thread.

Anyway, it sounds like you've thought through the most important potential problems. I'll give it a try.
Comment 52 Rob Stryker CLA 2016-08-23 12:21:09 EDT
I've run all the tests, and, as you probably expected, my patch has 0 effect, because none of the tests are testing my usecase. None of the tests are testing the performance of ProgressMonitorFocusJobDialog. I will make a PR to the perf test with my usecase added. 

The tests are currently testing two things:  1) ProgressMonitorDialog.run(etc)  which already is performing quickly, and 2)  scheduling a job (and thus watching the progress in the status bar or the progress view). 

None of these tests are testing IProgressService.showInDialog(shell, job). 

So yeah... the progress view and the status bar probably need additional fixes similar to AccumulatingProgressMonitor, and my patch explicitly fixes only the case of IProgressService.showInDialog(shell, job) (which is not represented in the current perf project in comment #0)
Comment 53 Rob Stryker CLA 2016-08-23 13:31:35 EDT
I'll be making a PR to the perf tests so you can replicate, but, this is what I'm seeing in my two new test cases with my patch.


New Tests: using 1,000,000 iterations
a) Without patch
   6) Show With verbose job monitor
         - progress reported smoothly:  fail
         - UI continues to paint:  fail
         - root monitor is invoked in ui thread: idk
         - UI is not blocked at any point:  fail
         - patch perf stats:   UI freezes, but eventually returns
		UseJobMonitorShow Duration: 79308
		UseJobMonitorShow Time Until Msg Box Open: 156906
                ie, even after the job has technically finished, the UI is still 77 seconds behind and takes 77 seconds to catch up and open the dialog

   7) Show With job monitor
         - progress reported smoothly:  pass
         - UI continues to paint:  pass
         - root monitor is invoked in ui thread:  idk
         - UI is not blocked at any point:  pass
         - patch perf stats:
		UseJobMonitorNoTaskNameShow runnable Duration: 18
		UseJobMonitorNoTaskNameShow Time Until Msg Box Open: 32
		Basically, the job nearly completes before the progress dialog needs to open

b) With Patch
   6) Show With verbose job monitor
         - progress reported smoothly:  pass
         - UI continues to paint:  pass
         - root monitor is invoked in ui thread: idk
         - UI is not blocked at any point:  pass
         - patch perf stats:
		UseJobMonitor Duration: 597
		UseJobMonitor Time Until Msg Box Open: 597
   7) Show With job monitor
         - progress reported smoothly:  pass
         - UI continues to paint:  pass
         - root monitor is invoked in ui thread:  idk
         - UI is not blocked at any point:  pass
         - patch perf stats:
		UseJobMonitor Duration: 45
		UseJobMonitor Time Until Msg Box Open: 55

This shows a 99.3% improvement for usecase 6. It also shows that the messagebox now opens immediately when the job is finished, and the UI is never "behind".  

Usecase 7 went from 18 and 32 ms to 45 and 55 ms. I would argue that the low numbers here make it statistically insignificant, but you could theoretically call this a ~100% degradation ;)

To be clear,  I believe my patch does not solve all use cases. It is only one small subset of the problems... but a 99.3% improvement in speed is admirable at least. 


If I scale up the iterations here to 300,000,000,  I can compare it with the existing tests I ran:

My New Test #6  @ 300,000,000 iterations:
   UseJobMonitorShow Duration: 70758
   UseJobMonitorShow Time Until Msg Box Open: 70758   

   1) Without Monitor
         - duration 112 ms
   2) With verbose monitor in dialog
                UseMonitor runnable Duration: 17556
                UseMonitor time until msg box open: 17556
   3) With monitor in dialog
		UseMonitorNoTaskName runnable Duration: 2419
		UseMonitorNoTaskName time until msg box open: 2420
   4) With verbose job monitor
		UseJobMonitor Duration: 60613
		UseJobMonitor Time Until Msg Box Open: 60613
   5) With job monitor
		UseJobMonitorNoTaskName runnable Duration: 6978
		UseJobMonitorNoTaskName Time Until Msg Box Open: 6978

So basically, this now puts us on par with the existing With verbose job monitor test.
Comment 54 Rob Stryker CLA 2016-08-23 13:40:24 EDT
I also think that since my issue is with the ProgressMonitorFocusJobDialog specifically, it should be a different bug. I think it's unrelated to your general issue here. My patch has no effect on the issues listed here, and I had to make new example usecases to demonstrate it.  So with your permission, Stefan, I think i should open another bug for my issue and you guys should continue tracking the rest of them here.
Comment 55 Rob Stryker CLA 2016-08-23 13:50:19 EDT
I've made changes to the perf demo app. OP can find the PR at https://github.com/szarnekow/monitor-demo/pull/1/files  and accept it if he wishes. 

Otherwise, others can find it over at https://github.com/robstryker/monitor-demo in the master branch.
Comment 56 Stefan Xenos CLA 2016-08-23 14:04:47 EDT
Re: comment 54

Yes, please open another bug for your performance enhancement, Rob... and thank you for helping with this. I have some further comments about the patch but we'll follow up in that bug.

FYI, it's easier for other folks to try and review your patch if you submit it in gerrit. Here's how:

https://wiki.eclipse.org/Platform_UI/How_to_Contribute
Comment 57 Stefan Xenos CLA 2016-08-23 14:07:21 EDT
Actually, I just reopened bug 499791. I think your test case proves that it's not a dupe of this.
Comment 58 Rob Stryker CLA 2016-08-23 15:16:37 EDT
As a last comment on this issue for now,  it might be prudent to have ProgressManager use something similar to AccumulatingProgressMonitor. And once you decide you're willing to accumulate and ignore some events, it might be prudent to think how often you really do need to send updates to the UI for any issues. 

Current tests that I ran with 3,000,000 iterations were at most 60 seconds (ignoring the horrific performance in my bug). This means the UI is getting asked to make 50000 updates per second. 

It's arguable that the human eye probably needs no more than 500 frames per second. If you were to update the UI at most 500 times per second, via a custom progress monitor that throttles the UI requests, you could cut the time it takes to complete by 99%.
Comment 59 Stefan Xenos CLA 2016-08-23 19:22:39 EDT
> it might be prudent to have ProgressManager use something similar
> to AccumulatingProgressMonitor

Yes, those are all good points. This is exactly what I described in comment 25. Comment 26 describes the conditions under which this sort of algorithm works, and comment 29 explains why it is difficult for this specific bottleneck. Basically, the thread doing the most expensive stuff has no equivalent of asyncExec that executes code on the same thread at a later time.

Moving this computation into the UI thread so that we can make use of the databinding APIs (which provide something very much like asyncExec) is what comment 31 recommends.
Comment 60 Lars Vogel CLA 2016-08-24 05:51:07 EDT
(In reply to Rob Stryker from comment #58)
> As a last comment on this issue for now,  it might be prudent to have
> ProgressManager use something similar to AccumulatingProgressMonitor. And
> once you decide you're willing to accumulate and ignore some events, it
> might be prudent to think how often you really do need to send updates to
> the UI for any issues. 

Sounds reasonable. Would you be available to upload a Gerrit review for this suggestions? I'm sure Stefan will welcome help for this issue.

> It's arguable that the human eye probably needs no more than 500 frames per
> second. 

+1
Comment 61 Mikaël Barbero CLA 2016-08-24 06:30:04 EDT
(In reply to Mikaël Barbero from comment #48)
> FYI, I've started to work on the issue based on Stefan's analysis in comment
> #31. More on that later today.

Here are the results of my analysis from yesterday.

I could not demonstrate that the synchronization accounts for a lot of the run time. I used YourKit, and neither tracing profiling nor monitor profiling showed such an issue. Stefan, how did you come to this conclusion in comment #31? Do you still have your profiling results?

I've then tested a very simple implementation of the idea in comment #25. I will push a Gerrit review shortly. Please note that this is not to be merged as is, it's just to test ideas.

Here are my benchmarks (with a 100ms refresh rate) on macOS:

Before: ~25s
After: ~12s

And profiling with my patch does not show a lot of things that can be done to improve this time.
Comment 62 Eclipse Genie CLA 2016-08-24 06:39:26 EDT
New Gerrit change created: https://git.eclipse.org/r/79602
Comment 63 Lars Vogel CLA 2016-08-24 06:40:51 EDT
(In reply to Lars Vogel from comment #60)
> Sounds reasonable. Would you be available to upload a Gerrit review for this
> suggestions? I'm sure Stefan will welcome help for this issue.

Sorry Mickael, I forgot that you are already working on this issue. I hope Rob can help with the review.
Comment 64 Stefan Xenos CLA 2016-08-24 13:09:08 EDT
Re: comment 61

I don't still have my profiler results, but I'll collect new ones (I'll try to get them today.)

I think your results are consistent with mine. I was testing UseJobMonitor using 100ms sampling with jvisualvm on ubuntu.

Your patch sends the ProgressManager notifications less frequently. Most of the hotspots I was seeing were in the receivers of those notifications, so it makes sense that the patch would speed things up a lot.
Comment 65 Stefan Xenos CLA 2016-08-24 14:30:11 EDT
I'm going to start by gathering up all the benchmarks from the various progress-monitor-related performance bugs and converting them to standard Eclipse performance tests. That should make it easier for us all to compare our test results and give us regression tests in the future.
Comment 66 Eclipse Genie CLA 2016-08-24 15:49:05 EDT
New Gerrit change created: https://git.eclipse.org/r/79664
Comment 67 Stefan Xenos CLA 2016-08-24 15:54:52 EDT
Mikael + Rob + Sebastian:

I've converted all of Sebastian's benchmarks, my own benchmarks from bug 475747, and Rob's benchmarks from bug 499791 into a standard Eclipse performance regression test suite (attached).

This should make it easier for us all to run one another's tests and to be certain we are comparing apples-with-apples.

I've included a MAX_RUNTIME constant near the top of the file. If you're running the test in a profiler, make this really big and it will run the test for longer so you can get better data.

If you'd all be so kind, I'd appreciate if everyone could take a quick look over the test cases so we're sure I didn't miss a case that one of your patches was meant to improve.
Comment 69 Stefan Xenos CLA 2016-08-25 02:49:31 EDT
Created attachment 263767 [details]
Spreadsheet comparing the performance of various patches

I ran the following patches against my new performance suite (results attached)

Labelled as "ProgressManager patch"
https://git.eclipse.org/r/#/c/79602/

Labelled as "ProgressMonitorFocusJob patch"
https://git.eclipse.org/r/#/c/79578/

The test confirms a 2-3x speed up for "ProgressManager patch" for several test cases, but found no significant performance improvement in "ProgressMonitorFocusJob patch". I suspect the latter result just indicates my test case doesn't test the correct thing. I'd be grateful if Rob could show me what I missed (The tests containing "ProgressService" in their names are supposed to be the ones based on Rob's test cases).

Other findings:

The cost of monitor.isCanceled() appears to be a lot lower than I remember, so we can probably bump up the TRIVIAL_OPERATIONS_BEFORE_CANCELLATION_CHECK constant right now without much of a performance penalty.
Comment 70 Stefan Xenos CLA 2016-08-25 03:02:20 EDT
Re: Comment 69

I figured out the problem with my test case -- Rob must have been testing his code with the "always run in background" option disabled, and I was testing with it enabled. I'll update the test case to flip the option to the correct value at the start/end of the test.
Comment 71 Mikaël Barbero CLA 2016-08-25 04:17:04 EDT
(In reply to Stefan Xenos from comment #67)
> If you'd all be so kind, I'd appreciate if everyone could take a quick look
> over the test cases so we're sure I didn't miss a case that one of your
> patches was meant to improve.

LGTM

(In reply to Stefan Xenos from comment #69)
> The test confirms a 2-3x speed up for "ProgressManager patch" for several
> test cases, 

Glad we have the same order of speedup. I will do another profiling session and see if we can improve this a bit as we are still twice as slow as this test case (no job involved) https://github.com/szarnekow/monitor-demo/blob/master/src/monitordemo/handlers/UseMonitor.java
Comment 72 Rob Stryker CLA 2016-08-25 04:42:21 EDT
> If you'd all be so kind, I'd appreciate if everyone could take a quick look over the test cases so we're sure I didn't miss a case that one of your patches was meant to improve.

Tests all look good to me. Nothing seems missing as far as I can tell. 


> Rob must have been testing his code with the "always run in background" option disabled,

We're lucky you figured that out, because I never would have found that ;) Now that you found out the difference, can you update the spreadsheet?  I'd love to have proof someone was able to verify I actually fixed something, and right now it looks like I didn't ;) Also I'm curious if you were able to replicate my results as dramatically as it seemed to me. 

>I'll update the test case to flip the option to the correct value at the start/end of the test.

Depending on your curiosity level, you may wish to test both combinations always for all tests.  I'm not sure it really applies for all tests, but it's worth considering. 


> I hope Rob can help with the review.

I'd be glad to run Mickael's patch when it's available and assist in the review.
Comment 73 Stefan Xenos CLA 2016-08-25 04:59:53 EDT
> Also I'm curious if you were able to replicate my results as
> dramatically as it seemed to me. 

Yes. I was able to reproduce a dramatic speed improvement with your patch, once I started flipping the preference. Actually, the improvement is so dramatic that it's causing problems with my test -- I wasn't able to run the baseline version in any reasonable amount of time. I had to reduce the total number of iterations by 30x to get a measurement.
Comment 74 Eclipse Genie CLA 2016-08-25 05:04:39 EDT
New Gerrit change created: https://git.eclipse.org/r/79708
Comment 75 Stefan Xenos CLA 2016-08-25 05:07:01 EDT
Could I trouble you folks for another code review?

https://git.eclipse.org/r/#/c/79708/

I updated the test case to cover the bottlenecks that Rob's patch fixes... and I scaled back the number of iterations on all tests by a factor of 10 so that they can still be run without his patch.
Comment 76 Stefan Xenos CLA 2016-08-25 05:18:51 EDT
Re: Comment 72

> Now that you found out the difference, can you update the spreadsheet? 

Not tonight. It's well past midnight in California right now.

However, I can confirm that the subjective performance improvement of your patch is spectacular. Do you have plans to update it to avoid the unnecessary extra thread and the join? I'd like to merge it.

If you're stumped and can't figure out how to avoid the extra thread, I can help try to figure it out tomorrow.
Comment 77 Mikaël Barbero CLA 2016-08-25 06:19:32 EDT
(In reply to Mikaël Barbero from comment #71)
> Glad we have the same order of speedup. I will do another profiling session
> and see if we can improve this a bit as we are still twice as slow as this
> test case (no job involved)
> https://github.com/szarnekow/monitor-demo/blob/master/src/monitordemo/
> handlers/UseMonitor.java

I've updated my patch (https://git.eclipse.org/r/#/c/79602/) with almost a 30% speedup. It is done by using ConcurrentHashMap and ConcurrentLinkedQueue instead of a synchronized Map and List where the profiler was showing hot spots. There are several other synchronized collections around the ProgressManager component, but as I don't have use cases showing issues, I preferred to keep them as is.
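For illustration, the kind of swap meant here (field and element types are made up, not the real ProgressManager fields):

    import java.util.Map;
    import java.util.Queue;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentLinkedQueue;

    public class ProgressBookkeeping {
        // before (sketch):
        //   Map<Object, Object> jobs = Collections.synchronizedMap(new HashMap<>());
        //   List<Object> pendingUpdates = Collections.synchronizedList(new ArrayList<>());
        // after (sketch): concurrent collections avoid the single lock that every
        // reporting thread was contending on.
        final Map<Object, Object> jobs = new ConcurrentHashMap<>();
        final Queue<Object> pendingUpdates = new ConcurrentLinkedQueue<>();
    }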

(In reply to Rob Stryker from comment #72)
> I'd be glad to run Mickael's patch when it's available and assist in the
> review.

Over here https://git.eclipse.org/r/#/c/79602 :) Thanks Rob.
Comment 79 Stefan Xenos CLA 2016-08-25 13:35:48 EDT
Mikael,

Your current patch uses a ScheduledExecutorService to get the execute-this-code-later behavior. This is giving us a 2-3x speedup (which is pretty good), but I think we may still be able to do better.

What if, instead of using a new thread, we use the UI thread for this purpose... and instead of using executor.schedule, we use Display.asyncExec followed by Display.timerExec.

We can make it part of the contract on those listener interfaces that they're always invoked in the UI thread. Then we can remove all the synchronized blocks, Concurrent data structures, and asyncExecs from those listeners. As long as none of the listeners do anything that takes longer than about 6ms, putting this on the UI thread shouldn't introduce any UI lag... but the removal of the extra synchronization, extra asyncExecs, and concurrent data structures should give us an extra speed boost.

The only remaining concurrent structure would be the "jobs" map in ProgressManager and we may even be able to get rid of that if we attach the JobInfo metadata to the Job directly somehow and only access it from the job's thread itself.
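A sketch of that idea (names are illustrative, not the real ProgressManager types; THROTTLE_MS is an assumption):

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.atomic.AtomicBoolean;

    import org.eclipse.swt.widgets.Display;

    // Background job threads only set a dirty flag; the actual listener notification
    // always runs on the UI thread, batched to at most one run per THROTTLE_MS via
    // Display.timerExec, so the listeners need no internal synchronization.
    public class UiThreadNotifier {
        private static final int THROTTLE_MS = 100;

        private final Display display;
        private final List<Runnable> listeners = new ArrayList<>(); // UI-thread only
        private final AtomicBoolean updateScheduled = new AtomicBoolean();

        public UiThreadNotifier(Display display) {
            this.display = display;
        }

        // May be called from any job thread; cheap and lock-free.
        public void markDirty() {
            if (updateScheduled.compareAndSet(false, true)) {
                // timerExec must be called from the UI thread, hence the asyncExec hop.
                display.asyncExec(() -> display.timerExec(THROTTLE_MS, this::notifyListeners));
            }
        }

        // Runs on the UI thread.
        private void notifyListeners() {
            updateScheduled.set(false);
            for (Runnable listener : listeners) {
                listener.run();
            }
        }

        // UI-thread only.
        public void addListener(Runnable listener) {
            listeners.add(listener);
        }
    }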
Comment 80 Mikaël Barbero CLA 2016-08-25 15:28:16 EDT
(In reply to Stefan Xenos from comment #79)

Thanks for your comments. I will try tomorrow to use the UI thread and remove all the unnecessary synchronized blocks and concurrent data structures from the listeners. 

Once that is done, I will think about the "jobs" map in the ProgressManager.
Comment 81 Stefan Xenos CLA 2016-08-25 15:40:06 EDT
> I will try tomorrow to use the UI thread and remove all the
> unnecessary synchronized blocks

FYI, I just intended it as a suggestion. I actually think your patch is acceptable in its current form. If you're happy with it, we may want to merge your current patch and then investigate the possibility of using the UI thread as a separate enhancement (using your current patch as a performance baseline).
Comment 82 Rob Stryker CLA 2016-08-25 15:48:28 EDT
@Stefan:  I made a new patch on my issue,  https://git.eclipse.org/r/#/c/79578/

I tried about 7 different things, but none were working as expected, so in the end I had to basically paste (ugh) AccumulatingProgressMonitor into the class. Disgusting, I know, but I tried more than a half-dozen possible solutions and they either weren't working, blocked the UI, didn't fix the problem, or outright caused errors.

I know it's not the nicest of solutions, but it works, it fixes your issue with the threads, and it basically just removes the original getBlockingProgressMonitor(etc) and replaces it with AccumulatingProgressMonitor instead. 

I also ran another small test, like making sure the collector (ie the display.asyncExec) only ran once every 100ms, but that didn't seem to be a significant overhead at all (shaved a half second off a 15 second job, so only 3%) so I don't think it's worth it to put that in.
Comment 83 Rob Stryker CLA 2016-08-25 16:59:17 EDT
 Mikaël Barbero's patch is definitely working.  One of my tests takes (avg of 3 attempts)  768ms with both his and my patch together.  With just my patch, it's (avg of 3 attempts)  2061ms. It's really nice to see how these patches are cumulative in nature. 

I don't think there's much more I can add right now until we see what Mikael comes up with for scheduling the work on the UI thread and see if that helps even more.
Comment 84 Lars Vogel CLA 2016-08-26 04:04:51 EDT
(In reply to Stefan Xenos from comment #81)
> > I will try tomorrow to use the UI thread and remove all the
> > unnecessary synchronized blocks
> 
> FYI, I just intended it as a suggestion. I actually think your patch is
> acceptable in its current form. If you're happy with it, we may want to
> merge your current patch and then investigate the possibility of using the
> UI thread as a separate enhancement (using your current patch as a
> performance baseline).

+1 for merging the existing patches and to open a new bug for further investigations. The current improvements are already impressive and changing the threading approach might create problems downstream.
Comment 85 Mikaël Barbero CLA 2016-08-26 09:14:22 EDT
(In reply to Lars Vogel from comment #84)
> (In reply to Stefan Xenos from comment #81)
> > > I will try tomorrow to use the UI thread and remove all the
> > > unnecessary synchronized blocks
> > 
> > FYI, I just intended it as a suggestion. I actually think your patch is
> > acceptable in its current form. If you're happy with it, we may want to
> > merge your current patch and then investigate the possibility of using the
> > UI thread as a separate enhancement (using your current patch as a
> > performance baseline).
> 
> +1 for merging the existing patches and to open a new bug for further
> investigations. The current improvements are already impressive and changing
> the threading approach might create problems downstream.

Ok for merging. I've cleaned up my patch and it is ready for review / submit.
Comment 86 Mikaël Barbero CLA 2016-08-26 09:25:10 EDT
(In reply to Lars Vogel from comment #84)
> +1 for merging the existing patches and to open a new bug for further
> investigations. The current improvements are already impressive and changing
> the threading approach might create problems downstream.

I've opened bug 500332 for further improvements.
Comment 88 Lars Vogel CLA 2016-08-26 13:41:10 EDT
Mickael, can you please add an entry to the N&N M2?
Comment 89 Stefan Xenos CLA 2016-08-26 13:41:18 EDT
Mikael, I merged your patch... it's a great improvement, thanks!

I think with this change, we've made significant improvements to all the benchmarks that were reported in comment 0 so we can finally close this.

However, I don't think we're done with performance problems in progress reporting. We'll follow up in bug 500332 and bug 499791 (which is still a very serious performance problem).
Comment 90 Lars Vogel CLA 2016-08-26 13:43:08 EDT
Mikael, can you please add an entry to the N&N M2?
Comment 91 Eclipse Genie CLA 2016-08-30 14:33:23 EDT
New Gerrit change created: https://git.eclipse.org/r/80057
Comment 92 Mikaël Barbero CLA 2016-08-30 15:02:22 EDT
(In reply to Lars Vogel from comment #90)
> Mikael, can you please add an entry to the N&N M2?

I did not forget and will do it this week.
Comment 94 Eclipse Genie CLA 2016-09-02 10:53:13 EDT
New Gerrit change created: https://git.eclipse.org/r/80312
Comment 95 Mikaël Barbero CLA 2016-09-02 10:53:36 EDT
(In reply to Lars Vogel from comment #90)
> Mikael, can you please add an entry to the N&N M2?

Done. https://git.eclipse.org/r/80312
Comment 96 Lars Vogel CLA 2016-09-05 05:07:25 EDT
I mark this as fixed. Thanks everyone and especially Mikael, Stefan and Rob for the work.

@Mikael, Stefan and Rob, if you think this should be downported to 4.6.2, please reopen and change the target.
Comment 97 Eclipse Genie CLA 2016-09-05 05:24:35 EDT
New Gerrit change created: https://git.eclipse.org/r/80370
Comment 99 Stefan Xenos CLA 2016-09-14 11:51:05 EDT
Reopening. This caused a regression. See bug 501434.
Comment 100 Stefan Xenos CLA 2016-09-15 11:40:45 EDT
Closing again. Markus says he doesn't believe the regression is serious enough to warrant reverting the patch.