Bug 498042 - By using Split, it takes a lot of time to cancel ProgressMonitor
Summary: By using Split, it takes a lot of time to cancel ProgressMonitor
Status: RESOLVED FIXED
Alias: None
Product: Platform
Classification: Eclipse Project
Component: UI (show other bugs)
Version: 4.7   Edit
Hardware: PC Windows 7
: P3 normal (vote)
Target Milestone: 4.7 M4   Edit
Assignee: Stefan Xenos CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on: 445802
Blocks: 493384
  Show dependency tree
 
Reported: 2016-07-18 05:19 EDT by Vikas Chandra CLA
Modified: 2017-02-07 09:29 EST (History)
7 users (show)

See Also:
Lars.Vogel: review+


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Vikas Chandra CLA 2016-07-18 05:19:42 EDT
See 
https://bugs.eclipse.org/bugs/show_bug.cgi?id=493384#c46

Steps

1) Take latest eclipse
2) Apply gerrit patch https://git.eclipse.org/r/72453  on org.eclipse.pde.ui
( get pde code from ssh://git.eclipse.org/gitroot/pde/eclipse.pde.ui.git)

3) Launch eclipse
4) Import org.eclipse.jdt.core ( Import plugin-fragments and select project with source)

5) On manifest editor , go to runtime tab and click on Calculate Uses.
6) Cancel immediately

It takes 3-5 seconds to cancel. With isCanceled check it can be canceled instantaneously.

Further debug code as mentioned in https://bugs.eclipse.org/bugs/show_bug.cgi?id=493384#c46 shows that split was called ( multiple times)  and yet cancel didn't happen

More than 1 second delay is not good.
Comment 1 Mikaël Barbero CLA 2016-07-19 10:30:53 EDT
What about simplifying the logic that SubMonitor.split(int, int) uses to determine whether root.checkForCancellation() should be called? 

I think that it should be pretty straightforward to add a long field to SubMonitor set to the value of System.nanoTime() every time isCanceled is call. This value would need to be pass to any new submonitor created with split() (the private constructor would need to be updated). 

Then the logic for determining whether root.checkForCancellation() should be called is pretty easy: does current nano time minus the field value is greater than a given threshold (say 1 second)?

What do you think?
Comment 2 Stefan Xenos CLA 2016-07-19 13:49:01 EDT
The promise that SubMonitor.split(...) makes is that, no matter how frequently you call split(...), the cancellation checks won't create a performance problem.

Specifically, I've tuned it so that it can't use more than 10% of your job's runtime in the most extreme cases. Bug 475747 talks about the rationale and describes the benchmarks we used to tune it to 10%.

Currently, there is a performance problem in the workbench (bug 445802) that causes cancellation checks to be much slower than they should be (I'd guess, by about a factor of 10).

Once that is sorted out, I'll re-run the benchmarks and re-tune SubMonitor.split - specifically, by reducing the SubMonitor.TRIVIAL_OPERATIONS_BEFORE_CANCELLATION_CHECK constant as low as possible without stepping over the 10% CPU time limit. Assuming we can actually speed up the cancellation checks by 10x then we should be able to do 10x as many of them and the average cancellation time will be reduced from 5s worst-case to 500ms worst-case.

I'm planning this work for later on in the 4.7 cycle. However, until then I'd suggest the following code pattern:

- Continue to use SubMonitor.split() wherever you can.
- If you have a specific job where cancellation is important and common, include manual cancellation checks.
- Once we've had a chance to sort out bug 445802, I'll retune SubMonitor.split().
- At that point, you can remove the manual cancellation checks.

Why not System.nanotime() or System.currentTimeMillis()? We tested this approach and the calls to currentTimeMillis() were too slow. They created the very performance problem that SubMonitor.split is intended to fix. If you want to replicate my results, let me know and I'll try to find the exact benchmark I used.
Comment 3 Stefan Xenos CLA 2016-07-19 13:56:36 EDT
Note that the API promise of SubMonitor.split is that it is always efficient. Low cancellation latency is a highly desirable trait but not an API promise. If we traded runtime efficiency for cancellation latency, there would be no workaround for those folks than need the performance. Going the other way, there is an easy workaround (it's easy to lower the cancellation latency by adding manual cancellation checks).

Hopefully once the performance problems in the jobs framework are fixed, there won't be any need for a tradeoff here. We'll be able to have both performance and low cancellation latency.
Comment 4 Mikaël Barbero CLA 2016-07-19 15:30:14 EDT
Thanks Stefan for the very clear explanation. I trust you about currentTimeMillis and nanoTime performance issues ;) I also realize that the path you've chosen to implement split() is better by letting the user decide when he needs low cancelation latency.

Let me know if I can help with bug 445802.
Comment 5 Stefan Xenos CLA 2016-07-20 14:37:59 EDT
Re: comment 4

Sure, you could help with bug 445802. I estimate the fix will take about 40 hours. Let me know on that bug if you have the time free and I can walk you through the fix I was considering.
Comment 6 Stefan Xenos CLA 2016-08-25 03:23:35 EDT
I repeated my original analysis and I think I was being overly conservative when I tuned TRIVIAL_OPERATIONS_BEFORE_CANCELLATION_CHECK. 

I repeated my tests and I think I should be able to decrease the value of TRIVIAL_OPERATIONS_BEFORE_CANCELLATION_CHECK by about 100 times and cancellation checks would still stay under 15% CPU time.

Based on this, I can probably fix this right now. Patch forthcoming.
Comment 7 Stefan Xenos CLA 2016-08-25 03:24:49 EDT
Note that if anyone notices performance regressions (even minor ones) due to this change, I'll have to raise the constant again.
Comment 8 Eclipse Genie CLA 2016-08-25 03:29:55 EDT
New Gerrit change created: https://git.eclipse.org/r/79695
Comment 9 Vikas Chandra CLA 2016-08-25 05:41:32 EDT
As far as comment#0 is concerned 

1) TRIVIAL_OPERATIONS_BEFORE_CANCELLATION_CHECK=1000
Time taken for cancellation = 3 to 5 seconds

2) TRIVIAL_OPERATIONS_BEFORE_CANCELLATION_CHECK=100
Time taken for cancellation = well within 1 second ( may be half a second or so)


3) TRIVIAL_OPERATIONS_BEFORE_CANCELLATION_CHECK=10  ( current patch)
Time taken for cancellation = instantaneous
Comment 10 Stefan Xenos CLA 2016-08-25 14:15:32 EDT
Please keep an eye out for performance regressions due to this patch. If we see any significant amount of time going into the isCancelled call in SubMonitor.split in ANY job, we'll need to increase the constant... although probably not as high as it was before.
Comment 12 Lars Vogel CLA 2016-08-29 14:49:44 EDT
Stefan, is this something we should downport?
Comment 13 Stefan Xenos CLA 2016-08-29 17:09:02 EDT
Re: comment 12

Not yet. Despite its simplicity, I think there's some risk to this change. I'd like to let it simmer for a month or so in master before doing a backport.
Comment 14 Stefan Xenos CLA 2016-08-29 23:56:35 EDT
> I'd like to let it simmer for a month or so in master before
> doing a backport.

Actually, I take it back. If this causes a performance regression, it will be super-easy to revert from both streams... so we may as well backport if you want to.

Sorry for flip-flopping. :-)
Comment 15 Vikas Chandra CLA 2016-08-30 04:40:52 EDT
Is this back-ported in 4.6.1?
Comment 16 Lars Vogel CLA 2016-08-30 05:26:10 EDT
(In reply to Vikas Chandra from comment #15)
> Is this back-ported in 4.6.1?

AFAICS the change was only merged into master. 

IMHO this is too late for 4.6.1 and we should wait for 4.6.2 if we want to backport.
Comment 17 Vikas Chandra CLA 2016-10-13 04:31:38 EDT
Is this backported in 462 now?
Comment 18 Vikas Chandra CLA 2016-10-18 01:56:22 EDT
Stefan, Can this be backported to 4.6.2?  The 4.6.2 stream is open for development.
Comment 19 Eclipse Genie CLA 2016-11-08 12:00:55 EST
New Gerrit change created: https://git.eclipse.org/r/84689
Comment 20 Stefan Xenos CLA 2016-11-22 17:57:31 EST
Lars or Vikas, if one of you is keen to have a backport, can one of you folks do it?
Comment 21 Eclipse Genie CLA 2016-11-23 00:40:25 EST
New Gerrit change created: https://git.eclipse.org/r/85558
Comment 22 Lars Vogel CLA 2016-11-23 00:42:50 EST
> Lars or Vikas,..can one of you folks do it?

Sure. 

(In reply to Eclipse Genie from comment #21)
> New Gerrit change created: https://git.eclipse.org/r/85558

Dani, please review and give PMC approval if appropriate.
Comment 23 Dani Megert CLA 2016-11-23 04:59:14 EST
(In reply to Lars Vogel from comment #22)
> > Lars or Vikas,..can one of you folks do it?
> 
> Sure. 
> 
> (In reply to Eclipse Genie from comment #21)
> > New Gerrit change created: https://git.eclipse.org/r/85558
> 
> Dani, please review and give PMC approval if appropriate.

I'm not an RT PMC member, and I can also not submit it, since I'm not a committer on Equinox.
Comment 24 Lars Vogel CLA 2016-11-23 07:53:19 EST
(In reply to Dani Megert from comment #23)
> I'm not an RT PMC member, and I can also not submit it, since I'm not a
> committer on Equinox.

I'm not sure how the RT PMC works but I added Thomas to give approval or to decline it.
Comment 25 Lars Vogel CLA 2016-11-25 15:25:00 EST
No response from Thomas and RC3 is out of the door, moving back to the 4.7 stream.
Comment 26 Dani Megert CLA 2016-11-28 10:03:38 EST
(In reply to Lars Vogel from comment #25)
> No response from Thomas and RC3 is out of the door, moving back to the 4.7
> stream.

Why? 4.6.3 is way ahead.
Comment 27 Dani Megert CLA 2017-02-07 05:17:46 EST
Is this still on the list for 4.6.3?
Comment 28 Vikas Chandra CLA 2017-02-07 06:13:50 EST
This fix is already in the master for over 3 months and "if" there is no issue with the fix, this can be safely ported in 4.6.3. 

Stefan, Do you think this can be put in 4.6.3?
Comment 29 Lars Vogel CLA 2017-02-07 06:33:17 EST
(In reply to Dani Megert from comment #27)
> Is this still on the list for 4.6.3?

I would not downport, the job framework received lots to other tweaks also. IMHO not worth the effort to validate the fix, especially that 4.7.0 is not to long away from 4.6.3.
Comment 30 Stefan Xenos CLA 2017-02-07 09:23:06 EST
Ok. Closing.