Community
Participate
Working Groups
See https://bugs.eclipse.org/bugs/show_bug.cgi?id=493384#c46 Steps 1) Take latest eclipse 2) Apply gerrit patch https://git.eclipse.org/r/72453 on org.eclipse.pde.ui ( get pde code from ssh://git.eclipse.org/gitroot/pde/eclipse.pde.ui.git) 3) Launch eclipse 4) Import org.eclipse.jdt.core ( Import plugin-fragments and select project with source) 5) On manifest editor , go to runtime tab and click on Calculate Uses. 6) Cancel immediately It takes 3-5 seconds to cancel. With isCanceled check it can be canceled instantaneously. Further debug code as mentioned in https://bugs.eclipse.org/bugs/show_bug.cgi?id=493384#c46 shows that split was called ( multiple times) and yet cancel didn't happen More than 1 second delay is not good.
What about simplifying the logic that SubMonitor.split(int, int) uses to determine whether root.checkForCancellation() should be called? I think that it should be pretty straightforward to add a long field to SubMonitor set to the value of System.nanoTime() every time isCanceled is call. This value would need to be pass to any new submonitor created with split() (the private constructor would need to be updated). Then the logic for determining whether root.checkForCancellation() should be called is pretty easy: does current nano time minus the field value is greater than a given threshold (say 1 second)? What do you think?
The promise that SubMonitor.split(...) makes is that, no matter how frequently you call split(...), the cancellation checks won't create a performance problem. Specifically, I've tuned it so that it can't use more than 10% of your job's runtime in the most extreme cases. Bug 475747 talks about the rationale and describes the benchmarks we used to tune it to 10%. Currently, there is a performance problem in the workbench (bug 445802) that causes cancellation checks to be much slower than they should be (I'd guess, by about a factor of 10). Once that is sorted out, I'll re-run the benchmarks and re-tune SubMonitor.split - specifically, by reducing the SubMonitor.TRIVIAL_OPERATIONS_BEFORE_CANCELLATION_CHECK constant as low as possible without stepping over the 10% CPU time limit. Assuming we can actually speed up the cancellation checks by 10x then we should be able to do 10x as many of them and the average cancellation time will be reduced from 5s worst-case to 500ms worst-case. I'm planning this work for later on in the 4.7 cycle. However, until then I'd suggest the following code pattern: - Continue to use SubMonitor.split() wherever you can. - If you have a specific job where cancellation is important and common, include manual cancellation checks. - Once we've had a chance to sort out bug 445802, I'll retune SubMonitor.split(). - At that point, you can remove the manual cancellation checks. Why not System.nanotime() or System.currentTimeMillis()? We tested this approach and the calls to currentTimeMillis() were too slow. They created the very performance problem that SubMonitor.split is intended to fix. If you want to replicate my results, let me know and I'll try to find the exact benchmark I used.
Note that the API promise of SubMonitor.split is that it is always efficient. Low cancellation latency is a highly desirable trait but not an API promise. If we traded runtime efficiency for cancellation latency, there would be no workaround for those folks than need the performance. Going the other way, there is an easy workaround (it's easy to lower the cancellation latency by adding manual cancellation checks). Hopefully once the performance problems in the jobs framework are fixed, there won't be any need for a tradeoff here. We'll be able to have both performance and low cancellation latency.
Thanks Stefan for the very clear explanation. I trust you about currentTimeMillis and nanoTime performance issues ;) I also realize that the path you've chosen to implement split() is better by letting the user decide when he needs low cancelation latency. Let me know if I can help with bug 445802.
Re: comment 4 Sure, you could help with bug 445802. I estimate the fix will take about 40 hours. Let me know on that bug if you have the time free and I can walk you through the fix I was considering.
I repeated my original analysis and I think I was being overly conservative when I tuned TRIVIAL_OPERATIONS_BEFORE_CANCELLATION_CHECK. I repeated my tests and I think I should be able to decrease the value of TRIVIAL_OPERATIONS_BEFORE_CANCELLATION_CHECK by about 100 times and cancellation checks would still stay under 15% CPU time. Based on this, I can probably fix this right now. Patch forthcoming.
Note that if anyone notices performance regressions (even minor ones) due to this change, I'll have to raise the constant again.
New Gerrit change created: https://git.eclipse.org/r/79695
As far as comment#0 is concerned 1) TRIVIAL_OPERATIONS_BEFORE_CANCELLATION_CHECK=1000 Time taken for cancellation = 3 to 5 seconds 2) TRIVIAL_OPERATIONS_BEFORE_CANCELLATION_CHECK=100 Time taken for cancellation = well within 1 second ( may be half a second or so) 3) TRIVIAL_OPERATIONS_BEFORE_CANCELLATION_CHECK=10 ( current patch) Time taken for cancellation = instantaneous
Please keep an eye out for performance regressions due to this patch. If we see any significant amount of time going into the isCancelled call in SubMonitor.split in ANY job, we'll need to increase the constant... although probably not as high as it was before.
Gerrit change https://git.eclipse.org/r/79695 was merged to [master]. Commit: http://git.eclipse.org/c/equinox/rt.equinox.bundles.git/commit/?id=04d0bef8bc6079faf513c7e893ec321dc4e9b3d9
Stefan, is this something we should downport?
Re: comment 12 Not yet. Despite its simplicity, I think there's some risk to this change. I'd like to let it simmer for a month or so in master before doing a backport.
> I'd like to let it simmer for a month or so in master before > doing a backport. Actually, I take it back. If this causes a performance regression, it will be super-easy to revert from both streams... so we may as well backport if you want to. Sorry for flip-flopping. :-)
Is this back-ported in 4.6.1?
(In reply to Vikas Chandra from comment #15) > Is this back-ported in 4.6.1? AFAICS the change was only merged into master. IMHO this is too late for 4.6.1 and we should wait for 4.6.2 if we want to backport.
Is this backported in 462 now?
Stefan, Can this be backported to 4.6.2? The 4.6.2 stream is open for development.
New Gerrit change created: https://git.eclipse.org/r/84689
Lars or Vikas, if one of you is keen to have a backport, can one of you folks do it?
New Gerrit change created: https://git.eclipse.org/r/85558
> Lars or Vikas,..can one of you folks do it? Sure. (In reply to Eclipse Genie from comment #21) > New Gerrit change created: https://git.eclipse.org/r/85558 Dani, please review and give PMC approval if appropriate.
(In reply to Lars Vogel from comment #22) > > Lars or Vikas,..can one of you folks do it? > > Sure. > > (In reply to Eclipse Genie from comment #21) > > New Gerrit change created: https://git.eclipse.org/r/85558 > > Dani, please review and give PMC approval if appropriate. I'm not an RT PMC member, and I can also not submit it, since I'm not a committer on Equinox.
(In reply to Dani Megert from comment #23) > I'm not an RT PMC member, and I can also not submit it, since I'm not a > committer on Equinox. I'm not sure how the RT PMC works but I added Thomas to give approval or to decline it.
No response from Thomas and RC3 is out of the door, moving back to the 4.7 stream.
(In reply to Lars Vogel from comment #25) > No response from Thomas and RC3 is out of the door, moving back to the 4.7 > stream. Why? 4.6.3 is way ahead.
Is this still on the list for 4.6.3?
This fix is already in the master for over 3 months and "if" there is no issue with the fix, this can be safely ported in 4.6.3. Stefan, Do you think this can be put in 4.6.3?
(In reply to Dani Megert from comment #27) > Is this still on the list for 4.6.3? I would not downport, the job framework received lots to other tweaks also. IMHO not worth the effort to validate the fix, especially that 4.7.0 is not to long away from 4.6.3.
Ok. Closing.