Bug 403531 - [performance] SaxContextParser creates many SimpleDateFormat instances
Summary: [performance] SaxContextParser creates many SimpleDateFormat instances
Status: RESOLVED FIXED
Alias: None
Product: z_Archived
Classification: Eclipse Foundation
Component: Mylyn (show other bugs)
Version: unspecified   Edit
Hardware: PC Windows 7
: P3 minor (vote)
Target Milestone: 3.10   Edit
Assignee: C. Thompson CLA
QA Contact:
URL:
Whiteboard:
Keywords: contributed
Depends on:
Blocks: 356109
  Show dependency tree
 
Reported: 2013-03-15 19:17 EDT by C. Thompson CLA
Modified: 2013-11-18 13:49 EST (History)
3 users (show)

See Also:


Attachments
Screen shot (29.55 KB, image/png)
2013-03-15 19:17 EDT, C. Thompson CLA
no flags Details
Patch for bug (1.64 KB, patch)
2013-03-15 19:26 EDT, C. Thompson CLA
no flags Details | Diff
Performance Test Patch (108.46 KB, application/octet-stream)
2013-10-29 21:11 EDT, C. Thompson CLA
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description C. Thompson CLA 2013-03-15 19:17:08 EDT
Created attachment 228517 [details]
Screen shot

I ran a profiling to use the SaxContextParser and found "SimpleDateFormat" is created every time a start element is hit. That takes about 50% of the computing time for the SaxContextParser. See attached picture.

This should be filed as a subtask of 356109
Comment 1 C. Thompson CLA 2013-03-15 19:26:07 EDT
Created attachment 228518 [details]
Patch for bug

Fix for bug.
Comment 2 Tomasz Zarna CLA 2013-08-05 08:10:02 EDT
Hi Casey, thanks for the patch! Would you mind pushing it to Gerrit [1]?

[1] http://wiki.eclipse.org/Mylyn_Contributor_Reference#Reviews
Comment 3 Sam Davis CLA 2013-08-06 12:09:40 EDT
What version of Mylyn did you profile? I ask because I see "ca.ubc.cs.mylyn" in the screenshot. Also, note that SimpleDateFormat is not threadsafe so we'd need to make sure the field is only accessed by one thread.
Comment 4 C. Thompson CLA 2013-08-06 13:34:09 EDT
I profiled the master from git.
Comment 5 Tomasz Zarna CLA 2013-08-13 05:47:31 EDT
Casey's review: https://git.eclipse.org/r/#/c/15386/

(In reply to comment #3)
> Also, note that SimpleDateFormat is not threadsafe so
> we'd need to make sure the field is only accessed by one thread.

This could be accomplished by using org.apache.commons.lang.time.FastDateFormat, but only if used for formatting. See https://git.eclipse.org/r/#/c/15272/1/org.eclipse.mylyn.reviews.ui/src/org/eclipse/mylyn/internal/reviews/ui/providers/ReviewsLabelProvider.java for an example
Comment 6 Sam Davis CLA 2013-08-13 14:23:42 EDT
(In reply to comment #3)
> I see "ca.ubc.cs.mylyn" in
> the screenshot. 

I am still very curious about this. I assume that was the package name used before Mylyn became an Eclipse project.
Comment 7 C. Thompson CLA 2013-08-13 14:29:06 EDT
I am using mylyn core in my own Ph.D. research project. I am a student at UBC, so it is labeled very similar to the original project name.

My research project needs to open about 5000 context files. It was taking too long. So I profiled and found the biggest sink and fixed it.
Comment 8 C. Thompson CLA 2013-08-13 14:29:40 EDT
Do you want me to make this update?

(In reply to comment #5)
> Casey's review: https://git.eclipse.org/r/#/c/15386/
> 
> (In reply to comment #3)
> > Also, note that SimpleDateFormat is not threadsafe so
> > we'd need to make sure the field is only accessed by one thread.
> 
> This could be accomplished by using
> org.apache.commons.lang.time.FastDateFormat, but only if used for
> formatting. See
> https://git.eclipse.org/r/#/c/15272/1/org.eclipse.mylyn.reviews.ui/src/org/
> eclipse/mylyn/internal/reviews/ui/providers/ReviewsLabelProvider.java for an
> example
Comment 9 Sam Davis CLA 2013-08-13 14:57:10 EDT
I see, thanks for the explanation. It's great that this improves the performance; it would be a valuable contribution.

But I don't think it's guaranteed to be accessed from only one thread at a time. It's not clear to me that we can use FastDateFormat because there's a "slight" incompatibility with SimpleDateFormat to do with time zones, and I don't know what "Only formatting is supported" means. Unless we can make sure that FastDateFormat will work here, I think we should either synchronize access to the formatter by using a synchronized get method or use a ThreadLocal field. The latter might be faster. Tomek, what do you think? I don't know how much this will impact the performance improvement but we'd need to make sure there was still an improvement after making that change.

I wonder if there are other places where we could make the same optimization?
Comment 10 C. Thompson CLA 2013-08-13 15:10:15 EDT
How do I show there is still an improvement after making the change? 

When I made the change to my own project, the profiler didn't show any percentage SimpleDateFormat.<init>. I assume was was sped up by 19%. 

(In reply to comment #9)
> I see, thanks for the explanation. It's great that this improves the
> performance; it would be a valuable contribution.
> 
> But I don't think it's guaranteed to be accessed from only one thread at a
> time. It's not clear to me that we can use FastDateFormat because there's a
> "slight" incompatibility with SimpleDateFormat to do with time zones, and I
> don't know what "Only formatting is supported" means. Unless we can make
> sure that FastDateFormat will work here, I think we should either
> synchronize access to the formatter by using a synchronized get method or
> use a ThreadLocal field. The latter might be faster. Tomek, what do you
> think? I don't know how much this will impact the performance improvement
> but we'd need to make sure there was still an improvement after making that
> change.
> 
> I wonder if there are other places where we could make the same optimization?
Comment 11 Steffen Pingel CLA 2013-08-13 17:12:36 EDT
(In reply to comment #9) 
> But I don't think it's guaranteed to be accessed from only one thread at a time.

Sam, can you clarify which code path you are concerned about? I don't see any sharing of the SAX parser instance between threads.
Comment 12 Sam Davis CLA 2013-08-13 17:20:28 EDT
I think you're right Steffen, the SaxContextContentHandler instance is not shared between threads. In that case, if we just make the dateFormat field non-static it should be fine.
Comment 13 Tomasz Zarna CLA 2013-08-19 05:30:36 EDT
(In reply to comment #8)
> Do you want me to make this update?

See my comments on the review and comment 12 from Sam.

(In reply to comment #9)
> and I don't know what "Only formatting is supported" means. 

You cannot use FastDateFormat for parsing a date, as you could do with SimpleDateFormat


> I think we should either
> synchronize access to the formatter by using a synchronized get method or
> use a ThreadLocal field. The latter might be faster. Tomek, what do you
> think? I don't know how much this will impact the performance improvement
> but we'd need to make sure there was still an improvement after making that
> change.

Yes, that's why I asked for a simple performance test on the review.

> I wonder if there are other places where we could make the same optimization?

Casey, have you noticed any other bottle necks with SimpleDateFormat involved?

(In reply to comment #10)
> How do I show there is still an improvement after making the change? 
> 
> When I made the change to my own project, the profiler didn't show any
> percentage SimpleDateFormat.<init>. I assume was was sped up by 19%. 

With a perf test in place we could assert the same without running the profiler, by executing the test without and with your patch applied.
Comment 14 C. Thompson CLA 2013-08-19 09:53:28 EDT
How do I do a performance test?
Are there examples of performance test in the test suite?
Should I do something like:

long startTime = System.currentTimeMillis();

doSomething();

long finishTime = System.currentTimeMillis();

System.out.println("That took: "+(finishTime-startTime)+ " ms");




(In reply to comment #13)
> Casey, have you noticed any other bottle necks with SimpleDateFormat
> involved?
> 
> (In reply to comment #10)
> > How do I show there is still an improvement after making the change? 
> > 
> > When I made the change to my own project, the profiler didn't show any
> > percentage SimpleDateFormat.<init>. I assume was was sped up by 19%. 
> 
> With a perf test in place we could assert the same without running the
> profiler, by executing the test without and with your patch applied.
Comment 16 C. Thompson CLA 2013-09-23 19:37:57 EDT
I see how I can write a performance test. The strange part is how do I show the difference between the two implementations? 

Do you require only one performance test?
Comment 17 Tomasz Zarna CLA 2013-09-24 06:51:23 EDT
Yes, one perf test would be enough. We will run it twice, without and with the patch to see the diff and if it's worth it, the patch will go in.

I doubt there is a point in having this test to be part of a bigger suite run regularly [1]. I don't even think we have such tests for Mylyn [2]. Even thoug it's going to be a one-off, that would be a great example for other perf improvements to come.

[1] like we used for Eclipse Platform http://archive.eclipse.org/eclipse/downloads/drops/R-3.7.2-201202080800/performance/performance.php
[2] I guess AllPerformanceTests are run manually, on demand.
Comment 18 Sam Davis CLA 2013-10-04 11:32:24 EDT
I have made the change I mentioned in comment 12. With that, I think this is clearly a good change and we should merge it even though we don't have a performance test. Thanks very much, Casey, for the contribution!
Comment 19 C. Thompson CLA 2013-10-04 12:33:50 EDT
I am going to still try getting the performance test out there. I am just having a few problems setting up the test environment.
Comment 20 Sam Davis CLA 2013-10-04 12:39:24 EDT
That's good to hear! I would like to get this fix merged soon, though, so that we can bootstrap on it for a little while before the upcoming release. If the performance test is not ready then, we'll be happy to merge it later in a separate review.
Comment 21 Sam Davis CLA 2013-10-04 16:30:52 EDT
I have merged the change. Feel free to submit a review with the performance test and add me and Tomasz as reviewers.
Comment 22 C. Thompson CLA 2013-10-29 21:11:11 EDT
Created attachment 237052 [details]
Performance Test Patch

I have created a performance test. I have added the patch to the bug.
Comment 23 Tomasz Zarna CLA 2013-10-30 07:16:55 EDT
(In reply to C. Thompson from comment #22)
> I have created a performance test. I have added the patch to the bug.

Please push it to Gerrit.
Comment 24 C. Thompson CLA 2013-10-30 18:59:24 EDT
Gerrit code review:
https://git.eclipse.org/r/#/c/17909/
Comment 25 C. Thompson CLA 2013-10-31 13:20:23 EDT
new Gerrit id:
https://git.eclipse.org/r/#/c/17931/
Comment 26 Sam Davis CLA 2013-10-31 14:19:40 EDT
The change-id in the commit message identifies the review. To push a new patch set to a review you need to amend the commit and keep the same change id. Please abandon one of the 2 reviews you pushed. Thanks!
Comment 27 C. Thompson CLA 2013-11-15 19:04:24 EST
What needs to be done to get this task closed? Should I make another patch and push it to gerrit?
Comment 28 Sam Davis CLA 2013-11-18 13:49:58 EST
Currently you have the following 2 reviews:

17909: https://bugs.eclipse.org/bugs/show_bug.cgi?id=403531 [Ia8298fea]
https://git.eclipse.org/r/#/c/17909/

17931: 403531: Made changes according to comments on gerrit Bug: 403531 Task-Url: https://bugs.eclipse.org/bugs/show_bug.cgi?id=403531 [Id1113b56]
https://git.eclipse.org/r/#/c/17931/

The second depends on the first. Instead we should have one review which contains the changes from both and has a summary describing the change, e.g. "add a performance test for SaxContextParser"

I suggest that you restore the first review, abandon the second review, and ammend the first review so it includes the changes from the second.