Bug 571252 - [performance] cache JavaElement.javaProject
Summary: [performance] cache JavaElement.javaProject
Status: VERIFIED FIXED
Alias: None
Product: JDT
Classification: Eclipse Project
Component: Core
Version: 4.20
Hardware: All
OS: All
Importance: P3 enhancement
Target Milestone: 4.21 M1
Assignee: Jörg Kubitz CLA
QA Contact:
URL:
Whiteboard:
Keywords: performance
Depends on: 574115
Blocks:
Reported: 2021-02-17 06:36 EST by Jörg Kubitz CLA
Modified: 2021-07-08 01:28 EDT

See Also:


Attachments
Screenshot of VisualVM 2.0.5 during full build showing getJavaProject (132.81 KB, image/png)
2021-02-17 06:37 EST, Jörg Kubitz CLA
sampling changes compared (54.62 KB, image/png)
2021-03-22 07:15 EDT, Jörg Kubitz CLA

Description Jörg Kubitz CLA 2021-02-17 06:36:17 EST
Since parent is effectively final, the result will never change.
We have a large Xtext project where JavaElement.getJavaProject() is a hotspot.

The same optimisation can be applied to 
 getJavaModel()
 getAncestor()
 isAncestorOf()
even though they are not known hotspots for us. It is cheap to cache.
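The caching idea can be sketched as follows. This is a hypothetical, simplified stand-in for the real JavaElement hierarchy (the names and structure here are assumptions, not JDT's actual code): because the parent chain never changes, the result of the ancestor walk can be stored on first use.

```java
// Simplified, hypothetical sketch of caching an ancestor lookup on an
// effectively-final parent chain; not the actual JDT JavaElement code.
class Element {
    final Element parent;       // never reassigned, so the lookup result is stable
    private Element cachedRoot; // lazily filled; racy writes all store the same
                                // value, so no synchronization is needed

    Element(Element parent) {
        this.parent = parent;
    }

    /** Stands in for getJavaProject()/getJavaModel(): walk up once, then cache. */
    Element getRoot() {
        Element root = cachedRoot;
        if (root == null) {
            root = (parent == null) ? this : parent.getRoot();
            cachedRoot = root; // benign data race: the write is idempotent
        }
        return root;
    }
}
```

The unsynchronized lazy field works here only because the computed value is immutable per element; concurrent first calls may each do the walk, but they all store the same reference.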
Comment 1 Jörg Kubitz CLA 2021-02-17 06:37:26 EST
Created attachment 285577 [details]
Screenshot of VisualVM 2.0.5 during full build showing getJavaProject
Comment 2 Eclipse Genie CLA 2021-02-17 07:00:09 EST
New Gerrit change created: https://git.eclipse.org/r/c/jdt/eclipse.jdt.core/+/176407
Comment 3 Jörg Kubitz CLA 2021-02-17 08:27:55 EST
Did not cache:

 getAncestor(r)
 isAncestorOf(e)

since they take an argument, which makes caching overkill when they are not a hotspot.
Comment 4 Manoj N Palat CLA 2021-03-19 00:02:17 EDT
@Jörg : any performance data available showing before and after patch ?
Comment 5 Jörg Kubitz CLA 2021-03-22 07:14:01 EDT
> @Jörg : any performance data available showing before and after patch ?
Hi Manoj, I compared two recent samplings with and without the patch. The total time used (according to sampling) was reduced from 968 ms to 191 ms, i.e. about 5x faster.
Comment 6 Jörg Kubitz CLA 2021-03-22 07:15:39 EDT
Created attachment 285907 [details]
sampling changes compared
Comment 7 Manoj N Palat CLA 2021-04-22 05:18:52 EDT
Moving target to the next M1
Comment 9 Jay Arthanareeswaran CLA 2021-06-08 02:29:36 EDT
Fabrice, since you reviewed this,

did you consider the extra memory the caching is going to cost us? Do you have numbers for that? Please share them here.
Comment 10 Jay Arthanareeswaran CLA 2021-06-08 05:55:44 EDT
(In reply to Jörg Kubitz from comment #6)
> Created attachment 285907 [details]
> sampling changes compared

Is the CPU figure given an average or the actual value? This comparison is not helping, since the number of hits is not the same (25 vs 111). Are we seeing fewer hits to getJavaProject() after the patch?

It will be a lot easier if the snapshots themselves are attached rather than a screenshot. Can you please do that?
Comment 11 Jörg Kubitz CLA 2021-06-08 06:16:50 EDT
(In reply to Jay Arthanareeswaran from comment #10)
> (In reply to Jörg Kubitz from comment #6)
> > Created attachment 285907 [details]
> > sampling changes compared
> 
> Is the CPU figure given is average or actual? This comparison is not helping
> since the number of hits is not same (25 vs 111). Are we experiencing less
> number of hits to getJavaProject() after the patch?
> 
> It will be lot easier if the snapshots themselves are attached rather than a
> screenshot. Can you please do that?

I doubt I am allowed to share stack traces of the proprietary, closed-source code involved.
The figures are from *sampling* done with VisualVM. Sampling works by taking stack traces at a fixed rate. The number of stack traces that land in a certain function is reported as "Hits": the more hits, the more time was spent in that function. Provided the sampling frequency is high enough, the time spent is approximately proportional to the number of hits. Reducing that time was the very reason for this change. => It is the EXPECTED and WANTED behaviour that the number of hits goes down.
Please do not confuse "sampling" with "profiling". Asking about "average or actual" does not make sense in the context of sampling.
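The proportionality argument can be illustrated with hypothetical helper code (these are not VisualVM APIs, and the interval value is an assumption standing in for whatever rate the sampler was configured with):

```java
// Illustration of the sampling argument from comments 5 and 11: with a
// fixed sampling interval, time spent in a method is roughly hits * interval,
// so fewer hits directly means less time spent.
class SamplingMath {
    // Hypothetical helper: estimated time attributed to a method.
    static double estimatedMillis(int hits, double intervalMillis) {
        return hits * intervalMillis;
    }

    // Ratio of before/after totals, as quoted in comment 5.
    static double speedup(double beforeMillis, double afterMillis) {
        return beforeMillis / afterMillis;
    }
}
```

With the totals from comment 5, `speedup(968, 191)` is about 5.07, matching the quoted "5x faster".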
Comment 12 Jay Arthanareeswaran CLA 2021-06-09 00:09:35 EDT
(In reply to Jörg Kubitz from comment #11)
> I doubt i am allowed to share stacktraces of proprietary closed source code
> involved.
> The figures are from *sampling* done with VisualVM. This works by taking
> stacktraces at a fixed rate. The number of stacktraces for a certain
> function are reported as "Hits". The more "Hits" the more time was spend on
> that function. Given the sampling frequency is high enough the time spend is
> approximately proportional to the number of "Hits". It was the very reason
> for this change to reduce the time. => It is the EXPECTED and WANTED
> behaviour to reduce the number of Hits.
> Please do not confuse "sampling" with "profiling". To ask about "average or
> actual" does not make any sense in context of sampling.

I see. Sorry, I didn't notice you mentioning sampling earlier, and I am not that familiar with the VisualVM interface. That's why a snapshot is much more useful than a trimmed screenshot that nobody other than you has any clue about. For changes like these, which involve so many files and touch APIs, we can never be too careful.

Also, do respond to comment #9 as well. I doubt Fabrice had time to think about it.
Comment 13 Sravan Kumar Lakkimsetti CLA 2021-06-09 00:18:56 EDT
Reopening, as we need to investigate the comparator errors this caused; see bug 574089
Comment 14 Sravan Kumar Lakkimsetti CLA 2021-06-09 00:54:01 EDT
(In reply to Sravan Kumar Lakkimsetti from comment #13)
> Reopening as we need to investigate comparator errors caused see bug 574089

Sorry, wrong bug: it is bug 574088
Comment 15 Manoj N Palat CLA 2021-06-09 01:27:13 EDT
(In reply to Eclipse Genie from comment #8)
> Gerrit change https://git.eclipse.org/r/c/jdt/eclipse.jdt.core/+/176407 was
> merged to [master].

@Fabrice: I had assigned Kalyan as an additional reviewer in the gerrit - You need to wait for his +1 before pushing this change. Please be mindful of this in the future. Thanks
Comment 16 Jörg Kubitz CLA 2021-06-09 01:33:39 EDT
(In reply to Jay Arthanareeswaran from comment #9)
> did you consider the extra memory the caching is going to cost us? Do you
> have the numbers for the same? Please share it here.

I added two references per JavaElement, which are at most (if not erased by the compiler) two 32-bit pointers when using a 32-bit JVM, or a 64-bit JVM with compressed references. Without compressed references it would be two 64-bit pointers on a 64-bit JVM.
I did not measure it because it follows directly from the specification.
Feel free to verify it with any use case.
Comment 17 Jay Arthanareeswaran CLA 2021-06-09 01:43:55 EDT
(In reply to Jörg Kubitz from comment #16)
> I added 2 References per JavaElement which are at most (if not erased by
> compiler)) 2*32 Bit pointers if using 32 Bit JVM or 64 Bit JVM with
> compressed references. Without compressed references it would be 2*64 Bit on
> a 64 Bit JVM.
> I did not measure it because it is obvious by specification.
> Feel free to verify it with any usecase.

That much is obvious from your patch. What I want to know is: for the performance boost you are claiming (for your use case, that is), what is the cost in terms of memory? So far you have not shared any data showing how many JavaElement objects were created. Correct me if I am wrong.
Comment 18 Jörg Kubitz CLA 2021-06-09 03:38:35 EDT
(In reply to Jay Arthanareeswaran from comment #17)

The measurement is now 3 months old - I cannot reproduce exactly that anymore. Also, I doubt the size of my workspace is relevant to you, nor could you verify it. Nor is memory any issue on my setup.

If I clean-compile "org.eclipse.jdt.core" (something you could verify, and which does not involve Xtext), I end up with:
 27268 JavaElements created
194143 calls to getJavaProject() (i.e. ~ 7 calls per element)
 24864 calls to getJavaModel() (i.e. ~ 1 call per element)

27268 JavaElements * 8 bytes = 0.2 MB
Comment 19 Jörg Kubitz CLA 2021-06-09 04:59:14 EDT
Out of curiosity: in our current setup we have
 10,043,007 JavaModel instances
 32,818,934 calls to getJavaProject(), i.e. ~3 calls per instance during a build

The speedup for us comes at the expense of ~ 80 MB.
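The arithmetic in comments 18 and 19 can be checked directly. This sketch assumes the 8-bytes-per-element figure from comment 16 (two extra reference fields at 4 bytes each with compressed references); the class and method names are illustrative, not JDT code.

```java
// Back-of-envelope footprint of the cache: two extra reference fields per
// element; with compressed references each field is 4 bytes.
class CacheFootprint {
    static long extraBytes(long elementCount, int bytesPerReference) {
        return elementCount * 2L * bytesPerReference;
    }
}
```

This yields 27,268 × 8 = 218,144 bytes (~0.2 MB) for the jdt.core build, and 10,043,007 × 8 = 80,344,056 bytes (~80 MB) for the setup above.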
Comment 20 Jay Arthanareeswaran CLA 2021-06-09 05:52:23 EDT
(In reply to Jörg Kubitz from comment #18)
> (In reply to Jay Arthanareeswaran from comment #17)
> 
> The measurement is now 3 month old - i cannot reproduce exactly that
> anymore. Also i doubt the size of my workspace is relevant to you nor could
> you verify it. Nor is memory any issue on my setup.

I just wanted to see the trade-off.

(In reply to Jörg Kubitz from comment #19)
> For curiosity: In our current setup we have
>  10,043,007 JavaModel instances
>  32,818,934 calls to getJavaProject() i.e. ~ 3 calls per instance during
> build
> 
> The speedup for us comes at the expense of ~ 80 MB.

That's the number I wanted to see. A memory demand of this magnitude can't be called negligible, IMO.

In my full JDT workspace, a workspace open followed by a full build creates 364,098 elements, and another clean build creates 180,006 Java element objects. One can argue that these get GCed, but we have to make sure we don't push the default heap limit.
Comment 21 Jörg Kubitz CLA 2021-06-09 06:19:26 EDT
(In reply to Jay Arthanareeswaran from comment #20)
> > The speedup for us comes at the expense of ~ 80 MB.
> 
> That's the number I wanted to see. A memory demand of this magnitude can't
> be called negligible, IMO.

For our workspace - where we allocate hundreds of GB at the same time - it is just noise.

> In my full JDT workspace, a workspace open followed by a full build creates
> 364,098 elements and another clean build creates 180,006 Java element
> Objects.  One can argue that these get GCed, but we have to make sure we
> don't push the default heap limit.

Then please share numbers on how much it influenced your heap usage. It should be well below 2 MB; that is OK for me.
Comment 22 Jay Arthanareeswaran CLA 2021-06-09 06:35:28 EDT
(In reply to Jörg Kubitz from comment #21)
> For our workspace - where we allocate hundreds of GB at the same time - it
> is just noise.
>
> Then please share the numbers how much it influenced your heap limit. It
> should be well below 2 MB. thats ok for me.

That's the thing. I don't have the liberty to say "that's okay for me" :), even though I run with a heap much larger than the default of 2 GB and didn't see any impact on memory with your patch. But I have to think about the many others who may never touch their memory settings. For the record, this is not the first time somebody has said "memory is cheap these days". As always, we have to think about all types of Eclipse users, so we try to get the full picture before letting any performance fix in.

Anyway, the change is in, and I suppose we can let this remain and see how it goes.
Comment 23 Jörg Kubitz CLA 2021-06-09 06:44:44 EDT
> That's the thing. I don't have the liberty to say "that's okay for me" :),
> even though I run it with heap that's much larger than the default of 2GB
> and I didn't see any impact on the memory with your patch. But then I have
> to think about several others who may never touch their memory settings. For
> the record, this is not the first time somebody said, "memory is cheap these
> days". But as always, we have to think about all type of Eclipse users. So,
> we try to get the full picture before letting any performance fix in.

Sorry, I did not know how much JDT worries about memory. We could probably shrink that memory by not caching the JavaModel. I am not familiar with the details of that model hierarchy, but isn't the JavaModel always the parent of the (cached) JavaProject? (Or even always the JavaModelManager.getJavaModelManager().getJavaModel() singleton?)
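If the model really is always reachable from the cached project, the second field could be dropped. A hypothetical sketch with simplified types (assuming the parent relation asked about above actually holds; none of these names are JDT's):

```java
// Hypothetical sketch of comment 23's idea: derive the model from the
// already-cached project instead of caching a second reference per element.
class Model { }

class Project {
    final Model parentModel; // assumption: the model is always the project's parent
    Project(Model m) { this.parentModel = m; }
}

class Elem {
    private final Project cachedProject; // the one cached reference that is kept
    Elem(Project p) { this.cachedProject = p; }

    Project getProject() { return cachedProject; }
    Model getModel() { return getProject().parentModel; } // no extra field needed
}
```

This would halve the per-element overhead (one reference instead of two) at the cost of one extra pointer dereference in getModel().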
Comment 24 Sarika Sinha CLA 2021-06-10 10:26:14 EDT
I was wondering if we could add an option for people to decide whether they want performance over memory?

So projects that can afford more memory could use this feature by selecting a preference?
Comment 25 Jörg Kubitz CLA 2021-06-10 10:31:19 EDT
(In reply to Sarika Sinha from comment #24)
> I was wondering if we could add an option for people to decide if they want
> to choose performance over memory? 
> 
> So projects affording more memory can use this feature by selecting a
> preference?

In general a nice idea. In this particular case it is impossible (or far too complicated): even a null reference field occupies the same amount of memory inside the data structure. Also, introducing conditionals for the option might itself cost performance.
Comment 26 Sarika Sinha CLA 2021-06-10 10:39:27 EDT
(In reply to Jörg Kubitz from comment #25)
> (In reply to Sarika Sinha from comment #24)
> > I was wondering if we could add an option for people to decide if they want
> > to choose performance over memory? 
> > 
> > So projects affording more memory can use this feature by selecting a
> > preference?
> 
> In general a nice idea. In this particular case impossible (or way too
> complicated) - even an empty reference eats the same amount inside the
> datastructure. Also introducing conditionals for the option might eat
> performance.

Yes, it will be tricky to achieve here. We would need two different sets of implementations, and one of the sets would be loaded based on the selection.
Comment 27 Jörg Kubitz CLA 2021-06-10 11:03:22 EDT
(In reply to Sarika Sinha from comment #26)
> Yes, it will be tricky to achieve here. We need 2 different set of
> implementations itself. 

Having two different sets of implementations would render the types not effectively final, which could introduce conditional jumps in the code generated by the HotSpot compiler, i.e. a performance loss.
Comment 29 Jay Arthanareeswaran CLA 2021-07-08 01:28:59 EDT
Verified for 4.21 M1.