Build 20050328. While profiling memory allocation during a full build, I noticed that when reading in all unit sources, the platform allocates more objects to obtain each unit's charset than to retrieve its char[] source.
Created attachment 19244 (memory profile snapshot)
Thanks. I will investigate how to improve it. Note, though, that determining the correct encoding may involve more work than reading the contents, so I am not surprised it is more costly in terms of object allocation. Just to confirm: the number/percentage columns refer to object allocation operations, not the total size of the objects, correct?
Yes, number of allocations. Also note that when reading contents, we have to buffer the reads because we cannot obtain the file length up front, so our reading is not optimal there.
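To illustrate why reading without a known length costs extra allocations, here is a minimal sketch, assuming the contents arrive through a java.io.Reader of unknown length. The class and method names (BufferedCharReader, readAll) are hypothetical; this is not the platform's actual reading code.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Arrays;

public class BufferedCharReader {
    // Reads all characters from a Reader whose total length is unknown up front.
    // The buffer starts small and doubles when full, so several intermediate
    // char[] arrays may be allocated and then discarded as garbage.
    static char[] readAll(Reader reader, int initialCapacity) throws IOException {
        char[] buffer = new char[Math.max(1, initialCapacity)];
        int length = 0;
        int read;
        while ((read = reader.read(buffer, length, buffer.length - length)) != -1) {
            length += read;
            if (length == buffer.length) {
                // Buffer full: grow it (one more allocation; the old array becomes garbage).
                buffer = Arrays.copyOf(buffer, buffer.length * 2);
            }
        }
        // Trim to the exact size, which is yet another allocation unless it already fits.
        return length == buffer.length ? buffer : Arrays.copyOf(buffer, length);
    }

    public static void main(String[] args) throws IOException {
        char[] source = readAll(new StringReader("class A { }"), 4);
        System.out.println(new String(source));
    }
}
```

With an initial capacity of 4, reading the 11-character source above allocates arrays of 4, 8 and 16 chars plus the final 11-char copy; if the length were known in advance, a single exact-size array would suffice.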
My surprise came from the fact that I use no special encoding in my workspace, only (I presume) the platform default, yet the cost seems huge. Did this also occur in 3.0?
I have not looked into this yet (will soon), but note that user-specified encodings for the workspace and containers in general are only taken into account as a last resort. The main flow for encoding determination is based on content type determination. This has been the case since 3.0.
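The lookup order described in this thread (a charset forced by the user on the file, then the charset derived from the content type, then container and workspace settings as a last resort) can be sketched as follows. This is a simplified illustration under stated assumptions: the maps stand in for the real settings stores, container inheritance is flattened to a single level, and content detection only recognizes a UTF-8 byte order mark or an XML encoding declaration. All names (CharsetResolver, resolve, charsetFromContent) are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CharsetResolver {
    // Matches an encoding attribute in a leading XML declaration.
    private static final Pattern XML_ENCODING =
            Pattern.compile("^<\\?xml[^>]*encoding=[\"']([A-Za-z0-9._-]+)[\"']");

    // Hypothetical settings stores, keyed by file path and container path.
    private final Map<String, String> fileCharsets = new HashMap<>();
    private final Map<String, String> containerCharsets = new HashMap<>();
    private final String workspaceDefault;

    CharsetResolver(String workspaceDefault) {
        this.workspaceDefault = workspaceDefault;
    }

    void setFileCharset(String path, String charset) { fileCharsets.put(path, charset); }

    void setContainerCharset(String container, String charset) { containerCharsets.put(container, charset); }

    // Stand-in for content-type-based detection: a UTF-8 BOM or an XML declaration.
    static Optional<String> charsetFromContent(byte[] contents) {
        if (contents.length >= 3 && (contents[0] & 0xFF) == 0xEF
                && (contents[1] & 0xFF) == 0xBB && (contents[2] & 0xFF) == 0xBF) {
            return Optional.of("UTF-8");
        }
        // ISO-8859-1 maps each byte to one char, so ASCII markup survives decoding.
        String head = new String(contents, 0, Math.min(contents.length, 100), StandardCharsets.ISO_8859_1);
        Matcher m = XML_ENCODING.matcher(head);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    // Order: 1. explicit per-file setting, 2. content-derived charset,
    // 3. container setting, 4. workspace default (the last-resort user settings).
    String resolve(String container, String path, byte[] contents) {
        String explicit = fileCharsets.get(path);
        if (explicit != null) {
            return explicit;
        }
        Optional<String> detected = charsetFromContent(contents);
        if (detected.isPresent()) {
            return detected.get();
        }
        return containerCharsets.getOrDefault(container, workspaceDefault);
    }
}
```

Note that step 1 runs on every call, even though explicit per-file settings are rare; that is where the cost discussed in this report comes from.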
Please disregard my previous comment. Comment 4 states the exact problem: we are going to a lot of effort looking for file-specific settings even when no project settings exist. I got the details from DJ on how to avoid that (Preferences.nodeExists()) and will attach a patch that does so soon. With the patch, the call to CharsetManager#internalGetCharsetFor() (responsible for 60% of the objects created) should be avoided. Bug 90362 needs to be fixed for optimal results.
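The idea behind the fix can be shown with a minimal sketch: check once whether the project has any settings node at all, and only then build per-file keys and query them. This assumes a hypothetical in-memory store in place of the real preference service; the class, its methods, and the "encoding/" key format are illustrative only, and projectNodeExists() merely plays the role that Preferences.nodeExists() plays in the comment above.

```java
import java.util.HashMap;
import java.util.Map;

public class ProjectEncodingLookup {
    // Hypothetical per-project settings: project name -> (settings key -> charset).
    private final Map<String, Map<String, String>> projectSettings = new HashMap<>();
    // Counts per-file lookups actually performed (stand-in for the expensive work).
    int fileLookups = 0;

    // Hypothetical key format for a file-specific encoding setting.
    static String keyFor(String path) {
        return "encoding/" + path;
    }

    void setFileCharset(String project, String path, String charset) {
        projectSettings.computeIfAbsent(project, p -> new HashMap<>()).put(keyFor(path), charset);
    }

    // Analogous to asking whether the project's preference node exists
    // before touching any per-file keys.
    boolean projectNodeExists(String project) {
        return projectSettings.containsKey(project);
    }

    // Returns the charset forced for this file, or null if none was set.
    String explicitFileCharset(String project, String path) {
        if (!projectNodeExists(project)) {
            return null; // cheap early exit: no project settings, so no file-specific setting either
        }
        fileLookups++;
        // Building the key allocates a new String; this is the work the guard skips.
        return projectSettings.get(project).get(keyFor(path));
    }
}
```

For a project without any encoding settings, the early exit means no key strings are built and no per-file lookups happen, which matches the common case described in this report.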
Created attachment 19566 (patch for org.eclipse.core.resources)
Released some minor changes to avoid creating garbage in ContentType#getId().
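The comment does not say what the ContentType#getId() change was. One common way to stop an accessor from creating garbage is to compute a derived String once and return the cached instance; the sketch below assumes the id is built from a namespace and a simple id, which is an assumption, and the class name CachedIdContentType is hypothetical.

```java
public class CachedIdContentType {
    private final String namespace;
    private final String simpleId;
    // Computed once in the constructor so that getId() does not allocate.
    private final String id;

    CachedIdContentType(String namespace, String simpleId) {
        this.namespace = namespace;
        this.simpleId = simpleId;
        this.id = namespace + '.' + simpleId;
    }

    // Allocating variant: builds a new String on every call.
    String getIdAllocating() {
        return namespace + '.' + simpleId;
    }

    // Non-allocating variant: returns the same cached String every time.
    String getId() {
        return id;
    }
}
```

The trade-off is one extra field per instance in exchange for no per-call allocation, which pays off for accessors called as often as content type lookups are during a build.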
Fixed. Released to HEAD. Note that this PR is about creating garbage checking for the file encoding forced by the user (which is seldom done).
My last comment was not complete: Note that this PR is about creating garbage when checking for the file encoding forced by the user (which is seldom done). This extra work was happening for every call to IFile#getCharset. Now it only happens if the user actually has any encoding settings at the project level (for the project in question), which is not that rare. For performance issues around content types, see umbrella bug 57137.