Bug 89287 - [encoding] Reading unit contents allocates tons of objects when retrieving charset
Status: RESOLVED FIXED
Alias: None
Product: Platform
Classification: Eclipse Project
Component: Resources
Version: 3.1
Hardware: PC Windows XP
Importance: P3 normal
Target Milestone: 3.1 M7
Assignee: Rafael Chaves CLA
QA Contact:
URL:
Whiteboard:
Keywords: performance
Depends on: 90362
Blocks:
Reported: 2005-03-28 16:14 EST by Philipe Mulet CLA
Modified: 2005-04-06 13:28 EDT

See Also:


Attachments
memory profile snapshot (33.03 KB, image/gif)
2005-03-28 16:17 EST, Philipe Mulet CLA
patch for org.eclipse.core.resources (5.46 KB, patch)
2005-04-05 16:34 EDT, Rafael Chaves CLA

Description Philipe Mulet CLA 2005-03-28 16:14:57 EST
Build 20050328

When profiling some full build memory allocation, I noticed that when reading in
all unit sources, the platform allocates more objects to obtain the unit charset
than to retrieve the char[] source.
Comment 1 Philipe Mulet CLA 2005-03-28 16:17:59 EST
Created attachment 19244
memory profile snapshot
Comment 2 Rafael Chaves CLA 2005-03-28 16:39:07 EST
Thanks. I will investigate how to improve it. But note that determining the
correct encoding might involve more work than reading the contents, so I am not
surprised it is more costly in terms of object allocation.

Just to confirm: the number/percentage columns refer to object allocation
operations, not total size of objects, correct?
Comment 3 Philipe Mulet CLA 2005-03-28 17:35:07 EST
Yes, number of allocations. Also, note that when reading contents, we need to
buffer the reads, since we cannot obtain the file length up front; so our
reading is not optimal there either.
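
To illustrate why reading without a known length costs extra allocations, here
is a minimal sketch. It is not the JDT or platform code; the class and method
names (BufferedReadSketch, readAll) and the initial capacity are assumptions
made for illustration only.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch, not the actual Eclipse implementation.
class BufferedReadSketch {

    // Reads every char from a stream whose length is not known in advance.
    static char[] readAll(InputStream in, Charset charset, int initialCapacity) {
        char[] contents = new char[Math.max(1, initialCapacity)];
        int length = 0;
        try (Reader reader = new InputStreamReader(in, charset)) {
            int read;
            while ((read = reader.read(contents, length, contents.length - length)) != -1) {
                length += read;
                if (length == contents.length) {
                    // Buffer is full: grow it. Each growth is an extra allocation
                    // that a known file length would have avoided.
                    contents = Arrays.copyOf(contents, contents.length * 2);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Trim to the number of chars actually read (one more allocation).
        return Arrays.copyOf(contents, length);
    }

    public static void main(String[] args) {
        byte[] bytes = "class A {}".getBytes(StandardCharsets.UTF_8);
        char[] source = readAll(new ByteArrayInputStream(bytes), StandardCharsets.UTF_8, 4);
        System.out.println(new String(source));
    }
}
```

With the length known up front, the buffer could be sized once and both the
growth copies and the final trim would disappear.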
Comment 4 Philipe Mulet CLA 2005-04-05 04:29:49 EDT
I was surprised because I set no special encoding in my workspace (I presume it
uses the platform default), and yet the cost seems to be huge.

Did this occur in 3.0?
Comment 5 Rafael Chaves CLA 2005-04-05 10:54:57 EDT
I have not looked into this yet (will soon), but note that user-specified
encodings for the workspace and containers in general are only taken into
account as a last resort. The main flow for encoding determination is based on
content type determination. This has been the case since 3.0.
Comment 6 Rafael Chaves CLA 2005-04-05 16:33:48 EDT
Please disregard my previous comment. Comment 4 states the exact problem: we are
spending a lot of effort looking for file-specific settings when no project
settings exist. Got the details from DJ on how to avoid that
(Preferences.nodeExists()). Will attach a patch that does that soon. With the
patch, the call to CharsetManager#internalGetCharsetFor() (responsible for 60%
of objects created) should be avoided.

Need bug 90362 fixed for optimal results.
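
For readers following along, the idea can be sketched roughly as below. This is
not the attached patch: Node is a hypothetical in-memory stand-in for a
preference node, and the "encoding" child name, the charsetFor method and the
expensiveLookups counter are assumptions made for illustration only.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the "check nodeExists() first" guard.
class CharsetLookupSketch {

    // Minimal stand-in for a preference node (assumption, not the OSGi type).
    static final class Node {
        final Map<String, Node> children = new HashMap<>();
        final Map<String, String> values = new HashMap<>();

        // Cheap existence check that does not create anything.
        boolean nodeExists(String name) {
            return children.containsKey(name);
        }

        // Returns the child node, creating it if absent.
        Node node(String name) {
            return children.computeIfAbsent(name, k -> new Node());
        }
    }

    // Counts how often the costly per-file lookup actually runs.
    static int expensiveLookups = 0;

    static String charsetFor(Node projectPrefs, String filePath, String fallback) {
        // Guard: with no encoding settings for the project, skip the
        // file-specific lookup entirely.
        if (!projectPrefs.nodeExists("encoding")) {
            return fallback;
        }
        expensiveLookups++;
        String explicit = projectPrefs.node("encoding").values.get(filePath);
        return explicit != null ? explicit : fallback;
    }

    public static void main(String[] args) {
        Node plainProject = new Node();
        Node configuredProject = new Node();
        configuredProject.node("encoding").values.put("src/A.java", "ISO-8859-1");

        System.out.println(charsetFor(plainProject, "src/A.java", "UTF-8"));
        System.out.println(charsetFor(configuredProject, "src/A.java", "UTF-8"));
        System.out.println("expensive lookups: " + expensiveLookups);
    }
}
```

The point of the guard is that the common case (a project with no encoding
settings at all) returns before any per-file work is done.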

Comment 7 Rafael Chaves CLA 2005-04-05 16:34:51 EDT
Created attachment 19566
patch for org.eclipse.core.resources
Comment 8 Rafael Chaves CLA 2005-04-05 17:05:28 EDT
Released some minor changes to avoid creating garbage in ContentType#getId().
Comment 9 Rafael Chaves CLA 2005-04-06 13:21:41 EDT
Fixed. Released to HEAD.

Note that this PR is about creating garbage checking for the file encoding
forced by the user (which is seldom done).
Comment 10 Rafael Chaves CLA 2005-04-06 13:28:03 EDT
My last comment was not complete: 

Note that this PR is about creating garbage when checking for the file encoding
forced by the user (which is seldom done). This extra work was happening for
every call to IFile#getCharset. Now it only happens if the user actually has any
encoding settings at the project level (for the project in question), which is
not that rare.

For performance issues around content types, see umbrella bug 57137.