89287 – [encoding] Reading unit contents allocated tons of object when retrieving charset

Status:	RESOLVED FIXED

Alias:	None

Product:	Platform
Classification:	Eclipse Project
Component:	Resources
Version:	3.1
Hardware:	PC Windows XP

Importance:	P3 normal
Target Milestone:	3.1 M7
Assignee:	Rafael Chaves
QA Contact:

URL:
Whiteboard:
Keywords:	performance

Depends on:	90362
Blocks:

Reported:	2005-03-28 16:14 EST by Philipe Mulet
Modified:	2005-04-06 13:28 EDT
CC List:	0 users

See Also:

Attachments
memory profile snapshot (33.03 KB, image/gif) 2005-03-28 16:17 EST, Philipe Mulet
patch for org.eclipse.core.resources (5.46 KB, patch) 2005-04-05 16:34 EDT, Rafael Chaves

Description Philipe Mulet

2005-03-28 16:14:57 EST

Build 20050328

When profiling some full build memory allocation, I noticed that when reading in
all unit sources, the platform allocates more objects to obtain the unit charset
than to retrieve the char[] source.

Comment 1 Philipe Mulet

2005-03-28 16:17:59 EST

Created attachment 19244
memory profile snapshot

Comment 2 Rafael Chaves

2005-03-28 16:39:07 EST

Thanks. I will investigate how to improve it. But note that determining the
correct encoding might involve more work than reading the contents, so I am not
surprised it is more costly in terms of object allocation.

Just to confirm: the number/percentage columns refer to object allocation
operations, not total size of objects, correct?

Comment 3 Philipe Mulet

2005-03-28 17:35:07 EST

Yes, number of allocations. Also, note that when reading contents, we need to
buffer the reads, since we cannot obtain the file length up front; so our
reading is not optimal there either.
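
For illustration, here is a minimal sketch of what reading contents without a known length tends to look like. It is not the JDT code: the class and method names are hypothetical, and the doubling strategy is an assumption. It shows where the extra allocations come from: the buffer has to grow as data arrives and is trimmed at the end.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Illustrative only: hypothetical names, not the actual JDT implementation.
public class BufferedReadSketch {
    // Reads everything from the reader when the total length is unknown.
    // IOException is wrapped unchecked only to keep the sketch short to call.
    static char[] readAll(Reader reader, int initialCapacity) {
        char[] contents = new char[Math.max(initialCapacity, 1)];
        int length = 0;
        try {
            while (true) {
                if (length == contents.length) {
                    // Buffer is full: allocate a larger one and copy (extra garbage).
                    contents = Arrays.copyOf(contents, contents.length * 2);
                }
                int read = reader.read(contents, length, contents.length - length);
                if (read == -1) {
                    break; // end of stream
                }
                length += read;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Trim to the exact size; one more allocation unless the guess was exact.
        return length == contents.length ? contents : Arrays.copyOf(contents, length);
    }

    public static void main(String[] args) {
        char[] source = readAll(new StringReader("class A {}"), 4);
        System.out.println(new String(source));
    }
}
```

If the length were known in advance, a single exactly-sized char[] would be enough.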

Comment 4 Philipe Mulet

2005-04-05 04:29:49 EDT

My surprise came from the fact that I use no special encoding in my workspace,
only the platform default I presume, and yet the cost seems to be huge.

Did this occur in 3.0?

Comment 5 Rafael Chaves

2005-04-05 10:54:57 EDT

I have not looked into this yet (will soon), but note that user-specified
encodings for the workspace and containers in general are only taken into
account as a last resort. The main flow for encoding determination is based on
content type determination. This has been the case since 3.0.

Comment 6 Rafael Chaves

2005-04-05 16:33:48 EDT

Please disregard my previous comment. Comment 4 states the exact problem: we are
making a lot of effort looking for file-specific settings when no project
settings exist. Got the details from DJ on how to avoid that
(Preferences.nodeExists()). Will attach a patch that does that soon. With the
patch, the call to CharsetManager#internalGetCharsetFor() (responsible for 60%
of objects created) should be avoided when there are no project settings.

Need bug 90362 fixed for optimal results.
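
For illustration, the guard described in this comment can be sketched as follows. This is not the attached patch: it uses no Eclipse API, every name is hypothetical, and a plain map stands in for the preference tree. The only idea taken from the comment is to do a cheap existence check first (the role played by Preferences.nodeExists()) and skip the per-file lookup when the project has no settings at all.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: every name here is hypothetical, not Eclipse API.
public class CharsetLookupSketch {
    // Stand-in for the preference tree: project name -> (file path -> forced charset).
    static final Map<String, Map<String, String>> projectSettings = new HashMap<>();

    // Counts how often the costly per-file lookup runs.
    static int expensiveLookups = 0;

    // Cheap existence check, playing the role of Preferences.nodeExists().
    static boolean nodeExists(String project) {
        return projectSettings.containsKey(project);
    }

    // Stand-in for the per-file search that produced most of the garbage.
    static String fileSpecificCharset(String project, String path) {
        expensiveLookups++;
        Map<String, String> node = projectSettings.get(project);
        return node == null ? null : node.get(path);
    }

    // Returns the user-forced charset if there is one, else the supplied fallback.
    static String getCharset(String project, String path, String fallback) {
        if (!nodeExists(project)) {
            return fallback; // no project settings: skip the per-file search
        }
        String forced = fileSpecificCharset(project, path);
        return forced != null ? forced : fallback;
    }

    public static void main(String[] args) {
        Map<String, String> node = new HashMap<>();
        node.put("src/B.java", "ISO-8859-1");
        projectSettings.put("configured", node);
        System.out.println(getCharset("plain", "src/A.java", "UTF-8"));
        System.out.println(getCharset("configured", "src/B.java", "UTF-8"));
        System.out.println("expensive lookups: " + expensiveLookups);
    }
}
```

In this sketch the costly path is only taken for the project that actually has settings; the fallback argument stands in for whatever the normal encoding determination would return.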

Comment 7 Rafael Chaves

2005-04-05 16:34:51 EDT

Created attachment 19566
patch for org.eclipse.core.resources

Comment 8 Rafael Chaves

2005-04-05 17:05:28 EDT

Released some minor changes to avoid creating garbage in ContentType#getId().

Comment 9 Rafael Chaves

2005-04-06 13:21:41 EDT

Fixed. Released to HEAD.

Note that this PR is about creating garbage when checking for the file encoding
forced by the user (which is seldom done).

Comment 10 Rafael Chaves

2005-04-06 13:28:03 EDT

My last comment was not complete: 

Note that this PR is about creating garbage when checking for the file encoding
forced by the user (which is seldom done). This extra work was happening for
every call to IFile#getCharset. Now it only happens if the user actually has any
encoding settings at the project level (for the project in question), which is
not that rare.

For performance issues around content types, see umbrella bug 57137.