Bug 83949 - [api] Provide char based API in IDocument, IDocumentProvider and ITextStore
Summary: [api] Provide char based API in IDocument, IDocumentProvider and ITextStore
Status: RESOLVED DUPLICATE of bug 75086
Alias: None
Product: Platform
Classification: Eclipse Project
Component: Text
Version: 3.0
Hardware: All
OS: All
Importance: P3 enhancement
Target Milestone: ---
Assignee: Platform-Text-Inbox CLA
QA Contact:
URL:
Whiteboard:
Keywords: performance
Depends on:
Blocks:
 
Reported: 2005-01-28 12:17 EST by Keith McQueen CLA
Modified: 2005-02-22 12:31 EST
CC List: 0 users

See Also:


Attachments
Zip file containing LARGE text file (1.52 MB, application/x-zip-compressed)
2005-02-22 12:19 EST, Keith McQueen CLA

Description Keith McQueen CLA 2005-01-28 12:17:10 EST
I've noticed some inefficiencies in IDocument, IDocumentProvider and
ITextStore (or their implementations) which could be resolved very simply by
adding overloads to the public void set(String) method.  My suggestion is that
extensions be provided declaring the following methods: public void set(char[])
and, optionally, public void set(byte[]).  This would work much better for
large documents using the GapTextStore.  Because the GapTextStore uses a char[]
internally anyway, efficiency is lost (see the
setDocumentContent(IDocument, InputStream, String) method): the input stream is
read into a StringBuffer, the resulting String is passed to IDocument.set(),
which passes it on to the text store, which, in most cases, turns it back into
a char[].  It would be faster and consume less memory if the String/StringBuffer
were not created until absolutely necessary.  To make a long story longer: if
the input stream were simply read into a char[] and that char[] passed to
IDocument.set(), then only those documents/text stores that actually need a
String would have to construct one from the char[].  The GapTextStore would be
able to use the char[] immediately, as is.  I would be more than willing to
make these very simple changes if I were given permission to at least submit
them for approval.
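
To illustrate, here is a rough sketch of the kind of helper I have in mind (the
class and method names below are purely illustrative, not existing API):

    import java.io.*;

    // Illustrative only: read the input stream straight into a char[] so that
    // no intermediate StringBuffer/String copy of the whole document is made.
    class CharArrayLoader {
        static char[] readIntoCharArray(InputStream stream, String encoding)
                throws IOException {
            Reader in = new InputStreamReader(stream, encoding);
            CharArrayWriter out = new CharArrayWriter();
            char[] buffer = new char[8192];
            int read;
            while ((read = in.read(buffer)) != -1)
                out.write(buffer, 0, read);
            return out.toCharArray();
        }
    }

The resulting char[] could then be handed to a hypothetical IDocument.set(char[])
overload; the GapTextStore could use the array as is, while String-based text
stores could still construct a String from it themselves.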

Note that this really does become a performance issue with large files.  My team
is writing an Eclipse-based product.  Our project manager tried opening a large
(31MB) file, which could not be opened until the heap size was set to over 512MB.
This memory overhead is directly related to the use of the StringBuffer/String
where it simply is not called for.  If I could fix this issue it would result in
improved performance across the board for all Eclipse users.

Sincerely,

Keith McQueen (hopeful eclipse developer)
Comment 1 Dani Megert CLA 2005-02-01 05:52:19 EST
Keith, you can provide a patch which implements your suggestions. Things to take
care of:

- existing API cannot be removed and has to be supported
- ensure that all tests in the following test projects still run:
	org.eclipse.core.filebuffers.tests
	org.eclipse.jdt.text.tests
	org.eclipse.jdt.ui.tests
	org.eclipse.jdt.ui.tests.refactoring
	org.eclipse.jface.text.tests
	org.eclipse.text.tests
	org.eclipse.ui.editors.tests
	org.eclipse.ui.workbench.texteditor.tests
	org.eclipse.ltk.core.refactoring.tests
	org.eclipse.ltk.ui.refactoring.tests
- write new test cases for the new functionality

Of course the benefit will only be achieved for clients that actually switch to
the new API (a quick search found over 200 clients of IDocument.set(String), but
most of them are in test cases).
Comment 2 Keith McQueen CLA 2005-02-02 11:33:35 EST
I have been developing with Eclipse, for an application based on the Eclipse
platform, but this would be my first foray into actually developing for Eclipse.
Is there a protocol, or a set of guidelines to follow?  I have managed to set
up the Eclipse CVS repository, so I can check things out, but how exactly will I
submit changes back to the CVS store?  It appears that there are relatively few
people with such privileges.  Some more pointers would be nice.  Thank you very much.

Keith
Comment 3 Dani Megert CLA 2005-02-02 13:08:04 EST
You load one of the latest versions from CVS (e.g. last I-build) and then apply
your changes. For each project you then create a patch (select the project,
context menu > Team > Create Patch...) and attach it to this PR for review.

>Is there a protocol, or a set of guidelines to follow? 
Most important is to honor the current code style (e.g. don't reformat existing
code).
Comment 4 Keith McQueen CLA 2005-02-02 17:32:50 EST
Thank you for the pointers.  I'm sorry to be so annoying: I checked out the
200502010800 version of the org.eclipse.jface.text project, but it doesn't seem
to build.  I get errors stating that (among other things) the class
DocumentRewriteSession cannot be resolved.  Should I just try an earlier version
or is there something else I need to do?  I really do appreciate your assistance.
Comment 5 Dani Megert CLA 2005-02-03 03:29:21 EST
I'd start by looking at the Platform Text architecture and especially at which
plug-ins belong to it.  You are missing some dependent plug-ins.  Dependent
plug-ins are listed in the plugin.xml's requires section.
Comment 6 Keith McQueen CLA 2005-02-09 20:07:08 EST
Well, I've "hacked" around a bit in the Eclipse code and made my proposed
changes, but to my dismay (though I'm sure you're not really surprised) I
didn't get the performance boost I had hoped for, in either speed or memory use.
I'm not sure now what the best approach would be.  It seems like the line
tracker may have something to do with the large memory consumption, but I'm not
really sure about that either.  Do we have any recourse for handling large
documents (in excess of 20M)?  Our users are already complaining about
performance, but now I don't really know how to address the issue.  What do you
think?

Just FYI:
I am working with a large text file of around 31M.  When opening the file in the
default editor, the used heap grows to about 451M, nearly 15 times the size of
the file.  I can garbage collect, which brings the used heap to about 224M, but
that is still nearly 7 times the size of the file.  I get the same results with
both the original (as-is) Eclipse code and my modified Eclipse code.  I would
love to see the used heap be around 3 times the file size.  This is my quest.  I
know that one of the problems I have in addressing this issue is that I don't
really have a good Java profiler at my disposal to determine precisely where the
trouble is.  I hate to be a bother, but your assistance would be greatly
appreciated.
Comment 7 Tom Hofmann CLA 2005-02-10 03:20:33 EST
There are other things besides the Document that may use memory - for example, if
you're not using the Text editor but a custom editor, it may allocate all kinds
of objects to track the document structure.

Also, if you're looking into using the text infrastructure for files > 10 or 20
MByte, it may be interesting to look at the memory-mapping (memmap) features of
java.nio.
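
For example, a minimal sketch (file name and charset below are just
placeholders):

    import java.io.*;
    import java.nio.*;
    import java.nio.channels.*;
    import java.nio.charset.*;

    // Illustrative only: map the file into memory instead of copying all of it
    // onto the Java heap up front.
    public class MappedFileSketch {
        public static void main(String[] args) throws IOException {
            FileChannel channel = new RandomAccessFile("big.txt", "r").getChannel();
            MappedByteBuffer bytes =
                    channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            // Decoding still produces chars on the heap, so for very large files
            // one would decode and process only a window of the buffer at a time.
            CharBuffer chars = Charset.forName("ISO-8859-1").newDecoder().decode(bytes);
            System.out.println("Decoded " + chars.length() + " characters");
            channel.close();
        }
    }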

Have fun...
Comment 8 Dani Megert CLA 2005-02-10 13:16:24 EST
For a normal editor that does not use quick diff it should be about once the
file size.  If quick diff is enabled, about 3 times the file size is used upon
opening and twice the file size once it is fully open.  7 times is definitely
too much.  Can you tell who's holding on to this memory?
Comment 9 Keith McQueen CLA 2005-02-10 13:27:37 EST
In the case where I stated the used heap size for a 31M file, I was just using
the default Eclipse text editor.  I am not sure where all the memory is being
consumed (or rather who is doing it), but the behavior I notice when opening
the file is that the memory only goes to a certain point (~192M, and sits there
for a second), but then all of a sudden it skyrockets to the 451M level.  I was
wondering whether it might be the Abstract/DefaultLineTracker, which creates a
list of Line objects.  Obviously there are very many lines in this 31M file, so
that would translate to a large collection of Line objects.  I tried to have the
line tracker not create all the lines at once, but that causes a number of
problems as well.  I considered having the lines built as needed (the ol' lazy
loading idea), but because I wasn't sure that this is really where the problem
is, it didn't seem worth the effort it would take.  So, to make a long story
longer, I don't really know where the memory hog is.  I don't have access to a
good memory debugger at this point.  The profiler(s) I do have for Eclipse don't
really work for memory debugging, or at least they are not intuitive to me.
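
Just to illustrate the kind of "lean" tracking I was toying with (this is only a
sketch, not the real line tracker API): instead of one Line object per line, an
index of int offsets could be kept, e.g.

    // Illustrative only: one int per line instead of one small object per line.
    // (Delimiter handling simplified: only '\n' is recognized here.)
    class IntLineIndex {
        static int[] computeLineOffsets(char[] text) {
            int[] offsets = new int[1024];
            int count = 0;
            offsets[count++] = 0;                   // line 0 starts at offset 0
            for (int i = 0; i < text.length; i++) {
                if (text[i] == '\n') {
                    if (count == offsets.length) {  // grow the index as needed
                        int[] grown = new int[offsets.length * 2];
                        System.arraycopy(offsets, 0, grown, 0, offsets.length);
                        offsets = grown;
                    }
                    offsets[count++] = i + 1;       // next line starts after '\n'
                }
            }
            int[] result = new int[count];
            System.arraycopy(offsets, 0, result, 0, count);
            return result;
        }
    }

Whether that would actually make a dent in the numbers above is exactly what I
can't tell without a profiler.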
Comment 10 Dani Megert CLA 2005-02-11 01:56:17 EST
Please note that the memory shown in the Windows Task Manager and similar system
tools is not very useful and only leads to speculation, because some VMs never
give allocated heap memory back even if the objects are already gone.  The only
way to see what's going on is to use a profiler.

Having said that, if you think there's a memory leak or too much memory is used,
then file a bug report and we can look at it.
Comment 11 Keith McQueen CLA 2005-02-11 14:13:25 EST
I was using the org.eclipse.ui.tools.heapstatus plugin to monitor memory usage
and force garbage collection.  I don't know how accurate it really is, but it is
much more accurate than the Windows Task Manager.  I am concerned about memory
usage because all the classes involved are the Eclipse platform classes; there
are none of our/my own classes here (I am using the default Eclipse editor,
document, document provider, text store and line tracker).  There really just
does not seem to be support for large documents in Eclipse.
Comment 12 Keith McQueen CLA 2005-02-22 12:19:39 EST
Created attachment 18191 [details]
Zip file containing LARGE text file

I know the attachment is large, but that's the point.  It includes a
large (IMHO) text file of about 15M (the original was around 31M, but I couldn't
compress it enough to send it).  I would appreciate it if you would just
observe how Eclipse behaves when opening it in the default text editor.  Files
of this size are not uncommon for my users, but I really don't think that the
Eclipse document/text framework scales (well) to files this large.  I don't
mean any disrespect; it's just that the contents of the file seem to be
multiplied about 7 or 8 times in memory (at best, and sometimes as much as 20
times).  Passing Strings around just ends up making more and more copies.
Comment 13 Dani Megert CLA 2005-02-22 12:31:20 EST

*** This bug has been marked as a duplicate of 75086 ***