Hi, this is linked to <a href="https://bugs.eclipse.org/bugs/show_bug.cgi?id=84872">bug 84872</a>. I had a weird idea, so I thought I'd try it out. Since most Java identifier strings are ASCII, the normal Java string encoding (UTF-16) is only about 50% efficient for them. So I converted ClassFile to store its strings in UTF-8 form using byte arrays. This reduced the memory retained by ClassFile instances by 34% (as measured by YourKit). I'm not sure what this will do to CPU performance, since there is now additional conversion happening, but the memory saved may be worth it for larger projects. Also, I tried converting PackageFragment#names to perform the same trick, but that only effected a 1.2% saving, so I dumped that part of the patch.
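For readers following along, here is a minimal sketch of the idea, not the attached patch. The class name CompactNames and its methods are hypothetical, and it uses java.nio.charset.StandardCharsets, which is only available on newer Java versions than the ones this bug was filed against. It only illustrates that an ASCII identifier needs one byte per character as UTF-8 versus two bytes per character as UTF-16 chars.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only; not the code from the attachment.
final class CompactNames {
    private CompactNames() {}

    /** Encodes an identifier as UTF-8 bytes for long-term storage. */
    static byte[] encode(String name) {
        return name.getBytes(StandardCharsets.UTF_8);
    }

    /** Recreates a String on demand; this is the extra conversion cost. */
    static String decode(byte[] utf8) {
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String name = "org.eclipse.jdt.internal.core.ClassFile";
        byte[] stored = encode(name);
        // For pure ASCII, UTF-8 uses one byte per character,
        // while a char[] in UTF-16 uses two bytes per character.
        System.out.println("UTF-8 payload bytes:  " + stored.length);
        System.out.println("UTF-16 payload bytes: " + (2 * name.length()));
        // The round trip must be lossless.
        System.out.println("round trip ok: " + decode(stored).equals(name));
    }
}
```

The payload figures ignore object headers, so the real saving per retained string depends on the JVM and on how many wrapper objects are kept around.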
Created attachment 20225 [details] compress strings using utf-8 encoding
Recreating strings all the time must be deadly for GC. It would help to recode the algorithms to operate on byte[] as well, so as to avoid creating intermediate strings.
Not necessarily. Virtually all of the "re-created" objects would be short-lived and would cycle through the eden space almost immediately. We create temporary objects everywhere constantly. I contemplated writing a "UTF8String" class that would reduce the conversions required and also interoperate with String, StringBuffer, etc., but concluded that until a real need grew, we were better off with a simple solution. But I could be persuaded -grin- ... Also, writing a UTF8String class would remove some of the memory benefit, since an additional object would be required per string. Unless I made all of the methods on UTF8String static, and left the stored type as a byte[]... Hmmm, slightly awkward to use, but it could be workable.
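A rough sketch of what the "all static methods, stored type stays byte[]" variant might look like. Everything here is hypothetical: the method names are invented for illustration and none of it exists in JDT. The point is that comparisons and separator searches can run directly on the bytes, so no wrapper object and usually no temporary String is needed.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical static-only helper; the stored value stays a plain byte[].
final class UTF8String {
    private UTF8String() {}

    static byte[] encode(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    static String decode(byte[] utf8) {
        return new String(utf8, StandardCharsets.UTF_8);
    }

    /** Compares two stored names without creating any String. */
    static boolean sameName(byte[] a, byte[] b) {
        return Arrays.equals(a, b);
    }

    /** Compares a stored name with a String, avoiding a decode for ASCII content. */
    static boolean contentEquals(byte[] utf8, String s) {
        if (utf8.length == s.length()) {
            for (int i = 0; i < utf8.length; i++) {
                byte b = utf8[i];
                if (b < 0) {
                    // Non-ASCII byte: fall back to a full decode.
                    return decode(utf8).equals(s);
                }
                if (b != s.charAt(i)) {
                    return false;
                }
            }
            return true;
        }
        // Lengths differ: only equal if multi-byte sequences are involved.
        return decode(utf8).equals(s);
    }

    /**
     * Finds the last occurrence of an ASCII byte such as '.' or '/'.
     * Safe on UTF-8 because every byte of a multi-byte sequence has its high bit set.
     */
    static int lastIndexOf(byte[] utf8, byte ascii) {
        for (int i = utf8.length - 1; i >= 0; i--) {
            if (utf8[i] == ascii) {
                return i;
            }
        }
        return -1;
    }
}
```

Call sites would read something like `UTF8String.contentEquals(storedName, candidate)`, which is the awkwardness mentioned above: more verbose than calling methods on a String, but with no extra object per stored name.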
Interesting idea. Will consider post 3.2.
As of now 'LATER' and 'REMIND' resolutions are no longer supported. Please reopen this bug if it is still valid for you.