Created attachment 94630: proposed patch. Some IUs contain large amounts of text, such as licensing info, that we end up escaping as we write it out as XML. We're currently using String concatenation but should use a StringBuffer instead.
What is the gain, given that this forces us to keep another copy of the string in memory?
For the tptp_min testcase this is saving a little over 1s (on my laptop). Writing manifests and especially licenses are the main culprits. Also, this is just for serializing the IUs in the profile but would also help whenever we persist any repo changes during generation or mirroring. If there are no characters to escape we don't allocate a replacement buffer and eventually return the original string. When there are characters to escape I think the memory footprint should be similar to the current implementation.
This change looks good to me. The old code was really inefficient: txt = txt.substring(0, i) + replace + txt.substring(i + 1); This creates at least two garbage strings, and a string buffer, for each character that is escaped. The new code just creates a single string buffer for the whole escaping operation.
The only tweak I would make is to initialize the string buffer so it is large enough to fit the entire original text, plus some extra space for escaped characters. Otherwise the buffer may need to grow several times and create more garbage.
Patch released with John's suggestion, using the string length + 16 as the initial buffer size.
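For readers without the attachment, here is a minimal sketch of the approach discussed above. It is not the released patch: the class name XmlEscapeSketch, the helper replacementFor, and the exact set of escaped characters are assumptions made for illustration. It shows the two points from the thread: the buffer is only allocated once a character actually needs escaping, and it starts at the string length + 16.

```java
// Hypothetical sketch; names and the escaped character set are illustrative,
// not taken from the actual patch.
final class XmlEscapeSketch {

    // Returns the XML entity for c, or null if c needs no escaping.
    private static String replacementFor(char c) {
        switch (c) {
            case '&': return "&amp;";
            case '<': return "&lt;";
            case '>': return "&gt;";
            case '"': return "&quot;";
            case '\'': return "&apos;";
            default: return null;
        }
    }

    static String escape(String txt) {
        StringBuffer buffer = null; // allocated lazily, only if something must be escaped
        for (int i = 0; i < txt.length(); i++) {
            char c = txt.charAt(i);
            String replace = replacementFor(c);
            if (replace != null) {
                if (buffer == null) {
                    // Original length plus some headroom for the longer entities.
                    buffer = new StringBuffer(txt.length() + 16);
                    // Copy the unescaped prefix seen so far.
                    buffer.append(txt, 0, i);
                }
                buffer.append(replace);
            } else if (buffer != null) {
                buffer.append(c);
            }
        }
        // No escapable characters: hand back the original string, no copy made.
        return buffer == null ? txt : buffer.toString();
    }
}
```

With this shape, a string that contains nothing to escape is returned as the same object, and a string that does need escaping costs one buffer plus the final result string, rather than several temporary strings per escaped character.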