Bug 577649 - Split runs more efficiently when wrapping
Summary: Split runs more efficiently when wrapping
Status: NEW
Alias: None
Product: Platform
Classification: Eclipse Project
Component: SWT (show other bugs)
Version: 4.23   Edit
Hardware: PC Windows 10
: P3 normal (vote)
Target Milestone: ---   Edit
Assignee: Platform-SWT-Inbox CLA
QA Contact:
URL:
Whiteboard:
Keywords: performance
Depends on:
Blocks: 168557 577238
  Show dependency tree
 
Reported: 2021-12-06 10:39 EST by Jonah Graham CLA
Modified: 2021-12-06 14:38 EST (History)
3 users (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Jonah Graham CLA 2021-12-06 10:39:58 EST
In TextLayout there are a lot of native buffers that are associated with each run (StyleItem). At the moment when we split a run, for example to achieve word wrapping, we are allocating additional buffers, when the existing ones can just be resused with a bit of pointer arithmetic.

e.g. Imagine a run of text that is:

Hello this is a run of text.

And the word wrapping determines that wrapping between "a" and "run" is best. In today's code we have a run containing (amongst other things) a list of glyph indexes into the font. In this case an array >= 29 elements. When we split, we allocate two new arrays, one of length 17 and one with the rest of the string.

However we only needed to split the existing array, so the first run points to the beginning of the array, and the second run points at the wrap point.

The above is pretty straightforward, but leads to each run no longer owning their memory.

Therefore I want to use a reference counted buffer so that each run can keep the native memory live.

(Note the actual numbers above are demonstrative and may not 100% match the sizes of arrays in practice, which are often bigger)
Comment 1 Nikita Nemkin CLA 2021-12-06 11:14:45 EST
Do you really need a full reference count? Since runs are always deallocated en-masse, I think a simple "owner" flag is enough (true for the original run, false for the splits).

Also, reshaping split runs might generate more glyphs than available in the original buffer. This is rare, but needs to be handled by copying.
Comment 2 Jonah Graham CLA 2021-12-06 11:29:54 EST
(In reply to Nikita Nemkin from comment #1)
> Do you really need a full reference count? Since runs are always deallocated
> en-masse, I think a simple "owner" flag is enough (true for the original
> run, false for the splits).

That probably is enough - my prototype did something like that (https://git.eclipse.org/r/c/platform/eclipse.platform.swt/+/188556) - Perhaps I was just trying to overengineer.

As for the reshaping - yes large buffers are needed if a new shape is called, but one of the current problems with wrapping is that shape is called unnecessarily, when a simple split of the original shaped results is all that is needed. 

I have renamed this bug accordingly.
Comment 3 Nikita Nemkin CLA 2021-12-06 13:11:09 EST
(In reply to Jonah Graham from comment #2)
>
> As for the reshaping - yes large buffers are needed if a new shape is
> called, but one of the current problems with wrapping is that shape is
> called unnecessarily, when a simple split of the original shaped results is
> all that is needed. 

You can call ScriptShape with an old buffer size and only allocate if it returns E_OUTOFMEMORY.

How do you plan to ensure that no reshaping is necessary? AFAIK Uniscribe doesn't indicate shaping boundaries.
Comment 4 Jonah Graham CLA 2021-12-06 13:56:21 EST
(In reply to Nikita Nemkin from comment #3)
> (In reply to Jonah Graham from comment #2)
> >
> > As for the reshaping - yes large buffers are needed if a new shape is
> > called, but one of the current problems with wrapping is that shape is
> > called unnecessarily, when a simple split of the original shaped results is
> > all that is needed. 
> 
> You can call ScriptShape with an old buffer size and only allocate if it
> returns E_OUTOFMEMORY.

The issue isn't slightly too large buffers, but running shape on massively large inputs.

e.g. if you have a 3000 character line and the wrapping happens at column 100, the ScriptShape/Place calls run on buffers of this sizes:

- 30000
- 100
- 29900
- 100
- 29800
- 100
- etc...

Basically the current algorithm finds the first wrap point, splits the input run into two runs, runs shape on both those new runs and then repeats. 

A simple way of doing the splitting is just to split all the runs into smallish pieces and then run the rest of the code. For example, if we split into 250 character chunks before doing wrapping then the sizes of the calls to shape are:

- 250
- 100
- 150
- 100
- 50
- 250
- 50
- 200
- 100
and repeats forever.

So for the same 3000 character input you are "re" shaping a lot less input, and the wrapping happens much faster for long inputs.


> How do you plan to ensure that no reshaping is necessary? AFAIK Uniscribe
> doesn't indicate shaping boundaries.

Still working on that :-) For simple scripts it can probably be skipped as the results are always the same as splitting up old ones. Obviously ScriptBreak tells you where logically you can split a run, but I don't understand how/when that invalidates previous ScriptShape/Place
Comment 5 Nikita Nemkin CLA 2021-12-06 14:19:22 EST
I understand the problem now, it's pretty bad indeed.

For shaping boundaries, the discussion on this page https://github.com/harfbuzz/harfbuzz/issues/1463 (+ various linked issues, especially https://github.com/yeslogic/allsorts/issues/29) is VERY enlightening.

It's pretty safe to assume that whitespace is a shaping boundary. Since it's also a line breaking boundary, the most common case doesn't need reshaping at all. For other cases, heuristics are necessary.

Ideally, we'd switch to DirectWrite, which does provide this information, but it would be very difficult to fit into the existing GC implementation.
Comment 6 Jonah Graham CLA 2021-12-06 14:38:48 EST
(In reply to Nikita Nemkin from comment #5)
Thank you for the links - there are lots of useful uniscribe pages, like[1], but they all stop short of how to handle wrapping!

[1] http://www.catch22.net/tuts/neatpad/uniscribe-mysteries#