Bug 490743 - StackOverflowError in StyledTextRenderer.getTextLayout(..) (was: Manifest editor can not be opened with I20160330-1230)
Summary: StackOverflowError in StyledTextRenderer.getTextLayout(..) (was: Manifest edi...
Status: VERIFIED FIXED
Alias: None
Product: Platform
Classification: Eclipse Project
Component: SWT (show other bugs)
Version: 4.6   Edit
Hardware: PC Linux
: P3 major (vote)
Target Milestone: 4.6 M7   Edit
Assignee: Markus Keller CLA
QA Contact:
URL:
Whiteboard:
Keywords:
: 490911 491652 (view as bug list)
Depends on:
Blocks:
 
Reported: 2016-03-30 22:42 EDT by David Williams CLA
Modified: 2016-04-27 05:31 EDT (History)
9 users (show)

See Also:


Attachments
Bug490743.java (883 bytes, text/plain)
2016-03-31 11:18 EDT, Markus Keller CLA
no flags Details
Bug490743.java (win and gtk) (1.27 KB, text/plain)
2016-04-05 10:11 EDT, Markus Keller CLA
no flags Details
SnippetBug490743.java - find problematic font sizes (3.09 KB, text/plain)
2016-04-05 10:30 EDT, Markus Keller CLA
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description David Williams CLA 2016-03-30 22:42:43 EDT
This is a continuation of comments in bug 490637. I found out a bit more about how to reproduce the issue and it seems different than what bug 490637 was about. 

In that bug, I mentioned "no editor was left open", but turns out that is only because an error occurred while trying to open the Manifest editor -- as is always done when a new project is created. Thus, I did not "see" any editor, but some part of Eclipse thinks an editor was open, and tries to re-open on workspace restart, and hits the problem again. 

Variations I've tried: 

- Oracle as well as IBM's VM. (Same behavior.)
- commented out "export SWT_GTK3=0" (Same behavior.)
- started with a fresh workspace, and did not create an 'example', just an empty plugin project. (Same behavior.)
- Another way to "see" the error (other than restarting eclipse, after creating a new project) is to click on (i.e, "open") the MANIFEST.MF file. It should have opened upon plugin project creation, but did not, but if I then try to make it happen, with  an "open", I get the dialog box saying "an error occurred, see log" and there was the "stack overflow" stack dump. Prior to that being in the .log, nothing was written to the .log. 
- I created a Java Project in a new workspace (not plugin project). Did nothing else but click on tabs at bottom of the perspective. When I clicked on "Declaration" tab, I got the stack overflow error then. 

See bug 490637 for a stack dump in log. All these "new" cases did not seem much different. 

I would mark this as "blocking" -- and think it should be if anyone else can reproduce it. Otherwise, I'll count as "major" and assume some quirk with my Linux install (still "missing function", but maybe not everyone is?)
Comment 1 Sravan Kumar Lakkimsetti CLA 2016-03-31 04:54:26 EDT
From the log attached to bug 490637 I can see that there is a stack overflow error. 

I am not able to reproduce the error, But I had similar problem when I was doing the hidpi work. This can happen when you have non integer scaling factors. 

I am setting up ubuntu 12 to test this
Comment 2 Sravan Kumar Lakkimsetti CLA 2016-03-31 07:54:15 EDT
I did setup ubuntu 12. But I am still not able to reproduce this issue
Comment 3 Markus Keller CLA 2016-03-31 09:16:33 EDT
This is the counterexample for the proof I asked for in bug 490490 comment 10.

Exception in thread "main" java.lang.StackOverflowError
	at org.eclipse.swt.custom.StyledTextRenderer.getTextLayout(StyledTextRenderer.java:1017)
	at org.eclipse.swt.custom.StyledTextRenderer.getTextLayout(StyledTextRenderer.java:714)
	at org.eclipse.swt.custom.StyledText.getPointAtOffset(StyledText.java:5497)
	at org.eclipse.swt.custom.StyledText.setCaretLocation(StyledText.java:8613)
	at org.eclipse.swt.custom.StyledTextRenderer.getTextLayout(StyledTextRenderer.java:1017)
...

This problem depends on the DPI setting and the specific font and font-size used in the editor.

The StackOverflow gets triggered in
StyledTextRenderer#getTextLayout(int, int, int, int) line: 992
when TextLayout#getLineBounds(int).height > StyledTextRenderer#getLineHeight().

In that case, the "then" block of "if (index != -1)" calls styledText.setCaretLocation();

10 years ago, bug 131906 last touched this code with http://git.eclipse.org/c/platform/eclipse.platform.swt.git/commit/?id=b5d3c82373a130ee58125864784553ac1c1a11da
Comment 4 David Williams CLA 2016-03-31 11:07:13 EDT
(In reply to Markus Keller from comment #3)

> This problem depends on the DPI setting and the specific font and font-size
> used in the editor.
> 

I think you have uncovered the issue (or, gave pretty good hints :) I didn't really understand what you meant). But in bug 490637 when I reported by screen size, etc., and said "dpi was notoriously inaccurate) I meant because the "millimeters" reported for screen size was about a 20 inch screen, not 17 in. diagonal. 

After reading your note here, Markus, I briefly read 
http://askubuntu.com/questions/197828/how-to-find-and-change-the-screen-dpi

and while I did not have time to do all the calculations, 

xrandr | grep -w connected
reports 
LVDS-0 connected 1920x1080+0+0 (normal left inverted right x axis y axis) 382mm x 215mm

That is "smaller" than the mm reported by NVidia setting ((508x286 millimeters). 

So, I did a very quick "let's try it" approach, and invoked

gsettings set org.gnome.desktop.interface text-scaling-factor 0.9375
(as mentioned in 
http://askubuntu.com/questions/197828/how-to-find-and-change-the-screen-dpi)

xdpyinfo | grep dots
still reports 96x96 dots per inch (s the article implied with gnome3 that can not be changed -- I am not using "gnome3" as far as I know, but would not surprised if it was installed with something. 

And, guess what? Now the Manifest Editor "comes up"! 

I had also changed my system default fonts back to size 10 -- I think I had changed them to 12 somewhere along the line). 

So, between the two, I seem to be missing that faulty "if" condition? 

I have no idea what this means for SWT  :) -- after all, stack overflow must be avoided even is a users machine has the "wrong settings.  

But, perhaps adds to the data points.
Comment 5 Markus Keller CLA 2016-03-31 11:18:30 EDT
Created attachment 260648 [details]
Bug490743.java

To reproduce the StackOverflowError on master, run this snippet on Windows 7 at 150% scaling.
Comment 6 Markus Keller CLA 2016-03-31 14:42:37 EDT
(In reply to Markus Keller from comment #5, i.e. on Windows 7 150%)
One of the underlying problems is that StyledTextRenderer#getTextLayout(int, int, int, int) calls TextLayout#setDescent(int), passing a descent from FontMetrics#getDescent().

The originally computed descent from the OS in pixels is 4. At 150%, the value in SWT points is 3. But if you pass 3 to TextLayout#setDescent(int), this internally gets scaled up to 5 (not 4).

The last for-loop in TextLayout#computeRuns(GC) uses a certain descent to compute the line heights (which is stored in lineY[]). That descent is the max of the native descent (from OS#GetTextMetrics(..)) and the value passed to #setDescent(int). => The scaleUp to 5 increases the final line height.

Part of the solution could be to make sure we never internally deal with pixel values that cannot be represented as SWT points. E.g. with a new method DPIUtil#quantize(int), which quantizes a given size in pixels into a result in pixels that can be the result of a call to {@link DPIUtil#autoScaleUp(int)}.

However, this doesn't solve the problem in cases where we combine pixel values. E.g. at the end of #computeRuns(GC), we have:

	lineY[line] = lineY[line - 1] + ascent + descent + lineSpacing;

In the example, ascent is 17 (11pt) and descent is 5 (3pt). The sum is 22, which is 15pt. But the sum of 11pt and 3 pt is only 14pt! This difference causes the StackOverflowError.

=> Conclusion: With non-integral scale factors, we cannot perform calculations on pixel values if API clients have access to the operand values in points.

The solution is to keep fields ascent etc. in points, not in pixels.
Comment 7 Eclipse Genie CLA 2016-03-31 14:59:21 EDT
New Gerrit change created: https://git.eclipse.org/r/69652
Comment 8 Patrik Suzzi CLA 2016-04-01 13:17:22 EDT
*** Bug 490911 has been marked as a duplicate of this bug. ***
Comment 10 Markus Keller CLA 2016-04-04 15:50:56 EDT
Thanks Niraj for the additional fixes. Renaming the variables to *InPixels is totally the right thing to do and makes the code easier to understand.

I've removed the (now redundant) comments I had added, and I've renamed a few more occurrences of baseline to baselineInPixels. Snippet204 now also looks good at 150% and 200%.

Now we have to port this to GTK's TextLayout as well.
Comment 11 Sravan Kumar Lakkimsetti CLA 2016-04-05 04:19:19 EDT
I looked at this to port to Linux but the StyleItem class does not have ascent or descent or line spacing in gtk

The ascent and descent are there in the TexTLayout and use pixels only. they donot use point geometry. 

I am at loss on how to port this to gtk
Comment 12 Eclipse Genie CLA 2016-04-05 10:03:52 EDT
New Gerrit change created: https://git.eclipse.org/r/69920
Comment 13 Markus Keller CLA 2016-04-05 10:11:12 EDT
Created attachment 260719 [details]
Bug490743.java (win and gtk)

Updated Bug490743.java snippet to reproduce the StackOverflowError at 150% on Ubuntu 14.04 with font Ubuntu 12 (as well as on Windows 7 with Segoe UI 8).

(In reply to Markus Keller from comment #6)
Unfortunately, the solution from comment 6 only applies to Win32. There, field TextLayout#lineY manages the line heights, so to stay consistent with ascent/descent, we can just perform all height calculations in points.

On GTK, line heights are managed by Pango via pango_layout_iter_get_line_extents(), and if we wanted to change that to align to points, we'd have to split the whole layout and manage lines individually like on win32. That would be a huge undertaking.

The best solution I see for GTK is to keep the principal architecture, but make sure ascent and descent in points get properly rounded so that they match the total line height. I.e. in API points, either ascent or descent will not exactly match the value from Pango in certain situations. Since lines are drawn from top to bottom, I expect less trouble if descent is considered the derived value.

(In reply to Eclipse Genie from comment #12)
> New Gerrit change created: https://git.eclipse.org/r/69920

Implementation of this strategy. Verified that this solves the StackOverflowError with the snippet at 100%, 150%, 200%, and that Snippet204 looks good.
Comment 15 Markus Keller CLA 2016-04-05 10:30:25 EDT
Created attachment 260721 [details]
SnippetBug490743.java - find problematic font sizes

Here's a snippet I used to find the problematic font sizes that are embedded in Bug490743.java. Run it an then press Arrow Up/Down to change the font size.
Comment 16 Niraj Modi CLA 2016-04-14 10:14:36 EDT
*** Bug 491652 has been marked as a duplicate of this bug. ***
Comment 17 Niraj Modi CLA 2016-04-14 10:17:19 EDT
Reopening: On Win7, Stack-Overflow is still seen using Test snippet in attachment 260719 [details] with "Consolas" font, size:10 at 125% zoom.
Comment 18 Niraj Modi CLA 2016-04-14 11:32:01 EDT
(In reply to Niraj Modi from comment #17)
> Reopening: On Win7, Stack-Overflow is still seen using Test snippet in
> attachment 260719 [details] with "Consolas" font, size:10 at 125% zoom.

Narrowed down on this problem and main reason seems to be the rounding error getting introduced at StyledTextRenderer Line#994:
ascent = metrics.getAscent() + metrics.getLeading();

Potential fix:
If we elaborate above function calls, it will look like below:
ascent = DPIUtil.autoScaleDown(handle.tmAscent - handle.tmInternalLeading) + DPIUtil.autoScaleDown(handle.tmInternalLeading);

Simplifying this further will take us to:
ascent = DPIUtil.autoScaleDown(handle.tmAscent);

Replacing the original method calls with this simplified call actually minimizes the rounding error and fixes this issue.
Note: This change requires us to access the native handle of Fontmetrics directly in StyledTextRenderer class(which looks UN-avoidable to me right now)

Will share a patch based on this approach shortly.
Comment 19 Eclipse Genie CLA 2016-04-14 12:45:35 EDT
New Gerrit change created: https://git.eclipse.org/r/70689
Comment 20 Markus Keller CLA 2016-04-15 15:01:40 EDT
(In reply to Niraj Modi from comment #18)
StyledTextRenderer may not be the only client that runs into this issue, and external clients don't even have the possibility to use platform-dependent workarounds.

Luckily, it all boils down to a broken invariant. [DbC] to the rescue!

API clients expect this invariant in FontMetrics:

   getHeight() == getLeading() + getAscent() + getDescent()

Separate rounding of each RHS term can break the invariant, and that causes the StackOverflowError.

An additional problem is that ascent and descent are more important to be as close as possible to the real value. Any necessary rounding adjustment should go into leading, that's why we should compute leading as a value derived from the other terms in points.

Fixed with http://git.eclipse.org/c/platform/eclipse.platform.swt.git/commit/?id=17d9b1917889175de869993b7820ac382a9bf35b

[DbC] https://en.wikipedia.org/wiki/Design_by_contract
Comment 21 David Williams CLA 2016-04-15 15:49:20 EDT
(In reply to Markus Keller from comment #20)
> (In reply to Niraj Modi from comment #18)
> StyledTextRenderer may not be the only client that runs into this issue, and
> external clients don't even have the possibility to use platform-dependent
> workarounds.
> 
> Luckily, it all boils down to a broken invariant. [DbC] to the rescue!
> 
> API clients expect this invariant in FontMetrics:
> 
>    getHeight() == getLeading() + getAscent() + getDescent()
> 
> Separate rounding of each RHS term can break the invariant, and that causes
> the StackOverflowError.
> 
> An additional problem is that ascent and descent are more important to be as
> close as possible to the real value. Any necessary rounding adjustment
> should go into leading, that's why we should compute leading as a value
> derived from the other terms in points.
> 
> Fixed with
> http://git.eclipse.org/c/platform/eclipse.platform.swt.git/commit/
> ?id=17d9b1917889175de869993b7820ac382a9bf35b
> 
> [DbC] https://en.wikipedia.org/wiki/Design_by_contract

I'll confess I do not really understand the solution, exactly (and not that I need to) but I was looking at it to see if using "double" for intermediate computations would help. Is it because Java does not do the computation until later? When it really needs to, or ... ? 

Unrelated, do the "hash" and "equals" still look correct for that class? 
One sure has a lot more terms in it that the other, and intuitively I would have expected them to have the same. (Again, mostly asking for my education, if you don't mind me cluttering up your bug :)
Comment 22 Markus Keller CLA 2016-04-18 07:57:01 EDT
(In reply to David Williams from comment #21)
> ...  see if using "double" for intermediate computations would help. ...

No, it's not about the precision of intermediate values or a problem of Java. It's a purely mathematical problem due to the fact that we have two integer-based coordinate systems:

1. We have APIs that define coordinates as int-based points. Existing clients rely on them, so there's no way we can change them. And even if we added new APIs with coordinates in doubles, we'd still have to support old clients forever.

2. The OS APIs are in pixels.

At non-integral scale factors, we have two competing grids whose points cannot be converted into each other without rounding. The problem is that integer additions cannot be converted to the other grid without errors. Maybe a graphical variant of comment 6 explains more than a thousand words:

Assume 150%, the width of a single pixel or point is represented by a sequence of --- or ... characters, and the whole width is delimited by |:

                3pt         +  1pt    ==   4pt
API points:    |---...---|  + |...|   ==  |---...---...|
Native pixels: |--..--..--| + |..--|  ==  |--..--..--..--|
                5px         +  2px    ==   7px [sic!]

But the problem is that 4pt * 150% == 6px and not 7px as if we had performed the addition in pixels. 

From comment #6:
> => Conclusion: With non-integral scale factors, we cannot perform
> calculations on pixel values if API clients have access to the operand
> values in points.

The invariant I gave in comment 20 is one example for this. To API clients, it has to look as if everything was computed in points. We cannot accept any implementation that reveal rounding errors like the one depicted above.

> Unrelated, do the "hash" and "equals" still look correct for that class? 
> One sure has a lot more terms in it that the other, and intuitively I would
> have expected them to have the same.

They are still correct. You rpobably got distracted by the formatting. equals() uses one field per line, whereas hashCode() contains lines with 2-3 fields each.
Comment 23 David Williams CLA 2016-04-18 09:11:18 EDT
(In reply to Markus Keller from comment #22)
Thanks! That helps me understand your comment better than I did and teaches me I should always look at complex hash and equals functions more carefully. :) 

Reading https://msdn.microsoft.com/en-us/library/ms838191.aspx helps me understand how difficult the problem is in general.
Comment 24 Mickael Istria CLA 2016-04-25 07:16:44 EDT
For those facing the issue on Linux and not able to retrieve the fixed version immediately, then simply setting -Dswt.enable.autoScale=false in eclipse.ini prevents this issue from happenning.
Comment 25 Lakshmi P Shanmugam CLA 2016-04-27 05:31:48 EDT
Verified on Ubuntu 14.04 with I20160426-1615.