Bug 544501 - [SWT] Eclipse crashes with reference count error message cairo_surface_destroy
Summary: [SWT] Eclipse crashes with reference count error message cairo_surface_destroy
Status: CLOSED MOVED
Alias: None
Product: Platform
Classification: Eclipse Project
Component: SWT
Version: 4.6
Hardware: PC Linux
Importance: P3 critical
Target Milestone: 4.14
Assignee: Platform-SWT-Inbox CLA
QA Contact:
URL:
Whiteboard:
Keywords: triaged
Depends on:
Blocks: 535099
Reported: 2019-02-15 10:27 EST by Ansgar Radermacher CLA
Modified: 2019-11-04 11:21 EST
CC List: 5 users

See Also:


Description Ansgar Radermacher CLA 2019-02-15 10:27:18 EST
My Eclipse (2018-12) crashes on two different Linux machines with the following error: cairo-surface.c:955: cairo_surface_destroy: Assertion `CAIRO_REFERENCE_COUNT_HAS_REFERENCE (&surface->ref_count)' failed.

The crash happens when a larger Papyrus diagram (with embedded SVG images) is opened or zoomed in/out. While the error could certainly be in Papyrus, the bug is not reproducible on a Windows machine, so it is likely specific to the SWT/GTK implementation.
The bug is reproducible with the Papyrus for Robotics RCP (eclipse.org/papyrus/components/robotics) when opening a certain diagram; the associated model is available on request.

The first machine runs Ubuntu 16.04 with GTK 3.18.9, the second Ubuntu 18.04 with GTK 3.22.30.
Comment 1 Eric Williams CLA 2019-02-15 10:40:28 EST
Could you provide an SWT snippet that reproduces the issue?
Comment 2 Ansgar Radermacher CLA 2019-02-15 10:57:45 EST
(In reply to Eric Williams from comment #1)
> Could you provide an SWT snippet that reproduces the issue?

Unfortunately, no. At the moment, I've only observed that my Papyrus/Eclipse installation crashes very often (but not always) when I open a larger diagram of a model on Linux, whereas I cannot reproduce the issue on Windows. The problem can be reproduced in a second (debug) Eclipse instance as well as in normal use.

I could add the SWT sources to my first instance and try to debug the issue (I would appreciate some hints on where to set breakpoints).

Alternatively, you could try to reproduce the error yourself: if you install Papyrus for Robotics, I can send you a download link for the model that triggers the error.
Comment 3 Eric Williams CLA 2019-02-15 11:01:12 EST
(In reply to Ansgar Radermacher from comment #2)
> Alternatively, you could try to reproduce the error, if you install Papyrus
> for Robotics and I sent you a download link where you can find the model
> that triggers the error.

Please do! Additionally, can you try older releases (e.g. 4.9 or 4.8) and see whether the issue reproduces there? This can help us narrow down a regression.
Comment 4 Ansgar Radermacher CLA 2019-02-16 06:20:38 EST
(In reply to Eric Williams from comment #3)
> (In reply to Ansgar Radermacher from comment #2)
> > Alternatively, you could try to reproduce the error, if you install Papyrus
> > for Robotics and I sent you a download link where you can find the model
> > that triggers the error.
> 
> Please do! Additionally, can you try on older releases (i.e. 4.9, or 4.8)
> and see if the issue reproduces there? This can help us narrow down a
> regression.

I can reproduce the error with Eclipse 2018-09, Photon, Oxygen and Neon, i.e. this is not a regression. For the last two Eclipse versions, I used standard Papyrus, as Papyrus for Robotics does not run on them.

With standard port shapes, the crash is not reproducible, but it can be triggered by activating SVG decorations on ports (via CSS). Thus, the error seems to be specific to SVG rendering.
Comment 5 Eric Williams CLA 2019-02-21 14:57:06 EST
I can reproduce the crash as well.
Comment 6 Ansgar Radermacher CLA 2019-04-01 04:28:44 EDT
The bug is still reproducible with 2019-03.
Comment 7 Ansgar Radermacher CLA 2019-05-22 08:39:20 EDT
(In reply to Ansgar Radermacher from comment #6)
> The bug is still reproducible with 2019-03.

I have had a new (faster) PC for a couple of days. While the crashes happened "often" before, they now happen "almost always" when a diagram containing SVG shapes is opened. This might indicate a threading issue.
Comment 8 Eric Williams CLA 2019-05-22 08:46:19 EDT
I haven't forgotten about this bug, I'll try to take a look for 4.13.
Comment 9 Ansgar Radermacher CLA 2019-06-13 10:46:55 EDT
Good news from my side: I can't reproduce the error with M3 of Eclipse 2019-06. It's not related to the GTK version of the OS, since I can still reproduce the issue with 2019-03. So it has apparently been fixed, although it is not clear to me what specifically was done.
Should I close it as "works for me"?
Comment 10 Eric Williams CLA 2019-06-13 10:52:36 EDT
(In reply to Ansgar Radermacher from comment #9)
> Good news from my side. I can't reproduce the error with M3 of Eclipse
> 2019-06. It's not related to GTK version of the OS, since I can still
> reproduce the issues with 2019-03. So it has apparently been fixed -
> although it is not clear for me what has been done specifically.
> Should I close with "works for me?"

I think it might be fixed by bug 545032. I had to revert that fix for 4.12 RC1, but it's back in 4.13 now.

Can you try with 4.12 RC1? If the issue *does* reproduce there, then we have our answer.
Comment 11 Ansgar Radermacher CLA 2019-06-13 12:08:58 EDT
Yes, it's broken again in RC1: I can reproduce the crash. So will it be fixed in the 2019-06 release?
Comment 12 Eric Williams CLA 2019-06-13 12:13:48 EDT
(In reply to Ansgar Radermacher from comment #11)
> Yes, it's broken again in RC1 - I can reproduce the crash. So it will be
> fixed in the 2019-06 release?

Yes, it's already fixed in master. To summarize:

4.12 M3 and earlier: fixed
4.12 RC1 and 4.12 GA: broken
I20190610-1800 and newer: fixed

Thanks for your help and investigation!

*** This bug has been marked as a duplicate of bug 545032 ***
Comment 13 Eric Williams CLA 2019-07-31 15:36:46 EDT
According to https://bugs.eclipse.org/bugs/show_bug.cgi?id=545032#c43, this issue is back. Reopening the ticket.
Comment 14 Eric Williams CLA 2019-08-14 19:01:47 EDT
Moving into 4.14 since I won't be able to get to this for 4.13.
Comment 15 Ansgar Radermacher CLA 2019-10-22 03:38:20 EDT
(In reply to Eric Williams from comment #14)
> Moving into 4.14 since I won't be able to get to this for 4.13.

Hi Eric,

any chance that this will be fixed for 4.14?

Best

Ansgar
Comment 16 Eric Williams CLA 2019-10-22 22:07:53 EDT
(In reply to Ansgar Radermacher from comment #15)
> [...]
> any chance that this will be fixed for 4.14?

Hoping to. I am making some changes to the drawing code, and this might get fixed as a result. Let me check back in a week or so.
Comment 17 Eric Williams CLA 2019-10-28 13:46:34 EDT
Ansgar, can you try with a recent I-build? I20191027-1800 or newer should be good. Some changes were made to the internal drawing mechanism and I'm wondering if they've taken care of this issue.
Comment 18 Ansgar Radermacher CLA 2019-10-29 06:28:10 EDT
(In reply to Eric Williams from comment #17)
> Ansgar, can you try with a recent I-build? I20191027-1800 or newer should be
> good. Some changes were made to the internal drawing mechanism and I'm
> wondering if they've taken care of this issue.

Hi Eric,

I've installed the Eclipse SDK from https://download.eclipse.org/eclipse/downloads/drops4/I20191028-1800/ (Version: 2019-12 (4.14), Build id: I20191028-1800) and then installed Papyrus for Robotics from its update site.

Unfortunately, I can immediately reproduce the crash: cairo-surface.c:955: cairo_surface_destroy: Assertion `CAIRO_REFERENCE_COUNT_HAS_REFERENCE (&surface->ref_count)' failed.
Comment 19 Eric Williams CLA 2019-10-29 12:15:29 EDT
(In reply to Ansgar Radermacher from comment #18)
> [...]
> Unfortunately, I can immediately reproduce the crash: cairo-surface.c:955:
> cairo_surface_destroy: Assertion `CAIRO_REFERENCE_COUNT_HAS_REFERENCE
> (&surface->ref_count)' failed.

Sorry to hear it, I'll investigate.
Comment 20 Ansgar Radermacher CLA 2019-10-30 08:19:18 EDT
Hi Eric,

I did some investigating: I added a check to cairo_surface_destroy that tests whether the reference count is already 0 (i.e. the surface has already been destroyed). This required a very small additional native library, since cairo_surface_get_reference_count is not among the functions SWT exports via JNI.
With this check in place, I was able to get a stack trace when the problem happens:

Thread [Thread-71] (Suspended (breakpoint at line 1106 in Cairo))	
	Cairo.cairo_surface_destroy(long) line: 1106	
	Image.destroy() line: 928	
	Image(Resource).dispose() line: 69	
	SVGImage(AbstractRenderedImage).getSWTImage() line: 131	
	RenderHelper$1.run() line: 104	
	Thread.run() line: 748

Does this help? SVGImage is part of the GMF runtime. It is also possible that the root cause is not in SWT, and that the problem goes unnoticed on Windows because the Windows implementation simply does nothing when a resource whose reference count is already 0 is destroyed.
Comment 21 Eric Williams CLA 2019-10-30 08:41:04 EDT
(In reply to Ansgar Radermacher from comment #20)
> [...]
> Does this help? SVGImage is part of GMF-runtime. It is also possible that
> the root cause is not in SWT, but the Windows implementation if destroy
> surface will simply do nothing, if the ref count is already 0.

The reference count already being 0 means the underlying OS resources associated with the Image (a Cairo surface in this case) have already been freed. I am wondering if the problem is that the SWT Image has already been disposed elsewhere and GMF is trying to dispose it again. Could this be?
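To make the reference-count explanation concrete, here is a minimal Java model. It is an illustration only: `ModelSurface` and `ModelImage` are hypothetical classes, not SWT or Cairo API, and the real check happens in native Cairo code. A new surface holds one reference; `destroy()` releases it and fails, like the Cairo assertion, when no reference is left. The wrapper's `dispose()` returns early once it is marked disposed, so a second call from the same thread is harmless, but the check and the `destroy()` call are not atomic, so two threads disposing the same image at the same time could both reach `destroy()`.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of a Cairo surface's reference count (not SWT or Cairo API).
final class ModelSurface {
    // A newly created surface holds exactly one reference.
    private final AtomicInteger refCount = new AtomicInteger(1);

    int referenceCount() {
        return refCount.get();
    }

    // Models cairo_surface_destroy(): drop one reference, and fail if there was
    // no reference left to drop (the surface was already released).
    void destroy() {
        int before = refCount.getAndUpdate(n -> n > 0 ? n - 1 : n);
        if (before <= 0) {
            throw new IllegalStateException(
                    "Assertion CAIRO_REFERENCE_COUNT_HAS_REFERENCE failed");
        }
    }
}

// Hypothetical image wrapper that owns one surface reference.
final class ModelImage {
    private final ModelSurface surface = new ModelSurface();
    private boolean disposed;

    ModelSurface surface() {
        return surface;
    }

    boolean isDisposed() {
        return disposed;
    }

    // Safe when called twice from one thread, but the check and destroy()
    // are not atomic: two threads can both see disposed == false.
    void dispose() {
        if (disposed) {
            return;
        }
        surface.destroy();
        disposed = true;
    }
}
```

In this model, calling `image.surface().destroy()` after `image.dispose()` stands in for the second, racing release: the count is already 0, so the modelled assertion fires.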

As a side note, we recently added native SVG support on Linux (bug 545804). If you are using a third-party library to load SVG images, you may want to consider refactoring that code on Linux; you can probably gain some performance by having SWT load them directly.
Comment 22 Ansgar Radermacher CLA 2019-10-30 09:31:10 EDT
> [...] 
> The reference count already being 0 means the underlying OS resources
> associated with the Image (a Cairo surface in this case), have already been
> freed. I am wondering if the problem is that the SWT Image has already been
> disposed elsewhere and GMF is trying to dispose it unnecessarily. Could this
> be?
> 
> As a side note, we recently added native SVG support on Linux: bug 545804.
> So if you are using a 3rd party library to load SVG Images you may want to
> take a look at refactoring it on Linux as you can probably gain some
> performance by having SWT do it directly.

Thanks for your comments. I now have an idea of what is going on. The GMF class RenderHelper creates new images via getSWTImage() in a rendering thread. The code of this method, shown below, disposes a possibly existing image. That can only happen if another thread calls getSWTImage() on the same object at the same time, in which case dispose() could be called twice on the same image.

In org.eclipse.gmf.runtime.draw2d.ui.render.internal.AbstractRenderedImage:

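final public Image getSWTImage() {
    if (img != null)
        return img;

    Image image = renderImage(); // calls SVGImage.renderImage() in our case
    // img can only be non-null here if another thread assigned it concurrently;
    // two racing threads may then both dispose the same Image.
    if (img != null && !img.isDisposed()) {
        img.dispose();
    }
    img = image;
    return img;
}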

The crash no longer occurs if I mark getSWTImage() as synchronized. In that case, the second (img != null) check and the disposal it guards can be removed, since img is always null at that point.
The crash is also no longer reproducible if I make the dispose() method in org.eclipse.swt.graphics.Resource synchronized, but I think the real culprit is the getSWTImage() method.
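The synchronized variant of getSWTImage() described above can be sketched as follows. This is not the GMF code or its eventual fix: `LazyRenderedImage`, `ConcurrentDemo`, and the type parameter `T` (standing in for the SWT `Image`, so the sketch runs without a `Display`) are hypothetical. Because the lock covers the whole null-check, render, and assign sequence, only one thread ever renders and the field is never replaced, so the dispose branch disappears.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for GMF's AbstractRenderedImage; T replaces the SWT Image type.
abstract class LazyRenderedImage<T> {
    private T img; // guarded by "this"

    // Expensive rendering step, e.g. rasterizing an SVG document.
    protected abstract T renderImage();

    // Holding the lock for the whole null-check/render/assign sequence means only
    // one thread ever renders, so an existing image is never replaced or disposed.
    public final synchronized T getSWTImage() {
        if (img == null) {
            img = renderImage();
        }
        return img;
    }
}

final class ConcurrentDemo {
    // Releases `threads` threads at once, each calling getSWTImage() on the
    // same instance, and returns how many times renderImage() ran.
    static int renderCount(int threads) throws InterruptedException {
        AtomicInteger renders = new AtomicInteger();
        LazyRenderedImage<Object> image = new LazyRenderedImage<Object>() {
            @Override
            protected Object renderImage() {
                renders.incrementAndGet();
                return new Object();
            }
        };
        CountDownLatch start = new CountDownLatch(1);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                try {
                    start.await();
                    image.getSWTImage();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers[i].start();
        }
        start.countDown();
        for (Thread worker : workers) {
            worker.join();
        }
        return renders.get();
    }
}
```

Here `ConcurrentDemo.renderCount(16)` returns 1, since the other 15 threads find `img` already set. If holding the lock during a slow render is a concern, one alternative is an `AtomicReference` updated with `compareAndSet`, where the thread that loses the race disposes only its own, never-published image; whether that suits GMF is an open question.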
Comment 23 Ansgar Radermacher CLA 2019-10-30 11:12:39 EDT
I've created the new GMF-runtime bug 552568.
Comment 24 Eric Williams CLA 2019-10-30 11:33:17 EDT
(In reply to Ansgar Radermacher from comment #23)
> I've created the new GMF-runtime bug 552568.

Perfect, thanks Ansgar for both your patience and your investigation into this issue.