Bug 277948 - Mozilla SWT browser crashes eclipse when closing a popup window
Summary: Mozilla SWT browser crashes eclipse when closing a popup window
Status: RESOLVED NOT_ECLIPSE
Alias: None
Product: Platform
Classification: Eclipse Project
Component: SWT (show other bugs)
Version: 3.4.1   Edit
Hardware: PC Linux
: P3 major with 1 vote (vote)
Target Milestone: ---   Edit
Assignee: Grant Gayed CLA
QA Contact:
URL:
Whiteboard:
Keywords: needinfo
Depends on:
Blocks:
 
Reported: 2009-05-26 16:43 EDT by Preet Shihn CLA
Modified: 2010-05-27 12:52 EDT (History)
2 users (show)

See Also:


Attachments
the tar file containing the web files to test on (603 bytes, application/x-gzip)
2009-05-26 16:43 EDT, Preet Shihn CLA
no flags Details
crash log (62.32 KB, application/octet-stream)
2009-05-26 16:54 EDT, Preet Shihn CLA
no flags Details
patch (2.20 KB, patch)
2009-05-27 15:59 EDT, Grant Gayed CLA
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Preet Shihn CLA 2009-05-26 16:43:05 EDT
Created attachment 137223 [details]
the tar file containing the web files to test on

Build ID: M20080911-1700

Steps To Reproduce:
* On a linux machine, extract the attached tar.gz file to a location (or in the root folder of a web server). This will create a folder called 'webfiles' with 2 files in the folder

* Point the eclipse mozilla browser (a xulrunner version is preferred but it happens even if it isn't) to the url of the test.html located in the 'webfiles' folder you just extracted

* click on the link titled "TAR file"

* A download dialog will come up. Click cancel. 

* click the link again and repeat the process many times. It crashes within 5-10 minutes

More information:
This happens on any SWT mozilla browser. It likely happens on windows mozilla browser too.  

This is very important to us as our customers are being affected in their every day use
Comment 1 Preet Shihn CLA 2009-05-26 16:54:33 EDT
Created attachment 137227 [details]
crash log 

This is the crash log file generated by the JVM
Comment 2 Preet Shihn CLA 2009-05-26 16:57:15 EDT
I reproduced the crash with a debug version of the xulrunner.
The stack trace in libxul.dll in the attached log file matches with the following symbols:

get_gtk_widget_for_gdk_window(_GdkDrawable*) + 72
IM_get_input_context(_MozDrawingarea*) + 29
nsWindow::IMEGetContext
nsWindow::IMELoseFocus() + 62
nsWindow::IMEDestroyContext() + 37
nsWindow::Destroy()
nsView::~nsView$delete()
nsIview::Destroy();
nsViewManager::~nsViewManager$delete() + 52
nsViewManager::Release() + 78
nsCOMPtr_base::~nsCOMPtr_base$base() + 22
nsCOMPtr<nsIViewManager>::~nsCOMPtr()
DocumentViewerImpl::~DocumentViewerImpl$delete() + 420
DocumentViewerImpl::Release() + 78
nsCOMPtr_base::~nsCOMPtr_base$base() + 22
nsCOMPtr<nsIContentViewer>::~nsComPtr() + 27
nsGlobalWindow::Close()
nsExternalAppHandler::Notify(nsITimer*) + 44
nsTimerImpl::Fire() + 150
handleTimerEvent(TimerEventType*) + 127
PL_HandleEvent + 35
PL_ProcessPendingEvents+0x8e + 142
nsEvenetQueueImpl::ProcessPendingEvents()
event_processor_callback(..) + 22
Comment 3 Preet Shihn CLA 2009-05-26 17:09:03 EDT
In this particular case, a new window(popup) is created to deal with the link. Since the link points to a mime type that the browser can not load, (looking at the mozilla code) it posts an event in UI thread to close the window via a timer.  

SWT browser is listening to the close window event and it closes the shell when that happens. What seems to be happening in this race condition is that the shell gets disposed causing the browser to be disposed. Now the C++ code in mozilla is trying to release the window and free up memory; and it seems like the window is already gone so it gets an invalid pointer.


Although this particular reproducible example is specific for URLs that the browser can not load, I think the race condition exists in all scenarios when the underlying mozilla is trying to close a popup window. This particular scenario just easily reproduces the crash. (we have seen random crashes in our customer base with libxul.so that had no specific reproducible steps - so hard to report)

This scenario could the same if, say, some javascript was closing a window. It would make the same xpcom call to close the window. 
Comment 4 Grant Gayed CLA 2009-05-27 12:41:40 EDT
I can reproduce the problem easily with xulrunner 1.8.1.3, but not with 1.9.  Which xulrunner version are you using?

If the problem is as you describe (xulrunner is referencing a browser that has been disposed) then this is a xulrunner bug, and it may be fixed in later xulrunner versions.  It's a similar scenario that happens throughout swt: any time an event is sent then existing control references need isDisposed() checks because a client could have disposed something.

BTW: do you thinh bug 275411 is a duplicate of this one?
Comment 5 Preet Shihn CLA 2009-05-27 14:22:41 EDT
I can reproduce this on 1.8.1.3 and 1.8.1.16 (we use 1.8.1.16 mostly). We can not move to 1.9 because our code is dependent on eclipse 3.4 and the SWT browser in that version does not quite work well with mozilla 1.9

I'm not sure where the real bug lies, but this crash does not happen on mozilla based browsers for version 1.8.x. This may be because of the way their code is written.

My real worry is that the crash is happening in
nsGlobalWindow::Close() which would be called to close any window. This race condition would exist in ALL window closures. 

Regarding bug 275411, I don't think it's the same by looking at the stack traces. The root cause maybe the same since a bad pointer could manifest errors elsewhere in the code. However, from what I know, bug 275411 has been encountered even when no popup windows were involved. 

I'm just wondering if there is some sort of more basic race condition in synchronizing the SWT UI thread and the Mozilla's UI thread. 

BTW, not directly related to this bug, but sometimes I've seen crashes in the finalize method, which means that when Java is releasing an object, underlying mozilla code has already disposed it off. Some reference count miscalculation?
Comment 6 Grant Gayed CLA 2009-05-27 15:59:54 EDT
Created attachment 137406 [details]
patch

The attached patch delays the destroying of the Browser and its parent shell for xulrunner 1.8.x (this appears to be the only version with this problem, tested mozilla 1.4-1.9.1b3).

All popup browser creates/destroys come through swt because xulrunner doesn't know how to create/destroy the parent window, so this should fix any cases of this problem that are happening in other contexts as well.

> BTW, not directly related to this bug, but sometimes I've seen crashes in the
> finalize method, which means that when Java is releasing an object, underlying
> mozilla code has already disposed it off. Some reference count miscalculation?

It's possible there could be 1+ AddRefs missing, but they don't cause a problem for the browser if it doesn't reference them after they've been freed.
Comment 7 Pawan Singh CLA 2009-05-27 16:09:25 EDT
I looked at the patch. How can one be sure that putting the dispose in an asyncExec will still not cause the crash? It seems like Mozilla code itself does the same trick of posting some event to dispose things. That means that if events are intermingled, the crash will still happen. This patch will simply affect the timing of it all.
Comment 8 Grant Gayed CLA 2009-05-27 16:51:34 EDT
The asyncExec guarantees that the browser will still be valid for use by the invoker of Mozilla.DestroyBrowserWindow() and its invocation chain.  If xulrunner still tried to reference the browser after swt had asynchronously destroyed it then the same problem would still happen.

That being said, the DestroyBrowserWindow implementation is responsible for destroying the browser's shell, and since a browser cannot live without a parent, destroying the browser as well.  It can't wait longer than an asyncExec() to do it, and if xulrunner is still referencing the browser a full iteration after asking for it to be destroyed then it's a xulrunner bug that's worse than the current one.  Disposing a control when the event loop is run (ie.- not in a callback implementation) is generally safe because there isn't a call stack that risks having its data turn bad on it.
Comment 9 Preet Shihn CLA 2009-05-28 18:47:23 EDT
I can reproduce the crash easily with your patch. 
(using the html files I have attached to this bug)
Comment 10 Preet Shihn CLA 2009-06-02 17:15:55 EDT
Myabe I was too quick to make the previous statement, it seems like I was unable to build the SWT plugin properly, so the changes in the patch might have never gone into the new SWT plugin i got. 

I am trying to follow in the instructions available at:
http://www.eclipse.org/swt/faq.php#howbuildplugin

It works fine. I get a plugin that works but i don't think it is taking into account any changes I make into the file in the org.eclipse.swt project (the src files dont seem to be in the platform specific plugin). Even if I introduce very obvious errors in the Mozilla.java file, I still get a swt plugin that works. I made sure that I clear all the bin and output class folders.

It seems like it is being picked up from the host eclipse where I'm running it from. 

Here's what the output looks like with some simple compile errors introduced. You can see even though it complained, I did get a successful SWT plugin.

What am I doing wrong?




Buildfile: /home/pshihn/eclipseswt/org.eclipse.swt.gtk.linux.x86/build.xml

properties:

init:

jar.plugin:
       [mkdir] Created dir: /home/pshihn/eclipseswt/org.eclipse.swt.gtk.linux.x86/temp.folder

properties:

init:

build.jars:

properties:

init:

@dot:
       [mkdir] Created dir: /home/pshihn/eclipseswt/org.eclipse.swt.gtk.linux.x86/temp.folder/@dot.bin
       [javac] Compiling 522 source files to /home/pshihn/eclipseswt/org.eclipse.swt.gtk.linux.x86/temp.folder/@dot.bin
       [javac] ----------
       [javac] 1. ERROR in /home/pshihn/eclipseswt/org.eclipse.swt/Eclipse SWT Mozilla/common/org/eclipse/swt/browser/Mozilla.java (at line 157)
       [javac] 	System.out.println("jhjhjhjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjj")
       [javac] 	                                                                 ^
       [javac] Syntax error, insert ";" to complete Statement
       [javac] ----------
       [javac] 2. ERROR in /home/pshihn/eclipseswt/org.eclipse.swt/Eclipse SWT Mozilla/common/org/eclipse/swt/browser/Mozilla.java (at line 1318)
       [javac] 	public int /*long*/ method8 (int /*long*/[] args) {return DestroyBrowserWindow ();}
       [javac] 	                                                          ^^^^^^^^^^^^^^^^^^^^
       [javac] The method DestroyBrowserWindow() is undefined for the type new XPCOMObject(){}
       [javac] ----------
       [javac] 2 problems (2 errors)
       [javac] Compilation failed. Compiler errors are available in /home/pshihn/eclipseswt/org.eclipse.swt.gtk.linux.x86/temp.folder/@dot.bin.xml
       [javac] Compile failed; see the compiler error output for details.
        [copy] Copying 2 files to /home/pshihn/eclipseswt/org.eclipse.swt.gtk.linux.x86/temp.folder/@dot.bin

copy.translationfiles:
       [mkdir] Created dir: /home/pshihn/eclipseswt/org.eclipse.swt.gtk.linux.x86/@dot
        [copy] Copying 708 files to /home/pshihn/eclipseswt/org.eclipse.swt.gtk.linux.x86/@dot
         [jar] Building jar: /home/pshihn/eclipseswt/org.eclipse.swt.gtk.linux.x86/swt.jar
      [delete] Deleting directory /home/pshihn/eclipseswt/org.eclipse.swt.gtk.linux.x86/temp.folder/@dot.bin

properties:

init:

gather.bin.parts:
       [mkdir] Created dir: /home/pshihn/eclipseswt/org.eclipse.swt.gtk.linux.x86/temp.folder/org.eclipse.swt.gtk.linux.x86_3.3.0
        [copy] Copying 708 files to /home/pshihn/eclipseswt/org.eclipse.swt.gtk.linux.x86/temp.folder/org.eclipse.swt.gtk.linux.x86_3.3.0
        [copy] Copying 20 files to /home/pshihn/eclipseswt/org.eclipse.swt.gtk.linux.x86/temp.folder/org.eclipse.swt.gtk.linux.x86_3.3.0
         [jar] Building jar: /home/pshihn/eclipseswt/org.eclipse.swt.gtk.linux.x86/org.eclipse.swt.gtk.linux.x86_3.3.0.jar
      [delete] Deleting directory /home/pshihn/eclipseswt/org.eclipse.swt.gtk.linux.x86/temp.folder
BUILD SUCCESSFUL
Total time: 5 seconds
Comment 11 Grant Gayed CLA 2009-06-04 10:48:35 EDT
If you're launching your app from within eclipse then you don't need to rebuild the swt plugin.  It should work with steps:

- apply the change to the swt project in your workspace
- add a System.out.println("here") somewhere conspicuous to indicate when the new code is running
- invoke Run > Run Configurations...
- select your app and go to its Plug-ins tab
- ensure that either:
  -> "Launch with: all workspace and enabled target plugins" is selected, or
  -> the swt plugins in your workspace are checked in the table below, not the ones in the "Target Platform" section
Comment 12 Grant Gayed CLA 2010-05-27 12:52:32 EDT
Timed out, closing as NOT_ECLIPSE since the problem appears to only happen with XULRunner 1.8.x.  If you are still restricted to using XULRunner 1.8.x then see if the patch in comment 6 makes your crash go away (instructions in comment 11).  If it does help then you can either ship a modified swt jar with your app, or add a comment here and it can be considered for a future eclipse release.