Bug 285130 - Debugger dies in different unpredictable ways
Summary: Debugger dies in different unpredictable ways
Status: RESOLVED FIXED
Alias: None
Product: JDT
Classification: Eclipse Project
Component: Debug
Version: 3.5
Hardware: All / OS: All
Importance: P3 major with 3 votes
Target Milestone: 4.8 M1
Assignee: Igor Fedorenko CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-07-30 05:38 EDT by Henrik Dohlmann CLA
Modified: 2018-09-19 02:49 EDT (History)
15 users

See Also:


Attachments
Frozen Debug view (27.87 KB, image/png)
2009-08-03 02:52 EDT, Henrik Dohlmann CLA
Frozen Debug view with GC threads (37.50 KB, image/png)
2009-08-04 04:20 EDT, Henrik Dohlmann CLA
Error dialog while debugging (21.55 KB, image/png)
2009-09-09 09:21 EDT, Henrik Dohlmann CLA
patch (6.65 KB, patch)
2010-02-18 17:59 EST, Darin Wright CLA
patch (9.65 KB, patch)
2010-02-22 15:17 EST, Darin Wright CLA
patch (1.49 KB, patch)
2010-02-22 16:00 EST, Darin Wright CLA
Screenshot from ProcessExplorer (Windows) (78.11 KB, image/jpeg)
2010-05-10 23:27 EDT, Jason CLA
Debug View threads missing (273.69 KB, image/gif)
2017-07-12 06:06 EDT, Sarika Sinha CLA

Description Henrik Dohlmann CLA 2009-07-30 05:38:05 EDT
OS: Vista 64bit. JVM: 1.6.0u14 32bit.

I try to debug an application with lots of recursion that pumps out a lot of information to the console.

Sometimes the console stops showing data, and Eclipse is sort of dead. Switching to the debug perspective shows an empty debug view. Switching to another perspective is possible, but some views are black (not repainted?). 

Starting the debugger from the debug perspective sometimes freezes with a lot of empty entries in the thread list.

Sometimes I get a "Problem Occurred" dialog with the text "'process model delta' has encountered a problem", "An internal error has occurred". The details view further contains "unable to create new native thread". Nothing is written to the ".log" file.

Trying to shut down Eclipse stalls during "Saving workbench state", without any information in the ".log". When that happens I cannot connect with JConsole, but with Sysinternals Process Explorer I can see that javaw.exe has an excessive number of threads called "msvcr71.dll!_endthreadex+0x31".
Comment 1 Henrik Dohlmann CLA 2009-07-30 06:50:31 EDT
Setting an early breakpoint in a loop and continuing each time it is hit, I got an Out Of Memory dialog. The corresponding log entry is:

!SESSION 2009-07-30 11:51:41.155 -----------------------------------------------
eclipse.buildId=I20090611-1540
java.version=1.6.0_14
java.vendor=Sun Microsystems Inc.
BootLoader constants: OS=win32, ARCH=x86, WS=win32, NL=da_DK
Framework arguments:  -product org.eclipse.epp.package.jee.product
Command-line arguments:  -os win32 -ws win32 -arch x86 -product org.eclipse.epp.package.jee.product -clean

!ENTRY org.eclipse.core.jobs 4 2 2009-07-30 12:49:37.090
!MESSAGE An internal error occurred during: "Label Job".
!STACK 0
java.lang.OutOfMemoryError: unable to create new native thread
	at java.lang.Thread.start0(Native Method)
	at java.lang.Thread.start(Thread.java:597)
	at org.eclipse.core.internal.jobs.WorkerPool.jobQueued(WorkerPool.java:145)
	at org.eclipse.core.internal.jobs.JobManager.schedule(JobManager.java:1001)
	at org.eclipse.core.internal.jobs.InternalJob.schedule(InternalJob.java:391)
	at org.eclipse.core.runtime.jobs.Job.schedule(Job.java:461)
	at org.eclipse.debug.internal.ui.viewers.model.TreeModelLabelProvider.complete(TreeModelLabelProvider.java:309)
	at org.eclipse.debug.internal.ui.viewers.model.LabelUpdate.done(LabelUpdate.java:146)
	at org.eclipse.debug.internal.ui.model.elements.ElementLabelProvider$LabelUpdater.run(ElementLabelProvider.java:166)
	at org.eclipse.debug.internal.ui.model.elements.ElementLabelProvider$LabelJob.run(ElementLabelProvider.java:71)
	at org.eclipse.core.internal.jobs.Worker.run(Worker.java:55)


The machine has 4G of mem.
I am launching Eclipse with:
C:\Tools\eclipse\eclipse.exe -vm "C:\Program Files (x86)\Java\jdk1.6.0_14\bin\javaw.exe" -clean -vmargs -Xmx1g -XX:MaxPermSize=512m


Comment 2 Darin Wright CLA 2009-07-31 11:39:00 EDT
Qs:

* How many threads does your application have? Are they short-lived, or do they persist for a while?
* How much console output does your application have?
* Does debugging your application work on Eclipse 3.4?
Comment 3 Henrik Dohlmann CLA 2009-08-03 02:52:59 EDT
Created attachment 143238 [details]
Frozen Debug view

The list of running threads is expanded a lot, but no names show up, and then it freezes.
Comment 4 Henrik Dohlmann CLA 2009-08-03 02:53:54 EDT
The app has a handful of long-lived threads and a couple of threadpools for
many short-lived ones, like on/off the EDT, internal on/off core/gui-layer,
firing messages, communicating with 3d renderer through jni (cortona using
jacozoom).

Console output can go from absurd to totally crazy, but even reduced to its
minimum (6 lines of copyright stuff), the debugger still hangs.

Debugging worked in 3.4, but I think it stopped working with one of the latest
updates (3.4.2, perhaps with some update).
Comment 5 Henrik Dohlmann CLA 2009-08-04 04:20:37 EDT
Created attachment 143368 [details]
Frozen Debug view with GC threads

Today I managed to get the Debug View to show all the threads instead of blank lines. The app under debug was still a little alive (?), and when I poked it, the Debug View suddenly managed to add names to all the blank entries.

So, there are a lot of entries with the name:
* Thread [garbage collected] (Running).
Comment 6 Henrik Dohlmann CLA 2009-08-04 07:00:15 EDT
While the Debug View hung, I tried to start a new instance. This resulted in the following in the log:

java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:597)
at org.eclipse.jdt.internal.debug.core.model.JDIThread.suspendUnderlyingThread(JDIThread.java:1598)
at org.eclipse.jdt.internal.debug.core.model.JDIThread.suspend(JDIThread.java:1489)
at org.eclipse.debug.internal.core.commands.SuspendCommand.execute(SuspendCommand.java:29)
at org.eclipse.debug.internal.core.commands.ForEachCommand.doExecute(ForEachCommand.java:30)
at org.eclipse.debug.internal.core.commands.DebugCommand$1.run(DebugCommand.java:204)
at org.eclipse.core.internal.jobs.Worker.run(Worker.java:55)
Comment 7 Henrik Dohlmann CLA 2009-08-05 03:45:32 EDT
Just rechecked with 3.4.2 and there I can debug without problems. It seems that all the GC threads do not show up at all in the Debug view in 3.4.2 (or it happens so fast I cannot notice).
Comment 8 Henrik Dohlmann CLA 2009-08-10 08:02:28 EDT
I have a colleague with the same setup, and the same problem, so it is not faulty hardware!

Any ideas? Not being able to debug is rather critical, I believe...
Comment 9 Henrik Dohlmann CLA 2009-09-09 09:21:06 EDT
Created attachment 146746 [details]
Error dialog while debugging

A debug session that starts in the Java Perspective.
The application is poked a lot and works fine until the breakpoint is hit.
Then the attached dialog shows up.
When pressing details, the rendering of the window turns black.
Comment 10 Henrik Dohlmann CLA 2009-09-09 09:25:19 EDT
I have updated to JDK 1.6.0_16 and still have this problem.
The log contained the following stack trace when I got the dialog shown in comment #9:

java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:597)
at org.eclipse.core.internal.jobs.WorkerPool.jobQueued(WorkerPool.java:145)
at org.eclipse.core.internal.jobs.JobManager.schedule(JobManager.java:1001)
at org.eclipse.core.internal.jobs.InternalJob.schedule(InternalJob.java:391)
at org.eclipse.core.runtime.jobs.Job.schedule(Job.java:435)
at org.eclipse.debug.core.DebugPlugin.fireDebugEventSet(DebugPlugin.java:471)
at org.eclipse.debug.core.model.DebugElement.fireEvent(DebugElement.java:95)
at org.eclipse.debug.core.model.DebugElement.fireCreationEvent(DebugElement.java:113)
at org.eclipse.jdt.internal.debug.core.model.JDIDebugTarget.createThread(JDIDebugTarget.java:498)
at org.eclipse.jdt.internal.debug.core.model.JDIDebugTarget$ThreadStartHandler.handleEvent(JDIDebugTarget.java:1847)
at org.eclipse.jdt.internal.debug.core.EventDispatcher.dispatch(EventDispatcher.java:155)
at org.eclipse.jdt.internal.debug.core.EventDispatcher.access$0(EventDispatcher.java:104)
at org.eclipse.jdt.internal.debug.core.EventDispatcher$1.run(EventDispatcher.java:250)
at org.eclipse.core.internal.jobs.Worker.run(Worker.java:55)
Comment 11 Darin Wright CLA 2009-09-09 09:44:26 EDT
John, it looks like the SDK's VM is unable to create new threads at some point. Have you seen any other bugs like this?
Comment 12 John Arthorne CLA 2009-09-09 10:24:48 EDT
This looks like he simply exceeded the available memory. Each allocated thread needs a chunk of stack and heap memory, so if you create enough threads you eventually run out of memory. I've never seen the thread label "garbage collected", but perhaps something is preventing these threads from terminating (such as catching and ignoring java.lang.ThreadDeath).
Comment 13 Henrik Dohlmann CLA 2009-09-09 10:30:50 EDT
Well, as stated earlier, I run on a 64 bit machine with 4GB of RAM.
I start Eclipse with: -clean -vmargs -Xmx1g -XX:MaxPermSize=512m

Anything else I can try to tweak to see if it is a memory problem?
Comment 14 Darin Wright CLA 2009-09-09 10:33:41 EDT
Just a note: I think the threads in the app being debugged are not the problem. The problem is allocating threads in the SDK where the debug client is executing. The debugger did not change much in 3.5 (from 3.4), so I'm not sure why there is an issue.
Comment 15 Henrik Dohlmann CLA 2009-09-28 03:01:47 EDT
Whatever small changes have been made between 3.4.2 and 3.5 have a major impact on 64-bit Windows running a 32-bit JVM.

Am I the only one with that mix who sees this problem?
Comment 16 Darko Ostricki CLA 2010-02-18 08:39:08 EST
I have exactly the same problem with
SpringSource Tool Suite Version 2.1.0.SR01 (Eclipse 3.5.0).
Eclipse crashes when debugging an application, usually with the error
>Unhandled event loop exception
>java.lang.OutOfMemoryError: unable to create new native thread

I'm using JDK 1.6.18 under 32-bit Windows.

It seems the number of threads rises within one or two seconds from 60 up to over 800 threads, which then causes the OutOfMemoryError.
I made a thread dump using VisualVM; there are a few hundred threads in it, all of the following kind:

"Worker-277" prio=6 tid=0x66040c00 nid=0x11f8 waiting on condition [0x7283f000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
	at java.lang.Thread.sleep(Native Method)
	at org.eclipse.core.internal.jobs.JobManager.join(JobManager.java:815)
	at org.eclipse.jdt.internal.debug.core.model.JDIDebugTarget$ThreadDeathHandler.handleEvent(JDIDebugTarget.java:1930)
	at org.eclipse.jdt.internal.debug.core.EventDispatcher.dispatch(EventDispatcher.java:155)
	at org.eclipse.jdt.internal.debug.core.EventDispatcher.access$0(EventDispatcher.java:104)
	at org.eclipse.jdt.internal.debug.core.EventDispatcher$1.run(EventDispatcher.java:250)
	at org.eclipse.core.internal.jobs.Worker.run(Worker.java:55)

   Locked ownable synchronizers:
	- None

It would be good if this were fixed soon.
Comment 17 Darin Wright CLA 2010-02-18 13:44:33 EST
Looks like this problem is caused by the fix to bug 269231. Each event set is handled in a separate job.
Comment 18 Darin Wright CLA 2010-02-18 17:59:37 EST
Created attachment 159499 [details]
patch

Patch processes thread start/death events in one job to avoid flooding the queue. Also, since thread start/death events are serialized, we are sure to process them in the correct order (start before death), so the thread death event handler no longer has to join/wait on thread start event jobs.
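Outside Eclipse, the pattern Darin describes can be sketched with a plain single-threaded executor. This is only an illustrative sketch of the idea, not the actual patch; all names here are made up:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Sketch: instead of scheduling one job per JDI event (which can flood the
// worker pool), thread start/death events are queued to a single worker,
// so a death event can never be handled before the start event for the
// same thread, and no join/wait between jobs is needed.
public class SerializedThreadEvents {
    private final ExecutorService eventJob = Executors.newSingleThreadExecutor();
    private final List<String> handled =
            Collections.synchronizedList(new ArrayList<>());

    public void threadStarted(String name) {
        eventJob.submit(() -> handled.add("start:" + name));
    }

    public void threadDied(String name) {
        eventJob.submit(() -> handled.add("death:" + name));
    }

    // Wait for the single worker to drain the queue, then return the log.
    public List<String> drain() {
        eventJob.shutdown();
        try {
            eventJob.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return handled;
    }

    public static void main(String[] args) {
        SerializedThreadEvents events = new SerializedThreadEvents();
        for (int i = 0; i < 100; i++) {
            events.threadStarted("T" + i);
            events.threadDied("T" + i);
        }
        List<String> log = events.drain();
        for (int i = 0; i < 100; i++) {
            if (log.indexOf("start:T" + i) > log.indexOf("death:T" + i)) {
                throw new AssertionError("death handled before start for T" + i);
            }
        }
        System.out.println("start handled before death for all 100 threads");
    }
}
```

The key property is that only one worker thread exists no matter how many events arrive, which is exactly what the OutOfMemoryError reports in this bug lacked.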
Comment 19 Darin Wright CLA 2010-02-18 18:23:53 EST
Applied/Fixed. Please verify, Curtis.
Comment 20 Henrik Dohlmann CLA 2010-02-19 03:33:10 EST
(In reply to comment #16)
> I made a thread dump using visualVM with the result that there are a few
> hundred threads in it, and a all of the following kind:

Thanks, Darko.
It seems that was the information needed to pinpoint the bug :-)

I will try to remember visualVM for another time...

Workaround before patch: 
Running 64-bit Eclipse with a 64-bit JVM and debugging my 32-bit application using a 32-bit JRE doesn't hang, but I still experience extreme slowdowns and the excessive thread tree in the thread view.
Comment 21 Darin Wright CLA 2010-02-22 10:05:10 EST
Re-opening. Unfortunately, this is causing test failures on Linux/Mac: bug 303486
Comment 22 Darin Wright CLA 2010-02-22 10:09:26 EST
I've reverted these changes until I can figure out why they are causing a failure. Some breakpoints are being missed with this change.
Comment 23 Darin Wright CLA 2010-02-22 15:17:37 EST
Created attachment 159842 [details]
patch

updated patch. 

On Linux, it appears that a ThreadStartEvent set and a BreakpointEvent set can be delivered for the same thread at the same time. (One would expect that until the thread start event set is resumed, a breakpoint in that thread cannot be hit.) In this case, the breakpoint is on the first line of a runnable, so it appears the VM is delivering both event sets at once. The debugger was processing the event sets at the same time in different jobs, and the breakpoint event failed, since we had not yet created a model object for the thread (still being processed in the thread start event job).

This fix uses scheduling rules to ensure that the thread start event is processed before any breakpoint event in the same thread.
Comment 24 Darin Wright CLA 2010-02-22 15:19:14 EST
Fixed. Please verify, Curtis.
Comment 25 Darin Wright CLA 2010-02-22 16:00:22 EST
Created attachment 159846 [details]
patch

Additional fix to avoid running jobs after shutdown. Applied.
Comment 26 Darin Wright CLA 2010-02-25 16:11:47 EST
I've reverted this change again, it appears to be causing bug 303966.
Comment 27 Jason CLA 2010-05-10 23:27:35 EDT
Created attachment 167859 [details]
Screenshot from ProcessExplorer (Windows)
Comment 28 Jason CLA 2010-05-10 23:27:56 EDT
Any update on this issue?

I am seeing a large number of native threads being created during debug (~800-900), all of which are "MSVCR71.dll!endthreadex+0x31".

This causes the CPU to spin up to 50%, and Eclipse hangs.

I am currently unable to debug.

The application I am debugging does create a large number of short-lived threads regularly.

Running on (good old) Windows XP, Eclipse build 20100218-1602

Sample shot from Process Explorer attached

Thanks,

Jason
Comment 29 Darko Ostricki CLA 2010-05-11 03:22:40 EDT
As a Workaround tip:
For me it worked to create a completely new workspace from scratch.
Don't try to copy the workspace or reuse the workspace metadata!
Comment 30 Jason CLA 2010-05-11 03:28:05 EDT
My workaround has been to reduce the number of threads created/destroyed by the application I am debugging. With fewer threads it seems Eclipse is able to handle the native thread creation/destruction without going into a tailspin.
Comment 31 Darin Wright CLA 2010-05-11 15:28:56 EDT
No one on the debug team has had any more time to spend on this one. Since the fix is in a sensitive code area, and any potential fix is risky, this may not get fixed in 3.6.
Comment 32 Jason CLA 2010-05-11 19:34:42 EDT
Understood.  I think the problem only manifests with large numbers of threads created/destroyed in rapid succession (on Windows anyway).  For now the workaround to reduce thread numbers is fine.
Comment 33 Henrik Dohlmann CLA 2010-05-12 03:59:35 EDT
I will look into the possibility mentioned in the workaround. Unfortunately, I do not believe we can control the number of threads used while communicating with a 3d renderer through jni (cortona using jacozoom).
Comment 34 Nathan Reynolds CLA 2013-03-25 18:05:55 EDT
I am running 32-bit Eclipse (Juno SR 2 - 20130225-0426) on 64-bit Windows 7.  The process has 4 GB of virtual address space.  The program I am debugging is creating a lot of short-lived threads.  This causes Eclipse to create 1791 threads which won't exit.  The entire 4 GB address space is used up.  I suppose I could run 64-bit Eclipse to get more address space but then my machine will thrash.

All of the threads have exactly the same call stack!  Do we really need that many threads to deal with debuggee thread death events?  I would think only 1 thread would be required.

at java.lang.Thread.sleep(Native Method)
at org.eclipse.core.internal.jobs.JobManager.join(JobManager.java:925)
at org.eclipse.jdt.internal.debug.core.model.JDIDebugTarget$ThreadDeathHandler.handleEvent(JDIDebugTarget.java:2051)
at org.eclipse.jdt.internal.debug.core.EventDispatcher.dispatch(EventDispatcher.java:152)
at org.eclipse.jdt.internal.debug.core.EventDispatcher.access$0(EventDispatcher.java:100)
at org.eclipse.jdt.internal.debug.core.EventDispatcher$1.run(EventDispatcher.java:249)
at org.eclipse.core.internal.jobs.Worker.run(Worker.java:53)
Comment 35 Nathan Reynolds CLA 2013-03-28 14:21:53 EDT
The simple program below reproduces the problem 100% of the time when debugged with 32-bit Eclipse IDE on 64-bit Windows 7.  The program never outputs anything.  Both Eclipse IDE and the program stop using the CPU.

Eclipse IDE has 1901 threads running. Its virtual address space is completely used up (i.e. all 4 GB used). The debugged program is nowhere near maxed out on its address space.

I used Sysinternals VMMap to inspect the address space. It calls out 1901 call stacks with an associated thread ID, each consuming 1 MB of virtual address space. It calls out another 1901 call stacks which say "64-bit thread stack", each consuming 256 KB of virtual address space. Before I really dug into the problem, I assumed this meant the thread had exited and HotSpot hadn't cleaned up the call stack yet. Now that I see there is a 1:1 ratio between call stacks with a thread ID and call stacks without an ID, I wonder if the two are tied to each other. If so, why consume so much space for both? Why not put them together?
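A rough back-of-the-envelope check (assuming each of the 1901 threads really holds both mappings reported above) shows the thread stacks alone consume over half the 32-bit address space:

```latex
\underbrace{1901 \times 1\,\mathrm{MB}}_{\text{stacks with thread ID}}
+ \underbrace{1901 \times 256\,\mathrm{KB}}_{\text{``64-bit thread stack''}}
\approx 1.86\,\mathrm{GB} + 0.46\,\mathrm{GB} \approx 2.3\,\mathrm{GB}
```

Add the Java heap and permgen on top of that, and the 4 GB address space of a 32-bit process is plausibly exhausted.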

I took a thread dump of Eclipse. 1870 threads have the following call stack. Something is definitely wrong with the Eclipse JDT debugger. These threads don't exit even after several minutes. These threads might exist because a debuggee thread terminated rather than started; I say that because the 3rd frame is "ThreadDeathHandler".

    at java.lang.Thread.sleep(Native Method)
    at org.eclipse.core.internal.jobs.JobManager.join(JobManager.java:925)
    at org.eclipse.jdt.internal.debug.core.model.JDIDebugTarget$ThreadDeathHandler.handleEvent(JDIDebugTarget.java:2051)
    at org.eclipse.jdt.internal.debug.core.EventDispatcher.dispatch(EventDispatcher.java:152)
    at org.eclipse.jdt.internal.debug.core.EventDispatcher.access$0(EventDispatcher.java:100)
    at org.eclipse.jdt.internal.debug.core.EventDispatcher$1.run(EventDispatcher.java:249)
    at org.eclipse.core.internal.jobs.Worker.run(Worker.java:53)

I used a BTrace script to dump the call stack when Thread.start() is called.  1896 threads were created with the following call stack.  Unfortunately, that isn't very helpful... to me.

   org.eclipse.core.internal.jobs.WorkerPool.jobQueued(WorkerPool.java:148)
   org.eclipse.core.internal.jobs.WorkerPool.startJob(WorkerPool.java:244)
   org.eclipse.core.internal.jobs.Worker.run(Worker.java:50)

Program

public class ThreadLauncher
{
   public static void main(String[] args)
   {
      Thread thread;
      int i;
     
      i = 0;
     
      try
      {
         for ( ; true; i++)
         {
            thread = new Thread();
           
            thread.setDaemon(false);
            thread.setName("Thread " + i);
            thread.setPriority(Thread.MIN_PRIORITY);
            thread.start();
         }
      }
      catch (Throwable t)
      {
         System.out.println("Failed to create thread #" + i);
         System.out.flush();
         t.printStackTrace();
         System.err.flush();
      }
   }
}
Comment 36 Bahir Bilgin CLA 2013-08-08 11:15:08 EDT
Hello all,


after analyzing the OutOfMemoryError while debugging in Eclipse I found out the following information, which might be helpful to solve the Eclipse problem:

At first, some facts:
- We are working on a web application (.NET interoperating with a Java VM through Caffeine/JNI) which creates lots of threads.
- If a thread ends in the .NET environment, the corresponding Java thread is also shut down/released at the same time.
- If a request lifetime ends in the .NET environment, the corresponding Java thread is also shut down/released at the same time (this led to the problem in Eclipse).

Now, when I tried debugging (remotely attached to the Java VM part of our application) I could sometimes observe that lots of threads were created in Eclipse's Debug view, and shortly after that, or almost at the same time, Eclipse froze.

The more interesting thing was when I looked at Eclipse's heap dump with Eclipse Memory Analyzer after Eclipse froze.
There were about 700 Workers running, i.e. threads opened.

So, what I suppose is that the Eclipse debugger tries to "catch" the "lost" Java threads (shut down or killed by .NET) by repeatedly instantiating new workers until memory is completely filled by them.



OUR SOLUTION/WORKAROUND was to change the logic of our application as follows:
When requests are finished in .NET, the Java threads are not released any more. After this change the Eclipse debugger works fine, as it should.


I hope this information helps fixing this problem.
Comment 37 Warwick Burrows CLA 2014-06-12 20:38:01 EDT
I'm seeing the same problem with Kepler SR2 on Windows 7 64-bit and Java 1.6.0_41. Windows Resource Monitor says the process has over 18000 threads, and when I hit Ctrl-Break in the eclipse/java console window, the thread dumps show the same thread-death-related join profile. Is there some way to get a higher priority on fixing this? Right now I have to download NetBeans, as I have a serious blocking issue I need to debug and can't do it with Eclipse anymore. Would it help to shorten the join timeout used? Could ThreadDeathHandler detect these cases and not join, since it seems the thread is gone anyway?

Here's the stack trace.

"Worker-2199" prio=6 tid=0x000000001e644800 nid=0x148bc waiting on condition [0x00000000db77e000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at org.eclipse.core.internal.jobs.JobManager.join(JobManager.java:935)
        at org.eclipse.jdt.internal.debug.core.model.JDIDebugTarget$ThreadDeathHandler.handleEvent(JDIDebugTarget.java:2055)
        at org.eclipse.jdt.internal.debug.core.EventDispatcher.dispatch(EventDispatcher.java:152)
        at org.eclipse.jdt.internal.debug.core.EventDispatcher.access$0(EventDispatcher.java:100)
        at org.eclipse.jdt.internal.debug.core.EventDispatcher$1.run(EventDispatcher.java:249)
        at org.eclipse.core.internal.jobs.Worker.run(Worker.java:53)
Comment 38 Eclipse Genie CLA 2017-05-12 19:55:57 EDT
New Gerrit change created: https://git.eclipse.org/r/97002
Comment 39 Igor Fedorenko CLA 2017-05-13 12:10:41 EDT
(In reply to Eclipse Genie from comment #38)
> New Gerrit change created: https://git.eclipse.org/r/97002

This proposed change will process most JDI events on a single processing job/thread, in the order they arrive from the JVM. Events that require expression evaluation are still processed on separate threads (started from the processing thread); this is necessary because expression evaluation may result in secondary events that require evaluation and therefore cannot be on the same thread.
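The dispatch model described above can be sketched, outside Eclipse, with two executors. This is only an illustrative sketch of the idea under discussion, not the actual Gerrit change; all class and method names are made up:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of the proposed model: one processing thread handles events in
// arrival order; only events that may trigger expression evaluation (and
// thus secondary events) are handed off to another thread, since evaluating
// on the processing thread itself would block the event queue.
public class SingleThreadDispatcher {
    private final ExecutorService processing = Executors.newSingleThreadExecutor();
    private final ExecutorService evaluations = Executors.newCachedThreadPool();
    public final AtomicInteger simpleHandled = new AtomicInteger();
    public final AtomicInteger evaluationsRun = new AtomicInteger();

    public void dispatch(boolean needsEvaluation) {
        processing.submit(() -> {
            if (needsEvaluation) {
                // e.g. a conditional breakpoint: evaluate elsewhere so the
                // processing thread can keep draining the queue
                evaluations.submit(() -> { evaluationsRun.incrementAndGet(); });
            } else {
                simpleHandled.incrementAndGet(); // handled in order, here
            }
        });
    }

    public void shutdown() {
        try {
            processing.shutdown();
            processing.awaitTermination(10, TimeUnit.SECONDS);
            evaluations.shutdown();
            evaluations.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        SingleThreadDispatcher d = new SingleThreadDispatcher();
        for (int i = 0; i < 1000; i++) {
            d.dispatch(false);   // plain thread start/death events
        }
        d.dispatch(true);        // an event needing expression evaluation
        d.shutdown();
        System.out.println(d.simpleHandled.get() + " simple events on one thread, "
                + d.evaluationsRun.get() + " evaluation(s) offloaded");
    }
}
```

The contrast with the pre-patch behavior is that 1000 events here never create 1000 worker threads; only the evaluation path spawns extra threads.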

All tests pass both on Linux and OSX, so I wonder if there is a chance to include this bugfix in Eclipse 4.7? Thank you in advance.
Comment 40 Andrey Loskutov CLA 2017-05-13 16:56:25 EDT
(In reply to Igor Fedorenko from comment #39)
> (In reply to Eclipse Genie from comment #38)
> > New Gerrit change created: https://git.eclipse.org/r/97002
> 
> This proposed change will process most JDI events on single processing
> job/thread in the order they arrive from JVM. 

I must say, quite a bold change. I don't know how other debuggers are built, but serialising async events sounds dangerous to me, or at least likely to cause behavioral changes? But I must confess I don't have much experience in this area.

> Events that require expression
> evaluation are still processed on separate threads (started from the
> processing thread), this is necessary because expression evaluation may
> result in secondary events that require evaluation and therefor cannot be on
> the same thread.
> 
> All tests pass both on Linux and OSX, 

All existing tests. 
Could you try to add some regression test for the current issue?

> so I wonder if there is a chance to
> include this bugfix in Eclipse 4.7? Thank you in advance.

No, I think this is too late, especially for such a change.

We should try to review and merge for 4.8 and, if it works well, backport to 4.7.1.
Comment 41 Igor Fedorenko CLA 2017-05-13 17:31:11 EDT
(In reply to Andrey Loskutov from comment #40)
> (In reply to Igor Fedorenko from comment #39)
> > (In reply to Eclipse Genie from comment #38)
> > > New Gerrit change created: https://git.eclipse.org/r/97002
> > 
> > This proposed change will process most JDI events on single processing
> > job/thread in the order they arrive from JVM. 
> 
> I must say, quite a bold change. I don't know how other debugger are built,
> but serialising async events sounds dangerous for me, or at least probably
> causing behavioral changes? But I must comfess I have not much experience in
> this area. 
> 

The events were never async; there was always a single pump thread reading events from the target and then dispatching them to new threads, one thread per event. The reason for the one-thread-per-event implementation, I believe, is bug 269231, which boils down to conditional breakpoint expression evaluation hitting a secondary breakpoint. The proposed change keeps this behaviour, but processes all "simple" events on a single worker thread. There was also bug 271700, which added fancy job synchronization to restore the order of some events, thread-start and breakpoint in that thread in particular. The proposed change does not require this synchronization because thread-start events are always processed by the primary processing job and will always be handled before any other event from that thread, assuming the JVM orders thread events, of course.


> > Events that require expression
> > evaluation are still processed on separate threads (started from the
> > processing thread), this is necessary because expression evaluation may
> > result in secondary events that require evaluation and therefor cannot be on
> > the same thread.
> > 
> > All tests pass both on Linux and OSX, 
> 
> All existing tests. 
> Could you try to add some regression test for the current issue?

I am not sure there is a direct way to test this. The best I can think of is to start 1000 threads in the target JVM and count IJobChangeEvents for jobs of a particular type. Unfortunately this will be very sensitive to the job class name, and the test could become useless after a simple package rename (among virtually endless other possible refactorings). Does this sound reasonable?
Comment 42 Igor Fedorenko CLA 2017-05-15 13:16:13 EDT
I updated the gerrit change to include a regression test. Let me know if there is anything else I can do to help your review and accept the change.
Comment 43 Sarika Sinha CLA 2017-05-16 01:05:24 EDT
(In reply to Igor Fedorenko from comment #42)
> I updated the gerrit change to include a regression test. Let me know if
> there is anything else I can do to help your review and accept the change.

Thanks Igor, will look at it after 4.7 RCs are done.
Comment 44 Igor Fedorenko CLA 2017-07-05 07:06:28 EDT
Is there anything I can do to help review and merge this change? 

Also, will you consider a 4.7.1 backport if I submit a Gerrit change against the 4.7 maintenance branch?
Comment 45 Sarika Sinha CLA 2017-07-05 07:58:51 EDT
@Igor, 
We will release to 4.8 and observe for a couple of weeks. We can cherry-pick to a different branch, so you need not provide another patch.

@Andrey,
Are you OK with releasing this change, or do you have some concerns?
Comment 46 Sarika Sinha CLA 2017-07-05 08:00:21 EDT
@Igor,
Can you please also test in the related areas of the following bugs:
bug 269231 and bug 303966
Comment 47 Andrey Loskutov CLA 2017-07-05 09:11:22 EDT
(In reply to Sarika Sinha from comment #45)
> @Igor, 
> We will release to 4.8 and observe for couple of weeks. We can cherry pick
> to different branch so you need not provide another patch.
> 
> @Andrey,
> Are you ok with releasing this change or have some concerns ?

I'm doing the review right now (I will comment on the patch); I think we should try to merge in M1.
Comment 48 Andrey Loskutov CLA 2017-07-05 11:10:58 EDT
(In reply to Andrey Loskutov from comment #47)
> (In reply to Sarika Sinha from comment #45)
> > @Igor, 
> > We will release to 4.8 and observe for couple of weeks. We can cherry pick
> > to different branch so you need not provide another patch.
> > 
> > @Andrey,
> > Are you ok with releasing this change or have some concerns ?
> 
> I'm doing the review right now (I will comment on patch), I think we should
> try to merge in M1.

So the patch works fine with the program from comment 35:

import java.util.concurrent.atomic.AtomicInteger;
public class ThreadLauncher {
	public static void main(String[] args) {
		Thread thread;
		AtomicInteger count = new AtomicInteger();
		try {
			for (int i = 0; true; i++) {
				count.set(i);
				thread = new Thread() {
					@Override
					public void run() {
						super.run();
						System.out.println("# " + count.get());
					}
				};
				thread.setDaemon(false);
				thread.setName("Thread " + i);
				thread.setPriority(Thread.MIN_PRIORITY);
				thread.start();
			}
		} catch (Throwable t) {
			System.out.println("Failed to create thread #" + count.get());
			System.out.flush();
			t.printStackTrace();
			System.err.flush();
		}
	}
}

With the patch, the target JVM dies on my workstation/Java after creating some ~1,500,000 threads because of OOM, and the Eclipse JVM is not affected in any way. Without the patch I see much slower execution (because Eclipse is busy creating debugger threads) and both JVMs crash (freeze) pretty soon (after ~200,000 threads created).
Comment 49 Sarika Sinha CLA 2017-07-06 01:40:22 EDT
I tried the above piece of code without the patch; I see the output as #75831780
and it's still running.
Comment 50 Sarika Sinha CLA 2017-07-06 05:19:04 EDT
I am not sure what configuration matters, but I could not reproduce the OOM even after 6 hours; I terminated with a count of #206859540.
Comment 51 Andrey Loskutov CLA 2017-07-06 05:46:54 EDT
Sarika, try with the original code from comment 35. It should die faster.
Comment 52 Igor Fedorenko CLA 2017-07-06 07:13:27 EDT
Sorry, didn't have time to work on this yesterday, looking at review comments now.

@Sarika I can reliably reproduce the problem on two OSX systems (8-core/16-thread Mac Pro and 4-core/8-thread MacBook Pro) with Oracle JDK 1.8.0_131 using the code from comment #35. Our internal users report the problem on similarly equipped Linux systems when debugging our internal app. Also, I think the recently released fix for bug 516609 will make this bug less likely to cause any permanent damage to the system, so make sure you are not using the latest platform.runtime bundles.
Comment 53 Igor Fedorenko CLA 2017-07-06 08:32:43 EDT
I believe I addressed all raised concerns in the latest Gerrit patchset. Thank you for the feedback, and good catch re JDIDebugTarget.ThreadDeathHandler; I totally missed it.
Comment 54 Sarika Sinha CLA 2017-07-07 04:09:10 EDT
(In reply to Igor Fedorenko from comment #52)
> Sorry, didn't have time to work on this yesterday, looking at review
> comments now.
> 
> @Sarika I can reliably reproduce the problem on two OSX systems (8 core 16
> thread Mac Pro and 4c/8t macbook pro) with Oracle JDK 1.8.0_131 using the
> code from comment #35. Our internal users report the problem on similarly
> equipped Linux systems when debugging our internal app. Also, I think
> recently released fix for bug 516609 will make this bug less likely to cause
> any permanent damage to the system, so make sure you are not using latest
> platform.runtime bundles.

I tried the 22nd June and 6th July builds but could not reproduce the OOM using the original code from comment 35.

Help me understand your comment: after the platform.runtime changes, will this bug be less evident?
Comment 55 Andrey Loskutov CLA 2017-07-07 05:10:52 EDT
(In reply to Sarika Sinha from comment #54)
> I tried  22nd June build and 6th July build but could not reproduce OOM
> using original code from comment 35.

The OOM is not the main problem; the problem is the unresponsive Eclipse/debugger. The OOM (on both sides) is a side effect of creating too many native threads, so depending on the JVM max heap size and workstation "power" the OOM may or may not happen.

> Help me understand your comment - after platform.runtime changes this bug
> will be less evident ?

In theory this should reduce the OOM probability, because there will not be as many "idle" threads kept running by the job framework.

Please note that a mandatory part of the reproducer is to have the Debug view open.

Anecdote: I just froze my Windows 7 notebook while (successfully) trying to reproduce the problem on Windows.

My steps:
1. Start Eclipse #1.
2. Start Eclipse #2 from Eclipse #1.
3. In Eclipse #2, create a new Java project and paste the snippet from comment #35.
4. In Eclipse #2, open the Debug view.
5. In Eclipse #2, right-click on the Java example -> Debug As > Java Application.
6. Depending on the code (with/without the patch), Eclipse #2 works fine or freezes pretty soon.
Comment 56 Sarika Sinha CLA 2017-07-07 06:25:02 EDT
(In reply to Andrey Loskutov from comment #55)
> 
> Please note, that mandatory part of the reproducer is to have Debug view
> opened.
Thanks for the tip!
Comment 57 Andrey Loskutov CLA 2017-07-07 10:23:41 EDT
(In reply to Sarika Sinha from comment #56)
> (In reply to Andrey Loskutov from comment #55)
> > 
> > Please note, that mandatory part of the reproducer is to have Debug view
> > opened.
>  Thanks for the tip !!

So you can reproduce now?

For me this patch works on Linux RHEL 7.2 and Windows 7, both with JDK 1.8.0_131 64-bit. I think we should merge it *now* so that starting with M1 we can see if anyone finds a use case that breaks the new (mostly) single-threaded event dispatching.
Comment 58 Sarika Sinha CLA 2017-07-08 04:24:03 EDT
Yes, the debugger becomes responsive.

After the patch we don't get the unresponsive-debugger problem, but we do get an OOM after some time.
Comment 59 Igor Fedorenko CLA 2017-07-08 06:56:41 EDT
An OOME in which JVM? Also, what operating system and Java version do you use?
Comment 60 Sarika Sinha CLA 2017-07-09 01:55:52 EDT
I am using Windows 7 with Oracle JDK 1.8.0_102.

I am also getting many timeout exceptions:
org.eclipse.jdi.TimeoutException: Timeout occurred while waiting for packet 2132866.
	at org.eclipse.jdi.internal.connect.PacketReceiveManager.getReply(PacketReceiveManager.java:193)
	at org.eclipse.jdi.internal.connect.PacketReceiveManager.getReply(PacketReceiveManager.java:204)
	at org.eclipse.jdi.internal.MirrorImpl.requestVM(MirrorImpl.java:192)
	at org.eclipse.jdi.internal.MirrorImpl.requestVM(MirrorImpl.java:227)
	at org.eclipse.jdi.internal.MirrorImpl.requestVM(MirrorImpl.java:243)
	at org.eclipse.jdi.internal.ObjectReferenceImpl.referenceType(ObjectReferenceImpl.java:526)
	at org.eclipse.jdt.internal.debug.core.model.JDIThread.determineIfDaemonThread(JDIThread.java:571)
	at org.eclipse.jdt.internal.debug.core.model.JDIThread.initialize(JDIThread.java:350)
	at org.eclipse.jdt.internal.debug.core.model.JDIThread.<init>(JDIThread.java:313)
	at org.eclipse.jdt.internal.debug.core.model.JDIDebugTarget.newThread(JDIDebugTarget.java:660)
	at org.eclipse.jdt.internal.debug.core.model.JDIDebugTarget.createThread(JDIDebugTarget.java:636)
	at org.eclipse.jdt.internal.debug.core.model.JDIDebugTarget$ThreadStartHandler.handleEvent(JDIDebugTarget.java:2354)
	at org.eclipse.jdt.internal.debug.core.EventDispatcher.dispatch(EventDispatcher.java:152)
	at org.eclipse.jdt.internal.debug.core.EventDispatcher.access$2(EventDispatcher.java:100)
	at org.eclipse.jdt.internal.debug.core.EventDispatcher$EventDispatchJob.run(EventDispatcher.java:284)
	at org.eclipse.core.internal.jobs.Worker.run(Worker.java:56)
Comment 61 Igor Fedorenko CLA 2017-07-09 07:17:22 EDT
I believe there are at least three separate problems with the current processing of "bursts" and "floods" of JDI EventSets:

* Each event is processed on a separate thread. This results in a large number of native threads created by Eclipse and affects the host operating system running Eclipse, i.e. the entire(!) system hangs, loses its network connection, or is crippled in other ways that require a reboot. https://git.eclipse.org/r/97002 solves this problem.

* JDI EventSet to DebugEvent[] conversion takes too long. I ran some very informal perf tests yesterday, and on my hardware the program from comment #35 floods Eclipse with ~30 event sets per millisecond, but Eclipse is only able to create ~10 corresponding DebugEvent[] arrays per millisecond. Even with https://git.eclipse.org/r/97002 this results in excessive Eclipse JVM heap use and an eventual OOME. There are a few possible ways to improve here, but I suggest we deal with the "too many threads" problem first, which I believe is far more severe.

* Each DebugPlugin#fireDebugEventSet(DebugEvent[]) results in a separate Display#asyncExec, which results in poor UI behaviour when Eclipse is flooded by JDI events. On OSX, for example, Eclipse keeps updating the Debug view long after the flood stops. This is something we will need to follow up on with the debug core team.
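
For illustration only (this is a simplified sketch, not the code from the actual Gerrit change, and all names in it are hypothetical): the "one thread per event" problem in the first bullet can be avoided by funneling all incoming event sets through a single dedicated dispatcher thread backed by a bounded queue, so the thread count stays constant no matter how fast the target VM floods the debugger:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch: instead of spawning one native thread per incoming
// event set, enqueue the work and drain it on one long-lived dispatcher
// thread. A full queue blocks the producer, giving natural backpressure.
public class SingleThreadDispatcher {
    private final BlockingQueue<Runnable> queue = new ArrayBlockingQueue<>(10_000);
    private final Thread dispatcher;

    public SingleThreadDispatcher() {
        dispatcher = new Thread(() -> {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    queue.take().run(); // events are handled strictly in FIFO order
                }
            } catch (InterruptedException e) {
                // shutdown requested; fall through and let the thread die
            }
        }, "Event Dispatch (sketch)");
        dispatcher.setDaemon(true);
        dispatcher.start();
    }

    /** Blocks the caller when the queue is full instead of creating threads. */
    public void dispatch(Runnable eventSet) throws InterruptedException {
        queue.put(eventSet);
    }

    public void shutdown() {
        dispatcher.interrupt();
    }
}
```

Because a single consumer drains a FIFO queue, events are also processed in arrival order, which a thread-per-event scheme cannot guarantee.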

My recommendation is to merge https://git.eclipse.org/r/97002 because it stops Eclipse from crippling the host operating system, and to deal with the other problems in follow-up changes.
Comment 62 Eclipse Genie CLA 2017-07-09 16:47:58 EDT
New Gerrit change created: https://git.eclipse.org/r/100968
Comment 63 Sarika Sinha CLA 2017-07-10 00:15:06 EDT
(In reply to Eclipse Genie from comment #62)
> New Gerrit change created: https://git.eclipse.org/r/100968

After this patch, I don't get OOME and the timeouts.
But the debugger becomes slow after a while, say after #930360.
Comment 64 Eclipse Genie CLA 2017-07-10 20:44:46 EDT
New Gerrit change created: https://git.eclipse.org/r/101020
Comment 65 Eclipse Genie CLA 2017-07-11 19:05:15 EDT
New Gerrit change created: https://git.eclipse.org/r/101086
Comment 66 Igor Fedorenko CLA 2017-07-11 19:34:53 EDT
At this point I do not plan additional work on this unless jdt.debug developers request specific adjustments to the proposed changes or there are new ideas.

I recommend merging https://git.eclipse.org/r/#/c/97002/ as is. It has gone through a few code review iterations and I believe it significantly improves Eclipse debugger stability when the target VM sends bursts of JDI commands.

I also recommend merging Andrey's fix for bug 519433 (https://git.eclipse.org/r/#/c/101033/), which significantly improves Debug view performance.

The following small example reliably kills Eclipse (and often OSX) without the two changes and works mostly OK with them.

      import java.util.concurrent.Semaphore;
      public class ThreadLauncher10k {
        public static void main(String[] args) throws Exception {
          final Semaphore semaphore = new Semaphore(32);
          for (long threadCount = 0; threadCount < 10000; threadCount++) {
            semaphore.acquire();
            Thread thread = new Thread() {
              @Override
              public void run() {
                semaphore.release();
              }
            };
            thread.setName("thread-" + threadCount);
            thread.start();
          }
          System.out.println("Press Enter");
          System.in.read();
        }
      }


I also recommend we do not attempt to solve the "prolonged flood" of JDI packets use case demonstrated by the example in comment #35. I submitted https://git.eclipse.org/r/#/c/101086/, a proof-of-concept that shows one likely workable way to cap the JDI packet buffer, but I am not sure that approach provides a good-enough user experience, and I am generally not sure how Eclipse should deal with JDI packet rates it cannot handle.
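
As a hypothetical illustration only (not the approach taken in the linked Gerrit change; the class and method names below are invented for this sketch): capping a packet buffer usually means backing it with a bounded queue and, when the reader outpaces the consumer, either blocking the producer briefly or dropping the packet and counting the drop:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of a capped packet buffer. Under a sustained flood the
// producer waits up to a short timeout; if the buffer is still full, the
// packet is discarded and the drop is counted for diagnostics.
public class CappedPacketBuffer<T> {
    private final BlockingQueue<T> buffer;
    private final AtomicLong dropped = new AtomicLong();

    public CappedPacketBuffer(int capacity) {
        buffer = new ArrayBlockingQueue<>(capacity);
    }

    /** Returns false (and counts a drop) if the buffer stayed full. */
    public boolean offer(T packet, long timeoutMillis) throws InterruptedException {
        boolean accepted = buffer.offer(packet, timeoutMillis, TimeUnit.MILLISECONDS);
        if (!accepted) {
            dropped.incrementAndGet();
        }
        return accepted;
    }

    /** Blocks until a packet is available; packets come out in FIFO order. */
    public T take() throws InterruptedException {
        return buffer.take();
    }

    public long droppedCount() {
        return dropped.get();
    }
}
```

The user-experience question raised above is exactly the weakness of this scheme: dropping packets means the debugger silently misses events, and blocking the producer stalls the target VM, so neither choice is clearly right.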
Comment 67 Sarika Sinha CLA 2017-07-12 06:06:13 EDT
Created attachment 269327 [details]
Debug View threads missing

After applying the patch https://git.eclipse.org/r/#/c/97002/ and testing the ThreadLauncher10k semaphore code, I can see the Debug view displaying empty lines.

I don't see this kind of behavior without the patch.
Comment 68 Sarika Sinha CLA 2017-07-12 06:36:16 EDT
After applying the patch from bug 519433, the missing lines in the Debug view are reduced, but they do not disappear totally.
Comment 69 Igor Fedorenko CLA 2017-07-12 07:02:12 EDT
Can you confirm you are able to reproduce the Eclipse (and/or operating system) crash using ThreadLauncher10k without the patches applied?
Comment 70 Igor Fedorenko CLA 2017-07-12 08:02:43 EDT
FWIW, I just tried the patches on a 4 core 4 thread Windows 10 box. The Debug view is updated correctly if I wait for Eclipse to process all UI updates (i.e. the Debug view title changes from busy italics to normal font). The Debug view gets corrupted and shows empty lines if I start interacting with the view while it is busy. If this is the behaviour you see, I believe it is an unrelated bug and you should be able to reproduce it without the patches discussed here.
Comment 71 Sarika Sinha CLA 2017-07-13 01:47:34 EDT
(In reply to Igor Fedorenko from comment #69)
> Can you confirm you are able to reproduce eclipse (and/or operating system)
> crash using ThreadLauncher10k without the patches applied?

Not on Windows, but on Mac I did observe the debugger hanging with 1024m; with 2048m it worked OK.
Comment 73 Eclipse Genie CLA 2017-07-13 08:20:21 EDT
New Gerrit change created: https://git.eclipse.org/r/101180
Comment 75 Andrey Loskutov CLA 2017-07-13 09:11:21 EDT
(In reply to Igor Fedorenko from comment #66)
> 
> I recommend merging https://git.eclipse.org/r/#/c/97002/ as is. It's gone
> through few code review iterations and I believe significantly improves
> Eclipse debugger stability when target vm sends bursts of jdi commands. 
> 
> I also recommend merging Andrey's fix for bug 519433
> (https://git.eclipse.org/r/#/c/101033/), which significantly improves Debug
> view performance.
> 
> The following small example reliably kills Eclipse (and often OSX) without
> the two changes and works mostly okay with them.

I've now merged the two patches mentioned above; thanks, Igor!

> I also recommend we do not attempt to solve jdi packets "prolonged flood"
> use case demonstrated by the example in comment #35. I submitted
> https://git.eclipse.org/r/#/c/101086/ proof-of-concept that shows one likely
> workable way to cap jdi packet buffer, but I am not sure that approach
> provides good-enough user experience and generally not sure how eclipse
> should deal with jdi packet rates it cannot handle.

I agree with all of that. I have no ideas / no time for further investigation. I think we gained some more robustness with the two patches, so I propose that any ideas for further improvements in the "JDI event flood" area should go into a new bug.

I'm keeping this one open for a possible backport into 4.7.1.

@Igor, if you are interested in back-porting it, it would be nice if you could cherry-pick the related commits into the 4_7_maintenance branch. I don't think we should merge into 4.7.1 *now*; let's see if we get any regressions in M1-M2.
Comment 76 Igor Fedorenko CLA 2017-07-14 07:49:24 EDT
(In reply to Andrey Loskutov from comment #75)
> 
> I'm not closing this one for possible backport into 4.7.1.
> 
> @Igor, if you are interested in back-porting it, it would be nice if you
> could cherry pick related commits into 4_7_maintenance branch. I don't think
> we should merge into 4.7.1 *now*, let see if we will get some regressions in
> M1-M2.

I am a little confused about the proposed timeline of the backport. Oxygen.1 is in September, which I think implies the change will need to be merged into the 4_7_maintenance branch by the middle of August. So, in other words, if there are no regressions reported immediately after 4.8 M1 (which is usually in the first half of August), you will merge the backport. Did I get this right?
Comment 77 Sarika Sinha CLA 2017-07-24 02:24:38 EDT
(In reply to Igor Fedorenko from comment #76)
> (In reply to Andrey Loskutov from comment #75)
> > 
> > I'm not closing this one for possible backport into 4.7.1.
> > 
> > @Igor, if you are interested in back-porting it, it would be nice if you
> > could cherry pick related commits into 4_7_maintenance branch. I don't think
> > we should merge into 4.7.1 *now*, let see if we will get some regressions in
> > M1-M2.
> 
> I am little confused about proposed timeline of the backport. Oxygen.1 is in
> September, which I think implies the change will need to be merged to
> 4_7_maintenance branch by middle of August. So, in other words, if there are
> no regressions reported immediately after 4.8 M1 (which is usually first
> half of August), you will merge the backport. Did I get this right?

Frankly speaking, I am not sure about a backport to 4.7.1; maybe 4.7.2 will be better. We are still seeing continuous test failures.
Comment 78 Sarika Sinha CLA 2017-11-07 05:13:23 EST
I am leaving it for 4.8.