Bug 576521 - [win32] Debug view blocks UI for extended period (freeze)
Summary: [win32] Debug view blocks UI for extended period (freeze)
Status: RESOLVED FIXED
Alias: None
Product: Platform
Classification: Eclipse Project
Component: Debug (show other bugs)
Version: 4.8   Edit
Hardware: All All
: P3 normal (vote)
Target Milestone: 4.22 M3   Edit
Assignee: Jörg Kubitz CLA
QA Contact:
URL:
Whiteboard:
Keywords: performance, regression
Depends on: 576559
Blocks:
  Show dependency tree
 
Reported: 2021-10-08 04:55 EDT by Jörg Kubitz CLA
Modified: 2021-11-11 01:37 EST (History)
5 users (show)

See Also:


Attachments
Visual Vm Sampling snapshot.nps (9.48 KB, application/octet-stream)
2021-10-08 04:55 EDT, Jörg Kubitz CLA
no flags Details
Screenshot Stacktraces.png (177.30 KB, image/png)
2021-10-08 06:09 EDT, Jörg Kubitz CLA
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Jörg Kubitz CLA 2021-10-08 04:55:09 EDT
Created attachment 287282 [details]
Visual Vm Sampling snapshot.nps

UI was blocked for several dozen seconds.

Same stacktraces as in bug 552328#c33 (https://bugs.eclipse.org/bugs/show_bug.cgi?id=552328#c33)

 Version: 2021-06 (4.20) Build id: I20210611-1600

99% time was spend in
 org.eclipse.debug.internal.ui.viewers.model.TreeModelContentProvider$DelayedDoModelChangedJob.runInUIThread ()

due to
 org.eclipse.swt.internal.win32.OS.SendMessage[native] ()
in
 org.eclipse.swt.widgets.Tree.getSelection ()
Comment 1 Sarika Sinha CLA 2021-10-08 06:02:32 EDT
Can you attach an image, I can't see nps file.
Comment 2 Jörg Kubitz CLA 2021-10-08 06:09:48 EDT
Created attachment 287284 [details]
Screenshot Stacktraces.png

(In reply to Sarika Sinha from comment #1)
> Can you attach an image, I can't see nps file.

you could download freely 
https://visualvm.github.io/download.html
Comment 3 Jörg Kubitz CLA 2021-10-08 07:28:57 EDT
I do not know if that was the issue, but i can reproduce the same behavior by spawning 9000 idle Threads. Run this in debug mode:

public class Bug552328 {
	public static void main(String[] args) {
		int waitMillis = 100_000;
		int threadCount = 9_000;
		java.util.ArrayList<Thread> threads = new java.util.ArrayList<>();
		for (int i = 0; i < threadCount; i++) {
			Thread r = new Thread(() -> {				
				try {
					Thread.sleep(waitMillis);
				} catch (InterruptedException e) {
				}
			}, "" + i); 
			threads.add(r);
		}
		for (Thread t: threads) {
			t.start();
		}
		try {
			Thread.sleep(waitMillis);
		} catch (InterruptedException e) {
		}
	}
}

That freezes eclipse IDE for 2 minutes.

i.e. updating the thread's names (bug 552328) is not the problem.
But adding each of the n threads individually (one by one) to the tree seems to have effort O(n*n).
Comment 4 Jörg Kubitz CLA 2021-10-08 07:48:07 EDT
No i am convinced the "high" thread count is the problem:
In the Sampling Screenshot you can see the
org.eclipse.debug.internal.ui.viewers.model.TreeModelContentProvider.handleAdd(IModelDelta) is the problem. That is called for each individual thread and updates the whole tree with
	getViewer().setChildCount(parentPath, viewCount);
        ...
	getViewer().replace(parentPath, viewIndex, element);

I am not sure if the solution could be in platform debug to defer the tree viewer update until all deltas processed or if the solution should be in jface /swt that these methods should not spend so much time in getSelection(). 
I did not even select anything in that tree.
Comment 5 Jörg Kubitz CLA 2021-10-08 07:52:19 EDT
 bug 546354 
 bug 551800
both sow "getSelection" as symptoms too.
Ma also relate to 
 Bug 564448 - [Styled Text] Block Selection performance depends on selection direction
Comment 6 Jörg Kubitz CLA 2021-10-11 05:29:51 EDT
With eclipse 3.7 the debug view was very slow too for this many threads but at least the UI did not freeze. => regression (maybe due to bug 161435)

The test code needed to be a bit modified to run on that old java:

public class Bug576521 {
	public static void main(String[] args) {
		final int waitMillis = 100000;
		int threadCount = 9000;
		java.util.ArrayList<Thread> threads = new java.util.ArrayList<Thread>();
		for (int i = 0; i < threadCount; i++) {
			Thread r = new Thread(new Runnable() {
				@Override
				public void run() {				
					try {
						Thread.sleep(waitMillis);
					} catch (InterruptedException e) {
					}
				}
			}, "" + i); 
			threads.add(r);
		}
		for (Thread t: threads) {
			t.start();
		}
		try {
			Thread.sleep(waitMillis);
		} catch (InterruptedException e) {
		}
	}
}
Comment 7 Eclipse Genie CLA 2021-10-11 08:02:06 EDT
New Gerrit change created: https://git.eclipse.org/r/c/platform/eclipse.platform.debug/+/186356
Comment 8 Mickael Istria CLA 2021-10-11 08:13:35 EDT
Bug info is confusing: Is this a regression? Did it start in Eclipse 3.8?
Comment 9 Jörg Kubitz CLA 2021-10-11 10:37:30 EDT
(In reply to Mickael Istria from comment #8)
> Bug info is confusing: Is this a regression? Did it start in Eclipse 3.8?

What exactly is confusing? See comment 6.
Comment 10 Jörg Kubitz CLA 2021-10-11 10:41:44 EDT
BTW can anybody please test with code from comment 6 if this is an issue on Linux/Mac too?
Comment 11 Mickael Istria CLA 2021-10-11 11:00:35 EDT
(In reply to Jörg Kubitz from comment #10)
> BTW can anybody please test with code from comment 6 if this is an issue on
> Linux/Mac too?

You could try to format it as a test and send it to Gerrit to get a hint of how it behaves on Linux.
Comment 12 Thomas Wolf CLA 2021-10-11 14:58:09 EDT
(In reply to Jörg Kubitz from comment #10)
> BTW can anybody please test with code from comment 6 if this is an issue on
> Linux/Mac too?

Tried that (from comment 3, though) on Mac OS X 10.14.6 with Eclipse 2021-09. Spectacular results: blocks a bit, then not, then blocks a bit more, then crashes my Mac hard and makes it reboot. First time I thought it was coincidence, tried again, same result, but it crashed even harder and didn't find mouse or keyboard anymore after the reboot. I won't try a third time ;-) Apparently creating 9000 threads causes serious trouble.

That tree is a virtual tree so things may be rather special anyway. But seeing that you've identified TreeViewer.preservingSelection() as being at least part of this problem in debug.ui, you might be interested in EGit bug 519231, too: EGit's staging view also blocked the UI for an extended time when there were many nodes and then the presentation was switched from tree view to list view. The work-around there was simply to get the selection, clear it, perform the update, and then re-set the selection again so that preservingSelection() essentially had no work to do. (Code change in EGit for that is https://git.eclipse.org/r/c/egit/egit/+/165364 .) So improvements to preservingSelection() are certainly a good idea.
Comment 13 Jörg Kubitz CLA 2021-10-11 15:46:44 EDT
(In reply to Thomas Wolf from comment #12)

Thanks for testing, Thomas.

> Tried that (from comment 3, though) on Mac OS X 10.14.6
> Apparently creating 9000 threads causes serious trouble.

Report that to JDK please. Java threads are meant to be light weight. 
Expected is a clean "unable to create new native thread" Error if OS does not support more. On Linux/Windows tens of thousands threads are pretty normal on our productions.

> view to list view. The work-around there was simply to get the selection,
> clear it, perform the update, and then re-set the selection again so that
> preservingSelection() essentially had no work to do.

Interesting workaround. Thanks for sharing. I'll do my best to not lose selection temporarily.
Comment 14 Thomas Wolf CLA 2021-10-11 16:00:58 EDT
(In reply to Jörg Kubitz from comment #13)
> (In reply to Thomas Wolf from comment #12)
> 
> Thanks for testing, Thomas.
> 
> > Tried that (from comment 3, though) on Mac OS X 10.14.6
> > Apparently creating 9000 threads causes serious trouble.
> 
> Report that to JDK please. Java threads are meant to be light weight. 

Not worth doing so as the precise cause is unclear. Might have been the number of threads, or might have been the debugging Eclipse trying to update its view.
I didn't even get a crash dump. Not even anything in the system logs.

UI freeze monitoring logged multiple freezes with this stack trace before the crash:

	at org.eclipse.swt.internal.cocoa.OS.objc_msgSend(Native Method)
	at org.eclipse.swt.internal.cocoa.NSOutlineView.reloadItem(NSOutlineView.java:70)
	at org.eclipse.swt.widgets.Tree.setItemCount(Tree.java:3111)
	at org.eclipse.swt.widgets.TreeItem.setItemCount(TreeItem.java:1412)
	at org.eclipse.jface.viewers.TreeViewer.lambda$2(TreeViewer.java:392)
	at org.eclipse.jface.viewers.TreeViewer$$Lambda$1399/0x000000010157b040.run(Unknown Source)
	at org.eclipse.jface.viewers.StructuredViewer.preservingSelection(StructuredViewer.java:1400)
	at org.eclipse.jface.viewers.TreeViewer.preservingSelection(TreeViewer.java:363)
	at org.eclipse.jface.viewers.StructuredViewer.preservingSelection(StructuredViewer.java:1361)
	at org.eclipse.jface.viewers.TreeViewer.setChildCount(TreeViewer.java:384)
	at org.eclipse.debug.internal.ui.viewers.model.TreeModelContentProvider.handleAdd(TreeModelContentProvider.java:1343)
	at org.eclipse.debug.internal.ui.viewers.model.TreeModelContentProvider.updateNodes(TreeModelContentProvider.java:1263)
	at org.eclipse.debug.internal.ui.viewers.model.TreeModelContentProvider.updateNodes(TreeModelContentProvider.java:1299)
	at org.eclipse.debug.internal.ui.viewers.model.TreeModelContentProvider.updateNodes(TreeModelContentProvider.java:1299)
	at org.eclipse.debug.internal.ui.viewers.model.TreeModelContentProvider.updateNodes(TreeModelContentProvider.java:1299)
	at org.eclipse.debug.internal.ui.viewers.model.TreeModelContentProvider.updateModel(TreeModelContentProvider.java:563)
	at org.eclipse.debug.internal.ui.viewers.model.TreeModelContentProvider.doModelChanged(TreeModelContentProvider.java:530)
	at org.eclipse.debug.internal.ui.viewers.model.TreeModelContentProvider$DelayedDoModelChangedJob.runInUIThread(TreeModelContentProvider.java:436)

I've seen other reports about SWT on Mac to cause Eclipse crashes if there are exceptions during objc_msgSend. Maybe an OOME? Causing computer crashes is new though. But with so little hard data this can't really be reported at JDK. (Or at SWT.)
Comment 15 Eclipse Genie CLA 2021-10-12 05:10:59 EDT
New Gerrit change created: https://git.eclipse.org/r/c/platform/eclipse.platform.debug/+/186381
Comment 16 Jörg Kubitz CLA 2021-10-12 06:06:05 EDT
UI freeze was introduced with bug 519433 by queuing up events (https://git.eclipse.org/r/c/platform/eclipse.platform.debug/+/101033/).
It was running slow - but without freeze - in oxygen (4.7).
Comment 17 Andrey Loskutov CLA 2021-10-12 09:56:07 EDT
(In reply to Jörg Kubitz from comment #10)
> BTW can anybody please test with code from comment 6 if this is an issue on
> Linux/Mac too?

I get constant OOM on Linux, because of VM not being able create so many native threads.

The OOM problem could be avoided (and this bug reproduced on Linux) by using this snippet:

public class Bug576521 {
	/**
	 * Create a constant load for the Debug view model, without hitting system limits for max
	 * thread number. The thread creation / death cause UI model updates because of adding/removing
	 * elements in the Debug view tree. There is a lot of overhead in JFace.
	 */
	public static void main(String[] args) {
		int threadCount = 2_000;
		long start = System.currentTimeMillis();
		while(System.currentTimeMillis() - start < 100_000) {
			java.util.ArrayList<Thread> threads = new java.util.ArrayList<>();
			for (int i = 0; i < threadCount; i++) {
				Thread r = new Thread(() -> {
					try {
						Thread.sleep(500);
					} catch (InterruptedException e) {
					}
				}, "" + i);
				threads.add(r);
			}
			for (Thread t: threads) {
				t.start();
			}
			try {
				Thread.sleep(1000);
			} catch (InterruptedException e) {
			}
		}
		System.out.println("Done");
	}
}

With that I can see UI freezes on Linux too, not just immediate OOM's.
Comment 19 Andrey Loskutov CLA 2021-10-12 11:54:13 EDT
(In reply to Eclipse Genie from comment #18)
> Gerrit change
> https://git.eclipse.org/r/c/platform/eclipse.platform.debug/+/186381 was
> merged to [master].
> Commit:
> http://git.eclipse.org/c/platform/eclipse.platform.debug.git/commit/
> ?id=485d05449115e746767fb9516931c39249d5ecec

The flood of thread start/thread death events can't be handled by JFace trees in the speed we can receive them from the debuggee. What is interesting here, that the *pauses* in the model updates added by the batch size limit in the patch allow Debug view to *reduce* the actual work updating elements in the JFace tree. This sounds strange, but that can be observed with the snippet from comment 17.

*How exactly* this happens, I have no explanation now, to understand that one has to spend lot of time debugging this debug model updates and may be instrument JFace/SWT tree elements add/remove/update count that *actually* happens on the UI side. For me, it looks like we reduce number of simultaneously existing tree elements by avoiding creation of tree elements, but I don't see *where* this reduction happens, because we have exact same number of debug events to proceed in both cases. Could be somewhere around org.eclipse.debug.internal.ui.viewers.model.TreeModelContentProvider.fWaitingRequests, but may be something completely different. The async debug updates design is extremely hard to debug :-)
Comment 20 Jörg Kubitz CLA 2021-10-12 12:26:14 EDT
(In reply to Andrey Loskutov from comment #19)

> *How exactly* this happens, I have no explanation now

Those queued events that are not of type DelayedDoModelChange but of Runnable will start the next series of updates. by deferring the Runnables there are less updates.
Comment 21 Andrey Loskutov CLA 2021-10-12 14:19:09 EDT
(In reply to Jörg Kubitz from comment #20)
> (In reply to Andrey Loskutov from comment #19)
> 
> > *How exactly* this happens, I have no explanation now
> 
> Those queued events that are not of type DelayedDoModelChange but of
> Runnable will start the next series of updates. by deferring the Runnables
> there are less updates.

Are you sure? We don't skip those, they are still executed, so where the "less updates" is from? 

I've just noticed, with the merged patch and snippet from comment 17 we don't get *almost immediate* freeze on Linux anymore, but get the freeze after the full snippet is executed, long after it prints "Done". So we don't actually reduce work, we just make it less annoying for a short term "spikes" (which is good).

Ideally, "Thread X created / Thread X updated / Thread X destroyed" events, if landed in the same queue, could be automatically removed from the Debug view update tasks, avoiding the overhead for short living threads completely.
Comment 22 Jörg Kubitz CLA 2021-10-12 14:33:08 EDT
(In reply to Andrey Loskutov from comment #21)
> Are you sure? We don't skip those, they are still executed, so where the
> "less updates" is from? 
but we execute them after the other UI tasks had a chance to run. so we run them so much later that it's not a problem anymore. if you set a breakpoint there you will see the queue will not fill up anymore. Only after executing them a new backgroundthread polls in the new state. If you stay on that debug point till the debugee is finished i don't even get the events to create all tree/thread items.


> I've just noticed, with the merged patch and snippet from comment 17 we
> don't get *almost immediate* freeze on Linux anymore, but get the freeze
> after the full snippet is executed, long after it prints "Done". So we don't
The threads terminate all at once. "drawing" that takes its time. (it's a real "animation" on my computer)

> actually reduce work, we just make it less annoying for a short term
> "spikes" (which is good).

> Ideally, "Thread X created / Thread X updated / Thread X destroyed" events,
> if landed in the same queue, could be automatically removed from the Debug
> view update tasks, avoiding the overhead for short living threads completely.

yea that would good but would not help with my 9000 idle threads. they are only created. The Tree takes ridiculous time to "draw" them. Therefore the jface time should just be reduced that helps much more then combining events. It's easily possible to quickly draw a tree with 9000 items. just think of the package explorer as a good example - with much more items. Just has to be done the right way - without updating each item one by one.
Comment 23 Jörg Kubitz CLA 2021-11-10 13:53:21 EST
The total freeze was fixed. Any improvement to the tree performance may be targeted with another bug or version - if any.
Comment 24 Lars Vogel CLA 2021-11-11 01:37:27 EST
(In reply to Jörg Kubitz from comment #23)
> The total freeze was fixed. Any improvement to the tree performance may be
> targeted with another bug or version - if any.

Thanks, Jörg.