Bug 421303 - Deadlock in IOConsoleOutputStream
Summary: Deadlock in IOConsoleOutputStream
Status: RESOLVED FIXED
Alias: None
Product: Platform
Classification: Eclipse Project
Component: Debug
Version: 4.3.1
Hardware: All
OS: All
Importance: P3 normal
Target Milestone: 4.15 M3
Assignee: Paul Pazderski
Whiteboard: 4.15 M2
Keywords: greatbug
Depends on:
Blocks: 553348
 
Reported: 2013-11-08 02:51 EST by Stefan Stern
Modified: 2022-04-14 12:30 EDT
CC: 6 users

Attachments
plugin project containing code to cause the deadlock (4.34 KB, application/octet-stream)
2013-11-08 02:51 EST, Stefan Stern

Description Stefan Stern 2013-11-08 02:51:12 EST
Created attachment 237298 [details]
plugin project containing code to cause the deadlock

As suspected in https://bugs.eclipse.org/bugs/show_bug.cgi?id=382374#c2, there actually is potential for a deadlock in IOConsoleOutputStream. I've prepared a sample to reproduce the problem. We hit the bug in our Eclipse 3.7.2-based application; the bug is still present in 4.3.1.
In order to reproduce the bug, take the following steps:
1. Import the attached project
2. Set two breakpoints:
  - DeadlockCommandHandler line 28
  - DeadlockCommandHandler line 41
3. Start debugging the plugin as an Eclipse Application
4. Open the Console view and open a new "Deadlock Console"
5. Choose "Deadlock" > "Trigger Console Deadlock " from the menu bar
  -> Both breakpoints in DeadlockCommandHandler will hold one thread each. 
6. Step the worker thread, stopped at DeadlockCommandHandler line 28, into org.eclipse.ui.console.IOConsoleOutputStream.encodedWrite(String)
7. Step the main / SWT UI thread, stopped at DeadlockCommandHandler line 41, into org.eclipse.ui.console.IOConsoleOutputStream.encodedWrite(String)
8. Deadlock is reached.

The worker thread will halt at IOConsolePartitioner#streamAppended, line 471, waiting for the notification that the buffer was emptied by the UIJob. This will never happen, as the UI thread is blocked waiting for the monitor of the MessageConsoleStream object in the synchronized method IOConsoleOutputStream#encodedWrite(String).
In 3.7.2 the deadlock happens a bit differently, as only the IOConsoleOutputStream#write(String) method was synchronized. The outcome is the same, though: the UI freezes because the UI thread is blocked.
I couldn't find a way to reliably provoke the problem with a JUnit test, as the buffer in IOConsolePartitioner must be filled with over 160000 characters before the problem occurs, and the UI thread then needs to run into the locked MessageConsoleStream object. This happened in our application a few times, and we luckily had a debug run, so we could freeze the VM and get the stack trace.
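For reference, the colliding writes in the sample boil down to something like the following sketch. It is simplified and timing-dependent (the attached project uses breakpoints to force the interleaving), and the names and sizes here are illustrative, not the actual code from the attachment:

// Runs in a command handler, i.e. on the UI thread.
IOConsole console = new IOConsole("Deadlock Console", null);
IOConsoleOutputStream stream = console.newOutputStream();

// Worker thread: the large write pushes the partitioner's pending
// buffer over 160000 chars, so streamAppended() wait()s for the UI
// job to drain it -- while still holding the stream's monitor,
// because encodedWrite(String) is synchronized.
new Thread(() -> {
    char[] big = new char[200000];
    java.util.Arrays.fill(big, 'x');
    try {
        stream.write(new String(big)); // blocks in streamAppended()
    } catch (java.io.IOException e) {
        e.printStackTrace();
    }
}).start();

// UI thread: blocks trying to enter the same synchronized
// encodedWrite(String) -- and with the UI thread blocked, the UI job
// that would empty the pending buffer can never run. Deadlock.
try {
    stream.write("written from the UI thread");
} catch (java.io.IOException e) {
    e.printStackTrace();
}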
If you propose a fix for this bug, I can investigate and prepare a patch. I just don't want to spend the effort on it without discussing the new implementation beforehand.
Comment 1 Lars Vogel 2019-11-27 07:28:40 EST
This bug hasn't had any activity in quite some time. Maybe the problem got
resolved, was a duplicate of something else, or became less pressing for some
reason - or maybe it's still relevant but just hasn't been looked at yet.

If you have further information on the current state of the bug, please add it.
The information can be, for example, that the problem still occurs, that you
still want the feature, that more information is needed, or that the bug is
(for whatever reason) no longer relevant.

If the bug is still relevant, please remove the stalebug whiteboard tag.
Comment 2 Paul Pazderski 2020-01-09 17:55:44 EST
Hey Stefan,

In case you still read this: it's quite a shame that no one responded back then. Race conditions can be ugly, and your example and report are great for reproducing the problem!

You reported the deadlock for IOConsoleOutputStream where a UI and a non-UI thread write to the same stream. It is actually a bit more general: a non-UI thread must write large amounts of content to the console and, while doing so, hold lock(s) which the UI thread may require.

Fortunately, this makes the deadlock rather hard to hit. ProcessConsole should be safe because, afaik, the writing threads share no locks with the UI.

The fix I propose is rather simple. I want to avoid changing the synchronization in IOConsole and its surroundings too much; in my experience it breaks rather easily.
The wait() responsible for the deadlock is little more than a throttle for overly fast output appending. My solution is to limit this block to 1 second: if the UI is not blocked, it has a whole second to start processing content, and in the rare case that is currently a deadlock, the UI is only blocked for a second.
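Roughly sketched against IOConsolePartitioner.streamAppended() (simplified; fBuffer and pendingPartitions are the partitioner's internal pending-content counter and lock object):

synchronized (pendingPartitions) {
    // ... queue the newly appended content ...
    if (fBuffer > 160000 && Display.getCurrent() == null) {
        try {
            // Before the fix: pendingPartitions.wait() with no timeout,
            // which waits forever if the UI thread is itself blocked on
            // the stream's monitor and can never run the job that
            // empties the buffer.
            pendingPartitions.wait(1000); // throttle the writer for at most 1 s
        } catch (InterruptedException e) {
        }
    }
}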
Comment 3 Eclipse Genie 2020-01-09 17:56:49 EST
New Gerrit change created: https://git.eclipse.org/r/155583
Comment 5 Stefan Stern 2020-01-15 03:07:18 EST
Hey,

thanks for fixing this! :) 

Since it's been a while, I'm working in another field now. But others still work on the RCP-based product. I'll let my colleagues know that this issue will be fixed in the next Eclipse Platform release!

Thanks and all the best!
Comment 6 Sergey Kh 2020-01-15 09:07:29 EST
Hi Guys,

I am following this ticket and am very interested in it.
I looked at the fix, and there is a part of it I don't understand. Maybe I don't see the whole picture, so I'm sorry if I ask the wrong question.

What happens if we send a lot of messages using an AsyncLoggerConfig in one Runnable in the UI thread?

Each logged message is assigned its sequence number: 
com.lmax.disruptor.MultiProducerSequencer.next(int)

If we exceed the number of messages that can be stored in the RingBuffer, we will spin until someone releases a slot in the buffer and we get the correct sequence number.
However, this won't happen if we get stuck in the method:
org.eclipse.ui.internal.console.IOConsolePartitioner.streamAppended(IOConsoleOutputStream, String)
because, as I understand it, the release occurs only after exiting this method.
If the (fBuffer > 160000) condition is true, the thread will sleep forever, because we still hold the UI thread while trying to get the sequence number. (The UI Job will not be able to wake us up.)

With the change in this ticket, the thread will sleep for only 1 second and, after that, release the buffer. But what if we need to log several times more messages in the UI thread than the buffer can hold? As I understand it, in that case we will spin in com.lmax.disruptor.MultiProducerSequencer.next(int) for more than 1 second per extra message. So if we want to send 10,000 messages on top of this, the thread will hang for a few hours (10,000 × 1 s is roughly 2.8 hours)?

In any case, this is very close to the truth in my case, and changing the parameter from
pendingPartitions.wait(1000);
to
pendingPartitions.wait(10);
significantly improves the situation.

Could anyone clarify this point?

Thanks & Regards,
Sergey
Comment 7 Paul Pazderski 2020-01-15 09:35:02 EST
One thing first: in general, the following javadoc from IOConsole.newOutputStream and IOConsoleOutputStream applies to this topic:
> * Clients should avoid writing large amounts of output to this stream in the UI
> * thread. The console needs to process the output in the UI thread and if the client
> * hogs the UI thread writing output to the console, the console will not be able
> * to process the output.

So the best approach is to not write in the UI thread and to not block the UI thread. The wait was introduced to prevent filling memory with pending console content: the shown console content can be limited in the preferences, but the pending content cannot. The old wait() hard-limited the maximum pending content. With a wait(10) we can get into a situation where a thread appends content faster than the UI can process it, leading to an OOM in the worst case.
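For example, instead of writing directly in the UI thread, the write can be scheduled on a worker job. A hypothetical snippet, where 'stream' and 'text' stand for your output stream and content:

Job.create("Write console output", monitor -> {
    try {
        stream.write(text); // runs in a worker, never blocks the UI thread
    } catch (IOException e) {
        // handle or log the failed console write
    }
}).schedule();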


I cannot fully judge your use case. It involves too much non-Eclipse stuff I don't know, but I think you missed one important part of streamAppended: it only blocks a *non*-UI thread (the 'Display.getCurrent() == null' condition). If you append >160000 characters in the UI thread, it will immediately start to process the pending content.
So yes, your scenario is possible if *another* thread appends a lot of content in distinct writes and at the same time blocks the UI thread. But I don't see an easy solution for this. The best approach is to not block the UI thread while writing to the console. An option would be to make these buffer thresholds configurable; to be honest, they were introduced at a time with harder memory constraints.


Feel free to tell me if I misunderstood the question.
Comment 8 Sergey Kh 2020-01-15 10:44:42 EST
Hi Paul,
Thank you for your detailed answer, I really appreciate it.

I understand that this may not be your area of responsibility, since there are many things here that are not directly related to Eclipse, so just ignore me if that is the case.

However, I want to clarify my case a little to make sure that we understand each other.

1. UI Thread tries to log 10000 messages using AsyncLogger with ConsoleAppender

2. AsyncLogger tries to assign a sequence number to each message but prepares only 4086 LogEvents (the default RingBuffer size). This happens in the same thread in which we log the messages (the UI thread), so we get stuck on message 4086+1.

3. Log4jEventWrapperHandler (asynchronously, not in the UI thread) handles LogEvents and calls the appenders. In our case, it's just a ConsoleAppender which writes messages one by one to IOConsoleOutputStream.
*Usually*, IOConsolePartitioner.streamAppended(..) returns immediately and we go back to the ConsoleAppender and then to Log4jEventWrapperHandler's thread. Log4jEventWrapperHandler considers that it has just handled one LogEvent and releases one slot for a new message.

4. AsyncLogger prepares one more LogEvent (4086+1) and the UI thread finishes logging it. After that, the UI thread tries to log the next message.

However, in step *3*, once we have appended >160000 characters, we get stuck in streamAppended() for 1 second. So Log4jEventWrapperHandler takes more than a second to process each LogEvent, and this happens for every additional LogEvent after we have reached >160000. Since Log4jEventWrapperHandler takes a long time to process LogEvents, the AsyncLogger takes a long time to prepare new LogEvents.
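In code, the trigger is essentially this (simplified; the logger name is made up):

// Runs on the UI thread, e.g. inside an event handler.
org.apache.logging.log4j.Logger logger =
        org.apache.logging.log4j.LogManager.getLogger("console.test");
for (int i = 0; i < 10000; i++) {
    // Each call only claims a RingBuffer slot; the actual console
    // write happens later on the Disruptor's background thread.
    logger.info("message " + i);
}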

In simple words, the problem occurs when we write so many messages from the UI thread that we fill the RingBuffer and exceed the 160000-character threshold.
I understand that we shouldn't log so many messages from the UI thread, but unfortunately I cannot guarantee this in our RCP application.
Do I understand correctly that this is expected behavior and that, in my case, I should change the RingBuffer size?

Thanks & Regards,
Sergey
Comment 9 Stefan Stern 2020-01-15 11:43:47 EST
Hey,

since we hit the bug a few years back, I wanted to share our workaround. We couldn't make sure that we wouldn't get too many log messages from the UI thread, either.

So we implemented a custom ConsoleAppender. It has a queue of its own: instead of logging the messages directly, it stacks them in a queue, and a worker job pushes them to the console. By using a normal Job, we ensured that the UI thread stays unblocked. Our queue didn't have a size limit, since our machines were equipped with plenty of memory, so there was no need for the 160000-character limit.

I agree with Paul: touching the multithreading code is tough. If you can't ensure your log events are triggered from outside the UI thread, maybe switching to a custom appender will help; a sketch of the idea follows below. It worked for us and has covered the case for the last 6 years. :)
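A rough sketch of the idea (the class name and plugin id are made up; this is not our actual code):

import java.io.IOException;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import org.eclipse.core.runtime.IProgressMonitor;
import org.eclipse.core.runtime.IStatus;
import org.eclipse.core.runtime.Status;
import org.eclipse.core.runtime.jobs.Job;
import org.eclipse.ui.console.IOConsoleOutputStream;

class QueuedConsoleWriter {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final Job drainJob;

    QueuedConsoleWriter(IOConsoleOutputStream stream) {
        drainJob = new Job("Drain console queue") {
            @Override
            protected IStatus run(IProgressMonitor monitor) {
                String message;
                while ((message = queue.poll()) != null) {
                    try {
                        // Always called from a worker job, never the UI thread.
                        stream.write(message);
                    } catch (IOException e) {
                        return new Status(IStatus.ERROR, "com.example.logging",
                                "Console write failed", e);
                    }
                }
                return Status.OK_STATUS;
            }
        };
        drainJob.setSystem(true);
    }

    // Called by the appender, possibly on the UI thread; never blocks.
    void append(String message) {
        queue.add(message);
        drainJob.schedule();
    }
}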

Regards,
Stefan
Comment 10 Paul Pazderski 2020-01-15 16:08:08 EST
Thanks Sergey! Your additional explanation helped a lot to understand your problem.

(In reply to Sergey Kh from comment #8)
> Do I understand correctly, that it's expectable behavior and in my case, I
> should change the RingBuffer size?

Well, it is not really documented that this blocking can happen if content is not processed, and I'm thinking about adding a better warning to newOutputStream's javadoc.
I would say, even if that is not the best answer for you, that it is acceptable behavior for Eclipse. So yes, if you can change the buffer size or introduce a potentially unlimited buffer (thanks, Stefan, for sharing your solution!), that would be the fastest solution for your problem.

I have still opened bug 559221 to keep track of this problematic scenario, but I cannot give any guarantee or time estimate for a solution.
Comment 11 Sergey Kh 2020-01-16 03:39:03 EST
Hi Stefan and Paul,

Stefan, thank you very much for sharing your solution! 
I had the same idea, and it's nice to know that someone has actually used this solution.

Paul, thank you for your time and help!
I'm fine with increasing the buffer size or using a workaround for now, and I agree that this issue is not critical.

Thanks and all the best!