[jetty-dev] Jetty Client Problems

Hi, 

I have a customer who is using a Camel route inside a Fuse container that has a jetty:http producer. We've suddenly started seeing problems where all of our requests time out without a response.

When messages are flowing smoothly, we see about 16 connections to the backend server when using lsof to query connection state. All of these are in an ESTABLISHED state.

At some point, things go awry and we stop getting responses in a timely fashion (httpClient.timeout=15000). Once the timeouts start occurring, the number of connections spikes into the hundreds, all in an ESTABLISHED state, and every request times out as if no response arrived. We ran a TCP trace and could see that the request made it to the backend server and that the server responded within milliseconds; however, the Jetty client never recognized the response.
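For reference, the producer endpoint in the route looks roughly like this (a minimal sketch; the backend host and path are placeholders, only the httpClient.timeout=15000 option is the real setting):

import org.apache.camel.builder.RouteBuilder;

public class BackendRoute extends RouteBuilder {
    @Override
    public void configure() {
        // camel-jetty producer; httpClient.* options are applied to the underlying Jetty HttpClient
        from("direct:toBackend")
            .to("jetty:http://backend.example.com:8080/service"  // placeholder backend URL
                + "?httpClient.timeout=15000");                  // 15 second response timeout
    }
}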

Restarting our bundle (inside ServiceMix) or running jstack seems to free things up. The latter was surprising to me, but we believe that momentarily suspending the process reaps some of the connections.
In the first thread dump there are some threads that look problematic - blocked in AbstractSelector.cancel() while closing the channel from inside fill():

Thread t@20457: (state = BLOCKED)
- java.nio.channels.spi.AbstractSelector.cancel(java.nio.channels.SelectionKey) @bci=6, line=71 (Compiled frame)
- java.nio.channels.spi.AbstractSelectionKey.cancel() @bci=24, line=56 (Compiled frame)
- java.nio.channels.spi.AbstractSelectableChannel.implCloseChannel() @bci=50, line=207 (Compiled frame)
- java.nio.channels.spi.AbstractInterruptibleChannel.close() @bci=23, line=97 (Compiled frame)
- org.eclipse.jetty.io.nio.ChannelEndPoint.fill(org.eclipse.jetty.io.Buffer) @bci=199, line=258 (Compiled frame)
- org.eclipse.jetty.io.nio.SelectChannelEndPoint.fill(org.eclipse.jetty.io.Buffer) @bci=2, line=325 (Compiled frame)
- org.eclipse.jetty.http.HttpParser.fill() @bci=322, line=1035 (Compiled frame)
- org.eclipse.jetty.http.HttpParser.parseNext() @bci=84, line=280 (Compiled frame)
- org.eclipse.jetty.http.HttpParser.parseAvailable() @bci=1, line=235 (Compiled frame)
- org.eclipse.jetty.client.AsyncHttpConnection.handle() @bci=400, line=133 (Compiled frame)
- org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle() @bci=10, line=627 (Compiled frame)
- org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run() @bci=4, line=51 (Compiled frame)
- org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(java.lang.Runnable) @bci=1, line=608 (Compiled frame)
- org.eclipse.jetty.util.thread.QueuedThreadPool$3.run() @bci=47, line=543 (Compiled frame)
- java.lang.Thread.run() @bci=11, line=662 (Interpreted frame)

We also see a fair few of these:

Thread t@20488: (state = BLOCKED)
 - sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information may be imprecise)
 - java.util.concurrent.locks.LockSupport.parkNanos(java.lang.Object, long) @bci=20, line=196 (Compiled frame)
 - java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(long) @bci=68, line=2025 (Compiled frame)
 - org.eclipse.jetty.util.BlockingArrayQueue.poll(long, java.util.concurrent.TimeUnit) @bci=53, line=342 (Compiled frame)
 - org.eclipse.jetty.util.thread.QueuedThreadPool.idleJobPoll() @bci=12, line=526 (Compiled frame)
 - org.eclipse.jetty.util.thread.QueuedThreadPool.access$600(org.eclipse.jetty.util.thread.QueuedThreadPool) @bci=1, line=44 (Compiled frame)
 - org.eclipse.jetty.util.thread.QueuedThreadPool$3.run() @bci=275, line=572 (Compiled frame)
 - java.lang.Thread.run() @bci=11, line=662 (Interpreted frame)


Unfortunately, the only place this is reproducible is my customer's production environment. We did find some bug reports and mailing-list threads that seem to be in this area:

https://bugs.eclipse.org/bugs/show_bug.cgi?id=387487
https://bugs.eclipse.org/bugs/show_bug.cgi?id=416477

http://dev.eclipse.org/mhonarc/lists/jetty-users/msg02221.html
http://dev.eclipse.org/mhonarc/lists/jetty-users/msg02224.html

https://java.net/jira/browse/GRIZZLY-547
Removed the selectionkey.cancel() we did right before the socket.close().
The cancel allowed for the selector thread to trigger too early, while the 
other thread was still doing its socket.close(), hence becoming stuck at the 
synchronized (stateLock) in the SocketChannelImpl.
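If I read that right, the problematic pattern is roughly the following (an illustration of the ordering only, not actual Jetty or Grizzly code):

import java.io.IOException;
import java.nio.channels.SelectionKey;
import java.nio.channels.SocketChannel;

class CloseOrdering {
    // Ordering described in GRIZZLY-547: cancelling the key first lets the
    // selector thread wake up and act on the channel while this thread is
    // still inside close(), and the two contend for the channel's locks.
    static void cancelThenClose(SelectionKey key, SocketChannel channel) throws IOException {
        key.cancel();
        channel.close();
    }

    // The fix there was to just close the channel; implCloseChannel() cancels
    // the keys itself, which is also the path visible in the first stack trace above.
    static void closeOnly(SocketChannel channel) throws IOException {
        channel.close();
    }
}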

However, I have nothing definitive to tie to my customer's issue. I was wondering if someone might be able to confirm whether this matches one of those bugs.

One of my colleagues, from reading the JDK HTTP connection code, thought he may have seen an issue where it cannot recover from the timeout.

This issue suddenly started occurring after months of successful processing. The only change we've been able to identify is an upgrade to the latest Solaris kernel (https://getupdates.oracle.com/readme/150401-03), which included an upgrade to Java 1.6.0_65.

Any insight greatly appreciated. 

Susan 
