Build Identifier: We are using Jetty 7/8 with WebSocket in production, via the Atmosphere Framework. I see a lot of sockets in CLOSE_WAIT when I load balance Jetty with Amazon ELB:

tcp        1      0 10.168.175.224:8000    10.160.41.231:55149    CLOSE_WAIT
tcp        1      0 10.168.175.224:8000    10.160.41.231:24941    CLOSE_WAIT
tcp        1      0 10.168.175.224:8000    10.160.41.231:24429    CLOSE_WAIT
tcp        1      0 10.168.175.224:8000    10.160.41.231:24682    CLOSE_WAIT
tcp        1      0 10.168.175.224:8000    10.160.41.231:55403    CLOSE_WAIT
tcp        1      0 10.168.175.224:8000    10.160.41.231:24939    CLOSE_WAIT
tcp        1      0 10.168.175.224:8000    10.160.41.231:24427    CLOSE_WAIT
tcp        1      0 10.168.175.224:8000    10.160.41.231:24937    CLOSE_WAIT

The CLOSE_WAIT count keeps increasing at an approximate load of 400 requests/second. I tried 8.1.0.RC1 and the latest SNAPSHOT and the issue is still there. Not all of the CLOSE_WAIT sockets are reclaimed by the OS. If I don't front Jetty with ELB, the number of CLOSE_WAIT sockets is roughly five times lower, but I still see persistent ones. I will update this issue with network/traffic information between ELB and Jetty soon.

Reproducible: Always
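For context: a socket sits in CLOSE_WAIT when the peer (here, the ELB) has closed its end of the connection but the local application has not yet called close() on its side. A minimal Python sketch of how such a socket arises (an illustration of the TCP state, not Jetty code):

```python
import socket
import time

# Server that accepts a connection but never closes its side.
srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen(1)

cli = socket.socket()
cli.connect(srv.getsockname())
conn, _ = srv.accept()

cli.close()            # the "load balancer" closes its end (sends FIN)
time.sleep(0.1)

# The server application sees EOF, but until it calls conn.close()
# the kernel keeps the server-side socket in CLOSE_WAIT.
assert conn.recv(1) == b""
conn.close()           # this is the call that releases the socket
srv.close()
```

If the application never makes that final close() call, the sockets pile up exactly as in the netstat output above.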
I've added some unit tests to jetty-websocket/src/test/java/org/eclipse/jetty/websocket/WebSocketMessageRFC6455Test.java to try to reproduce, but no joy. Neither testTCPClose nor testTCPHalfClose leaves a connection in CLOSE_WAIT. Could you try to reproduce it in a test harness, or capture a TCP/IP trace?
OK, I will try; it needs to run behind ELB. As you expected, I've set the websocket/continuation timeout lower than the ELB timeout, and almost all of the CLOSE_WAIT sockets are gone. So it's clearly ELB that causes this. More information coming today or next year :-)
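The workaround above amounts to keeping Jetty's idle timeout below the ELB idle timeout (60 seconds by default, if I recall correctly), so that Jetty, rather than the load balancer, is the side that closes idle connections. A minimal jetty.xml sketch for Jetty 7/8, assuming a SelectChannelConnector on port 8000 (the connector class and timeout value are illustrative; adjust for your setup):

```xml
<Configure id="Server" class="org.eclipse.jetty.server.Server">
  <Call name="addConnector">
    <Arg>
      <New class="org.eclipse.jetty.server.nio.SelectChannelConnector">
        <Set name="port">8000</Set>
        <!-- Keep this below the ELB idle timeout so Jetty closes first -->
        <Set name="maxIdleTime">30000</Set>
      </New>
    </Arg>
  </Call>
</Configure>
```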
Joakim, maybe you can take over looking at this issue, given your focus on websocket? thanks Jan
Considering the websocket side of this fixed. Moving to documentation component for addition to documentation about this situation. Leaving open and assigned to me.
We have also experienced the same issue, since Jetty 6.1.26. We patched it by adding a call to _socket.close() in SocketEndPoint.java's shutdownOutput(). The only reason we figured out what the problem was is that we kept getting "method not allowed" exceptions for SSL sockets. We used the following diff to identify the relevant changes: http://grepcode.com/file_/repo1.maven.org/maven2/org.mortbay.jetty/jetty/6.1.26/org/mortbay/io/bio/SocketEndPoint.java/?v=diff&id2=6.1.25

This issue does cause leaked file descriptors, which eventually crashes applications or degrades performance significantly. Originally, we thought the issue was related to HTTP 1.0's lack of connection keep-alive when sending requests to the SSL listener. We were able to reproduce on Jetty 6.1.26 using the following Python script (but we cannot reproduce the same behavior on 7):

import urllib
import time

while (1):
    f = urllib.urlopen("http://<host>:port/resource")
    print f.read()
    #time.sleep(1)

What we do know is that it happens when using load balancers, proxies, or other intermediate devices, that it affects NIO as well as BIO, and that we might patch it by adding a call to close the socket in nio/ChannelEndPoint.java's shutdownOutput() method.
I found the following comment in the documentation here: http://www.eclipse.org/jetty/documentation/current/configuring-connectors.html

    soLingerTime: A value >= 0 sets the socket SO_LINGER value in milliseconds. Jetty attempts to gently close all TCP/IP connections with proper half close semantics, so a linger timeout should not be required and thus the default is -1.

with a link to the following discussion: http://stackoverflow.com/questions/3757289/tcp-option-so-linger-zero-when-its-required

I believe, however, that load balancers / proxies are not allowing for clean connection closes, resulting in hundreds of connections in TIME_WAIT and in the Jetty process eventually hitting its file descriptor limit. Are there any other prescribed workarounds for this, other than setting soLingerTime?
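For reference, the "proper half close semantics" the documentation mentions correspond to shutting down only the write side of a socket: the peer sees EOF on its read side while data can still flow the other way. A small Python illustration of the distinction (an analogy to Jetty's shutdownOutput behavior, not its actual code):

```python
import socket

srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen(1)

cli = socket.socket()
cli.connect(srv.getsockname())
conn, _ = srv.accept()

# Half close: send FIN but keep the socket open for reading,
# analogous to Jetty's shutdownOutput().
cli.shutdown(socket.SHUT_WR)

assert conn.recv(1) == b""   # server sees EOF on its read side...
conn.sendall(b"bye")         # ...but can still write back
assert cli.recv(3) == b"bye" # and the client can still read it

conn.close()
cli.close()
srv.close()
```

A proxy that drops the connection instead of completing this exchange would leave one side waiting, which may explain the stuck states observed behind load balancers.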
Joakim, Assigning back to you to do the doco. Jan
Closing as documentation is updated about timeouts.