Bug 332728 - [Net] UnixProxyProvider execs a process to discover whether env contains 'proxy' on each socket connect
Summary: [Net] UnixProxyProvider execs a process to discover whether env contains 'pro...
Status: VERIFIED FIXED
Alias: None
Product: Platform
Classification: Eclipse Project
Component: Team (show other bugs)
Version: 3.6.1   Edit
Hardware: PC Linux-GTK
: P3 normal (vote)
Target Milestone: 3.7 M5   Edit
Assignee: James Blackburn CLA
QA Contact:
URL:
Whiteboard:
Keywords: performance
: 313597 326029 333962 (view as bug list)
Depends on:
Blocks:
 
Reported: 2010-12-16 07:53 EST by James Blackburn CLA
Modified: 2011-01-25 08:14 EST (History)
5 users (show)

See Also:
Szymon.Brandys: review+


Attachments
potential patch 1 (2.42 KB, patch)
2010-12-16 08:28 EST, James Blackburn CLA
no flags Details | Diff
patch 2 (2.48 KB, patch)
2010-12-16 13:57 EST, James Blackburn CLA
no flags Details | Diff
patch2 (2.80 KB, patch)
2011-01-11 07:07 EST, James Blackburn CLA
no flags Details | Diff
patch 4 (2.64 KB, patch)
2011-01-11 08:11 EST, James Blackburn CLA
jamesblackburn+eclipse: iplog+
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description James Blackburn CLA 2010-12-16 07:53:35 EST
I was trying to work out why many CCRC operations are so slow, and this looks like one major culprit:

FileInputStream.readBytes(byte[], int, int) line: not available [native method]	
FileInputStream.read(byte[], int, int) line: 199	
BufferedInputStream.read1(byte[], int, int) line: 256	
BufferedInputStream.read(byte[], int, int) line: 317	
BufferedInputStream(FilterInputStream).read(byte[]) line: 90	
Properties$LineReader.readLine() line: 418	
Properties.load0(Properties$LineReader) line: 337	
Properties.load(InputStream) line: 325	
ProxyProvider(UnixProxyProvider).getEnv(String) line: 196	
ProxyProvider(UnixProxyProvider).getSystemProxyInfo(String) line: 135	
ProxyProvider(UnixProxyProvider).getProxyData(URI) line: 58	
ProxyProvider(AbstractProxyProvider).select(URI) line: 43	
ProxyManager.getProxyDataForHost(String, String) line: 366	
EclipseProxySelector.select(URI) line: 46	
SocksSocketImpl.connect(SocketAddress, int) line: 358	
Socket.connect(SocketAddress, int) line: 529	

For each Socket.connect, UnixProxyProvider#getEnv exec's 
"/bin/sh", "-c", "env | grep -i proxy"
This is pretty bad as forking a process and waiting for it to finish takes many 100s of milliseconds.

It seems for many operations CCRC does java.net.Socket connect for each resource.
Comment 1 James Blackburn CLA 2010-12-16 08:28:10 EST
Created attachment 185312 [details]
potential patch 1

I added a try finally timing around getSystemProxyInfo:

getSystemProxyInfo(socks) took 171
getSystemProxyInfo(socks) took 131
getSystemProxyInfo(socks) took 161
getSystemProxyInfo(socks) took 154
getSystemProxyInfo(socks) took 130
getSystemProxyInfo(socks) took 204
getSystemProxyInfo(socks) took 165
getSystemProxyInfo(socks) took 164
getSystemProxyInfo(socks) took 145
getSystemProxyInfo(socks) took 285
getSystemProxyInfo(socks) took 209

Which makes this API call very expensive.  

System.getEnv was replaced as part of bug 257172. 
We should likely only fall-back to execing when the System.getEnv is known not to work...

If we fall-back to running env in the Exception handler when System.getenv fails, we can remain performant while still supporting non-compliant JVMs.  

The attached patch reduces time to <1ms.
Comment 2 James Blackburn CLA 2010-12-16 08:29:05 EST
Pawel you made the original change in bug 257172. Can you review this?
Comment 3 James Blackburn CLA 2010-12-16 09:00:18 EST
Note that #getEnv went from being deprecated in 1.4.2 to not deprecated in 1.5.0+ (due to developer complaints).
Comment 4 James Blackburn CLA 2010-12-16 13:57:52 EST
Created attachment 185357 [details]
patch 2

Looks like they used to throw Error, so catch all Throwable.

Also the synchronized on #getEnv looks wrong. There's no state accessed concurrently.  Likely the I/O errors received were due to not correctly destroying the process and closing the file descriptors.
Comment 5 James Blackburn CLA 2010-12-20 07:43:48 EST
Szymon this is a simple patch, can you review?

As for the impact of this performance problem, CCRC spent 50 mins. synchronizing my view.  Approximately 9mins of this was spent in ProxyProvider(UnixProxyProvider).getSystemProxyInfo (3500 calls * 150ms).
Comment 6 Szymon Brandys CLA 2011-01-11 05:51:35 EST
The method System.getenv(String) is not defined in the bundle's required execution environment: CDC-1.1/Foundation-1.1. We should try to call it using reflection and if it fails, fall back on the old code.
Comment 7 James Blackburn CLA 2011-01-11 07:07:04 EST
Created attachment 186477 [details]
patch2

Updated the patch to use reflection. Installed the "PDE/API Tools Environment Descriptions" to provide the baselines which caused the API tooling to flag the error.
Interestingly before bug 257172 System.getenv was called directly without reflection.
Comment 8 James Blackburn CLA 2011-01-11 07:23:26 EST
*** Bug 333962 has been marked as a duplicate of this bug. ***
Comment 9 James Blackburn CLA 2011-01-11 07:37:01 EST
In Robert's bug 333962 waitFor is hanging indefinitely. This usually happens if one of the pipe buffers is full.  Perhaps something has gone wrong and the input stream hasn't been emptied, or the error stream has become full. 

This causes the thread to block forever...

Robert, what does:
  /bin/sh -c env | grep -i proxy
do on your system? And what goes to stdout and to stderr?
Comment 10 Robert Munteanu CLA 2011-01-11 07:40:37 EST
James, I get absolutely no output for that command.

For reference

robert@raptor ~$ /bin/sh -c env  | wc -c
3884
Comment 11 James Blackburn CLA 2011-01-11 08:03:32 EST
(In reply to comment #10)
> James, I get absolutely no output for that command.
> 
> For reference
> 
> robert@raptor ~$ /bin/sh -c env  | wc -c
> 3884

It's difficult to know what's going wrong without being able to reproduce. 
 
In the fallback catch (Throwable) case, I think the best solution is to remove the Process#waitFor() call.  It seems to add no value as the input stream has already been processed in-line by the time waitFor is called.  All it will do is increase the time needed for the #getEnv call, and potentially cause it to block forever.
Comment 12 Robert Munteanu CLA 2011-01-11 08:06:53 EST
(In reply to comment #11)
> (In reply to comment #10)
> > James, I get absolutely no output for that command.
> >
> > For reference
> >
> > robert@raptor ~$ /bin/sh -c env  | wc -c
> > 3884
> 
> It's difficult to know what's going wrong without being able to reproduce.
> 
> In the fallback catch (Throwable) case, I think the best solution is to remove
> the Process#waitFor() call.  It seems to add no value as the input stream has
> already been processed in-line by the time waitFor is called.  All it will do is
> increase the time needed for the #getEnv call, and potentially cause it to block
> forever.

If I do find a wait to reproduce the issue I will let you know. But it's not even on a daily basis.
Comment 13 James Blackburn CLA 2011-01-11 08:11:31 EST
Created attachment 186485 [details]
patch 4

Looking at the CVS history, this code has been changed again and again incorrectly to correct the issue.

waitFor() was added in 1.19 to fix bug 258941.  The reason the fix 'worked' was because getEnv was made synchronized serializing accesses and minimizing the number of in-progress Process execs.  The waitFor between each maximised the chance that the garbage collector cleans up the file-descriports.

The real fix for hitting the fd limits was by adding proc.destroy() in 1.21.  As such I don't believe the waitFor, nor the synchronized are required.
Comment 14 Szymon Brandys CLA 2011-01-11 10:26:19 EST
To sum up:

It seems that the initial problem was caused by not cleaning up processes and OS resources. Adding synchronization and waitFor just added extra time to gc resources what somehow worked on IBM vm, but not on Sun/Oracle one. There are some changes in UnixProcess implementation in IBM which clean some resources on process exit. I can't see it in Sun vm impl. Anyway, just adding proc.destroy as introduced by the fix for bug 293085 seems to be a good fix.

I committed it to HEAD.

BTW interesting investigation.
Comment 15 Szymon Brandys CLA 2011-01-11 10:26:39 EST
Fixed.
Comment 16 James Blackburn CLA 2011-01-11 12:22:56 EST
*** Bug 313597 has been marked as a duplicate of this bug. ***
Comment 17 James Blackburn CLA 2011-01-11 12:23:39 EST
*** Bug 326029 has been marked as a duplicate of this bug. ***
Comment 18 James Blackburn CLA 2011-01-25 08:14:43 EST
Verified in I20110124-1800