Attaching log file. This only occurs with RHEL 3.1 for DBCS users.
Opened by ivorychang / IISI / TradChinese / 2005-07-28 10:47:32
OS: RHEL 3.1
MUST FIX: Yes
Severity: 2
Build date: 20050727
Blocking: Yes
Language: CHT
Bitmap Location: 516_24001110.gif
Tester Name: Ivory Chang
Problem Description: Unable to launch Eclipse Application
Re-create Procedure:
1. Select "Run As/Eclipse Application" from the "Run" toolbar drop-down or menu.
==> Unable to launch Eclipse Application
Best Regards, Ivory Chang, TVT, Taiwan
Created attachment 25862 [details] log file
This has been labeled as a must fix defect for sign off.
NOTE: No more handles [gtk_init_check() failed]
I could not reproduce this problem with LANG set to zh_TW.Big5 on Red Hat Enterprise Linux ES release 3 (Taroon Update 5). I also tried giving a DBCS name to the run configuration, as was done according to the log file. 1. Does this only occur when you use a DBCS name for the run configuration? 2. Does this only occur when the host eclipse is in a DBCS locale? 3. What version of GTK+ is installed (rpm -q gtk2)? 4. Are you running over remote X, or are you using Eclipse locally on the RHEL machine? Is the DISPLAY environment variable set correctly?
Removing TVT tag as this is not blocking translation. Adding TCT tag to defect as per Cam's request: TCT 516
(In reply to comment #4)
> I could not reproduce this problem with LANG set to zh_TW.Big5 on Red Hat
> Enterprise Linux ES release 3 (Taroon Update 5). I also tried giving a DBCS
> name to the run configuration, as was done according to the log file.
> 1. Does this only occur when you use a DBCS name for the run configuration?
> 2. Does this only occur when the host eclipse is in a DBCS locale?
> 3. What version of GTK+ is installed (rpm -q gtk2)?
> 4. Are you running over remote X, or are you using Eclipse locally on the RHEL
> machine? Is the DISPLAY environment variable set correctly?
I answered all the questions on 8/15, but somehow they were lost. I will answer them again.
1. Same problem with an English configuration name
2. Same problem in an English locale
3. rpm -q gtk2 reports gtk2-2.4-15
4. Running locally, DISPLAY=:0.0
There aren't many ways gtk_init_check() can fail. It's pretty much only if XOpenDisplay() fails, so something wrong with the DISPLAY authentication or server resources. Can you try creating a run configuration in external tools that executes "xterm"? Is there something wrong with the environment variables set by Eclipse when launching applications? Can you try not running as root?
I just tried as a normal user, not root. Same problem.
> Can you try creating a run configuration in external tools that executes "xterm"?
> Is there something wrong with the environment variables set by Eclipse when launching applications?
If you could tell me the steps, I'd try it.
1. Run > External Tools > External Tools...
2. Select Program and click on New
3. In the location field, enter /usr/bin/xterm
4. Click Run
Try the same for /usr/bin/gedit. While it's not the same as launching a runtime workspace, this tests the theory that something in the environment may be preventing applications from launching. Do you have additional plugins installed? If so, can you try with just the Eclipse SDK?
Running external tools xterm and gedit worked fine. Running just the Eclipse SDK also worked fine. That means one of the plugins in IES features is causing this. We just received the "org.eclipse.swt.SWTError: No more handles [gtk_init_check() failed]" message and nothing much else. Can you give us more advice and help us narrow that down so we can forward this bug to the right component? Thanks.
You could try this: 1) From Menu choose Help -> Software Updates -> Manage Configuration 2) Expand the IES items, select one at a time and from the right mouse menu, select Disable. Disable one feature at a time until you can identify the feature causing the error.
I was able to isolate the offending feature - it's WTP. All other combinations of features that we use do not cause a no more handles error. Should we reassign this bug to WTP and allow them to investigate?
I did more testing. The problem is caused by WTP in the target. We see this gtk_init_check error when the target is set to the Eclipse installation we are running, or to a second copy of an Eclipse installation with WTP. Run As Eclipse Application works okay when the target is a plain Eclipse with no WTP. Billy, please switch the component to WTP. Jeff, please take a look.
I'm using a RHEL 3 machine. Selecting "Run As/Eclipse Application" from the "Run" toolbar drop-down or menu works for me. My target platform includes all the WTP plug-ins/features. Are there any setup instructions that I missed?
Just want to update everyone, Jeff was able to connect to our machine and saw the problem. He is investigating to see if this is a WTP problem.
Hi Billy, one thing we didn't mention: we saw the following messages in the console right before the "org.eclipse.swt.SWTError: No more handles [gtk_init_check() failed]" error. Do they give you any clue?
_X11TransSocketOpen: socket() failed for local
_X11TransSocketOpenCOTSClient: Unable to open socket for local
_X11TransOpen: transport open failed for local/nls144.rtp.raleigh.ibm.com:0
Jeff, I also tried this on another RHEL V3 machine outside the test lab. Same problem. I also tried changing the file handle buffer size to 8192. That didn't help.
What does "ulimit -n" return? Is this what you tried changing to 8192?
Xlib is trying to open a UNIX socket and it is failing. Unfortunately, the Xlib error message does not indicate what the errno value actually was. To find out, you will need to run the application under "strace". Steps:
1. Set a breakpoint before the call to Display
2. Run under a debugger
3. In another terminal, use "ps" to determine the PID of the launched java process
4. Run "strace -p <PID> &> strace.log".
5. Resume execution in the debugger.
6. Grep for calls to socket in the strace.log file, find the correct one and see what the return value was.
I did a Google search for the error message and found an interesting result. Try googling for "JAVA_HIGH_ZIPFDS". With this query, I found a copy of a set of release notes for some IBM product that had this text:
"Note: For Linux, you must create an environment variable called JAVA_HIGH_ZIPFDS and set its value to a high number, such as 500, to tell the IBM JRE to create that many file descriptors in a separate area of memory for later use. The rich client executable launcher sets this value as part of its execution process, but when you run the client within Eclipse or Rational Application Developer, you must set this value manually or you may encounter the following errors: _X11TransSocketOpen: socket() failed for local ..."
However, I have been unable to find further documentation on this environment variable, what it actually does, or what versions and settings of the Linux kernel it applies to.
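Step 6 of the strace procedure can be scripted. The following is only a sketch: it assumes strace's usual one-syscall-per-line output, in which a failed call is shown as "= -1" followed by the errno name. The function names are made up for illustration, and strace.log is the file written in step 4.

```shell
#!/bin/sh
# Hypothetical helpers (not part of strace or Eclipse) for inspecting strace.log.

# Print every socket() call recorded in the given strace log.
socket_calls() {
    grep 'socket(' "$1"
}

# Print only the socket() calls that failed. strace reports a failed
# syscall as "= -1 ERRNO (description)", so the errno name is on the same line.
failed_socket_calls() {
    grep 'socket(.*= -1 ' "$1"
}
```

For example, "failed_socket_calls strace.log" would list the failing calls together with their errno names, if there are any.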
Hi Billy, yes, I was using "ulimit -n" to set the file handle buffer size to 8192, but that didn't help. Setting the JAVA_HIGH_ZIPFDS environment variable to 500 before launching Eclipse fixed the problem!!! Now, who is responsible for adding a readme for that? Thanks to everyone for the time spent investigating this problem!!!
I have been unable to find any documentation on the JAVA_HIGH_ZIPFDS environment variable. I would really like to know what it actually does.
On Linux I believe it tells the IBM JRE to create that many file descriptors in a separate area of memory for later use. I think it was added as a workaround for problems like this that many were having. I am not sure of the support details around its use.
So it could be a kernel bug, a VM bug, or just that WTP/eclipse/whatever has too many files open at the time the display is created (but if that's true, why did ulimit not help?). Can you put a breakpoint at the point where the Display is created and check /proc/<pid_of_java>/fd/ to see how many files are open at that point? Knowing the output from the strace might also be useful. Given that you're happy with the workaround, and that this seems rather obscure, it's tempting to stop investigating and move on. Your call, I think. I am just worried that this is a symptom of something more fundamental.
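The /proc check suggested here can be run from a second terminal while the launched VM is paused at the breakpoint. This is a minimal sketch, assuming a Linux /proc filesystem. fd_summary is a made-up helper, and counting descriptors below 256 is an assumption based on the 255 limit quoted later in this thread.

```shell
#!/bin/sh
# Hypothetical helper: summarize the descriptors listed in an fd directory
# such as /proc/<pid_of_java>/fd. Prints "<total> <count below 256>".
fd_summary() {
    fd_dir=$1
    total=0
    low=0
    for entry in "$fd_dir"/*; do
        # Skip the unexpanded pattern when the directory is empty or unreadable.
        [ -e "$entry" ] || [ -L "$entry" ] || continue
        fd=${entry##*/}
        total=$((total + 1))
        if [ "$fd" -lt 256 ]; then
            low=$((low + 1))
        fi
    done
    echo "$total $low"
}
```

Usage would look like "fd_summary /proc/12345/fd", where 12345 stands in for the PID of the launched java process.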
*** Bug 109716 has been marked as a duplicate of this bug. ***
Hi Billy, Now that I think about it, this problem should happen to all locales (not just DBCS) as long as we have a big enough set of plugins and/or NL fragments installed in Eclipse, and running on RHEL 3.1. This scenario should be more common than we thought. I think it deserves a paragraph in the "3.1.1 Release Notes" mentioning the system setup (running IBM JRE on RHEL 3.1 and probably other distributions of Linux with the same level of Linux kernel), the symptoms ("_X11TransSocketOpen: socket() failed for local" and "gtk_init_check"), and the workaround (create an environment variable called JAVA_HIGH_ZIPFDS and set its value to a high number, such as 500, to tell the IBM JRE to create that many file descriptors in a separate area of memory for later use).
I have requested that an entry be added to the readme. I still want to figure out what's behind this problem though. :)
FYI, I just moved my Eclipse 3.2 M5 to IBM J2RE 1.5.0 J9 on RedHat 3.0 and have this issue. org.eclipse.swt.SWTError: No more handles [gtk_init_check() failed] export JAVA_HIGH_ZIPFDS=500 solves it for me.
*** Bug 133593 has been marked as a duplicate of this bug. ***
This is exactly my problem. Setting JAVA_HIGH_ZIPFDS=500 worked around the problem.
(In reply to comment #17)
> However, I have been unable to find further documentation on this environment
> variable, what it actually does, or what versions and settings of the Linux
> kernel it applies to.
Billy, et al., I did find a bit more detail at http://www-128.ibm.com/developerworks/java/jdk/linux/50/sdkandruntimeguide.lnx.en.html
Oddly, it makes it sound like there's an absolute limit of 1024 jar files?!
<quote>
The X Windows System is unable to use file descriptors above 255. Because the JVM holds file descriptors for open jar files, X can run out of file descriptors. As a workaround, you can set the JAVA_HIGH_ZIPFDS environment variable to tell the JVM to use higher file descriptors for jar files.
To use the JAVA_HIGH_ZIPFDS environment variable, set it to a value between 0 and 512. The JVM will then open the first jar files using file descriptors up to 1024. For example, if your program is likely to load 300 jar files:
export JAVA_HIGH_ZIPFDS=300
The first 300 jar files will then be loaded using the file descriptors 724 to 1023. Any jar files opened after that will be opened in the normal range.
</quote>
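The arithmetic in the quoted guide is simply 1024 minus the value: with 300, the reserved range starts at 1024 - 300 = 724 and ends at 1023. A small sketch of that calculation follows. zipfds_range is a hypothetical helper written for this thread, not anything shipped with the IBM JRE, and it only restates the numbers from the quote.

```shell
#!/bin/sh
# Hypothetical helper: print the first and last descriptor the quoted text
# says the JVM would use for jar files when JAVA_HIGH_ZIPFDS is set to $1.
zipfds_range() {
    n=$1
    # The quoted guide allows values between 0 and 512; 0 reserves nothing.
    if [ "$n" -lt 1 ] || [ "$n" -gt 512 ]; then
        echo "expected a value between 1 and 512" >&2
        return 1
    fi
    echo "$((1024 - n)) 1023"
}
```

By the same arithmetic, the value 500 used earlier in this thread would keep the first 500 jar files on descriptors 524 to 1023, well above the 255 that X is said to need.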
Thanks David, this explains what's going on. There are limits for standard calls such as select which take an fd_set structure. I could see this depending also on how the X server was compiled. A good solution would be to ensure that the SWT plugin is loaded early so that the X connection is opened on a low file descriptor. I don't know enough about the Eclipse plugin system to know if this is possible. I'm moving this to Platform > Runtime to see what their thoughts are. I don't think there's anything SWT can do to avoid this problem.
*** Bug 138271 has been marked as a duplicate of this bug. ***
I had the same error occur during restart after installing all components of Eclipse Callisto 3.2 RC1.
Something needs to be done to force the UI plugins, or whatever creates the SWT Display, to load as early as possible, before too many plugins are loaded.
Some more context here.
- The runtime is not really in a position to know much of anything about the specific bundles it is running. Someone gives us bundles, we run them. The markup and the control flow in the bundles completely drive the behaviour.
- The UI bundles should be getting tickled very early in the process as they are running the application. The very first thing we do after starting the framework is call the application (the UI bundle in the common case). Pretty much the first thing it does is create a Display.
- In first-start scenarios all of the bundles are being installed, so we have to open each bundle JAR and look at the manifest. Currently we do not close JARs. Note that this is on the first run of a configuration.
- Equinox does have a parameter (osgi.bundlefile.limit) you can set to bound the number of JARs that are open at one time. See bug 138182 for more details. However, there are some potential issues with VMs that take advantage of the fact that JARs are left open (e.g., IBM 1.5 with Shared Classes).
- Adding Tom because he knows lots of stuff.
Some thoughts and questions:
- Is this an issue only with initialization? What happens when you run "eclipse -clean -initialize"? Does it fail? If it succeeds, can you launch eclipse normally after you have initialized it? John A, can you try this in your installation from comment 31?
- Please try setting the osgi.bundlefile.limit option in your config.ini. Try setting it to something between 50 and 100.
The failure does not occur with -clean -initialize (presumably because the UI is never called in this case). However, I get the failure every time I start with -clean without -initialize. Another strange side-effect is that it writes the log file in eclipse/workspace/.log, even though I did not have a workspace in that directory... it makes it difficult to figure out what went wrong. It starts fine with -Dosgi.bundlefile.limit=50 or -Dosgi.bundlefile.limit=100
I don't know if it's related, but I started getting GPFs with the osgi.bundlefile.limit option specified. I had three GPFs in a row shortly after startup, and the error log doesn't give much detail (problem in "System thread", but no stack trace). This was using an IBM 1.4.2 VM. I am now running with a Sun VM and osgi.bundlefile.limit and it is ok so far.
I'm nominating this as a greatbug. Even though it was reported a while ago, it only becomes critical in the context of a much larger install such as Callisto, where many bundles are present. Now all we need is a "greatfix" ;)
Not sure we can do much for 3.2 other than have them set the osgi.bundlefile.limit value.
- The GPFs seem like a VM bug, right?
- Moving forward, is setting the bundlefile limit acceptable? Jeff mentions issues with this and Shared Classes VMs, but this is only an issue if you are trying to support replacing jar files while the VM is running. That is not something Equinox/OSGi supports anyway.
We have investigated other solutions but they would likely be complicated, error prone, unstable and impact performance (see bug 138182).
> The GPFs seem like a VM bug, right? It may be, but I get the GPF consistently when the osgi.bundlefile.limit property is set, and it never occurs when the property is not set. It would be interesting and useful for other people experiencing this problem to try it out with this argument: eclipse -vmargs -Dosgi.bundlefile.limit=100 If this is the best fix, there is not a lot of time to test it. Having others try it out with different machines/VMs would be good. Is there any chance that this would affect startup performance? Just throwing this out there, but another way to avoid this is restarting the VM after initialization. I.e., on first start initialize the bundle state and then exit and restart. On the next session very few jars are read and the problem doesn't occur. I can consistently start without failure by doing: ./eclipse -clean -initialize ./eclipse It's a bit of a hack, but it's at least a workaround.
Is there anything on tap here for 3.2?
At a minimum there should be a readme. At the moment, setting JAVA_HIGH_ZIPFDS seems to be the safest workaround.
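For reference, the three workarounds discussed so far can be written as small launcher functions. This is a sketch only. The function names and the ECLIPSE variable are made up for illustration, and the default values (500 and 100) are simply the ones people reported using in this thread. The first function only applies to the IBM JRE, and the GPFs reported above with osgi.bundlefile.limit should be kept in mind for the second.

```shell
#!/bin/sh
# Path to the Eclipse launcher; override ECLIPSE if it lives elsewhere.
ECLIPSE=${ECLIPSE:-./eclipse}

# IBM JRE only: ask the VM to keep jar files on high descriptors.
launch_with_high_zipfds() {
    JAVA_HIGH_ZIPFDS="${1:-500}" "$ECLIPSE"
}

# Cap the number of bundle JARs Equinox keeps open at once.
launch_with_bundle_limit() {
    "$ECLIPSE" -vmargs -Dosgi.bundlefile.limit="${1:-100}"
}

# Initialize the bundle state in one VM, then start normally in a fresh one.
launch_two_step() {
    "$ECLIPSE" -clean -initialize && "$ECLIPSE"
}
```

Only one of these would normally be needed. Which one is appropriate depends on the VM in use.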
Reducing priority since this is really an XWindows problem. We have not yet found a suitable "fix" but the above workarounds will be documented in the README.
John pointed out this has been added to the SWT section of the readme. Removing the target milestone as we are going to keep this bug open in case we can implement a work-around.
I'm seeing a similar problem with both the Sun and JRockit JVMs (1.5) when running the Eclipse automated tests on RHEL3 and RHEL4 machines. The Eclipse IDE starts up just fine, but when running the 'jdttext' tests from the command line I immediately encounter the "No more handles [gtk_init_check() failed]" exception. We've tried JAVA_HIGH_ZIPFDS but that doesn't help (presumably because it's IBM specific). An increased ulimit doesn't help (no surprise if the real problem is that X cannot address file handles above 255) and changing the osgi.bundlefile.limit has no positive effect either.
Things that might or might not have an effect:
1. We're using a VNC server. All other X apps work just fine (including the Eclipse IDE).
2. Both machines that we have tested on are SMP machines.
Right now this is a real showstopper for us. We're trying to get JRockit certified for Eclipse on all platforms and this prevents us from testing on essential Linux platforms. We get the exact same errors using the Sun JVM. Any advice on how to proceed with this would be very valuable to us.
Thomas, given that the workaround seems to be coming from the VM, I would recommend talking to the JRockit team.
Yes, the thought occurred to me :-). But the Sun JVM is certified on Linux platforms, so evidently there must be a way to run those tests on Linux. Still, I'm unable to, even when using the Sun JVM. Also, the VM I'm trying to get certified is already released, so I will not be able to get any patches in. I tested using a normal X Window setup, b.t.w. (i.e., no VNC), but there's no difference.
What kind of tests are you running to get JRockit to qualify as a regular VM? Are you only testing the SDK, or Callisto?
You must be somehow running the tests differently. We run the automated tests here on the Sun VM and don't encounter this problem. The set of plugins that is installed when running the tests will be important - this bug never happens to my knowledge when only the Eclipse SDK is installed (you need more plugins before it runs above the magic 256 handle level).
I run the tests in the Eclipse Automated Tests framework (testing the SDK, not Callisto). They are installed per the instructions in the readme.html file included in the tests distro. I'm using Eclipse 3.2 (the release) and its associated tests. I'm running the same tests on Windows without problems, and all tests but the SWT-related tests seem to work just fine on Linux. The set of plugins installed is controlled by the automated test framework, since it starts from scratch and creates a completely new eclipse installation between each test that it runs. I simply run the tests using this command line:
sh runtests -os linux -ws gtk -arch x86 all -properties sun.properties
The sun.properties file defines two properties:
J2SE-1.4
J2SE-5.0
They are set to point to the two Sun JRE installations that I have installed. The one pointed to by J2SE-5.0 is also the one pointed to by my JAVA_HOME and the only one included in my PATH.
Kim, Sonia, would you have any recommendations on running the tests? Are we running the tests on the Sun VM? Thomas seems to have a problem getting the tests to complete on Linux.
The problem described in this bug regarding the file descriptor issues is a really strange and rare issue with Xlib, identified by these messages:
_X11TransSocketOpen: socket() failed for local
_X11TransSocketOpenCOTSClient: Unable to open socket for local
Usually, if gtk_init_check() is failing, it means the DISPLAY environment variable is not pointing at a running X server. I would first advise you to make absolutely sure that, when you're running the tests on the command line, the environment variables are being passed down correctly to the command that actually runs the tests. A good way to do this is to find exactly where "java" is executed and substitute an X application like "xterm", then make sure an xterm correctly appears on your screen.
I changed the runtests.sh script and xterm did indeed not start. It turns out that the script contains the following lines:
#set the DISPLAY for running tests on Linux
DISPLAY=`$HOST`:0.0;export DISPLAY
which in my case is totally wrong. I don't have the HOST variable set, and even if I did it would fail since my display number is 2.0, not 0.0. Hard problems often have simple solutions (embarrassing, really). Thanks Billy!
Sonia, this must be considered a bug in the test framework. Either the documentation needs a very visible note about this or, even better, these two lines should be conditional (skipped if DISPLAY is set) or perhaps removed altogether. Will you file that report or should I? If so, where should I file it?
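The conditional variant proposed here might look like the following. This is a sketch, not the actual fix that went into the test framework: set_display_if_unset is a made-up name and the ":0.0" fallback is an assumption. Note also that backticks around $HOST execute the variable's value as a command instead of expanding it, so the sketch avoids them.

```shell
#!/bin/sh
# Hypothetical replacement for the two script lines quoted above:
# keep an existing DISPLAY and only fall back to a default when it is unset or empty.
# The fallback value ":0.0" is an assumption; pass a different one as $1.
set_display_if_unset() {
    if [ -z "${DISPLAY:-}" ]; then
        DISPLAY=${1:-:0.0}
    fi
    export DISPLAY
}
```

With DISPLAY already set to :2.0, as in the case described here, calling set_display_if_unset leaves it untouched.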
Thomas, please file a bug report against platform / releng.
I filed bug #150497.
With no target milestone set and given the age of this problem, is it reasonable to assume that it will not be getting fixed, and the workaround of using osgi.bundlefile.limit=100 should be used by teams hitting this issue? I'd like to know if this is the recommended course of action.
There is a bug in 3.2 that causes failures when using the osgi.bundlefile.limit. This has been fixed in 3.2.1 in bug 153699.
Thanks for the info - so given that I will be using 3.2.1, is this the recommended parameter to use, or will this bug be fixed and I should avoid using the bundlefile.limit? For now I'm using it but I want to know if I should plan on completing my product with this in place.
I have the same problem with Sun JDK 1.6. On starting Eclipse I got:
_X11TransSocketOpen: socket() failed for local
_X11TransSocketOpenCOTSClient: Unable to open socket for local
_X11TransOpen: transport open failed for local/localhost:0
Apparently *removing* -Dosgi.bundlefile.limit=100 can help prevent the Europa crash bug 194943 ... bug 153699 doesn't talk about the same symptoms, but could bug 194943 be caused by multi-thread access to zip files? PW
The error
_X11TransSocketOpen: socket() failed for local
_X11TransSocketOpenCOTSClient: Unable to open socket for local
_X11TransOpen: transport open failed for local/XXXXXX:0
occurs with Java 1.6 but NOT with Java 1.5 (both from Sun) on my system! I have a lot of plugins installed (last time I checked there were around 90 different "plugins as packages", amounting to around 1500 "plugins as folders"). None of the workarounds helped, only starting Eclipse (3.2.2) with Java 1.5 instead of Java 1.6. I've written down all details of this superbug at http://www.karakas-online.de/forum/viewtopic.php?t=9776 so you may want to check the link if you have any questions.
I also had the problem
_X11TransSocketOpen: socket() failed for tcp
_X11TransSocketOpenCOTSClient: Unable to open socket for tcp
_X11TransOpen: transport open failed for tcp/localhost:0
with Java 1.6 on Eclipse 3.2.2. In my case the problem comes from the wst (web standard tools) plugins. If wst is not used by your application, the solution is to deselect the wst plugins from the set of plugins your application is launched with. Two approaches are possible from the Plugins tab of the launch configuration window:
1/ Deselect All and then Add Required Plugins
2/ Deselect only these plugins:
- org.eclipse.wst.command.*
- org.eclipse.wst.common.*
- org.eclipse.wst.dtd.*
Otherwise, if the wst plugins are required, you can start Eclipse (3.2.2) with Java 1.5 instead of Java 1.6, as said above.
I believe this is no longer relevant now that we set osgi.bundlefile.limit to 100 by default.