494598 – Hudson builds failing with "insufficient memory for the Java Runtime Environment to continue"

Status:	RESOLVED FIXED

Alias:	None

Product:	Community
Classification:	Eclipse Foundation
Component:	CI-Jenkins
Version:	unspecified
Hardware:	PC Windows 7

Importance:	P1 normal
Target Milestone:	---
Assignee:	CI Admin Inbox
QA Contact:

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2016-05-25 19:21 EDT by Sam Davis
Modified:	2016-05-31 14:35 EDT
CC List:	2 users

See Also:

Description Sam Davis

2016-05-25 19:21:55 EDT

Recently a lot of builds have been failing with errors like the one below. This is now so frequent that it has happened several times in a row. I've tried restarting the HIPP but it didn't help. See https://hudson.eclipse.org/mylyn/view/Gerrit/job/gerrit-mylyn-tasks/1340/console for an example.

#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 515899392 bytes for committing reserved memory.
# An error report file with more information is saved as:
# /jobs/genie.mylyn/gerrit-mylyn-tasks/workspace/hs_err_pid1698.log
[ERROR] Failure: hudson.remoting.RequestAbortedException: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
Terminating xvnc.
FATAL: hudson.remoting.RequestAbortedException: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
org.hudsonci.utils.tasks.OperationFailure: hudson.remoting.RequestAbortedException: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
	at org.hudsonci.utils.tasks.PerformOperation.execute(PerformOperation.java:64)
	at org.hudsonci.maven.plugin.builder.MavenBuilder.perform(MavenBuilder.java:169)
	at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:34)
	at hudson.model.AbstractBuild$AbstractRunner.perform(AbstractBuild.java:646)
	at hudson.model.Build$RunnerImpl.build(Build.java:181)
	at hudson.model.Build$RunnerImpl.doRun(Build.java:136)
	at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:434)
	at hudson.model.Run.run(Run.java:1390)
	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:40)
	at hudson.model.ResourceController.execute(ResourceController.java:82)
	at hudson.model.Executor.run(Executor.java:137)
Caused by: hudson.remoting.RequestAbortedException: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
	at hudson.remoting.Request.call(Request.java:142)
	at hudson.remoting.Channel.call(Channel.java:643)
	at org.hudsonci.maven.plugin.builder.internal.PerformBuild.doExecute(PerformBuild.java:198)
	at org.hudsonci.utils.tasks.PerformOperation.execute(PerformOperation.java:50)
	... 10 more
Caused by: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
	at hudson.remoting.Request.abort(Request.java:262)
	at hudson.remoting.Channel.terminate(Channel.java:743)
	at hudson.slaves.Channels$1.terminate(Channels.java:71)
	at hudson.remoting.Channel$ReaderThread.run(Channel.java:1042)
Caused by: java.io.IOException: Unexpected termination of the channel
	at hudson.remoting.Channel$ReaderThread.run(Channel.java:1023)
Caused by: java.io.EOFException
	at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2598)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1318)
	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
	at hudson.remoting.Channel$ReaderThread.run(Channel.java:1017)
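
For scale, the size in the mmap failure above can be converted to MiB. The snippet below is only a sketch for reading such messages: the helper names bytes_to_mib and failed_map_bytes are made up for illustration, and the input line is copied from the console output above.

```shell
#!/bin/sh
# Hypothetical helpers for reading a HotSpot "insufficient memory" message.

# Convert a byte count to whole MiB (1 MiB = 1048576 bytes).
bytes_to_mib() {
    echo $(( $1 / 1048576 ))
}

# Read lines on stdin and print the byte count from a
# "failed to map N bytes" message, if one is present.
failed_map_bytes() {
    sed -n 's/.*failed to map \([0-9][0-9]*\) bytes.*/\1/p'
}

line='# Native memory allocation (mmap) failed to map 515899392 bytes for committing reserved memory.'
bytes=$(printf '%s\n' "$line" | failed_map_bytes)
echo "JVM asked for $(bytes_to_mib "$bytes") MiB"
```

With the line from this report, the request works out to 492 MiB. On a Linux host, the output of `free -m` is in the same unit and gives a rough idea of whether that much memory was available at the time.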

Comment 1 Eclipse Webmaster

2016-05-26 11:37:57 EDT

My guess is that the host for your HIPP is really busy (I'm seeing some OOM events in the logs).

About the only thing I can suggest is moving your HIPP to another host (we have a couple of spare slots). If you'd like me to do that, I'll need to shut your HIPP down for a few hours (depending on disk space usage) while I transfer the instance, so we'll need to pick a time.

-M.

Comment 2 Sam Davis

2016-05-26 13:45:50 EDT

Yes, please move it to another host as soon as you can. We can deal with HIPP being down for a few hours.

Comment 3 Eclipse Webmaster

2016-05-26 16:10:31 EDT

Ok I've finished the move.

-M.

Comment 4 Sam Davis

2016-05-26 16:31:16 EDT

Thanks. But I think there's a problem with XVNC. The console output has a bunch of errors like "** (java:17420): WARNING **: Could not open X display" and the Eclipse log says "org.eclipse.swt.SWTError: No more handles [gtk_init_check() failed]."

https://hudson.eclipse.org/mylyn/job/mylyn-commons-nightly/43/consoleFull

Comment 5 Sam Davis

2016-05-26 17:51:25 EDT

In another build, we have:

Warning: VNC extension does not support -reset, terminating instead. Use -noreset to prevent termination.
Unable to init server: Could not connect: Connection refused
An error has occurred. See the log file
/jobs/genie.mylyn/gerrit-mylyn-tasks/workspace/org.eclipse.mylyn.bugzilla.tests/target/work/data/.metadata/.log.

The log file contains the same "No more handles" message.

Comment 6 Eclipse Webmaster

2016-05-27 11:02:45 EDT

Missing new Xvnc options.  Fixed.

-M.

Comment 7 Sam Davis

2016-05-27 14:44:53 EDT

Thanks, that seems to have worked. But now it seems that the builds cannot access /usr/bin/sign:

     [exec] + /usr/bin/sign /home/data/httpd/download-staging.priv/tools/mylyn/signing/mylyn/3.20.0-SNAPSHOT/site.zip nomail /home/data/httpd/download-staging.priv/tools/mylyn/signing/mylyn/3.20.0-SNAPSHOT/output
     [exec] /jobs/genie.mylyn/mylyn-3.20.x-release/workspace/org.eclipse.mylyn/org.eclipse.mylyn-site/pack-and-sign/sign-and-wait.sh: line 42: /usr/bin/sign: No such file or directory
     
https://hudson.eclipse.org/mylyn/job/mylyn-3.20.x-release/10/console
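
A failure like this one only surfaces at line 42 of sign-and-wait.sh, after the site has already been built and staged. One way to fail earlier with a clearer message is to check for the tool before using it. This is a sketch only: require_tool is a hypothetical helper and is not part of the Mylyn build scripts.

```shell
#!/bin/sh
# Hypothetical guard: succeed only if the given path is an executable file.
require_tool() {
    if [ -x "$1" ]; then
        return 0
    fi
    # Report the missing tool on stderr so it stands out in the console log.
    echo "missing required tool: $1" >&2
    return 1
}

# Possible use at the top of a signing script:
#   require_tool /usr/bin/sign || exit 1
```

This does not fix a host that lacks /usr/bin/sign, but it makes the cause obvious at the start of the build.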

Comment 8 Sam Davis

2016-05-27 17:50:39 EDT

Also, it seems this host may have a buggy version of wget? [1] We keep getting this failure in https://hudson.eclipse.org/mylyn/job/mylyn-tasks-nightly/

+ wget -O org.eclipse.mylyn.commons.stamp https://hudson.eclipse.org/mylyn/job/mylyn-commons-nightly/lastSuccessfulBuild/artifact/org.eclipse.mylyn.commons.stamp
--2016-05-27 17:05:12--  https://hudson.eclipse.org/mylyn/job/mylyn-commons-nightly/lastSuccessfulBuild/artifact/org.eclipse.mylyn.commons.stamp
Resolving proxy.eclipse.org (proxy.eclipse.org)... 172.30.206.220
Connecting to proxy.eclipse.org (proxy.eclipse.org)|172.30.206.220|:9898... connected.
Proxy tunneling failed: Bad RequestUnable to establish SSL connection.

[1] https://www.reddit.com/r/sysadmin/comments/34a9sz/what_the_hell_wget/

Comment 9 Sam Davis

2016-05-30 18:18:26 EDT

Any update on this? We can work around the wget issue by disabling fingerprinting, but we can't do a release build if we can't access /usr/bin/sign.

Comment 10 Sam Davis

2016-05-31 12:42:53 EDT

Raising to P1 because the deadline for RC3 +3 builds is tomorrow.

Comment 11 Denis Roy

2016-05-31 13:25:32 EDT

We're on this, Sam. Sorry for the delay.

I think many projects have moved to the cbi signing maven plugin.
https://www.eclipse.org/cbi/sitedocs/index.html

Comment 12 Eclipse Webmaster

2016-05-31 13:28:34 EDT

Fixed, sorry about that.

-M.

Comment 13 Sam Davis

2016-05-31 14:35:29 EDT

Thanks very much. The release build is working now, and I can work around wget being broken (comment 8) by using curl instead. And it seems as though moving hosts fixed the insufficient memory problem.
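
For anyone hitting the same wget failure, the curl replacement could look roughly like the sketch below. The exact command used in the Mylyn jobs is not shown in this report, so the flags are an assumption, and fetch and CURL are hypothetical names.

```shell
#!/bin/sh
# Hypothetical stand-in for `wget -O FILE URL`.
# CURL can be set to point at a different curl binary; it defaults to "curl".
fetch() {
    out=$1
    url=$2
    # -f: treat HTTP errors as failures
    # -sS: no progress meter, but still print error messages
    # -L: follow redirects
    # -o: write the response body to the given file
    ${CURL:-curl} -fsSL -o "$out" "$url"
}

# Example, using the URL from comment 8:
#   fetch org.eclipse.mylyn.commons.stamp \
#     https://hudson.eclipse.org/mylyn/job/mylyn-commons-nightly/lastSuccessfulBuild/artifact/org.eclipse.mylyn.commons.stamp
```

The function returns curl's exit status, so a failed download still fails the build step.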