Bug 447072 - Node goes "off line"...?
Status: NEW
Alias: None
Product: Hudson
Classification: Technology
Component: Core
Version: 3.2.1
Hardware: PC Mac OS X
Importance: P3 normal
Target Milestone: ---
Assignee: Winston Prakash
QA Contact: Geoff Waymark
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-10-14 07:39 EDT by Stuart Lorber
Modified: 2014-10-14 07:39 EDT

See Also:


Attachments

Description Stuart Lorber 2014-10-14 07:39:43 EDT
Ran a large number of jobs over the weekend.

Got an exception / job failures because a node was "unavailable" (not sure how to describe it).

At the time the exception occurred, only a few jobs would have been running across our entire system.

Happened on three "multi-configuration" jobs.

The node was not offline on Monday, and later job submissions ran without any intervention on the node (restarting, etc.).

This has happened before but it is infrequent.  This is the first time it affected multiple job flows.

Exception / console output:

20:01:29  Triggering q7_Mac,1
20:01:29  Triggering q7_Windows,5
20:01:29  Triggering q7_Mac,4
20:01:29  Triggering q7_Linux,3
20:01:29  Triggering q7_Windows,2
20:01:29  Triggering q7_Linux,2
20:01:29  Triggering q7_Linux,4
20:01:29  Triggering q7_Windows,3
20:01:29  Triggering q7_Windows,4
20:01:29  Triggering q7_Linux,1
20:01:29  Triggering q7_Linux,5
20:01:29  Triggering q7_Mac,2
20:01:29  Triggering q7_Mac,5
20:01:29  Triggering q7_Windows,1
20:01:29  Triggering q7_Mac,3
20:01:35  q7_Mac,1 is still in the queue: Waiting for next available executor on q7_Mac
02:05:49  q7_Windows,5 is still in the queue: Waiting for next available executor on q7_Windows
18:01:43  Interrupting #5
18:01:43  FATAL: channel is already closed
18:01:43  hudson.remoting.ChannelClosedException: channel is already closed
18:01:43  	at hudson.remoting.Channel.send(Channel.java:476)
18:01:43  	at hudson.remoting.Request.call(Request.java:104)
18:01:43  	at hudson.remoting.Channel.call(Channel.java:643)
18:01:43  	at hudson.Launcher$RemoteLauncher.kill(Launcher.java:772)
18:01:43  	at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:549)
18:01:43  	at hudson.model.Run.run(Run.java:1493)
18:01:43  	at hudson.matrix.MatrixBuild.run(MatrixBuild.java:164)
18:01:43  	at hudson.model.ResourceController.execute(ResourceController.java:82)
18:01:43  	at hudson.model.Executor.run(Executor.java:137)
18:01:43  	at hudson.model.OneOffExecutor.run(OneOffExecutor.java:61)
18:01:43  Caused by: java.io.IOException: Unexpected termination of the channel
18:01:43  	at hudson.remoting.Channel$ReaderThread.run(Channel.java:1023)
18:01:43  Caused by: java.io.EOFException
18:01:43  	at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2598)
18:01:43  	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1318)
18:01:43  	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
18:01:43  	at hudson.remoting.Channel$ReaderThread.run(Channel.java:1017)
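For triage, the "Caused by" chain in the console output above can be pulled out with a small script. This is only a sketch and not part of Hudson: the function name exception_chain and both regular expressions are assumptions made for illustration, and it assumes every console line carries the same "HH:MM:SS" prefix seen in this report.

```python
import re

# Assumed line prefix: "HH:MM:SS" followed by whitespace, as in the console output above.
TIMESTAMP = re.compile(r"^\d{2}:\d{2}:\d{2}\s+")

# An exception header line: optional "Caused by: ", a dotted Java class name,
# then an optional ": message" tail.
EXCEPTION = re.compile(
    r"^(?:Caused by: )?((?:[A-Za-z_$][\w$]*\.)+[A-Za-z_$][\w$]*)(?:: (.*))?$"
)


def exception_chain(console_text):
    """Return [(class_name, message_or_None), ...] from outermost to root cause."""
    chain = []
    for raw in console_text.splitlines():
        # Drop the timestamp (and the whitespace after it) from each line.
        line = TIMESTAMP.sub("", raw)
        # Skip stack frames; only exception header lines are of interest.
        if line.lstrip().startswith("at "):
            continue
        match = EXCEPTION.match(line)
        if match:
            chain.append((match.group(1), match.group(2)))
    return chain
```

Run against the trace in this report, the last entry of the returned list is the root cause (java.io.EOFException, with no message), which suggests the slave end of the channel went away while the master's reader thread was still waiting for data.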