Bug 406362 - hudson master should investigate and report via log when a slave abruptly stops streaming to the master.
Status: NEW
Alias: None
Product: Hudson
Classification: Technology
Component: Core
Version: unspecified
Hardware: Sun Solaris
Importance: P3 enhancement with 1 vote
Target Milestone: ---
Assignee: Winston Prakash CLA
QA Contact: Geoff Waymark CLA
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-04-23 13:46 EDT by David Katleman CLA
Modified: 2013-10-16 03:36 EDT

Description David Katleman CLA 2013-04-23 13:46:35 EDT
I ran into strange Hudson errors today when running multiple builds on the same slave.

The original job logs are at the bottom of the email.

The current JDK8 builds look at system memory and the number of processors to figure out how much parallelism the build can use.

What seems to have happened is that three builds were running on the same slave, each thinking it had full use of the 3G of memory and the 4 processors, when in fact they were sharing them. So each likely consumed more memory than was available.

I don't dispute that we shouldn't run so many such builds on a slave.

The concern is how the Hudson logs were cut off in midstream: no build error was reported in any of these job logs.

> On 4/23/2013 8:16 AM, Winston Prakash wrote:
>> Hi David,
>>
>> The master delegates builds to the slave. The slave continues to run the builds and streams the status, logs, and build artifacts back to the master. If the slave is completely starved of memory/resources, it may not be able to send the logs back to the master. That is probably why the logs were cut off in midstream on the master.
>
> Is there any way to have the master detect that the slave has abruptly stopped streaming data, and append to the log an explanation for the abrupt end?

Unfortunately, in the current Hudson architecture there is only a push from the slave. That is, after the master starts the build on the slave, it just waits for data to be pushed from the slave. If the slave fails and stops pushing data, Hudson does not try to find out what has happened to the slave or write corresponding information to the log file.

We could consider this a Hudson architecture flaw and improve it in a future version.

Please file a bug at  https://bugs.eclipse.org/bugs/enter_bug.cgi?product=Hudson

- Winston
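
For illustration only, the detection requested above could be sketched roughly as follows. This is not Hudson code and does not use any Hudson API: the class name BuildLogWatchdog, its methods, and the stall_timeout_s parameter are all hypothetical, and Python is used only to keep the sketch short. The idea is that the master records when data last arrived from the slave, and a periodic check appends an explanatory note to the job log once the stream has been silent for too long.

```python
import time
from typing import Callable, List, Optional


class BuildLogWatchdog:
    """Hypothetical master-side helper: notices when a slave goes silent."""

    def __init__(self, stall_timeout_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.stall_timeout_s = stall_timeout_s
        self._clock = clock                  # injectable so it can be tested
        self._last_data_at = clock()         # time of the most recent push
        self._finished = False               # set when the build ends normally
        self._stall_reported = False         # avoid repeating the same note
        self.log_lines: List[str] = []       # stands in for the job log file

    def on_data(self, line: str) -> None:
        """Called whenever the slave pushes a log line to the master."""
        self.log_lines.append(line)
        self._last_data_at = self._clock()
        self._stall_reported = False         # stream resumed; allow new reports

    def on_finished(self) -> None:
        """Called when the slave reports a normal end of the build."""
        self._finished = True

    def check(self) -> Optional[str]:
        """Called periodically by the master; returns the note it appended, if any."""
        if self._finished or self._stall_reported:
            return None
        silent_for = self._clock() - self._last_data_at
        if silent_for < self.stall_timeout_s:
            return None
        note = (f"[master] No data received from slave for {silent_for:.0f}s; "
                "the slave may be out of memory or disconnected. "
                "This log may be incomplete.")
        self.log_lines.append(note)
        self._stall_reported = True
        return note
```

A silence timeout is only a heuristic, since a long build step can legitimately produce no output. A real fix would more likely have the master actively query the slave's state before writing anything to the log.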