Re: [cross-project-issues-dev] Build2 Offline?

Konstantin,

Thanks for the feedback.  I've heeded your advice while devising a plan to overhaul our Hudson instance:

http://bugs.eclipse.org/315643


Denis


On 06/01/2010 01:14 PM, Konstantin Komissarchik wrote:
I will second that. I don't think there is anything inherently unstable
about the Hudson slaves feature. We are running a 14 node cluster for our
Eclipse tooling efforts at Oracle. We have seen Hudson crashes, which
required a reboot of the entire cluster to correct, but they have all been
attributed to one of two things: (a) running out of disk space (especially
on the controller) and (b) unstable behavior of the source control system
plugin. We are running Perforce, so the exact same problems will not apply
here, but considering how many different SCM systems the Hudson installation
integrates with at Eclipse.org, I would look there first. What I've noticed
is that an SCM plugin can make the cluster controller unresponsive. Slaves
need a continuous responsive link to the controller (for streaming shell
output, for instance). Once the link becomes unresponsive, the slaves die
quickly. Sometimes, the controller will recover eventually, but slaves do
not. If the Hudson dash is still responsive, I try restarting dead slaves
first. That often takes care of the problem without restarting the entire
cluster.

Oh and one more thing... We found that it works better for cluster stability
if the controller is not tasked with running heavy-duty jobs. We use
virtualization to segment the available hardware and only run builds on
slave nodes.

For what it's worth...

- Konstantin


-----Original Message-----
From: cross-project-issues-dev-bounces@xxxxxxxxxxx
[mailto:cross-project-issues-dev-bounces@xxxxxxxxxxx] On Behalf Of David
Carver
Sent: Tuesday, June 01, 2010 6:40 AM
To: Cross project issues
Subject: Re: [cross-project-issues-dev] Build2 Offline?

If we are having issues, I'll suggest what we in the Eclipse community
always suggest to our users/adopters: file a bug against the Hudson
project itself, and work with the Hudson developers to address the situation.

Apache is using Hudson with one master server, and 14 slave machines:

http://hudson.zones.apache.org/hudson/

So I would suspect that if there were major issues with Slaves, Apache 
would be experiencing them as well.    If we are having connection and 
communication issues the first place to start is the Forums:

http://hudson.361315.n4.nabble.com/Hudson-users-f361316.html

The second is to open bug reports:

http://issues.hudson-ci.org/secure/Dashboard.jspa

So let's work with the Hudson community to find out what is causing
the issues.

Dave



On 05/31/2010 06:47 AM, Denis Roy wrote:
  
So let me get this straight:

- starting a Hudson Slave using SSH is problematic
- starting a Hudson Slave with JNLP is problematic

It's beginning to sound like Hudson Slaves are a great idea on paper, 
but in the Real World they don't work.  Perhaps we would be better 
served if the build2 Hudson Slave was simply a separate master server?

I'm truly disappointed in how unstable and unpredictable Hudson is.

Denis




On 05/31/2010 09:39 AM, Webmaster(Matt Ward) wrote:
    
I've restarted the JNLP service, which looked like it was stuck.

-Matt.

Eike Stepper wrote:
      
Hi,

Is there a reason for build2 slave being offline?

Cheers
/Eike

----
http://thegordian.blogspot.com
http://twitter.com/eikestepper
        

