
Re: [cross-project-issues-dev] Builds consuming all CPU

David, thanks for the Linux 101.  Let's graduate to Linux 102 -- other bottlenecks. The build server CPU is only one of four potential reasons a build may be slow:

Reason 2: RAM bottleneck

If everyone maintains a 2GB Java process, the build server will a) have less memory for disk cache and b) eventually begin to swap.  With top, you can determine available memory by looking at these two lines:
Mem:  15924728k total, 15422960k used,   501768k free,   599732k buffers
Swap:  4196344k total,      180k used,  4196164k free,  9080944k cached

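In this case, nearly all of the 16G is "used" and only about 500M is "free". However, 9080944k (roughly 9G) of it is file cache, which the kernel hands back as soon as processes need the memory. Only 180k of swap is in use.  This is good.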
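Here is a rough sketch of the same arithmetic. It assumes the older procps 'top' summary format shown above: values in kB, with "cached" printed on the Swap line. Newer procps-ng releases print a different 'KiB Mem' / 'buff/cache' layout, which this sketch will not match. The function name 'effective_free_kb' is made up for illustration. It adds free + buffers + cached, which is the memory that is either unused or only holding cache.

```bash
# effective_free_kb: hypothetical helper. Reads the "Mem:" and "Swap:" lines of
# older procps top output on stdin and prints free + buffers + cached (in kB).
effective_free_kb() {
  awk '
    /^Mem:/ {
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^free,?$/)    free    = $(i - 1)
        if ($i ~ /^buffers,?$/) buffers = $(i - 1)
      }
    }
    /^Swap:/ {
      for (i = 2; i <= NF; i++)
        if ($i ~ /^cached,?$/) cached = $(i - 1)
    }
    END {
      # Strip the trailing "k" unit before adding.
      sub(/k$/, "", free); sub(/k$/, "", buffers); sub(/k$/, "", cached)
      print free + buffers + cached
    }'
}
```

Given the two lines above, it prints 10182444, which is about 9.7 GiB. On a machine with the older top, 'top -b -n 1 | effective_free_kb' should give a live figure.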


Reason 3: Disk bottleneck

When you run a build, you're using all kinds of disk resources: CVS/SVN disks, workspace disks, download.eclipse.org disks, /shared (or temp) disks, etc.  The 'nice' command is not aware of how busy any particular disk subsystem is, so even if you nice +19, you are still using many disk resources. 

If you're using top, you can determine how busy the disks are by looking at these two clues:

Cpu(s): 22.9%us,  2.9%sy,  0.0%ni, 68.6%id,  0.9%wa,  0.1%hi,  0.3%si,  4.2%st
CPU time spent in IO Wait--------------------^^^^^^
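To pull just the I/O wait figure out of that line, here is a minimal awk sketch. The function name 'iowait_pct' is hypothetical. It assumes the older "Cpu(s): ... 0.9%wa," format shown above; newer top versions print "%Cpu(s):" with "wa" as a separate word.

```bash
# iowait_pct: reads top output on stdin and prints the number in front of
# "%wa" on each "Cpu(s):" summary line (percentage of CPU time in I/O wait).
iowait_pct() {
  awk '/^Cpu[(]s[)]:/ {
    for (i = 2; i <= NF; i++)
      if ($i ~ /%wa,?$/) {
        v = $i
        sub(/%wa,?$/, "", v)
        print v
      }
  }'
}
```

For a live reading, 'top -b -n 2 -d 1 | iowait_pct | tail -n 1' keeps only the second sample. The first sample top prints can reflect a much longer interval than the last second.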

16944 hudsonbu  17   0  818m  23m 6076 D  40  0.2   0:00.81 /opt/public/common/ibm-java2-p
Look here -----------------------------^
Processes in "D" state are in uninterruptible sleep, completely blocked waiting for I/O  <-- bad
Processes in "R" state are Running, and using CPU cycles
Processes in "S" state are in interruptible Sleep
The higher the %wa value is, the more the build server's CPUs are wasting their time waiting for I/O.  In this case, since our Gigabit LAN is far from saturated, you can be assured that the IO Wait is related to one (or more) disk subsystems.
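A single top snapshot can catch a process briefly in D state, which is normal. The worrying case is when the same processes show up in D again and again. Below is a minimal sketch for listing them with 'ps'. The function name 'blocked_procs' is hypothetical. 'stat' is the standard ps state column, and a D there may be followed by extra flag characters.

```bash
# blocked_procs: keeps the header line plus every process whose state
# (first column of "ps -eo stat,pid,user,comm") starts with D.
blocked_procs() {
  awk 'NR == 1 || $1 ~ /^D/'
}

# Example: ps -eo stat,pid,user,comm | blocked_procs
```

Run it a few times, a few seconds apart, to see whether the list stays populated.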


Reason 4: Network bottleneck

Since build.eclipse.org shares a Gigabit switch with everything else at Eclipse.org, our internal network is not a bottleneck.  Yet.



This concludes today's Linux 102 lesson.  There won't be a quiz.


Denis



On 03/17/2010 05:00 PM, David M Williams wrote:

> Ah, so wtpBuild is in fact the only one who is nice ....

Thanks for pointing this out ... we will strive to live up to community norms and end this aberrant behavior at once!  

:)

Well, we are probably leading the way because we were (are?) leading the way in hogging the build machine in the first place. Do let me know if you see "wtpBuild" misbehaving.

More constructively, for a little Linux 101: when I started using 'nice' I had a very hard time getting all the "commands" and "arguments" to pass through as expected. I finally discovered the magic argument string "$@" ... when it is written in double quotes, the shell expands it to each of the script's original arguments as a separate word. Arguments containing spaces and quotes are passed through exactly as they were given.
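Here is a small demonstration of why the quotes matter, assuming bash. 'show_args' stands in for 'ant.sh' and simply prints each argument it receives in brackets. All three function names are hypothetical.

```bash
# show_args: prints every argument it receives, each wrapped in brackets.
show_args() { printf '[%s]' "$@"; printf '\n'; }

# Two wrappers that differ only in how they forward their arguments.
wrap_quoted()   { show_args "$@"; }   # each original argument stays one word
wrap_unquoted() { show_args $*; }     # arguments are re-split on whitespace

wrap_quoted   -Dname="two words" build.xml   # prints [-Dname=two words][build.xml]
wrap_unquoted -Dname="two words" build.xml   # prints [-Dname=two][words][build.xml]
```

The unquoted $* is split again on whitespace, so the property value "two words" arrives as two separate arguments.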

In our "runAnt" script, I end with

exec nice --adjustment 15 "${directory_variable}/ant.sh" "$@"

It runs the ant script (as before) but at the lower priority (and anything ant spawns is at that same lower priority). We actually run our "build server" at normal priority, but it doesn't do much and (most) jobs it kicks off are at the lower priority.

In case anyone finds that helpful.
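The niceness behaviour described above can be checked directly, because 'nice' with no arguments prints the current niceness. The values in the comments assume the shell starts at niceness 0. The 'exec' in the runAnt line also replaces the wrapper shell with 'nice', and 'nice' then replaces itself with 'ant.sh', so no extra shell process is left waiting.

```bash
# 'nice' with no arguments prints the current niceness.
# Expected values in the comments assume a starting niceness of 0.
nice                                  # 0
nice -n 15 nice                       # 15
nice --adjustment 15 nice             # 15: same option, long form (as in runAnt)
nice -n 15 sh -c 'sh -c nice'         # 15: processes started by a niced command inherit it
nice -n 25 nice                       # 19: niceness is capped at 19, the lowest priority
```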

I'd strongly urge all "tests" (at least) to be run at lower priority like +15 (yes, higher numbers mean lower priority ... increasing niceness to others, I guess). I settled on "15" because anything lower (like 5, 10) didn't seem to make any difference at all, and anything higher (e.g. 20) seemed to make a really noticeable difference. If you're worried you'll run too slowly: under "normal" load, we still complete in the same time ... but under heavy load, we take maybe 25% longer, which is the way it should be (especially for "tests").  Offhand, I'd say anything that takes over 60 minutes to complete should be run at lower priority, letting those little 10-20 minute jobs (still) finish quickly. [All based on informal observations ... I'm sure others might have different, better advice.]

HTH





From: Thomas Hallgren <thomas@xxxxxxx>
To: Cross project issues <cross-project-issues-dev@xxxxxxxxxxx>
Date: 03/17/2010 01:55 PM
Subject: Re: [cross-project-issues-dev] Builds consuming all CPU
Sent by: cross-project-issues-dev-bounces@xxxxxxxxxxx





Ah, so wtpBuild is in fact the only one who is nice (aside from the jarsigner) ?

- thomas

On 03/17/2010 06:51 PM, Denis Roy wrote:

You're seeing "15" as the nice value, not -15.  Only root can lower the nice value below zero.

Denis
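When reading such lines, the niceness is the fourth column (NI) of top's default process display (PID USER PR NI ...). PR is the scheduling priority, not the nice value. Below is a minimal awk sketch that labels each process line. The function name 'nice_report' is hypothetical, and it assumes top's default column order.

```bash
# nice_report: reads process lines from top (default columns:
# PID USER PR NI VIRT RES SHR S %CPU ...) and prints user, NI and a label.
# Summary and header lines are skipped because field 1 is not a numeric PID.
nice_report() {
  awk '$1 ~ /^[0-9]+$/ {
    label = ($4 > 0) ? "lowered priority" : ($4 < 0) ? "raised priority" : "default priority"
    print $2, "NI=" $4, label
  }'
}
```

For the wtpBuild line quoted below, it prints "wtpBuild NI=15 lowered priority". On a live machine, 'top -b -n 1 | nice_report' covers every process.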


On 03/17/2010 01:38 PM, Thomas Hallgren wrote:

While I'm at it, I should also complain about this:

21771 wtpBuild  30  15  583m  89m 7408 S  145  0.6   0:23.77 java

Perhaps wtpBuild was inadvertently started with a negative nice value, i.e., nice -n -15? The effect of that is that it's not so nice :-). It tries to steal all resources that are available.

- thomas


On 03/17/2010 06:32 PM, Thomas Hallgren wrote:

Over the last couple of weeks, I've done a 'top' from time to time when I feel that my builds take longer than they should. Very often, I see this at the top:

12252 egwin     20   0  668m 168m  11m S  115  1.1   0:09.27 java

A hint to egwin and others that run very heavy builds. The message of the day on the build machine states:

"If you run continuous builds, you should start your shell processes with nice -n 10 (command) to be kind to others."

Another entry that isn't that uncommon at the top is:

29702 hudsonbu  17   0  584m 101m 7372 R  124  0.7   0:25.09 java

which of course raises the question, why isn't Hudson running its jobs with nice -n 10?

The jarsigner seems to be one of the few that actually does this, and my builds always wait a _very_ long time for it to complete.

Regards,
Thomas Hallgren
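A job could follow the message-of-the-day advice without double-nicing itself by using a small wrapper that only raises the niceness when it is below a target. This is a sketch, not something Hudson provides. 'run_niced' and './build.sh' are hypothetical names. It assumes bash (for 'local') and GNU 'nice', which prints the current niceness when run without arguments.

```bash
# run_niced MIN CMD [ARGS...]
# Runs CMD so that its niceness is at least MIN. If this shell is already at
# or above MIN, CMD runs as-is instead of being pushed even lower.
run_niced() {
  local min=$1
  shift
  local current
  current=$(nice)                        # current niceness of this shell
  if [ "$current" -lt "$min" ]; then
    nice -n "$((min - current))" "$@"    # nice -n is relative: add the gap
  else
    "$@"
  fi
}
```

For example, 'run_niced 10 ./build.sh "$@"' could be the last line of a long-running shell build step. Because 'nice -n' is relative, calling plain 'nice -n 10' from an already niced job would push it to 19. Comparing against the current value avoids that. Raising niceness never needs root, so this works for unprivileged build accounts.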

_______________________________________________
cross-project-issues-dev mailing list
cross-project-issues-dev@xxxxxxxxxxx
https://dev.eclipse.org/mailman/listinfo/cross-project-issues-dev
 
