Bug 324559 - "No space left on device"
Summary: "No space left on device"
Status: RESOLVED FIXED
Alias: None
Product: Community
Classification: Eclipse Foundation
Component: Cross-Project
Version: unspecified
Hardware: All All
Importance: P3 blocker
Target Milestone: ---
Assignee: Nicolas Richeton CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: 324692
Reported: 2010-09-06 05:55 EDT by Nicolas Bros CLA
Modified: 2010-11-12 13:05 EST

Description Nicolas Bros CLA 2010-09-06 05:55:20 EDT
Hudson builds fail on build.eclipse.org because the disk is full.
Comment 1 Denis Roy CLA 2010-09-07 09:05:37 EDT
Reassigning to cross-project.  I can clean up, but you won't like what I decide to remove.
Comment 2 David Williams CLA 2010-09-07 09:48:39 EDT
It would help, Denis, if you could tabulate the biggest offenders. I would, but I get a lot of "access denied" errors when doing "du", so I don't think my numbers would be that accurate.

But, here's what I can see/do (for those of you who don't know). 

Overall, there's 209G reserved for /opt/users (this includes the "build users", such as hudsonbuild, wtpBuild, e4Build, etc.). wtpBuild is only a few Megabytes (17 M). 

From what I can see of hudsonbuild (su'ing to it) it takes up about 82 Gigs. That leaves a whole lot used somewhere else! 

Under the hudsonbuild id, I did remove some JVM dump trace files (Snap.*.trc) which were a gig or two. 

Also, there's a bunch of directories named something like 
rseTest1271013394120 
I can't "see" what's in them ... but they should all be removed. 
They contain a file named "noPerm.txt" ... but even "hudsonbuild" cannot "see" them. They are (fairly) obviously left over from some RSE unit tests ... and the RSE team should use a TMP directory to write their files to (the default is probably $user.home, hence goes to 'hudsonbuild', which is a bad practice). 
So, I'd say it's safe to remove those rseTest* directories. 

I'm sure there is a lot of space used under .hudson/jobs ... and some of those could be cleaned up, I'm sure, by those job owners removing old builds, etc., but I didn't tabulate those, since there's at least 100G used by other, individual "build ids" which seems a little odd to me. 

So ... I think a tabulation of first level directories of /opt/users would help. That is, what's the result of 
du -sh /opt/users/*  

If there's more that _I_ can do, let me know ... but I think "root" is needed to see the detail of who is using how much? 

Thanks,
Comment 3 Martin Oberhuber CLA 2010-09-07 10:06:05 EDT
I have filed bug 324657 to investigate the rseTest* left-overs.

I'm very sure that none of these contains much data ... is there a way I could become user hudsonbuild such that I could manually clean up?
Comment 4 David Williams CLA 2010-09-07 10:20:58 EDT
(In reply to comment #3)
> I have filed bug 324657 to investigate the rseTest* left-overs.
> 
> I'm very sure that none of these contains much data ... is there a way I could
> become user hudsonbuild such that I could manually clean up?

I don't think that'd help? I was logged on as hudsonbuild, and could not clean them up. I think there's something odd about them. When I try to "list" one, this was the reply: 

hudsonbuild@build:~> ll rseTest1274795487065
ls: cannot access rseTest1274795487065/noPerm.txt: Permission denied
total 0
?????????? ? ? ? ?                ? noPerm.txt

Not sure if it's showing me the '?' because I don't have access ... or if _no one_ has access (literally, no permission fields set?). But seems like a job for 'root' to me :)
Comment 5 Martin Oberhuber CLA 2010-09-07 11:42:33 EDT
(In reply to comment #4)
> I don't think that'd help? I was logged on as hudsonbuild, and could not clean
> them up. 

The point is that the directories are not executable. You need to
   chmod a+x rseTest*
then you'll be able to see and delete them.

BTW, I verified that the original bug has been fixed in RSE 3.2, so this hasn't happened since May-25. All those left-overs are old and shouldn't appear again once manually cleaned.
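
On a directory, read permission only allows listing names, while execute (search) permission is needed to reach the entries themselves, which is why "ls" printed question marks. A minimal sketch of the cleanup described above, assuming a POSIX shell and that the caller owns the directories. The function name remove_unsearchable_dirs and its arguments are hypothetical, and it uses "chmod u+rwx" rather than the "a+x" shown above so that the owner also gets the write permission needed to unlink entries.

```shell
# Hypothetical helper (not part of any existing tool): restore owner access on
# leftover directories matching PATTERN under PARENT, then delete them.
remove_unsearchable_dirs() {
    parent=$1
    pattern=$2
    # $pattern is left unquoted on purpose so the shell expands it as a glob.
    for dir in "$parent"/$pattern; do
        [ -d "$dir" ] || continue
        # read: list names; execute: reach entries; write: unlink entries.
        chmod u+rwx "$dir" && rm -rf "$dir"
    done
}
```

For example, remove_unsearchable_dirs "$HOME" 'rseTest*' would target the left-overs discussed here. The sketch only repairs the top-level directory, so nested directories with the same problem would still need manual attention.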
Comment 6 David Williams CLA 2010-09-07 12:16:20 EDT
 
> 
> The point is that the directories are not executable. You need to
>    chmod a+x rseTest*
> then you'll be able to see and delete them.
> 

Directories must be executable to see and delete their contents. Sigh ... no matter how much I learn, I'll never understand or know Linux. :) 
Thanks Martin. All gone. (and, yes, didn't amount to anything measurable).
Comment 7 David Williams CLA 2010-09-07 13:48:34 EDT
I decided to go ahead and do a du on everything ... with the idea that even if I couldn't count everything, it might show the "heavy hitters". And from what I can see there are only two. hudsonbuild (kind of understandable, since artifacts go there ... instead of the /shared partition). But 'nebulaBuild' ... what's that? Why's it using half the partition? (There's only about 20 G unaccounted for ... so I think the solution lies in cleaning up nebulaBuild, and maybe a little in hudsonbuild.) 

hudsonbuild@build:~> du /opt/users/* -sh 2>/dev/null
1.0K    /opt/users/dashBuild
12M     /opt/users/e4Build
69K     /opt/users/eclipselinkBuild
82G     /opt/users/hudsonbuild
37M     /opt/users/jettyBuild
1.0K    /opt/users/modelingBuild
101G    /opt/users/nebulaBuild
512     /opt/users/orbitBuild
69K     /opt/users/ormfBuild
0       /opt/users/phpBuild
77K     /opt/users/rienaBuild
4.0K    /opt/users/runphpbuild
0       /opt/users/tempdroy
81K     /opt/users/virgoBuild
1.0K    /opt/users/wtpBuild
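
A listing like the one above is easier to scan when sorted by size. A minimal sketch, assuming a POSIX shell with du, sort and head; the function name top_dirs is hypothetical. Sizes are reported in KiB rather than with "-h" so that a plain numeric sort is enough.

```shell
# Hypothetical helper: print the COUNT largest first-level directories
# under PARENT, largest first, with sizes in KiB.
top_dirs() {
    parent=$1
    count=${2:-5}
    # Errors from unreadable subdirectories are discarded, so the totals for
    # directories owned by other users may be too low.
    du -sk "$parent"/*/ 2>/dev/null | sort -rn | head -n "$count"
}
```

Called as top_dirs /opt/users 2, this would be expected to surface the two "heavy hitters" first. As with the manual run, only root sees complete numbers.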
Comment 8 David Williams CLA 2010-09-07 14:13:11 EDT
Nicolas (Richeton), you appear to "own" most of the large directories in "nebulaBuild". 

Can you remove/clean those up? Perhaps a little (20 G?) short term, but long term, you know large amounts of storage or build artifacts are supposed to go to the "/shared" location. Not a "user's home directory". You can ask webmasters for a "nebula" directory under /shared and do your builds there? It (by design) has a lot more space: 

$ df -h ./
Filesystem            Size  Used Avail Use% Mounted on
/dev/md0             1000G  603G  398G  61% /opt/public
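
A build script could also check free space before it starts, so that a full partition fails fast instead of halfway through a build. A minimal sketch, assuming POSIX df and awk; the function name require_free_kb and any threshold passed to it are hypothetical.

```shell
# Hypothetical pre-build guard: succeed only if the filesystem holding DIR
# has at least MIN_KB kibibytes available.
require_free_kb() {
    dir=$1
    min_kb=$2
    # -P forces the POSIX one-line-per-filesystem format; -k reports KiB.
    # Field 4 is "Available" (this assumes no spaces in the device name).
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ -n "$avail_kb" ] && [ "$avail_kb" -ge "$min_kb" ]
}
```

A job might start with something like: require_free_kb /opt/public 1048576 || exit 1, which stops when less than 1 GiB is available.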
Comment 9 Denis Roy CLA 2010-09-07 15:12:25 EDT
> There's only about 20 G unaccounted for

The local mysql instance on that server is using about 23G

build:/opt/local/data # du mysql/ -sh
23G     mysql/


I'm not sure what is in there ... I think modeling is maintaining a searchCVS function, and there are some eclipselink tables.  We'd probably need to clean it up, and for good measure, dump it and reimport it to compact the data files.
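
One way to see which schemas account for the space is to ask MySQL itself through information_schema.tables. A minimal sketch, assuming a mysql client and an account that can read all schemas; the function name mysql_schema_sizes_sql is hypothetical. The figures are MySQL's own estimates and may not match "du" on the data directory, particularly when InnoDB tables share a system tablespace file, which does not shrink after deletes (hence the dump and reimport).

```shell
# Hypothetical helper: print a query that totals data and index size per
# schema, largest first. It only emits SQL; running it is left to the caller.
mysql_schema_sizes_sql() {
    cat <<'SQL'
SELECT table_schema,
       ROUND(SUM(data_length + index_length) / 1024 / 1024) AS size_mb
FROM information_schema.tables
GROUP BY table_schema
ORDER BY size_mb DESC;
SQL
}
```

The output can be piped into the mysql client, with credentials supplied in whatever way the server is normally accessed.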
Comment 10 Denis Roy CLA 2010-09-07 15:14:35 EDT
(In reply to comment #8)
> /dev/md0             1000G  603G  398G  61% /opt/public

Gee, even /shared is filling up nicely.
Comment 11 David Williams CLA 2010-09-07 18:40:42 EDT
I received an email from Nicolas (pasted below), and he asked for help from webmasters, as he is on vacation now, and only has access to email. 

Specifically, he asked that his crontab be stopped/commented out, and the following two directories to be removed:

/opt/users/nebulaBuild/workdir/nebula_builds/technology/
/opt/users/nebulaBuild/workdir/nebula-builds/technology/

From what I can tell, this will immediately free up 100 G!

 = = = = = = 

From:	Nicolas Richeton <nicolas.richeton@gmail.com>
To:	David M Williams/Raleigh/IBM@IBMUS
Date:	09/07/2010 04:39 PM
Subject:	Nebula disk usage

Hi David,

I'm on vacation right now and I don't have full internet access, only mail.

Can you ask the webmaster to: 
- Comment my crontab (builds are failing anyway - and they don't clean up on failure)
- Delete all builds folders in
   nebulaBuild/work/Nebula-build/technology... and nebulaBuild/work/nebula_build/technology ...

I'll try to fix this when I get back
Thanks,

Nicolas
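
Builds that "don't clean up on failure" are a common way for a partition to fill up. A minimal sketch of one way to avoid that, assuming a POSIX shell; the function name run_build and the work directory argument are hypothetical and do not reflect how the Nebula build is actually structured.

```shell
# Hypothetical build wrapper: run a command with a dedicated work directory
# and remove that directory on exit, whether the command succeeded or failed.
# The function body is a subshell, so the EXIT trap fires when it finishes.
run_build() (
    workdir=$1
    shift
    mkdir -p "$workdir" || exit 1
    trap 'rm -rf "$workdir"' EXIT
    "$@"
    status=$?
    # Propagate the build's exit status after the trap has been armed.
    exit "$status"
)
```

A cron entry could then call run_build with a work directory and the build command. Artifacts worth keeping would have to be copied elsewhere before the command returns, since everything in the work directory is removed.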
Comment 12 Denis Roy CLA 2010-09-08 09:25:17 EDT
I've commented out the cron jobs, and am in the process of deleting the directories.

Do we have any idea of what is being used by the local MySQL instance?  Can it be cleaned/purged?  23G seems to be a lot.

I can ask on cross-project if need be.
Comment 13 David Williams CLA 2010-09-08 09:34:07 EDT
(In reply to comment #12)
> I've commented out the cron jobs, and am in the process of deleting the
> directories.
> 
> Do we have any idea of what is being used by the local MySQL instance?  Can it
> be cleaned/purged?  23G seems to be a lot.
> 
> I can ask on cross-project if need be.

I don't. I think if anyone would it'd be Nick. I'll add him to CC, but if he doesn't know you'll have to "ask around" on cross project list.
Comment 14 David Williams CLA 2010-11-12 13:05:09 EST
Just noticed this blocker is still open ... I think we can resolve as fixed.