Bug 492493 - Need help with permissions (group owner) on build machine
Status: RESOLVED FIXED
Alias: None
Product: Community
Classification: Eclipse Foundation
Component: Servers
Version: unspecified
Hardware: PC Linux
Importance: P3 minor
Target Milestone: ---
Assignee: Eclipse Webmaster CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: 487044
Reported: 2016-04-26 16:54 EDT by David Williams CLA
Modified: 2016-05-07 19:50 EDT
Description David Williams CLA 2016-04-26 16:54:47 EDT
I have an issue with some directories on the build.eclipse.org machine I am hoping you can explain or help me fix. 

The files and directories in question are those _under_ a "build directory" such as 

/opt/public/eclipse/builds/4N/siteDir/eclipse/downloads/drops4/N20160423-1500

There, the "group" is "common" instead of eclipse.platform.releng. 

As far as I can see, the "drops4" directory and those above it have the setgid bit set (the "s" in the flags shown below), which I thought would make new directories inherit both the group and the setgid bit. The file access control lists look correct too (given my reading of them), for example:

$ getfacl drops4/
# file: drops4/
# owner: e4Build
# group: eclipse.platform.releng
# flags: -s-
user::rwx
user:e4Build:rwx
user:hudsonBuild:rwx
group::rwx
mask::rwx
other::r-x
default:user::rwx
default:user:e4Build:rwx
default:user:hudsonBuild:rwx
default:group::rwx
default:mask::rwx
default:other::r-x


But, for some reason, when the N2016* directories are created, they do not have the setgid bit set. 

They are created by the e4Build id (as are the parents) with a simple bash command:
mkdir -p ${buildDirectory} 
(where buildDirectory is set to the proper "type" and "date"). 
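
For reference, the inheritance I expected can be reproduced in a scratch directory. This is only a minimal sketch, assuming Linux semantics on a local filesystem (not NFS with ACLs, as on the build machine); the scratch location and the directory names in it are stand-ins, not the real build tree.

```shell
#!/bin/sh
# Sketch: show that directories created under a setgid parent inherit the
# setgid bit on Linux. All paths are throwaway stand-ins (assumption).
scratch=$(mktemp -d)
mkdir "$scratch/drops4"
chmod g+s "$scratch/drops4"          # mark the parent setgid

buildDirectory="$scratch/drops4/N20160423-1500/buildlogs"
mkdir -p "$buildDirectory"           # same style of call the build uses

# [ -g DIR ] is true when the set-group-ID bit is set on DIR.
for d in "$scratch/drops4/N20160423-1500" "$buildDirectory"; do
  if [ -g "$d" ]; then
    echo "setgid inherited: ${d#"$scratch"/}"
  else
    echo "setgid MISSING:   ${d#"$scratch"/}"
  fi
done
```

If the same check reports the bit as missing when run against the real tree from cron, that would at least narrow the problem to the environment rather than the mkdir call itself.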

Any ideas? 

I have noticed that if I simply log in as e4Build and create the directory, it does have the setgid bit set. But when our builds run, they are started from a cron job. Is there some environment variable/setting that controls whether the setgid bit is inherited? Or is the cron job sort of equivalent to "no user"? 
Perhaps I need to set "USER=e4Build"? I didn't think of that sort of option until just now. I will also google for an answer. 
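
One way to compare the interactive login with the cron run is to have the job record what it actually runs as. The following is a hedged sketch rather than a diagnosis; the function name `log_run_context` is hypothetical, and the output would need to be redirected to a log file of your choosing when called from cron.

```shell
#!/bin/sh
# Sketch: record the identity and environment a job actually runs with.
# Intended to be called at the top of the script that cron starts.
# log_run_context is a hypothetical helper name, not an existing tool.
log_run_context() {
  echo "user=$(id -un) uid=$(id -u)"
  echo "primary_group=$(id -gn)"
  echo "groups=$(id -Gn)"
  echo "umask=$(umask)"
  echo "USER=${USER:-<unset>} HOME=${HOME:-<unset>} SHELL=${SHELL:-<unset>}"
  echo "cwd=$(pwd)"
}

context=$(log_run_context)
echo "$context"
```

Running the same function from a login shell and from the cron job, then comparing the two outputs, should show whether the user, groups, or umask differ between the two cases.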

Just thought it might be something obviously wrong with the setup. 

= = = = = = = = = 

The reason I ask (besides not liking to have 'common' have rw access) is that I am trying to move some of *my* cronjobs to the "releng hipp" instance. 

There are several problems doing that, but the most recent attempt simply tried to move one of the "cleanup" cron jobs there. It is executed by genie.releng, which is a member of eclipse.platform.releng but apparently not of common, so it gets a "permission denied" when it tries to do the cleanup.
Comment 1 David Williams CLA 2016-04-27 04:14:44 EDT
Another relevant fact is that I do explicitly set umask to 0002 in the cron job. (The default is 0022). I am fairly sure that is the right value, but it has been set that way since the PPC days. 

Some google hits mentioned that, for it to "take effect", they had to put it on the same line as the main command, something like 

0 10 * * 3 umask 0002; /main/path/of/interest

They were always vague about which system, and many of the posts were old, so I am not sure if that applies to our SUSE or not. Might be worth an experiment one of these days. (Unless someone knows for sure).
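
An alternative to the crontab-line form is to set the umask inside the script that cron launches and confirm its effect on a throwaway directory. A minimal sketch, assuming a scratch location from `mktemp` (a stand-in, not the build tree); it checks only the umask and says nothing about the setgid problem:

```shell
#!/bin/sh
# Sketch: set the umask in the script itself instead of on the crontab line,
# then confirm a new directory comes out group-writable.
umask 0002

scratch=$(mktemp -d)                 # throwaway location (assumption)
mkdir "$scratch/umask-check"

# The first ten characters of `ls -ld` are the file type plus permission bits;
# the sixth character is the group-write flag.
perms=$(ls -ld "$scratch/umask-check" | cut -c1-10)
echo "umask=$(umask) perms=$perms"
```

If the group-write flag is present here but absent on directories the build creates, the umask is probably being reset somewhere between the crontab entry and the mkdir call.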
Comment 2 David Williams CLA 2016-05-02 16:15:32 EDT
I think I have found a solution, but if you ever hear (or know) of more technical information about this issue, let me know. 

First, I even tried 
mkdir -p --verbose --mode u=rwx,g=rwxs,o=rx  ${buildDirectory}
but that did not help. 

I think the problem is related to "depth" of directories. I assume "relative to current directory" -- but, do not know for sure, might be related to absolute depth, somehow. 

I will give a concrete example, to help explain. 

I am "executing" with current directory equal to 
/shared/eclipse/builds

The "parent" of what I was trying to create (all with the correct setgid bit and group ownership) is  
/shared/eclipse/builds/4N/siteDir/eclipse/downloads/drops4/

And what I am trying to create (with correct permissions) is 
/shared/eclipse/builds/4N/siteDir/eclipse/downloads/drops4/N20160502-1550

I was using "make directory" (from the current directory) with something like the following (the paths are all variables in my bash scripts, of course). 

mkdir -p "/shared/eclipse/builds/4N/siteDir/eclipse/downloads/drops4/N20160502-1550"

All other attempts to fix this failed, except I discovered (by guessing) that if I first did a 
pushd "/shared/eclipse/builds/4N/siteDir/eclipse/downloads/drops4/" 
then
mkdir "N20160502-1550"

then the setgid bit was set correctly, and the directory and its children had the correct group set. 
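
Whatever the cause turns out to be, the result can be checked right after creation instead of assumed. The following is a sketch of the "cd to the parent, then mkdir" sequence with a verification step added; the function name `make_child_dir` is hypothetical, and the demo at the bottom uses a scratch directory rather than the real build paths.

```shell
#!/bin/sh
# Sketch: create a child directory from inside its parent, then verify that it
# picked up the parent's group and setgid bit. make_child_dir is a
# hypothetical helper name.
make_child_dir() {
  parent=$1
  child=$2
  # A subshell keeps the caller's working directory unchanged (no popd needed).
  ( cd "$parent" && mkdir -p "$child" ) || return 1

  # `ls -ldn` prints the numeric group id in the fourth column.
  parent_gid=$(ls -ldn "$parent" | awk '{print $4}')
  child_gid=$(ls -ldn "$parent/$child" | awk '{print $4}')

  if [ "$parent_gid" != "$child_gid" ] || [ ! -g "$parent/$child" ]; then
    echo "WARNING: $parent/$child has gid=$child_gid (parent gid=$parent_gid) or lacks setgid" >&2
    return 2
  fi
}

# Demo in a scratch location (stand-in paths, not the real build tree).
scratch=$(mktemp -d)
mkdir "$scratch/drops4"
chmod g+s "$scratch/drops4"
make_child_dir "$scratch/drops4" "N20160502-1550"
status=$?
echo "make_child_dir exit status: $status"
```

A non-zero status in the build log would flag the exact directory and moment where the group or setgid bit went wrong, which is more than the current silent failure provides.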

Pretty weird, eh? Especially that there was no mention of this sort of problem I could find by searching the web. 

Even a little stranger, if, following this initial creation, I then create other children, still from 
/shared/eclipse/builds
such as 
mkdir -p "/shared/eclipse/builds/4N/siteDir/eclipse/downloads/drops4/N20160502-1550/buildlogs"

Then "buildlogs" is correct. 

I am still doing a full build to make sure this simple fix holds. Some directories we create will get "deeper", so am curious if the "fix" holds for all those cases too. 

But, if my guesses in my fix are correct, it may be that anything over "5" levels deep -- relative to current directory -- will have issues. (That doesn't explain why I can go "deeper" once that sixth level exists.) 

I am sure all this interacts with our "funny" e4Build id, acls, NFS, etc, but, I might have fixed it so will be optimistic and mark as such. 

I will comment again if I find even deeper directories end up wrong.
Comment 3 David Williams CLA 2016-05-02 21:17:32 EDT
(In reply to David Williams from comment #2)

> I will comment again if I find even deeper directories end up wrong.

There was one other directory that was given the wrong permissions and setgid bit. 
It was a direct child of 
/shared/eclipse/builds/4N/siteDir/eclipse/downloads/drops4/N20160502-1550 
(to continue above example). 

There were lots of other directories created under the same parent, and some went 7 levels deeper! 

Some of the deepest ones, though, are admittedly created by Ant, instead of bash, but some of the other direct children were from bash. 

So, I do not know how to explain it, but glad it is (sort of) working. 

I say "sort of" since I won't really trust it until I see it work in many builds. My greatest fear is there is some semi-random timing issue involving NFS or something odd.
Comment 4 David Williams CLA 2016-05-02 21:18:56 EDT
(In reply to David Williams from comment #3)
> (In reply to David Williams from comment #2)
> 
> > I will comment again if I find even deeper directories end up wrong.
> 
> There was one other directory that was given the wrong permissions and GID
> bit. 

I meant to say I could cure this case in the same way as previously mentioned -- first cd to parent, then mkdir the child.
Comment 5 David Williams CLA 2016-05-07 19:50:50 EDT
In an "interesting" twist, the change that I thought had fixed this did not: I had made a typo, so an empty directory was being created elsewhere (and never used). 

This implies, to me, that perhaps "time" is involved instead of "depth": if a directory or its parent is created but not written immediately to disk, subsequent directory creation may not get the correct permissions. 

That seems like quite a stretch. But all I can think of.