407545 – Hudson aggregation job getting "old" content of b3aggrcon files


Summary: Hudson aggregation job getting "old" content of b3aggrcon files

Status:	RESOLVED FIXED

Alias:	None

Product:	Community
Classification:	Eclipse Foundation
Component:	Cross-Project
Version:	unspecified
Hardware:	PC Linux

Importance:	P1 normal
Target Milestone:	---
Assignee:	David Williams
QA Contact:

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2013-05-08 10:51 EDT by David Williams
Modified:	2013-10-07 03:21 EDT
CC List:	9 users

See Also:



Description David Williams

2013-05-08 10:51:35 EDT

We've seen this morning that both MAT and Riena have updated contributions, but according to logs, the "hudson job" (or, more exactly, the build.eclipse.org job in /shared/simrel area, triggered by hudson job), is pulling "old" content. 

I tried "cleaning" the workspace and the build.eclipse.org "working" area, but that didn't solve the problem, so this will need some more investigation.

Comment 1 David Williams

2013-05-08 11:50:48 EDT

We use CGit to "get" the latest content from the Git repo (for the main part of our build).

We currently use this URL: 

http://git.eclipse.org/c/simrel/org.eclipse.simrel.build.git/snapshot/master.zip

(and that's the one that gives us "old" content). 

I've noticed if I use "full form" of CGit URL

http://git.eclipse.org/c/simrel/org.eclipse.simrel.build.git/snapshot/org.eclipse.simrel.build-master.zip

Then the content is accurate. 
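The two URL shapes above differ only in the snapshot file name: the short form is just `<ref>.zip`, while the long form prefixes the repository name (without `.git`). A minimal sketch that derives both from the repository page URL, assuming that naming pattern holds in general (it is inferred only from the two URLs quoted here; the function name `snapshot_urls` is hypothetical):

```python
from urllib.parse import urlsplit


def snapshot_urls(repo_url, ref="master", ext="zip"):
    """Return (short_form, long_form) CGit snapshot URLs for a repo URL.

    Assumption: the repo URL ends in '<name>.git' and the long form is
    '<name>-<ref>.<ext>', matching the two URLs quoted in this bug.
    """
    base = repo_url.rstrip("/")
    # Last path segment, e.g. 'org.eclipse.simrel.build.git'
    name = urlsplit(base).path.rsplit("/", 1)[-1]
    if name.endswith(".git"):
        name = name[: -len(".git")]
    short_form = f"{base}/snapshot/{ref}.{ext}"
    long_form = f"{base}/snapshot/{name}-{ref}.{ext}"
    return short_form, long_form


short, long_ = snapshot_urls(
    "http://git.eclipse.org/c/simrel/org.eclipse.simrel.build.git")
print(short)
print(long_)
```

Because the two forms are distinct URLs, any cache keyed on the request URL treats them as separate entries, which is consistent with the long form returning fresh content.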

Webmasters, have there been any changes lately in the CGit settings related to how long it "caches" archive requests?

I don't mind changing to the "full form" if that is (somehow) the more correct way to do it ... but I'm concerned that if there is simply a long caching window, it too will "get out of date" after "I" make lots of requests.

I've always assumed the "short form" we currently use was just shorthand for the longer form ... but if the long form is "more correct" for some reason, I'd appreciate knowing that.

Comment 2 Denis Roy

2013-05-08 13:24:56 EDT

(In reply to comment #1)
> Then the content is accurate. 

I suspect it is only accurate because the URL is different, therefore cgit is not using the cached data.

> Webmasters, has there been any changes lately in the setting of CGit,
> related to amount of time is "caches" archive requests? 

No, but someone else did report a cgit cache issue -- however, it was for content that was only minutes old, and it eventually showed up.


> I don't mind changing to "full form" if that is (somehow) more the correct

I didn't even know there was a short and long form.

> way to do it ... but, concerned that if there is simply a long caching
> window that it too will "get out of date" after "I" make lots of requests. 

The root problem is likely that cgit was not designed to scale beyond one server.  Currently the cache is shared amongst all the servers.  Perhaps we could split it up, but that means there will be three copies of the cache  :/

Is there no way you can fetch from git:// instead, or use the filesystem?  That would be much more reliable.
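The explanation above (a repeated URL is answered from the cache, while a different URL for the same content misses it) can be illustrated with a toy model. This is a hypothetical sketch, not cgit's actual cache code: the class name `UrlKeyedCache`, the injectable clock, and the 300-second time-to-live are all assumptions chosen for illustration.

```python
import time


class UrlKeyedCache:
    """Toy cache keyed by request URL with a fixed time-to-live (TTL).

    Hypothetical model only; real cgit cache behaviour may differ.
    """

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.entries = {}  # url -> (stored_at, body)

    def get(self, url, fetch):
        now = self.clock()
        hit = self.entries.get(url)
        if hit is not None and now - hit[0] < self.ttl:
            # Served from cache: may be stale if the repo changed since.
            return hit[1]
        body = fetch()  # cache miss or expired entry: regenerate
        self.entries[url] = (now, body)
        return body


# Simulated timeline with a fake clock and a mutable "repository".
now = [0.0]
repo = ["v1"]
cache = UrlKeyedCache(300, clock=lambda: now[0])
short_url = "snapshot/master.zip"
long_url = "snapshot/org.eclipse.simrel.build-master.zip"

print(cache.get(short_url, lambda: repo[0]))  # first request, fills cache
repo[0] = "v2"                                # a new commit lands
now[0] = 60.0
print(cache.get(short_url, lambda: repo[0]))  # same URL: stale cached copy
print(cache.get(long_url, lambda: repo[0]))   # different URL: fresh content
```

In this model the stale window is bounded by the TTL; a delay of hours would suggest something outside this simple picture, such as a cache shared by several servers not being invalidated.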

Comment 3 David Williams

2013-05-08 14:01:35 EDT

(In reply to comment #2)
> (In reply to comment #1)
> > Then the content is accurate. 
> 
> I suspect it is only accurate because the URL is different, therefore cgit
> is not using the cached data.
> 
> > Webmasters, has there been any changes lately in the setting of CGit,
> > related to amount of time is "caches" archive requests? 
> 
> No, but someone else did report a cgit cache issue -- however, it was for
> content that was only minutes old, and it eventually showed up.
> 

Yes, I sometimes see inconvenient delays of several minutes ... which is reasonable given the constraints ... but never hours (which is why I think this is something "special").


> Is there no way you can fetch from git:// instead, or use the filesystem? 
> That would be much more reliable.

Sure, given enough programming work ... but I have to admit I find it conceptually wrong to "clone a repo" only because I want the most recent files from it. CVS had "export"; Git doesn't. It does have a "get archive" function, but that's not publicly available, and I assume CGit is using it behind the scenes ... so CGit seems like a required capability. (Plus, while not related to this build, making Git clones on Windows always results in "undeletable files" that prevent a clean workspace.)

So, in other words, we need CGit to work ... and can tolerate a few inconvenient delays of several minutes sometimes ... but, hours means something is wrong.

Comment 4 Denis Roy

2013-05-08 14:04:46 EDT

I'll check the cgit source to see how it manages to only pull the most recent files and zip them up nicely.

Comment 5 David Williams

2013-05-08 14:48:17 EDT

I've changed the build to use the "long form". The first build failed for the usual reasons: some "incompatible" contributions.

I changed a b3aggrcon file to disable one feature, waited a few minutes, and started a new build ... the change was "picked up" just fine, and the build is proceeding. (I expect it to fail a little further down the line for different reasons, but the point is that, at least for now, we seem to be getting the current content of the b3aggrcon file "right away".)

So, closing this bug as fixed. Will consider further changes/improvements if this turns out to be a frequent, unavoidable problem that simply can not be supported by CGit or infrastructure.

Comment 6 Matthias Sohn

2013-05-08 18:08:44 EDT

Usually fetching updates into a clone is faster than downloading the latest source zip, since git fetch only transfers the changes made since the last fetch.
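A minimal sketch of the "clone once, then fetch" approach described above, driving the git command line from Python. It assumes `git` is on the PATH and that discarding local changes with a hard reset is acceptable for a build checkout; the helper names `run_git` and `update_checkout` are hypothetical.

```python
import os
import subprocess


def run_git(args, cwd=None):
    """Run a git command, raising CalledProcessError if it fails."""
    subprocess.run(["git", *args], cwd=cwd, check=True,
                   stdout=subprocess.PIPE, stderr=subprocess.PIPE)


def update_checkout(repo_url, workdir, branch="master"):
    """Clone repo_url into workdir the first time; afterwards only fetch.

    The fetch transfers just the commits added since the previous fetch.
    The hard reset makes the working tree match the remote branch tip
    exactly, discarding any local modifications.
    """
    if not os.path.isdir(os.path.join(workdir, ".git")):
        run_git(["clone", "--branch", branch, repo_url, workdir])
        return "cloned"
    run_git(["fetch", "origin"], cwd=workdir)
    run_git(["reset", "--hard", "origin/" + branch], cwd=workdir)
    return "updated"
```

Because the content comes straight from Git rather than through a web cache, a commit is visible to the next build as soon as it has been pushed.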

Comment 7 David Williams

2013-05-10 13:37:32 EDT

Just to note another occurrence: it happened for our "last build" of M7 ... someone's final update was "missed" due to stale content. In that case it was a matter of "minutes", though I'm not sure how long we would have continued to retry if we had noticed. And today it's fine.

I just wanted to make notes here, when I notice it, to get a better idea of the frequency ... which in turn will tell me how much effort to put into changing the build.

Comment 8 Ed Willink

2013-08-20 00:33:29 EDT

I just made one change for OCL and hit this. Rebuilt and the build then showed my problem. Eventually I disabled my contribution, but was able to do two builds both using the old aggrcon. So in three changes I hit this twice.

Since changes to aggrcon files are small and quick, it seems likely that many relengs will hit this problem.

It is particularly confusing given that the Hudson changes clearly show that the change has been picked up. Why can't the build use what Hudson already knows?

Comment 9 David Williams

2013-08-20 10:09:11 EDT

The "delay" between Git changes and CGit currency seems to happen frequently enough that I will devote the time needed to "fix" the build to use Git only. But I'm not sure when ... I'd guess "this week". In the meantime, be patient; CGit will "refresh" its caches eventually. (If it seems to take more than an hour or so, please say so, since I'm pretty sure it should never take that long, and that would indicate some infrastructure issue.)

Comment 10 Ed Willink

2013-08-20 10:14:06 EDT

My best guess at the delays that I observed were up to 5 minutes.

Comment 11 David Williams

2013-10-01 12:17:25 EDT

To give some status here, I've been testing the "clone and pull" method of getting files, so "old content" should no longer be a problem. 

The build will still run in the /shared/simrel area (instead of directly in the Hudson workspace) ... but ... one (small) step at a time. :/

I plan to make this change after M2 is complete, since there might be a few "failed builds", just due to me forgetting one thing or another.

Comment 12 David Williams

2013-10-07 03:21:22 EDT

Fixed/deployed this weekend, for both master (Luna) and R4_3_maintenance (Kepler maintenance). 

One unanticipated complication is that some project(s) (sphinx.b3aggrcon?) did something special with EOL characters ... perhaps with .gitattributes? ... so the files immediately appear "modified" and won't allow a normal pull or reset.

Perhaps they didn't set autocrlf=false? (Or, the main place I've seen it before was when a user specified something in .gitattributes ... but I don't recall the details right off.)

Or ... I'm doing something wrong? :) 

In any case, I converted both affected files (sphinx.b3aggrcon and simrel.b3aggr) to use Unix LF line endings, re-cloned the repo, and things then worked normally, repeatedly.
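The line-ending conversion described above can be done with a small script. A minimal sketch, assuming the only problem is CRLF (`\r\n`) sequences that should become LF (`\n`); the function name `convert_to_lf` is hypothetical. It works on raw bytes, so the file's text encoding does not matter.

```python
def convert_to_lf(path):
    """Rewrite a file so every CRLF line ending becomes LF.

    Returns True if the file was changed, False if it already used LF.
    """
    with open(path, "rb") as f:
        data = f.read()
    converted = data.replace(b"\r\n", b"\n")
    if converted == data:
        return False  # nothing to do; leave the file untouched
    with open(path, "wb") as f:
        f.write(converted)
    return True
```

After converting and committing the files, existing clones may still hold the old checkout, which is why re-cloning (or a fresh checkout) is needed before pull and reset behave normally again.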

I'll open a separate bug if I see again. 

Just to document it, I did the git checkout/pull/reset steps only for

org.eclipse.simrel.build

not the other two "simrel repos" below. They rarely change and are used only by "releng", so "users" will notice no problem with them, and they can be fixed later if desired.

org.eclipse.simrel.tests
org.eclipse.simrel.tools