Bug 496282 - Simplify "collect and process tests" logic to avoid "cron job" on Releng HIPP
Status: RESOLVED FIXED
Product: Platform
Classification: Eclipse Project
Component: Releng
Version: 4.6
Hardware: PC Linux
Importance: P3 enhancement
Target Milestone: 4.7 M1
Assignee: Sravan Kumar Lakkimsetti CLA
Depends on: 487044
Reported: 2016-06-16 22:57 EDT by David Williams CLA
Modified: 2016-08-31 12:23 EDT

Description David Williams CLA 2016-06-16 22:57:36 EDT
When moving jobs to the Releng HIPP we duplicated the same logic we used when "e4Build" was doing the builds. Namely, the test Hudson instances write a "data file" to /shared/eclipse/testjobdata.
Then a "cron job" checks for new data in that queue (every 10 minutes or so) and, if found, reads that data, which provides enough information to fetch the test results, process and summarize them, and upload them to the proper (matching) build location.

Now that we have a "Releng HIPP" instead of a cron job, we can have the test Hudsons trigger a job on the Releng HIPP, much like we trigger the jobs on the test Hudsons to begin with: sending a "curl post" request to the Releng HIPP for the right job, with the right data wrapped in JSON arguments.

This is more efficient, since the current cron job must run fairly frequently but does nothing 95% of the time. So it would be better to be "event driven" instead of "poll and loop" with a cron job.
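As a rough illustration of the event-driven approach (the host, job name, and parameter values below are hypothetical placeholders, not the actual Releng configuration), the trigger might look something like:

```shell
#!/bin/sh
# Hypothetical sketch: a test Hudson triggers the collect job on the Releng
# HIPP via a "curl post", passing the build coordinates as JSON parameters.
# Host, job name, and values are placeholders, not the real configuration.
RELENG_HIPP="https://example.eclipse.org/releng"
JOB="ep-collectResults"
PAYLOAD='{"parameter": [
  {"name": "triggeringJob",  "value": "ep46I-unit-lin64"},
  {"name": "buildId",        "value": "I20160616-2000"},
  {"name": "eclipseStream",  "value": "4.6.0"}
]}'
# Dry run: print the curl command instead of executing it.
CMD="curl -s -X POST ${RELENG_HIPP}/job/${JOB}/build --data-urlencode json='${PAYLOAD}'"
echo "$CMD"
```

Unlike the 10-minute polling loop, such a request would fire exactly once, at the moment results are actually ready.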
Comment 1 Eclipse Genie CLA 2016-07-20 08:27:03 EDT
New Gerrit change created: https://git.eclipse.org/r/77595
Comment 2 Sravan Kumar Lakkimsetti CLA 2016-07-20 08:29:43 EDT
This is a two-stage fix. Currently a file is taken as input for collect.sh; this needs to change to accept command-line arguments. Then the ep-collectResults job needs to change to call collect.sh directly, without having a file generated and running another cron job.

The patch attached fixes the first part, accepting the command-line arguments. Once this goes in I will make the change to the ep-collectResults job.

The steps will be:
1. Clone the utilities
2. Run the command:

$ <path to collect.sh>/collect.sh $triggeringJob $triggeringBuildNumber $buildId $eclipseStream $EBUILDER_HASH
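For illustration only, a minimal sketch of what that first part of the change could look like inside collect.sh (the function wrapper and the echoed path are placeholders; the real script does the actual fetching and processing):

```shell
#!/bin/sh
# Hypothetical sketch: collect.sh reads its inputs from positional
# command-line arguments rather than sourcing a queued data file.
collect() {
  if [ "$#" -lt 5 ]; then
    echo "usage: collect.sh triggeringJob triggeringBuildNumber buildId eclipseStream EBUILDER_HASH" >&2
    return 1
  fi
  triggeringJob="$1"
  triggeringBuildNumber="$2"
  buildId="$3"
  eclipseStream="$4"
  EBUILDER_HASH="$5"
  # Placeholder for the real work (fetch, process, summarize, upload).
  echo "Collecting ${triggeringJob}#${triggeringBuildNumber} into drops/${buildId}/testResults"
}

collect ep46I-unit-lin64 100 I20160720-0800 4.6.0 master
```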
Comment 3 David Williams CLA 2016-07-20 09:48:03 EDT
(In reply to Sravan Kumar Lakkimsetti from comment #2)
> this is two stage fix. currently the file is taken as input for collect.sh.
> this needs to change to accept command line arguments. Then change the
> ep-collectResults job to call collect.sh directly, without having a file
> generated and running another cronjob.
> 
> The patch attached fixes the first part accepting the command line
> arguments. Once this goes in I will make change to the ep-collectResults job.
> 
> The command will be
> 1. clone utilities 
> 
> $ <path to collect.sh>/collect.sh $triggeringJob $triggeringBuildNumber
> $buildId $eclipseStream $EBUILDER_HASH

Can you explain the logic or "workflow" a bit more? Or, can you say if you have tested this locally? 

Previously, when "collect.sh" was running on the build machine (under /shared/eclipse somewhere), the "output" of triggeringJob and triggeringBuildNumber eventually went to somewhere such as
/shared/eclipse/builds/4I/sitedir/eclipse/downloads/drops/${BUILDID}/testResults

So now, if collect.sh is running on Hudson, do you need to first "pull" those results from the triggering job, and then "push" (copy) them to /shared/eclipse....?

To ask the question another way, I am not sure the "zip" file of the results literally exists until it is requested. It might; I am just not sure. If it does, then a "copy" would work, but if it doesn't, it seems like a "pull" of just the zip will be required first, and then it can be copied somewhere.

So, I am just curious if you have worked with this locally enough to know that it works. 

= = = = = 

Changing the "input" to collect.sh from a file to command-line arguments seems like a 50/50 sort of thing -- that is, it does no harm, but would not literally be required. Am I seeing that wrong?
Comment 4 David Williams CLA 2016-07-20 09:54:58 EDT
(In reply to David Williams from comment #3)

I should also mention, part of the reason this "worked" before is that the files that did the heavy lifting were already on the build server. 

Remember, some days you will not get test results in a nice and neat order. 

You might get unit tests for Mac from an I-build, for example, then Windows for an N-build, then the "performance" results from the I-build, etc., i.e. "all mixed up".

And each of those "streams" *might* at times have different versions of the files that "process" the results correctly. 

None of this is probably news to you; I am just confused about how the changes are proposed to work.

Are they supposed to work entirely from the Hudson machine, and from there be uploaded to "downloads"? Or do they still have to go back to the build machine for processing?
Comment 5 Sravan Kumar Lakkimsetti CLA 2016-07-20 13:47:39 EDT
(In reply to David Williams from comment #4)
> (In reply to David Williams from comment #3)
> 
> I should also mention, part of the reason this "worked" before is that the
> files that did the heavy lifting were already on the build server. 
> 
> Remember, some days you will not get test results in a nice and neat order. 
> 
> You might get unit tests for Mac from an I-build, for example, then Windows
> for an N-build, the "performance" from the I-build, etc. ie.e. "all mixed
> up". 
> 
> And each of those "streams" *might* at times have different versions of the
> files that "process" the results correctly. 
> 
> None of this is probably new news to you, I am just confused how the changes
> are proposed to work. 
> 
> Are they supposed to work entirely from the Hudson machine, and from there
> be uploaded to "downloads"? Or do they still have to go back to the build
> machine for processing.

My idea here is to call collect.sh with the command line options in the job ep-collectResults.

triggeringJob=$JOB_NAME
triggeringBuildNumber=$BUILD_NUMBER
buildId=$buildId
eclipseStream=$eclipseStream
EBUILDER_HASH=$EBUILDER_HASH

This way we have control over which test results we are promoting.
The current code requires an intermediate file with the same command-line options.

The current behaviour is that ep-collectResults creates an intermediate file in the test results queue with the above command-line options.

The job eclipse.releng.checkAndCollectTestResults checks the queue and calls collect.sh with those command-line options.

My idea is to call collect.sh directly, instead of using the intermediate files and cron jobs.

To execute collect.sh we still need access to the /shared/eclipse folder so that we can create the test results folder.

The other enhancement I have is moving ep-collectResults to the Releng HIPP and calling it using curl commands from the test jobs. I am planning to work on this tomorrow.
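A minimal sketch of what the ep-collectResults build step might then reduce to (the default values are placeholders for a local dry run; on Hudson the variables would come from the job's parameters):

```shell
#!/bin/bash
# Hypothetical sketch of the ep-collectResults build step: hand the job's
# parameters straight to collect.sh, replacing the queue file and the
# polling cron job. Defaults below are placeholders for running locally.
triggeringJob="${JOB_NAME:-ep46I-unit-lin64}"
triggeringBuildNumber="${BUILD_NUMBER:-100}"
buildId="${buildId:-I20160720-0800}"
eclipseStream="${eclipseStream:-4.6.0}"
EBUILDER_HASH="${EBUILDER_HASH:-master}"

# Dry run: show the direct call rather than executing the real script.
CALL="collect.sh ${triggeringJob} ${triggeringBuildNumber} ${buildId} ${eclipseStream} ${EBUILDER_HASH}"
echo "$CALL"
```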
Comment 6 David Williams CLA 2016-07-20 15:03:12 EDT
(In reply to Sravan Kumar Lakkimsetti from comment #5)
> (In reply to David Williams from comment #4)
> > (In reply to David Williams from comment #3)
> > 
> > I should also mention, part of the reason this "worked" before is that the
> > files that did the heavy lifting were already on the build server. 
> > 
> > Remember, some days you will not get test results in a nice and neat order. 
> > 
> > You might get unit tests for Mac from an I-build, for example, then Windows
> > for an N-build, the "performance" from the I-build, etc. ie.e. "all mixed
> > up". 
> > 
> > And each of those "streams" *might* at times have different versions of the
> > files that "process" the results correctly. 
> > 
> > None of this is probably new news to you, I am just confused how the changes
> > are proposed to work. 
> > 
> > Are they supposed to work entirely from the Hudson machine, and from there
> > be uploaded to "downloads"? Or do they still have to go back to the build
> > machine for processing.
> 
> My idea here is to call collect.sh with the command line options in the job
> ep-collectResults.
> 
> triggeringJob=$JOB_NAME
> triggeringBuildNumber=$BUILD_NUMBER
> buildId=$buildId
> eclipseStream=$eclipseStream
> EBUILDER_HASH=$EBUILDER_HASH
> 
> this way we have control which test results we are promoting. 
> current code requires an intermediate file with the same command line
> options. 
> 
> Current behaviour is ep-collectResults creates a intermediate file in test
> results queue with above command line options
> 
> the job  eclipse.releng.checkAndCollectTestResults checks the queue and
> calls collect.sh with the above command line options. 
> 
> My idea is to call the collect.sh directly instead of the intermediate files
> and cron jobs.

I guess this is what I was confused about. Where are you going to call collect.sh from? 

At the end of each test? Note: currently for performance tests, we do (in concept) call it at the end of each job since that machine is restricted to 1 executor, by design. That is, we do not use the ep-collectResults job on the performance machine. 

> To execute collect.sh we still need access to /shared/eclipse folder so that
> we can create the test results folder. 

Ok, so "runs on Hudson" and uses /shared/eclipse for the "data". 
Does that mean even the Mac and Windows machines are executing the "generateIndex" type functions? That's brave of you. :) 
 
> the other enhancement I have is moving ep-collectResults to releng hipp and
> call this using curl commands from the test jobs. this I am planning to work
> on tomorrow

= = = = = = = = = 

I am still unclear: if I commit your Gerrit patch 77595, will we be "broken" until you finish the rest?
Comment 7 Sravan Kumar Lakkimsetti CLA 2016-07-21 02:22:21 EDT
(In reply to David Williams from comment #6)
> (In reply to Sravan Kumar Lakkimsetti from comment #5)
> > (In reply to David Williams from comment #4)
> > > (In reply to David Williams from comment #3)
> > > 
> > > I should also mention, part of the reason this "worked" before is that the
> > > files that did the heavy lifting were already on the build server. 
> > > 
> > > Remember, some days you will not get test results in a nice and neat order. 
> > > 
> > > You might get unit tests for Mac from an I-build, for example, then Windows
> > > for an N-build, the "performance" from the I-build, etc. ie.e. "all mixed
> > > up". 
> > > 
> > > And each of those "streams" *might* at times have different versions of the
> > > files that "process" the results correctly. 
> > > 
> > > None of this is probably new news to you, I am just confused how the changes
> > > are proposed to work. 
> > > 
> > > Are they supposed to work entirely from the Hudson machine, and from there
> > > be uploaded to "downloads"? Or do they still have to go back to the build
> > > machine for processing.
> > 
> > My idea here is to call collect.sh with the command line options in the job
> > ep-collectResults.
> > 
> > triggeringJob=$JOB_NAME
> > triggeringBuildNumber=$BUILD_NUMBER
> > buildId=$buildId
> > eclipseStream=$eclipseStream
> > EBUILDER_HASH=$EBUILDER_HASH
> > 
> > this way we have control which test results we are promoting. 
> > current code requires an intermediate file with the same command line
> > options. 
> > 
> > Current behaviour is ep-collectResults creates a intermediate file in test
> > results queue with above command line options
> > 
> > the job  eclipse.releng.checkAndCollectTestResults checks the queue and
> > calls collect.sh with the above command line options. 
> > 
> > My idea is to call the collect.sh directly instead of the intermediate files
> > and cron jobs.
> 
> I guess this is what I was confused about. Where are you going to call
> collect.sh from? 
I want to call collect.sh from the Hudson job ep-collectResults.
> 
> At the end of each test? Note: currently for performance tests, we do (in
> concept) call it at the end of each job since that machine is restricted to
> 1 executor, by design. That is, we do not use the ep-collectResults job on
> the performance machine. 
> 
> > To execute collect.sh we still need access to /shared/eclipse folder so that
> > we can create the test results folder. 
> 
> Ok, so "runs on Hudson" and uses /shared/eclipse for the "data". 
> Does that mean even the Mac and Windows machines are executing the
> "generateIndex" type functions? That's brave of you. :) 
collect.sh will run from Hudson in the ep-collectResults job, so the Mac and Windows test machines will not get involved.
>  
> > the other enhancement I have is moving ep-collectResults to releng hipp and
> > call this using curl commands from the test jobs. this I am planning to work
> > on tomorrow
> 
> = = = = = = = = = 
> 
> I am still unclear, if I commit your one gerrit patch 77595 will we be
> "broken" until you finish the rest?

It won't break; I modified testdataCronjob.sh as well so that it won't break.
Comment 9 David Williams CLA 2016-07-21 09:27:24 EDT
Thanks for the extra explanations.

I can't say I understand your plan completely, but it sounds "close enough" to give it a go.

I've merged your change into 'master'. 

Thanks.
Comment 10 Sravan Kumar Lakkimsetti CLA 2016-07-25 05:57:43 EDT
Here is the complete solution used.

Created a new ep-collectResults job on the Releng HIPP with a quiet time of 2 minutes (to allow the results to be copied to the correct folders).
Changed the ep-collectResults job on shared to call ep-collectResults on Releng through a curl command (this is still used since curl is not available on Windows by default).
Changed the test jobs of Linux and Mac to call ep-collectResults on the Releng HIPP directly.

The ep-collectResults job collects results from the Hudson jobs and publishes them to the download page.

After this change we do not need the "eclipse.releng.checkAndCollectTestResults" job, so it is disabled. Also, we do not write to the test queue any more.
Comment 11 David Williams CLA 2016-07-25 17:10:08 EDT
Now that this is fixed, I have removed the directory 
/shared/eclipse/testjobqueue

Actually, I renamed it to testjobqueueOLD because it had many "data files" in it that *might* need to be examined (but probably not). 

There were 6 from 7/23 that were never "processed". I assume this was after the main fixes, but before everything "turned off". 

There were 13 from 7/21 and 7/22 that resulted in "ERROR". I assume this was while the change was in the process of being made? 

Nothing appears from 7/24.
= = = = = =

If the directory is not recreated and no data files show up there, then I'll assume we are done and no further cleanup (nor investigation) is needed.
Comment 12 David Williams CLA 2016-07-25 17:20:56 EDT
Also, I deleted the ep-collectResults job on the performance machine. 

In most cases I would have just "disabled it", but the last time that job ran was 
Dec 18, 2014 4:22:20 AM
so I think we can safely say we do not need it. :) 

= = = = = = 

On the Platform HIPP, there are a few jobs that mention "collect results".
Since your name is associated with one of them, I left them alone. Not sure what, if anything, you are using these jobs for. 

trigger-ep-collectResults
Sravan-ep-collectResults
trigger-SravancollectJob

But, one of them, trigger-ep-collectResults, ran recently (e.g "5 hours ago") so it must be "in use"? 

The ones with your name: one has a generic description that sounds old, the other has no description. I suggest, if you make "experimental jobs" for some reason, that (a) the first choice is to do those on your local test instance and not clutter the production machines, but (b) if you cannot do that, then at least add a brief "description" so others might have an idea of what they are for and/or when they can be disabled or removed.
Comment 13 Sravan Kumar Lakkimsetti CLA 2016-07-28 03:03:53 EDT
(In reply to David Williams from comment #12)
> Also, I deleted the ep-collectResults job on the performance machine. 
> 
> In most cases I would have just "disabled it", but the last time that job
> ran was 
> Dec 18, 2014 4:22:20 AM
> so I think we can safely say we do not need it. :) 
> 
> = = = = = = 
> 
> On the Platform HIPP, there are two jobs that mention "collect results". 
> Since your name is associated with one of them, I left them alone. Not sure
> what, if anything, you are using these jobs for. 
> 
> trigger-ep-collectResults
> Sravan-ep-collectResults
> trigger-SravancollectJob
> 
> But, one of them, trigger-ep-collectResults, ran recently (e.g "5 hours
> ago") so it must be "in use"? 
> 
> The ones with your name: one has a generic description that sounds old, the
> other has no description. I suggest if you make "experimental jobs" for some
> reason that a) first choice is to do those on your local test instance and
> not clutter the production machines, but b) if you can not do that, then at
> least add a brief "description" so others might have an idea of what they
> are for and/or when they can be disabled or removed.

trigger-ep-collectResults is in use; from here we call the collectJob available on the Releng HIPP. This is there to avoid duplication of curl commands in each of the test jobs.

The remaining ones I created for testing. I have removed them now.
Comment 14 Sravan Kumar Lakkimsetti CLA 2016-08-31 12:23:10 EDT
This has been fixed in 4.7 M1. We use the same jobs for 4.6.1, so there is no need for a backport.