When moving jobs to the Releng HIPP we duplicated the same logic we used when "e4Build" was doing the builds. Namely, the test Hudson instances write a "data file" to /shared/eclipse/testjobdata. Then a "cron job" checks that queue for new data (every 10 minutes or so) and, if found, reads it; the data provides enough information to fetch the test results, process and summarize them, and upload them to the proper (matching) build location.

Now that we have a "Releng HIPP", instead of a cron job we can have the test Hudsons trigger a job on the Releng HIPP (much like we trigger the jobs on the test Hudsons to begin with): sending a "curl post" request to the right job on the Releng HIPP, with the right data wrapped in JSON arguments. This is more efficient, since the current cron job must run fairly frequently but does nothing 95% of the time. So it would be better to be "event driven" instead of "poll and loop" with a cron job.
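For context, the "poll and loop" flow being replaced can be sketched roughly like this (the queue path and file contents below are stand-ins for illustration, not the real /shared/eclipse/testjobdata layout):

```shell
#!/bin/sh
# Minimal sketch of the cron-driven queue polling described above.
QUEUE="${TMPDIR:-/tmp}/testjobdata-demo"   # stand-in for /shared/eclipse/testjobdata
mkdir -p "$QUEUE"
rm -f "$QUEUE"/*

# What the cron job does every ~10 minutes: look for data files and, for
# each one found, kick off result collection (here just echoed).
process_queue() {
  for f in "$QUEUE"/*; do
    [ -e "$f" ] || continue        # 95% of the time there is nothing to do
    echo "would collect results described in $f"
    rm -f "$f"
  done
}

# Simulate a test Hudson dropping a data file into the queue.
echo "triggeringJob=ep-tests-linux" > "$QUEUE/job1"
process_queue
```

The event-driven alternative skips the queue and the cron wake-ups entirely: the test job sends one HTTP request when, and only when, there are results to collect.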
New Gerrit change created: https://git.eclipse.org/r/77595
This is a two-stage fix. Currently a file is taken as input for collect.sh; this needs to change to accept command line arguments. Then change the ep-collectResults job to call collect.sh directly, without generating a file and running another cron job.

The attached patch fixes the first part, accepting the command line arguments. Once this goes in, I will make the change to the ep-collectResults job.

The command will be:

1. clone utilities

$ <path to collect.sh>/collect.sh $triggeringJob $triggeringBuildNumber $buildId $eclipseStream $EBUILDER_HASH
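The new command-line interface might look like the sketch below. The argument names are taken from the comment above; the collection logic itself is omitted, and the example values are invented for illustration:

```shell
#!/bin/sh
# Sketch of collect.sh accepting its five inputs as positional arguments
# instead of reading them from a queue file.
parse_collect_args() {
  if [ "$#" -ne 5 ]; then
    echo "usage: collect.sh triggeringJob triggeringBuildNumber buildId eclipseStream EBUILDER_HASH" >&2
    return 1
  fi
  triggeringJob=$1
  triggeringBuildNumber=$2
  buildId=$3
  eclipseStream=$4
  EBUILDER_HASH=$5
}

# Example invocation mirroring the command in the comment (values invented).
parse_collect_args ep-junit-linux 42 I20160801-0800 4.7 deadbeef
echo "collecting $triggeringJob #$triggeringBuildNumber for $buildId"
```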
(In reply to Sravan Kumar Lakkimsetti from comment #2)

Can you explain the logic or "workflow" a bit more? Or, can you say if you have tested this locally?

Previously, when "collect.sh" was running on the build machine (under /shared/eclipse somewhere), the "output" of triggeringJob and triggeringBuildNumber eventually went to somewhere such as /shared/eclipse/builds/4I/sitedir/eclipse/downloads/drops/${BUILDID}/testResults

So now, if collect.sh is running on Hudson, do you need to first "pull" those results from the triggering job, and then "push" (copy) them to /shared/eclipse...?

To ask the question another way, I am not sure the "zip" file of the results literally exists until it is requested. It might, I am just not sure. If it does, then a "copy" would work; but if it doesn't, it seems like a "pull" of just the zip will be required first, and then it would be copied somewhere. So, I am just curious if you have worked with this locally enough to know that it works.

= = = = =

Changing the "input" to collect.sh from a file to command line arguments seems like a 50/50 sort of thing -- that is, it does no harm, but would not literally be required. Am I seeing that wrong?
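The "pull" David is asking about would amount to fetching the results zip from the test Hudson's build page. A hypothetical sketch follows; the host, job path, and artifact name use the common Hudson artifact URL pattern and are assumptions, not details confirmed in this thread:

```shell
#!/bin/sh
# Build the URL a "pull" of a results zip would fetch; everything here
# (host, artifact name) is hypothetical.
artifact_url() {
  echo "https://hudson.eclipse.org/shared/job/$1/$2/artifact/results.zip"
}

url=$(artifact_url ep-junit-linux 42)
echo "$url"
# A real pull would then be roughly:
#   wget -O results.zip "$url"
# followed by copying the zip under /shared/eclipse/builds/.../testResults
```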
(In reply to David Williams from comment #3)

I should also mention: part of the reason this "worked" before is that the files that did the heavy lifting were already on the build server.

Remember, some days you will not get test results in a nice and neat order. You might get unit tests for Mac from an I-build, for example, then Windows for an N-build, then the "performance" results from the I-build, etc. -- i.e. "all mixed up".

And each of those "streams" *might* at times have different versions of the files that "process" the results correctly.

None of this is probably new news to you; I am just confused about how the changes are proposed to work.

Are they supposed to work entirely from the Hudson machine, and from there be uploaded to "downloads"? Or do they still have to go back to the build machine for processing?
(In reply to David Williams from comment #4)

My idea here is to call collect.sh with the command line options in the job ep-collectResults:

triggeringJob=$JOB_NAME
triggeringBuildNumber=$BUILD_NUMBER
buildId=$buildId
eclipseStream=$eclipseStream
EBUILDER_HASH=$EBUILDER_HASH

This way we have control over which test results we are promoting. The current code requires an intermediate file with the same command line options.

Current behaviour: ep-collectResults creates an intermediate file in the test results queue with the above command line options, and the job eclipse.releng.checkAndCollectTestResults checks the queue and calls collect.sh with those options. My idea is to call collect.sh directly, instead of using the intermediate files and cron jobs.

To execute collect.sh we still need access to the /shared/eclipse folder so that we can create the test results folder.

The other enhancement I have is moving ep-collectResults to the releng HIPP and calling it using curl commands from the test jobs. I am planning to work on this tomorrow.
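Put side by side, the two flows differ only in how the five values travel. A sketch of the intermediate file the current flow writes, next to the direct call that replaces it (the on-disk format and the example values are assumptions, reconstructed from the variable names in the comment):

```shell
#!/bin/sh
# Stand-ins for the values Hudson would provide in a real job.
JOB_NAME="ep-junit-linux"
BUILD_NUMBER=42
buildId="I20160801-0800"
eclipseStream="4.7"
EBUILDER_HASH="deadbeef"

# Current flow: write an intermediate "queue" file for the cron job to find.
datafile="${TMPDIR:-/tmp}/testjobdata-sample"
cat > "$datafile" <<EOF
triggeringJob=$JOB_NAME
triggeringBuildNumber=$BUILD_NUMBER
buildId=$buildId
eclipseStream=$eclipseStream
EBUILDER_HASH=$EBUILDER_HASH
EOF

# Proposed flow: the same five values passed directly on one command line.
echo "collect.sh $JOB_NAME $BUILD_NUMBER $buildId $eclipseStream $EBUILDER_HASH"
```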
(In reply to Sravan Kumar Lakkimsetti from comment #5)

> My idea is to call the collect.sh directly instead of the intermediate files
> and cron jobs.

I guess this is what I was confused about. Where are you going to call collect.sh from? At the end of each test?

Note: currently, for performance tests, we do (in concept) call it at the end of each job, since that machine is restricted to 1 executor by design. That is, we do not use the ep-collectResults job on the performance machine.

> To execute collect.sh we still need access to /shared/eclipse folder so that
> we can create the test results folder.

Ok, so it "runs on Hudson" and uses /shared/eclipse for the "data". Does that mean even the Mac and Windows machines are executing the "generateIndex" type functions? That's brave of you. :)

> the other enhancement I have is moving ep-collectResults to releng hipp and
> call this using curl commands from the test jobs. this I am planning to work
> on tomorrow

= = = = = = = = =

I am still unclear: if I commit your one Gerrit patch 77595, will we be "broken" until you finish the rest?
(In reply to David Williams from comment #6)

> Where are you going to call collect.sh from?

I want to call collect.sh from the Hudson job ep-collectResults.

> Does that mean even the Mac and Windows machines are executing the
> "generateIndex" type functions? That's brave of you. :)

collect.sh will run from Hudson in the ep-collectResults job, so the Mac and Windows test machines will not get involved.

> I am still unclear, if I commit your one gerrit patch 77595 will we be
> "broken" until you finish the rest?

It won't break. I modified testdataCronjob.sh as well, so that it won't break.
Gerrit change https://git.eclipse.org/r/77595 was merged to [master]. Commit: http://git.eclipse.org/c/platform/eclipse.platform.releng.aggregator.git/commit/?id=a7b557e20f2767d005e882ca4564a5abfc2c1ff9
Thanks for the extra explanations. I can't say I understand your plan completely, but it sounds "close enough" to give it a go. I've merged your change into 'master'. Thanks.
Here is the complete solution used:

1. Created a new ep-collectResults job on the releng HIPP, with a quiet time of 2 minutes (to allow the results to be copied to the correct folders).
2. Changed the ep-collectResults job on shared to call ep-collectResults on releng through a curl command (this is still used, since curl is not available on Windows by default).
3. Changed the test jobs for Linux and Mac to call ep-collectResults on the releng HIPP directly.

The ep-collectResults job collects results from the Hudson jobs and populates them to the download page.

After this change we do not need the "eclipse.releng.checkAndCollectTestResults" job, so it is disabled. Also, we do not write to the test queue any more.
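The curl trigger from the test jobs might look roughly like the sketch below. The parameter names come from this thread, but the host, job path, and authentication are placeholders, not the real HIPP values:

```shell
#!/bin/sh
# Assemble the form data a parameterized-build trigger would POST.
# Parameter names from the thread; host and credentials are hypothetical.
RELENG_HIPP="https://hudson.eclipse.org/releng"   # placeholder host

build_post_data() {
  echo "triggeringJob=$1&triggeringBuildNumber=$2&buildId=$3&eclipseStream=$4&EBUILDER_HASH=$5"
}

data=$(build_post_data ep-junit-linux 42 I20160801-0800 4.7 deadbeef)
echo "$data"
# The real post-build step would then be roughly:
#   curl -s -X POST "$RELENG_HIPP/job/ep-collectResults/buildWithParameters" \
#        --data "$data" --user "user:api-token"
```

The 2-minute quiet time on the receiving job gives the copy of results to the correct folders a chance to finish before collection starts.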
Now that this is fixed, I have removed the directory /shared/eclipse/testjobqueue

Actually, I renamed it to testjobqueueOLD, because it had many "data files" in it that *might* need to be examined (but probably not).

There were 6 from 7/23 that were never "processed". I assume this was after the main fixes, but before everything was "turned off". There were 13 from 7/21 and 7/22 that resulted in "ERROR". I assume this was while the change was in the process of being made? Nothing appears from 7/24.

= = = = = =

If the directory is not recreated and no data files show up there, then I'll assume we are done and no further cleanup (nor investigation) is needed.
Also, I deleted the ep-collectResults job on the performance machine.

In most cases I would have just "disabled it", but the last time that job ran was Dec 18, 2014 4:22:20 AM, so I think we can safely say we do not need it. :)

= = = = = =

On the Platform HIPP, there are three jobs that mention "collect results". Since your name is associated with two of them, I left them alone. Not sure what, if anything, you are using these jobs for.

trigger-ep-collectResults
Sravan-ep-collectResults
trigger-SravancollectJob

But one of them, trigger-ep-collectResults, ran recently (e.g. "5 hours ago"), so it must be "in use"?

The ones with your name: one has a generic description that sounds old, the other has no description. I suggest, if you make "experimental jobs" for some reason, that a) the first choice is to do those on your local test instance and not clutter the production machines, but b) if you cannot do that, then at least add a brief "description" so others might have an idea of what they are for and/or when they can be disabled or removed.
(In reply to David Williams from comment #12)

trigger-ep-collectResults is in use; from it we call the collect job available on the releng HIPP. It is there to avoid duplicating the curl commands in each of the test jobs.

The remaining ones I created for testing. I have removed them now.
This has been fixed in 4.7 M1. We use the same jobs for 4.6.1, so there is no need for a backport.