Community
Participate
Working Groups
We're getting failures on the Gerrit and integration runs and it isn't clear what the cause is. In all of these cases, the tests ran properly against local Gerrit (2.5.4) instances as well as on previous Hudson tests for the same change sets. Please see: https://git.eclipse.org/r/#/c/13877/ <-- Builds off of current master https://git.eclipse.org/r/#/c/13878/ <-- Builds off of last change before significant change to binary format. All of these seem to be possibly due to time out issues. Also, note failures on e.g. https://git.eclipse.org/r/#/c/13857/ that could also possibly be timeout related.
Tomek, are you seeing this failure on your local machine by chance? (If you don't have local Mylyn gerrit instance set up, let me know and I'll dig up the bug that explains.)
MIles, please fix the tests. You were the only one submitting changes lately.
Steffen, see the reviews above. These are tests for changes that previously passed, so it looks like something is going wrong with the test bed itself. I'll go back even further to verify this is the case.
(In reply to comment #2) > You were the only one submitting changes lately. Note that Tomek's tsts are failing too. ;) https://git.eclipse.org/r/#/c/13886/
Steffen, I think the issue might be simply that http://mylyn.org/gerrit-2.4.2/ is down and or running very slowly. I can't reach it w/ a browser, but I can 2.5.4. Can we get a reboot on it or something? Note also integration tests: https://hudson.eclipse.org/hudson/view/Mylyn/job/mylyn-reviews-nightly/2281/ This is failing even though previous two tests didn't have significant changes.
Thanks for looking into it. Sam, Tomek, can you restart the affected process? You can also re-run the provisioning job at http://ci.mylyn.org/view/Provisioning/job/mylyn-puppet-gerrit/ which will restart all Gerrit servers.
(In reply to comment #6) > You can also re-run the provisioning job Done. I'm retriggering all aborted builds now as 2.4.2 is responsive again: * https://git.eclipse.org/r/#/c/13886/ : https://hudson.eclipse.org/sandbox/job/mylyn-reviews-gerrit/612/ => 616 * https://git.eclipse.org/r/#/c/13877/ : https://hudson.eclipse.org/sandbox/job/mylyn-reviews-gerrit/608/ => 617 * https://git.eclipse.org/r/#/c/13878/ : https://hudson.eclipse.org/sandbox/job/mylyn-reviews-gerrit/609/ => 618 * https://git.eclipse.org/r/#/c/13857/ : https://hudson.eclipse.org/sandbox/job/mylyn-reviews-gerrit/610/ => 619 When running locally, I still get the failures in PatchSetRemoteFactoryTest, first time seen in https://hudson.eclipse.org/sandbox/job/mylyn-reviews-gerrit/607/
(In reply to comment #7) > first time seen in > https://hudson.eclipse.org/sandbox/job/mylyn-reviews-gerrit/607/ Should read: https://hudson.eclipse.org/sandbox/job/mylyn-reviews-gerrit/604/ or "last time seen" :)
From looking at the very long list of reviews it seems that we need to add clean up logic to the tests. Otherwise the server will get slow and consume too much memory over time.
(In reply to comment #9) > From looking at the very long list of reviews it seems that we need to add clean > up logic to the tests. Otherwise the server will get slow and consume too much > memory over time. Yes I was wondering the same thing. I'm not sure how to delete reviews from API, actually.. Another thought is to simply nuke the vm periodically.
Darn, now we're getting a bad gateway.. https://hudson.eclipse.org/sandbox/job/mylyn-reviews-gerrit/621/ :(
I cleared and re-provisioned the 2.4.2 and 2.5.4 repository as a one time measure so repositories were down for a moment. Tests should now be much faster again.
Thanks Steffen. Two questions for you and Tomek... How can we resubmit builds w/o "faking" a commit and push? For some reason the cancel build options isn't enabled for me. Is there any way to cancel running builds?
Build times are now down to 7 mins form a high of 24 mins. Thanks Steffen, this makes a big difference in productivity! Let me know if anyonre has any quick ideas about how to delete those existing reviews programmatically, otherwise I'll try to figure it out when we get these other issues squashed.
I'm not sure what to do with this one. I'm stumped on how to delete tests. I don't think this one should block 2.0.1 -- it was only there because the tests were failing altogether -- so I'm de-escalating, and removing milestone. Hope that's ok..
It's...back.. https://hudson.eclipse.org/sandbox/job/mylyn-reviews-gerrit/660/testReport/org.eclipse.mylyn.gerrit.tests.ui/GerritUrlHandlerTest/testOpenUrl/ So we need to solve the issue of deleting these old reviews but I have no idea how to do that programmatically, or indeed if that's possible. Tomasz is away this week, so perhaps it would be helpful for me to get admin access to this machine so that I can destroy the vm and bring up a new one when this occurs.
(In reply to comment #16) > Tomasz is away this week, so perhaps it would be helpful for me to get admin > access to this machine so that I can destroy the vm and bring up a new one > when this occurs. Sorry, but no. It's not a VM and I don't want anyone who doesn't know the details of setup go in and destroy stuff. We'll have to wait for either Sam or Tomek to help out who have admin access.
(In reply to comment #17) > (In reply to comment #16) > > Tomasz is away this week, so perhaps it would be helpful for me to get admin > > access to this machine so that I can destroy the vm and bring up a new one > > when this occurs. > > Sorry, but no. It's not a VM and I don't want anyone who doesn't know the > details of setup go in and destroy stuff. Ok, I assumed the Gerrit Server was a Vagrant instance just like the local ones we use for testing. Perhaps we need to increase the time-out for that URL test to begin with. > We'll have to wait for either Sam or > Tomek to help out who have admin access.
(In reply to comment #18) > Ok, I assumed the Gerrit Server was a Vagrant instance just like the local ones > we use for testing. Perhaps we need to increase the time-out for that URL test > to begin with. It's provisioned with puppet but it's a hosted Linux vServer. I don't think Gerrit actually supports deleting of reviews so we maybe best off clearing and re-provisioning the server every once in a while. Miles, if you want to move this forward feel free to push a review for the puppet scripts that supports clearing Gerrit instances prior to re-provisioning them. The Bugzilla scripts do something similar (by checking for a magic file) since Bugzilla doesn't support task deletion either.
(In reply to comment #19) > Miles, if you want to move this forward feel free to push a review for the > puppet scripts that supports clearing Gerrit instances prior to re-provisioning > them. The Bugzilla scripts do something similar (by checking for a magic file) > since Bugzilla doesn't support task deletion either. Ouch, hot potato. ;D Just a note that I'm not likely to get time to get up to speed on puppet etc. in the near future so perhaps this is something that Tomek can take a look at when he returns. Catch!
(In reply to comment #20) > Ouch, hot potato. ;D Just a note that I'm not likely to get time to get up > to speed on puppet etc. in the near future so perhaps this is something that > Tomek can take a look at when he returns. Catch! I don't mind getting more familiar with puppet and Mylyn tests infrastructure in general, but I think I should focus on bug 395059 now.
Yep, makes sense. I don't think there is a particular rush, esp. given that you have the keys to the server and can give it a kick if you need to.
We should get this fixed as soon as possible since tests and hence builds will continue to fail. We shouldn't be publishing builds until this is fixed (even if the temporary solution is to clean up the database and restart the service again).
(In reply to comment #23) > We should get this fixed as soon as possible since tests and hence builds will > continue to fail. We shouldn't be publishing builds until this is fixed (even if > the temporary solution is to clean up the database and restart the service > again). Fair point. I can play around w/ Puppet a bit as well, as I need to get up to speed on it too.
One strategy would be to delete the project. That seems possible, but kind of heinous: http://code.google.com/p/gerrit/issues/detail?id=349
It looks that GerritUrlHandlerTest.testOpenUrl() fails, because it blocks while calling RemoteUiService.modelExec(Runnable, boolean). Once the call is non-blocking you get an error msg saying "Problem while retrieving Gerrit review.: Not Found null," which means that the connector is unable open a review from Gerrit 2.6 (see bug 395059).
(In reply to comment #26) > It looks that GerritUrlHandlerTest.testOpenUrl() fails, because it blocks while > calling RemoteUiService.modelExec(Runnable, boolean). Once the call is > non-blocking you get an error msg saying "Problem while retrieving Gerrit > review.: Not Found null," which means that the connector is unable open a review > from Gerrit 2.6 (see bug 395059). Ok, so what I'm getting from this is that we have a failure that is surfacing for 2.6 and that failure is legitimate. But as we don't support 2.6 yet, why is this test active, and more importantly, how can we be passing all of the other tests? ;)
(In reply to comment #27) > But as we don't support 2.6 yet, why is this test active, > and more importantly, how can we be passing all of the other tests? ;) The test uses a hard-coded [1] fixture pointing to git.eclipse.org [2]. [1] https://git.eclipse.org/c/mylyn/org.eclipse.mylyn.reviews.git/tree/org.eclipse.mylyn.gerrit.tests/src/org/eclipse/mylyn/gerrit/tests/ui/GerritUrlHandlerTest.java#n52 [2] https://git.eclipse.org/c/mylyn/org.eclipse.mylyn.reviews.git/tree/org.eclipse.mylyn.gerrit.tests/src/org/eclipse/mylyn/gerrit/tests/support/GerritFixture.java#n24
I'd be helpful to provision a Gerrit instance on the test server that didn't rely on HTTP authentication but also allowed anonymous access so we don't have to rely on Eclipse.org for testing.
(In reply to comment #29) > I'd be helpful to provision a Gerrit instance on the test server that didn't > rely on HTTP authentication but also allowed anonymous access so we don't have > to rely on Eclipse.org for testing. +1, especially if we are to properly test for changes in authorization configuration from anon and back again. See https://git.eclipse.org/r/#/c/14294/6/org.eclipse.mylyn.gerrit.tests/src/org/eclipse/mylyn/gerrit/tests/core/client/GerritClientTest.java.
Shouldn't we mark the test as ignored for now and re-enable it on the first occasion? With the test failing Hudson is going to -1 all changes, making it hard to spot a new test failure.
(In reply to comment #31) > Shouldn't we mark the test as ignored for now and re-enable it on the first > occasion? With the test failing Hudson is going to -1 all changes, making it > hard to spot a new test failure. Please feel free to do that and schedule a bug to re-enable it.
(In reply to comment #32) > (In reply to comment #31) > > Shouldn't we mark the test as ignored for now and re-enable it on the first > > occasion? With the test failing Hudson is going to -1 all changes, making it > > hard to spot a new test failure. > > Please feel free to do that and schedule a bug to re-enable it. +1, I was almost afraid to suggest that. :)
(In reply to comment #32) > Please feel free to do that and schedule a bug to re-enable it. It's bug 412620.
The recent failures in testPerformQueryAnonymous (the test is hanging forever) can have two reasons: * cloning and push a new review takes very long time, see GerritHarness#ensureOneReviewExists() * the list of reviews is huge, making the query for all open reviews run very long In https://git.eclipse.org/r/#/c/15135/ I'm trying to fix them both by cloning/pushing only when there's no review available and by narrowing the query to return a single review (instead of all).
(In reply to comment #35) > In https://git.eclipse.org/r/#/c/15135/ I'm trying to fix them both by > cloning/pushing only when there's no review available and by narrowing the query > to return a single review (instead of all). Apparently it wasn't enough, since https://hudson.eclipse.org/sandbox/job/mylyn-reviews-gerrit/878/console timed out not even getting to testPerformQueryAnonymous. FWIW, the whole (passing) build takes now ~23 mins.
I've just finished running a profiler on Gerrit tests to see if there is anything that can be done on our side to make the tests run faster. Over 3/4 of the time threads are sleeping, waiting for a server response. Which kind of confirms what Miles noticed in comment 14, where tests without that extra IO overhead were 3 times faster.
There is a Gerrit plugin [1] for deleting projects which we could run as part of the provisioning job. [1] https://gerrit.googlesource.com/plugins/delete-project/
(In reply to comment #38) > There is a Gerrit plugin [1] for deleting projects which we could run as > part of the provisioning job. > > [1] https://gerrit.googlesource.com/plugins/delete-project/ Yep, see comment 25. :)
A work-in-progress review with delete-project plugin: https://git.eclipse.org/r/#/c/15295/
We could also extend the tests to reuse existing reviews in cases where the review is not modified. Assuming that's the vast majority of cases it would slow down the problem.
Deleting projects could be an interesting option. I would require tests to first create a unique project though to avoid conflicts with tests running in parallel. I pushed a simple work-around here which extends the puppet scripts with an option to force re-provisining. We could simply run the job once a night for now: https://git.eclipse.org/r/#/c/15354/ .
(In reply to comment #41) > We could also extend the tests to reuse existing reviews in cases where the > review is not modified. I like the idea, but currently the tests don't do much except for pushing a review and then reading it. As they continue to cover more scenarios (like aborting, submitting, reviewing, etc) I can imagine a situation when we spend more time looking for a useful review rather than executing the test itself. (In reply to comment #42) > I would require tests to first create a unique project though > to avoid conflicts with tests running in parallel. Nice, that should help.
(In reply to comment #43) > (In reply to comment #41) > > We could also extend the tests to reuse existing reviews in cases where the > > review is not modified. > > I like the idea, but currently the tests don't do much except for pushing a > review and then reading it. As they continue to cover more scenarios (like > aborting, submitting, reviewing, etc) I can imagine a situation when we spend > more time looking for a useful review rather than executing the test itself. I would cache that result and share it between tests so you only do the lookup once for a pre-defined review. Even if some tests still need to create separate reviews to set certain properties it could still help (but it's certainly not a general solution to the problem).
(In reply to comment #41) > We could also extend the tests to reuse existing reviews in cases where the > review is not modified. This reminded me about https://git.eclipse.org/r/#/c/15135/ from comment 35. Steffen's work-around from comment 42 worked great, so I don't think the issue is worth investing more time at the moment. I will try to tackle some of the remaining bugs for 2.0.1.
Closing as stale. The test server is reprovisioned every weekend so this is not an issue anymore.