Bug 411052 - tests should delete created reviews in tear down (was: tests failing on Hudson when run against Gerrit 2.4.2)
Summary: tests should delete created reviews in tear down (was: tests failing on Hudso...
Status: RESOLVED WONTFIX
Alias: None
Product: z_Archived
Classification: Eclipse Foundation
Component: Mylyn (show other bugs)
Version: unspecified   Edit
Hardware: Other Linux
: P3 normal (vote)
Target Milestone: ---   Edit
Assignee: Project Inbox CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-06-18 14:15 EDT by Miles Parker CLA
Modified: 2016-01-05 18:26 EST (History)
5 users (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Miles Parker CLA 2013-06-18 14:15:01 EDT
We're getting failures on the Gerrit and integration runs and it isn't clear what the cause is. In all of these cases, the tests ran properly against local Gerrit (2.5.4) instances as well as on previous Hudson tests for the same change sets. Please see:

https://git.eclipse.org/r/#/c/13877/  <-- Builds off of current master
https://git.eclipse.org/r/#/c/13878/  <-- Builds off of last change before significant change to binary format.

All of these seem to be possibly due to time out issues.

Also, note failures on e.g. https://git.eclipse.org/r/#/c/13857/  that could also possibly be timeout related.
Comment 1 Miles Parker CLA 2013-06-18 17:06:34 EDT
Tomek, are you seeing this failure on your local machine by chance? (If you don't have local Mylyn gerrit instance set up, let me know and I'll dig up the bug that explains.)
Comment 2 Steffen Pingel CLA 2013-06-18 17:22:54 EDT
MIles, please fix the tests. You were the only one submitting changes lately.
Comment 3 Miles Parker CLA 2013-06-18 17:35:15 EDT
Steffen, see the reviews above. These are tests for changes that previously passed, so it looks like something is going wrong with the test bed itself. I'll go back even further to verify this is the case.
Comment 4 Miles Parker CLA 2013-06-18 17:41:23 EDT
(In reply to comment #2)
> You were the only one submitting changes lately.

Note that Tomek's tsts are failing too. ;) https://git.eclipse.org/r/#/c/13886/
Comment 5 Miles Parker CLA 2013-06-18 19:02:34 EDT
Steffen, I think the issue might be simply that http://mylyn.org/gerrit-2.4.2/ is down and or running very slowly. I can't reach it w/ a browser, but I can 2.5.4. Can we get a reboot on it or something?

Note also integration tests:

https://hudson.eclipse.org/hudson/view/Mylyn/job/mylyn-reviews-nightly/2281/

This is failing even though previous two tests didn't have significant changes.
Comment 6 Steffen Pingel CLA 2013-06-19 04:40:20 EDT
Thanks for looking into it. Sam, Tomek, can you restart the affected process? You can also re-run the provisioning job at http://ci.mylyn.org/view/Provisioning/job/mylyn-puppet-gerrit/ which will restart all Gerrit servers.
Comment 7 Tomasz Zarna CLA 2013-06-19 06:33:30 EDT
(In reply to comment #6)
> You can also re-run the provisioning job

Done. I'm retriggering all aborted builds now as 2.4.2 is responsive again:

* https://git.eclipse.org/r/#/c/13886/ : https://hudson.eclipse.org/sandbox/job/mylyn-reviews-gerrit/612/ => 616
* https://git.eclipse.org/r/#/c/13877/ : https://hudson.eclipse.org/sandbox/job/mylyn-reviews-gerrit/608/ => 617
* https://git.eclipse.org/r/#/c/13878/ : https://hudson.eclipse.org/sandbox/job/mylyn-reviews-gerrit/609/ => 618
* https://git.eclipse.org/r/#/c/13857/ : https://hudson.eclipse.org/sandbox/job/mylyn-reviews-gerrit/610/ => 619

When running locally, I still get the failures in PatchSetRemoteFactoryTest, first time seen in https://hudson.eclipse.org/sandbox/job/mylyn-reviews-gerrit/607/
Comment 8 Tomasz Zarna CLA 2013-06-19 06:37:03 EDT
(In reply to comment #7)
> first time seen in
> https://hudson.eclipse.org/sandbox/job/mylyn-reviews-gerrit/607/

Should read: https://hudson.eclipse.org/sandbox/job/mylyn-reviews-gerrit/604/ or "last time seen" :)
Comment 9 Steffen Pingel CLA 2013-06-19 12:06:43 EDT
From looking at the very long list of reviews it seems that we need to add clean up logic to the tests. Otherwise the server will get slow and consume too much memory over time.
Comment 10 Miles Parker CLA 2013-06-19 12:14:40 EDT
(In reply to comment #9)
> From looking at the very long list of reviews it seems that we need to add clean
> up logic to the tests. Otherwise the server will get slow and consume too much
> memory over time.

Yes I was wondering the same thing. I'm not sure how to delete reviews from API, actually.. Another thought is to simply nuke the vm periodically.
Comment 11 Miles Parker CLA 2013-06-19 12:52:16 EDT
Darn, now we're getting a bad gateway.. https://hudson.eclipse.org/sandbox/job/mylyn-reviews-gerrit/621/ :(
Comment 12 Steffen Pingel CLA 2013-06-19 13:10:34 EDT
I cleared and re-provisioned the 2.4.2 and 2.5.4 repository as a one time measure so repositories were down for a moment. Tests should now be much faster again.
Comment 13 Miles Parker CLA 2013-06-19 13:28:57 EDT
Thanks Steffen. Two questions for you and Tomek...

How can we resubmit builds w/o "faking" a commit and push?
For some reason the cancel build options isn't enabled for me. Is there any way to cancel running builds?
Comment 14 Miles Parker CLA 2013-06-19 14:20:48 EDT
Build times are now down to 7 mins form a high of 24 mins. Thanks Steffen, this makes a big difference in productivity! Let me know if anyonre has any quick ideas about how to delete those existing reviews programmatically, otherwise I'll try to figure it out when we get these other issues squashed.
Comment 15 Miles Parker CLA 2013-06-24 15:19:52 EDT
I'm not sure what to do with this one. I'm stumped on how to delete tests.

I don't think this one should block 2.0.1 -- it was only there because the tests were failing altogether -- so I'm de-escalating, and removing milestone. Hope that's ok..
Comment 16 Miles Parker CLA 2013-06-25 13:35:09 EDT
It's...back.. https://hudson.eclipse.org/sandbox/job/mylyn-reviews-gerrit/660/testReport/org.eclipse.mylyn.gerrit.tests.ui/GerritUrlHandlerTest/testOpenUrl/

So we need to solve the issue of deleting these old reviews but I have no idea how to do that programmatically, or indeed if that's possible.

Tomasz is away this week, so perhaps it would be helpful for me to get admin access to this machine so that I can destroy the vm and bring up a new one when this occurs.
Comment 17 Steffen Pingel CLA 2013-06-25 13:38:18 EDT
(In reply to comment #16)
> Tomasz is away this week, so perhaps it would be helpful for me to get admin
> access to this machine so that I can destroy the vm and bring up a new one
> when this occurs.

Sorry, but no. It's not a VM and I don't want anyone who doesn't know the details of setup go in and destroy stuff. We'll have to wait for either Sam or Tomek to help out who have admin access.
Comment 18 Miles Parker CLA 2013-06-25 13:47:40 EDT
(In reply to comment #17)
> (In reply to comment #16)
> > Tomasz is away this week, so perhaps it would be helpful for me to get admin
> > access to this machine so that I can destroy the vm and bring up a new one
> > when this occurs.
> 
> Sorry, but no. It's not a VM and I don't want anyone who doesn't know the
> details of setup go in and destroy stuff.

Ok, I assumed the Gerrit Server was a Vagrant instance just like the local ones we use for testing. Perhaps we need to increase the time-out for that URL test to begin with.

 

> We'll have to wait for either Sam or
> Tomek to help out who have admin access.
Comment 19 Steffen Pingel CLA 2013-06-25 13:50:34 EDT
(In reply to comment #18)
> Ok, I assumed the Gerrit Server was a Vagrant instance just like the local ones
> we use for testing. Perhaps we need to increase the time-out for that URL test
> to begin with.

It's provisioned with puppet but it's a hosted Linux vServer.

I don't think Gerrit actually supports deleting of reviews so we maybe best off clearing and re-provisioning the server every once in a while. 

Miles, if you want to move this forward feel free to push a review for the puppet scripts that supports clearing Gerrit instances prior to re-provisioning them. The Bugzilla scripts do something similar (by checking for a magic file) since Bugzilla doesn't support task deletion either.
Comment 20 Miles Parker CLA 2013-06-26 13:55:36 EDT
(In reply to comment #19)
> Miles, if you want to move this forward feel free to push a review for the
> puppet scripts that supports clearing Gerrit instances prior to re-provisioning
> them. The Bugzilla scripts do something similar (by checking for a magic file)
> since Bugzilla doesn't support task deletion either.

Ouch, hot potato. ;D Just a note that I'm not likely to get time to get up to speed on puppet etc. in the near future so perhaps this is something that Tomek can take a look at when he returns. Catch!
Comment 21 Tomasz Zarna CLA 2013-07-02 07:00:30 EDT
(In reply to comment #20)
> Ouch, hot potato. ;D Just a note that I'm not likely to get time to get up
> to speed on puppet etc. in the near future so perhaps this is something that
> Tomek can take a look at when he returns. Catch!

I don't mind getting more familiar with puppet and Mylyn tests infrastructure in general, but I think I should focus on bug 395059 now.
Comment 22 Miles Parker CLA 2013-07-02 12:36:53 EDT
Yep, makes sense. I don't think there is a particular rush, esp. given that you have the keys to the server and can give it a kick if you need to.
Comment 23 Steffen Pingel CLA 2013-07-02 12:50:58 EDT
We should get this fixed as soon as possible since tests and hence builds will continue to fail. We shouldn't be publishing builds until this is fixed (even if the temporary solution is to clean up the database and restart the service again).
Comment 24 Miles Parker CLA 2013-07-02 16:16:17 EDT
(In reply to comment #23)
> We should get this fixed as soon as possible since tests and hence builds will
> continue to fail. We shouldn't be publishing builds until this is fixed (even if
> the temporary solution is to clean up the database and restart the service
> again).

Fair point. I can play around w/ Puppet a bit as well, as I need to get up to speed on it too.
Comment 25 Miles Parker CLA 2013-07-05 13:42:01 EDT
One strategy would be to delete the project. That seems possible, but kind of heinous:

http://code.google.com/p/gerrit/issues/detail?id=349
Comment 26 Tomasz Zarna CLA 2013-07-08 04:43:25 EDT
It looks that GerritUrlHandlerTest.testOpenUrl() fails, because it blocks while calling RemoteUiService.modelExec(Runnable, boolean). Once the call is non-blocking you get an error msg saying "Problem while retrieving Gerrit review.: Not Found null," which means that the connector is unable open a review from Gerrit 2.6 (see bug 395059).
Comment 27 Miles Parker CLA 2013-07-08 14:10:33 EDT
(In reply to comment #26)
> It looks that GerritUrlHandlerTest.testOpenUrl() fails, because it blocks while
> calling RemoteUiService.modelExec(Runnable, boolean). Once the call is
> non-blocking you get an error msg saying "Problem while retrieving Gerrit
> review.: Not Found null," which means that the connector is unable open a review
> from Gerrit 2.6 (see bug 395059).

Ok, so what I'm getting from this is that we have a failure that is surfacing for 2.6 and that failure is legitimate. But as we don't support 2.6 yet, why is this test active, and more importantly, how can we be passing all of the other tests? ;)
Comment 28 Tomasz Zarna CLA 2013-07-08 15:00:48 EDT
(In reply to comment #27)
> But as we don't support 2.6 yet, why is this test active, 
> and more importantly, how can we be passing all of the other tests? ;)

The test uses a hard-coded [1] fixture pointing to git.eclipse.org [2].

[1] https://git.eclipse.org/c/mylyn/org.eclipse.mylyn.reviews.git/tree/org.eclipse.mylyn.gerrit.tests/src/org/eclipse/mylyn/gerrit/tests/ui/GerritUrlHandlerTest.java#n52

[2] https://git.eclipse.org/c/mylyn/org.eclipse.mylyn.reviews.git/tree/org.eclipse.mylyn.gerrit.tests/src/org/eclipse/mylyn/gerrit/tests/support/GerritFixture.java#n24
Comment 29 Steffen Pingel CLA 2013-07-09 04:34:56 EDT
I'd be helpful to provision a Gerrit instance on the test server that didn't rely on HTTP authentication but also allowed anonymous access so we don't have to rely on Eclipse.org for testing.
Comment 30 Miles Parker CLA 2013-07-09 12:35:50 EDT
(In reply to comment #29)
> I'd be helpful to provision a Gerrit instance on the test server that didn't
> rely on HTTP authentication but also allowed anonymous access so we don't have
> to rely on Eclipse.org for testing.

+1, especially if we are to properly test for changes in authorization configuration from anon and back again. See https://git.eclipse.org/r/#/c/14294/6/org.eclipse.mylyn.gerrit.tests/src/org/eclipse/mylyn/gerrit/tests/core/client/GerritClientTest.java.
Comment 31 Tomasz Zarna CLA 2013-07-09 12:42:52 EDT
Shouldn't we mark the test as ignored for now and re-enable it on the first occasion? With the test failing Hudson is going to -1 all changes, making it hard to spot a new test failure.
Comment 32 Steffen Pingel CLA 2013-07-09 13:01:42 EDT
(In reply to comment #31)
> Shouldn't we mark the test as ignored for now and re-enable it on the first
> occasion? With the test failing Hudson is going to -1 all changes, making it
> hard to spot a new test failure.

Please feel free to do that and schedule a bug to re-enable it.
Comment 33 Miles Parker CLA 2013-07-09 13:20:34 EDT
(In reply to comment #32)
> (In reply to comment #31)
> > Shouldn't we mark the test as ignored for now and re-enable it on the first
> > occasion? With the test failing Hudson is going to -1 all changes, making it
> > hard to spot a new test failure.
> 
> Please feel free to do that and schedule a bug to re-enable it.

+1, I was almost afraid to suggest that. :)
Comment 34 Tomasz Zarna CLA 2013-07-09 15:23:24 EDT
(In reply to comment #32)
> Please feel free to do that and schedule a bug to re-enable it.

It's bug 412620.
Comment 35 Tomasz Zarna CLA 2013-08-05 09:24:22 EDT
The recent failures in testPerformQueryAnonymous (the test is hanging forever) can have two reasons:
* cloning and push a new review takes very long time, see GerritHarness#ensureOneReviewExists()
* the list of reviews is huge, making the query for all open reviews run very long

In https://git.eclipse.org/r/#/c/15135/ I'm trying to fix them both by cloning/pushing only when there's no review available and by narrowing the query to return a single review (instead of all).
Comment 36 Tomasz Zarna CLA 2013-08-06 07:51:40 EDT
(In reply to comment #35)
> In https://git.eclipse.org/r/#/c/15135/ I'm trying to fix them both by
> cloning/pushing only when there's no review available and by narrowing the query
> to return a single review (instead of all).

Apparently it wasn't enough, since https://hudson.eclipse.org/sandbox/job/mylyn-reviews-gerrit/878/console timed out not even getting to testPerformQueryAnonymous.

FWIW, the whole (passing) build takes now ~23 mins.
Comment 37 Tomasz Zarna CLA 2013-08-06 09:52:07 EDT
I've just finished running a profiler on Gerrit tests to see if there is anything that can be done on our side to make the tests run faster. Over 3/4 of the time threads are sleeping, waiting for a server response. Which kind of confirms what Miles noticed in comment 14, where tests without that extra IO overhead were 3 times faster.
Comment 38 Tomasz Zarna CLA 2013-08-08 09:43:24 EDT
There is a Gerrit plugin [1] for deleting projects which we could run as part of the provisioning job.

[1] https://gerrit.googlesource.com/plugins/delete-project/
Comment 39 Miles Parker CLA 2013-08-08 12:43:38 EDT
(In reply to comment #38)
> There is a Gerrit plugin [1] for deleting projects which we could run as
> part of the provisioning job.
> 
> [1] https://gerrit.googlesource.com/plugins/delete-project/

Yep, see comment 25. :)
Comment 40 Tomasz Zarna CLA 2013-08-09 12:10:40 EDT
A work-in-progress review with delete-project plugin: https://git.eclipse.org/r/#/c/15295/
Comment 41 Steffen Pingel CLA 2013-08-11 23:02:44 EDT
We could also extend the tests to reuse existing reviews in cases where the review is not modified. Assuming that's the vast majority of cases it would slow down the problem.
Comment 42 Steffen Pingel CLA 2013-08-12 04:26:02 EDT
Deleting projects could be an interesting option. I would require tests to first create a unique project though to avoid conflicts with tests running in parallel. I pushed a simple work-around here which extends the puppet scripts with an option to force re-provisining. We could simply run the job once a night for now: https://git.eclipse.org/r/#/c/15354/ .
Comment 43 Tomasz Zarna CLA 2013-08-12 04:40:08 EDT
(In reply to comment #41)
> We could also extend the tests to reuse existing reviews in cases where the
> review is not modified.

I like the idea, but currently the tests don't do much except for pushing a review and then reading it. As they continue to cover more scenarios (like aborting, submitting, reviewing, etc) I can imagine a situation when we spend more time looking for a useful review rather than executing the test itself.

(In reply to comment #42)
> I would require tests to first create a unique project though 
> to avoid conflicts with tests running in parallel.

Nice, that should help.
Comment 44 Steffen Pingel CLA 2013-08-12 07:23:45 EDT
(In reply to comment #43)
> (In reply to comment #41)
> > We could also extend the tests to reuse existing reviews in cases where the
> > review is not modified.
> 
> I like the idea, but currently the tests don't do much except for pushing a
> review and then reading it. As they continue to cover more scenarios (like
> aborting, submitting, reviewing, etc) I can imagine a situation when we spend
> more time looking for a useful review rather than executing the test itself.

I would cache that result and share it between tests so you only do the lookup once for a pre-defined review. Even if some tests still need to create separate reviews to set certain properties it could still help (but it's certainly not a general solution to the problem).
Comment 45 Tomasz Zarna CLA 2013-08-13 11:54:44 EDT
(In reply to comment #41)
> We could also extend the tests to reuse existing reviews in cases where the
> review is not modified.

This reminded me about https://git.eclipse.org/r/#/c/15135/ from comment 35.

Steffen's work-around from comment 42 worked great, so I don't think the issue is worth investing more time at the moment. I will try to tackle some of the remaining bugs for 2.0.1.
Comment 46 Sam Davis CLA 2016-01-05 18:26:42 EST
Closing as stale. The test server is reprovisioned every weekend so this is not an issue anymore.