399699 – Minimize number of patch set loading jobs

Bug 399699 - Minimize number of patch set loading jobs

Summary: Minimize number of patch set loading jobs

Status:	RESOLVED FIXED

Alias:	None

Product:	z_Archived
Classification:	Eclipse Foundation
Component:	Mylyn (show other bugs)
Version:	unspecified
Hardware:	All All

Importance:	P3 normal (vote)
Target Milestone:	2.0
Assignee:	Steffen Pingel
QA Contact:

URL:
Whiteboard:
Keywords:

Duplicates (1):	399697 (view as bug list)
Depends on:
Blocks:

Reported:	2013-01-31 19:48 EST by Miles Parker
Modified:	2013-02-04 13:55 EST (History)
CC List:	2 users (show)

See Also:

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Miles Parker

2013-01-31 19:48:25 EST

Currently, the editor requests all of the patch sets for a given editor at once. Each of these are loaded in a separate jobs. If a review has a large number of patch sets, or a number of reviews are opened at the same time (such as when the workbench starts up and there are a number of reviews open), a large number of jobs can be created. This could create performance issues. (Though the bug reporter hasn't seen any.) We should consider batching all of these changes into either one or a limited number of jobs.

Note that there is an advantage to having multiple calls against the Gerrit API, as the performance of these jobs is likely highly bound to the response time of the Gerrit server; under the assumption that there was no incremental cost to "spamming" the gerrit server, the best performance would be achieved by making all of the API calls at once. So we should trade off the gains from multiple calls against the potential memory and performance costs of multiple jobs. A pooling approach would allow us to support an arbitrary number of jobs and select an optimized value perhaps surfacing a preference for maximum number of jobs to the user.

Comment 1 Miles Parker

2013-01-31 20:08:38 EST

See https://bugs.eclipse.org/bugs/show_bug.cgi?id=334967#c26 and replies for more context.

Comment 2 Sam Davis

2013-02-01 13:59:21 EST

(In reply to comment #0)> 
> Note that there is an advantage to having multiple calls against the Gerrit API,
> as the performance of these jobs is likely highly bound to the response time of
> the Gerrit server; under the assumption that there was no incremental cost to
> "spamming" the gerrit server, the best performance would be achieved by making
> all of the API calls at once.

I think it depends on your definition of best. If you mean minimizing the time to get all patch sets, then maybe, but I think a better definition would be minimizing the time to get the most recent patch set. Does Gerrit allow you to take advantage of the similarity between patch sets by only fetching the deltas between them?

Really, once a patch set has been fetched, it should be cached and never need to be fetched again. This would reduce the impact of this problem a lot. However,  if you open a review for the first time and it already has 40 patch sets, it probably doesn't make sense to fetch them all.

Comment 3 Miles Parker

2013-02-01 15:01:16 EST

(In reply to comment #2)
> (In reply to comment #0)>
> I think it depends on your definition of best. If you mean minimizing the time
> to get all patch sets, then maybe, but I think a better definition would be
> minimizing the time to get the most recent patch set. Does Gerrit allow you to
> take advantage of the similarity between patch sets by only fetching the deltas
> between them?

I'm not sure, we should revisit. If you want, perhaps create an "investigate optrimizing patch set retrieval" bug or something like that?

> Really, once a patch set has been fetched, it should be cached and never need to
> be fetched again. This would reduce the impact of this problem a lot. However,

Exactly.

> if you open a review for the first time and it already has 40 patch sets, it
> probably doesn't make sense to fetch them all.

Yeah. Those should really be demand loaded or at most loaded in a deep background job. This is something that *needs* to be taken care of for Kepler. But just a little history here to be clear on why we did things this way...

1. We needed to be able to display the items in arbitrary patch sets within navigator.
2. It didn't seem to make any sense at all to load these seperatly in navigator and editor, so editor manages that for now.
3. There wasn't any way to favor one vs. the other, though I suppose we could do these in reverse order.
4. We could write a bunch of code to optimize this, but that would end up being dead code because..
5. We couldn't really demand load them easily in navigator, because creating a notification mechanism for navigator would require tight coupling and again most of that code would end up being tossed.
6. We're going to be implementing a complete model side load for Gerrit. It should be quite easy to tune this for any load and notifcation scenario we want.

We're working out the details on this and will have a more concrete plan of attack next week.

Comment 4 Steffen Pingel

2013-02-01 17:44:41 EST

(In reply to comment #3)
> (In reply to comment #2)
> > (In reply to comment #0)>
> > I think it depends on your definition of best. If you mean minimizing the time
> > to get all patch sets, then maybe, but I think a better definition would be
> > minimizing the time to get the most recent patch set. Does Gerrit allow you to
> > take advantage of the similarity between patch sets by only fetching the
> deltas
> > between them?
> 
> I'm not sure, we should revisit. If you want, perhaps create an "investigate
> optrimizing patch set retrieval" bug or something like that?

Let's not spent time on pre-mature optimizations. We should consider using git to fetch the patch sets which has a bunch of optimizations of that kind anyways. 

> > if you open a review for the first time and it already has 40 patch sets, it
> > probably doesn't make sense to fetch them all.
> 
> Yeah. Those should really be demand loaded or at most loaded in a deep
> background job. This is something that *needs* to be taken care of for Kepler.
> But just a little history here to be clear on why we did things this way...
> 
> 1. We needed to be able to display the items in arbitrary patch sets within
> navigator.
> 2. It didn't seem to make any sense at all to load these seperatly in navigator
> and editor, so editor manages that for now.
> 3. There wasn't any way to favor one vs. the other, though I suppose we could do
> these in reverse order.

Yes, that sounds like a good plan and doing that in a single job. It should be reasonably straight forward to change the code accordingly.

Comment 5 Miles Parker

2013-02-01 18:07:27 EST

(In reply to comment #4)
> (In reply to comment #3)
> > 3. There wasn't any way to favor one vs. the other, though I suppose we could
> do
> > these in reverse order.
> 
> Yes, that sounds like a good plan and doing that in a single job. It should be
> reasonably straight forward to change the code accordingly.

I think it's a bit more complicated to do this as we'd still need to load the item sets even if we differ the control creation. (It probably makes sense to tackle implement bug 399697 at the same time, since we'll be tearing apart the same code.) We'd need to do something like this:

1. Create the subsection.
2. Begin loading the item sets. (But in reverse order from the section creation code!)
3. On user expand, create the controls. (As it worked before)
4. When item sets are created update the values.

If we do that, it would probably be implementing _something_ sort of like my first attempt at bug 386705. See https://git.eclipse.org/r/#/c/10057/2/org.eclipse.mylyn.gerrit.ui/src/org/eclipse/mylyn/internal/gerrit/ui/editor/PatchSetSection.java.  That is, there is a tight coordination needed between item sets and their related pieces. Or actually, we probably just want to create a class for the SubSection itself.

Just to say that it isn't that straightforward. But OTOH, we may want to do something *like* this when we implement the model stuff anyway. I'm just still not sure it makes sense to tackle this before then..

Comment 6 Steffen Pingel

2013-02-01 18:19:34 EST

I'm not even sure that I agree with the basic assumption and all patch sets need to be downloaded eagerly. What if we restored the original behavior and only downloaded the open patch set which would default to the latest patch set? We could add an expand all button to easily fetch all patch sets as needed.

Comment 7 Miles Parker

2013-02-01 18:25:33 EST

(In reply to comment #6)
> I'm not even sure that I agree with the basic assumption and all patch sets need
> to be downloaded eagerly. What if we restored the original behavior and only
> downloaded the open patch set which would default to the latest patch set? We
> could add an expand all button to easily fetch all patch sets as needed.

Remember again that this need is being driven by Navigator *as it is currently implemented*, not Editor, and the reason that we have an issue w/ Navigator is not that we can't difer loading for that, but that we don't have a clean strategy for requesting and notification of changes that doesn't couple w/ editor. I'm just making the argument that we should differ implementation until after we have the model loading entirely decoupled from the interface. Then we can easily persue whatever loading strategies we want as appropriate.

Comment 8 Steffen Pingel

2013-02-01 19:26:51 EST

(In reply to comment #7)
> Remember again that this need is being driven by Navigator *as it is currently
> implemented*, not Editor, and the reason that we have an issue w/ Navigator is
> not that we can't difer loading for that, but that we don't have a clean
> strategy for requesting and notification of changes that doesn't couple w/
> editor. I'm just making the argument that we should differ implementation until
> after we have the model loading entirely decoupled from the interface. Then we
> can easily persue whatever loading strategies we want as appropriate.

Sure, but the changes have introduced a performance regression and I am starting to doubt the need to have all patch sets downloaded right away. Right now loading is still triggered by the editor and we have all infrastructure in place to load patch sets lazily.

This will certainly change in the future but for now I would prefer to go back to what we previously had and improve on that rather than release a significant performance regression and then try to find a solution later: https://git.eclipse.org/r/10127

Comment 9 Miles Parker

2013-02-01 19:37:12 EST

Yeah... It's a cost/benefit between a) the performance regression against b) the UI confusion for users who can't figure out why their patch set contents aren't being shown, and you're saying a) is better. It sort of makes a lot of the features of the review navigator pointless for now, but I can both sides.

Comment 10 Steffen Pingel

2013-02-02 17:44:45 EST

I have merged the change to restore the old behavior. We can reopen this bug if necessary once we look into decoupling fetching of change sets from the editor.

Comment 11 Steffen Pingel

2013-02-02 17:46:20 EST

*** Bug 399697 has been marked as a duplicate of this bug. ***

Comment 12 Sam Davis

2013-02-04 13:37:41 EST

(In reply to comment #9)
> Yeah... It's a cost/benefit between a) the performance regression against b) the
> UI confusion for users who can't figure out why their patch set contents aren't
> being shown, and you're saying a) is better. It sort of makes a lot of the
> features of the review navigator pointless for now, but I can both sides.

Perhaps it would be worth adding a button in the navigator to fetch all patch sets as an interim solution.

Comment 13 Miles Parker

2013-02-04 13:55:03 EST

(In reply to comment #12)
> (In reply to comment #9)
> Perhaps it would be worth adding a button in the navigator to fetch all patch
> sets as an interim solution.

Yeah, I think Steffen suggested something like that above. And we might want that in any case. But I don't think it makes sense to go through the excercise of coupling back to the editor at this point. Hopefully we'll get that remote API etc.. in soon enough to render the issue moot!