272475 – [DB] Increase DBStore performance for isMany() attributes

Bug 272475 - [DB] Increase DBStore performance for isMany() attributes

Summary: [DB] Increase DBStore performance for isMany() attributes

Status:	NEW

Alias:	None

Product:	EMF
Classification:	Modeling
Component:	cdo.db (show other bugs)
Version:	4.13
Hardware:	All All

Importance:	P3 enhancement (vote)
Target Milestone:	---
Assignee:	Stefan Winkler
QA Contact:

URL:
Whiteboard:
Keywords:

Depends on:	272478 366686 367356
Blocks:
	Show dependency tree

Reported:	2009-04-16 05:58 EDT by Stefan Winkler
Modified:	2020-12-11 10:37 EST (History)
CC List:	4 users (show)

See Also:

Attachments
algorithm (8.51 KB, text/plain) 2009-04-19 11:05 EDT, Stefan Winkler	no flags	Details
View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Stefan Winkler

2009-04-16 05:58:12 EDT

This is a container for discussion and realization of ideas and enhancements which target the improvement of isMany() attribute handling (both references and attributes) in the DBStore.

Comment 1 Stefan Winkler

2009-04-16 06:11:35 EDT

Idea#1:
include a hint in CDO

Comment 2 Stefan Winkler

2009-04-16 06:13:08 EDT

sorry - please ignore comment#1.

Idea#1: 
Provide native support for attributes having isOrdered() == false (see Bug 272478).

Comment 3 Stefan Winkler

2009-04-17 04:02:03 EDT

Idea #2:
Simon, is it possible to let CDOAddFeatureDelta and CDORemoveFeatureDelta carry a flag indicating that this is the last item in the list so that move1up/move1down do not have to be executed?

Comment 4 Stefan Winkler

2009-04-17 04:18:21 EDT

Idea #3:
This is a bit unconventional, but it came to me yesterday evening. 
Before really going into this, the DBStore should be profiled to get to know up to which degree  move1up/move1down operations are a performance issue. If they are we could implement a specific list mapping which reflects a linked-list structure (as opposed to the array-list-like structure, we currently use on the database).

This idea works as follows: Currently, we store lists in the database as (cdo_id, index, value).
The new structure would be (cdo_id, index, nextIndex, value) where nextIndex defaults to index+1. This way reading the ordered list by index is still equally fast in the best case.

However instead of adjusting all subsequent indices during random element add/remove operations, we implement add(id, newIndex, newValue) as 

INSERT into list_table VALUES(id, lastIndex, newIndex, newValue) 
UPDATE list_table SET nextIndex = lastIndex WHERE cdo_id = id  AND index = newIndex-1;

and remove(id, index) as 

UPDATE list_table SET nextIndex = (SELECT nextIndex FROM list_table WHERE index = indexToRemove and cdo_id = id) WHERE cdo_id = id AND index = indexToRemove-1;
DELETE FROM list_table WHERE cdo_id = id AND index = indexToRemove;

This way, random list modification would be quicker than before for huge collections. 
Of course, the indices (which are really some kind of ID) get more and more fragmented over time - so they should be reorganized from time to time by reading the list and rewriting it from scratch. This could be done as a maintenance job.

Of course this implementation may be slower in some other cases - however, we could implement it as an alternative and let the user choose (e.g. using EAnnotations as proposed in bug 256635.)

Comment 5 Simon Mc Duff

2009-04-17 06:55:29 EDT

(In reply to comment #3)
> Idea #2:
> Simon, is it possible to let CDOAddFeatureDelta and CDORemoveFeatureDelta carry
> a flag indicating that this is the last item in the list so that
> move1up/move1down do not have to be executed?
> 

humm.. can you know that by accessing the list in CDOListFeatureDelta.getListChanges().get(size()-1) ??


Simon

Comment 6 Stefan Winkler

2009-04-17 08:13:19 EDT

(In reply to comment #5)
> (In reply to comment #3)
> > Idea #2:
> > Simon, is it possible to let CDOAddFeatureDelta and CDORemoveFeatureDelta
> carry
> > a flag indicating that this is the last item in the list so that
> > move1up/move1down do not have to be executed?
> humm.. can you know that by accessing the list in
> CDOListFeatureDelta.getListChanges().get(size()-1) ??
Sorry, that was ambiguous. I did not mean "that this [delta] is the last [delta] in the list [of deltas for this feature]". What I meant was: if the client invokes either 
list.add(item)  // adds item to the end of the list
or
list.remove(item, list.size() -1) // removes item at the end of the list
then the resulting CDO{Add|Remove}FeatureDelta should have the wished-for flag set to true in order to indicate that this operation does not require subsequent list indexes to be shifted when applying the delta.

Stefan

Comment 7 Simon Mc Duff

2009-04-17 08:28:28 EDT

(In reply to comment #6)
> (In reply to comment #5)
> > (In reply to comment #3)
> > > Idea #2:
> > > Simon, is it possible to let CDOAddFeatureDelta and CDORemoveFeatureDelta
> > carry
> > > a flag indicating that this is the last item in the list so that
> > > move1up/move1down do not have to be executed?
> > humm.. can you know that by accessing the list in
> > CDOListFeatureDelta.getListChanges().get(size()-1) ??
> Sorry, that was ambiguous. I did not mean "that this [delta] is the last
> [delta] in the list [of deltas for this feature]". What I meant was: if the
> client invokes either 
> list.add(item)  // adds item to the end of the list
> or
> list.remove(item, list.size() -1) // removes item at the end of the list
> then the resulting CDO{Add|Remove}FeatureDelta should have the wished-for flag
> set to true in order to indicate that this operation does not require
> subsequent list indexes to be shifted when applying the delta.
> 
> Stefan
> 
Since many others operations could come after...We will  need to go through all the list to detect such change.
e.g.:
1- list.add(item1)
2- list.add(item2)
If I understand you correctly only the last one(2) should have the flags in this example. 



I think this is something really specific that you can calculate at the back-end since you have the complete list of the deltas. What do you think ?

In fact, you could calculate ranges of moves based on that list. (Delta)

index 0 to 5 increment by 1;
index 6 to 10 decrement by 5;
etc....

Simon

Comment 8 Stefan Winkler

2009-04-17 08:54:39 EDT

Maybe I have the wrong impression of the list feature dealtas - so let me descibe my impression first to be sure that we don't have a misunderstanding.

Let f be a feature having isMany()==true. In my impression any operation the client performs on the list results in a CDOFeatureDelta, namely:
object.getF().set(index, newValue) -> CDOSetFeatureDelta(index, newValue)
object.getF().add(index, newValue) -> CDOAddFeatureDelta(index, newValue)
object.getF().remove(index) -> CDOSetFeatureDelta(index)
object.getF().move(oldIndex, newIndex) -> CDOSetFeatureDelta(oldIndex, newIndex)
When committing (well, or maybe earlier), all of these feature deltas are added to a CDOListFeatureDelta which is then transmitted to the server which then applies each of these deltas in turn to the stored list on the server. Is this correct so far?

If yes, then consider my implementation on the server as follows:
void apply(CDOAddFeatureDelta delta) {
    move1up(index);
    insertValue(index, value);
}

where move1up is forall ( i > index ) { index[i] = index[i] + 1} so that after move1up there is a hole in the list at position _index_ which is then filled by the insertValue(index, value);

My point is that the call to move1up is unnecessary for each item that is added to the end of the list. E.g.
list = object.getF();  
list.add("one");
list.add("two");
list.add("three");
commit();
would result in 3 calls to move1up, all of which are unnecessary, because they all add to the end of the list as it is right then.
If I knew the list size at the server, I could implement an optimization such as:
void apply(CDOAddFeatureDelta delta) {
    if(index < listSize) {
      move1up(index);
    }
    insertValue(index, value);
}

My problem is, that I do not know the list size at this point because the FeatureDeltas do not give me the information and looking up the information in the database is costly. So what I want is a flag in CDOAddFeatureDelta that indicates that at the time when the item was added to the list, index was equal to list.size(), so that I can apply the optimization given above. 

Instead of the flag, I would also be happy with CDOAddFeatureDelta carrying the information, what list.size() was before or after the add operation was performed on the list.

Is it now more obvious what my request is?

Comment 9 Simon Mc Duff

2009-04-17 10:03:37 EDT

(In reply to comment #8)
> Maybe I have the wrong impression of the list feature dealtas - so let me
> descibe my impression first to be sure that we don't have a misunderstanding.
> 
> Let f be a feature having isMany()==true. In my impression any operation the
> client performs on the list results in a CDOFeatureDelta, namely:
> object.getF().set(index, newValue) -> CDOSetFeatureDelta(index, newValue)
> object.getF().add(index, newValue) -> CDOAddFeatureDelta(index, newValue)
> object.getF().remove(index) -> CDOSetFeatureDelta(index)
> object.getF().move(oldIndex, newIndex) -> CDOSetFeatureDelta(oldIndex,
> newIndex)
> When committing (well, or maybe earlier), all of these feature deltas are added
> to a CDOListFeatureDelta which is then transmitted to the server which then
> applies each of these deltas in turn to the stored list on the server. Is this
> correct so far?
[SIMON] yes. (Small correction... all of these feature deltas are added
to a CDOListFeatureDelta while we go... not when committing)

> 
> If yes, then consider my implementation on the server as follows:
> void apply(CDOAddFeatureDelta delta) {
>     move1up(index);
>     insertValue(index, value);
> }
> 
> where move1up is forall ( i > index ) { index[i] = index[i] + 1} so that after
> move1up there is a hole in the list at position _index_ which is then filled by
> the insertValue(index, value);
> 
> My point is that the call to move1up is unnecessary for each item that is added
> to the end of the list. E.g.
> list = object.getF();  
> list.add("one");
> list.add("two");
> list.add("three");
> commit();
> would result in 3 calls to move1up, all of which are unnecessary, because they
> all add to the end of the list as it is right then.
> If I knew the list size at the server, I could implement an optimization such
> as:
> void apply(CDOAddFeatureDelta delta) {
>     if(index < listSize) {
>       move1up(index);
>     }
>     insertValue(index, value);
> }
> 
> My problem is, that I do not know the list size at this point because the
> FeatureDeltas do not give me the information and looking up the information in
> the database is costly. 
[SIMON] Is that so costly ? Usually to count the number of record is really fast... doesn't need to bring too much record.

So what I want is a flag in CDOAddFeatureDelta that
> indicates that at the time when the item was added to the list, index was equal
> to list.size(), so that I can apply the optimization given above. 
> 
> Instead of the flag, I would also be happy with CDOAddFeatureDelta carrying the
> information, what list.size() was before or after the add operation was
> performed on the list.
> 
> Is it now more obvious what my request is?
> 

Yes I understand. 
Since to add what you requested on each featureDelta... is really specific for only one Store I'm thinking that it not necessary to be in the framework. 

Also,
- You can have that information in the back-end 
or 
- From a previous revision that is already loaded.

First, I'm really surprised that  it is so expensive?
 We only need to do it once per CDOListFeatureDelta ? Right? Not for every elements on that list.

If you tell me it is really expensive (with some explanation to understand why it is so expensive) we could think of a more generic solution.

What do you think ?

Comment 10 Simon Mc Duff

2009-04-17 10:48:32 EDT

Stefan, I really think we can do better.

With the listfeatureDelta we can calculate the index at the end.

We can do and update only once for all row instead of for each featuredelta.

-Calculate row we need to remove
- Remove them
- Calculate the range required
- update the DB 1 request by range

set index = index - 1
from index >= 10 to index <=18 

set index = index + 1
from index >= 0 to index <=5 

etc..etc..
For every element, we will update it's row only once instead of many like the moment.
It will be more efficient... What do you think ?

Comment 11 Stefan Winkler

2009-04-19 11:05:49 EDT

Created attachment 132347 [details]
algorithm

Ok, I squeezed my brain over the past 48 hors and this is, what I came up with.
I read the size of the original list from the database, then basically I create a mapping from source and destination indexes which are initialized as 0->0, 1->1, ... per default.
Then I apply each of the list-feature-deltas recording any effects and finally write the changes to the database based on the effects recorded.

Now, I have neither integrated the algorithm with the real implementation nor have I tested any of it (this attachment is more of a sketch and a swap space for my brains), but basically it could work.

Simon, I also have included handling for cases like

list.add(5, "item");
list.move(5, 7);
list.remove(7);

this sequence of operations results in no change at all.
Also

list.add(5, "item");
list.move(5, 7);

is equal to 

list.add(7, "item");

I'm just curious: Do you handle and simplify these cases somehow, or do you record list deltas as they occur?

Comment 12 Simon Mc Duff

2009-04-19 14:57:36 EDT

(In reply to comment #11)
> Created an attachment (id=132347) [details]
> algorithm
> 
> Ok, I squeezed my brain over the past 48 hors and this is, what I came up with.
> I read the size of the original list from the database, then basically I create
> a mapping from source and destination indexes which are initialized as 0->0,
> 1->1, ... per default.
> Then I apply each of the list-feature-deltas recording any effects and finally
> write the changes to the database based on the effects recorded.
> 
> Now, I have neither integrated the algorithm with the real implementation nor
> have I tested any of it (this attachment is more of a sketch and a swap space
> for my brains), but basically it could work.
> 
> Simon, I also have included handling for cases like
> 
> list.add(5, "item");
> list.move(5, 7);
> list.remove(7);
> 
At the moment, If this happens to the client, you will not received anything.
You can look at the algorithm in CDOListFeatureDeltaImpl.


> this sequence of operations results in no change at all.
> Also
> 
> list.add(5, "item");
> list.move(5, 7);
> 
> is equal to 
> 
> list.add(7, "item");
I don't believe we have this mechanism at the client. Do you treat these cases as well..
> 
> I'm just curious: Do you handle and simplify these cases somehow, or do you
> record list deltas as they occur?
> 
I simplify some cases... but I agree that we could add more sophisticated stuff... if it is not too much processing!
I will look at what you've done.

Also do you adjust the index only once per record ? Or it could still happen you update one row several times ?

Comment 13 Simon Mc Duff

2009-04-19 17:00:34 EDT

I double check the way we handle the following
list.add(5, "item");
list.remove(5);

it seems that we Do not handle these... we should .. :-(

Comment 14 Simon Mc Duff

2009-04-19 17:56:28 EDT

(In reply to comment #13)
> I double check the way we handle the following
> list.add(5, "item");
> list.remove(5);
> 
> it seems that we Do not handle these... we should .. :-(
> 

I mean, We will send to the back-end

- CDOAddFeatureDelta
- CDORemovefeatureDelta.

Simon

Comment 15 Stefan Winkler

2009-04-20 04:51:54 EDT

My algorithm should handle these cases. It is kind of an "list-delta-minimization" algorithm.
Maybe we could use parts of it in the core. (This is, what I meant Bug 266121 to be all about in the first place).

(Just as a side note, in the attachment, there is an error: the destinationIndex should not be assigned NONE (which is 0), but another constant (e.g. -1) which is out of range of regular indices).

However, I will postpone further integration of this algorithm into the DBStore, as I don't have the time right now. If anyone wants faster non-audit-list-delta-processing in the DBStore, wait for Bug 271444 to be committed to HEAD and do the integration based on that implementation.

Comment 16 Eike Stepper

2009-11-01 06:00:42 EST

Rebasing all unresolved enhancement requests to 3.0

Comment 17 Eike Stepper

2010-06-29 04:51:36 EDT

Rebasing all outstanding enhancements requests to version 4.0

Comment 18 Eike Stepper

2011-06-23 03:59:37 EDT

Moving all open enhancement requests to 4.1

Comment 19 Stefan Winkler

2011-12-14 07:05:20 EST

Just as a comment after this bug has remained open for 3 years now.
I want to keep this bug open, because it contains some valuable ideas on further performance improvements. So let this just be an idea-collection meta bug.

Comment 20 Eike Stepper

2012-08-14 22:54:30 EDT

Moving all open issues to 4.2. Open bugs can be ported to 4.1 maintenance after they've been fixed in master.

Comment 21 Eike Stepper

2013-06-27 04:09:34 EDT

Moving all outstanding enhancements to 4.3

Comment 22 Eike Stepper

2014-08-19 09:29:39 EDT

Moving all open enhancement requests to 4.4

Comment 23 Eike Stepper

2014-08-19 09:38:20 EDT

Moving all open enhancement requests to 4.4

Comment 24 Eike Stepper

2015-07-14 02:16:43 EDT

Moving all open bugzillas to 4.5.

Comment 25 Eike Stepper

2016-07-31 00:59:41 EDT

Moving all unaddressed bugzillas to 4.6.

Comment 26 Eike Stepper

2017-12-28 01:09:55 EST

Moving all open bugs to 4.7

Comment 27 Eike Stepper

2019-11-08 02:07:14 EST

Moving all unresolved issues to version 4.8-

Comment 28 Eike Stepper

2019-12-13 12:43:19 EST

Moving all unresolved issues to version 4.9

Comment 29 Eike Stepper

2020-12-11 10:37:36 EST

Moving to 4.13.