Bug 249681 - [DB] Decrease amount of database queries
Summary: [DB] Decrease amount of database queries
Status: CLOSED FIXED
Alias: None
Product: EMF
Classification: Modeling
Component: cdo.db
Version: 2.0
Hardware: PC Windows XP
Importance: P3 enhancement
Target Milestone: M4
Assignee: Stefan Winkler CLA
QA Contact:
URL:
Whiteboard: Lighter, Faster and Better
Keywords:
Depends on: 214487
Blocks:
 
Reported: 2008-10-04 12:09 EDT by Stefan Winkler CLA
Modified: 2010-06-29 04:19 EDT (History)
2 users

See Also:
stepper: galileo+
stepper: review+


Attachments
Proposed patch (32.17 KB, patch)
2008-10-04 12:09 EDT, Stefan Winkler CLA
no flags
Patch v2 (35.33 KB, patch)
2008-10-07 06:19 EDT, Eike Stepper CLA
no flags
New patch: batched prepared Statements. (14.78 KB, patch)
2008-11-28 07:32 EST, Stefan Winkler CLA
no flags
Patch v4 - ready to be committed (16.16 KB, patch)
2008-11-28 13:56 EST, Eike Stepper CLA
no flags

Description Stefan Winkler CLA 2008-10-04 12:09:56 EDT
Created attachment 114262 [details]
Proposed patch

As discussed in the newsgroup one week ago, CDO's DB backend issues a lot of single SQL queries - even when the revision manager already knows that it has to load many revisions.

This can be optimized by combining queries, which reduces overhead on the database side (less SQL parsing and fewer - or more efficient - table accesses, etc.).
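For illustration, the combining step amounts to building one statement from many (a sketch only; the table and column names here are placeholders, not the actual CDO DB schema):

```java
import java.util.List;
import java.util.stream.Collectors;

public class MultiSelectSketch
{
  // Combine N per-ID lookups into a single query using an IN clause.
  // "cdoid" and the table name are illustrative, not the real schema.
  public static String buildMultiSelect(String table, List<Long> ids)
  {
    String idList = ids.stream().map(String::valueOf).collect(Collectors.joining(", "));
    return "SELECT * FROM " + table + " WHERE cdoid IN (" + idList + ")";
  }
}
```

Instead of issuing one SELECT per revision, the revision manager could collect the IDs it knows it needs and issue one such statement per table.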

For example, I created a test case in which I wrote a resource plus 200 customers, 200 companies, and 200 suppliers into a database (I am referring to the test Model1 classes).

Opening a new session and accessing the resource's contents results in 2634 single SQL queries!

Over the weekend I thought of a solution to the polymorphism problem discussed in the newsgroup. I successfully implemented this in the horizontal mapping strategy. I repeated the same test as described above and came down to just 45 SQL queries!

This also led to a runtime boost of about 40% in the case above.

Please have a look at the patch - perhaps some details and special cases are missing, but it seems to work in my test setting.
Comment 1 Eike Stepper CLA 2008-10-07 06:19:48 EDT
Created attachment 114403 [details]
Patch v2

I've refactored your proposal a bit:

- Fixed API warnings
- Used interfaces in signatures
- Moved default impl of readRevisions() into StoreAccessor (so that the HibernateStore compiles)

I think the two remaining issues are:

1) AllTests does not pass without failures anymore
2) We would need the same optimization for readRevisionsByTime()

Another concern is the bloating of interfaces but I think I can consolidate this when everything is working fine. Apart from these comments, well done! ;-)
Comment 2 Stefan Winkler CLA 2008-10-07 06:43:13 EDT
Hi,

WRT the bloating of interfaces: maybe the interfaces could be reworked so that we only have multi-reads (and, as a next step, multi-writes) at the internal level.
Then, from the topmost layer, getRevision(ID) would be replaced by getRevisions(List<ID>), making the single-item methods (getRevision, readRevision, etc.) obsolete.
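Such a consolidation could look roughly like this (an interface sketch; the names are illustrative, not the final CDO API):

```java
import java.util.List;

// Hypothetical consolidated read interface: only the multi-read form
// remains abstract; a single-revision read is just a one-element list.
public interface RevisionReader<ID, REV>
{
  List<REV> getRevisions(List<ID> ids);

  // The former single-item method becomes a convenience default.
  default REV getRevision(ID id)
  {
    return getRevisions(List.of(id)).get(0);
  }
}
```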

WRT testing: is there a procedure to test that a revision has actually been written correctly into the database? Just opening a view and rereading the revision at test-case level only reads from the cache, right?
Comment 3 Eike Stepper CLA 2008-10-07 06:51:46 EDT
(In reply to comment #2)
> WRT the bloating of interfaces: Maybe the interfaces could be reworked so that
> we only have multi-reads (and as a next step multi-writes) at the internal
> level.
> So that from the topmost layer getRevision(ID) would be replaced by
> getRevisions(List<ID>) thus making the single-item-methods (getRevision,
> readRevision, etc.) obsolete.

Yes, that's what I had in mind. As I said, I can do this when everything is working ;-)

 
> WRT testing, is there a procedure to test that a revision has been actually
> written correctly into the database? Just opening a view and rereading the
> revision at test-case-level only reads from the cache, right?

Opening a new session would at least bypass former client caches. I will add a method to the test framework to purge the server caches as well. Please watch bug #249937
Comment 4 Simon Mc Duff CLA 2008-10-07 08:29:51 EDT
(In reply to comment #3)
> (In reply to comment #2)
> > WRT the bloating of interfaces: Maybe the interfaces could be reworked so that
> > we only have multi-reads (and as a next step multi-writes) at the internal
> > level.
> > So that from the topmost layer getRevision(ID) would be replaced by
> > getRevisions(List<ID>) thus making the single-item-methods (getRevision,
> > readRevision, etc.) obsolete.
> 
> Yes, that's what I had in mind. As I said, I can do this when everything is
> working ;-)
> 
> 
> > WRT testing, is there a procedure to test that a revision has been actually
> > written correctly into the database? Just opening a view and rereading the
> > revision at test-case-level only reads from the cache, right?
> 
> Opening a new session would at least bypass former client caches. I will add a
> method to the test framework to purge the server caches as well. Please watch
> bug #249937
> 

I was using the following so far to purge the current revision.

  protected void removeAllRevisions(IRevisionManager revisionManager)
  {
    CDORevisionCache cache = ((RevisionManager)revisionManager).getCache();
    for (CDORevision revision : cache.getRevisions())
    {
      cache.removeRevision(revision.getID(), revision.getVersion());
    }
  }
Comment 5 Stefan Winkler CLA 2008-10-10 08:10:12 EDT
Hi,

I just detected a problem with HSQLDB using both indexes (see Bug 250411) and this patch.
As described in this message (https://sourceforge.net/forum/message.php?msg_id=1978354 ), if an index TABLE(COLUMN1) exists in HSQLDB, the query SELECT * FROM TABLE WHERE COLUMN1 = 1 uses the index. Fine.
However, SELECT * FROM TABLE WHERE COLUMN1 = 1 OR COLUMN1 = 2 does NOT use the index, which renders the optimization proposed in this patch useless: it is more efficient (on HSQLDB) to issue two separate queries of the first form instead of one of the second.
As long as this issue is not fixed in HSQLDB, the patch above either has to be held back, or it has to be refactored so that the optimizations go into specific DB adapters. (BTW, the multi-select would work if it were issued as SELECT * FROM TABLE WHERE COLUMN1 = 1 UNION SELECT * FROM TABLE WHERE COLUMN1 = 2, but I am not sure whether this is optimal for MySQL or Derby.)
Comment 6 Eike Stepper CLA 2008-10-12 05:27:07 EDT
Is it correct that you refer to the SQL code generated in ClassMapping?

  protected void readAttributes(IDBStoreReader storeReader, List<InternalCDORevision> revisions, String where,
      boolean readVersion)
  {
    StringBuilder builder = new StringBuilder(readVersion ? selectPrefixWithVersion : selectPrefix);
    builder.append("( ");

    Iterator<InternalCDORevision> iterator = revisions.iterator();
    while (iterator.hasNext())
    {
      InternalCDORevision revision = iterator.next();
      long id = CDOIDUtil.getLong(revision.getID());
      builder.append(CDODBSchema.ATTRIBUTES_ID);
      builder.append(" = ");
      builder.append(id);
      if (iterator.hasNext())
      {
        builder.append(" OR ");
      }
    }

Questions:

1) Wouldn't it be a shorter SQL statement if we use " <column> IN (v1, v2, v3, ...)" ?
2) Would the HSQLDB problem apply to a statement like in 1) as well?
3) Could we circumvent the whole problem by using a parameterized PreparedStatement? I've cc'ed you to bug #214487 as well ...

Comment 7 Eike Stepper CLA 2008-10-12 05:36:57 EDT
Btw. There's a conflict now in the latest patch ;-(
Comment 8 Stefan Winkler CLA 2008-10-13 07:04:52 EDT
(In reply to comment #6)
> Is it correct that you refer to the SQL code generated in ClassMapping?
yes.

> 1) Wouldn't it be a shorter SQL statement if we use " <column> IN (v1, v2, v3,
> ...)" ?
Yes, you are right. I made a mistake in my first test, which gave me the impression that HSQLDB would not support the IN () syntax.
So, yes, that code should be changed to IN (...) in the next patch version.

> 2) Would the HSQLDB problem apply to a statement like in 1) as well?
Sadly, yes. (You can try "EXPLAIN PLAN FOR (select * from ... where ...)", which lists access = [FULL_SCAN] in both cases. :-( )

> 3) Could we circumvent the whole problem by using a parameterized
> PreparedStatement? I've cc'ed you to bug #214487 as well ...

As I understand prepared statements (someone correct me if I'm mistaken), they only come with a fixed number of parameters.
Consequently, we would have to create a prepared statement depending on the number of revisions to be read, which in my eyes defeats the whole point of prepared statements.

Maybe it would be a better idea to use prepared statements INSTEAD of reading multiple revisions.
Prepared statements pre-parse and pre-plan the execution, so reading multiple revisions using single prepared statements still saves the overhead of parsing SQL for every single query.
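A middle ground would be to generate the placeholder form once per batch size and cache the resulting PreparedStatement, so the database still pre-parses and pre-plans (a sketch under that assumption; table and column names are illustrative):

```java
import java.util.Collections;

public class PreparedSelectSketch
{
  // Build a parameterized IN clause with 'count' placeholders.
  // A PreparedStatement created from this string can be cached per
  // count and reused; only the bound values change between executions.
  public static String buildParameterizedSelect(String table, int count)
  {
    String placeholders = String.join(", ", Collections.nCopies(count, "?"));
    return "SELECT * FROM " + table + " WHERE cdoid IN (" + placeholders + ")";
  }
}
```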
Comment 9 Eike Stepper CLA 2008-10-13 07:37:55 EDT
> Maybe it would be a better idea to use prepared statements INSTEAD of reading
> multiple revisions. 
> Prepared statements pre-parse and pre-plan the execution and so reading
> multiple revisions using single prepared statements still save the overhead of
> parsing sql for every single query.

Yes, I think that was my intention ;-)
And we should also not forget that your approach doesn't scale so well, even if we change it to use "IN (...)"...

Comment 10 Simon Mc Duff CLA 2008-11-04 10:02:55 EST
Stefan, do you plan to use batching as well? What is really slow is every single call to the server.

Or maybe just keep that in mind when refactoring, so that it is easier to add afterward.
Comment 11 Stefan Winkler CLA 2008-11-05 03:10:43 EST
(In reply to comment #10)
> Stefan, do you plan to use batch as well ? Since what it is very slow is every
> single call to the server ?
> 
> Or maybe just have that in mind when refactoring.. to be easier to add it
> afterward.
Yes, that's what I was thinking, too.

Comment 12 Stefan Winkler CLA 2008-11-17 05:20:34 EST
Simon,

What do you think about the following idea:
As writing to the database is done all at once (at CDO commit time), we could do an addBatch instead of each executeUpdate, and an executeBatch just before doing the commit. This way we would collect all updates at the CDO server and send them to the database all at once.
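The control flow described here can be modeled without JDBC for clarity (a sketch; with a real driver, add() would correspond to PreparedStatement.addBatch() and flush() to executeBatch() followed by the connection commit):

```java
import java.util.ArrayList;
import java.util.List;

public class BatchedWriteSketch
{
  private final List<String> pending = new ArrayList<>();

  // Queue an update instead of executing it immediately
  // (the JDBC analogue is PreparedStatement.addBatch()).
  public void add(String sql)
  {
    pending.add(sql);
  }

  // Send all queued updates in one round trip, just before the commit
  // (the JDBC analogue is executeBatch() + Connection.commit()).
  public int flush()
  {
    int count = pending.size();
    pending.clear();
    return count;
  }
}
```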

Do you see a problem with distributed transactions / XATransactions? Neither Eike nor I know very much about that topic.
Comment 13 Simon Mc Duff CLA 2008-11-17 07:39:30 EST
(In reply to comment #12)
> Simon,
> 
> What do you think about the following idea:
> As writing to the database is done all-at-once (at CDO commit time) we could do
> an addBatch instead of each executeUpdate and executeBatch just before doing
> the commit. This way we would collect all updates at the CDO server and send
> them to the database all at once. 
> 
> Do you see a problem with distributed transactions / XATransactions? Neither
> Eike nor me don't know very much about that topic.
> 

hi Stefan,

What is important is to make the CDOIDs of the objects available separately from the commit.


In IStoreAccessor.write(): populate the mappings (addApplyMapping).

In IStoreAccessor.commit(): commit the changes.


Also, IStoreAccessor.commit should be really fast and no error is allowed there - meaning it should never fail. So the validation by the back-end should be done before that (in the write).

I think if we do:

IStoreAccessor.write
  addBatch
  ....
  addBatch
  executeBatch (only once at the end)

IStoreAccessor.commit
  commit

it should be efficient... What do you think?
Comment 14 Stefan Winkler CLA 2008-11-17 08:05:32 EST
> I think if we do.
> 
> IStoreAccesssor.write
> addBatch
> ....
> addBatch
> executeBatch (only once at the end)
> 
> IStoreAccessor.commit
> commit
> 
> 
> It should be efficient... What do you think ?
> 

But how do we know that the end is reached?
I agree that we want to do the executeBatch immediately before the commit, which is why I initially proposed to executeBatch in Store.commit(). But if that's not possible, I don't know how the store can figure out whether that's the end of the data or whether more is coming before the commit.


Comment 15 Simon Mc Duff CLA 2008-11-17 08:09:08 EST
(In reply to comment #14)
> > I think if we do.
> > 
> > IStoreAccesssor.write
> > addBatch
> > ....
> > addBatch
> > executeBatch (only once at the end)
> > 
> > IStoreAccessor.commit
> > commit
> > 
> > 
> > It should be efficient... What do you think ?
> > 
> 
> But how do we know that the end is reached? 

at the end of IStoreAccessor.write.

> I agree that we want to do the executeBatch immediately before the commit, and
> that is why I have proposed initially to executeBatch in Store.commit(). But if
> that's not possible, I don't know how the store can figure out, if that's the
> end of data or if there's more coming before the commit.
(It doesn't happen so far.) But in that case executeBatch will be done as many times as IStoreAccessor.write is called. I don't expect write to be called many times before a commit - maybe one, two, or three times.
> 

Comment 16 Stefan Winkler CLA 2008-11-27 11:26:32 EST
Eike, Simon, 

I started working on the patch today but got stuck. Maybe one of you has enough insight to give a statement:
Does the DBStore as it currently is implemented rely on the statements being executed in the correct order?

Example:
INSERT ObjectV1
REVISE ObjectV1
INSERT ObjectV2

Could this happen during the same commit operation - or to be more specific, could this happen between two calls of IStoreAccessor.write() ?

What brought up this idea is the fact that detachObject currently results in the quite unspecific
UPDATE Foobar SET revised = <timestamp> WHERE cdoid = <id> AND revised = 0
Since there is no version in it, this sets all unrevised versions to revised, which is currently OK because there is at most one unrevised version if the DB state is consistent. However, if we use deferred execution with batches and the case above could happen, then we would get inconsistent states.
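For illustration, a version-qualified variant of the revise statement would touch exactly one row (hypothetical SQL; the actual column names may differ):

```java
public class ReviseSqlSketch
{
  // The current form revises every unrevised row for the id:
  public static String reviseUnqualified(String table)
  {
    return "UPDATE " + table + " SET revised = ? WHERE cdoid = ? AND revised = 0";
  }

  // A version-qualified form targets exactly one row, so batched,
  // deferred execution cannot accidentally revise the wrong version:
  public static String reviseByVersion(String table)
  {
    return "UPDATE " + table + " SET revised = ? WHERE cdoid = ? AND version = ?";
  }
}
```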

So, in short: is it possible that
a. an object is detached and then reattached?
b. with the same transaction?
c. and does it get the same CDOID or is a new ID assigned on reattaching?

The other case from above would be that a revision is inserted and (explicitly) revised (by version) within the same call of IStoreAccessor.write. Is this case possible? 
Comment 17 Simon Mc Duff CLA 2008-11-27 11:34:59 EST
(In reply to comment #16)
> Eike, Simon, 
> 
> I started working on the patch today but got stuck. Maybe one of you has enough
> insight to give a statement:
> Does the DBStore as it currently is implemented rely on the statements being
> executed in the correct order?
> 
> Example:
> INSERT ObjectV1
> REVISE ObjectV1
> INSERT ObjectV2
> 
> Could this happen during the same commit operation - or to be more specific,
> could this happen between two calls of IStoreAccessor.write() ?
> 
> What brought up this idea was the fact that detachObject currently results in
> the quite unspecified
> UPDATE Foobar SET revised = <timestamp> WHERE cdoid = <id> AND revised = 0
> Since there's no version in there, this sets all unrevised versions to revised,
> which is currently ok because there at most one unrevised version if the DB
> state is consistent. However, if we use deferred execution with batches and if
> the case above could happen, then we would get inconsistent states.
> 
> So, in short: is it possible that
> a. an object is detached and then reattached?
At the client side, yes... but for CDO it is a different object. The CDOID should not be reassigned to another object.
> b. with the same transaction?
> c. and does it get the same CDOID or is a new ID assigned on reattaching?
b and c are not valid, since we cannot reattach objects.

Maybe Eike has a different opinion?
> 
> The other case from above would be that a revision is inserted and (explicitly)
> revised (by version) within the same call of IStoreAccessor.write. Is this case
> possible? 
A new object cannot be revised in the same call. Is that your question?

Basically, an object in one single commit could be:
- NewObjects, or
- Detach Objects, or
- DirtyObjects

They are exclusive.

Let me know if I didn't answer some questions... :-) or if you need more details.

Simon


Comment 18 Eike Stepper CLA 2008-11-28 03:25:09 EST
(In reply to comment #16)
> Does the DBStore as it currently is implemented rely on the statements being
> executed in the correct order?

I think so, but this should never happen during one commit operation!

> Example:
> INSERT ObjectV1
> REVISE ObjectV1
> INSERT ObjectV2

An object can generally not be added twice to the repository with the same CDOID.

> Could this happen during the same commit operation - or to be more specific,
> could this happen between two calls of IStoreAccessor.write() ?
> 
> What brought up this idea was the fact that detachObject currently results in
> the quite unspecified
> UPDATE Foobar SET revised = <timestamp> WHERE cdoid = <id> AND revised = 0
> Since there's no version in there, this sets all unrevised versions to revised,
> which is currently ok because there at most one unrevised version if the DB
> state is consistent. However, if we use deferred execution with batches and if
> the case above could happen, then we would get inconsistent states.

If what you describe is currently happening in the DBStore, I guess we already have a problem *before* that happens. Only one of the revisions of the same object (CDOID) may have revised == 0!! If that is not the case at all times, we have a bug there! Can you confirm this, Stefan?

> So, in short: is it possible that
> a. an object is detached and then reattached?
> b. with the same transaction?

Yes.

> c. and does it get the same CDOID 

No.

> or is a new ID assigned on reattaching?

Yes.
Comment 19 Stefan Winkler CLA 2008-11-28 07:18:35 EST
(In reply to comment #18)
> I think so, but this should never happen during one commit operation!
As just discussed on Skype, this is not entirely true, as 
- revising serves as a locking mechanism
- checkDuplicateResources queries the database after reviseRow has been called, which does not work correctly if the revise operation is deferred.

As a consequence, only insert operations will be deferred, while revise operations will be executed immediately.
Comment 20 Stefan Winkler CLA 2008-11-28 07:32:08 EST
Created attachment 118988 [details]
New patch: batched prepared Statements.
Comment 21 Eike Stepper CLA 2008-11-28 13:56:26 EST
Created attachment 119038 [details]
Patch v4 - ready to be committed

For comparatively small commits I couldn't see any performance difference, but I guess that's expected ;-)
Comment 22 Eike Stepper CLA 2008-11-28 13:58:05 EST
Stefan, please check that you don't commit changes in some launch configs! I might have accidentally included them in patch v4 ;-(
Comment 23 Stefan Winkler CLA 2008-12-01 08:33:04 EST
Committed to HEAD.

I got a very significant performance gain for commits of 1000 objects.
Comment 24 Eike Stepper CLA 2008-12-19 12:36:49 EST
Fix available in CDO 2.0.0M4
Comment 25 Eike Stepper CLA 2009-06-27 11:53:22 EDT
Generally available.