Bug 255322 - [DB] Possible leakage with prepared statements
Summary: [DB] Possible leakage with prepared statements
Status: CLOSED FIXED
Alias: None
Product: EMF
Classification: Modeling
Component: cdo.db (show other bugs)
Version: 2.0   Edit
Hardware: PC Windows Vista
: P3 normal (vote)
Target Milestone: M4   Edit
Assignee: Stefan Winkler CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: 214487
  Show dependency tree
 
Reported: 2008-11-14 04:15 EST by Eike Stepper CLA
Modified: 2010-06-29 04:19 EDT (History)
2 users (show)

See Also:
stepper: review+


Attachments
Patch (15.46 KB, patch)
2008-11-17 10:13 EST, Stefan Winkler CLA
no flags Details | Diff
Patch v2 (15.99 KB, patch)
2008-11-21 08:57 EST, Eike Stepper CLA
no flags Details | Diff
Patch-v3 (33.57 KB, patch)
2008-11-21 15:48 EST, Stefan Winkler CLA
no flags Details | Diff
Patch v4 - ready to be committed (61.62 KB, patch)
2008-11-24 07:48 EST, Eike Stepper CLA
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Eike Stepper CLA 2008-11-14 04:15:57 EST
The test case org.eclipse.emf.cdo.tests.TransactionTest.testCreateManyTransactions() shows bad results with prepared statements. Starting with transaction 600 commit times go up considerably. Starting with transaction 750 or so there's nearly no visible progress anymore. I suspect that the prepared statements or their result sets are not closed properly.
Comment 1 Stefan Winkler CLA 2008-11-14 07:41:58 EST
We seem to have a general problem here:

Tested with PreparedStatements 
-> yields the result you gave above.

Tested with NonPreparedStatements 
-> yields almost the same result, only about 100 Transactions later.

Tested with HEAD before the JDBCDelegate commit
-> is ok, but performance seems to decrease with more transaction

Tested with HEAD before the JDBCDelegate commit, but raised the testing amount to 3000 transactions
-> starting at around 2500 transactions we get the same results.


=> it is natural that transaction duration increases over time, because tables get larger and every transaction we have one more reference to store. However in all three cases, performance drops drastically at a certain point (maybe GC starts working?)

I'm just installing TPTP and will try to nail down the leak(s). 
Comment 2 Stefan Winkler CLA 2008-11-14 12:50:27 EST
Strange ...

Earlier today I also got memory issues both with nonprepared statements and with prepared ones (see above). Now I even don't run into big issues with prepared statements. Will continue to investigate this.

However, I have a small advance: 
There seems to be no leak per se (at least that's what I find with TPTP and VisualVM). But what happens in the HSQLDB PreparedStatement case is that PS are not pooled, which leads to a statement being compiled for each and every single reference row that is written to the database. 

When I implemented the PreparedStatement-Delegate I was reading about JDBC that starting with JDBC Sepc. version 3, PS are pooled transparently. It seems that HSQLDB does not implement this standard. 

The solution would be to implement statement pooling in the delegate itself. I'll try that on Monday. 

Any thoughts on this are very welcome :-)
Comment 3 Eike Stepper CLA 2008-11-14 15:56:11 EST
This is what the HSQLDB docs say:

 Starting with HSQLDB 1.7.2, jdbcPreparedStatement objects are backed by a true compiled parameteric representation. Hence, there are now significant performance gains to be had by using a jdbcPreparedStatement object in preference to a jdbcStatement object, if a short-running SQL statement is to be executed more than a small number of times.

When it can be otherwise avoided, it is to be considered poor practice to fully prepare (construct), parameterize, execute, fetch and close a jdbcPreparedStatement object for each execution cycle. Indeed, under HSQLDB 1.8.0, this practice is likely to be noticably less performant for short-running statements than the equivalent process using jdbcStatement objects, albeit far more convenient, less error prone and certainly much less resource-intensive, especially when large binary and character values are involved, due to the optimized parameterization facility.

Instead, when developing an application that is not totally oriented toward the execution of ad hoc SQL, it is recommended to expend some effort toward identifing the SQL statements that are good candidates for regular reuse and adapting the structure of the application accordingly. Often, this is done by recording the text of candidate SQL statements in an application resource object (which has the nice side-benefit of isolating and hiding differences in SQL dialects across different drivers) and caching for possible reuse the PreparedStatement objects derived from the recorded text. 

http://hsqldb.org/doc/src/org/hsqldb/jdbc/jdbcPreparedStatement.html
Comment 4 Stefan Winkler CLA 2008-11-15 07:40:41 EST
(In reply to comment #3)
> This is what the HSQLDB docs say:
[...]

Yes I read that part, too. But what they don't say is what they didn't implement. I had a look into the HSQLDB JDBC Driver sources, and what they really say is:

We have implemented server-side prepared statements that can be called multiple times with different parameters (as opposed to "we simulate prepared statements by transforming them to native SQL statements which are sent to the server every time as a whole").

However, as soon as a preparedStatement object is closed, it is really gone (server-side). So there's definitely no pooling in HSQLDB.

Just to be sure, I had a look into the Mysql Connector JDBC driver (5.1.7), and this really does implement a preparedStatement cache.

So maybe, there should be either a flag parameter for the PreparedStatementJDBCDelegate which determines if the delegate itself should cache prepared statements, or there should be another JDBCDelegate implementation which caches prepared statements and depending on the JDBC driver used, the one or the other delegate should be used.
Comment 5 Stefan Winkler CLA 2008-11-17 10:13:52 EST
Created attachment 118052 [details]
Patch

This patch adds a statement cache to the PreparedStatementJDBCDelegate which should be automagically activated if the connection meta-data says that the JDBC connection does not support pooled statements.

Simon, can you please apply this patch and recheck, if the error you reported persists.

Also please make sure that your launch configuration includes the -Xmx512M flag which is also included in the AllTestsHSQLDB launch configuration. I accidentally tested with a new launch configuration this morning and got an out of memory exception at about 700 transactions, because the In-Memory HSQLDB gets quite large with that test case.
Comment 6 Simon Mc Duff CLA 2008-11-17 10:15:46 EST
(In reply to comment #5)
> Created an attachment (id=118052) [details]
> Patch
> 
> This patch adds a statement cache to the PreparedStatementJDBCDelegate which
> should be automagically activated if the connection meta-data says that the
> JDBC connection does not support pooled statements.
> 
> Simon, can you please apply this patch and recheck, if the error you reported
> persists.
Which errors ? Sorry Don't which one exactly! :-)


> 
> Also please make sure that your launch configuration includes the -Xmx512M flag
> which is also included in the AllTestsHSQLDB launch configuration. I
> accidentally tested with a new launch configuration this morning and got an out
> of memory exception at about 700 transactions, because the In-Memory HSQLDB
> gets quite large with that test case.
> 

Comment 7 Stefan Winkler CLA 2008-11-17 10:19:40 EST
s/Simon/Eike/ 

s/errors/issues from the description/

:-)

-->

(In reply to comment #0)
> The test case
> org.eclipse.emf.cdo.tests.TransactionTest.testCreateManyTransactions() shows
> bad results with prepared statements. Starting with transaction 600 commit
> times go up considerably. Starting with transaction 750 or so there's nearly no
> visible progress anymore. I suspect that the prepared statements or their
> result sets are not closed properly.

sorry, Simon, I remembered wrong that it was Eike who reported the bug. :-)
Comment 8 Eike Stepper CLA 2008-11-21 08:56:03 EST
Stefan, your patch looks good and the reported problem does not occur anymore. Well done!

Some comments, though:

1) We decided to mark whole bugzillas as pending for review (not particular patches). We have Mylyn queries to group bugzillas waiting for review.
2) I think AbstractJDBCDelegate (not IJDBCDelegate!) should extend Lifecycle and the store should activate/deactivate the configured instance.
3) Is the cacheStatements property configurable via xml file, too? We could simply use the store properties with a well-defined key...
4) Instead of providing a boolean property for caching configuration should we require the name of a Map class to switch on statement caching? I'm thinking about something like org.eclipse.net4j.util.ref.ReferenceValueMap<K, V>. But that could require some additional logic in the jdbc delegate to cope with cleared references. Maybe it's better to always use ReferenceValueMap (given that the caching switched on generally)...
5) I made some minor changes to your patch and will attach a new patch. Please review my changes...

I'll reject your patch mostly to discuss the above issues ;-)
Comment 9 Eike Stepper CLA 2008-11-21 08:57:26 EST
Created attachment 118467 [details]
Patch v2
Comment 10 Stefan Winkler CLA 2008-11-21 15:48:18 EST
Created attachment 118519 [details]
Patch-v3
Comment 11 Stefan Winkler CLA 2008-11-21 15:48:27 EST
2.) done
3.) done
4.) Used ReferenceValueMap.Soft - I don't know if it is sensible to make this configurable
5.) done
Comment 12 Eike Stepper CLA 2008-11-23 07:18:01 EST
Stefan, Is it possible that you mixed up the patches? Your v3 looks wrong to me...
Comment 13 Eike Stepper CLA 2008-11-24 07:48:41 EST
Created attachment 118584 [details]
Patch v4 - ready to be committed
Comment 14 Stefan Winkler CLA 2008-11-24 10:48:46 EST
Committed to HEAD.

(Yay, my first commit ;-))
Comment 15 Simon Mc Duff CLA 2008-11-24 11:19:09 EST
(In reply to comment #14)
> Committed to HEAD.
> 
> (Yay, my first commit ;-))
> 

YEAH!!

I will re-run my performance benchmark with MySql and compare it with previous test. (I will use Mysql)


Simon
Comment 16 Simon Mc Duff CLA 2008-11-24 11:36:40 EST
(In reply to comment #15)
> (In reply to comment #14)
> > Committed to HEAD.
> > 
> > (Yay, my first commit ;-))
> > 
> 
> YEAH!!
> 
> I will re-run my performance benchmark with MySql and compare it with previous
> test. (I will use Mysql)
> 
> 
> Simon
> 

Many testcase are the same value (traversing graph)..

I see a difference for commit process. (here some example)

Before

test_LongCommit	14328
test_LongCommit_Company	9641

After

test_LongCommit 11609 
test_LongCommit_Company 7407
Comment 17 Eike Stepper CLA 2008-11-27 03:23:00 EST
Fix available in CDO 2.0.0 I200811270213
Comment 18 Eike Stepper CLA 2009-06-27 11:49:03 EDT
Generally available.