Re: [eclipselink-users] JPA performance over JDBC

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] [List Home]

Re: [eclipselink-users] JPA performance over JDBC

From: "Michael O'Brien" <michael.obrien@xxxxxxxxxx>
Date: Tue, 03 Aug 2010 07:32:36 -0400
Delivered-to: eclipselink-users@xxxxxxxxxxx
List-archive: <https://dev.eclipse.org/mailman/private/eclipselink-users>
List-help: <mailto:eclipselink-users-request@eclipse.org?subject=help>
List-subscribe: <https://dev.eclipse.org/mailman/listinfo/eclipselink-users>, <mailto:eclipselink-users-request@eclipse.org?subject=subscribe>
List-unsubscribe: <https://dev.eclipse.org/mailman/listinfo/eclipselink-users>, <mailto:eclipselink-users-request@eclipse.org?subject=unsubscribe>
User-agent: Thunderbird 2.0.0.24 (Windows/20100228)

Doug,
    Your finding that on Intel - that the JPA queries are faster than the base JDBC (even with a reduced select clause) is very interesting and kind of counter intuitive. If we rule out the EclipseLink cache - from first looks without more EclipseLink database traffic detail - it looks like the JDBC code, Derby or the JVM either are running on may be intentionally or inadvertently optimized for AMD architecture.
    We should look into this more.
    /michael

---------------------------------------------other details----------------------------------------------------------------------

    I would expect/guess that the processor makes a difference.   Aside from the 21% clock and 9% table size difference, the Intel chip has 2 (more efficient) cores (if jdbc exploits parallelism) and 4 times the L2 cache. Cache makes a big difference both on chip and in the software - remember when celeron processors were slow (they had their cache disabled).
    The JDBC queries are likely benefiting from the better architecture of the 65nm 2-core 2007 Intel E6750 chip running with 2mb/core L2 cache in comparison to the very old 130nm 1-core AMD 3500+ chip with 512kb L2 cache from 2004. The fact that the E6750 also has support for DDR3 and has a 1333 FSB will mean that memory speed will definitely be better on the Intel platform. Your ram on the AMD is likely very old PC2700/3200 which would be a lot slower.


    However, I find it interesting - the fact that the slow and outdated AMD system is as fast as it is (16% faster by records/sec). This shows us that when similar chips are run from AMD and Intel - that AMD would likely a lot faster. If you retest using for example an AMD Athlon 64 X2 5000+(65nm .5mb L2), 5200+ (90nm 1mb L2) processor - you may get a more level test and I would expect that the AMD chips beats the Intel chip - based on your tests.

    Some differences may also be due to...
Hardware:
    - L1 instr/data on chip cache (per core)
    - L2 on chip cache (per processor)
    - GHz (both core and multipler should match for both chips)
    - bus speed
    - RAM type (the i7-920 is capable of running tripple channel for example)
    - RAM speed
    - number of threads do not match number of cores - more threads than cores will introduce context switching delays
    - number of active cores (are you using a dual or quad core)
    - technology of the die (32 nm on the i7 has a more efficient pipeline than the core in the 45 and 65 nm and even 130nm versions)
    - hyperthreading on/off (the flagship Intel i7-920 has it back on for a total of 8 threads) - HT can add 5-10% speedup depending on the algorithm
    - type of motherboard chipset
    - Is the library architecture aware? likely it is generic IA32 but there are database vendors that currently exploit the parallelism in the SSEn instructions and even the GPU vector instructions (aka CUDA and ATI Stream) if they are available
       (for example the Join in your test will greatly benefit from using any vector operation instruction if they are available)
    I am also assuming that the 2 systems are virtually identical in other specs - especially HD speed/cache - if you are not running everything in-memory.
    - no HD access - this is the single biggest barrier to performance and variability (you will want to run an in-memory database)
    - resource contention (is everything else off) you may want to capture the ram/proc usage with JConsole (ideally JRMC in JRockit to minimize JVM overhead

Software:
    There are a lot of possible differences here, I will just list a couple.
       - remember that the millisecond timer is only accurate to 15 ms - the nanosecond timer is more precise (this should not be an issue for you since your time is an accumulation of seconds)
       - L1 software cache (EclipseLink cache) - you will want to prep the cache (fill it) before accessing entities directly from the database
       (however the fact that your AMD tests are the same - precludes the EclipseLink cache)
       - JVM technology that is optimized for a particular processor or favors one with a large L2 cache (SUN, JRockit, J9)
       - 64 bit JVM (mostly better word alignment (no dual fetch) in IA64 in addition to better usage of L2 cache at the cost of 2x RAM (where 8GB acts like 4)
       As an example if I only change the int id fields - I can manufacture a 4-5x speedup by switching to a 64 bit version of a JVM
http://wiki.eclipse.org/EclipseLink/Building/64bit#JVM_In-Memory_4x_to_5x_speedup_running_64_bit_SUN_JVM_from_32_bit_version
       (however in your case you are likely running a 32-bit version of XP)
       - running with the JVM server flag (differences in handling of multithreaded apps)


    thank you for running this.
    /michael
Daoqi(Doug) Yang wrote:

I did some performance comparison on JPA queries and JDBC queries using EclipseLink 2.1.

My one test result on an Intel processor shows that JPA queries are 15% faster than functionally equivalent, but slightly simplified JDBC queries , while another test result on an AMD processor says JPA queries are the same as JDBC queries.

In both tests, JPA named queries, JPA criteria queries, and JPA Native queries are equally fast.

My question is: Has any one done similar tests and observed different results from my mine? How to explain EclipseLink is faster on one machine architecture over the other?

See below for more testing details.

Thanks,
Daoqi

************ more details **************************************

Since a JPA implementation is a framework built on the top of JDBC, a valid question to ask is: how much overhead does a JPA implementation introduce over using JDBC directly. In the following, I use EclipseLink JPA 2.1 to do some performance testing and compare it to JDBC.

A typical insert, update, and delete operation is performed on one row at a time and to one database table only. Thus I am not worried about possible performance overhead from JPA. My test is focused on query performance between JPA and JDBC. My test scenario is constructed as follows:

1)      Use some well-known entities: Order, LineItem, and Customer with some obvious relationships and mapping.

2)      Perform JPA JPQL queries, JPA criteria queries, and JPA native queries, and JDBC queries on separate JVM runs. These queries are functionally equivalent with JPA native queries and JDBC queries perform almost exactly the same SQL queries. The JDBC queries are slightly simplified versions of the SQL queries generated from JPA JPQL queries by EclipseLink.

3)      They simulate a typical web application, in which a max number of records is retrieved from all queries, and the values in the JDBC result set are programmatically converted into a list of Java objects. JPA queries return a list of Java objects automatically. These objects are potentially used in a framework such as Struts or JSF for display in the browser.

A test run using Derby 10.4.2 database on a Windows XP computer with Intel Core Duo CPU, 2.66 GHz, and 3.5 GB RAM gives the following results:

Query Types

Time in seconds

JPA JPQL Queries

53

JPA Criteria Queries

53

JPA Native Queries

53

JDBC Queries

62

This test run has 11984 records in the Order table, 23966 records in the LineItem table, and 47935 records in the Customer table. It shows that JPA queries are 15% faster than their equivalent JDBC queries. Also, JPA criteria queries are as fast as JPQL queries, and JPA native queries have the same efficiency as JPQL queries.

Another test run on the same queries using Derby 10.4.2 database on a Windows XP computer with 2.20 GHz AMD Athlon 64 processor 3500+ and 2.5GB RAM gives the following results:

Query Types

Time in seconds

JPA JPQL Queries

42

JPA Criteria Queries

42

JPA Native Queries

42

JDBC Queries

42

This test run has 11008 records in the Order table, 22008 records in the LineItem table, and 44000 records in the Customer table. It shows that JPA queries are as fast as their equivalent JDBC queries.

Next, I show the three queries that are run multiple times during each test.

Query 1

The first JPQL named query involves a two table inner join and reads:

SELECT o FROM Order o JOIN o.lineItems i WHERE i.price > 5

EclipseLink JPA generates the following SQL query during execution:

SELECT t1.ORDER_ID_PK, t1.CREATE_TIME, t1.CREATE_USER, t1.PRICE,

       t1.STATUS, t1.UPDATE_TIME, t1.ORDER_TIME, t1.VERSION,

       t1.UPDATE_USER, t1.CUSTOMER_ID_FK

FROM jpatest.LINE_ITEM t0, jpatest.ORDERS t1

WHERE ((t0.PRICE > ?) AND (t0.ORDER_ID_FK = t1.ORDER_ID_PK))

The corresponding first JDBC query is based on the generated SQL above but with fewer data fields in the SELECT clause:

SELECT t1.ORDER_ID_PK, t1.PRICE, t1.STATUS, t1.ORDER_TIME

FROM jpatest.LINE_ITEM t0, jpatest.ORDERS t1

WHERE ((t0.PRICE > ?) AND (t0.ORDER_ID_FK = t1.ORDER_ID_PK))

The first JPA native query executes the same SQL query as the JDBC query above, and the first JPA criteria query corresponds exactly to the JPQL query above.

Query 2

The second JPQL query involves a two table outer join and reads:

SELECT o FROM Order o LEFT OUTER JOIN o.lineItems i

WHERE o.orderTime < :today

EclipseLink JPA generates the following SQL query during execution:

SELECT t1.ORDER_ID_PK, t1.CREATE_TIME, t1.CREATE_USER, t1.PRICE,

       t1.STATUS, t1.UPDATE_TIME, t1.ORDER_TIME, t1.VERSION,

       t1.UPDATE_USER, t1.CUSTOMER_ID_FK

FROM jpatest.ORDERS t1 LEFT OUTER JOIN jpatest.LINE_ITEM t0

     ON (t0.ORDER_ID_FK = t1.ORDER_ID_PK)

WHERE (t1.ORDER_TIME < ?)

The corresponding second JDBC query is based on the generated SQL above but with fewer data fields in the SELECT clause:

SELECT t1.ORDER_ID_PK, t1.PRICE, t1.STATUS, t1.ORDER_TIME

FROM jpatest.ORDERS t1 LEFT OUTER JOIN jpatest.LINE_ITEM t0

     ON (t0.ORDER_ID_FK = t1.ORDER_ID_PK)

WHERE (t1.ORDER_TIME < ?)

The second JPA native query executes the same SQL query as the JDBC query above, and the second JPA criteria query corresponds exactly to the JPQL query above.

Query 3

The third JPQL query involves third tables with a subquery and reads:

SELECT c FROM Customer c

WHERE (

        SELECT COUNT(o) FROM c.orders o WHERE SIZE(o.lineItems) >= 1

       ) > :count

EclipseLink JPA generates the following SQL query during execution:

SELECT t0.CUSTOMER_ID_PK, t0.CUSTOMER_TYPE, t0.CREATE_TIME,

       t0.UPDATE_TIME, t0.VERSION, t0.PICTURE, t0.CREATE_USER,

       t0.INCOME, t0.NAME, t0.UPDATE_USER, t0.ADDRESS_ID_FK,

       t0.ACCOUNT_NUMBER, t0.BANK_NAME, t0.ROUTING_NUMBER,

       t1.CUSTOMER_ID_FPK, t1.CREDIT_LIMIT, t1.CARD_NUMBER,

       t2.CUSTOMER_ID_FPK, t2.EXPIRATION_DATE, t2.DISCOUNT_RATE

FROM jpatest.CUSTOMER t0 LEFT OUTER JOIN jpatest.GOLD_CUSTOMER t1

    ON (t1.CUSTOMER_ID_FPK = t0.CUSTOMER_ID_PK)

    LEFT OUTER JOIN jpatest.PREFERRED_CUSTOMER t2

    ON (t2.CUSTOMER_ID_FPK = t0.CUSTOMER_ID_PK)

WHERE ((SELECT COUNT(t3.ORDER_ID_PK) FROM jpatest.CUSTOMER t4,

       jpatest.ORDERS t3 WHERE ((((SELECT COUNT(t5.LINE_ITEM_ID_PK)

       FROM jpatest.LINE_ITEM t5

       WHERE (t5.ORDER_ID_FK = t3.ORDER_ID_PK)) >= ?)

       AND (t0.CUSTOMER_ID_PK = t4.CUSTOMER_ID_PK))

       AND (t3.CUSTOMER_ID_FK = t4.CUSTOMER_ID_PK))) > ?)

EclipseLink JPA binds the first parameter with the value of 1.





The corresponding third JDBC query is based on the generated SQL above but with fewer data fields in the SELECT clause:

SELECT t0.CUSTOMER_ID_PK, t0.CUSTOMER_TYPE, t0.PICTURE, t0.INCOME,

       t0.NAME, t0.UPDATE_USER, t0.ADDRESS_ID_FK, t0.ACCOUNT_NUMBER,

       t0.BANK_NAME, t0.ROUTING_NUMBER,

       t1.CUSTOMER_ID_FPK, t1.CREDIT_LIMIT, t1.CARD_NUMBER,

       t2.CUSTOMER_ID_FPK, t2.EXPIRATION_DATE, t2.DISCOUNT_RATE

FROM jpatest.CUSTOMER t0 LEFT OUTER JOIN jpatest.GOLD_CUSTOMER t1

     ON (t1.CUSTOMER_ID_FPK = t0.CUSTOMER_ID_PK)

     LEFT OUTER JOIN jpatest.PREFERRED_CUSTOMER t2

     ON (t2.CUSTOMER_ID_FPK = t0.CUSTOMER_ID_PK)

WHERE ((SELECT COUNT(t3.ORDER_ID_PK) FROM jpatest.CUSTOMER t4,

      jpatest.ORDERS t3

      WHERE ((((SELECT COUNT(t5.LINE_ITEM_ID_PK)

      FROM jpatest.LINE_ITEM t5

      WHERE (t5.ORDER_ID_FK = t3.ORDER_ID_PK)) >= ?)

      AND (t0.CUSTOMER_ID_PK = t4.CUSTOMER_ID_PK))

      AND (t3.CUSTOMER_ID_FK = t4.CUSTOMER_ID_PK))) > ?)

The first parameter in this JDBC query is also bound with value 1.

The third JPA native query executes the same SQL query as the JDBC query above, and the third JPA criteria query corresponds exactly to the JPQL query above.

During the test run for JPQL queries, the following Java code is executed multiple times and running time is measured:

protected void queryUsingJPQL() {

EntityManager em = emf.createEntityManager();

try {

    // JPQL query 1: join two tables

    {

      Query q = em.createQuery(jpqlQuery1);

      q.setMaxResults(maxRows);

      q.setFirstResult(0);

      List resultList = q.getResultList();

      for (Object o : resultList) {

      double price = ((Order) o).getPrice();

      if (debug) {

          System.out.println("jpql inner join o: " + price);

      }

      }

     }

    // JPQL query 2: join two tables with outer join

    {

      . . .

    }

    // JPQL query 3: join two tables with a subquery

    {

      . . .

    }

   } catch (Throwable t) {

     t.printStackTrace();

   }

   em.close();

}

Note that a fixed number of records is retrieved for each query to simulate a typical web application in which a page of data is displayed on the screen each time.

During the test run for JDBC queries, the following Java code is executed multiple times and running time is measured:

protected void queryUsingJDBC() {

try {

    Connection conn = DriverManager.getConnection(databaseURL,

                              databaseUsername, databasePassword);

    // JDBC query 1: join two tables

    {

      PreparedStatement ps = conn.prepareStatement(jdbcQuery1);

      ps.setDouble(1, 5.0);

      ps.setMaxRows(maxRows);

      ResultSet rs = ps.executeQuery();





      List<Order> orderList = new ArrayList<Order>();

      while (rs.next()) {

      Order order = new Order();

      rs.getInt(1);

      order.setPrice(rs.getDouble(2));

      int status = rs.getInt(3);

      order.setStatus(OrderStatus.fromInt(status));

      order.setOrderTime(rs.getTimestamp(4));

      orderList.add(order);

      if (debug) {

          System.out.println("jdbc inner join o: " + order.getPrice());

      }

      }

    }

    // JDBC query 2: join two tables with outer join

    {

       . . .

    }

    // JDBC query 3: join two tables with a subquery

    {

       . . .

    }

    conn.close();

} catch (Throwable t) {

    t.printStackTrace();

}

}

To make the JDBC and JPQL queries functionally equivalent, the values in the JDBC result set are converted into Java objects. These objects are typically used by frameworks such as Struts or JSF in the presentation layer. This will make it an apple for apple comparison.
_______________________________________________
eclipselink-users mailing list
eclipselink-users@xxxxxxxxxxx
https://dev.eclipse.org/mailman/listinfo/eclipselink-users
  

References:
- [eclipselink-users] JPA performance over JDBC
  - From: Daoqi(Doug) Yang

Prev by Date: [eclipselink-users] Have to add readAllObject before read a single Object?
Next by Date: Re: [eclipselink-users] Customizing delete calls before updating a ElementCollection
Previous by thread: RE: [eclipselink-users] JPA performance over JDBC
Next by thread: [eclipselink-users] Have to add readAllObject before read a single Object?
Index(es):
- Date
- Thread

Breadcrumbs