Bug 352845 - Reduce EclipseLink startup time
Summary: Reduce EclipseLink startup time
Status: REOPENED
Alias: None
Product: z_Archived
Classification: Eclipse Foundation
Component: Eclipselink
Version: unspecified
Hardware: PC Windows XP
Importance: P3 enhancement with 4 votes
Target Milestone: ---
Assignee: Nobody - feel free to take it CLA
QA Contact:
URL:
Whiteboard:
Keywords: performance
Depends on:
Blocks:
 
Reported: 2011-07-22 06:10 EDT by Patric Rufflar CLA
Modified: 2022-06-09 10:25 EDT
CC List: 2 users

See Also:


Attachments
Profiling results of another application (41.57 KB, image/gif)
2011-07-22 12:10 EDT, Patric Rufflar CLA

Description Patric Rufflar CLA 2011-07-22 06:10:36 EDT
Build Identifier: 2.3.0

The startup time for creating an EntityManagerFactory and the first EntityManager should be reduced.

For example, we have applications that do very little work (say, a single EntityManager.find() call) and finish their actual work in less than one second.
However, such an application needs 7 seconds to return, because 6 seconds are spent in the EclipseLink startup procedure.

I think that's way too much.
If it is not possible to reduce the startup time (I see many file system operations and unpacking of jar files because of nested persistence units; is this really necessary for the common usage scenarios?), maybe it would be possible to add some kind of lazy startup where initialization happens on the fly / on demand.

Reproducible: Always

Steps to Reproduce:
1. Create a project that has a large number of class files and uses a persistence unit of 200 entities.
2. Create an EntityManagerFactory and an EntityManager.
3. Measure the execution time.
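A minimal timing sketch for step 3, in Java. It only measures the wall-clock time of each phase with System.nanoTime(); the class `StartupTimer` and its `time` method are hypothetical helpers, not part of EclipseLink. In a real run the timed actions would be JPA calls such as `Persistence.createEntityManagerFactory(unitName)` and `emf.createEntityManager()`; here they are passed in as suppliers so the sketch compiles without a JPA provider on the classpath.

```java
import java.util.function.Supplier;

/** Hypothetical helper: runs one startup phase and records how long it took. */
final class StartupTimer {

    /** Result of a timed phase: the value it produced and the elapsed milliseconds. */
    static final class Timed<T> {
        final T value;
        final double millis;

        Timed(T value, double millis) {
            this.value = value;
            this.millis = millis;
        }
    }

    private StartupTimer() {
    }

    /** Runs the supplier once and measures its wall-clock duration with System.nanoTime(). */
    static <T> Timed<T> time(String label, Supplier<T> phase) {
        long start = System.nanoTime();
        T value = phase.get();
        double millis = (System.nanoTime() - start) / 1_000_000.0;
        System.out.printf("%s took %.1f ms%n", label, millis);
        return new Timed<>(value, millis);
    }
}
```

With a JPA provider available, the first measurement could be `StartupTimer.time("createEntityManagerFactory", () -> Persistence.createEntityManagerFactory("myUnit"))` (with `"myUnit"` as a placeholder unit name), followed by a second call timing `createEntityManager()` on the returned factory. Timing the two phases separately matters because, by default, part of the deployment work is deferred to the first EntityManager.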
Comment 1 Tom Ware CLA 2011-07-22 08:26:47 EDT
EclipseLink, by default, initializes much of the metadata and connects to the database lazily (i.e. only when the first EntityManager is requested). This results in a slower initialization time for your first EntityManager.

Starting in EclipseLink 2.3, you can request that the start-up occur at EntityManagerFactory creation time.  (In a container managed case, this is usually when the application is deployed and in an application managed case, it is when you create your first EntityManagerFactory).  To do that, use the persistence unit property: eclipselink.deploy-on-startup=True
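As a sketch of the application-managed case, the property can be supplied in the properties map passed to `Persistence.createEntityManagerFactory(String, Map)` rather than in persistence.xml, assuming it is honored there like other eclipselink.* properties. Only the property name `eclipselink.deploy-on-startup` comes from the comment above; the class `DeployOnStartupConfig` is hypothetical, and the map is built with plain java.util so the sketch compiles on its own.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper that builds the properties map for an application-managed EMF. */
final class DeployOnStartupConfig {

    /** Property named in Comment 1: deploy the persistence unit when the factory is created. */
    static final String DEPLOY_ON_STARTUP = "eclipselink.deploy-on-startup";

    private DeployOnStartupConfig() {
    }

    static Map<String, String> properties() {
        Map<String, String> props = new HashMap<>();
        // Perform the deferred deployment work (including the database login) when the
        // EntityManagerFactory is created, instead of at the first createEntityManager() call.
        props.put(DEPLOY_ON_STARTUP, "true");
        return Collections.unmodifiableMap(props);
    }
}
```

Passing `DeployOnStartupConfig.properties()` as the second argument of `Persistence.createEntityManagerFactory("myUnit", ...)` (placeholder unit name) makes the startup cost show up at factory creation. Note that this moves the cost rather than removing it.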

I am closing this bug based on the above, if you still see the problem, please feel free to reopen.
Comment 2 Patric Rufflar CLA 2011-07-22 09:00:12 EDT
Tom,

Imagine an application that only needs to deal with a single entity (one that has no references to other entity classes).

Currently, every entity is prepared, which of course takes time (and the time increases with the number of entities).

I'd really like to have more fine-grained lazy loading than the current all-or-nothing approach.
Comment 3 Tom Ware CLA 2011-07-22 09:23:29 EDT
Can you elaborate a bit on what you mean by "more fine grained lazy-loading"?

You mention you have a large persistence unit and you only want to use a single entity. Are you in a situation where that single entity is of the same type? Is it possible to have a smaller persistence unit containing the entities involved in those situations? Is it possible to cache an initialized EntityManagerFactory somewhere?

There are some costs that will be hard to make go away. For instance, in many cases a lot of the startup cost relates to connecting to the database and establishing the connection pool. How much of your time is spent in that type of operation?
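One way to cache an initialized EntityManagerFactory is to create it once, lazily, and share it for the lifetime of the JVM (for example, across all tests in a test run). The generic `Lazy<T>` class below is a hypothetical sketch using double-checked locking on a volatile field; in practice its supplier would call `Persistence.createEntityManagerFactory(...)`.

```java
import java.util.Objects;
import java.util.function.Supplier;

/** Hypothetical thread-safe, create-once holder for an expensive object such as an EMF. */
final class Lazy<T> {
    private final Supplier<? extends T> factory;
    private volatile T instance;

    Lazy(Supplier<? extends T> factory) {
        this.factory = Objects.requireNonNull(factory);
    }

    /** Returns the cached instance, invoking the factory only on the first call. */
    T get() {
        T result = instance;
        if (result == null) {
            synchronized (this) {
                result = instance;
                if (result == null) {
                    result = Objects.requireNonNull(factory.get());
                    instance = result;
                }
            }
        }
        return result;
    }
}
```

For example, `static final Lazy<EntityManagerFactory> EMF = new Lazy<>(() -> Persistence.createEntityManagerFactory("myUnit"));` (placeholder unit name) pays the startup cost once per JVM. It does not help a process that starts, runs a single query, and exits, which is the case described in this bug.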
Comment 4 Patric Rufflar CLA 2011-07-22 11:11:34 EDT
>Can you elaborate a bit on what you mean by "more fine grained lazy-loading"

I think that EclipseLink should be able to load only a subset of the entities in a persistence unit.
Imagine a persistence unit with the entities A, B, and C, where B is linked to C.
If an application makes a find/query/persist/whatever on A only, EclipseLink should only load A. 
If an application does a JPA operation on B, EclipseLink should load only B and C.
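The proposal can be framed as a reachability problem: given which entities each entity references, the set to initialize for a requested entity is everything reachable from it. The sketch below computes that closure with a breadth-first search over a hypothetical `Map<String, List<String>>` of entity names (Java 9+ for `List.of`). It is not EclipseLink code; a real implementation would likely also need to account for inheritance, embeddables, and bidirectional relationships.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: which entities must be initialized before using a given one. */
final class EntityClosure {

    private EntityClosure() {
    }

    /** Breadth-first search over "entity -> referenced entities" edges. */
    static Set<String> required(String entity, Map<String, List<String>> references) {
        Set<String> visited = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        visited.add(entity);
        queue.add(entity);
        while (!queue.isEmpty()) {
            String current = queue.poll();
            for (String target : references.getOrDefault(current, List.of())) {
                if (visited.add(target)) { // true only the first time an entity is reached
                    queue.add(target);
                }
            }
        }
        return visited;
    }
}
```

With the example from this comment, `references = Map.of("B", List.of("C"))`, `required("A", references)` yields `[A]` and `required("B", references)` yields `[B, C]`.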


>Are you in a situation where that single entity is of the same type? 
What do you mean by "same type"? Same type of what?

>Is it possible to have a smaller persistence unit containing Entities involved
>in those situations?
Only in theory. It would mean creating and managing several persistence units, one per "area", to get optimal startup performance.
We would need to manage several EntityManager instances for each transaction, which would massively increase the complexity.
Runtime performance would also suffer: we would have to commit (with change tracking and all the other expensive operations) on each EntityManager, batch writing would be less effective, and it is unclear whether all the EntityManagers would share the same transaction and hence the same database connection.
I think that's overkill just to get better startup performance.

>Is it possible to cache an initialized EntityManagerFactory somewhere?
We do that already.
In all of our applications, an EntityManagerFactory is instantiated only once.

>There are some costs that will be hard to make go away.  For instance in many
>cases, a lot of the startup cost relates to connecting to the database and
>establishing the connection pool.  How much of your time is spent in that type
>of operation?

I am not using the internal EclipseLink connection pool.
I did some profiling, here are some results:

Total running time of the application: 6.9 seconds
Time required for creating the EntityManagerFactory: 3.2 seconds
Time required for creating the EntityManager: 2.0 seconds

"Hot spots":

2.6 seconds are taken by 
org.eclipse.persistence.internal.jpa.EntityManagerSetupImpl.predeploy

0.7 seconds are taken by 
org.eclipse.persistence.sessions.Project.convertClassNamesToClasses

0.6 seconds are taken by
org.eclipse.persistence.internal.jpa.deployment.JPAInitializer.findPersistenceUnitInfo

Only 0.7 seconds are taken by 
org.eclipse.persistence.internal.jpa.EntityManagerFactoryProvider.login
(this covers establishing the database connections, detecting the platform, etc., which is OK for me)

If you need more detailed profiling results, let me know.
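Putting the numbers from this comment side by side shows that startup (factory plus first EntityManager) dominates the run, while the database login is a small part of it. The sketch below only does that arithmetic on the measured values quoted above; the class name `StartupShare` is hypothetical.

```java
/** Arithmetic on the profiling numbers reported in Comment 4 (all values in seconds). */
final class StartupShare {

    private StartupShare() {
    }

    static double share(double part, double whole) {
        return part / whole;
    }

    public static void main(String[] args) {
        double total = 6.9;     // total running time of the application
        double createEmf = 3.2; // creating the EntityManagerFactory
        double createEm = 2.0;  // creating the EntityManager
        double login = 0.7;     // EntityManagerFactoryProvider.login

        double startup = createEmf + createEm; // 5.2 s
        System.out.printf("startup: %.1f s = %.0f%% of the run%n", startup, 100 * share(startup, total));
        System.out.printf("login:   %.1f s = %.0f%% of startup%n", login, 100 * share(login, startup));
    }
}
```

Roughly three quarters of the run is startup, and the login accounts for only about 13% of that startup. Profiler hot-spot times are usually inclusive of the methods they call, so the individual hot-spot figures should not simply be added together.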
Comment 5 Tom Ware CLA 2011-07-22 11:44:26 EDT
Are your results with eclipselink.deploy-on-startup=True? My expectation is that the major things you have listed will occur at EntityManagerFactory creation time when that property is enabled, and will be split between EntityManagerFactory creation time and first EntityManager use without the property (on EclipseLink 2.3).

Reopening as an enhancement.

Community, if this is a big issue for you, please vote for this issue.
Comment 6 Patric Rufflar CLA 2011-07-22 11:59:14 EDT
I'm not (explicitly) setting eclipselink.deploy-on-startup=True (so I'm relying on the defaults).

Please note: in the profiled application I _always_ create exactly one EntityManagerFactory and _always_ exactly one EntityManager, no more, no less.
But I only use the EntityManager to execute a single query that involves only one entity class; the other 200 entity classes are not needed by the application at all (but are still loaded by EclipseLink).
So the application would benefit greatly from lazily loading independent groups of entity classes separately.
Comment 7 Patric Rufflar CLA 2011-07-22 12:10:26 EDT
Created attachment 200212 [details]
Profiling results of another application

Attached are the profiling results of another application
(it instantiates _one_ EntityManagerFactory and _one_ EntityManager).

Total running time of the application: 23.6s

As you can see, 70% of the time is spent creating the EntityManagerFactory and 7.8% creating the EntityManager.
Comment 8 Tom Ware CLA 2011-07-22 13:04:46 EDT
Thanks for the info.

FYI: This feature might be useful.

http://wiki.eclipse.org/EclipseLink/Examples/JPA/Composite

You could use it to break your persistence unit up into chunks and still be able to access the mappings as a whole. (A smaller persistence unit should have a faster creation time.)
Comment 9 Patric Rufflar CLA 2015-12-17 11:33:35 EST
I'm currently upgrading EclipseLink from 2.4.2 to 2.6.1, and I still get the feeling that the startup time hasn't improved much, at least in the environments I'm working with.

While a few seconds are irrelevant in production/QA environments,
they really hurt in unit testing / automated testing environments, where many startups happen.


Observations on EclipseLink 2.6.1 / Java 8 on Windows:
(one persistence unit located in a common class directory,
mid-sized project, eclipselink.exclude-eclipselink-orm=true)


Acquiring an EntityManagerFactory in my setup still takes more than 3.5 seconds:

- 0.8 seconds (23%) are wasted on an expensive search (see below) for a (non-existent) orm.xml file
[org.eclipse.persistence.internal.jpa.metadata.MetadataProcessor.loadMappingFiles()]

- The DirectoryArchive-related code seems to crawl parts of the file system and issues many I/O operations.
This code accounts for the largest part (1.5 seconds) of the EntityManagerFactory creation.

For example, in my case it makes many File.isDirectory() calls, which are expensive on Windows because each one calls java.io.WinNTFileSystem.getBooleanAttributes().

Other projects have already identified this as a performance penalty:
https://bugs.eclipse.org/bugs/show_bug.cgi?id=450629
https://netbeans.org/bugzilla/show_bug.cgi?id=168389#c23



Suggestions:

* Please add a way to disable orm.xml scanning (e.g. by introducing another property).
I think that most persistence units don't use orm.xml.

* Check whether the DirectoryArchive-related code can be refactored to perform fewer I/O operations (using Java 7's NIO.2 FileVisitor API might also be more efficient).

* If possible, do further performance tests with a persistence unit stored in a common class directory rather than in a jar.
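To illustrate the NIO.2 suggestion: `Files.walkFileTree` passes each visited entry's `BasicFileAttributes` to the visitor, so the crawl can tell files from directories without a separate `File.isDirectory()` call per entry. The class `ClassDirScanner` below is a hypothetical sketch, not EclipseLink's DirectoryArchive; it collects `.class` files and records whether `META-INF/orm.xml` exists under a class directory. Whether it is actually faster on Windows would have to be measured.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical single-pass scan of a class directory using the NIO.2 FileVisitor API. */
final class ClassDirScanner {
    final List<Path> classFiles = new ArrayList<>();
    boolean hasOrmXml;

    static ClassDirScanner scan(Path root) throws IOException {
        ClassDirScanner result = new ClassDirScanner();
        Path ormXml = root.resolve("META-INF").resolve("orm.xml");
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                // attrs is supplied by the walk; no separate isDirectory() call is made here.
                if (!attrs.isRegularFile()) {
                    return FileVisitResult.CONTINUE;
                }
                if (file.toString().endsWith(".class")) {
                    result.classFiles.add(root.relativize(file));
                } else if (file.equals(ormXml)) {
                    result.hasOrmXml = true;
                }
                return FileVisitResult.CONTINUE;
            }
        });
        return result;
    }
}
```

A single walk answers both questions (which classes exist, and whether an orm.xml is present), instead of probing the file system separately for each.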
Comment 10 Eclipse Webmaster CLA 2022-06-09 10:25:52 EDT
The Eclipselink project has moved to Github: https://github.com/eclipse-ee4j/eclipselink