Build Identifier: 2.3.0

The startup time for creating an EntityManagerFactory and the first EntityManager should be reduced. For example, we have applications which do only a little work (say, a single EntityManager.find() call) and finish their actual work in less than one second. However, such an application needs 7 seconds to return because 6 seconds are spent within the EclipseLink startup procedure. I think that's way too much. If it is not possible to reduce the startup time (I see many file system operations and unpacking of jar files because of nested persistence units; is this really necessary for the common usage scenarios?), maybe it would be possible to add some kind of lazy startup where the initialization is done on the fly/on demand.

Reproducible: Always

Steps to Reproduce:
1. Create a project which has a large number of class files and uses a persistence unit of 200 entities
2. Create an EntityManagerFactory and an EntityManager
3. Measure the execution time
EclipseLink, by default, initializes much of the metadata and connects to the database lazily (i.e. only when the first EntityManager is requested). This results in a slower initialization time for your first EntityManager. Starting in EclipseLink 2.3, you can request that start-up occur at EntityManagerFactory creation time. (In the container-managed case, this is usually when the application is deployed; in the application-managed case, it is when you create your first EntityManagerFactory.) To do that, use the persistence unit property: eclipselink.deploy-on-startup=True

I am closing this bug based on the above. If you still see the problem, please feel free to reopen.
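For an application-managed setup, the property named above could also be supplied programmatically instead of in persistence.xml. The following is a minimal sketch: the class StartupProperties and its method are hypothetical helpers for illustration, not part of EclipseLink, and only the property name is taken from the comment above.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper (not part of EclipseLink) that collects startup-related overrides. */
final class StartupProperties {
    // Property name as given in the comment above.
    static final String DEPLOY_ON_STARTUP = "eclipselink.deploy-on-startup";

    /** Returns overrides asking for deployment at EntityManagerFactory creation time. */
    static Map<String, String> eagerDeployment() {
        Map<String, String> props = new HashMap<>();
        props.put(DEPLOY_ON_STARTUP, "true");
        return props;
    }
}
```

With the JPA API on the classpath, the map could be passed as the second argument of Persistence.createEntityManagerFactory(String, Map), e.g. Persistence.createEntityManagerFactory("my-unit", StartupProperties.eagerDeployment()), where "my-unit" is a placeholder unit name. Note that this only moves the cost to factory creation; it does not reduce the total.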
Tom, imagine an application which only needs to deal with a single entity (one that has no references to other entity classes). Currently, every entity is prepared, which of course takes some time (and that time increases with the number of entities). I'd really like to have more fine-grained lazy loading than the current all-or-nothing approach.
Can you elaborate a bit on what you mean by "more fine grained lazy-loading"? You mention you have a large persistence unit and you only want to use a single entity. Are you in a situation where that single entity is of the same type? Is it possible to have a smaller persistence unit containing Entities involved in those situations? Is it possible to cache an initialized EntityManagerFactory somewhere?

There are some costs that will be hard to make go away. For instance, in many cases a lot of the startup cost relates to connecting to the database and establishing the connection pool. How much of your time is spent in that type of operation?
>Can you elaborate a bit on what you mean by "more fine grained lazy-loading"

I think that EclipseLink should be able to load only a subset of the entities in a persistence unit. Imagine a persistence unit with the entities A, B, C where B is linked to C. If an application performs a find/query/persist/whatever on A only, EclipseLink should load only A. If an application performs a JPA operation on B, EclipseLink should load only B and C.

>Are you in a situation where that single entity is of the same type?

What do you mean by "same type"? Same type of what?

>Is it possible to have a smaller persistence unit containing Entities involved
>in those situations?

Only theoretically. This would mean that we have to create and manage several persistence units, one per "area", to get optimal startup performance. We would need several EntityManager instances for each transaction, all of which we would have to manage; this would massively increase the complexity. The runtime performance would suffer because we would have to do a commit (with change tracking and all the other expensive operations) on each EntityManager, batch writing would be less effective, and there's the question of whether all EntityManagers would share the same transaction and hence the same database connection. I think that's overkill just to get better startup performance.

>Is it possible to cache an initialized EntityManagerFactory somewhere?

We do that already. In all of our applications an EntityManagerFactory is instantiated only once.

>There are some costs that will be hard to make go away. For instance in many
>cases, alot of the startup cost relates to connecting to the database
>establishing the connection pool. How much of your time is spent in that type
>of operation?

I am not using the internal EclipseLink connection pool.
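The A/B/C example above amounts to computing the set of entities reachable from the one being used. The following is a minimal sketch of that idea only; EntityClosure and the reference map are hypothetical, and this is not how EclipseLink actually processes its descriptors.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical illustration of the request; not an EclipseLink API. */
final class EntityClosure {
    /**
     * Returns the root entity plus every entity reachable from it through references.
     * The map associates an entity name with the names of the entities it links to.
     */
    static Set<String> requiredEntities(String root, Map<String, List<String>> references) {
        Set<String> required = new LinkedHashSet<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.push(root);
        while (!pending.isEmpty()) {
            String entity = pending.pop();
            // add() returns false for entities already seen, which also guards against cycles.
            if (required.add(entity)) {
                for (String target : references.getOrDefault(entity, Collections.emptyList())) {
                    pending.push(target);
                }
            }
        }
        return required;
    }
}
```

With a map containing only B -> [C], a request for A yields just A, while a request for B yields B and C, matching the behavior described above.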
I did some profiling; here are some results:

Total running time of the application: 6.9 seconds
Time required for creating the EntityManagerFactory: 3.2 seconds
Time required for creating the EntityManager: 2.0 seconds

"Hot spots":
- 2.6 seconds are taken by org.eclipse.persistence.internal.jpa.EntityManagerSetupImpl.predeploy
- 0.7 seconds are taken by org.eclipse.persistence.sessions.Project.convertClassNamesToClasses
- 0.6 seconds are taken by org.eclipse.persistence.internal.jpa.deployment.JPAInitializer.findPersistenceUnitInfo
- Only 0.7 seconds are taken by org.eclipse.persistence.internal.jpa.EntityManagerFactoryProvider.login (which is related to establishing the database connections, detecting the platform etc., which is OK for me)

If you need more detailed profiling results, let me know.
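The coarse split between factory creation and first EntityManager creation can also be measured without a profiler. The following is a minimal sketch using wall-clock timing; PhaseTimer and Timed are hypothetical names introduced for illustration, and a single run like this is far less precise than the profiler figures above.

```java
import java.util.function.Supplier;

/** Hypothetical helper for coarse wall-clock timing of one startup phase. */
final class PhaseTimer {
    /** Result of a timed call: the produced value and the elapsed milliseconds. */
    static final class Timed<T> {
        final T value;
        final long millis;

        Timed(T value, long millis) {
            this.value = value;
            this.millis = millis;
        }
    }

    /** Runs the given phase once and records how long it took. */
    static <T> Timed<T> time(Supplier<T> phase) {
        long start = System.nanoTime();
        T value = phase.get();
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        return new Timed<>(value, elapsedMillis);
    }
}
```

With the JPA API on the classpath, one could wrap Persistence.createEntityManagerFactory("my-unit") in a first call to PhaseTimer.time and createEntityManager() on the returned factory in a second call, where "my-unit" is a placeholder unit name.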
Are your results with eclipselink.deploy-on-startup=True? My expectation is that the major things you have listed will occur at EntityManagerFactory creation time when that property is enabled, and will be split between EntityManagerFactory creation time and first EntityManager use without the property (on EclipseLink 2.3).

Reopening as an enhancement. Community, if this is a big issue for you, please vote for this issue.
I'm not (explicitly) setting eclipselink.deploy-on-startup=True (so I'm relying on the defaults). Please note: in the profiled application I am _always_ creating exactly one EntityManagerFactory and _always_ creating exactly one EntityManager, no more, no less. But I am only using the EntityManager to execute one Query which uses only one entity class; the other 200 entity classes are not required by the application at all (but are still loaded by EclipseLink). So the application would benefit a lot from separate lazy loading of independent groups of entity classes.
Created attachment 200212: Profiling results of another application

Attached are profiling results of another application (which instantiates _one_ EntityManagerFactory and _one_ EntityManager). Total running time of the application: 23.6s. As you can see, 70% is spent creating the EntityManagerFactory and 7.8% is spent creating the EntityManager.
Thanks for the info. FYI, this feature might be useful: http://wiki.eclipse.org/EclipseLink/Examples/JPA/Composite You could use it to break your persistence unit up into chunks and still be able to access the mappings as a whole. (Smaller persistence units should have faster creation times.)
I'm currently upgrading EclipseLink from 2.4.2 to 2.6.1 and I still have the feeling that the startup time hasn't improved much so far, at least for the environments I'm working with. While a few seconds are irrelevant for production/QA environments, they really hurt in unit testing / automated testing environments where a lot of startups happen.

Observations on EclipseLink 2.6.1 / Java 8 on Windows (one persistence unit located in a common class directory, mid-sized project, eclipselink.exclude-eclipselink-orm=true):

Acquiring an EntityManagerFactory in my setup still takes more than 3.5 seconds:
- 0.8 seconds (23%) are wasted on an expensive search (see below) for a (non-existent) orm.xml file [org.eclipse.persistence.internal.jpa.metadata.MetadataProcessor.loadMappingFiles()]
- The DirectoryArchive-related code seems to crawl parts of the file system and issues many I/O operations. This code accounts for the most significant part (1.5 seconds) of the EntityManagerFactory creation. For example, in my case it makes many File.isDirectory() calls, which are expensive on Windows because they call java.io.WinNTFileSystem.getBooleanAttributes(). Other projects have already identified this as a performance penalty:
  https://bugs.eclipse.org/bugs/show_bug.cgi?id=450629
  https://netbeans.org/bugzilla/show_bug.cgi?id=168389#c23

Suggestions:
* Please add the possibility to disable orm.xml scanning (e.g. by introducing another property). I think that most persistence units don't use orm.xml.
* Revise the DirectoryArchive-related code to see whether it can be refactored in a way that does fewer I/O operations (maybe using Java 7's NIO.2 FileVisitor API would be more efficient, too)
* If possible, do further performance tests using a persistence unit stored in a common class directory rather than a jar
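The FileVisitor suggestion above can be sketched as follows. This is an illustration only, not the DirectoryArchive code: ClassFileScanner is a hypothetical class, and whether this approach is actually faster on a given platform would need to be measured.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical scanner that collects .class files below a class directory. */
final class ClassFileScanner {
    static List<Path> findClassFiles(Path root) throws IOException {
        List<Path> found = new ArrayList<>();
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                // The walk hands over the attributes it already read,
                // so no separate File.isDirectory() call is made here.
                if (attrs.isRegularFile() && file.getFileName().toString().endsWith(".class")) {
                    found.add(file);
                }
                return FileVisitResult.CONTINUE;
            }
        });
        return found;
    }
}
```

Calling ClassFileScanner.findClassFiles(Paths.get("target/classes")) would return the class files under that directory; the path is a placeholder for wherever the persistence unit's classes live.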
The EclipseLink project has moved to GitHub: https://github.com/eclipse-ee4j/eclipselink