Bug 278650 - QueryValueHolder Strategy for Better Memory Management
Summary: QueryValueHolder Strategy for Better Memory Management
Status: NEW
Alias: None
Product: z_Archived
Classification: Eclipse Foundation
Component: Eclipselink (show other bugs)
Version: unspecified   Edit
Hardware: All All
: P3 enhancement with 2 votes (vote)
Target Milestone: ---   Edit
Assignee: Nobody - feel free to take it CLA
QA Contact:
URL: http://wiki.eclipse.org/EclipseLink/D...
Whiteboard:
Keywords: performance
Depends on:
Blocks:
 
Reported: 2009-06-01 13:59 EDT by NateB CLA
Modified: 2022-06-09 10:36 EDT (History)
4 users (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description NateB CLA 2009-06-01 13:59:35 EDT
In addition to the large number of objects, our persistence framework has a lot of interlinking (many 1-M and 1-1 back reference mappings throughout the application). The object graph that is loaded into memory is extremely intertwined. In instances of our application where datasets are very large we are running into the situation where the GC cannot free memory because active threads retain hard references to objects, which, because of QueryValueHolder, retains hard references to related objects throughout our object model. This is true regardless of the setting in the IdentityMap. 

For several reasons not worth exploring on this thread, we cannot remove these ValueHolder references. An interesting idea came out of our internal discussions and I wanted to post it to this forum to learn if this approach had any merit.

We would like to create an implementation of ValueHolder (we would assume a QueryValueHolder subclass?) with the following properties:

1) It would have a "reinitialize()" method that would release its reference on any loaded objects and be treated as if it had never loaded anything from the DB.

2) Have the ValueHolder reference itself by a WeakReference allowing the GC to harvest the entire collection. If the GC did harvest this memory, EclipseLink would treat this the same as if a "reinitialize()" request was made.
Comment 1 Doug Clarke CLA 2009-06-02 12:10:17 EDT
Assigning this enhancement to future for now. I am hopeful the discussion around this will come up with a work-around users can use today and then we'll assign adding this to an upcoming release.
Comment 2 NateB CLA 2009-06-05 16:38:34 EDT
(In reply to comment #1)
> Assigning this enhancement to future for now. I am hopeful the discussion
> around this will come up with a work-around users can use today and then we'll
> assign adding this to an upcoming release.

In your comment on the discussion forum you mentioned that the trick with this was to make sure you retained the DatabaseRow reference in the ValueHolder in order to reconstruct the data.

I've started playing with this a bit on the EL 1.1.1 code base. The QueryBasedValueHolder has 3 fields of interest:

value <== the actual collection of objects
session <== the ClientSession used for the parent object query
row <== the DB row from the parent object query.

I assume "row" is used to extract the FK in order to create the query.

What I don't understand is if you changed the "value" field from Object to WeakReference<Object>, encapsulated access to it, and tied the logic behind "isInstantiated()" on the ValueHolder to check both the boolean flag value as well as weather value had been GC'd how this would cause issues with DatabaseRow (or any others for that matter). The object looks much too simple for an issue like this.

Incidentally, I have begun testing this in the source code and am getting some very weird results. Obviously, I don't understand all of the interactions here, and I assume there is some clone/merge functionality that uses reflection that I'm not accounting for. 

But, I'm not wanting to GC the QueryBasedValueHolder itself, but rather, just the collection it is pointing to internally. I want to trick it to think that having a GC'd value is the same as not being instantiated in the first place.

What's the difference?

I'm all for looking for a workaround or working on a temporary branch of 1.1.1 until a more permanent solution is created. I'll also assign engineering resources to developing and testing the fix if we are given some guidance that would later be approved.

Any general strategy approaches would be appreciated.
Comment 3 Doug Clarke CLA 2009-06-08 05:04:59 EDT
I have started capturing some research notes and thoughts on a wiki page. The link is in the bug's URL field (above). My initial thought was that a custom indirection policy would do the trick but covering all usages will be more challenging. If we come up with any work-around we'll need to be very specific about the cases supported. 

Features that may make custom indirection more complex:
- Batch reading
- Weaving (JPA)
- Transformation mappings
- Proxy Indirection

Comment 4 NateB CLA 2009-06-09 16:32:43 EDT
A new IndirectionPolicy seems like a good start, but it would require changes to the ValueHolder inheritance chain classes (QBVH, UOWVH) for the approach I assumed we would use.

Is there any way to get a comment or some guidance on whether the approach I've suggested (changing the "value" field to a WeakReference and redefining what "isInstantiated" means) is a valid one to pursue? This is becoming a bit of an issue for us, and we've got some resources to start testing this. However, I'd appreciate being told whether this would be a recommended approach. I guess my specific questions would be:

1) Is changing QBVH.value to a WeakReference the cleanest place to implement this change? What are the other options and why would they be better from a design perspective?

2) Aside from supporting JPA, Batch, Tranformation, etc. what other areas of functionality would you become concerned about within EclipseLink with this change? I know EclipseLink does a lot of reflection internally and implementation details of the ValueHolderInterface classes are not well-encapsulated. As someone that understands this architecture much better than we do here, what gives you pause about going down this path?
Comment 5 NateB CLA 2009-06-10 13:32:41 EDT
I'm starting to understand more about the details of this issue. This is my understanding:

1) I have 2 objects in a 1->M relationship: Customer and Group(s). The Groups collection on Customer in our application uses indirection.

2) When a Customer loads from the DB, the direct mapped fields are loaded and all of the ValueHolder fields are replaced with instances of QueryBasedValueHolder that are initialized with the Session that the Customer was loaded with as well as the DatabaseRow for the Customer.

3) When "getValue()" is called on the QueryBasedValueHolder, it uses the reference to the Session as well as the DatabaseRow to execute a query to populate the collection with Group objects.

4) After this is loaded, resetFields() nulls out the references to the Session and the DatabaseRow. It also sets "isInstantiated" to true. From then on, getValue() returns the cached collection of Groups and will remain in memory until the Customer is GC'd.


So, the issue you brought up is: how could a DatabaseValueHolder reconstruct the data it the value had been GC'd? We need both a Session reference and a Row. 

How long is Session and Row preserved if the ValueHolder is not instantiated? 

Why is resetFields() necessary if references are kept (as long as 1 ValueHolder on Customer has not been instantiated, neither of these objects will be GC'd). I'm wondering if, in practice, the DatabaseRow would ever go away if it is kept around until all ValueHolders are instantiated.


Comment 6 Doug Clarke CLA 2009-06-11 13:12:22 EDT
Your understanding appears correct. We do clear the fields for cleanliness but its effectiveness varies with usage. I have had some success getting WeakReferences used within indirect relationships and started building an example. I still need much more testing but want to make to code available to you. The example is tracked in https://bugs.eclipse.org/bugs/show_bug.cgi?id=279643 . I will attach the current source to the example bug.
Comment 7 Eclipse Webmaster CLA 2022-06-09 10:36:22 EDT
The Eclipselink project has moved to Github: https://github.com/eclipse-ee4j/eclipselink