Bug 569752 - Detect non disposed OS resources via Object.finalize()
Summary: Detect non disposed OS resources via Object.finalize()
Status: RESOLVED FIXED
Alias: None
Product: Platform
Classification: Eclipse Project
Component: SWT (show other bugs)
Version: 4.19   Edit
Hardware: All All
: P3 enhancement (vote)
Target Milestone: 4.19 M2   Edit
Assignee: Alexandr Miloslavskiy CLA
QA Contact:
URL:
Whiteboard:
Keywords: noteworthy
Depends on:
Blocks: 566159
  Show dependency tree
 
Reported: 2020-12-16 14:09 EST by Alexandr Miloslavskiy CLA
Modified: 2021-10-15 11:43 EDT (History)
9 users (show)

See Also:


Attachments
Visual of the reporter (145.95 KB, image/png)
2021-01-18 14:19 EST, Wim Jongman CLA
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Alexandr Miloslavskiy CLA 2020-12-16 14:09:33 EST
A few days ago I was fighting with Image leaks in some SWT code and had an idea:

Such leaks can be detected by overloading 'Object.finalize()' in SWT for things that need to be tracked, such as 'Image'. When 'Object.finalize()' occurs and detects (!isDisposed()), it will report the leak, optionally with allocation stack.

I did notice 'Device.tracking', but it's not as good, because:
1) It incurs performance penalty by keeping track of all objects in an array
2) It only gives a count of leaked objects, which isn't too helpful.

I will implement the code and try it in some real-world scenarios to see if it's worthy.
Comment 1 Eclipse Genie CLA 2020-12-17 15:27:30 EST
New Gerrit change created: https://git.eclipse.org/r/c/platform/eclipse.platform.swt/+/173938
Comment 2 Eclipse Genie CLA 2020-12-17 15:27:34 EST
New Gerrit change created: https://git.eclipse.org/r/c/platform/eclipse.platform.swt/+/173939
Comment 3 Alexandr Miloslavskiy CLA 2020-12-18 15:49:47 EST
I ran JUnit tests on all 3 platforms with this patch and only found one leak, which I also fixed (see second gerrit).
Comment 4 Eclipse Genie CLA 2021-01-01 05:36:18 EST
New Gerrit change created: https://git.eclipse.org/r/c/platform/eclipse.platform.swt.binaries/+/174167
Comment 7 Andrey Loskutov CLA 2021-01-01 12:20:07 EST
Alexandr, please provide N&N note for this change (see https://git.eclipse.org/r/www.eclipse.org/eclipse/news.git).
Comment 9 Eclipse Genie CLA 2021-01-05 10:04:22 EST
New Gerrit change created: https://git.eclipse.org/r/c/www.eclipse.org/eclipse/news/+/174286
Comment 11 Andrey Loskutov CLA 2021-01-05 10:07:41 EST
(In reply to Andrey Loskutov from comment #7)
> Alexandr, please provide N&N note for this change (see
> https://git.eclipse.org/r/www.eclipse.org/eclipse/news.git).

Done via 

(In reply to Eclipse Genie from comment #10)
> Gerrit change
> https://git.eclipse.org/r/c/www.eclipse.org/eclipse/news/+/174286 was merged
> to [master].
> Commit:
> http://git.eclipse.org/c/www.eclipse.org/eclipse/news.git/commit/
> ?id=7b3eab8772d881c117b41922ad6e4b44e0107495
Comment 12 Eclipse Genie CLA 2021-01-05 12:14:21 EST
New Gerrit change created: https://git.eclipse.org/r/c/platform/eclipse.platform.swt/+/174291
Comment 14 Mickael Istria CLA 2021-01-05 13:02:26 EST
That's a great idea and improvement, thanks all!
Comment 15 Alexandr Miloslavskiy CLA 2021-01-06 05:21:19 EST
@Andrey thanks for taking care of this and other patches while I was on vacation!

One question about your edits: is it correct that 'public Resource()' will NOT collect stacks?
Comment 16 Phil Beauvoir CLA 2021-01-08 05:51:39 EST
Ales, this is an interesting addition.

Are there any implications due to Object.finalize being deprecated since Java 9?
Comment 17 Alexandr Miloslavskiy CLA 2021-01-08 11:59:03 EST
I don't think so. To my understanding, it's deprecated only because trying to use it to actually dispose resources is problematic.
Comment 18 Phil Beauvoir CLA 2021-01-08 12:11:37 EST
(In reply to Alexandr Miloslavskiy from comment #17)
> I don't think so. To my understanding, it's deprecated only because trying
> to use it to actually dispose resources is problematic.

OK, it's just that I was reading this:

https://stackoverflow.com/questions/56139760/why-is-the-finalize-method-deprecated-in-java-9

Quote:

"As of this writing (2019-06-04) there is no concrete plan to remove finalization from Java. However, it is certainly the intent to do so. We've deprecated the Object.finalize method, but have not marked it for removal. This is a formal recommendation that programmers stop using this mechanism. It's been known informally that finalization shouldn't be used, but of course it's necessary to take a formal step. In addition, certain finalize methods in library classes (for example, ZipFile.finalize) have been deprecated "for removal" which means that the finalization behavior of these classes may be removed from a future release. Eventually, we hope to disable finalization in the JVM (perhaps first optionally, and then later by default), and at some point in the future actually remove the finalization implementation from garbage collectors."
Comment 19 Jens Lideström CLA 2021-01-09 06:06:20 EST
This looks like a useful feature.

But I don't think it is a good idea to base it on finalization, since that is a deprecated feature. Did you evaluate the use of the new features that replace finalization?

Also, the use of finalization has a significant performance cost. Has that been taken into consideration and evaluated?

See for example this:

* https://www.ibm.com/developerworks/java/library/j-jtp01274/index.html#20

* https://stackoverflow.com/questions/2860121/why-do-finalizers-have-a-severe-performance-penalty

* Effective Java, 2nd edition, Item 7
Comment 20 Phil Beauvoir CLA 2021-01-09 06:23:17 EST
In Resource there's a field:

private NonDisposedException allocationStack;

This is referenced in the finalize() method:

nonDisposedReporter.onNonDisposedResource(this, allocationStack);

Can this object be safely referenced at this point?

(Not trying to rain on your parade, I'm just genuinely curious to see if there are any gotchas)
Comment 21 Alexandr Miloslavskiy CLA 2021-01-09 06:33:45 EST
> Also, the use of finalization has a significant performance cost. Has that
> been taken into consideration and evaluated?

No, this comes somewhat unexpected. I wonder what the actual performance penalty is... I guess another test snippet is needed here.
Comment 22 Alexandr Miloslavskiy CLA 2021-01-09 06:34:28 EST
> Can this object be safely referenced at this point?

Yes, according to java docs, the object can even be "resurrected" by re-referencing it from finalizer.
Comment 23 Phil Beauvoir CLA 2021-01-09 06:48:20 EST
Thanks, Alex.

But with:

nonDisposedReporter.onNonDisposedResource(this, allocationStack);

The Resource could be referenced in the NonDisposedReporter instance. Can the Resource now be garbage collected if there a new reference to it in NonDisposedReporter?
Comment 24 Alexandr Miloslavskiy CLA 2021-01-09 06:50:43 EST
If the reference is kept elsewhere, the object will be "resurrected" until the new reference goes away.

If reporter doesn't save the reference to some other alive object, then in my understanding it will be garbage collected just fine.
Comment 25 Phil Beauvoir CLA 2021-01-09 06:51:34 EST
Thanks, Alex. Sorry for the noise. :-)
Comment 26 Alexandr Miloslavskiy CLA 2021-01-09 06:53:12 EST
I think it wasn't noise but rather reasonable questions. Also, I'm not even a java programmer, so I can definitely overlook things.
Comment 27 Phil Beauvoir CLA 2021-01-09 06:55:27 EST
(In reply to Alexandr Miloslavskiy from comment #26)
> I'm not even a java programmer...

:-O

If you're not a Java programmer then I think I need to give up programming!
Comment 28 Alexandr Miloslavskiy CLA 2021-01-09 07:02:48 EST
Pretty much all of Java code I written can be seen in SWT. That's some 500 lines total, maybe. Compare this to 20+ years of c++ :)  Don't give up programming yet :)
Comment 29 Andrey Loskutov CLA 2021-01-09 07:46:35 EST
(In reply to Alexandr Miloslavskiy from comment #15)
> @Andrey thanks for taking care of this and other patches while I was on
> vacation!
> 
> One question about your edits: is it correct that 'public Resource()' will
> NOT collect stacks?

My fault, for whatever reason I've assumed that all "real" SWT resources (non-Colors) always call second constructor, and I haven't noticed that GC is left alone :-) I will push a patch in a moment.
Comment 30 Eclipse Genie CLA 2021-01-09 08:01:28 EST
New Gerrit change created: https://git.eclipse.org/r/c/platform/eclipse.platform.swt/+/174524
Comment 32 Nikita Nemkin CLA 2021-01-16 13:20:43 EST
Alexandr Miloslavskiy from comment #0)
> I did notice 'Device.tracking', but it's not as good, because:
> 1) It incurs performance penalty by keeping track of all objects in an array
Device.setTracking performance is perfectly acceptable. The major cost is collecting call stacks, not object list management. Moreover, device tracking cost is _zero_, unless enabled.

Finalizer-based tracking _unconditionally_ enables finalization and adds 2 extra fields to every Resource. Finalization mechanism is deprecated and has non-trivial GC cost.

> 2) It only gives a count of leaked objects, which isn't too helpful.
Device does track allocation call stacks, see Device.error.


This feature uses a questionable approach to duplicate existing functionality. It also adds unnecessary public API that will be impossible to fix/remove due to Eclipse API guarantees.

Please revert.

PS. The modern approach to leak tracking is a ReferenceQueue of PhantomReferences, but it has its own problems.
Comment 33 Alexandr Miloslavskiy CLA 2021-01-16 15:06:03 EST
(In reply to Nikita Nemkin from comment #32)
> Device.setTracking performance is perfectly acceptable. The major cost is
> collecting call stacks, not object list management.

I'm not so sure. Currently both add and remove operations impose O(N*N) complexity. With some 500 objects in the list, I think that's already non-trivial performance cost. Acceptable? Could be. Without a test snippet, I wouldn't really make claims regarding which approach is slower.

> Moreover, device tracking cost is _zero_, unless enabled.
> Finalization mechanism ... has non-trivial GC cost.

This was unforeseen.

> adds 2 extra fields to every Resource

Technically this is correct, however in practice the cost is negligible. I don't expect more than 10000 Resources at any given time, which means some 160k memory in worst case.

> Device does track allocation call stacks, see Device.error.

Indeed I overlooked this.

----

With new input and taking unforeseen things into account, the new API is still easier to use, because it doesn't require to parse through lots of noise to find the leak. In Bug 570094, there's already a number of leaks found.

With old API, there were just two ways possible:
1) A human goes through lots of noise - well, practice shows that this never works well. The number of newly found leaks confirms that old approach didn't work so well.
2) Wait for Display.dispose() and mass detect leaks there. Problem is, I think that for lots of real applications doing something after Display.dispose() is very inconvenient.

I didn't really have time to think it through, but one possible workaround could be to move '.finalize()' from 'Resource' to some new 'class ResourceFinalizer' which will have a link to parent 'Resource'. This way, 'ResourceFinalizer' can be allocated only when tracking is enabled, removing the cost for cases when it's not enabled. The cost of when it's actually enabled is yet to be determined and it doesn't really matter because it's merely a cost-benefit tradeoff that anyone can make to their liking.
Comment 34 Nikita Nemkin CLA 2021-01-17 02:11:13 EST
(In reply to Alexandr Miloslavskiy from comment #33)
> I'm not so sure. Currently both add and remove operations impose O(N*N)
> complexity. With some 500 objects in the list, I think that's already
> non-trivial performance cost. Acceptable? Could be. Without a test snippet,
> I wouldn't really make claims regarding which approach is slower.
Well, make Device.objects/errors a HashMap<Resource, Error> and done.

> With new input and taking unforeseen things into account, the new API is
> still easier to use, because it doesn't require to parse through lots of
> noise to find the leak. In Bug 570094, there's already a number of leaks
> found.
No idea what you mean by lots of noise.

If you set display.DEBUG = true, Display.dispose will print leaked object counts and a stack trace for every leaked allocation.

If you want to do something else with this data, register a disposeExec listener:

    display.disposeExec(() -> {
        DeviceData data = display.getDeviceData();
        // Non-null data.objects are leaks, do anything you want with them.
        // Stack traces are in data.errors.
    });

This approach is reliable (if tracking is enabled early), there are no false positives or false negatives or any other "noise".

Compared to this, GC-based tracking is non-deterministic. GC happens at random times or not at all. For example, there's no GC on app exit and no GC if enough heap memory is available. You have to force System.gc() a couple of times just to be sure that all finalizers have run.

The claim about the new API success is also misleading. You'd find all the same leaks using device tracking, if you enable it. I agree that having a sysvar for this would be nice.

> I didn't really have time to think it through, but one possible workaround
> could be to move '.finalize()' from 'Resource' to some new 'class
> ResourceFinalizer' which will have a link to parent 'Resource'. This way,
> 'ResourceFinalizer' can be allocated only when tracking is enabled, removing
> the cost for cases when it's not enabled. The cost of when it's actually
> enabled is yet to be determined and it doesn't really matter because it's
> merely a cost-benefit tradeoff that anyone can make to their liking.
ReferenceQueue is what you're looking for. With a queue, most finalizer issues don't apply. But queues have to be polled sometime and they still require a forced GC for reliability.

I insist on removal of duplicate API and problematic tracking code.
Comment 35 Phil Beauvoir CLA 2021-01-18 06:40:25 EST
I'm following this issue with interest.

Given that the issues around this change are are being contended and that adding a new API might be hard to reverse (Nikita - "...public API that will be impossible to fix/remove due to Eclipse API guarantees") would it make sense to revert the changes for now?
Comment 36 Andrey Loskutov CLA 2021-01-18 10:19:28 EST
(In reply to Alexandr Miloslavskiy from comment #33)
> With new input and taking unforeseen things into account, the new API is
> still easier to use, because it doesn't require to parse through lots of
> noise to find the leak. In Bug 570094, there's already a number of leaks
> found.

I can confirm that the "easy of use" is given. I've enabled this for our product tests and it was immediately useful without providing extra code that would be needed to collect some data after device disposal, parsing the output, reporting some errors. With errors logged to Eclipse log automatically turned to test fails (which we have) I only had to add the new system property & fix lot of issues found.
 
> With old API, there were just two ways possible:
> 1) A human goes through lots of noise - well, practice shows that this never
> works well. The number of newly found leaks confirms that old approach
> didn't work so well.
> 2) Wait for Display.dispose() and mass detect leaks there. Problem is, I
> think that for lots of real applications doing something after
> Display.dispose() is very inconvenient.

One problem here is also the device tracking approach of keeping *every* resource reference. I can't imagine to enable this for our long time stability tests (we run stress tests over 3 days to detect memory leaks etc), it will most likely lead to memory issues. 

This could be fixed with the proposed HashMap<Resource, Error> for tracking.

Another problem is that the display.disposeExec() callback doesn't guarantee that the added callback for device tracking handler is the *last* one that runs, so as of today it runs before all possible bundle disposal handler that cleanup the registries, so we still have a lot of false positives that need to be manually filtered out.

So it is a bit more tricky: it has to be done inside dispose event handling:

Display display = Display.getDefault();
display.addListener(SWT.Dispose, e -> {
	display.disposeExec(() -> {
		DeviceData info = display.getDeviceData();
		Object[] objects = info.objects;
		Error[] errors = info.errors;
		for (int i = 0; i < errors.length; i++) {
			Error error = errors[i];
			Resource r = (Resource) objects[i];
			reporter.onNonDisposedResource(r, error);
		}
	});
});

And finally the biggest problem is that all the reporting happens only at the end. One can't check just one Widget or UI action - one have to wait till the shutdown. No immediate feedback is not convenient, compared with the new code.

(In reply to Nikita Nemkin from comment #34)
> (In reply to Alexandr Miloslavskiy from comment #33)
> > I'm not so sure. Currently both add and remove operations impose O(N*N)
> > complexity. With some 500 objects in the list, I think that's already
> > non-trivial performance cost. Acceptable? Could be. Without a test snippet,
> > I wouldn't really make claims regarding which approach is slower.
> Well, make Device.objects/errors a HashMap<Resource, Error> and done.

This makes sense, we can do that.

> The claim about the new API success is also misleading. 

I don't think so. I'm one of happy users being able to detect & eliminate the leaks with almost no changes in our custom product code, and this is great! Sometimes "easy of use" is the key. We can do all what we did for this bug also for device tracking, the problem will remain that the errors will be reported at the end only (and that the memory will grow due that). 

> You'd find all the
> same leaks using device tracking, if you enable it. 

Right, but... One also have to write an extra code for the test framework that would evaluate the logs *after* the tests are finished (because display is disposed after that). One have to write the parser for that that would "fail" right tests that responsible for the leaks, or emulate some extra test failures etc.

> I agree that having a
> sysvar for this would be nice.

Yes, we can use the new system property to set debug & tracking flags in Device.

> > I didn't really have time to think it through, but one possible workaround
> > could be to move '.finalize()' from 'Resource' to some new 'class
> > ResourceFinalizer' which will have a link to parent 'Resource'. This way,
> > 'ResourceFinalizer' can be allocated only when tracking is enabled, removing
> > the cost for cases when it's not enabled. The cost of when it's actually
> > enabled is yet to be determined and it doesn't really matter because it's
> > merely a cost-benefit tradeoff that anyone can make to their liking.
> ReferenceQueue is what you're looking for. With a queue, most finalizer
> issues don't apply. But queues have to be polled sometime and they still
> require a forced GC for reliability.

That polling could be done from tests, in IDE it could be a Timer.

> I insist on removal of duplicate API and problematic tracking code.

Note: please don't simply revert SWT code, because it is already used by the platform UI via bug 570094, so in case we decide to revert something, we should either do this on one day on both repositories, or revert the platform UI part first and merge SWT change day after.

Nikita, do you have numbers regarding finalize() costs in SWT? I see this as the biggest issue with the current approach. Would it be OK for you if we would switch from finalize() based approach to the ReferenceQueue based?

We should clarify if we should release new code or not before M3.

PS: while answering here I've did some changes as commented above, I've pushed them as a patch for bug 570452.
Comment 37 Nikita Nemkin CLA 2021-01-18 11:27:26 EST
(In reply to Andrey Loskutov from comment #36)
> I can confirm that the "easy of use" is given. I've enabled this for our
> product tests and it was immediately useful without providing extra code
> that would be needed to collect some data after device disposal, parsing the
> output, reporting some errors. With errors logged to Eclipse log
> automatically turned to test fails (which we have) I only had to add the new
> system property & fix lot of issues found.
You had to "provide extra code" either way, SWT doesn't post errors to Eclipse log.

>So it is a bit more tricky: it has to be done inside dispose event handling:
Indeed, making sure that dispose listener runs last is an issue. Your solution is clever, a simpler option is to register disposeExec right before display.dispose() call.

> One problem here is also the device tracking approach of keeping *every*
> resource reference. I can't imagine to enable this for our long time
> stability tests (we run stress tests over 3 days to detect memory leaks
> etc), it will most likely lead to memory issues. 
It's keeping every _undisposed_ reference. If your code doesn't leak a lot, there won't be any unnecessary retention. And if it does leak a lot, native object leak overshadows Java object leak anyway.

> And finally the biggest problem is that all the reporting happens only at
> the end. One can't check just one Widget or UI action - one have to wait
> till the shutdown. No immediate feedback is not convenient, compared
> with the new code.
Without forced GC you can't attribute finalizer execution to any particular test. Do you run System.gc(); System.gc(); after every test?

Also, device tracking can be enabled locally to verify that all objects created in a region of code had been disposed. This is exactly what Sleak does.

> Nikita, do you have numbers regarding finalize() costs in SWT? I see this as
> the biggest issue with the current approach. Would it be OK for you if we
> would switch from finalize() based approach to the ReferenceQueue based?
No, I won't be profiling JVM to prove that finalizers are bad.
No, ReferenceQueue still requires forced GC and extra queue maintenance, it's still a duplicate feature/API and the biggest issue here is complete disregard of existing code and practices.
Comment 38 Alexandr Miloslavskiy CLA 2021-01-18 12:17:06 EST
I have discussed it with a colleague who's handling Java bugs in our product and he also sees the new API as easier to use. Our product already used the old API, but due to the difficulties of it, it merely reported current counters to log, so that if large-scale leak occurs, it can be human-detected. For us, detecting leaks after 'Display.dispose()' is too late to use without additional trouble.

This makes it 4 people now, each already used the tool AND found leaks, against one opinion that it was already possible to detect bugs with old API. That's technically true that it was possible to use the old API. To put it further, even the old API wasn't needed, because people can use 3rd party tools, review their code better and just stop doing mistakes.

Have you noticed that discussion goes along the lines of "this is difficult to use" - "but you can do this" - "no this wont work too well" - "oh ok then you can do this" ?

At the same time, new API "just works". Maybe it's not entirely deterministic, but even if it doesn't (yet to be shown) detect some problems sometimes, I would say that it's going to detect most problems in just a few runs.

This said, I would suggest to evaluate 4 happy users vs "this is not needed".

As for ReferenceQueue, I'm currently evaluating it. The biggest hurdle here is the need to poll, and I'm not yet sure how to do that to keep things convenient.
In our product we use (for a different reason) 'ManagementFactory.getGarbageCollectorMXBeans()' to subscribe to GC events but it's probably not a good idea to use that in SWT.
Comment 39 Andrey Loskutov CLA 2021-01-18 12:56:14 EST
(In reply to Nikita Nemkin from comment #37)
> You had to "provide extra code" either way, SWT doesn't post errors to
> Eclipse log.

That was done via bug 570094. So in fact I only had to change our startup config for tests.
 
> It's keeping every _undisposed_ reference. If your code doesn't leak a lot,
> there won't be any unnecessary retention. And if it does leak a lot, native
> object leak overshadows Java object leak anyway.

Right, missed that the dispose() removes the references

> Without forced GC you can't attribute finalizer execution to any particular
> test. Do you run System.gc(); System.gc(); after every test?

Yep, that can be done too, but it works also pretty good without, may be because we run on huge heap. The point is to get feedback *before* shutdown.

> Also, device tracking can be enabled locally to verify that all objects
> created in a region of code had been disposed. This is exactly what Sleak
> does.

But this is nothing that can be done automatically, because there are so many cases of lazy loaded factories, caching etc that could be triggered by an action and will be shown as false positives. Bin there, done that, it is as every manual task boring and time consuming.

> > Nikita, do you have numbers regarding finalize() costs in SWT? I see this as
> > the biggest issue with the current approach. Would it be OK for you if we
> > would switch from finalize() based approach to the ReferenceQueue based?
> No, I won't be profiling JVM to prove that finalizers are bad.

It would at least help you to convince others.

> No, ReferenceQueue still requires forced GC and extra queue maintenance,
> it's still a duplicate feature/API and the biggest issue here is complete
> disregard of existing code and practices.

We are all looking for best way to track memory / resources leaks. Existing solution was not good enough, so a new solution was proposed. Yes, it duplicates part of the original one, but it also does few things better. It is not "disregarding", it is just different. Also it gives us a perfect instrument for bug reporting by end users - with enabled tracking we could interactively validate if some code leaks resources even at customer side, "remotely" troubleshooting memory issues that couldn't be reproduced locally by develolers. 

The question is here: is there a blocking issue in current patch or not?
Given ReferenceQueue based implementation I think there shouldn't be "blockers" left. Some overhead of ReferenceQueue / GC with enabled resource tracking is expected and not a problem IMO. 

So I would encourage Alexander to push a patch to get rid of finalize() and vote to keep it.
Without that I would vote for a revert.
Comment 40 Wim Jongman CLA 2021-01-18 14:11:35 EST
(In reply to Andrey Loskutov from comment #36)

> I can confirm that the "ease of use" is given.

I second that. I hope we can keep the current functionality of immediate reporting.
Comment 41 Wim Jongman CLA 2021-01-18 14:19:23 EST
Created attachment 285300 [details]
Visual of the reporter

Is it possible to create a secondary Resource object that gets woven in instead of the normal Resource object in case the property is set?
Comment 42 Alexandr Miloslavskiy CLA 2021-01-18 14:22:02 EST
(In reply to Wim Jongman from comment #41)
> Is it possible to create a secondary Resource object that gets woven in
> instead of the normal Resource object in case the property is set?

What problem will this solve? If you're trying to remove the costs when debug option is disabled, I already found solution for this: instead of having '.finalize()' in the Resource itself, it can be moved to a proxy class like 'ResourseFinalizer' which is only allocated with debugging enabled.
Comment 43 Wim Jongman CLA 2021-01-18 14:54:47 EST
(In reply to Alexandr Miloslavskiy from comment #42)
> (In reply to Wim Jongman from comment #41)
> > Is it possible to create a secondary Resource object that gets woven in
> > instead of the normal Resource object in case the property is set?
> 
> What problem will this solve? 

I addressed the API addition objection. It was just an idea.
Comment 44 Alexandr Miloslavskiy CLA 2021-01-18 15:13:19 EST
I have now spent some time studying 'ReferenceQueue' vs '.finalize()'

This is how '.finalize()' works:Finalizer.FinalizerThread
1) There is a class 'java.lang.ref.Finalizer' which contains a 'ReferenceQueue' (surprise!) [1].
2) There is a thread 'Finalizer.FinalizerThread' which extracts objects from queue and calls '.finalize()' on them [2].
3) When an object with '.finalize()' is allocated, JVM automatically adds it to said queue [3][4][5][6][7][8].

Now let's review the pros and cons of 'ReferenceQueue' vs '.finalize()':

Performance argument
--------------------
Mostly void, because both implementations are based on 'ReferenceQueue'.
Using 'ReferenceQueue' directly could even be a performance *degradation*, because it will no longer use JVM's optimized paths.

The only reasonable explanation I found for '.finalize()' is that GC can no longer sweep entire Eden when it needs to finalize asynchronously. However, even if we use 'WeakReference', we still need to keep some things in memory, such as allocation stack. Therefore, GC can't sweep entire Eden anyway. Also, it seems that this explanation requires ALL objects in Eden to be GC'ed, and if any of them is still referenced, GC can't sweep anyway.

There could be a memory improvement because with 'WeakReference', Resource itself could be GC'ed slightly earlier. Due to the nature of how 'Resource' is used and 'FinalizerThread' which will run when it has work, I think that the difference would be very negligible.

Non-deterministic argument
--------------------------
Void, because both implementations are based on 'ReferenceQueue'.

Using deprecated Java API
-------------------------
This holds, but on the other hand there's no indication that it's going to be removed any soon. It was deprecated for other reasons, none of which seem to be important for such a debug API.

Polling the queue
-----------------
'.finalize()' just works: according to (2) above, JVM itself starts a thread.
We could run a thread of our own.
Pros/cons:
+ Easy to use
- Another thread.

Another approach is to make an SWT API to poll queue.
Pros/cons:
- Inconvenient to use
- Requires app code changes
- Therefore can't be enabled via system property
- Queue could grow to big sizes if not polled often
+ Callback is called from the same thread

Another approach is to poll from SWT code, for example in Resource constructor.
+ Easy to use
- May have unwanted performance costs by polling often
- Somewhat unpredictable

Keeping enough information
--------------------------
How do we figure if the Resource was disposed properly?

'WeakReference' will not allow to query the object anymore.
'PhantomReference' will not allow to query the object anymore.
'SoftReference' randomly allows it or not. This is more or less acceptable for such API, but it will have to be marked as randomly unreliable.
'FinalReference' used in '.finalize()' is not public.
'Reference' is not allowed to be subclassed.

So we get a new problem: there's no reliable way to test the Resource at detection time. One workaround is to extend a 'WeakReference' to add a new boolean there, but making sure that field is coherent will need more complicated code.

Summary
-------
My understanding is that '.finalize()' is simpler than 'ReferenceQueue'.
At the same time, 'ReferenceQueue' doesn't seem to be better, except that it's not deprecated. If '.finalize()' can't be used anymore one day, then it can be replaced with 'ReferenceQueue' + polling thread and SWT API will continue to work.

I'd vote to continue using '.finalize()'. I'd also suggest to explain in Javadoc that API relies upon '.finalize()' in JVM and may cease to work later. It's not a show stopper anyway.

What are your thoughts?

[1] https://github.com/openjdk/jdk/blob/66943fe/src/java.base/share/classes/java/lang/ref/Finalizer.java#L38
[2] https://github.com/openjdk/jdk/blob/66943fe/src/java.base/share/classes/java/lang/ref/Finalizer.java#L171
[3] https://github.com/openjdk/jdk/blob/66943fe/src/hotspot/share/oops/instanceKlass.cpp#L1417
[4] https://github.com/openjdk/jdk/blob/66943fe/src/hotspot/share/oops/instanceKlass.cpp#L1405
[5] https://github.com/openjdk/jdk/blob/66943fe/src/hotspot/share/memory/universe.hpp#L253
[6] https://github.com/openjdk/jdk/blob/66943fe/src/hotspot/share/memory/universe.cpp#L898
[7] https://github.com/openjdk/jdk/blob/66943fe/src/java.base/share/classes/java/lang/ref/Finalizer.java#L65
[8] https://github.com/openjdk/jdk/blob/66943fe/src/java.base/share/classes/java/lang/ref/Finalizer.java#L49
Comment 45 Andrey Loskutov CLA 2021-01-18 15:35:33 EST
Interestingly for me, my naive approach to use ReferenceQueue with Resource doesn't work. Could someone explain why handleRelease() is never called (even if I create a fool proof leak)? The new thread is started, but then seems GC never puts anything to the queue. I must overlook something trivial:


diff --git a/bundles/org.eclipse.swt/Eclipse SWT/common/org/eclipse/swt/graphics/Resource.java b/bundles/org.eclipse.swt/Eclipse SWT/common/org/eclipse/swt/graphics/Resource.java
index 81da332..bd9cb8c 100644
--- a/bundles/org.eclipse.swt/Eclipse SWT/common/org/eclipse/swt/graphics/Resource.java
+++ b/bundles/org.eclipse.swt/Eclipse SWT/common/org/eclipse/swt/graphics/Resource.java
@@ -15,2 +15,4 @@
 
+import java.lang.ref.*;
+
 import org.eclipse.swt.*;
@@ -91,7 +93,26 @@
 
-	/**
-	 * Recorded at object creation if {@link #trackNonDisposed(boolean)} was
-	 * called with {@code true}, used to track resource disposal
-	 */
-	private NonDisposedException allocationStack;
+	private static ReferenceQueue<? super Resource> referenceQueue;
+
+
+	static class ResourceReference extends WeakReference<Resource> {
+		/**
+		 * Recorded at object creation if {@link #trackNonDisposed(boolean)} was
+		 * called with {@code true}, used to track resource disposal
+		 */
+		private final NonDisposedException allocationStack;
+
+		public ResourceReference(Resource referent, NonDisposedException allocationStack) {
+			super(referent, referenceQueue);
+			this.allocationStack = allocationStack;
+		}
+
+		public void handleRelease() {
+			if (nonDisposedReporter == null) {
+				return;
+			}
+        	Resource resource = get();
+        	if (resource == null || resource.nonDisposedIgnore || resource.isDisposed()) return;
+			nonDisposedReporter.onNonDisposedResource(resource, allocationStack);
+		}
+	}
 
@@ -100,7 +121,24 @@
 		if (property != null) {
-			if (property.equalsIgnoreCase("stacks")) {
-				trackNonDisposed(true);
-			} else if (property.equalsIgnoreCase("true")) {
-				trackNonDisposed(false);
-			}
+			referenceQueue = new ReferenceQueue<>();
+
+	        Thread t = new Thread("SWT reference daemon"){
+	            @Override
+	            public void run() {
+	                while (true) {
+	                    try {
+	                    	ResourceReference ref = (ResourceReference) referenceQueue.remove();
+	                    	ref.handleRelease();
+	                    } catch (Throwable e) {
+	                        e.printStackTrace();
+	                    }
+	                }
+	            }
+	        };
+	        t.setDaemon(true);
+	        t.start();
+	        if (property.equalsIgnoreCase("stacks")) {
+	        	trackNonDisposed(true);
+	        } else if (property.equalsIgnoreCase("true")) {
+	        	trackNonDisposed(false);
+	        }
 		}
@@ -110,3 +148,3 @@
 	if (collectAllocationStacks && !(this instanceof Color)) {
-		allocationStack = new NonDisposedException();
+		new ResourceReference(this, new NonDisposedException());
 	}
@@ -139,13 +177,2 @@
 /**
- * Returns the allocation stack if {@link #collectAllocationStacks} is enabled.
- * Otherwise, will return {@code null}.
- *
- * @see #trackNonDisposed(boolean)
- * @since 3.116
- */
-public NonDisposedException getAllocationStack() {
-	return allocationStack;
-}
-
-/**
  * Returns the <code>Device</code> where this resource was
@@ -179,14 +206,2 @@
 
-@Override
-protected void finalize() {
-	if (nonDisposedIgnore) return;
-	if (nonDisposedReporter == null) return;
-	if (isDisposed()) return;
-
-	// Color doesn't really have any resource to be leaked, ignore
-	if (this instanceof Color) return;
-
-	nonDisposedReporter.onNonDisposedResource(this, allocationStack);
-}
-
 /**
@@ -203,3 +218,2 @@
  *
- * @see #getAllocationStack()
  * @since 3.116
@@ -221,3 +235,2 @@
  *
- * @see #getAllocationStack()
  * @since 3.116
Comment 46 Alexandr Miloslavskiy CLA 2021-01-18 15:40:39 EST
(In reply to Wim Jongman from comment #43)
> I addressed the API addition objection. It was just an idea.

Could you please detail the idea? I don't quite understand.

I understand that you'te suggesting that when debugging is enabled, every Resource object is wrapped, ex. 'Font' is wrapped into some new 'TrackedFont extends Font'.

In this case, how will leaks be detected and reported? Which specific API does this allow to avoid?
Comment 47 Wim Jongman CLA 2021-01-18 16:20:21 EST
(In reply to Alexandr Miloslavskiy from comment #46)
> (In reply to Wim Jongman from comment #43)
> > I addressed the API addition objection. It was just an idea.
> 
> Could you please detail the idea? I don't quite understand.

Not wrapping, replacing. It is possible to replace a class at runtime with an alternative version.

So we maintain an internal object (e.g., ResoureInternal) that contains the tracking code and replaces Resource on the fly if the tracking debug option is set. So not wrapping, but replacing.

(Purely theoretical, I never attempted this before.)
Comment 48 Alexandr Miloslavskiy CLA 2021-01-18 16:23:20 EST
You still didn't say what part of API this will improve :)
Comment 49 Alexandr Miloslavskiy CLA 2021-01-18 16:32:34 EST
(In reply to Andrey Loskutov from comment #45)

> Interestingly for me, my naive approach to use ReferenceQueue with Resource
> doesn't work.

I must admit, this puzzled me as well.

I found that when I keep 'ResourceReference' in a static field, things begin to work. When I keep in in a normal field, it still fails.

I understand that 'ResourceReference' doesn't "reference itself" and it - poof - just goes away. JVM has a global list of unfinalized objects [1], I wonder if this is exactly to prevent this problem?

[1] https://github.com/openjdk/jdk/blob/66943fe/src/java.base/share/classes/java/lang/ref/Finalizer.java#L54
Comment 50 Wim Jongman CLA 2021-01-18 16:33:43 EST
(In reply to Alexandr Miloslavskiy from comment #48)
> You still didn't say what part of API this will improve :)

I did. It will not add the API that Nikita objects to.
Comment 51 Alexandr Miloslavskiy CLA 2021-01-18 16:36:40 EST
(In reply to Andrey Loskutov from comment #45)

> +        	if (resource == null || resource.nonDisposedIgnore ||
> resource.isDisposed()) return;

Note that due to the nature of 'WeakReference' this is always null. See "Keeping enough information" in Comment 44.

> +	        Thread t = new Thread("SWT reference daemon"){

This needs to be started for 'Resource.trackNonDisposed()' as well.
Comment 52 Alexandr Miloslavskiy CLA 2021-01-18 16:39:56 EST
> I did. It will not add the API that Nikita objects to.

Sorry, maybe it's too late in the evening, but I still don't understand what you mean.

Can you please have pity for people like me and explain your idea very explicitly?

Like "we will replace class with class ResoureInternal that will have code ??? which will detect leaks and report them via ??? to avoid API ??? which Nikita doesn't like because ???"
Comment 53 Wim Jongman CLA 2021-01-18 16:46:49 EST
(In reply to Alexandr Miloslavskiy from comment #52)
> > I did. It will not add the API that Nikita objects to.
> 
> Sorry, maybe it's too late in the evening,

Let's sleep on it :)
Comment 54 Alexandr Miloslavskiy CLA 2021-01-18 17:36:11 EST
My colleague noted that it might make sense to remove 'Resource resource' argument of 'NonDisposedReporter.onNonDisposedResource()' to better prepare for possible 'ReferenceQueue' switch. Indeed, with 'ReferenceQueue', it's hard to present the object itself for inspection, see "Keeping enough information" in Comment 44.

This will then imply to have stack collection always enabled, otherwise 'NonDisposedReporter' won't be able to tell anything useful. This is probably not a problem: on my computer, snippet from Bug 570094 comment 10 gives around 350.000 stacks allocated per second. Resources are not expected to ever be allocated at such rate.

In our product we're going to always have the stacks enabled anyway, so we don't need to inspect the leaked object itself.
Comment 55 Eclipse Genie CLA 2021-01-18 18:14:50 EST
New Gerrit change created: https://git.eclipse.org/r/c/platform/eclipse.platform.swt/+/175016
Comment 56 Alexandr Miloslavskiy CLA 2021-01-18 18:16:45 EST
I have submitted a demo patch that resolves various problems, see commit message.
Comment 57 Nikita Nemkin CLA 2021-01-19 00:39:46 EST
If this horror is here to stay (to which I object vehemently), at least reduce the public surface to the necessary minimum:

1. No additions to Resource.
2. No new exception classes.
3. One new method in Device, something like setNonDisposeHandler(Consumer<Error> handler).
(Analogous to Display error handling hooks).

You know, just follow the existing code and API style, like responsible contributors usually do...

Also, for a RefQueue, a pull-style API (with an optional wait) might be a better fit. The client will have to pull actively, but they could chose the best way for them to do it (thread, timer, asyncExec, Dispose listener, whatever).
Comment 58 Eclipse Genie CLA 2021-01-19 12:46:21 EST
New Gerrit change created: https://git.eclipse.org/r/c/www.eclipse.org/eclipse/news/+/175064
Comment 59 Alexandr Miloslavskiy CLA 2021-01-19 12:47:12 EST
That's the updated N&N entry.
Comment 62 Andrey Loskutov CLA 2021-01-22 10:11:45 EST
@Nikita: we fixed the most critical points you've mentioned:
- no finalize() anymore by default
- exception is not a field on a resource, no memory increase by default.
- removed all new API except one method call.

@Alexander: many thanks for you patience & patches.

I think the end result looks reasonable, is a good compromise and provides a very valuable addition to already existing API. Closing this as resolved now.
Comment 63 Phil Beauvoir CLA 2021-07-15 05:05:18 EDT
@Alexandr I had my first experience of this feature yesterday. I upgraded to Eclipse 4.20 and when debugging my RCP app it reported a resource leak for an Image. Sure enough, I had not disposed the Image before creating a new one.

Nice one, thanks!
Comment 64 Alexandr Miloslavskiy CLA 2021-07-15 05:07:12 EDT
You're welcome :)
Comment 65 Phil Beauvoir CLA 2021-09-28 02:28:59 EDT
See also Bug Bug 575420