Bug 309457 - [DS] Component deactivation in incorrect order
Summary: [DS] Component deactivation in incorrect order
Status: RESOLVED FIXED
Alias: None
Product: Equinox
Classification: Eclipse Project
Component: Compendium (show other bugs)
Version: unspecified   Edit
Hardware: PC Linux
: P3 critical (vote)
Target Milestone: 3.6 M7   Edit
Assignee: Stoyan Boshev CLA
QA Contact:
URL:
Whiteboard:
Keywords: greatbug
Depends on:
Blocks:
 
Reported: 2010-04-16 07:59 EDT by raymond.cuenen CLA
Modified: 2010-04-22 09:35 EDT (History)
5 users (show)

See Also:


Attachments
Example component configuration describing the problem (16.85 KB, application/pdf)
2010-04-16 08:00 EDT, raymond.cuenen CLA
no flags Details
Console debug output (28.19 KB, text/plain)
2010-04-16 08:01 EDT, raymond.cuenen CLA
no flags Details
Patch proposal (511 bytes, patch)
2010-04-16 08:01 EDT, raymond.cuenen CLA
no flags Details | Diff
DSTest eclipse project (15.88 KB, application/zip)
2010-04-16 10:59 EDT, raymond.cuenen CLA
no flags Details
proposed patch (4.32 KB, patch)
2010-04-21 12:13 EDT, Stoyan Boshev CLA
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description raymond.cuenen CLA 2010-04-16 07:59:11 EDT
Build Identifier: 3.6M6

In our framework we use Declarative Services for component binding.
In a specific case, when a component becomes unregistered, the deactivate method of a component in the middle of the dependency tree is called before the deactivate method of the component on the top of the tree (see attached example). This results in a NullPointerException in that component implementation because the deactivation of the component assumes the bound references still to be valid at that time.

In the source code of org.eclipse.equinox.ds_1.2.0.v20100301 I found that the selectNewlyUnsatisfied method returns a Vector of components that need to be deactivated. This result Vector is traversed from top to bottom resulting in the worng component being deactivated first (see also attached console logging).

Since this result Vector originates from the scpEnabled Vector it should be traversed in the opposite order of component activation.
I would suggest to add a call to Collections.reverse before returning the result Vector (see attached patch proposal).

Reproducible: Always

Steps to Reproduce:
1. Start OSGi container.
2. Install bundle containing the component configuration as in the example.
   Console prompt returns.
3. Start the bundle.
   The components are activated in the correct order.
4. Give the list command.
   Components are listed.
5. Disable component E.
   The components are deactivated in an incorrect order.
Comment 1 raymond.cuenen CLA 2010-04-16 08:00:33 EDT
Created attachment 165083 [details]
Example component configuration describing the problem
Comment 2 raymond.cuenen CLA 2010-04-16 08:01:15 EDT
Created attachment 165084 [details]
Console debug output
Comment 3 raymond.cuenen CLA 2010-04-16 08:01:53 EDT
Created attachment 165085 [details]
Patch proposal
Comment 4 Thomas Watson CLA 2010-04-16 09:06:31 EDT
Thanks for the detailed bug report!  Marking for greatbug.  Stoyan, can you take a look?
Comment 5 Simon Archer CLA 2010-04-16 09:37:25 EDT
I'm unsure if this is related to bug 255279, but its Summary field is certainly similar. Let's make sure that both test cases continue to behave as expected.
Comment 6 raymond.cuenen CLA 2010-04-16 10:25:26 EDT
Didn't see  bug 255279. It seems a similar situation as I describe here.
I have tetsted my examble without component E (and then deactivating component D), which gives me the same, incorrect, deactivation order.
I haven't tested the case of also removing component A (and making component B immediate=true), which is the situation in bug 255279 I believe.

I have tested other dependency chains that work fine in both versions, original and patched. Let me know if I should attach them here also.
Comment 7 raymond.cuenen CLA 2010-04-16 10:59:08 EDT
I have downscaled my example to three components B, C, D (as described in my previous comment). This is different from bug 255279, since it still shows the same problem.

I also took the liberty of testing the code of bug 255279. Running that code gives the same output for both the original and patched version.

I've also attached my eclipse project that I used to test with, although it only contains this component description of the downscaled example and not the full range of component descriptions that I had.
Comment 8 raymond.cuenen CLA 2010-04-16 10:59:54 EDT
Created attachment 165108 [details]
DSTest eclipse project
Comment 9 raymond.cuenen CLA 2010-04-17 11:22:48 EDT
Or maybe a better patch approach would be to change line 272 "scpEnabled.addElement(scp);" to "scpEnabled.insertElementAt(scp, 0);", so that the scpEnabled Vector always contains the correct order, instead of adding "Collections.revers(result);" at line 564.
Comment 10 raymond.cuenen CLA 2010-04-20 02:53:01 EDT
Still I have seen in my runtime container that the deactivation order varies from time to time. I cannot locate the exact position of the problem (yet), but sometimes the order is correct (for the patched version) and sometimes it is not.

I don't know the design ideas of the Resolver class (or any of DS) but I suspect that the component dependency tree resolving algorithm doesn't fully comply with the graph theory needed for tree-traversal when deactivating service components.

Anayway, to Stoyan, can you please check if any of this is valid, since simply changing the order of the "scpEnabled" Vector isn't sufficient to fully solve this problem.
Comment 11 raymond.cuenen CLA 2010-04-20 10:23:17 EDT
Ok. I came up with another solution;

In my analysis it seems that the activation order is correct, so I added another Vector to the Resolver class called "scpActivated" from which the clone is used in the selectNewelyUnsatisfied method (line 457) instead of the clone of the "scpEnabled" Vector.
This new "scpActivated" Vector is modified in the InstanceProcess class. In the buildComponent method I added, after line 572:

if (!InstanceProcess.resolver.scpActivated.contains(scp)) {
    InstanceProcess.resolver.scpActivated.addElement(scp);
}

And in the disposeInstances method I added after line 335 and line 357:

InstanceProcess.resolver.scpActivated.removeElement(scp);

This solution works fine in our, fairly complex, runtime container.
Comment 12 Stoyan Boshev CLA 2010-04-20 10:56:44 EDT
(In reply to comment #10)
> Still I have seen in my runtime container that the deactivation order varies
> from time to time. I cannot locate the exact position of the problem (yet), but
> sometimes the order is correct (for the patched version) and sometimes it is
> not.

Yes, the proposed patch will not work because the order of the components in "scpEnabled" Vector depends on the sequence of processing the the xml files of the bundle that hold them. And later it may also depend on the enable/disable calls to the DS API.

(In reply to comment #11)
> Ok. I came up with another solution;

I agree this solution might work in most of the cases. 
I think there will be problems if there are multiple threads registering services thus leading to activation of components through different threads and therefore the order of the activated components in the new vector will not be correct.


I think the correct solution would be to fix the Resolver's selectNewlyUnsatisfied() logic to find the components that depend on the service that is being unregistered and they have to be deactivated because of that. Currently it finds all components that must be deactivated. This brings in the result components that do not depend on the current service being unregistered. 
I will try this solution and provide a patch tomorrow.
Comment 13 Stoyan Boshev CLA 2010-04-21 12:13:07 EDT
Created attachment 165595 [details]
proposed patch

This patch implements the solution as described in comment#12
Raymond, can you please test it in your environment?
Comment 14 raymond.cuenen CLA 2010-04-22 04:03:43 EDT
(In reply to comment #13)
> Created an attachment (id=165595) [details]
> proposed patch
> This patch implements the solution as described in comment#12
> Raymond, can you please test it in your environment?

I have tested the patch in our osgi environment and I ran the tests I constructed for reporting this bug. All tests succeeded and I haven't observered any other unexpected behaviour. Thanks.
Comment 15 Stoyan Boshev CLA 2010-04-22 09:35:33 EDT
Fixed in HEAD