Bug 379865 - Getting OSGi service references with simple filter is slow
Summary: Getting OSGi service references with simple filter is slow
Status: RESOLVED FIXED
Alias: None
Product: Equinox
Classification: Eclipse Project
Component: Framework (show other bugs)
Version: 3.6.2   Edit
Hardware: PC Windows 7
: P3 normal (vote)
Target Milestone: Juno RC2   Edit
Assignee: Thomas Watson CLA
QA Contact:
URL:
Whiteboard:
Keywords: core, performance
Depends on:
Blocks:
 
Reported: 2012-05-17 18:58 EDT by Stanley Poon CLA
Modified: 2012-05-22 11:03 EDT (History)
6 users (show)

See Also:
hargrave: review+
jwross: review+


Attachments
java code to reproduce the problem (3.55 KB, text/plain)
2012-05-17 19:02 EDT, Stanley Poon CLA
no flags Details
pom file to build the test bundle to reproduce the bug (1.09 KB, text/xml)
2012-05-17 19:04 EDT, Stanley Poon CLA
no flags Details
bundle to reproduce the problem (5.75 KB, application/octet-stream)
2012-05-17 19:05 EDT, Stanley Poon CLA
no flags Details
Result from OSGi console (1.01 KB, text/plain)
2012-05-17 19:12 EDT, Stanley Poon CLA
no flags Details
possible fix (3.02 KB, patch)
2012-05-21 11:53 EDT, Thomas Watson CLA
no flags Details | Diff
LinkedList patch (1.14 KB, patch)
2012-05-21 16:02 EDT, Thomas Watson CLA
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Stanley Poon CLA 2012-05-17 18:58:35 EDT
Build Identifier: 

With 200K services registered with an ID propertis, looking up a service by getServiceReferences(null, "(ID=1000)") is very slow. It took ~4 seconds.

The same code running on Felix will return within 100ms. This is about 50X.

I started an email thread and the forum recommended to file a bug.

Following are results from the console:

Equinox:
  4.043 ( sec) Find middle Resource reference by ID
  4.000 ( sec) Find middle Resource reference by class name and ID


Felix:
  76.854 (msec) Find middle Resource reference by ID
  120.876 (msec) Find middle Resource reference by class name and ID


Reproducible: Always

Steps to Reproduce:
1. build a bundle with the code attached
2. Activator will run the test when deployed in the OSGi containers
3. results should be shown on OSGi console
Comment 1 Stanley Poon CLA 2012-05-17 19:02:50 EDT
Created attachment 215826 [details]
java code to reproduce the problem

Activator will run the test when bundle is deployed.
Comment 2 Stanley Poon CLA 2012-05-17 19:04:28 EDT
Created attachment 215827 [details]
pom file to build the test bundle to reproduce the bug

Maven pom file to build the bundle. The Java code is attached as a separate file.
Comment 3 Stanley Poon CLA 2012-05-17 19:05:40 EDT
Created attachment 215828 [details]
bundle to reproduce the problem

Bundle to be deployed to OSGi container to reproduce the bug.
Comment 4 Stanley Poon CLA 2012-05-17 19:12:16 EDT
Created attachment 215829 [details]
Result from OSGi console
Comment 5 Thomas Watson CLA 2012-05-18 15:12:22 EDT
Thanks for the sample code.  I have confirmed the results across felix and equinox.  The majority of the time is being spent matching the filter across the large set of services that are registered.

To get equivalent performance we would have to restructure the way we filter across the set of services.  Felix takes a pretty unique approach to doing service filter evaluations.  From the equinox mailing list discussions I think you also indicated that knopflerfish was also very fast when compared to equinox for this kind of service lookups?
Comment 6 Thomas Watson CLA 2012-05-21 11:49:23 EDT
BJ identified the issue.  We have an ArrayList with all the services that we want to evaluate the filter over and we remove each one that does not match.  To do this we use Iterator.remove().  That gets to be very slow when the ArrayList is very large because of the amount of shifting that must occur to remove a single element from the very large list.  This is why the slow down is not linear as the number of service registrations increases.

I think we should attempt a fix for RC2.
Comment 7 Thomas Watson CLA 2012-05-21 11:53:20 EDT
Created attachment 215979 [details]
possible fix

The actual fix is smaller than the patch indicates.  I renamed the "result" variable to make it more clear.  We decided to create an initial filteredRegistrations list with a capacity equal to the original list to avoid growth costs.  This list ultimately gets thrown away anyway before we return the final array to the caller.
Comment 8 Thomas Watson CLA 2012-05-21 12:00:46 EDT
BJ please review.  I am only suggesting this for RC2 because I think the fix is safe and it could help for any case where there are lots of services registered.
Comment 9 Thomas Watson CLA 2012-05-21 12:01:57 EDT
John, please review as well.
Comment 10 Thomas Watson CLA 2012-05-21 16:02:53 EDT
Created attachment 215986 [details]
LinkedList patch

Here is a simplified patch that just changes to using a LinkedList for the copy which provides O(1) performance for removals.
Comment 11 BJ Hargrave CLA 2012-05-21 16:15:19 EDT
+1 to the LinkedList patch.
Comment 12 Thomas Watson CLA 2012-05-21 16:52:47 EDT
Fixed in the following commit:

http://git.eclipse.org/c/equinox/rt.equinox.framework.git/commit/?id=8ad704b0c8149a6ab23d96f414045543e171fa06

Thanks for the help in providing the testcase for this Stanley.
Comment 13 Markus Keller CLA 2012-05-22 10:40:08 EDT
The LinkedList looks good as an RC2 fix, but you're going to use the real fix for Kepler, aren't you?

The LinkedList still has allocation and pointer assignment overheads. The "possible fix" looks good, except that I wouldn't pass a size to the ArrayList constructor (since the result is expected to be small).
Comment 14 BJ Hargrave CLA 2012-05-22 11:03:43 EDT
(In reply to comment #13)
> The LinkedList looks good as an RC2 fix, but you're going to use the real fix
> for Kepler, aren't you?
> 
> The LinkedList still has allocation and pointer assignment overheads. The
> "possible fix" looks good, 

I don't see any need to "improve" this. This list is only used temporarily and is not returned to the calling bundle. And this performance problem is only an issue in the somewhat pathological case of 200K instances of a service. In more common cases, there are vastly fewer services.

The LinkedList solution performed approximately as well as using 2 ArrayLists and is a much simpler fix.

> except that I wouldn't pass a size to the ArrayList
> constructor (since the result is expected to be small).

We have no ability to know this in general. For the specific test case, this is true. What if the filter was not (id=42) but (!(id=42))? Then the result could be very large.