Re: [ecf-dev] Race conditions with remote provider

Alex,

I think I need to ask a few questions to understand what you are doing:

On 6/6/2011 10:14 AM, Alex Blewitt wrote:
I have an ECF container, which I'm using to register services via r-osgi.

If I register the container service (locator/advertiser) in the bundle's start method, then all works.

First question: are you registering the service via OSGi remote services (i.e. by adding the standardized service properties to the bundleContext.registerService call), or directly via the ECF remote services API (i.e. directly on the container)? I ask because you say that you 'have' an ECF container that you are using to register services via r-osgi.
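To illustrate the first option, here is a minimal sketch of the properties that would be passed to bundleContext.registerService. The property names service.exported.interfaces and service.exported.configs come from the OSGi Remote Services spec. The class and method names are hypothetical, and the "ecf.r_osgi.peer" config type is my assumption for the r-osgi provider, so please check it against your install.

```java
import java.util.Dictionary;
import java.util.Hashtable;

// Hypothetical helper: builds the standardized remote service properties.
public class RemoteServiceProps {
    // Standard OSGi Remote Services property names.
    public static final String EXPORTED_INTERFACES = "service.exported.interfaces";
    public static final String EXPORTED_CONFIGS = "service.exported.configs";

    public static Dictionary<String, Object> exportProperties(String configType) {
        Dictionary<String, Object> props = new Hashtable<>();
        // "*" asks for all of the service's interfaces to be exported.
        props.put(EXPORTED_INTERFACES, "*");
        // Selects the distribution provider (assumed value, see above).
        props.put(EXPORTED_CONFIGS, configType);
        return props;
    }

    public static void main(String[] args) {
        Dictionary<String, Object> props = exportProperties("ecf.r_osgi.peer");
        System.out.println(props.get(EXPORTED_INTERFACES));
        System.out.println(props.get(EXPORTED_CONFIGS));
        // In a bundle, these would be handed to the framework, e.g.:
        // context.registerService(MyService.class.getName(), new MyServiceImpl(), props);
        // (MyService and MyServiceImpl are placeholders.)
    }
}
```

With these properties set, the topology manager does the export, so no explicit container creation or connect call is needed in your own code.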


However, I'm refactoring to pass in an ID for the connect method. I can't do this in the bundle start because various callbacks (eg createID) use reflection to find the bundle's classloader, which isn't possible as the start method hasn't returned.

I'm not sure I understand what you mean by 'various callbacks using reflection to find the bundle's classloader'. Do you mean in ECF code, or in your own code? If it's your own code, can I ask why this is necessary? (I'm not questioning the need for it in general, I'm just trying to understand the use case better.)


So I fire up a thread and do it off start thread, but now I have a race condition.

If I register the service before connecting, then it's possible the client tries to access it in an unconnected state.

I don't really understand. Is a single thread doing the host registration *and* the client connecting? A corollary: are both the host and client in the same process? (One reason to ask is that by default the BasicTopologyManager filters out loopback remote services, although this can be changed if that is what you wish to do.)


And I guess I don't quite follow why you are explicitly 'connecting' at all (if you are using OSGi remote services), as this can be done for you in response to the discovery of a remote service. That's one way to avoid race conditions in general: have the client respond to discovery of the remote service, which is guaranteed to happen only *after* the registration has succeeded.
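If it turns out you do need an explicit connect off the start thread, one generic way to keep the ordering deterministic is to run the connect and the registration one after the other on a single worker thread, so the registration can never be seen before the connect has finished. This is only a plain-Java sketch of that ordering, not ECF API: the class name, the method name, and the two Runnables are hypothetical stand-ins for your own connect and registerService calls.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: connect first, register second, both off the caller's thread.
public class OrderedStartup {
    public static List<String> startInOrder(Runnable connect, Runnable register) {
        List<String> log = Collections.synchronizedList(new ArrayList<>());
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            Future<?> done = worker.submit(() -> {
                connect.run();          // stand-in for the container connect
                log.add("connected");
                register.run();         // stand-in for the service registration
                log.add("registered");
            });
            // Waiting here is only for the demo; a bundle's start method
            // would return without blocking on the result.
            done.get();
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            worker.shutdown();
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(startInOrder(() -> {}, () -> {}));
    }
}
```

The point of the single worker thread is that the two steps cannot interleave, so a client that reacts to the registration always finds the container already connected.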

Also, I guess I don't understand why you are changing/setting the ID for the r-osgi provider. Since it was developed before ECF remote services, it has its own way of managing ports (via system property rather than ID/config parameter).

If I wait to register the locator/advertiser service until after the start method is called, none of the r-osgi adverts happen (on the client side). Even stopping/restarting bundles doesn't seem to trigger it.

r-osgi by itself doesn't have a locator/advertiser mechanism at all. It's purely a distribution provider; the discovery was factored out of r-osgi some time ago, so r-osgi alone doesn't do any discovery publishing (advertising). But if you are using OSGi remote services and have a discovery provider in place (e.g. zookeeper, zeroconf, slp), then the endpoint advertising should be done as part of the BasicTopologyManager's response to the bundleContext.registerService call (when it has the standard remote service properties set, of course).


One thought that may be helpful: as part of the OSGi Remote Service Admin spec, there is a 'file-based discovery' mechanism that uses an xml file to 'discover' a remote service. There are some docs on this mechanism here [1], and more complete information in the RSA spec itself. I'm not sure whether it's useful for your use case right now, but the way it works is that rather than having network discovery (via zookeeper, zeroconf, or whatever), the client side has an xml file (in the edef format), and when the bundle that contains the edef xml is started, the client-side discovery process is 'kicked off'. This behavior is as specified in the RSA spec.


So, how should I be making this available? Or is there an alternate method (say, coming from the plugin.xml's extension points) being triggered and failing with an exception at startup?

No, I don't believe so. The only mechanism for discovery that is not based upon one of the network discovery protocols (e.g. zookeeper, etc) is the RSA EDEF mechanism (in ECF 3.5 anyway), and this waits for the bundle that has the edef xml file to be started, as per the RSA specification. Note, however, that the Remote Service Admin impl and the BasicTopologyManager (which is the only topology manager that comes in ECF 3.5.1) do have to be started prior to either the remote service registration (on the host side) or the discovery (on the consumer side). The BasicTopologyManager is in this bundle:


org.eclipse.ecf.osgi.services.distribution

and starting this bundle will start up the other parts of RSA.

I think it would be helpful for me if you could post some code (for creating the container explicitly rather than automatically via the OSGi remote service admin impl, if that's what you are doing, and for the remote service registration). Please also describe the environment (e.g. Eclipse, or some other OSGi environment, with a list of bundles?) and the startup sequence, for both the host side and the consumer side of things. If you would rather do that via a bug, that's fine with me, as attaching code to a bug is probably easier for everyone.

Thanks,

Scott