Re: [ecf-dev] Race conditions with remote provider

On 6 Jun 2011, at 18:58, Scott Lewis wrote:

> I think I need to ask a few questions to understand what you are doing:

Sure thing, thanks for the detailed response (especially given the lack of detail in mine).

> On 6/6/2011 10:14 AM, Alex Blewitt wrote:
>> I have an ECF container, which I'm using to register services via r-osgi.
>> 
>> If I register the container service (locator/advertiser) in the bundle's start method, then all works.
> 
> First question:  are you registering the service via OSGi remote services? (i.e. by adding the standardized service properties to the bundleContext.registerService call?)...or directly via the ECF remote services API (i.e. directly on the container)?  The reason I ask is that you say that you 'have' an ECF container that you are using to register services via r-osgi.

Well, I'm writing a discovery container. So I'm using r-osgi and the example hello consumer/producer examples, just with my container instead of ZeroConf. In turn, I'm instantiating this and connecting to it (with connect(id,connectContext)) in the bundle activator's start method.

>> However, I'm refactoring to pass in an ID for the connect method. I can't do this in the bundle start because various callbacks (eg createID) use reflection to find the bundle's classloader, which isn't possible as the start method hasn't returned.
> 
> I'm not sure I understand what you mean by 'various callbacks using reflection to find the bundle's classloader'.  Do you mean in ECF code?  Or your own code?  If your own code...can I ask why this is necessary?  (I'm not questioning the need for it in general...I'm just trying to understand the use case better).

So I had something like (excuse the exact method calls, this is from memory):

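class Activator implements BundleActivator {
  public void start(BundleContext context) throws Exception {
    // createID is where the namespace's ID class gets loaded through the bundle
    id = IDFactory.getDefault().createID(namespace, new Object[] { "a", "b", "c" });
    myContainer = new MyContainer();
    myContainer.connect(id, null);
    // publish the container under both discovery interfaces
    context.registerService(
        new String[] { IDiscoveryLocator.class.getName(), IDiscoveryAdvertiser.class.getName() },
        myContainer, null);
  }
}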

The problem was that the ID factory was looking up the namespace (either from the plugin.xml, where I have some links, or from a service I may have registered prior to that call; I think the plugin.xml is how it's finding it) and using bundle.loadClass() to load the namespace's specific class provider.

Since the start() method hasn't completed at that point, an external bundle doing bundle.loadClass(some.class.name) fails with some OSGi 'bundle not started' type exception. 
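
A minimal plain-Java sketch of one way around that restriction: start() only schedules the work, and a worker thread connects first and publishes second, so a consumer can never look up an unconnected container. FakeContainer and the registry list are hypothetical stand-ins for the real container and the OSGi service registry, not ECF or OSGi types, and the sketch shows only the ordering; it says nothing about whether the r-osgi side picks up a late registration.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class DeferredConnect {
    /** Hypothetical stand-in for the discovery container (not an ECF type). */
    static class FakeContainer {
        private volatile boolean connected;
        void connect(String id) { connected = true; }
        boolean isConnected() { return connected; }
    }

    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    /** Hypothetical stand-in for the OSGi service registry. */
    final List<FakeContainer> registry = new CopyOnWriteArrayList<>();

    /** Mirrors start(): returns immediately, the real work runs on the worker thread. */
    Future<?> start(String id) {
        return worker.submit(() -> {
            FakeContainer container = new FakeContainer();
            container.connect(id);    // connect first ...
            registry.add(container);  // ... then publish, so it is never visible unconnected
        });
    }

    void stop() {
        worker.shutdown();
    }

    public static void main(String[] args) throws Exception {
        DeferredConnect activator = new DeferredConnect();
        // Wait for the deferred work only so the demo can inspect the result.
        activator.start("a/b/c").get(5, TimeUnit.SECONDS);
        for (FakeContainer c : activator.registry) {
            System.out.println("connected: " + c.isConnected());
        }
        activator.stop();
    }
}
```

Because the connect and the publish run in sequence on a single worker thread, the only state a consumer can observe is "not registered yet" or "registered and connected".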

>> So I fire up a thread and do it off start thread, but now I have a race condition.
>> 
>> If I register the service before connecting, then it's possible the client tries to access it in an unconnected state.
> 
> I don't really understand...is a single thread doing the host registration *and* the client connecting?  A corollary:  are both the host and client in the same process?  (one reason to ask here is that by default the BasicTopologyManager filters out loopback remote services...although this can be changed to allow this if this is what you wish to do).
> 
> And I guess I don't quite follow why you are explicitly 'connecting' at all (if you are using OSGi remote services), as this can be done for you in response to the discovery of a remote service (and that's one way to avoid race conditions in general...is to have the client respond to discovery of the remote service...which is guaranteed to happen only *after* the registration has succeeded).
> 
> Also, I guess I don't understand why you are changing/setting the ID for the r-osgi provider...as since it was developed before ECF remote services it has its own way of managing ports (via system property rather than ID/config parameter).

In this case, I'm setting the ID in my own discovery process. I'm using r-osgi (by virtue of it being in the same OSGi runtime) but I'm not directly talking to it. I need to pass some data in via the ID because my discovery agent needs some additional parameters.

>> If I wait to register the locator/advertiser service until after the start method is called, none of the r-osgi adverts happen (on the client side). Even stopping/restarting bundles doesn't seem to trigger it.
> 
> r-osgi by itself doesn't have locator/advertiser mechanism at all.  It's purely a distribution provider...the discovery was factored out of r-osgi some time ago.  So r-osgi alone doesn't do any discovery publishing (advertising) at all.  But if you are using OSGi remote services, and have a discovery provider in place (e.g. zookeeper, zeroconf, slp), then the endpoint advertising should be done as part of the BasicTopologyManager's responding to the bundleContext.registerService (when it's got the standard remote service properties set, of course).
> 
> One thought that may be helpful:  As part of the OSGi Remote Service Admin spec, there is a 'file-based discovery' mechanism that uses an xml file to 'discover' a remote service.  There's some docs on this mechanism here [1], and more complete information in the RSA spec itself.   I'm not sure it's useful for your use case or not right now, but the way that this works is that rather than having network discovery (via zookeeper, zeroconf, or whatever), the client side has an xml-file (in the edef format), that when the bundle that contains the edef xml is started the client-side discovery process is 'kicked off'.  This behavior is as specified in the RSA spec.

Thanks, I'll take a look into it and see if I can find out what is going on.

>> So, how should I be making this available? Or is there an alternate method (say, coming from the plugin.xml's extension points) being triggered and failed with an exception at startup?
> 
> No...I don't believe so.  The only mechanism for discovery that is not based upon one of the network discovery protocols (e.g. zookeeper, etc) is the RSA EDEF mechanism (in ECF 3.5 anyway).  And this waits for the bundle that has the edef xml file to be started as per the RSA specification.    Note, however, that the Remote Service Admin impl, and the BasicTopologyManager (which is the only topology manager that comes in ECF 3.5.1) do have to be started...prior to either the remote service registration (on the host side), or the discovery (on the consumer side).  The BasicTopologyManager is in this bundle:
> 
> org.eclipse.ecf.osgi.services.distribution
> 
> and starting this bundle will start up the other parts of RSA.
> 
> I think it would be helpful for me if you could post some code (for creating the container explicitly rather than automatically via the OSGi remote service admin impl...if that's what you are doing...and for the remote service registration), and also describe the environment...e.g. Eclipse, some other OSGi environment (list of bundles?) and the startup sequence...for both the host side and the consumer side of things.  If you would rather do that via bug that's fine with me...as attaching code to a bug is probably easier for everyone.

I'll hack on it a bit more; it's probably something I'm not quite understanding in my own code. The goal is to have a zookeeper-like discovery container, and to run that from two separate processes.

If I register the service before it's connected, both client and server work fine (in separate processes, using r-osgi under the covers for inter-process communication). However, if I wait until afterwards, I don't appear to see anything on the client side. And whenever I put a breakpoint in the code, the connection timeouts start firing, invalidating my test. Maybe I'll liberally sprinkle some print statements to see if I can find out what's going on, and report back if I have more data.

Alex