Bug 349078 - [remoteserviceadmin] basic topology manager does not export previously registered services upon start
Summary: [remoteserviceadmin] basic topology manager does not export previously regist...
Status: RESOLVED FIXED
Alias: None
Product: ECF
Classification: RT
Component: ecf.remoteservices (show other bugs)
Version: unspecified   Edit
Hardware: PC All
: P3 normal (vote)
Target Milestone: 3.5.2   Edit
Assignee: Scott Lewis CLA
QA Contact:
URL:
Whiteboard:
Keywords: helpwanted
Depends on:
Blocks:
 
Reported: 2011-06-10 13:04 EDT by Alex Blewitt CLA
Modified: 2011-06-20 15:38 EDT (History)
2 users (show)

See Also:


Attachments
Reorders the registration of EventHook and existing services (1.53 KB, patch)
2011-06-20 14:21 EDT, Alex Blewitt CLA
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Alex Blewitt CLA 2011-06-10 13:04:30 EDT
I'm using ECF 3.5.1 and the new remoteservicesadmin bundles. I've started with the Hello Service DS example from master, and I'm replacing the JMDNS provider with my own.

My container doesn't register an instance of itself at startup, but defers it until later. I'm using the ECF generic server at the moment.

I'm creating the remote service with DS. At startup, I get a message:

[log;+0100 2011.06.10 17:33:21:869;INFO;org.eclipse.ecf.osgi.services.remoteserviceadmin;org.eclipse.core.runtime.Status[plugin=org.eclipse.ecf.osgi.services.remoteserviceadmin;code=0;message=Advertise endpointDescription=EndpointDescription[containerID=StringID[ecftcp://localhost:3787/server],connectTargetID=null,idFilter=null,rsFilter=null,properties={component.id=1, component.name=org.eclipse.ecf.examples.remoteservices.hello.ds.host, ecf.endpoint.id.ns=org.eclipse.ecf.core.identity.StringID, ecf.exported.containerfactoryargs=ecftcp://localhost:3787/server, endpoint.framework.uuid=309b15b6-0fd1-4fe1-9d7b-12c826b4953c, endpoint.id=ecftcp://localhost:3787/server, endpoint.package.version.org.eclipse.ecf.examples.remoteservices.hello=3.0.0, endpoint.service.id=1, objectClass=[Ljava.lang.String;@3eec1a, remote.configs.supported=[Ljava.lang.String;@10e820, remote.intents.supported=[Ljava.lang.String;@1f7abae, service.id=55, service.imported=true, service.imported.configs=[Ljava.lang.String;@10e820}].  No endpointDescriptionLocator advertisers available.  Cannot unpublish endpointDescription=EndpointDescription[containerID=StringID[ecftcp://localhost:3787/server],connectTargetID=null,idFilter=null,rsFilter=null,properties={component.id=1, component.name=org.eclipse.ecf.examples.remoteservices.hello.ds.host, ecf.endpoint.id.ns=org.eclipse.ecf.core.identity.StringID, ecf.exported.containerfactoryargs=ecftcp://localhost:3787/server, endpoint.framework.uuid=309b15b6-0fd1-4fe1-9d7b-12c826b4953c, endpoint.id=ecftcp://localhost:3787/server, endpoint.package.version.org.eclipse.ecf.examples.remoteservices.hello=3.0.0, endpoint.service.id=1, objectClass=[Ljava.lang.String;@3eec1a, remote.configs.supported=[Ljava.lang.String;@10e820, remote.intents.supported=[Ljava.lang.String;@1f7abae, service.id=55, service.imported=true, service.imported.configs=[Ljava.lang.String;@10e820}];severity4;exception=null;children=[]]]

The important bit (I think) is "No endpointDescriptionLocator advertisers available.  Cannot unpublish..."

It seems that there's no IDiscoveryAdvertiser at that point according to the code (BTW the 'unpublish' should probably be 'publish' at this point; the 'advertise' boolean must be 'true' since the message prefix is 'Advertise' - that's possibly a separate bug)

Anyway, my IDiscoveryAdvertiser hasn't been registered at that point, but then does subsequently get registered. However, there are existing remote services which haven't been registered with my container after it does.

Because I'm using DS, I can stop and re-start my service, at which point it gets picked up afresh (and the IDiscoveryAdvertiser does exist) and it works as expected.

So, in summary, I think the bug is that when a new IDiscoveryAdvertiser is registered, any existing services need to be published, as well as subscribing to future services coming/going. Either that, or we'll have to put in a start order that says this container is started before DS gets a chance ... but that's a bit ugly.
Comment 1 Alex Blewitt CLA 2011-06-10 13:24:11 EDT
FWIW I found that I also needed to have org.eclipse.equinox.event installed and started, as otherwise I was seeing messages like:

[log;+0100 2011.06.10 18:17:44:182;ERROR;org.eclipse.ecf.osgi.services.remoteserviceadmin;org.eclipse.core.runtime.Status[plugin=org.eclipse.ecf.osgi.services.remoteserviceadmin;code=4;message=org.eclipse.ecf.osgi.services.remoteserviceadmin.RemoteServiceAdmin:RemoteServiceAdmin.postEvent:No event admin service available to post event=RemoteServiceAdminEvent[containerID=StringID[ecftcp://localhost:3787/server], getType()=2, getSource()=org.eclipse.ecf.osgi.services.remoteserviceadmin_2.0.100.v20110531-2218 [26], getException()=null, getImportReference()=null, getExportReference()=org.eclipse.ecf.osgi.services.remoteserviceadmin.RemoteServiceAdmin$ExportReference@67f797];severity4;exception=null;children=[]]]

It's not clear to me if this is an error or not, as the service appears to get registered and usable in any case.
Comment 2 Markus Kuppe CLA 2011-06-10 13:37:59 EDT
(In reply to comment #1)
> FWIW I found that I also needed to have org.eclipse.equinox.event installed and
> started, as otherwise I was seeing messages like:
> 
> [log;+0100 2011.06.10
> 18:17:44:182;ERROR;org.eclipse.ecf.osgi.services.remoteserviceadmin;org.eclipse.core.runtime.Status[plugin=org.eclipse.ecf.osgi.services.remoteserviceadmin;code=4;message=org.eclipse.ecf.osgi.services.remoteserviceadmin.RemoteServiceAdmin:RemoteServiceAdmin.postEvent:No
> event admin service available to post
> event=RemoteServiceAdminEvent[containerID=StringID[ecftcp://localhost:3787/server],
> getType()=2,
> getSource()=org.eclipse.ecf.osgi.services.remoteserviceadmin_2.0.100.v20110531-2218
> [26], getException()=null, getImportReference()=null,
> getExportReference()=org.eclipse.ecf.osgi.services.remoteserviceadmin.RemoteServiceAdmin$ExportReference@67f797];severity4;exception=null;children=[]]]
> 
> It's not clear to me if this is an error or not, as the service appears to get
> registered and usable in any case.

AFAICT .event is optional which would make this ERROR a candidate to be downgraded to WARNING or INFO.
Comment 3 Scott Lewis CLA 2011-06-10 13:53:22 EDT
(In reply to comment #0)
<stuff deleted> 
> The important bit (I think) is "No endpointDescriptionLocator advertisers
> available.  Cannot unpublish..."

Yes, this is saying that there is no advertiser available for publish (I've fixed the typo in the message) at the point when the remote service is registered (and exported by the RSA).

> 
> It seems that there's no IDiscoveryAdvertiser at that point according to the
> code (BTW the 'unpublish' should probably be 'publish' at this point; the
> 'advertise' boolean must be 'true' since the message prefix is 'Advertise' -
> that's possibly a separate bug)

Yeah, it's a typo...already fixed.  Thanks.

> 
> Anyway, my IDiscoveryAdvertiser hasn't been registered at that point, but then
> does subsequently get registered. However, there are existing remote services
> which haven't been registered with my container after it does.
> 
> Because I'm using DS, I can stop and re-start my service, at which point it
> gets picked up afresh (and the IDiscoveryAdvertiser does exist) and it works as
> expected.
> 
> So, in summary, I think the bug is that when a new IDiscoveryAdvertiser is
> registered, any existing services need to be published, as well as subscribing
> to future services coming/going. 

The OSGi Remote Service Admin specification defines something called a topology manager...which is responsible for controlling when/if/how services are exported, and if/when/how they are published for remote discovery.

Currently, ECF ships with a single topology manager...an instance of this class:

org.eclipse.ecf.internal.osgi.services.distribution.BasicTopologyManager

This topology manager was created so that 

1) people can/could get started quickly/easily with remote services...in as many contexts as made sense (e.g. running examples in Eclipse, running examples in other runtime contexts, etc).

2) to serve as example code for those that might wish to create their own topology managers to fit more complex use cases.

The reason I say all this is that it's the topology manager that defines the export/import and the advertise/discovery policy...so depending upon your use case(s), it might make sense for you to consider creating your own topology manager...so that you can control it.

Although technically easy, I'm a little hesitant to add the described publish behavior to the BasicTopologyManager...i.e. re-publishing all remote services when a new IDiscoveryAdvertiser is registered...because for some use cases (e.g. running in Eclipse with the entire ECF SDK present)...this may not be the appropriate/expected/desired behavior...and it does have quite a few implications...especially in an environment where people have several/a number of discovery providers present (e.g. they've installed all of ECF SDK into Eclipse...which is pretty common).

At this point I'm not necessarily against enhancing the BasicTopologyManager with the behavior that you describe...I just need to think about this a little bit and discuss on this bug...but it may be a better approach to create more/other topology managers (e.g. one that does what you describe), and make them available to you and others...so that they can get the behavior that they want without it having lots of consequences in other environments. 

As suggested by the RSA spec, it's been my intention to add other topology managers to meet a variety of use cases (the spec was created with TMs as the way to customize and control the export and discovery), but we haven't yet been able to do so...due to resource constraints.
Comment 4 Alex Blewitt CLA 2011-06-10 14:04:08 EDT
I agree it's useful to be flexible with the topology managers, but the problem as it stands is it effectively introduces a start ordering between when the remote service gets registered and when the discovery advertiser gets created. The reason the examples currently work is because the ZeroConf bundle starts up quickly (and so is there in time for the discovery bundle to find it prior to the service is registered). If you changed the start order for the jmdns provider in the examples, you'd see nothing working.

With DS, it's relatively easy to stop and start a component (and then find out Hey! It is working!) but with bundles maybe less so (if only because there are usually an order of magnitude less DS components than bundles).

The net effect is you can have all the right bundles, and all started, and yet not see behaviour that you were expecting without rebooting a component/bundle. It's especially frustrating when you can do a 'services' and see your service there quite clearly, but not know why it's not being advertised. (And that's aside from dozy errors like not putting the right container name in the exported configs :)

Let's take a look from another example. We have a system with 10 (remote) services. The discovery container bundle gets stopped; all 10 services disappear from the container. The discovery container starts up again and ... nothing. To republish the service, you have to bounce (everything else). That doesn't sound like good default behaviour in a dynamic world to me.
Comment 5 Markus Kuppe CLA 2011-06-10 14:07:47 EDT
I wonder if this is something to be discussed for the discovery layer instead of the remote service layer as the question pretty much boils down to a IServiceInfo cache for discovery providers, doesn't it?
Comment 6 Scott Lewis CLA 2011-06-10 14:23:14 EDT
(In reply to comment #4)
> I agree it's useful to be flexible with the topology managers, but the problem
> as it stands is it effectively introduces a start ordering between when the
> remote service gets registered and when the discovery advertiser gets created.
> The reason the examples currently work is because the ZeroConf bundle starts up
> quickly (and so is there in time for the discovery bundle to find it prior to
> the service is registered). If you changed the start order for the jmdns
> provider in the examples, you'd see nothing working.
> 
> With DS, it's relatively easy to stop and start a component (and then find out
> Hey! It is working!) but with bundles maybe less so (if only because there are
> usually an order of magnitude less DS components than bundles).
> 
> The net effect is you can have all the right bundles, and all started, and yet
> not see behaviour that you were expecting without rebooting a component/bundle.
> It's especially frustrating when you can do a 'services' and see your service
> there quite clearly, but not know why it's not being advertised. (And that's
> aside from dozy errors like not putting the right container name in the
> exported configs :)
> 
> Let's take a look from another example. We have a system with 10 (remote)
> services. The discovery container bundle gets stopped; all 10 services
> disappear from the container. The discovery container starts up again and ...
> nothing. To republish the service, you have to bounce (everything else). That
> doesn't sound like good default behaviour in a dynamic world to me.

I agree that it's not a good default behavior in a dynamic world for some use cases (yours obviously).  OTOH, I'm having trouble immediately coming up with an alternative that would *also* have good default behavior in other...maybe more common use cases (e.g. Eclipse SDK+ECF SDK).  That's why I suggested that a second/new topology manager might be a good way to address this need...as my understanding is that this was the RSA spec intention for the TM in the first place...to make it easy to support multiple distribution and discovery use cases.

Another issue...for me anyway...currently, I have very few resources to dedicate to significant enhancements...for free, anyway.  I wish this were not the case, but it is.
Comment 7 Scott Lewis CLA 2011-06-10 14:26:21 EDT
(In reply to comment #5)
> I wonder if this is something to be discussed for the discovery layer instead
> of the remote service layer as the question pretty much boils down to a
> IServiceInfo cache for discovery providers, doesn't it?

I don't think so...as we're talking about the publish/discovery of OSGi remote service endpoint descriptions in particular...not discovery of arbitrary IServiceInfos...so this seems pertinent to the remote service/RSA layer.
Comment 8 Markus Kuppe CLA 2011-06-10 14:29:49 EDT
(In reply to comment #7)
> (In reply to comment #5)
> > I wonder if this is something to be discussed for the discovery layer instead
> > of the remote service layer as the question pretty much boils down to a
> > IServiceInfo cache for discovery providers, doesn't it?
> 
> I don't think so...as we're talking about the publish/discovery of OSGi remote
> service endpoint descriptions in particular...not discovery of arbitrary
> IServiceInfos...so this seems pertinent to the remote service/RSA layer.

On the discovery layer an OSGi remote service endpoint description is just an IServiceInfo. So I fail to see why it is limited to the remote service layer.
Comment 9 Alex Blewitt CLA 2011-06-16 06:31:52 EDT
So there's an interaction between three players here.
A - the remote services admin/toplogy manager
C - the container
S - the service which is to be registered.

The 'correct' sequence appears to be C A S, which causes it to be available and published in container C. I suspect A C S would also work as well. Ideally, you want S to be decoupled from C (other than the intents) so there's no start order/DS dependency order to order them in such a way.

Any combination where S precedes A causes the service not to be published:

C S A, S C A, S A C

That's the part of the bug I'm referring to here.

Note that if A occurs before C, and if S occurs before C, then C doesn't get the service registered with it (and instead, we fall back to a 'generic' container):

A S C (service gets published remotely), S A C (service doesn't get published remotely)

Finally with regards the container, it's not always the case that there will be a default one created at bundle start-up. This may be because the container requires external configuration (e.g. credentials) or it may be that the remote publishing mechanism is currently down. So whilst the concept of bundle start-order may be used to prevent bundle A starting until bundle C has started, that doesn't necessarily guarantee that bundle C will have instantiated a container service at that point.
Comment 10 Scott Lewis CLA 2011-06-16 17:40:05 EDT
(In reply to comment #9)
> So there's an interaction between three players here.
> A - the remote services admin/toplogy manager
> C - the container
> S - the service which is to be registered.
> 
> The 'correct' sequence appears to be C A S, which causes it to be available and
> published in container C. I suspect A C S would also work as well. Ideally, you
> want S to be decoupled from C (other than the intents) so there's no start
> order/DS dependency order to order them in such a way.
> 
> Any combination where S precedes A causes the service not to be published:
> 
> C S A, S C A, S A C
> 
> That's the part of the bug I'm referring to here.


If so, then I wouldn't refer to it as a bug, but rather an enhancement.  Essentially, the topology manager is the determinant of the export process...i.e. the BasicTopologyManager has a service hook that receives a synchronous notification when services are registered.  It's behavior is (currently) to check to see if the service.exported.interfaces (and other standard properties) are set...and if they are set then the service is exported (i.e. RSA.exportService is called).

If the behavior that you want is to have the topology manager *also* check for all *previously* exported remote services (I mean previous to the topology manager being started/registered as a service hook), then this is completely fine...but either a) the BasicTopologyManager could be enhanced to support this; or b) A new topology manager can be created to have this behavior.  The reason that I think that b is a better option is that a could result in some very complicated timing/startup situations for (e.g.) use while in Eclipse or other situations...where the basic topology manager isn't expected to be started at all (nor any of ECF).  But it's possible that I could be convinced that a would work ok (enhancing the BasicTopologyManager to export all previously registered remote services) with some experimentation in different situations.

> 
> Note that if A occurs before C, and if S occurs before C, then C doesn't get
> the service registered with it (and instead, we fall back to a 'generic'
> container):
> 
> A S C (service gets published remotely), S A C (service doesn't get published
> remotely)
> 
> Finally with regards the container, it's not always the case that there will be
> a default one created at bundle start-up. This may be because the container
> requires external configuration (e.g. credentials) or it may be that the remote
> publishing mechanism is currently down. So whilst the concept of bundle
> start-order may be used to prevent bundle A starting until bundle C has
> started, that doesn't necessarily guarantee that bundle C will have
> instantiated a container service at that point.

True.  Note also that ECF has the concept of a 'host container selector' (i.e. IHostContainerSelector) service, that is used by ECF's implementation of the RemoteServiceAdmin...to actually select (and/or create a container).  If desired, it's a simple matter to replace/substitute one's on implementation of IHostContainerSelector to be responsible for creating/configuring and/or selecting an existing container to use for a given call to RSA.exportService call (when the topology manager that controls the export chooses to export the service).

So I guess to summarize...if one wishes to handle the ordering different relative to 'A' (the topology manager/RSA.exportService call) then one can change the topology manager (for example, when a topology manager was started, it could very easily look for all existing/previously registered remote services, and export them at startup time), and if one wishes to handle the ordering differently than the current implementation between C and the other two (A or S), then it would probably make sense to have one's own implementation of IHostContainerFinder (which essentially defines the logic for selecting or creating/configuring a host container during the RSA.exportService call).

One other thing to point out...the RSA spec doesn't say anything about the 'required' behavior of the topology manager...and in fact the intention with the design was to accommodate other topology managers...that are appropriate for other use cases.  That's why, for example, there is an AbstractTopologyManager...to support the easy creation of other topology managers.
Comment 11 Alex Blewitt CLA 2011-06-17 07:16:31 EDT
(In reply to comment #10)
> > Any combination where S precedes A causes the service not to be published:
> >
> > C S A, S C A, S A C
> >
> > That's the part of the bug I'm referring to here.
> 
> If so, then I wouldn't refer to it as a bug, but rather an enhancement.

I look at this as being a show-stopping bug of trying to get remote services admin adopted. It is not possible to have a dynamic system such as OSGi only work when bundles are started in a particular order, which is the case here. I spent a good few hours trying to figure out why my examples weren't working, only to find that if I stopped and started the service then it suddenly did work. This is not going to be the last time someone suffers from the same problem.

> Essentially, the topology manager is the determinant of the export
> process...i.e. the BasicTopologyManager has a service hook that receives a
> synchronous notification when services are registered.

The problem is that it doesn't look for *existing* services at startup. That is independent for what the BasicTopologyManager does once it receives a synchronous notification.

> If the behavior that you want is to have the topology manager *also* check for
> all *previously* exported remote services (I mean previous to the topology
> manager being started/registered as a service hook), then this is completely
> fine...but either a) the BasicTopologyManager could be enhanced to support this;
> or b) A new topology manager can be created to have this behavior.

This isn't about previously exported remote services. This is about exportable services which were started prior to the remote service admin being started.

Look at DS. If you install/start DS, then install/start a bundle with a DS component, the component is started. On the other hand, if you install/start a bundle with a DS component, then install/start DS *it still works*.

With ECF RSA, if you install/start RSA, then install/start a bundle(service), it works. If you install/start a bundle(service), then install/start RSA you get different behaviours.

> The reason that I think that b is a better option is that a could result in some very
> complicated timing/startup situations for (e.g.) use while in Eclipse or other
> situations...where the basic topology manager isn't expected to be started at
> all (nor any of ECF).

I really don't think this is about the implementation of any one topology manager - this is about what RSA should do by default.

Consider the case of upgrading RSA in place in a long-running OSGi server-side component. It's up and running, RSA is exporting 5 service, everything working as expected. Now upgrade RSA (either bundle.update or bundle.uninstall/bundle.install). Suddenly, those 5 services disappear and don't come back. You have to either (a) cycle the services or (b) reboot the entire OSGi runtime. And that's because RSA is only listening for 'new' services.

> But it's possible that I could be convinced that a would
> work ok (enhancing the BasicTopologyManager to export all previously registered
> remote services) with some experimentation in different situations.

> True.  Note also that ECF has the concept of a 'host container selector' (i.e.
> IHostContainerSelector) service, that is used by ECF's implementation of the
> RemoteServiceAdmin...to actually select (and/or create a container). 

Let's take this particular item out of the loop for now. 

> One other thing to point out...the RSA spec doesn't say anything about the
> 'required' behavior of the topology manager...and in fact the intention with the
> design was to accommodate other topology managers...that are appropriate for
> other use cases.  That's why, for example, there is an
> AbstractTopologyManager...to support the easy creation of other topology
> managers.

I think my understanding of the Topology Manager differs from this interpretation, but in all honesty that's probably because I've not spent significant amounts of time involved in it.

The problem is essentially that the BasicToplogyManager only listens for new services, via the service event hook:

	public void start() throws Exception {
		// Register as EndpointListener, so that it gets notified when Endpoints
		// are discovered
		Properties props = new Properties();
		props.put(
				org.osgi.service.remoteserviceadmin.EndpointListener.ENDPOINT_LISTENER_SCOPE,
				getEndpointListenerScope());
		endpointListenerRegistration = getContext().registerService(
				EndpointListener.class.getName(), this, (Dictionary) props);

		// Register as EventHook, so that we get notified when remote services
		// are registered
		eventHookRegistration = getContext().registerService(
				EventHook.class.getName(), this, null);
	}

This gives us the behaviour that remote services are only notified when an event comes in, such as when a service is started. But there are many cases where this event won't come in, because the service has already been started; for example:

* BasicTopologyManager starts after MyService is registered
* BasicTopologyManager bundle is stopped/started
* BasicTopologyManager bundle is updated

In all of these cases, you're left with a service that's running, but not announced to the world. In a dynamic environment such as OSGi, I can't see how that's a feature. Why should I need to re-start my service just because the BasicTopologyManager has been restarted/updated/installed?

I really think that the start() of BTM needs to be updated, so as well as listening for new services coming and going, it exports any previously started services with the service.exported.interfaces set to an appropriate value. That way, you'll be able to stop() it (and have endpoints removed), and then start() it (and have the endpoints recreated).

Right now, I can't move forward with RSA because if the start order isn't explicitly specified, there's a good chance that it won't work. And that's not good for anyone.
Comment 12 Scott Lewis CLA 2011-06-17 21:07:57 EDT
(In reply to comment #11)
<stuff deleted>
> > If so, then I wouldn't refer to it as a bug, but rather an enhancement.
> 
> I look at this as being a show-stopping bug of trying to get remote services
> admin adopted. 
>It is not possible to have a dynamic system such as OSGi only
> work when bundles are started in a particular order, which is the case here. I
> spent a good few hours trying to figure out why my examples weren't working,
> only to find that if I stopped and started the service then it suddenly did
> work. This is not going to be the last time someone suffers from the same
> problem.

So Alex...I'm not clear on what would your solution would be.  Ultimately both the topology manager and the RSA impl have to be started...how are they started?  (how the topology manager is started is not specified by the spec).

> 
> > Essentially, the topology manager is the determinant of the export
> > process...i.e. the BasicTopologyManager has a service hook that receives a
> > synchronous notification when services are registered.
> 
> The problem is that it doesn't look for *existing* services at startup. That is
> independent for what the BasicTopologyManager does once it receives a
> synchronous notification.

It's another piece of functionality for the topology manager.  I don't disagree that you want this functionality for your use case...what I've been trying to say is that in *other environments* (e.g. Eclipse IDE after installing the entire ECF SDK), that this functionality is not necessarily the best idea...and so may not be the right thing to add to the only existing topology manager (BasicTopologyManager).

The spec is quite clear that there can/should/will be many topology managers...with different capabilities.  I'm hoping that ECF will have many topology managers...and have tried to make it as easy as possible to create new ones...for me, for you, for anyone that want's to control remote services in the extensible fashion that the RSA spec supports.


> 
> > If the behavior that you want is to have the topology manager *also* check for
> > all *previously* exported remote services (I mean previous to the topology
> > manager being started/registered as a service hook), then this is completely
> > fine...but either a) the BasicTopologyManager could be enhanced to support this;
> > or b) A new topology manager can be created to have this behavior.
> 
> This isn't about previously exported remote services. This is about exportable
> services which were started prior to the remote service admin being started.

Sorry...I used the wrong word above...I should have said '*previously* registered' rather than '*previously exported' in the sentence above.  I think we are talking about the same thing (services that are intended to be exported...that are registered before the BasicTopologyManager's service hook is installed).

<stuff deleted>
> 
> I really don't think this is about the implementation of any one topology
> manager - this is about what RSA should do by default.

No...it is about what the topology manager should do...and the default behavior is not specified by the spec.   The spec allows any variety of topology managers...and by spec the topology managers get to decide *if* and *when* a service is exported.  Some do it by registering a service hook...to hook into the registerService call (BasicTopologyManager), others can/could do it at other times...based upon other user or system events.

I'm fine with you suggesting that the default behavior...for the BasicTopologyManager (and/or others) should be that it should export all the previously-registered remote services...I just think that there are two completely reasonable approaches to implement this:  

1) add that behavior to the BasicTopologyManager...i.e. when the basic topology manager is created/started, simply look through all previously registered services, and if they have the appropriate remote service properties set, then export them

2) Create a new topology manager...with all the existing behavior of the BasicTopologyManager...and the logic defined above to export previously registered remote services...and start *it* instead of the BasicTopologyManager.

> 
> Consider the case of upgrading RSA in place in a long-running OSGi server-side
> component. It's up and running, RSA is exporting 5 service, everything working
> as expected. Now upgrade RSA (either bundle.update or
> bundle.uninstall/bundle.install). Suddenly, those 5 services disappear and
> don't come back. You have to either (a) cycle the services or (b) reboot the
> entire OSGi runtime. And that's because RSA is only listening for 'new'
> services.

Just to be clear...the RSA service (i.e. org.osgi.services.remoteserviceadmin.RemoteServiceAdmin) is...by spec...completely passive.  That is...without any topology manager to decide when it's relevant to export a remote service, and then actually *call* RemoteServiceAdmin.exportService(ServiceReference,Map), then no service would *ever get exported* (or imported for that matter).  That's why the topology manager is needed at all...to actually decide when it's appropriate to export publish for discovery/import a remote service...and then have a thread call the RSA.exportService method (along with telling some discovery mechanism to publish for discovery).  For reference, I would refer you to fig 122.1 in the RSA chapter in the enterprise spec:

http://www.osgi.org/download/r4v42/r4.enterprise.pdf

My understanding is that this was all put into the RSA spec to allow a variety of approaches for deciding if, when, how, local OSGi services get exported for remote access...and that these decisions are the role of the topology manager...of which there may (eventually) be many.

<stuff deleted>
> 
> The problem is essentially that the BasicToplogyManager only listens for new
> services, via the service event hook:
> 
>     public void start() throws Exception {
>         // Register as EndpointListener, so that it gets notified when
> Endpoints
>         // are discovered
>         Properties props = new Properties();
>         props.put(
>                
> org.osgi.service.remoteserviceadmin.EndpointListener.ENDPOINT_LISTENER_SCOPE,
>                 getEndpointListenerScope());
>         endpointListenerRegistration = getContext().registerService(
>                 EndpointListener.class.getName(), this, (Dictionary) props);
> 
>         // Register as EventHook, so that we get notified when remote services
>         // are registered
>         eventHookRegistration = getContext().registerService(
>                 EventHook.class.getName(), this, null);
>     }
> 
> This gives us the behaviour that remote services are only notified when an
> event comes in, such as when a service is started. But there are many cases
> where this event won't come in, because the service has already been started;
> for example:
> 
> * BasicTopologyManager starts after MyService is registered
> * BasicTopologyManager bundle is stopped/started
> * BasicTopologyManager bundle is updated
> 
> In all of these cases, you're left with a service that's running, but not
> announced to the world. In a dynamic environment such as OSGi, I can't see how
> that's a feature. Why should I need to re-start my service just because the
> BasicTopologyManager has been restarted/updated/installed?

I'm not saying it's a feature...it's clearly a limitation for your use case.  I'm only asserting that 

1) It may be/is a reasonable limitation for a *single* topology manager...given the other use cases where BTM will be used (e.g. in Eclipse+ECF SDK).
2) There's a design choice about how to fix it...for your use case...i.e. represented by my 1 and 2 above (in this comment).

> 
> I really think that the start() of BTM needs to be updated, so as well as
> listening for new services coming and going, it exports any previously started
> services with the service.exported.interfaces set to an appropriate value. That
> way, you'll be able to stop() it (and have endpoints removed), and then start()
> it (and have the endpoints recreated).

Ok...I'm willing to consider adding this functionality to the BTM...rather than creating a new topology manager class...if you contribute the addition to BTM.

All I've been trying to say is this:  the logic you describe for your use case, has some potentially negative (or at least very confusing) connotations for other use cases...when added to BTM (currently the only choice for tm that ECF provides).  That's why it *might* be worthwhile to consider a new topology manager class (maybe just a subclass of BTM?) in order to deal with your use case *and* not create problems under other use cases (i.e. so that people could choose the topology manager to use under different circumstances).

> 
> Right now, I can't move forward with RSA because if the start order isn't
> explicitly specified, there's a good chance that it won't work. And that's not
> good for anyone.

So Alex...what is your expected behavior in terms of starting a framework and the RSA impl?  According to the RSA spec, a topology manager has to be created (and therefore something associated with RSA has to be started) before anything can be exported via any RSA impl.  How would you have us create/start any topology manager (since RSA impl is not part of the framework itself...and so has to be started at some point).

I think we should stick to the question of whether to add the behavior you are asking for to the BTM or create a new/separate topology manager class.  In either case, however, something has to create the topology manager.

Note also:  the BTM is the only one presently implemented (warts and all) at least partially because we/I as primary implementer simply can't do everything at once...and have very few resources...even for my own time.  Perhaps it 'should' have much more functionality...ok...I accept that...but if you want the additional functionality...then someone's got to add it...and we're currently more limited than we would like...so I/we could use some help to get everything that everyone expects and would like (in addition to providing a completely free implementation of an enterprise spec, and providing more complete documentation, and providing support, and...).
Comment 13 Scott Lewis CLA 2011-06-18 12:08:24 EDT
Here is logic (available to any subclass of AbstractTopologyManager...i.e. BasicTopologyManager or other/new topology manager) for exporting any/all *previously* registered remote services:

ServiceReference[] existingServiceRefs = getContext().getAllServiceReferences(null, null);
if (existingServiceRefs != null)
  for (int i = 0; i < existingServiceRefs.length; i++)
    // This method will check the service properties for required
    // props associated with the OSGi remote service spec. If
    // registered as a remote service, it will export the remote
    // service
    // if not it will simply return/skip
    handleServiceRegistering(existingServiceRefs[i]);

getContext() returns BundleContext (for the bundle that creates the topology manager), and handleServiceRegistering does the actual work of checking to see if a service reference is intended to be a remote service, and if so, exporting the remote service.

This code could be placed in BasicTopologyManager.start(), prior to the service hook and endpoint listener registrations.

It could also be placed in a another/new topology manager class...i.e. for easy ability to optionally use it...and or use it in other use contexts.

The selection of where to put it is what I would like to consider. 

In either case, I believe I should make it's usage optional (perhaps via another system property), in order to support multiple use cases.
Comment 14 Markus Kuppe CLA 2011-06-18 12:27:59 EDT
(In reply to comment #13)
> In either case, I believe I should make it's usage optional (perhaps via
> another system property), in order to support multiple use cases.

What would be the default? I lean towards turning it on by default.
Comment 15 Scott Lewis CLA 2011-06-18 15:28:14 EDT
(In reply to comment #14)
> (In reply to comment #13)
> > In either case, I believe I should make it's usage optional (perhaps via
> > another system property), in order to support multiple use cases.
> 
> What would be the default? I lean towards turning it on by default.

ok.  I expect Alex would also lean toward turning it on by default...as it fits his use case/this bug...and at the moment 'on' seems an ok default to me.  But as you can imagine, it has a fair number of implications...since as you can see it's applied to every service reference in the runtime upon tm startup...so I would like to think about it a bit.
Comment 16 Scott Lewis CLA 2011-06-19 20:03:32 EDT
I've pushed to master a proposed fixed for the BasicTopologyManager that executes the logic described in comment 13.  See

http://git.eclipse.org/c/ecf/org.eclipse.ecf.git/commit/?id=1af9f1483cb32f89c2a753748b93f668f5f1579a

I've decided for the time being to enhance the BasicTopologyManager rather than creating a new topology manager class.  I think that ultimately a new bundle...just for ECF topology managers...will/should be created, but that's a bigger job than is warranted right now.

As discussed in comments, the default is to read all previously registered services (registered previous to the BTM's start method being called), and if they were registered as remote services to export them automatically.

I'll leave this bug open for testing and comments about appropriateness for other use cases.
Comment 17 Alex Blewitt CLA 2011-06-20 08:47:28 EDT
We can be a bit more performant by only requesting those services with service.exported.interfaces, right? This will pump through all services, whether or not they're exposed (and then ignore them in the call).

I think we can do this by changing:

  private String exportRegisteredSvcsFilter = System
			.getProperty("org.eclipse.ecf.osgi.services.basictopologymanager.exportRegisteredSvcsFilter"); 
			
to 

  private String exportRegisteredSvcsFilter = System
			.getProperty("org.eclipse.ecf.osgi.services.basictopologymanager.exportRegisteredSvcsFilter", "(service.exported.interfaces=*)");
			
There's still the potential for a race condition (service starting between the first scan and the adding hook listener).

Is there a problem with calling the handleService twice for the same ServiceReference? If not, then putting the call for exportRegisteredServices after the registration for the EventHook will mean that we subscribe for new events, then look for all old events. That way we're guaranteed to see everything, but we might see the same thing twice (e.g. get event and then get a lookup). The only way of avoiding that for real would be to maintain e.g. a list of service IDs that had already been processed but this might be overkill.
Comment 18 Scott Lewis CLA 2011-06-20 10:53:44 EDT
(In reply to comment #17)
> We can be a bit more performant by only requesting those services with
> service.exported.interfaces, right? This will pump through all services,
> whether or not they're exposed (and then ignore them in the call).
> 
> I think we can do this by changing:
> 
>   private String exportRegisteredSvcsFilter = System
>            
> .getProperty("org.eclipse.ecf.osgi.services.basictopologymanager.exportRegisteredSvcsFilter"); 
> 
> to 
> 
>   private String exportRegisteredSvcsFilter = System
>            
> .getProperty("org.eclipse.ecf.osgi.services.basictopologymanager.exportRegisteredSvcsFilter",
> "(service.exported.interfaces=*)");

Sure.

> 
> There's still the potential for a race condition (service starting between the
> first scan and the adding hook listener).
> 
> Is there a problem with calling the handleService twice for the same
> ServiceReference? 

Yes.

>If not, then putting the call for exportRegisteredServices
> after the registration for the EventHook will mean that we subscribe for new
> events, then look for all old events. That way we're guaranteed to see
> everything, but we might see the same thing twice (e.g. get event and then get
> a lookup). The only way of avoiding that for real would be to maintain e.g. a
> list of service IDs that had already been processed but this might be overkill.

Yeah, I think it would be overkill...for this topology manager, anyway.
Comment 19 Alex Blewitt CLA 2011-06-20 14:19:28 EDT
See https://github.com/alblue/ecf/tree/bug349078 for a reordering of the registration of eventhook and existing service publication to reduce the race conditions

commit 531ef894bc3467391922fdac2432d1ea83c42986
Comment 20 Alex Blewitt CLA 2011-06-20 14:21:31 EDT
Created attachment 198276 [details]
Reorders the registration of EventHook and existing services

I wrote this myself, EPL, yada yada git process fundamentally broken at Eclipse.org etc.
Comment 21 Scott Lewis CLA 2011-06-20 15:38:44 EDT
(In reply to comment #20)
> Created attachment 198276 [details]
> Reorders the registration of EventHook and existing services
> 
> I wrote this myself, EPL, yada yada git process fundamentally broken at
> Eclipse.org etc.

Rather than worry about reviewing this patch for this trivial move of a block of code written by me, let's just pretend that you asked for the block move in prose, and I made the change...ok?

I've made the change and pushed to master:

http://git.eclipse.org/c/ecf/org.eclipse.ecf.git/commit/?id=e41dc1510e9e9a05e52ac9d284d129cce1d11646

Thanks for the bug report.  As I mentioned in previous comments, in the longer term (ECF 3.6) I would like to introduce a new bundle(s)...with new packages...with possibly several topology manager classes...suitable for addressing different remote service export/import cases (OSGi server as service host, OSGi server as service consumer, both, Eclipse as remote service consumer, Eclipse as remote service host, etc) and allowing much more configuration and control than the current BasicTopologyManager supports.  Alex hopefully you will consider contributing to that effort.