Re: [ecf-dev] service discovery working even if port mis configured

Hi Peter,

On 4/24/2014 7:48 AM, Peter Hermsdorf wrote:
Hi Scott,

I don't think I understand what exactly you mean by 'stopping the host'. Do you mean just remote service unregistration? Or do you mean an unceremonious host shutdown (e.g. kill -9), or something in between?
<deleted>
I'm not sure. I think it hinges on what you want WRT the 'stopping the host' and the 'restart leading to new bind event'.
short answer: in any case ;)

'Any case' would indeed be nice, but of course what we are talking about is byzantine fault tolerance [1]...a very hard set of distributed systems problems.


We have an RCP client using service(s) of a single server instance. When that server goes down (software update, network problem, crash, etc.) the client can continue to work (it just can't use these services during that time), but it needs a way to reconnect to/rediscover the service when the server is online again (without a client restart).

In the end the client needs to get an unbind when the service is not available and a bind when it is online again.

Ok I see.

I'm going to break this down a little bit...as it relates to remote services...just to talk through the issues and choices that can be made about discovery, distribution, and their combination for implementing remote services. Please forgive if this seems a little long-winded, but in truth there are no technical silver bullets here.

First...to get the client to 'unbind'...i.e. have the remote service proxy go away when the underlying host crashes or the network partitions...the distribution provider has to do some failure detection. The ECF generic provider does this failure detection, so when the host goes down (e.g. crashes), the generic provider detects it and the remote service proxy is unregistered/unbound...as you've already found in your tests. Note this is not necessarily true of all distribution providers and/or implementations of OSGi remote services. For example, if your distribution provider is based upon connectionless http and the http server goes down, a client that already has a working proxy may have no way of knowing that the remote service host has crashed or become unavailable. But again, the ECF generic provider does do such failure detection, and so the proxy unregistration upon host crash does occur.
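
To make the failure-detection idea concrete, here is a minimal sketch of a timeout-based detector in plain Java. It is only an illustration of the general technique, not the ECF generic provider's actual implementation: the class name HeartbeatFailureDetector, the UnbindListener callback, and the heartbeat/timeout scheme are all assumptions made for this example. The unbind callback stands in for the proxy being unregistered.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: timeout-based failure detection that drives an unbind. */
final class HeartbeatFailureDetector {

    /** Stand-in for "unregister the remote service proxy" (assumed name). */
    interface UnbindListener {
        void unbind(String hostId);
    }

    private final long timeoutMillis;
    private final UnbindListener listener;
    // Last time (in caller-supplied milliseconds) each host was heard from.
    private final Map<String, Long> lastSeen = new HashMap<>();

    HeartbeatFailureDetector(long timeoutMillis, UnbindListener listener) {
        this.timeoutMillis = timeoutMillis;
        this.listener = listener;
    }

    /** Record that a message or keep-alive arrived from the host. */
    void heartbeat(String hostId, long nowMillis) {
        lastSeen.put(hostId, nowMillis);
    }

    /**
     * Declare every host silent for longer than the timeout as failed,
     * stop tracking it, and notify the listener. Returns the failed hosts.
     */
    List<String> check(long nowMillis) {
        List<String> failed = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = lastSeen.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (nowMillis - entry.getValue() > timeoutMillis) {
                it.remove();
                failed.add(entry.getKey());
                listener.unbind(entry.getKey());
            }
        }
        return failed;
    }

    boolean isTracked(String hostId) {
        return lastSeen.containsKey(hostId);
    }
}
```

The clock is passed in explicitly so the behaviour is easy to reason about: a host that last sent a heartbeat at 900 ms with a 1000 ms timeout is still considered alive at 1500 ms and is declared failed at 2000 ms. A provider without any such mechanism (the connectionless http case) simply never calls unbind.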

Now...to get the client/service consumer to 'rebind' to the new service...when the host recovers and it becomes available...means that the new service instance metadata (edef) has to be communicated to the consumer *at that time*...i.e. dynamically via some sort of network discovery (zookeeper, etc) rather than an edef file. This is why you are not seeing the rebind happen with the static (or template-based) edef...because that's completely initiated by the consumer/client...and doesn't happen when the host recovers and makes a new instance of the remote service available.

In short, I think what you probably need is *both* a distribution provider with failure detection (generic, r-osgi, jms) *and* a network discovery provider (e.g. zookeeper, dnssd, jslp, zeroconf). Then the distribution provider can detect the host failure...to unbind the remote service proxy when a crash happens...and the network discovery can announce that the host has made a new remote service available...*after* it becomes available.
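
The difference between the static edef case and the discovery case can be sketched as a toy consumer in plain Java. This is an illustration under stated assumptions, not ECF or OSGi API: the class RebindSketch and its methods endpointAnnounced and hostFailed are hypothetical names. endpointAnnounced models endpoint metadata reaching the consumer (from an edef file at startup, or from network discovery at any time), and hostFailed models the distribution provider's failure detection.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: bind/unbind/rebind as seen by a service consumer. */
final class RebindSketch {
    private final Set<String> bound = new HashSet<>();
    private final List<String> log = new ArrayList<>();

    /** Endpoint metadata arrived (static edef at startup, or network discovery). */
    void endpointAnnounced(String endpointId) {
        if (bound.add(endpointId)) {
            log.add("bind " + endpointId);
        }
    }

    /** The distribution provider's failure detection reported the host gone. */
    void hostFailed(String endpointId) {
        if (bound.remove(endpointId)) {
            log.add("unbind " + endpointId);
        }
    }

    List<String> log() {
        return log;
    }

    public static void main(String[] args) {
        // Static edef only: metadata is read once, by the consumer, at startup.
        RebindSketch staticOnly = new RebindSketch();
        staticOnly.endpointAnnounced("svc");
        staticOnly.hostFailed("svc");
        // Host restarts, but nothing tells the consumer about it.
        System.out.println(staticOnly.log()); // [bind svc, unbind svc]

        // With network discovery: the restarted host is announced again.
        RebindSketch withDiscovery = new RebindSketch();
        withDiscovery.endpointAnnounced("svc");
        withDiscovery.hostFailed("svc");
        withDiscovery.endpointAnnounced("svc");
        System.out.println(withDiscovery.log()); // [bind svc, unbind svc, bind svc]
    }
}
```

In both runs the unbind comes from failure detection. Only the second run gets a rebind, because only there does something on the network side deliver the new endpoint metadata after the host comes back.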

Given your initial explanation of the remote service metadata (changing a few of the edef property values), I had thought that using edef or edef templates would meet your use case. But it seems you have some additional requirements that make the dynamic aspect of network discovery necessary...as I've outlined.

Hopefully this discussion is helpful. I do wish that the failure/reliability properties of remote services could be entirely hidden...but there's lots of distributed systems work that shows such network transparency is not really possible (or at least not well advised). IMO, one thing that OSGi remote services uniquely provide...that makes them very attractive for remote services in general...is the ability to map network-based failure to the dynamics of OSGi and OSGi services (i.e. the service instances naturally come and go at runtime).


I hope that this explanation is somehow more clear ;)

I'll see you and raise you on that hope :).

Scott


