
Re: [ecf-dev] How to deal with recovering an ecf remote service connection

Hi Wim,

 

Let’s see if I understand this correctly:

 

Let’s say that A wants to know when a touch sensor on B is getting pressed (true/false). What I am doing right now is: A puts up a whiteboard service (TouchSensorSniffer). This service is picked up by B using a ServiceTracker [1]. B then holds on to the service and calls TouchSensorSniffer#onStateChanged whenever the touch sensor state changes. (Of course I also clear my internal cache when the service gets removed.)
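A minimal plain-Java sketch of the consumer side on B described above. It assumes a hypothetical `TouchSensorSniffer` interface with a single `onStateChanged(boolean)` method (the method signature is a guess), and the `serviceAdded`/`serviceRemoved` methods only stand in for what a `ServiceTrackerCustomizer` would do; no OSGi classes are used here.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical whiteboard interface registered by A; the signature is an assumption.
interface TouchSensorSniffer {
    void onStateChanged(boolean pressed);
}

// Sketch of B's side: a cache of currently known sniffers, kept in sync by
// tracker-style callbacks and notified on every sensor state change.
class TouchSensorNotifier {
    // Copy-on-write, so notification can iterate while tracker callbacks modify the list.
    private final List<TouchSensorSniffer> sniffers = new CopyOnWriteArrayList<>();

    // Would be called from ServiceTrackerCustomizer#addingService.
    void serviceAdded(TouchSensorSniffer sniffer) {
        sniffers.add(sniffer);
    }

    // Would be called from ServiceTrackerCustomizer#removedService (clears the cache entry).
    void serviceRemoved(TouchSensorSniffer sniffer) {
        sniffers.remove(sniffer);
    }

    // Notifies every known sniffer; a failed call is logged and treated as temporary.
    // Returns the number of successful deliveries.
    int stateChanged(boolean pressed) {
        int delivered = 0;
        for (TouchSensorSniffer sniffer : sniffers) {
            try {
                sniffer.onStateChanged(pressed);
                delivered++;
            } catch (RuntimeException e) {
                System.err.println("Notification failed (assumed temporary): " + e);
            }
        }
        return delivered;
    }
}
```

In a real tracker, `addingService` would obtain the service with `context.getService(reference)`, pass it to `serviceAdded` and return it, while `removedService` would call `serviceRemoved` followed by `context.ungetService(reference)`.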

 

I’ll use this simple setup to describe my situation: After both A and B are started everything is fine and B has discovered the TouchSensorSniffer from A. Now I disconnect B from the network by pulling the LAN cable. Both A and B continue to run. If the state of the touch sensor changes at this moment B would try to send this information over the TouchSensorSniffer to A. But since B is disconnected from the network, this request fails after the timeout. B thinks this is a temporary error and just logs it.

 

If I reconnect the LAN cable after a couple of seconds and then press my touch sensor again, B will again use the TouchSensorSniffer service to send the state change. This time everything works out because the network is back up: cool. But let’s assume I don’t reconnect right away and instead wait until the keepalive period (default 30 seconds) is over. What happens now is that the TouchSensorSniffer is unregistered in B, which is OK, since we assume that the connection is gone for good. If I touch the sensor now, B sees that no TouchSensorSniffer services are registered and therefore doesn’t send this information anywhere. Also good. Now, after 60 seconds, I reconnect the LAN cable. Both A and B are still running, but B doesn’t pick up the TouchSensorSniffer from A again. They stay disconnected.

 

This last part is based on my observations, so I’m not sure I understand this completely. Does my description come close to the truth and is this the result that is to be expected? Or would you expect the discovery on B to find the TouchSensorSniffer from A again after the network connection has been reestablished?

 

Or is the problem that I am holding on to an instance of TouchSensorSniffer on B? I could stop using a ServiceTracker and instead query the OSGi service registry directly via BundleContext#getServiceReferences for all implementations of TouchSensorSniffer whenever the state changes. I see that this would change the situation slightly, because I would call BundleContext#ungetService right after sending the information and then get the service again for the next event. But I am not sure that this would change the basic situation, since the registry itself is already caching the available remote services. Or am I wrong about this?
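The per-event lookup described above can be sketched as follows. The `Registry` interface is a hypothetical stand-in for the three `BundleContext` operations mentioned (`getServiceReferences`, `getService`, `ungetService`), not the OSGi API itself. As the question already suspects, this only shortens how long B holds a reference; it does not by itself change which imported remote services the registry reports.

```java
import java.util.List;
import java.util.function.Consumer;

// Hypothetical stand-in for getServiceReferences / getService / ungetService.
interface Registry<S> {
    List<String> references();      // ids of the currently registered services
    S get(String reference);        // null if the service disappeared in the meantime
    void unget(String reference);   // releases a service obtained with get()
}

// Looks up all matching services on every event, uses each one once and releases
// it right away instead of keeping instances between events.
final class PerEventLookup {
    private PerEventLookup() {}

    static <S> int forEachService(Registry<S> registry, Consumer<S> action) {
        int used = 0;
        for (String ref : registry.references()) {
            S service = registry.get(ref);
            if (service == null) {
                continue; // unregistered between lookup and get
            }
            try {
                action.accept(service);
                used++;
            } catch (RuntimeException e) {
                System.err.println("Call to " + ref + " failed: " + e);
            } finally {
                registry.unget(ref); // release even if the call failed
            }
        }
        return used;
    }
}
```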

 

Cheers,

Christoph

 

[1]: https://osgi.org/javadoc/r6/core/org/osgi/util/tracker/ServiceTracker.html

 

 

From: ecf-dev-bounces@xxxxxxxxxxx [mailto:ecf-dev-bounces@xxxxxxxxxxx] On Behalf Of Wim Jongman
Sent: Monday, February 1, 2016 15:44
To: Eclipse Communication Framework (ECF) developer mailing list.
Subject: Re: [ecf-dev] How to deal with recovering an ecf remote service connection

 

Indeed, the client holds on to some services from other gadgets. In OSGi terms, holding on to a service is a no-go. With remote services, however, it looks like the logical thing to do. But when you finally start communicating, the service might be stale.

So instead of A holding on to a service of B and then, at some point, trying to read B's state, this should be switched around: A puts up a service which B consumes and uses to communicate its state (and B can then forget about the service).

 

 

On Mon, Feb 1, 2016 at 2:14 PM, Keimel, Christoph <c.keimel@xxxxxxx> wrote:

Hi Wim,

 

yeah, exactly :)

 

Thanks for the tip! This strategy would work great for the services that are exported by the lost “client”. But I might still have to “re-discover” the services that the “client” consumes (i.e. whiteboard services), as these could still be consumed by other gadgets and would therefore not be re-registered automatically with this strategy. I’ll think about this some more …

 

Cheers,

Christoph

 

 

From: ecf-dev-bounces@xxxxxxxxxxx [mailto:ecf-dev-bounces@xxxxxxxxxxx] On Behalf Of Wim Jongman
Sent: Monday, February 1, 2016 13:28
To: Eclipse Communication Framework (ECF) developer mailing list.
Subject: Re: [ecf-dev] How to deal with recovering an ecf remote service connection

 

Hi Christoph,

It looks like you ran into the first fallacy of distributed programming [1], "The network is reliable".

When reading your post, it looks like the "client" thinks that everything is OK, but in fact the "server" has lost the service due to a network problem. Could you decide on a strategy where the "client" periodically checks whether its service is still being consumed? The "client" could, for example, drop its service if it has not been used for the last 10 seconds and register it again.
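A minimal sketch of the "drop and re-register if unused" strategy suggested above. The `register`/`unregister` callbacks are hypothetical; in OSGi they might wrap `BundleContext#registerService` and `ServiceRegistration#unregister`. The clock is injected only to make the logic testable. Note the limitation raised later in this thread: if other gadgets keep using the service, it never goes idle and is never re-registered for the consumer that lost it.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

// Sketch: re-registers a service that has not been called for idleLimitMillis.
class IdleReRegistrar {
    private final Runnable register;     // hypothetical registration callback
    private final Runnable unregister;   // hypothetical unregistration callback
    private final long idleLimitMillis;
    private final LongSupplier clock;    // System::currentTimeMillis in practice
    private final AtomicLong lastUsed;

    IdleReRegistrar(Runnable register, Runnable unregister, long idleLimitMillis, LongSupplier clock) {
        this.register = register;
        this.unregister = unregister;
        this.idleLimitMillis = idleLimitMillis;
        this.clock = clock;
        this.lastUsed = new AtomicLong(clock.getAsLong());
    }

    // Call at the start of every incoming remote call on the exported service.
    void touch() {
        lastUsed.set(clock.getAsLong());
    }

    // One watchdog check; returns true if the service was re-registered.
    boolean check() {
        long now = clock.getAsLong();
        if (now - lastUsed.get() < idleLimitMillis) {
            return false;
        }
        unregister.run();
        register.run();
        lastUsed.set(now); // start a fresh idle window after re-registering
        return true;
    }

    // Runs check() periodically; exceptions are logged so the schedule keeps running.
    ScheduledExecutorService start(long periodMillis) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            try {
                check();
            } catch (RuntimeException e) {
                System.err.println("Re-registration failed: " + e);
            }
        }, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
        return scheduler;
    }
}
```

With the 10-second window from the suggestion, usage could look like `new IdleReRegistrar(registerFn, unregisterFn, 10_000, System::currentTimeMillis).start(1_000)`, where `registerFn` and `unregisterFn` are placeholders for the application's own code.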

Cheers,

Wim

 

On Mon, Feb 1, 2016 at 11:53 AM, Keimel, Christoph <c.keimel@xxxxxxx> wrote:

Hi everybody,

 

I am working on the stability of the remote services network for our live escape game (a bomb defusing simulation), and I have some general questions on how to recover an ECF remote service connection.

 

Let me explain what I have done so far and where I am at the moment. I’ll start with the setup:

- Equinox runtime

- ECF 3.12.0

- Distribution Provider: generic

- Discovery Provider: zeroconf

 

I use remote services in a pretty basic way:

- No dependencies on other services

- Only sync calls with small payloads

- Tracking of services with a ServiceTracker (no DS)

 

The system consists of gadgets that run on Raspberry Pis (Model B+ or 2) and a desktop application for the game operator that shows all the sensor states in the network. The operator application uses whiteboard services that get informed when the state of some sensor changes. There is also a game timer (the bomb timer) that notifies the operator (and all other interested whiteboard services) when the countdown changes. Additionally, the gadgets host remote services that let the operator control them remotely if necessary (e.g. trigger an actuator, add some time to the countdown, etc.).

 

I have tested that the tracking of services works smoothly. You can stop any application and the services of this host will be unregistered correctly. If you restart the application the services will be registered again and whiteboard services will be picked up as is to be expected. You can also just kill a device (cut the power of a pi) which will result in the same thing.

 

We have been using this system on site for a couple of weeks now, and I got some reports of gadgets dropping out of the network for no apparent reason after the system had been running for a longer time, or not showing up at all on startup. This is pretty bad for the operator, as he usually has to reboot the whole system and try again to recover from this situation. I suspect that this has to do with the network situation on site, as they use pretty cheap equipment and there is also a lot of network traffic going on (they also have LAN cameras and the like). So I assume that the network connection is not very stable.

 

I did some more tests on this over the weekend. I tried to simulate the situation by temporarily removing the network connection, i.e. pulling the LAN cable. This was pretty interesting, as it led to some problems at first. I did two things to remedy this:

 

1) I am now using a different Thread (Executor) on the gadgets to do the actual remote calls, so that the application thread is not directly affected.

 

2) I set ecf.remotecall.timeout to a pretty short value (100 ms). I'm not sure why, but using a longer value (3000 ms) led to a situation where the application did not recover very well when a lot of remote calls were waiting for the timeout at the same time (tested on a Raspberry Pi 2).
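The two measures above can be sketched together in plain Java: remote calls run on a dedicated executor thread, so the application thread never blocks on the network, and the caller waits at most a bounded time for a result. The `ecf.remotecall.timeout` property from the thread would still be passed separately as a system property (e.g. `-Decf.remotecall.timeout=100`); the `Future` timeout below is an additional, application-level bound and not a replacement for it. Class and method names are hypothetical.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Runs remote calls off the application thread and bounds how long callers wait.
class RemoteCallDispatcher implements AutoCloseable {
    private final ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
        Thread thread = new Thread(r, "remote-calls");
        thread.setDaemon(true); // do not keep the JVM alive because of pending calls
        return thread;
    });

    // Fire-and-forget, e.g. for state notifications; failures are only logged.
    void submit(Runnable remoteCall) {
        executor.execute(() -> {
            try {
                remoteCall.run();
            } catch (RuntimeException e) {
                System.err.println("Remote call failed: " + e);
            }
        });
    }

    // Waits at most timeoutMillis; returns the fallback on timeout or failure.
    <T> T callWithTimeout(Callable<T> remoteCall, long timeoutMillis, T fallback) {
        Future<T> future = executor.submit(remoteCall);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the call so the queue does not back up
            return fallback;
        } catch (ExecutionException e) {
            return fallback;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return fallback;
        }
    }

    @Override
    public void close() {
        executor.shutdownNow();
    }
}
```

A single worker thread keeps calls in order but queues them behind a hanging call; that queueing is one plausible reason why many calls waiting on a long timeout can make recovery slow, although the thread does not confirm this.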

 

With these two changes the connection stays stable, even when I "pull the plug" for a couple of seconds. As soon as I reconnect, the connection resumes nicely. After 30 seconds, the connection breaks down which makes sense as this is the default keepalive value (if I understand this property correctly). In this case the services are unregistered which is picked up by the ServiceTracker. But after this, there is no recovery when I reconnect the device to the network and I have to restart the application.

 

And here come the questions I have at the moment:

 

Is this the behavior you would expect?

If yes, is there a way to remotely trigger a “re-discovery” (I know the IP address of the device I am looking for)?

Or would you expect the discovery mechanism to find the reconnected host and import its services again?
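Since the IP address of the device is known, one application-side workaround is to probe that host periodically and react when it becomes reachable again. This is a sketch under assumptions, not an ECF feature: the port is assumed to be the one the remote ECF container listens on, and what happens on reconnection (for example re-importing the remote services) is left to a hypothetical callback. A successful TCP connect only shows that the port is open, not that discovery has recovered.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Probes a known host:port with plain TCP connects and fires a callback on the
// transition from unreachable to reachable.
class ReachabilityProbe {
    private final String host;
    private final int port;                  // assumption: the remote container's port
    private final int connectTimeoutMillis;
    private final Runnable onReconnected;    // hypothetical reaction, e.g. re-import services
    private boolean reachable = true;        // assume the host starts out connected

    ReachabilityProbe(String host, int port, int connectTimeoutMillis, Runnable onReconnected) {
        this.host = host;
        this.port = port;
        this.connectTimeoutMillis = connectTimeoutMillis;
        this.onReconnected = onReconnected;
    }

    // One TCP connect attempt; true if something accepted the connection.
    boolean probeOnce() {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), connectTimeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    // Feeds one probe result; runs the callback only on a down -> up transition.
    synchronized boolean update(boolean nowReachable) {
        boolean reconnected = !reachable && nowReachable;
        reachable = nowReachable;
        if (reconnected) {
            onReconnected.run();
        }
        return reconnected;
    }

    // Call periodically, for example from a ScheduledExecutorService.
    boolean checkNow() {
        return update(probeOnce());
    }
}
```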

 

Apart from this, if you have any tips on what to look for or how to simulate a “bad network”, they are highly appreciated.
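One cheap way to exercise the error handling without touching the cable is to wrap a service interface in a `java.lang.reflect.Proxy` that randomly delays or fails calls. This is a swapped-in, application-level fault-injection technique: it does not reproduce what ECF or zeroconf do on a real broken link (keepalive expiry, lost discovery). For network-level simulation on a Linux device such as a Raspberry Pi, `tc` with the `netem` qdisc can add delay and packet loss to an interface. The class name below is hypothetical.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.Random;

// Wraps an interface-typed service in a proxy that randomly delays or fails calls.
final class FlakyProxy {
    private FlakyProxy() {}

    static <T> T wrap(Class<T> type, T target, double failureRate, long maxDelayMillis, Random random) {
        InvocationHandler handler = (proxy, method, args) -> {
            if (method.getDeclaringClass() == Object.class) {
                return method.invoke(target, args); // leave equals/hashCode/toString alone
            }
            if (maxDelayMillis > 0) {
                Thread.sleep((long) (random.nextDouble() * maxDelayMillis)); // simulated latency
            }
            if (random.nextDouble() < failureRate) {
                throw new IllegalStateException("Simulated network failure in " + method.getName());
            }
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow the target's own exception unchanged
            }
        };
        // Interfaces loaded by the bootstrap loader (e.g. Runnable) report a null loader.
        ClassLoader loader = type.getClassLoader() != null
                ? type.getClassLoader()
                : FlakyProxy.class.getClassLoader();
        return type.cast(Proxy.newProxyInstance(loader, new Class<?>[] {type}, handler));
    }
}
```

Wrapping the consumed service (for example a TouchSensorSniffer) with a failure rate of around 0.2 and a few hundred milliseconds of delay would show whether the executor, timeout and logging paths behave as intended.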

 

Thanks for the support!

 

Cheers

Christoph

--

Christoph Keimel

EM-SOFTWARE GmbH

Oskar-Messter-Straße 29

85737 Ismaning

 

Tel: 089 / 996547-26

Fax: 089 / 996547-99

Internet: http://www.emsw.de

EMail: c.keimel@xxxxxxx

 

Managing Director: Dipl.-Inf. (FH) Georg Engl, VAT ID No.: DE 131 175 644, HRB 80271 München

 


_______________________________________________
ecf-dev mailing list
ecf-dev@xxxxxxxxxxx
To change your delivery options, retrieve your password, or unsubscribe from this list, visit
https://dev.eclipse.org/mailman/listinfo/ecf-dev

 



 

