[january-dev] remote services with IDatasets

Hi Folks,

Some of you may be familiar with ECF's implementation of OSGi remote services [1]. Remote services allow OSGi services (defined as one or more Java interfaces) to be made available, via proxies, outside the process that hosts them. I believe ICE is currently using it.

ECF's implementation of remote services is pluggable: different transports can be selected (at service registration time) to remote a service. We call these 'distribution providers' and now have quite a few of them [2], from REST/JAX-RS, to R-OSGi, to JMS, to XML-RPC, to MQTT, to plain ol' TCP and others. Each distribution provider encapsulates both the wire protocol/transport (e.g. HTTP) and the serialization scheme (e.g. JSON).

We have recently completed, and are now testing, a distribution provider that uses Py4j + Google's Protocol Buffers [3]. After some experimentation, I've found that protocol buffers (binary mode) are fairly fast and relatively space-efficient for serialization and deserialization, and that Py4j is fairly fast *if* parameters and return values are serialized to byte[] (and therefore passed by value) rather than passed by reference. Pass-by-reference is the default in Py4j, except for byte[]. It often causes many round trips between Python and Java, and in our observation this quickly becomes a major performance problem when large amounts of data are exchanged.
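To make the round-trip argument concrete, here is a toy model in plain Python. It does not use Py4j at all: RemoteList is a hypothetical stand-in for a by-reference proxy, and its counter only assumes that each call on the proxy costs one Python<->Java round trip.

```python
import struct


class RemoteList:
    """Hypothetical stand-in for a by-reference proxy (not a Py4j class).

    Every method call is counted as one simulated round trip.
    """

    def __init__(self, values):
        self._values = list(values)
        self.round_trips = 0

    def __len__(self):
        self.round_trips += 1
        return len(self._values)

    def __getitem__(self, index):
        self.round_trips += 1
        return self._values[index]

    def to_bytes(self):
        # One call returns the whole list as little-endian doubles.
        self.round_trips += 1
        return struct.pack("<%dd" % len(self._values), *self._values)


def sum_by_reference(proxy):
    # One trip for the length, then one trip per element.
    total = 0.0
    for i in range(len(proxy)):
        total += proxy[i]
    return total


def sum_by_value(proxy):
    # One trip for the payload; everything else happens locally.
    payload = proxy.to_bytes()
    count = len(payload) // 8
    return sum(struct.unpack("<%dd" % count, payload))


by_ref = RemoteList([1.0, 2.0, 3.0, 4.0])
by_val = RemoteList([1.0, 2.0, 3.0, 4.0])
print(sum_by_reference(by_ref), by_ref.round_trips)  # 10.0 5
print(sum_by_value(by_val), by_val.round_trips)      # 10.0 1
```

In this model the by-reference cost grows with the number of elements, while the by-value cost stays at a single call. That matches the shape of what we observed, though real numbers depend on the transport.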

So what's the point of this? Some of us are using [3] to provide modular, performant localhost interaction between OSGi runtimes and Python code, by combining OSGi remote services with ECF's Py4j- and protocol buffers-based distribution provider. The interaction is bi-directional, as OSGi services are bi-directional, but for our use case we have been focusing on Java code calling into data analysis code implemented in Python.

An example using [3] is provided here [4]. In this example the OSGi service interface is declared in Java, and the service is *implemented* by the Python code in the python-src directory [5]. The consumer bundle shows how the service gets injected (by DS) at runtime and is then used like any other OSGi service.

One of the things I've been contemplating is using protobuf to define the serialization/deserialization of January IDatasets to and from numpy arrays (or perhaps some subclass or metatype). This would allow fast and easy exchange of Datasets/IDatasets between Python and Java, which is what we (and perhaps others) are interested in.
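As a rough illustration of what passing a dataset by value could look like, here is a minimal sketch in plain Python. It is not protobuf and not anything that exists in January or ECF: the byte layout (element type code, number of dimensions, shape, then the flattened values) and the names encode_dataset and decode_dataset are assumptions made up for this example. A real implementation would define the same fields in a .proto message instead.

```python
import struct

# Hypothetical layout: 1-byte struct type code, uint32 ndim,
# ndim x uint32 shape, then the flattened values, all little-endian.
_HEADER = "<cI"


def _element_count(shape):
    count = 1
    for extent in shape:
        count *= extent
    return count


def encode_dataset(typecode, shape, flat_values):
    """Pack an n-dimensional dataset into a single bytes payload."""
    count = _element_count(shape)
    if count != len(flat_values):
        raise ValueError("shape does not match number of values")
    header = struct.pack(_HEADER, typecode.encode("ascii"), len(shape))
    dims = struct.pack("<%dI" % len(shape), *shape)
    body = struct.pack("<%d%s" % (count, typecode), *flat_values)
    return header + dims + body


def decode_dataset(payload):
    """Inverse of encode_dataset: returns (typecode, shape, flat_values)."""
    code, ndim = struct.unpack_from(_HEADER, payload, 0)
    typecode = code.decode("ascii")
    offset = struct.calcsize(_HEADER)
    shape = struct.unpack_from("<%dI" % ndim, payload, offset)
    offset += 4 * ndim
    count = _element_count(shape)
    values = struct.unpack_from("<%d%s" % (count, typecode), payload, offset)
    return typecode, tuple(shape), list(values)


# A 2x3 dataset of doubles, flattened in row-major order.
payload = encode_dataset("d", (2, 3), [1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
print(len(payload))             # 61
print(decode_dataset(payload))  # ('d', (2, 3), [1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
```

The payload is a single bytes value, which is what lets it cross the Py4j boundary by value as a byte[]. On the Python side, the value section could presumably be handed to numpy.frombuffer with a little-endian dtype and reshaped to the decoded shape, rather than unpacked into a list.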

I wanted to explain this work publicly on this list, as I won't be able to attend the upcoming summit. I am, however, looking for possible collaboration on parts of this work, and am willing to make contributions to January if they are desired.

Thanks,

Scott

[1] https://wiki.eclipse.org/Eclipse_Communication_Framework_Project

[2] https://wiki.eclipse.org/Distribution_Providers

[3] https://github.com/ECF/Py4j-RemoteServicesProvider

[4] https://github.com/ECF/Py4j-RemoteServicesProvider/tree/master/examples

[5] https://github.com/ECF/Py4j-RemoteServicesProvider/tree/master/examples/org.eclipse.ecf.examples.protobuf.hello




