Bug 279303 - Race Condition on Client Startup
Summary: Race Condition on Client Startup
Status: NEW
Alias: None
Product: Riena
Classification: RT
Component: communication
Version: 1.1.0
Hardware: PC Windows XP
Importance: P3 normal
Target Milestone: ---
Assignee: Project Inbox CLA
Reported: 2009-06-05 15:12 EDT by Olaf Fricke CLA
Modified: 2011-08-10 03:33 EDT

Attachments
Two bundles containing a demonstration sample (18.00 KB, application/octet-stream)
2009-06-16 15:18 EDT, Olaf Fricke CLA

Description Olaf Fricke CLA 2009-06-05 15:12:41 EDT
Build ID: Riena-1.1.0.RC1-platform-win32.win32.x86.zip

Steps To Reproduce:
1. Create another extension of "org.eclipse.riena.communication.core.remoteservicefactory" to provide a new communication protocol in a new bundle.
2. Create some remote proxies in the activator of that bundle (they can use the hessian protocol as well).
3. Create a launch configuration that contains the new bundle and the bundle org.eclipse.riena.security.client.startup.
4. Try to use one of the remote proxies.


More information:
Hello everybody,

this week, I encountered a nasty race condition between bundles during the startup of a Riena client. The situation was as follows:
1.) Some weeks ago, I wrote my own extension of "org.eclipse.riena.communication.core.remoteservicefactory" to provide a new communication protocol "local" that redirects service calls to a local implementation (mainly for testing purposes, but also valid for a printing service).
2.) I wrote a test bundle that uses a 'local' service and started everything with an OSGi Framework runner. Everything worked fine.
3.) Then I integrated everything into our Riena client and suddenly all my services were null, even the remote ones. Surprisingly, I never got any exception.

After debugging the riena client I discovered the following race condition:
- The bundle "org.eclipse.riena.security.client.startup" defines an "org.eclipse.riena.core.startups" extension point and is thus started very early by the Riena core.
- Its activator registers two remote proxy objects for the IAuthenticationService.
- During that registration the very first RemoteServiceFactory is created, calling "Inject.extension(IRemoteServiceFactoryProperties.EXTENSION_POINT_ID)".
- The Injector resolves all extensions for the extension point and therefore loads the class of my own factory.
- Now comes the nasty part: The EclipseLazyStarter detects in its method postFindLocalClass that another ClasspathManager exists on the activation stack and therefore calls secureAction.start(bundle, Bundle.START_TRANSIENT);
- That call triggers the Activator of my own communication bundle. In that activator all remote and local services are loaded and remote proxies are created. Unfortunately, the factories for the proxies are not ready yet.
- The method RemoteServiceFactory#createProxy cannot find a matching protocol and thus returns null.
- The fallback is to call RemoteServiceFactory#createLazyProxy. But that method fails in my environment, because it uses the ClassLoader of the RemoteServiceFactory class and that classloader does not know the service class (the current context classloader would know it!).
- I end up with a null value for all my service proxies.
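
The sequence above can be reduced to a plain-Java sketch. It uses no OSGi or Riena classes; every name in it is hypothetical and only stands in for the roles described (the static protocol map, the lazily activated bundle, the activator that creates proxies). It is meant to illustrate the ordering problem under those assumptions, not to reproduce Riena's actual code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the startup sequence; not Riena or OSGi code.
final class ReentrantInitDemo {

    /** Stand-in for the static protocol map of the remote service factory. */
    static final Map<String, String> PROTOCOLS = new HashMap<>();

    /** Result of the proxy lookup performed by the simulated activator. */
    static String proxyCreatedByActivator;

    /** Simulates creating the first factory, which resolves the protocol extensions. */
    static void initializeFactory() {
        // Loading the extension class activates its bundle, so the activator
        // runs BEFORE the protocol is stored in the map below.
        String protocol = loadExtensionAndActivateBundle();
        PROTOCOLS.put(protocol, "factory for " + protocol);
    }

    static String loadExtensionAndActivateBundle() {
        activatorStart(); // side effect of lazy activation
        return "local";
    }

    /** Simulates the activator that creates its proxies inside start(). */
    static void activatorStart() {
        proxyCreatedByActivator = createProxy("local");
    }

    /** Returns null when no matching protocol has been registered yet. */
    static String createProxy(String protocol) {
        String factory = PROTOCOLS.get(protocol);
        return factory == null ? null : "proxy via " + factory;
    }

    public static void main(String[] args) {
        initializeFactory();
        System.out.println("during activation: " + proxyCreatedByActivator);
        System.out.println("after initialization: " + createProxy("local"));
    }
}
```

The lookup made from inside the simulated activator yields null, while the same lookup after initialization has finished succeeds, which mirrors the behavior reported here.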


I worked around this race condition by splitting my bundle into two: the first bundle contains everything for the local protocol, the second one contains the activator for creating the proxy object. Due to this split the first bundle can be fully activated and thus the protocols can be registered, before the second bundle starts.

Another solution could be to create the two needed IAuthenticationService proxies as lazy proxies, but I have not tried it.
Comment 1 Stefan Liebig CLA 2009-06-15 02:23:50 EDT
Hi Olaf,

Could you please provide your sources/bundles to make this behavior easier to reproduce?

Bye,
Stefan
Comment 2 Olaf Fricke CLA 2009-06-16 15:18:30 EDT
Created attachment 139345 [details]
Two bundles containing a demonstration sample

Hi Stefan,

attached you can find an archive containing two bundles. The first bundle contains just a simple interface that will be used to define a remote service. I need this bundle because I want to demonstrate how to create a remote service without using Eclipse-BuddyReference.
The second bundle contains the classes that demonstrate the race condition. First of all, the class LocalServiceFactoryJava defines a new communication protocol (named local). This new protocol is needed to trigger the race, in combination with the usage of the security.client.startup bundle.
The Activator contains the demonstration code, consisting of four scenarios:
1.) The first scenario shows that a remote service can be created with a lazy reference, if the Eclipse-BuddyReference is used.
2.) The second scenario demonstrates that no proxy can be created, if no Buddy is defined.
3.) The third scenario shows that again no proxy can be created, even if the context class loader is changed.
4.) The fourth scenario finally proves the race condition: If I postpone the proxy creation until the security.client.startup bundle has been activated, a proxy with a real reference can be created for a non-buddy interface by using a context classloader that knows the service interface as well as the hessian classes.

I hope this explanation helps to reproduce the race condition. I developed the code against 1.1.0-RC3 of Riena.

Best regards,
Olaf

PS: Please have a look at my comment under http://dev.eclipse.org/mhonarc/lists/riena-dev/msg00738.html
Comment 3 Stefan Liebig CLA 2009-06-23 06:43:50 EDT
Hi Olaf,

Thanks for the test bundles. Yes, I could reproduce the "race condition".

I think this report describes (at least) two problems that got intertwined (correct me if I am wrong):
- the initialization problem of the RemoteServiceFactory
- and the problem of loading classes that are not visible for ´generic´ bundles, e.g. hessian

The initialization problem occurs because your local protocol factory class is in the same bundle as the code that uses the remote service factory. Creating the LocalServiceFactoryJava within RemoteServiceFactory's initialization causes the activation of the bundle that contains the LocalServiceFactoryJava. Within the activator's start() method the RemoteServiceFactory is used although it is not yet initialized.
To solve this problem it should be sufficient to separate the concerns, i.e. have two separate bundles, one containing only the LocalServiceFactoryJava and the other using the ´local´ protocol.

For the other problem (class loading) there is already an Eclipse way of dealing with that: buddy class loading. I think it is a practical approach since it does not pollute the (business) code, e.g. with class loading ´tricks´.
Comment 4 Olaf Fricke CLA 2009-06-25 12:26:25 EDT
Hi Stefan,

I agree with you that splitting my bundle into two solves the first problem -- that's exactly the way I did it in our application.

But I do not agree with your opinion that buddy class loading does not pollute the business code. The point is that it is not enough to declare the buddy; a dependency on o.e.r.communication.core is needed as well:

Eclipse-RegisterBuddy: org.eclipse.riena.communication.core
Require-Bundle: org.eclipse.riena.communication.core

This makes every (!) business API dependent on the technical communication layer. This is indeed pollution.

My way out of this trap is to provide hessian with a classloader that can load all needed classes by temporarily setting the context classloader. But this works only if the createProxy method of the RemoteServiceFactory succeeds. If it fails, Riena calls createLazyProxy, where no context classloader is used. Hessian can be used without buddy class loading, Riena cannot.
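
Temporarily setting the context classloader is usually done with a try/finally block so that the previous loader is always restored. A minimal sketch follows, assuming a hypothetical helper named callWith; it uses only standard JDK calls and is not taken from the attached bundles.

```java
import java.util.function.Supplier;

// Hypothetical helper; only Thread#get/setContextClassLoader are real JDK APIs.
final class ContextClassLoaderSwap {

    /** Runs the action with the given context class loader, then restores the old one. */
    static <T> T callWith(ClassLoader loader, Supplier<T> action) {
        Thread thread = Thread.currentThread();
        ClassLoader previous = thread.getContextClassLoader();
        thread.setContextClassLoader(loader);
        try {
            return action.get();
        } finally {
            // Restore even if the action throws, so later code is unaffected.
            thread.setContextClassLoader(previous);
        }
    }

    public static void main(String[] args) {
        // Assumption: this loader stands in for one that sees the business API
        // and the hessian classes.
        ClassLoader apiLoader = ContextClassLoaderSwap.class.getClassLoader();
        ClassLoader seen = callWith(apiLoader,
                () -> Thread.currentThread().getContextClassLoader());
        System.out.println(seen == apiLoader);
    }
}
```

In the scenario described here, the action passed to callWith would be the proxy creation call. The swap only helps code that actually consults the context classloader, which is why a code path that ignores it still fails.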
Comment 5 Stefan Liebig CLA 2009-06-29 02:42:02 EDT
Hi Olaf,

Ok, Buddy class loading pollutes the bundle but not the Java code. I think this is a little bit better, but still not good (I intentionally did not use the word ´perfect´ ;-)
So, I was thinking about a ´good´ solution and I came up with the following. I am not sure if this is a general solution to this problem but it worked on a toy example here for me. My solution:
- create a fragment bundle with host bundle org.eclipse.riena.communication.core
- import the packages/bundles needed of the common bundle into the fragment bundle
- add that fragment bundle to your launch configuration
- remove buddy stuff and dependencies from your common bundle

With this the common bundle does now contain nothing related to riena communication.

Could you please give this a try and comment on it?
Comment 6 Olaf Fricke CLA 2009-06-29 18:03:16 EDT
Hi Stefan,

I tried your proposal and it worked. Can you please give me some more details what is happening to the classloading when using fragments?

Nevertheless, your solution looks to me like 'out of the frying pan into the fire'. Previously, I had to add a dependency on 'o.e.r.communication.core' to each of my business API MANIFEST.MF files.

Now I have to add each business API bundle to the fragment bundle. That implies that whenever a new business API is added, I have to change one central bundle, the fragment. This does not sound good.

Maybe it's time to look at the root of this issue again: I tried to find a way to avoid buddy classloading for remote service invocation with hessian. I found a way by supplying a context classloader to hessian that is capable of loading both the business API interface and the interface com.caucho.hessian.io.HessianRemoteObject.

This worked fine until I ran into the race condition: when the proxy for the remote service is created before the communication protocols have been registered, the real proxy cannot be created and a lazy proxy needs to be created instead. The lazy proxy uses the classloader of the RemoteServiceFactory to look up the business interface class, and that fails if no buddy classloading is used.

The obvious solution would be to use the classloader of the business interface itself (it is available as class in the RemoteServiceDescription). But this would only solve the first step, creating the lazy proxy. Later the createProxy method would be called again, this time without an adequate classloader.
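
As an illustration of that first step, a dynamic proxy can be defined by the classloader that loaded the service interface itself. The sketch below uses only java.lang.reflect.Proxy from the JDK; GreetingService and createLazyProxy are hypothetical names, and this is not how Riena's createLazyProxy is implemented.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

// Hypothetical sketch; only java.lang.reflect.Proxy is a real API here.
final class InterfaceLoaderProxy {

    /** Hypothetical business API interface. */
    interface GreetingService {
        String greet(String name);
    }

    /** Creates a proxy defined by the class loader that loaded the interface itself. */
    static <T> T createLazyProxy(Class<T> serviceInterface, InvocationHandler handler) {
        // The interface's own loader can always see the interface, so no buddy
        // policy or context class loader is needed for this step.
        ClassLoader loader = serviceInterface.getClassLoader();
        Object proxy = Proxy.newProxyInstance(loader, new Class<?>[] { serviceInterface }, handler);
        return serviceInterface.cast(proxy);
    }

    public static void main(String[] args) {
        GreetingService service = createLazyProxy(GreetingService.class,
                (proxy, method, methodArgs) -> "Hello, " + methodArgs[0]);
        System.out.println(service.greet("Riena"));
    }
}
```

This only covers the creation of the lazy proxy. The later creation of the real proxy would still need a loader that also sees the protocol classes.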

I need a solution that either postpones the creation of remote services until all protocols have been registered or allows me to define the classloader to be used when creating the real proxy.

Best regards,
Olaf
Comment 7 Stefan Liebig CLA 2009-06-30 03:31:35 EDT
Hi Olaf,

I think fragments live inside of their host bundle, i.e. they have the same class loader as used for the bundle.
The benefit of this approach is that the business API stays untouched (no ´strange´ dependencies).
You also make an assumption that does not hold. Instead of one fragment that collects all your business APIs you may have multiple fragments, one for each business API. I haven't tried that but it should work.

I also made a few experiments with providing explicit class loaders to the communication layer, but I stumbled upon an issue: a RemoteServiceDescription can be created with only the service class name in it (thus with no attached class loader to use). This happens e.g. on the server side.
Comment 8 Olaf Fricke CLA 2009-07-03 13:09:31 EDT
Hi Stefan,
 
I do not know anything about the server side of riena, since we are working with a JBoss on the server side. We are using JNDI to lookup the service implementation.

For the client side, I found a solution to work around the race condition: I defer the creation of the remote proxies until I am sure, that all communication protocols have been fully initialized and are registered in the static map in RemoteServiceFactory.

To achieve the deferment, I first call 'new RemoteServiceFactory()' in the Activator of an initialization bundle. In the wiring class of my local communication bundle I register a BundleListener that listens for the initialization bundle to be started. Only that listener performs the creation of the remote proxies.

The main purpose of the deferment is to avoid any call to the createLazyProxy method because that method does not use my context class loader.
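
The deferment idea can be sketched independently of OSGi as a small gate that queues work until a readiness signal arrives. The class and method names below are hypothetical; in the setup described here, the call to markReady would presumably sit in the BundleListener that reacts to the initialization bundle having started.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical plain-Java gate; the OSGi BundleListener wiring is not shown.
final class DeferredProxyCreation {

    private final List<Runnable> pending = new ArrayList<>();
    private boolean ready;

    /** Runs the task now if initialization is complete, otherwise queues it. */
    synchronized void whenReady(Runnable task) {
        if (ready) {
            task.run();
        } else {
            pending.add(task);
        }
    }

    /** Called once all protocols are registered; runs the queued tasks in order. */
    synchronized void markReady() {
        ready = true;
        for (Runnable task : pending) {
            task.run();
        }
        pending.clear();
    }

    public static void main(String[] args) {
        DeferredProxyCreation gate = new DeferredProxyCreation();
        List<String> log = new ArrayList<>();
        gate.whenReady(() -> log.add("create proxies")); // requested too early, queued
        log.add("protocols registered");
        gate.markReady();                                 // queued task runs here
        System.out.println(log);
    }
}
```

With this shape, proxy creation requested during activation never reaches the lazy-proxy fallback, because it simply does not run until the protocols exist.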

Another workaround might be to initialize the static map of the class RemoteServiceFactory in the activator of o.e.r.communication.core, but I haven't gone down that road (yet?).

Best regards,
Olaf
Comment 9 Stefan Liebig CLA 2011-08-09 09:03:45 EDT
Hi Olaf,

We are currently going through our open bugs. 
Is this one still relevant? Do you avoid this situation?

Bye,
Stefan
Comment 10 Olaf Fricke CLA 2011-08-10 02:56:27 EDT
Hi Stefan,

I do not know whether the race condition still exists, since we have separated the concerns and are still using two bundles. We can live with this situation. But I think it might be a good idea for the framework to detect such a situation and give the user a hint on how to solve the issue.
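
One conceivable shape for such a detection is a guard that notices when the factory is used while its own initialization is still running. The sketch below is purely illustrative: InitializationGuard and its methods are hypothetical and do not exist in Riena.

```java
// Hypothetical guard; not part of Riena.
final class InitializationGuard {

    private boolean initializing;

    /** Wraps the step that loads the protocol extensions. */
    void initialize(Runnable loadExtensions) {
        initializing = true;
        try {
            loadExtensions.run();
        } finally {
            initializing = false;
        }
    }

    /** Fails with a hint if called re-entrantly from within initialize(). */
    void checkUsable() {
        if (initializing) {
            throw new IllegalStateException(
                    "Factory used while its protocols are still being registered. "
                    + "Hint: keep the protocol factory and the code that creates proxies in separate bundles.");
        }
    }

    public static void main(String[] args) {
        InitializationGuard guard = new InitializationGuard();
        guard.initialize(() -> {
            try {
                guard.checkUsable(); // simulates the activator creating a proxy
            } catch (IllegalStateException e) {
                System.out.println("detected: " + e.getMessage());
            }
        });
        guard.checkUsable(); // passes once initialization has finished
    }
}
```

A plain boolean is enough for this single-threaded sketch. A real implementation would have to consider activation happening on other threads.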

The other part of this jira task is connected to class loading issues with bundles. We worked around this problem by extending the hessian protocol: on the server side (in our case a HessianDispatcherServlet running on a JBoss server) we write not only the class name into the hessian output stream, but also the bundle symbolic name and the bundle version. On the client side, we have extended the class loading mechanism of hessian to read both values and to load the requested class from the specified bundle. Now we do not need a buddy solution or fragment solution any more.
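
The client-side half of that idea can be illustrated with a resolver that picks a class loader by bundle symbolic name and version. This is a sketch under assumptions: BundleAwareClassResolver is a hypothetical class, the hessian stream extension is not shown, and in an OSGi runtime the registered loaders would presumably be backed by the bundles themselves.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical resolver; the hessian serialization hooks are not shown.
final class BundleAwareClassResolver {

    private final Map<String, ClassLoader> loadersByBundle = new HashMap<>();

    private static String key(String symbolicName, String version) {
        return symbolicName + ";" + version;
    }

    /** Registers the loader to use for classes that belong to the given bundle. */
    void register(String symbolicName, String version, ClassLoader loader) {
        loadersByBundle.put(key(symbolicName, version), loader);
    }

    /** Loads the class from the bundle named in the stream metadata. */
    Class<?> resolve(String symbolicName, String version, String className) {
        ClassLoader loader = loadersByBundle.get(key(symbolicName, version));
        if (loader == null) {
            throw new IllegalArgumentException("Unknown bundle " + key(symbolicName, version));
        }
        try {
            return Class.forName(className, false, loader);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("Class " + className + " not found in bundle", e);
        }
    }

    public static void main(String[] args) {
        BundleAwareClassResolver resolver = new BundleAwareClassResolver();
        // "com.example.api" and "1.0.0" are made-up values for illustration.
        resolver.register("com.example.api", "1.0.0",
                BundleAwareClassResolver.class.getClassLoader());
        System.out.println(resolver.resolve("com.example.api", "1.0.0", "java.lang.String"));
    }
}
```

Because the bundle is named explicitly in the stream, the business API bundles need neither a buddy declaration nor a fragment for the classes to be found.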

Best regards,
Olaf
Comment 11 Stefan Liebig CLA 2011-08-10 03:33:42 EDT
Hi Olaf,

This is very interesting and quite a clever solution. Do you think your solution is general enough to be used by Riena, and if so, would you mind contributing it?

Bye,
Stefan