Bug 378350 - [Discovery] Upgrade to ZooKeeper 3.4.x release
Summary: [Discovery] Upgrade to ZooKeeper 3.4.x release
Status: NEW
Alias: None
Product: ECF
Classification: RT
Component: ecf.discovery
Version: unspecified
Hardware: All All
Importance: P3 minor
Target Milestone: 3.7.2
Assignee: ecf.core-inbox CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-05-03 07:31 EDT by Wim Jongman CLA
Modified: 2019-01-03 12:02 EST

See Also:


Attachments
mylyn/context/zip (1.59 KB, application/octet-stream)
2013-11-05 06:27 EST, Markus Kuppe CLA
mylyn/context/zip (1.73 KB, application/octet-stream)
2013-11-06 09:51 EST, Markus Kuppe CLA

Description Wim Jongman CLA 2012-05-03 07:31:00 EDT
When ZooDiscovery is built against ZooKeeper 3.4 from Orbit

http://ftp.osuosl.org/pub/eclipse/tools/orbit/downloads/drops/S20120428190502/repository/

then compile errors occur in ZooDiscoveryContainer
Comment 1 Wim Jongman CLA 2012-05-03 07:47:50 EDT
Added a dependency to (3.3.3 3.4.0] for now until we figure out why the API changed.
This change has been added to master.
Comment 2 Markus Kuppe CLA 2013-11-05 05:40:36 EST
Description	Resource	Path	Location	Type
NIOServerCnxn.Factory cannot be resolved to a type	ZooDiscoveryContainer.java	/org.eclipse.ecf.provider.zookeeper/src/org/eclipse/ecf/provider/zookeeper/core	line 200	Java Problem
QuorumPeer.Factory cannot be resolved to a type	ZooDiscoveryContainer.java	/org.eclipse.ecf.provider.zookeeper/src/org/eclipse/ecf/provider/zookeeper/core	line 226	Java Problem
QuorumPeer.Factory cannot be resolved to a type	ZooDiscoveryContainer.java	/org.eclipse.ecf.provider.zookeeper/src/org/eclipse/ecf/provider/zookeeper/core	line 226	Java Problem
Factory cannot be resolved to a type	ZooDiscoveryContainer.java	/org.eclipse.ecf.provider.zookeeper/src/org/eclipse/ecf/provider/zookeeper/core	line 200	Java Problem
The import org.apache.zookeeper.server.NIOServerCnxn.Factory cannot be resolved	ZooDiscoveryContainer.java	/org.eclipse.ecf.provider.zookeeper/src/org/eclipse/ecf/provider/zookeeper/core	line 25	Java Problem
Comment 3 Markus Kuppe CLA 2013-11-05 05:46:19 EST
Same issue over at linkedin https://github.com/linkedin/linkedin-zookeeper/pull/8
Comment 4 Markus Kuppe CLA 2013-11-05 06:27:27 EST
Change request pushed to Gerrit [1]

[1] https://git.eclipse.org/r/18074
Comment 5 Markus Kuppe CLA 2013-11-05 06:27:30 EST
Created attachment 237190 [details]
mylyn/context/zip
Comment 6 Markus Kuppe CLA 2013-11-05 06:30:11 EST
Corresponding CQ https://dev.eclipse.org/ipzilla/show_bug.cgi?id=7695
Comment 7 Wim Jongman CLA 2013-11-05 11:31:06 EST
Gerrit and I pushed:

http://git.eclipse.org/c/ecf/org.eclipse.ecf.git/commit/?id=ac5fa5c492fad225d4708bf29aa18292b4738ebb

Thanks for the patch Markus.
Comment 9 Markus Kuppe CLA 2013-11-05 11:48:06 EST
It seems ZooKeeper 3.4.x requires org.slf4j [1.6.0,2.0.0):

!ENTRY org.eclipse.osgi 2 0 2013-11-05 16:44:38.760
!MESSAGE One or more bundles are not resolved because the following root constraints are not resolved:
!SUBENTRY 1 org.eclipse.osgi 2 0 2013-11-05 16:44:38.763
!MESSAGE Bundle reference:file:/opt/hudson/jobs/C-HEAD-discovery.zookeeper.feature/workspace/targetPlatformPath/plugins/org.apache.hadoop.zookeeper_3.4.5.v20121214-1350.jar was not resolved.
!SUBENTRY 2 org.apache.hadoop.zookeeper 2 0 2013-11-05 16:44:38.766
!MESSAGE Missing imported package org.slf4j_[1.6.0,2.0.0).
Comment 11 Markus Kuppe CLA 2013-11-06 02:55:50 EST
Once the CQ has been approved, apply the following patch:

diff --git a/providers/bundles/org.eclipse.ecf.provider.zookeeper/buckminster.cspex b/providers/bundles/org.eclipse.ecf.provider.zookeeper/buckminster.cspex
index d037070..6448324 100644
--- a/providers/bundles/org.eclipse.ecf.provider.zookeeper/buckminster.cspex
+++ b/providers/bundles/org.eclipse.ecf.provider.zookeeper/buckminster.cspex
@@ -6,6 +6,7 @@
 	<dependencies>
 		<dependency name="org.apache.log4j" componentType="osgi.bundle"/>
 		<dependency name="org.apache.hadoop.zookeeper" componentType="osgi.bundle"/>
+		<dependency name="org.slf4j" componentType="osgi.bundle"/>
 	</dependencies>
 	<generators>
 		<!-- Place your Generators here -->
Comment 12 Markus Kuppe CLA 2013-11-06 09:51:49 EST
Added SLF4J with commit 910ed85c0d5513b0f3a91636b83dfbd7133f115e

[1] http://git.eclipse.org/c/ecf/org.eclipse.ecf.git/commit/?id=910ed85c0d5513b0f3a91636b83dfbd7133f115e
Comment 13 Markus Kuppe CLA 2013-11-06 09:51:52 EST
Created attachment 237234 [details]
mylyn/context/zip
Comment 14 Markus Kuppe CLA 2013-11-06 10:18:23 EST
Zookeeper 3.4.5 appears to have introduced a deadlock in CI tests [1] and locally. Upon further analysis, it looks like WatchManager busy waits on org.eclipse.ecf.provider.zookeeper.node.internal.WatchManager.Lock.isOpen().

[1] https://build.ecf-project.org/jenkins/job/C-HEAD-discovery.zookeeper.feature/230/
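For illustration only: a thread that spins on a flag which never gets set hangs forever, whereas a bounded blocking wait turns the same situation into a detectable timeout. The sketch below is not the ECF WatchManager code; `LatchGate` and `LatchGateDemo` are hypothetical names, and the use of `CountDownLatch` is an assumption about how such a lock could be written.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a lock that exposes isOpen(); not the ECF class.
class LatchGate {
    private final CountDownLatch latch = new CountDownLatch(1);

    // Opens the gate; calling it more than once is harmless.
    void open() {
        latch.countDown();
    }

    boolean isOpen() {
        return latch.getCount() == 0;
    }

    // Blocks without spinning; returns false if the gate did not open in time.
    boolean awaitOpen(long timeout, TimeUnit unit) throws InterruptedException {
        return latch.await(timeout, unit);
    }
}

class LatchGateDemo {
    public static void main(String[] args) throws Exception {
        LatchGate gate = new LatchGate();

        // Nobody opens the gate yet, so the bounded wait reports a timeout.
        System.out.println("opened in time: " + gate.awaitOpen(100, TimeUnit.MILLISECONDS));

        // A second thread opens the gate; the waiter is released.
        Thread opener = new Thread(gate::open);
        opener.start();
        System.out.println("opened in time: " + gate.awaitOpen(1, TimeUnit.SECONDS));
        opener.join();
    }
}
```

A timeout result could be logged or turned into a test failure, which would make a hang like the one in the CI job easier to diagnose.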
Comment 15 Markus Kuppe CLA 2013-11-06 11:02:00 EST
The cause of the problem seems to be that org.eclipse.ecf.provider.zookeeper.core.ZooDiscoveryContainer.startStandAlone(Configuration) gets executed multiple times during test execution. With ZooKeeper 3.3 the underlying factory makes sure there is no port conflict; with 3.4 there is no such factory, so the repeated calls run into a port conflict.
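One generic way to make repeated start calls harmless is to guard the bind so that only the first call touches the port. This is a sketch under assumptions, not what ZooKeeper 3.3 did internally and not the ZooDiscoveryContainer code; `OnceOnlyServer` is a hypothetical name, and a plain `java.net.ServerSocket` stands in for the ZooKeeper server.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;

// Hypothetical wrapper around a listening socket; not ZooDiscoveryContainer.
class OnceOnlyServer {
    private ServerSocket socket; // guarded by "this"

    // Binds on the first call only; later calls return false and leave the port alone.
    synchronized boolean start(int port) throws IOException {
        if (socket != null) {
            return false;
        }
        socket = new ServerSocket(port, 50, InetAddress.getLoopbackAddress());
        return true;
    }

    // Returns the bound port, or -1 if the server is not running.
    synchronized int port() {
        return socket == null ? -1 : socket.getLocalPort();
    }

    // Releases the port so that a later start() can bind again.
    synchronized void stop() throws IOException {
        if (socket != null) {
            socket.close();
            socket = null;
        }
    }
}

class OnceOnlyServerDemo {
    public static void main(String[] args) throws IOException {
        OnceOnlyServer server = new OnceOnlyServer();
        System.out.println("first start bound: " + server.start(0)); // port 0 = any free port
        System.out.println("second start bound: " + server.start(server.port()));
        server.stop();
    }
}
```

The second call returns without attempting to bind, so it cannot raise a bind error of its own.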
Comment 16 Markus Kuppe CLA 2013-11-06 11:12:10 EST
FWIW link to old factory implementation: https://github.com/apache/zookeeper/blob/branch-3.3/src/java/main/org/apache/zookeeper/server/NIOServerCnxn.java


---

Seeing multiple of:

!ENTRY org.eclipse.ecf.provider.zookeeper 4 0 2013-11-06 17:11:22.942
!MESSAGE Zookeeper server cannot be started! Possibly another instance is already running on the same port. 
!STACK 0
java.net.BindException: Address already in use
	at sun.nio.ch.Net.bind0(Native Method)
	at sun.nio.ch.Net.bind(Net.java:174)
	at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:139)
	at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:77)
	at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:70)
	at org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:95)
	at org.eclipse.ecf.provider.zookeeper.core.ZooDiscoveryContainer.startStandAlone(ZooDiscoveryContainer.java:199)
	at org.eclipse.ecf.provider.zookeeper.core.ZooDiscoveryContainer$3.run(ZooDiscoveryContainer.java:168)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:679)

!ENTRY org.eclipse.ecf.provider.zookeeper 4 0 2013-11-06 17:11:22.943
!MESSAGE Zookeeper server cannot be started! Possibly another instance is already running on the same port. 
!STACK 0
java.net.BindException: Address already in use
	at sun.nio.ch.Net.bind0(Native Method)
	at sun.nio.ch.Net.bind(Net.java:174)
	at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:139)
	at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:77)
	at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:70)
	at org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:95)
	at org.eclipse.ecf.provider.zookeeper.core.ZooDiscoveryContainer.startStandAlone(ZooDiscoveryContainer.java:199)
	at org.eclipse.ecf.provider.zookeeper.core.ZooDiscoveryContainer$3.run(ZooDiscoveryContainer.java:168)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:679)
Comment 17 Markus Kuppe CLA 2013-11-06 11:13:30 EST
Wim, I suggest we roll back to 3.3.x. At least I know too little about the ZooDiscovery implementation to figure out what's going wrong in its dynamics.
Comment 18 Markus Kuppe CLA 2013-11-06 12:23:09 EST
(In reply to Markus Kuppe from comment #17)
> Wim, I suggest we roll back to 3.3.x. At least I know too little about the
> ZooDiscovery implementation to figure out what's going wrong in its dynamics.

After some more analysis I found that the NIOServerCnxnFactory instance never gets shut down. This causes the subsequent BindExceptions. The threading in org.eclipse.ecf.provider.zookeeper.core.ZooDiscoveryContainer is still a mystery to me.
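The effect of a listener that is never shut down can be reproduced with plain java.net sockets. The sketch below is only an illustration: `BindConflictDemo` and `canBind` are hypothetical names, and the `ServerSocket` stands in for the NIOServerCnxnFactory, whose own cleanup call in ZooKeeper 3.4 is `shutdown()`.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.InetAddress;
import java.net.ServerSocket;

class BindConflictDemo {
    // Returns true if a new listener can bind the given loopback port right now.
    static boolean canBind(int port) throws IOException {
        try (ServerSocket probe = new ServerSocket(port, 50, InetAddress.getLoopbackAddress())) {
            return true;
        } catch (BindException e) {
            // Typically "Address already in use", as in the stack traces above.
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        // First listener on any free port; it plays the role of the leaked factory.
        ServerSocket first = new ServerSocket(0, 50, InetAddress.getLoopbackAddress());
        int port = first.getLocalPort();

        System.out.println("bind while first listener is open: " + canBind(port));

        // Closing the first listener releases the port.
        first.close();
        System.out.println("bind after first listener is closed: " + canBind(port));
    }
}
```

As long as the first listener stays open, every later attempt to bind the same port fails, which matches the repeated log entries in comment 16.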
Comment 19 Markus Kuppe CLA 2013-11-06 12:52:15 EST
https://build.ecf-project.org/jenkins/job/C-HEAD-discovery.zookeeper.feature/231 passes successfully
Comment 20 Markus Kuppe CLA 2013-11-08 05:34:37 EST
I have reverted all my work on upgrading to 3.4.x for now [1]. Ahmed or Wim, please just git revert b58dc8d9b9546ede81464f947f22a8757c53c3d if you intend to continue on this.

[1] http://git.eclipse.org/c/ecf/org.eclipse.ecf.git/commit/?id=b58dc8d9b9546ede81464f947f22a8757c53c3d3
Comment 21 Markus Kuppe CLA 2013-11-08 05:46:12 EST
Tests are back green again https://build.ecf-project.org/jenkins/job/C-HEAD-discovery.zookeeper.feature/238/
Comment 22 Wim Jongman CLA 2013-11-08 07:03:14 EST
(In reply to Markus Kuppe from comment #21)
> Tests are back green again
> https://build.ecf-project.org/jenkins/job/C-HEAD-discovery.zookeeper.feature/
> 238/

Should we create a separate branch to R&D this from the master branch?
Comment 23 Markus Kuppe CLA 2013-11-08 07:09:20 EST
(In reply to Wim Jongman from comment #22)
> Should we create a separate branch to R&D this from the master branch?

Sure, why not. We can then also clone the zookeeper build to build the branch.

Does this generally mean that you or Ahmed are going to work on 3.4.x?
Comment 24 Wim Jongman CLA 2013-11-08 08:00:53 EST
(In reply to Markus Kuppe from comment #23)
> (In reply to Wim Jongman from comment #22)
> > Should we create a separate branch to R&D this from the master branch?
> 
> Sure, why not. We can then also clone the zookeeper build to build the
> branch.
> 
> Does this generally mean that you or Ahmed are going to work on 3.4.x?

Yes, we will do this. Maybe not right now, but certainly at some point in the not too distant future.
Comment 25 Wim Jongman CLA 2014-05-11 12:03:41 EDT
New review.


https://git.eclipse.org/r/#/c/26339
Comment 26 Scott Lewis CLA 2014-09-30 13:14:12 EDT
Update:  The current/stable release of Zookeeper project at http://zookeeper.apache.org is now 3.4.6, with 3.5.0 currently in beta.   The most recent Orbit version is now 3.4.5, and I've asked on the Orbit mailing list what plans exist to move to 3.4.6 or even 3.5.0 release in Mars time frame.

I would like to do some planning around this upgrade for Mars release cycle.  Please advise about which version might be best to implement in Mars release cycle given existing fixed bugs in zookeeper, amount of expected work for upgrade, and expected resources in fall/winter 2014.
Comment 27 Wim Jongman CLA 2015-04-17 17:01:20 EDT
(In reply to Wim Jongman from comment #25)
> New review.
> 
> 
> https://git.eclipse.org/r/#/c/26339

We should be able to build and test reviews.