Created attachment 225210 [details]
stacktrace of a slave

We have upgraded a great deal of our Hudson jobs from 1.393 to 3.0 RC4. It went more or less smoothly, but archiving now seems to take much longer. This happens on all our slave nodes (regardless of the OS), so I assume it has something to do with the protocol used to communicate with the master. I attached a stack trace of the slave. I looked at the code, and it seems that a 'window' is used to control the speed at which the client can send data to the master (similar to how TCP works). Since the slave seems to keep blocking on this, I assume the problem lies here. I could not find a way to make the window bigger; I tried playing around with the archive settings, but nothing really helped.
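For context, the flow control described above works roughly like a TCP sliding window: the sender may only have a fixed number of unacknowledged bytes in flight, and it blocks once that budget is used up, until acks from the receiver reopen the window. Below is a minimal sketch of that idea; the class and method names are illustrative and are not Hudson's actual remoting classes:

```java
// Illustrative sketch of window-based flow control, similar in spirit to
// what the master-slave remoting channel does. Names are made up.
public class FlowControlWindow {
    private final int windowSize; // max unacknowledged bytes in flight
    private int unacked = 0;      // bytes sent but not yet acked

    public FlowControlWindow(int windowSize) {
        this.windowSize = windowSize;
    }

    /** Blocks until the window has room for len bytes, then reserves it. */
    public synchronized void reserve(int len) throws InterruptedException {
        while (unacked + len > windowSize) {
            wait(); // the sender parks here when the window is exhausted
        }
        unacked += len;
    }

    /** Called when the receiver acknowledges len bytes; reopens the window. */
    public synchronized void ack(int len) {
        unacked -= len;
        notifyAll(); // wake any sender blocked in reserve()
    }

    /** Remaining room in the window, in bytes. */
    public synchronized int available() {
        return windowSize - unacked;
    }
}
```

In the stack trace, a slave blocked inside the equivalent of `reserve()` would look exactly like a thread parked in `Object.wait()`, which matches the symptom of a window stuck at zero.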
Just for the record, the archiving still works. It just takes very long.
Hi Pieter,

Thanks for the bug report and the useful stack trace. Several bugs were fixed in the master-slave remoting between 1.395 and 1.398, especially a synchronization problem between master and slave: the channel was closed incorrectly by the slave before the master had received all the bytes. Because of that, even though archiving a huge archive appeared to complete, the data was incorrect. Apart from that, nothing changed in this area between 1.393 and 3.0.0.

Do you archive the entire workspace (not specifying anything in the post-build archive artifacts step), or do you archive certain files by specifying them? Or do you use a plugin to archive your artifacts differently?
We archive multiple artifacts, depending on the build, and we use a filter (all artifacts end up in a single directory). I tried toggling the GZIP option, but it has no effect (this option was not there in the old Hudson). For our integration build this is about 500 MB: a p2 update site (>450 jars) and some other artifacts, mostly zip files. For one of our test builds it is a JaCoCo report, which is a directory structure with an HTML page for each class for which code coverage is known, so a lot of small files. Our other test builds archive a couple of JUnit XML files.

We don't use any plugin that should affect the archiving step. This is a listing of our plugin directory. The two com.id plugins are used to add CVS support on our machines that don't have a proper CVS client; they are not used for archiving.

buildoperator@hudson:/var/hudson/home/plugins$ ls
accurev accurev.hpi audit-trail audit-trail.hpi backup backup.hpi birt-charts birt-charts.hpi build-timeout build-timeout.hpi chucknorris chucknorris.bak chucknorris.hpi com.id.hudson.plugins.scm.javacvs com.id.hudson.plugins.scm.javacvs.hpi com.id.hudson.plugins.triggers.urltrigger com.id.hudson.plugins.triggers.urltrigger.hpi compact-columns compact-columns.hpi copy-to-slave copy-to-slave.hpi cron_column cron_column.hpi cvs cvs.hpi dashboard-view dashboard-view.hpi disk-usage disk-usage.hpi downstream-buildview downstream-buildview.hpi email-ext email-ext.hpi git git.hpi jfreechart-plugin jfreechart-plugin.bak jfreechart-plugin.hpi jira jira.hpi jna-native-support-plugin jna-native-support-plugin.hpi jna-native-support-plugin.hpi.disabled junit-attachments junit-attachments.hpi maven-plugin maven-plugin.hpi maven-plugin.hpi.disabled maven3-plugin maven3-plugin.hpi plot plot.hpi project-health-report project-health-report.hpi radiatorviewplugin radiatorviewplugin.hpi rest-plugin rest-plugin.hpi xfpanel xfpanel.hpi xpath-provider xpath-provider.hpi

I will try to take a stack trace of the client and the Hudson server at the same time. I'll also see if I can take a heap dump to find out the values of the window used to throttle the speed.
I took stack traces of both the client and the Hudson server; nothing new came up from them. A heap dump on the client showed that the window is indeed zero, and thus the client is waiting for the server to ack more data before it can send new bytes. It seems that the window size is fixed. The difference between the acked and sent byte counts was exactly the window size, so everything looked correct on the client side. My current conclusion is that the server is not acking fast enough (for my liking). I monitored the network usage: the client is sending data slowly (on the order of a couple of hundred KB/s), even though neither the server nor the client is using much in the way of resources (no CPU, no extra memory usage), and the interconnect is gigabit or even localhost (this is a VM cluster). I will investigate further when I have some spare time. I have not looked at the ack system yet; I'll start there next time.
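The "slow but steady with idle CPUs and a fast link" pattern above is consistent with a throughput ceiling imposed by a fixed window and a slow ack loop: the sender can push at most one window's worth of data per ack round trip, so throughput is bounded by windowSize / ackDelay regardless of link bandwidth. A back-of-the-envelope check (the window size and ack delay here are illustrative assumptions, not values measured from Hudson):

```java
// Back-of-the-envelope: a fixed flow-control window acked slowly caps
// throughput no matter how fast the link is. Numbers are assumptions.
public class WindowThroughput {
    /**
     * Max throughput in bytes/sec for a window of windowSizeBytes
     * that is fully re-acknowledged every ackDelayMs milliseconds.
     */
    public static double maxThroughput(int windowSizeBytes, double ackDelayMs) {
        return windowSizeBytes / (ackDelayMs / 1000.0);
    }

    public static void main(String[] args) {
        // Hypothetical: a 128 KiB window acked only twice per second.
        double bps = maxThroughput(128 * 1024, 500.0);
        System.out.printf("ceiling: %.0f KB/s%n", bps / 1024);
        // prints "ceiling: 256 KB/s" -- in the "couple of hundred KB/s"
        // range observed, even on a gigabit or loopback interconnect.
    }
}
```

If numbers in this ballpark match the observed rate, the bottleneck is the ack cadence on the server side rather than the network, which is what the heap dump (window stuck at zero) already suggests.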
I will try the solution proposed here: https://issues.jenkins-ci.org/browse/JENKINS-3922
(In reply to comment #5)
> I will try the solution proposed here:
> https://issues.jenkins-ci.org/browse/JENKINS-3922

Did not help; more bursts, but still slow.
This bug was most likely fixed on Jenkins by: https://issues.jenkins-ci.org/browse/JENKINS-7813