Created attachment 225210 [details]
stacktrace of a slave

We have upgraded a great deal of our Hudson jobs from 1.393 to 3.0 RC4. It went more or less smoothly, but archiving now seems to take much longer. This happens on all our slave nodes (regardless of the OS), so I assume it has something to do with the protocol used to communicate with the master. I attached a stack trace of the slave. I looked at the code, and it seems that a 'window' is used to control the speed at which the client can send data to the master (similar to how TCP works). Since the slave seems to keep blocking on this, I assume the problem lies here. I could not find a way to make the window bigger; I tried playing around with the archive settings, but nothing really helped.
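For context, the flow control described above works roughly like a TCP sliding window: the sender may only have a fixed number of unacknowledged bytes in flight, and it blocks once that budget is used up, until acks from the receiver reopen the window. Below is a minimal sketch of that idea; the class and method names are illustrative and are not Hudson's actual remoting classes:

```java
// Illustrative sketch of window-based flow control, similar in spirit to
// what the master-slave remoting channel does. Names are made up.
public class FlowControlWindow {
    private final int windowSize; // max unacknowledged bytes in flight
    private int unacked = 0;      // bytes sent but not yet acked

    public FlowControlWindow(int windowSize) {
        this.windowSize = windowSize;
    }

    /** Blocks until the window has room for len bytes, then reserves it. */
    public synchronized void reserve(int len) throws InterruptedException {
        while (unacked + len > windowSize) {
            wait(); // the sender parks here when the window is exhausted
        }
        unacked += len;
    }

    /** Called when the receiver acknowledges len bytes; reopens the window. */
    public synchronized void ack(int len) {
        unacked -= len;
        notifyAll(); // wake any sender blocked in reserve()
    }

    /** Remaining room in the window, in bytes. */
    public synchronized int available() {
        return windowSize - unacked;
    }
}
```

In the stack trace, a slave blocked inside the equivalent of `reserve()` would look exactly like a thread parked in `Object.wait()`, which matches the symptom of a window stuck at zero.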
Just for the record, the archiving still works. It just takes very long.
Hi Pieter,

Thanks for the bug report and the useful stack trace. Several bugs were fixed in the master-slave remoting between 1.395 and 1.398, especially a synchronization problem between master and slave: the channel was closed incorrectly by the slave before the master had received all the bytes. Because of that, even though archiving a huge archive appeared to complete, the data was incorrect. Apart from that, nothing changed in this area between 1.393 and 3.0.0.

Do you archive the entire workspace (not specifying anything in the post-build archive artifacts step), or do you archive certain files by specifying them? Or do you use a plugin to archive your artifacts differently?
We archive multiple artifacts, depending on the build, and we use a filter (all artifacts end up in a single directory). I tried toggling the GZIP option, but it has no effect (this option was not there in the old Hudson). For our integration build this is about 500 MB: a p2 update site (>450 jars) and some other artifacts, mostly zip files. For one of our test builds it is a JaCoCo report, which is a directory structure with an HTML page for each class for which code coverage is known, so a lot of small files. Our other test builds archive a couple of JUnit XML files.

We don't use any plugin that should affect the archiving step. This is a listing of our plugin directory. The two com.id plugins are used to add CVS support on our machines that don't have a proper CVS client; they are not used for archiving.

buildoperator@hudson:/var/hudson/home/plugins$ ls
accurev accurev.hpi audit-trail audit-trail.hpi backup backup.hpi birt-charts birt-charts.hpi build-timeout build-timeout.hpi chucknorris chucknorris.bak chucknorris.hpi com.id.hudson.plugins.scm.javacvs com.id.hudson.plugins.scm.javacvs.hpi com.id.hudson.plugins.triggers.urltrigger com.id.hudson.plugins.triggers.urltrigger.hpi compact-columns compact-columns.hpi copy-to-slave copy-to-slave.hpi cron_column cron_column.hpi cvs cvs.hpi dashboard-view dashboard-view.hpi disk-usage disk-usage.hpi downstream-buildview downstream-buildview.hpi email-ext email-ext.hpi git git.hpi jfreechart-plugin jfreechart-plugin.bak jfreechart-plugin.hpi jira jira.hpi jna-native-support-plugin jna-native-support-plugin.hpi jna-native-support-plugin.hpi.disabled junit-attachments junit-attachments.hpi maven-plugin maven-plugin.hpi maven-plugin.hpi.disabled maven3-plugin maven3-plugin.hpi plot plot.hpi project-health-report project-health-report.hpi radiatorviewplugin radiatorviewplugin.hpi rest-plugin rest-plugin.hpi xfpanel xfpanel.hpi xpath-provider xpath-provider.hpi

I will try to take a stack trace of the client and the Hudson server at the same time. I'll also see if I can take a heap dump to find out the values of the window used to throttle the speed.
I took stack traces of both the client and the Hudson server; nothing new came up from them. A heap dump on the client showed that the window is indeed zero, and thus the client is waiting for the server to ack more data before it can send new bytes. It seems that the window size is fixed. The difference between the acked and sent byte counts was exactly the window size, so everything looked correct on the client side. My current conclusion is that the server is not acking fast enough (for my liking). I monitored the network usage: the client is sending data slowly (on the order of a couple of hundred KB/s), even though neither the server nor the client is using much in the way of resources (no CPU, no extra memory usage), and the interconnect is gigabit or even localhost (this is a VM cluster). I will investigate further when I have some spare time. I have not looked at the ack system yet; I'll start there next time.
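The "slow but steady with idle CPUs and a fast link" pattern above is consistent with a throughput ceiling imposed by a fixed window and a slow ack loop: the sender can push at most one window's worth of data per ack round trip, so throughput is bounded by windowSize / ackDelay regardless of link bandwidth. A back-of-the-envelope check (the window size and ack delay here are illustrative assumptions, not values measured from Hudson):

```java
// Back-of-the-envelope: a fixed flow-control window acked slowly caps
// throughput no matter how fast the link is. Numbers are assumptions.
public class WindowThroughput {
    /**
     * Max throughput in bytes/sec for a window of windowSizeBytes
     * that is fully re-acknowledged every ackDelayMs milliseconds.
     */
    public static double maxThroughput(int windowSizeBytes, double ackDelayMs) {
        return windowSizeBytes / (ackDelayMs / 1000.0);
    }

    public static void main(String[] args) {
        // Hypothetical: a 128 KiB window acked only twice per second.
        double bps = maxThroughput(128 * 1024, 500.0);
        System.out.printf("ceiling: %.0f KB/s%n", bps / 1024);
        // prints "ceiling: 256 KB/s" -- in the "couple of hundred KB/s"
        // range observed, even on a gigabit or loopback interconnect.
    }
}
```

If numbers in this ballpark match the observed rate, the bottleneck is the ack cadence on the server side rather than the network, which is what the heap dump (window stuck at zero) already suggests.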
I will try the solution proposed here: https://issues.jenkins-ci.org/browse/JENKINS-3922
(In reply to comment #5)
> I will try the solution proposed here:
> https://issues.jenkins-ci.org/browse/JENKINS-3922

Did not help; more bursts, but still slow.
This bug was most likely fixed on Jenkins by: https://issues.jenkins-ci.org/browse/JENKINS-7813