Created attachment 228179: Screenshots of Dead Executors and Details

Hudson job configuration update via /hudson/job/<job>/config.xml is not atomic and can kill (all) Executors!

In our Hudson-based build infrastructure we currently have 4000 jobs, which are created and updated by a job generator. The generator sends POST requests to /hudson/job/<job>/config.xml to update the job configurations. During load tests, after some hours almost all executors (master and slaves) were dead because an IllegalArgumentException in BaseProjectProperty had killed them.

The reason for this exception is that the update procedure in AbstractItem.doConfigDotXml() is not atomic: between XmlFile.unmarshal (AbstractItem:488) and onLoad (AbstractItem:489), an executor can grab the job from the queue to run it. In that window the newly unmarshalled project properties do not yet have their keys set, since setKey is only called from onLoad. Although this is not very likely, it happened regularly in our scenario.

We were able to reproduce this issue (Git tag: hudson-parent-3.0.0):

* Set a breakpoint at hudson.model.Queue.schedule(Queue.java:426)
* Set a breakpoint at org.eclipse.hudson.model.project.property.BaseProjectProperty.setKey(BaseProjectProperty.java:52)
* Trigger the Test job in the Hudson UI
* Keep the Queue.schedule breakpoint suspended
* Send a POST request with config.xml content to http://localhost:8080/hudson/job/Test/config.xml to update the job config
* Keep the BaseProjectProperty.setKey breakpoint suspended
* Release the Queue.schedule breakpoint and wait some seconds
* End debug session...
You will notice that at least one executor died (see screenshot), and the details show this exception:

java.lang.IllegalArgumentException: Project property should have not null propertyKey
    at org.eclipse.hudson.model.project.property.BaseProjectProperty.getCascadingValue(BaseProjectProperty.java:93)
    at org.eclipse.hudson.model.project.property.BaseProjectProperty.getValue(BaseProjectProperty.java:120)
    at hudson.model.BaseBuildableProject.getBuildersList(BaseBuildableProject.java:153)
    at hudson.model.Project.getResourceActivities(Project.java:54)
    at hudson.model.AbstractProject.getResourceList(AbstractProject.java:1485)
    at hudson.model.Queue.isBuildBlocked(Queue.java:921)
    at hudson.model.Queue.maintain(Queue.java:969)
    at hudson.model.Queue.pop(Queue.java:806)
    at hudson.model.Executor.grabJob(Executor.java:183)
    at hudson.model.Executor.run(Executor.java:113)

Here you can see the stack from AbstractItem.doConfigDotXml to BaseProjectProperty.setKey:

Daemon Thread [Handling POST /hudson/job/Test/config.xml : http-8080-7] (Suspended (entry into method setKey in BaseProjectProperty))
    BooleanProjectProperty(BaseProjectProperty<T>).setKey(String) line: 53
    FreeStyleProject(Job<JobT,RunT>).buildProjectProperties() line: 400
    FreeStyleProject(AbstractProject<P,R>).buildProjectProperties() line: 351
    FreeStyleProject(BaseBuildableProject<P,B>).buildProjectProperties() line: 100
    FreeStyleProject.buildProjectProperties() line: 87
    FreeStyleProject(Job<JobT,RunT>).onLoad(ItemGroup<Item>, String) line: 356
    FreeStyleProject(AbstractProject<P,R>).onLoad(ItemGroup<Item>, String) line: 323
    FreeStyleProject(BaseBuildableProject<P,B>).onLoad(ItemGroup<Item>, String) line: 91
    FreeStyleProject(AbstractItem).doConfigDotXml(StaplerRequest, StaplerResponse) line: 489
    [...]
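
To illustrate the window between unmarshal and onLoad described above, here is a minimal, self-contained sketch. It is not Hudson code: the classes Property and Job, the key name "builders", and all method names are hypothetical stand-ins chosen for illustration. The interleaving is simulated deterministically in a single thread by reading the job between the two update steps. The updateAtomically method shows one possible way to avoid the window (fully initialize the new state, then publish it in one step); whether that maps cleanly onto AbstractItem.doConfigDotXml() is an assumption.

```java
import java.util.concurrent.atomic.AtomicReference;

public class ConfigUpdateRaceSketch {

    /** Hypothetical stand-in for a project property; the key is set in a second step. */
    static final class Property {
        private String key;          // null right after deserialization
        private final String value;

        Property(String value) { this.value = value; }

        void setKey(String key) { this.key = key; }

        String getValue() {
            if (key == null) {
                // Same message as in the reported stack trace.
                throw new IllegalArgumentException(
                        "Project property should have not null propertyKey");
            }
            return value;
        }
    }

    /** Hypothetical stand-in for a job whose live state is a single property. */
    static final class Job {
        private final AtomicReference<Property> live = new AtomicReference<>();

        Job(String value) {
            Property p = new Property(value);
            p.setKey("builders");
            live.set(p);
        }

        /** What an executor would do while checking whether the build is blocked. */
        String readAsExecutor() { return live.get().getValue(); }

        /** Non-atomic update, step 1: live state becomes a half-initialized object. */
        void unmarshalInPlace(String newValue) { live.set(new Property(newValue)); }

        /** Non-atomic update, step 2: finish initialization by setting the key. */
        void onLoad() { live.get().setKey("builders"); }

        /** Alternative: initialize completely first, then publish in one step. */
        void updateAtomically(String newValue) {
            Property p = new Property(newValue);
            p.setKey("builders");
            live.set(p);
        }
    }

    /** Returns true if a read between the two update steps throws. */
    static boolean readFailsMidUpdate() {
        Job job = new Job("old");
        job.unmarshalInPlace("new");
        try {
            job.readAsExecutor();   // the "executor" reads inside the window
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        } finally {
            job.onLoad();           // the update completes afterwards
        }
    }

    public static void main(String[] args) {
        System.out.println("mid-update read fails: " + readFailsMidUpdate());

        Job job = new Job("old");
        job.updateAtomically("new");
        System.out.println("after atomic update: " + job.readAsExecutor());
    }
}
```

In this model, readers only ever see a property whose key has already been set when updateAtomically is used, because the half-initialized object is never reachable through the live reference.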
PS: Do you have any recommendations on how to work around this issue?