Bug 535090 - CDT JIPP - All available display numbers are allocated or blacklisted
Status: RESOLVED FIXED
Alias: None
Product: Community
Classification: Eclipse Foundation
Component: CI-Jenkins
Version: unspecified
Hardware: PC Linux
Importance: P3 normal
Target Milestone: ---
Assignee: CI Admin Inbox CLA
Reported: 2018-05-24 15:33 EDT by Jonah Graham CLA
Modified: 2018-05-28 11:27 EDT (History)

Description Jonah Graham CLA 2018-05-24 15:33:29 EDT
(This looks similar to Bug 499487, and has happened just after HIPP5 has had some attention in Bug 535057)

We have a build that failed https://ci.eclipse.org/cdt/job/cdt-verify-test-cdt-other/1295/console with:

18:29:04 FATAL: All available display numbers are allocated or blacklisted.
18:29:04 allocated: [180, 181, 182, 183, 184, 185, 186, 187, 188, 189]
18:29:04 blacklisted: []
18:29:04 java.lang.RuntimeException: All available display numbers are allocated or blacklisted.
18:29:04 allocated: [180, 181, 182, 183, 184, 185, 186, 187, 188, 189]
18:29:04 blacklisted: []
18:29:04 	at hudson.plugins.xvnc.DisplayAllocator.doAllocate(DisplayAllocator.java:59)
18:29:04 	at hudson.plugins.xvnc.DisplayAllocator.allocate(DisplayAllocator.java:49)
18:29:04 	at hudson.plugins.xvnc.Xvnc.doSetUp(Xvnc.java:106)
18:29:04 	at hudson.plugins.xvnc.Xvnc.setUp(Xvnc.java:96)
18:29:04 	at jenkins.tasks.SimpleBuildWrapper.setUp(SimpleBuildWrapper.java:146)
18:29:04 	at hudson.model.Build$BuildExecution.doRun(Build.java:157)
18:29:04 	at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:504)
18:29:04 	at hudson.model.Run.execute(Run.java:1724)
18:29:04 	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
18:29:04 	at hudson.model.ResourceController.execute(ResourceController.java:97)
18:29:04 	at hudson.model.Executor.run(Executor.java:429)
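
The failure mode in the log above can be illustrated with a small model. This is only a sketch under stated assumptions, not the Xvnc plugin's actual code: the class `DisplayPool`, its methods, and the pool range 180 to 189 (inferred from the `allocated` list in the log) are hypothetical. It shows how allocations that are never released exhaust a fixed pool, and how clearing the stale bookkeeping makes numbers available again.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of a fixed-size display-number pool (not the plugin's code).
class DisplayPool {
    private final int base;
    private final int size;
    private final Set<Integer> allocated = new TreeSet<>();

    DisplayPool(int base, int size) {
        this.base = base;
        this.size = size;
    }

    // Returns the lowest free display number, or throws when none is left.
    synchronized int allocate() {
        for (int n = base; n < base + size; n++) {
            if (allocated.add(n)) {
                return n;
            }
        }
        throw new IllegalStateException(
                "All available display numbers are allocated: " + allocated);
    }

    // Releases a number; a build that dies without calling this leaks the number.
    synchronized void free(int n) {
        allocated.remove(n);
    }

    // Drops all bookkeeping; only safe when no build is actually using a display.
    synchronized void clear() {
        allocated.clear();
    }

    // Returns a copy of the currently allocated numbers.
    synchronized List<Integer> snapshot() {
        return new ArrayList<>(allocated);
    }

    public static void main(String[] args) {
        DisplayPool pool = new DisplayPool(180, 10);
        for (int i = 0; i < 10; i++) {
            pool.allocate(); // simulate builds that never release their display
        }
        try {
            pool.allocate();
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
        pool.clear();
        System.out.println("After clear, allocated display: " + pool.allocate());
    }
}
```

In this model the pool stays exhausted until something removes the stale entries, regardless of whether any process is still using the displays.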
Comment 1 Jonah Graham CLA 2018-05-24 16:43:53 EDT
A rebuild of the project was successful, so we are not actually blocked on this.
Comment 2 Jonah Graham CLA 2018-05-25 02:16:20 EDT
(In reply to Jonah Graham from comment #1)
> A rebuild of the project was successful, so we are not actually blocked on
> this.

I take that back, we have more builds failing than passing :-(
Comment 3 Mikaël Barbero CLA 2018-05-25 07:01:07 EDT
I've checked whether another JIPP was using these display numbers, but none is. I've also checked whether any residual processes were running, but all is clean.

I'll update the JIPP to latest LTS along with all plugins just in case...
Comment 4 Mikaël Barbero CLA 2018-05-25 07:16:25 EDT
I can't find a way to retrigger the cdt-verify-test-cdt-other job with the proper parameters. Could you please give it a try? Thanks.
Comment 5 Jonah Graham CLA 2018-05-25 07:41:53 EDT
That is a Gerrit job, but I have limited access right now as I am out of the office and on my phone. I don't see a retrigger option, and in Jenkins I can't see any of the normal Gerrit UI. Is it possible that the version upgrade lost the Gerrit config?
Comment 6 Jonah Graham CLA 2018-05-25 07:43:01 EDT
There is a message under Manage Jenkins that didn't use to be there:

You have data stored in an older format and/or unreadable data.

I don't want to try examining it on my phone though!
Comment 7 Mikaël Barbero CLA 2018-05-25 08:07:00 EDT
I'm having a look
Comment 8 Mikaël Barbero CLA 2018-05-25 08:13:58 EDT
For obscure reasons, the update uninstalled the Gerrit Trigger plugin. I've re-installed it and the configuration has been restored.

I've re-triggered one of the failing builds. VNC seems to be OK.
Comment 9 Jonah Graham CLA 2018-05-26 11:33:01 EDT
I had another handful of builds fail (e.g. https://ci.eclipse.org/cdt/view/Gerrit/job/cdt-verify-test-cdt-other/1311/console) but other builds are running OK.

I am going to try a few things to see if I can figure out what is going on. 

1- https://stackoverflow.com/questions/31481107/jenkins-xvnc-plugin-some-display-numbers-stay-allocated-when-a-build-is-stopped

2- Wipe out workspaces (including duplicate workspaces due to Gerrit jobs running in parallel?)
Comment 10 Jonah Graham CLA 2018-05-27 06:01:58 EDT
(In reply to Jonah Graham from comment #9)
> I am going to try a few things to see if I can figure out what is going on. 

I didn't get a chance to try anything before HIPP5 went down again.
Comment 11 Jonah Graham CLA 2018-05-28 11:04:01 EDT
This has now become a blocker. No builds are getting through since the last restart.

I am about to try what I put in Comment 9.
Comment 12 Jonah Graham CLA 2018-05-28 11:13:16 EDT
I have run the recommended groovy script to clear the used display numbers. As there are no builds running, there should not have been any in use. I suspect that the HIPP crashing has left the config file indicating display numbers in use that were not in fact in use.

The script I used came from https://github.com/sdiepend/jenkins-scripts/blob/master/cleanXvncDisplayNumbers.groovy and is:

import jenkins.*
import jenkins.model.Jenkins

Jenkins jenkins = Jenkins.getActiveInstance();
xvncDescriptor = jenkins.getDescriptorByType(hudson.plugins.xvnc.Xvnc.DescriptorImpl.class)

xvncDescriptor.allocators.each {
  allocator = it.value
  // collect() makes numAlloc a new list rather than a reference to the same
  // list object; otherwise you'll get a ConcurrentModificationException
  numAlloc = allocator.allocatedNumbers.collect()

  numAlloc.each {
    allocator.allocatedNumbers.remove(it)
  }
}
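
The comment in the script about copying the list before removing can be shown in isolation. The following is a minimal Java sketch, assuming the allocator keeps its numbers in a standard `java.util` collection (a `HashSet` is used here as an assumption); the class `SnapshotRemoval` and its methods are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical demo of removing elements from a collection while iterating it.
class SnapshotRemoval {

    // Removes from the live set during iteration; reports whether the
    // iterator detected the modification and threw.
    static boolean removeWhileIterating(Set<Integer> live) {
        try {
            for (Integer n : live) {
                live.remove(n);
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Iterates over a copy, so removals from the live set do not disturb
    // the iterator. This mirrors what collect() does in the Groovy script.
    static void removeViaSnapshot(Set<Integer> live) {
        List<Integer> snapshot = new ArrayList<>(live);
        for (Integer n : snapshot) {
            live.remove(n);
        }
    }

    // Builds a set holding the display numbers 180 to 189 seen in the log.
    static Set<Integer> sampleNumbers() {
        Set<Integer> numbers = new HashSet<>();
        for (int n = 180; n <= 189; n++) {
            numbers.add(n);
        }
        return numbers;
    }

    public static void main(String[] args) {
        System.out.println("Direct removal threw: "
                + removeWhileIterating(sampleNumbers()));

        Set<Integer> live = sampleNumbers();
        removeViaSnapshot(live);
        System.out.println("Remaining after snapshot removal: " + live.size());
    }
}
```

If the goal is only to empty the collection, calling `clear()` on it would also avoid the iteration problem; whether that is appropriate for the plugin's internal state is not something the thread confirms.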



As the builds are now running again, I think this problem is resolved now. I will reopen if I have further issues.
Comment 13 Doug Schaefer CLA 2018-05-28 11:26:27 EDT
Is this really resolved? The machine was down all weekend. I haven't heard what brought it back. Are we sure it won't go down again?
Comment 14 Doug Schaefer CLA 2018-05-28 11:27:21 EDT
(In reply to Doug Schaefer from comment #13)
> Is this really resolved? The machine was down all weekend. I haven't heard
> what brought it back. Are we sure it won't go down again?

Never mind. I see now there are two bugs.