Bug 578386 - Deadlock between ApiBaseline.doDispose() and ApiBaselineManager.initializeStateCache()
Status: RESOLVED FIXED
Alias: None
Product: PDE
Classification: Eclipse Project
Component: API Tools
Version: 4.23
Hardware: PC Windows 10
Importance: P3 normal
Target Milestone: 4.23 M2
Assignee: Andrey Loskutov CLA
Keywords: regression
Blocks: 576736
Reported: 2022-01-25 16:45 EST by Andrey Loskutov CLA
Modified: 2022-01-27 07:04 EST

Description Andrey Loskutov CLA 2022-01-25 16:45:31 EST
Just got this deadlock on switching to another branch.
What is weird about this one is that, in order to dispose a baseline, we load the baseline...

Found one Java-level deadlock:
=============================
"Worker-38: Performing API Analysis":
  waiting to lock monitor 0x00000254e5e14180 (object 0x0000000631c46860, a org.eclipse.pde.api.tools.internal.ApiBaselineManager),
  which is held by "Worker-41: Building"
"Worker-41: Building":
  waiting to lock monitor 0x0000025514709000 (object 0x0000000631c46900, a org.eclipse.pde.api.tools.internal.model.WorkspaceBaseline),
  which is held by "Worker-38: Performing API Analysis"

Java stack information for the threads listed above:
===================================================
"Worker-38: Performing API Analysis":
        at org.eclipse.pde.api.tools.internal.ApiBaselineManager.initializeStateCache(ApiBaselineManager.java:262)
        - waiting to lock <0x0000000631c46860> (a org.eclipse.pde.api.tools.internal.ApiBaselineManager)
        at org.eclipse.pde.api.tools.internal.ApiBaselineManager.loadBaselineInfos(ApiBaselineManager.java:232)
        at org.eclipse.pde.api.tools.internal.model.ApiBaseline.loadBaselineInfos(ApiBaseline.java:761)
        - locked <0x0000000631c46900> (a org.eclipse.pde.api.tools.internal.model.WorkspaceBaseline)
        at org.eclipse.pde.api.tools.internal.model.ApiBaseline.getAllApiComponents(ApiBaseline.java:730)
        at org.eclipse.pde.api.tools.internal.model.AbstractApiTypeRoot.getStructure(AbstractApiTypeRoot.java:69)
        at org.eclipse.pde.api.tools.internal.builder.ReferenceAnalyzer$Visitor.visit(ReferenceAnalyzer.java:87)
        at org.eclipse.pde.api.tools.internal.builder.TypeScope.accept(TypeScope.java:90)
        at org.eclipse.pde.api.tools.internal.builder.ReferenceAnalyzer.extractReferences(ReferenceAnalyzer.java:210)
        at org.eclipse.pde.api.tools.internal.builder.ReferenceAnalyzer.analyze(ReferenceAnalyzer.java:244)
        at org.eclipse.pde.api.tools.internal.builder.BaseApiAnalyzer.checkApiUsage(BaseApiAnalyzer.java:1286)
        at org.eclipse.pde.api.tools.internal.builder.BaseApiAnalyzer.analyzeComponent(BaseApiAnalyzer.java:289)
        at org.eclipse.pde.api.tools.internal.builder.IncrementalApiBuilder.build(IncrementalApiBuilder.java:305)
        at org.eclipse.pde.api.tools.internal.builder.IncrementalApiBuilder.build(IncrementalApiBuilder.java:256)
        at org.eclipse.pde.api.tools.internal.builder.ApiAnalysisBuilder.work(ApiAnalysisBuilder.java:475)
        at org.eclipse.pde.api.tools.internal.builder.ApiAnalysisBuilder$ApiAnalysisJob.run(ApiAnalysisBuilder.java:578)
        at org.eclipse.core.internal.jobs.Worker.run(Worker.java:63)
"Worker-41: Building":
        at org.eclipse.pde.api.tools.internal.model.ApiBaseline.loadBaselineInfos(ApiBaseline.java:759)
        - waiting to lock <0x0000000631c46900> (a org.eclipse.pde.api.tools.internal.model.WorkspaceBaseline)
        at org.eclipse.pde.api.tools.internal.model.ApiBaseline.getApiComponents(ApiBaseline.java:550)
        at org.eclipse.pde.api.tools.internal.model.ApiBaseline.doDispose(ApiBaseline.java:839)
        at org.eclipse.pde.api.tools.internal.model.WorkspaceBaseline.dispose(WorkspaceBaseline.java:56)
        at org.eclipse.pde.api.tools.internal.ApiBaselineManager.disposeWorkspaceBaseline(ApiBaselineManager.java:634)
        - locked <0x0000000631c46860> (a org.eclipse.pde.api.tools.internal.ApiBaselineManager)
        at org.eclipse.pde.api.tools.internal.WorkspaceDeltaProcessor.cleanAndDisposeWorkspaceBaseline(WorkspaceDeltaProcessor.java:288)
        at org.eclipse.pde.api.tools.internal.WorkspaceDeltaProcessor.resourceChanged(WorkspaceDeltaProcessor.java:238)
        at org.eclipse.core.internal.events.NotificationManager$1.run(NotificationManager.java:305)
        at org.eclipse.core.runtime.SafeRunner.run(SafeRunner.java:45)
        at org.eclipse.core.internal.events.NotificationManager.notify(NotificationManager.java:295)
        at org.eclipse.core.internal.events.NotificationManager.broadcastChanges(NotificationManager.java:158)
        at org.eclipse.core.internal.resources.Workspace.broadcastBuildEvent(Workspace.java:367)
        at org.eclipse.core.internal.events.AutoBuildJob.doBuild(AutoBuildJob.java:156)
        at org.eclipse.core.internal.events.AutoBuildJob.run(AutoBuildJob.java:251)
        at org.eclipse.core.internal.jobs.Worker.run(Worker.java:63)

Found 1 deadlock.
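
For readers unfamiliar with this kind of dump: the two stacks above acquire the same two monitors in opposite order (baseline then manager in the analysis job, manager then baseline in the build job). The following is only a minimal sketch of that lock-order inversion, assuming two plain Object monitors as hypothetical stand-ins for ApiBaselineManager and WorkspaceBaseline. The class and method names are invented for illustration; this is neither the PDE code nor the fix from the Gerrit change.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

public class LockInversionSketch {

    /** Locks 'first', waits until both workers hold their first lock, then blocks on 'second'. */
    private static Thread worker(String name, Object first, Object second, CountDownLatch bothHoldFirst) {
        Thread t = new Thread(() -> {
            synchronized (first) {
                bothHoldFirst.countDown();
                try {
                    bothHoldFirst.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                synchronized (second) {
                    // Never reached: the other worker owns 'second' and waits for 'first'.
                }
            }
        }, name);
        t.setDaemon(true); // the workers stay blocked forever, so do not keep the JVM alive
        t.start();
        return t;
    }

    /** Returns true if the JVM reports both workers as monitor-deadlocked within the timeout. */
    public static boolean reproduce(long timeoutMillis) {
        Object manager = new Object();  // hypothetical stand-in for the ApiBaselineManager monitor
        Object baseline = new Object(); // hypothetical stand-in for the WorkspaceBaseline monitor
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        Thread analysis = worker("analysis", baseline, manager, bothHoldFirst);
        Thread build = worker("build", manager, baseline, bothHoldFirst);

        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            long[] ids = threads.findMonitorDeadlockedThreads(); // null while no deadlock exists
            if (ids != null) {
                boolean sawAnalysis = false;
                boolean sawBuild = false;
                for (long id : ids) {
                    sawAnalysis |= id == analysis.getId();
                    sawBuild |= id == build.getId();
                }
                if (sawAnalysis && sawBuild) {
                    return true;
                }
            }
            try {
                Thread.sleep(20);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("deadlock reproduced: " + reproduce(5000));
    }
}
```

The latch only makes the interleaving deterministic. In general, such a cycle cannot form if every thread takes the two monitors in the same order, or if no thread calls into the other object while still holding its own monitor.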
Comment 1 Eclipse Genie CLA 2022-01-25 17:39:33 EST
New Gerrit change created: https://git.eclipse.org/r/c/pde/eclipse.pde.ui/+/190028
Comment 2 Hannes Wellmann CLA 2022-01-26 03:01:23 EST
Could this also cause deadlocks during test execution in the pde.ui build?

I have twice observed the build timing out at the following location:
"""
Running org.eclipse.pde.api.tools.tests.ApiToolsPluginTestSuite
reflectNestedClassUseDollar=true due to isJRE9Plus
Build timed out (after 39 minutes). Marking the build as aborted.
Terminating xvnc.
"""

A retrigger let the build pass.
Comment 3 Andrey Loskutov CLA 2022-01-26 03:07:27 EST
(In reply to Hannes Wellmann from comment #2)
> Could this also cause deadlocks during test execution in the pde.ui build?
> 
> I observed it twice that the build timed-out at the following location:
> """
> Running org.eclipse.pde.api.tools.tests.ApiToolsPluginTestSuite
> reflectNestedClassUseDollar=true due to isJRE9Plus
> Build timed out (after 39 minutes). Marking the build as aborted.
> Terminating xvnc.
> """
> 
> A retrigger let the build pass.

It shouldn't, because during tests we don't run API analysis as a job in parallel to the build, but without a stack trace it is hard to say.

If there *are* parallel PDE tasks, then for sure: the code is a minefield and isn't seriously meant to be used in a multi-threaded (MT) environment (despite a lot of synchronized methods, which actually cause most of the issues in an MT environment).
Comment 4 Hannes Wellmann CLA 2022-01-26 03:55:59 EST
(In reply to Andrey Loskutov from comment #3)
> 
> Shouldn't, because during tests we don't run API analysis as job parallel to
> the build, but without a stack hard to say. 
> 
> If there *are* parallel PDE tasks, for sure, the code is a mine field and
> isn't seriously meant to be used in MT environment (despite a lot of
> synchronized methods that actually cause most of the issues in MT
> environment).

Understood. Unfortunately I cannot contribute much more information. I have just observed the deadlock in the tests twice recently. Do you know if this happened before?

Just in case you can get more information from the console, this was one build:
https://ci.eclipse.org/pde/job/eclipse.pde.ui-Gerrit/4078/
Comment 5 Andrey Loskutov CLA 2022-01-26 05:59:27 EST
(In reply to Hannes Wellmann from comment #4)
> Understand. Unfortunately I cannot contribute much more information. I just
> observed the deadlock in the tests recently two times. Do you know if this
> happened before?

No, not really.

> Just in case you can get more information from the console, this was one
> build:
> https://ci.eclipse.org/pde/job/eclipse.pde.ui-Gerrit/4078/

Unfortunately I don't see any thread dumps there (like in JDT tests), so I've added something similar via bug 578391. Feel free to add a freeze report to more tests, especially if you know which ones were affected.
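
As an illustration of what such a dump could look like in code, here is a minimal sketch that collects the state, contended lock and full stack of every live thread through the standard ThreadMXBean API. The class and method names are hypothetical, and this is not necessarily how bug 578391 or the JDT tests implement their freeze report.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class ThreadDumpSketch {

    /** Returns a textual dump of all live threads, including the lock each one is waiting for. */
    public static String dumpAllThreads() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        // Ask for lock details only if this JVM supports reporting them.
        ThreadInfo[] infos = bean.dumpAllThreads(
                bean.isObjectMonitorUsageSupported(),
                bean.isSynchronizerUsageSupported());
        StringBuilder out = new StringBuilder();
        for (ThreadInfo info : infos) {
            out.append('"').append(info.getThreadName()).append("\" ")
               .append(info.getThreadState());
            if (info.getLockName() != null) {
                out.append(", waiting on ").append(info.getLockName());
            }
            if (info.getLockOwnerName() != null) {
                out.append(", held by \"").append(info.getLockOwnerName()).append('"');
            }
            out.append('\n');
            // ThreadInfo.toString() truncates the stack, so print every frame explicitly.
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("        at ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(dumpAllThreads());
    }
}
```

A test harness could call dumpAllThreads() from a watchdog shortly before the build timeout expires and write the result to the console log.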