Bug 327126 - Deadlock attempting to cancel PDOMIndexerJob during a resource delta notification
Summary: Deadlock attempting to cancel PDOMIndexerJob during a resource delta notifica...
Status: NEW
Alias: None
Product: CDT
Classification: Tools
Component: cdt-core (show other bugs)
Version: 7.0   Edit
Hardware: PC All
: P3 major (vote)
Target Milestone: ---   Edit
Assignee: Project Inbox CLA
QA Contact: Jonah Graham CLA
URL:
Whiteboard:
Keywords:
Depends on: 249951
Blocks:
  Show dependency tree
 
Reported: 2010-10-06 11:40 EDT by James Blackburn CLA
Modified: 2020-09-04 15:24 EDT (History)
1 user (show)

See Also:


Attachments
backtrace (5.93 KB, text/plain)
2010-10-06 11:40 EDT, James Blackburn CLA
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description James Blackburn CLA 2010-10-06 11:40:21 EDT
Created attachment 180343 [details]
backtrace

I'm tentatively filing this against the indexer - I think this is the easiest place to fix this...

I'm writing some ManagedBuild tests, and, as part of test tear down I'm deleting CDT projects. 
  - The PDOMManager, on post change notification is trying to cancel the active indexer job and waiting on the taskMutex.  
  - The PDOMIndexerJob is getProjectDescription which is reconciling the project description requiring a resource scheduling rule.  

The result is that the IndexerJob is waiting for resource locks, while the manager is waiting on the indexer job => deadlock.

Backtrace attached.
Comment 1 Markus Schorn CLA 2010-10-07 07:15:51 EDT
(In reply to comment #0)
> Created an attachment (id=180343) [details]
> backtrace
> I'm tentatively filing this against the indexer - I think this is the easiest
> place to fix this...
> I'm writing some ManagedBuild tests, and, as part of test tear down I'm
> deleting CDT projects. 
>   - The PDOMManager, on post change notification is trying to cancel the active
> indexer job and waiting on the taskMutex.  
>   - The PDOMIndexerJob is getProjectDescription which is reconciling the
> project description requiring a resource scheduling rule.  
> The result is that the IndexerJob is waiting for resource locks, while the
> manager is waiting on the indexer job => deadlock.
> Backtrace attached.

I don't see an easy fix. The indexer needs to be cancelled during the pre-delete notification. Otherwise, the indexer runs into exceptions working on resources that no longer exist. 
I find it a bit strange, that obtaining a read-only copy of the project 
description requires a workspace lock.

Is it possible to notify the listeners of CProjectDescriptionManager in a job?
Alternatively can CExternalSettingsManager do the work that requires the 
workspace lock in a job?
Comment 2 James Blackburn CLA 2010-10-07 07:38:40 EDT
Ok, thanks Markus.

(In reply to comment #1)
> Is it possible to notify the listeners of CProjectDescriptionManager in a job?

That might work in this case. The loaded event fires with a read-only description, so technically listeners could be notified asynchronously -- see below for why this will break tests.

> Alternatively can CExternalSettingsManager do the work that requires the 
> workspace lock in a job?

This is actually quite horrible.  The External Settings manager, old CDescriptor model, scanner discovery, and other cdt.core listeners are called-back by the project description event.
They update bits of the model during the call-back. 
In the case of the external settings provider: paths & symbols exported by other projects are reconciled in during project description load.  Effectively this manager is stateless, and pushes the exported paths & symbols language settings entries directly into the loaded project description.

From my POV this is really bad.  It doesn't scale; we end up with circular get/set call paths; .cprojects change when exported settings change in other projects; etc.  
However Making this particular call-back asynchronous will break the CConfigurationDescriptionExportSettings tests. 

I think the ExternalSettingsManager needs to be re-written so that this state is picked up dynamically and not stored as .cproject language settings entries.  All the data provided by this manager comes from an external source (and is recomputable).
Comment 3 Doug Schaefer CLA 2011-09-12 16:24:18 EDT
I have to agree with Markus. Why is a read causing a write? We seem to be firing a notifySettingsChange event on a load. The CExternalSettingsManager blindly saves the settings (from what I can see), which it probably doesn't need to do if the settings haven't actually changed. Need to dig more. This is causing the deadlock I see occasionally in the core.tests suite.
Comment 4 Doug Schaefer CLA 2011-09-12 16:28:46 EDT
Interesting enough, the fix to 312575 seems to have caused it.
Comment 5 James Blackburn CLA 2011-09-12 16:30:27 EDT
(In reply to comment #3)

I agree it's bad (see comment 2).  It's incredibly hard to fix without breaking other bits though.  Due to the way that the external settings provider is hooked in, you can end up with incorrect in memory project descriptions if you don't let it do its reconcile.
Comment 6 Doug Schaefer CLA 2011-09-12 16:37:36 EDT
(In reply to comment #2)
I think I get what James is saying here. This is really CProjectDescription rearing it's ugly head again. It does need a redesign.
Comment 7 James Blackburn CLA 2011-09-12 16:48:32 EDT
So the lock in question are:
  preDelete (WS lock held) -> PDOMIndexerJob
  PDOMIndexerJob ->  getProjectDescription (WS Lock)

In some sense this is another instance of Bug 249951.  It's unsafe to acquire any lock during a resource delta. (The lock in question being in the pdom indexer job.)

Of course this issue only arises because getProjectDescription calls into the WS.  Note the ext settings manager isn't the only place we do this -- it also happens if we convert the .cproject from an older version, and if we discover the .cproject out-of-sync.  There may be other places too.

This is likely hard-ish to fix properly, perhaps a workaround for now is to cancel the indexer before calling into project.delete() in the tests?
Comment 8 Doug Schaefer CLA 2011-09-12 18:10:05 EDT
I've added a 2 second delay before the delete to let the indexer get started properly before trying to cancel it. It seems to be a pretty small window while the indexer is starting up.

At any rate, I'd definitely like to open up the descriptor storage design again. I have some features from the Wind River build system I'd like to bring down and that would open up even more places to fetch and store build settings. We also need to simplify how the scanner discovery settings get in.