Bug 413768 - PDOMWriter creates too many IncludeInformations
Summary: PDOMWriter creates too many IncludeInformations
Status: NEW
Alias: None
Product: CDT
Classification: Tools
Component: cdt-indexer (show other bugs)
Version: Next   Edit
Hardware: PC Linux
: P3 normal with 1 vote (vote)
Target Milestone: ---   Edit
Assignee: Project Inbox CLA
QA Contact: Jonah Graham CLA
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-07-25 16:18 EDT by Andrew Eidsness CLA
Modified: 2020-09-04 15:21 EDT (History)
4 users (show)

See Also:


Attachments
test case to reproduce the problem (2.67 KB, application/octet-stream)
2013-07-25 18:07 EDT, Andrew Eidsness CLA
no flags Details
project as described in comment 2 (2.78 KB, application/octet-stream)
2013-07-26 06:58 EDT, Andrew Eidsness CLA
no flags Details
updated sample project (2.60 KB, application/bzip2)
2013-07-26 07:48 EDT, Andrew Eidsness CLA
no flags Details
test code (467 bytes, application/bzip2)
2013-07-29 07:17 EDT, Andrew Eidsness CLA
no flags Details
Project that produces a problem due to #pragma once sematics (5.46 KB, application/octet-stream)
2013-09-19 10:04 EDT, Joseph Henry CLA
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Andrew Eidsness CLA 2013-07-25 16:18:01 EDT
I'll attach a junit to reproduce this once I get the bug number.

With these four files:

src.c:
#include "h1.h"
#include "h2.h"
int m1 = MACRO; // ProblemBinding for MACRO

-----

h1.h:
#ifndef H1
#define H1

#include "h1a.h" // line 5 (see below)
#undef D1
#include "h1a.h" // line 7 (see below)

#endif

-----

h1a.h:
extern int i;
#ifndef H1a
#define H1a
#define D1
#endif

-----

h2.h:
#define MACRO 1
#ifdef D1
#endif

-----

I get this trace while putting h1.h into the index (instrumentation was added to the PDOMFile and PDOMInclude constructors:

PDOMFile(5498) /JI449520-C2/src/h1.h {}
PDOMInclude(5802) stmt-line 5 ‘#include “h1a.h”’ -> PDOMFile(4506)
PDOMInclude(5834) stmt-line 7 ‘#include “h1a.h”’ -> PDOMFile(4506)
PDOMInclude(5866) stmt-line 7 ‘#include “h1a.h”’ -> PDOMFile(5330)

The file has two #include directives, but we are putting three elements into the PDOMFile's list of includes.

The second instance is generated from the statement on line 7, but points to the same PDOMFile as the instance generated from the statement on line 5.

I think this is related to the creation of IncludeInformations in PDOMWriter:

PDOMWriter.java:
600: for (ISignificantMacros sig : stmt.getLoadedVersions()) {
601:     if (!sig.equals(mainSig)) {
602:         includeInfos.add(new IncludeInformation(stmt, targetLoc, sig, false));
603:     }
604: }
605: final boolean isContext = stmt.isActive() && stmt.isResolved() &&
606:         (data.fContextIncludes.contains(stmt) || isContextFor(oldFile, stmt));
607: includeInfos.add(new IncludeInformation(stmt, targetLoc, mainSig, isContext));

The condition on line 601 seems backward.  Also, I think there should be code to ensure only one IncludeInformation is created for each #include directive.  I.e., if 602 executes, then we should stop the loop on line 600 and we should not execute line 607.
Comment 1 Andrew Eidsness CLA 2013-07-25 18:07:29 EDT
Created attachment 233808 [details]
test case to reproduce the problem
Comment 2 Andrew Eidsness CLA 2013-07-26 06:01:18 EDT
Email from Markus Schorn:

The 3 includes are necessary to deal with pragma-once files that include non-pragma-once files.

I try to explain:

(1)    Consider a non-pragma once file like stddef.h that is parsed in 2 versions. Let’s assume that version S1 contains what we want and version S2 is used when stddef.h gets included for the second time in the same compilation unit.

(2)    Consider a pragma once header file header.h that needs some declarations out of stddef.h plus an arbitrary source file with the following include structure:

    a.c
     |_ stddef.h  (version S1)
     |_ header.h
         |_ stddef.h  (version S2)
         |_ use declaration found in S1

(3)    Consider a second source file that makes use of header.h and needs some declaration out of stddef.h. However rather than including stddef.h it relies on the fact that header.h already includes stddef.h:

    b.c
     |_ header.h 
     |    |_ stddef.h  (version S1)
     |_ use declaration found in S1

(4)    Now, when the indexer looks at b.c, it finds that header.h is pragma once and instead of parsing the file it uses whatever is stored in the index. Therefore, when the index would only store the inclusion header.h->stddef(S2), it would miss out on the declarations within stddef(S1).

To overcome this problem the index needs to store <all> versions of a non-pragma once include that have been loaded within the translation unit before.
Comment 3 Andrew Eidsness CLA 2013-07-26 06:04:06 EDT
Thanks, I better understand the motivation for this behaviour now.

Do you have ideas on how to fix this test case?  I don't mind doing the work, but I'm not sure which part to look at.
Comment 4 Andrew Eidsness CLA 2013-07-26 06:30:33 EDT
Now I understand why my h1a.h file needed some code outside the include guard.  When an include guard is detected in a file (CPreprocessor.detectIncludeGuard) then the file is marked with pragma once semantics, and the description that Markus provided kicks in.

With code outside the include guard, the file does not get marked as pragma once.
Comment 5 Andrew Eidsness CLA 2013-07-26 06:58:01 EDT
Created attachment 233821 [details]
project as described in comment 2

I've attached a sample project with code as described in comment 2.

I'm not following point (4).  I think that the index should store both versions of the file.  I just think that the second #include directive shouldn't point to the content of the first #include.  I think that would still cover the cases described in (2) and (3).

Maybe some other parts are needed in the sample project?  E.g., maybe the pragma once needs to be in a conditional part of header.h so that it is seen in S1 but not S2?  I haven't used pragma once, so maybe this doesn't make sense.

Secondly, is the identity test on line 601 valid?  I wonder if it should be checking something like the char[] returned by ISignificantMacros.encode?
Comment 6 Andrew Eidsness CLA 2013-07-26 07:48:49 EDT
Created attachment 233822 [details]
updated sample project

I've made some changes to the sample project.

1) Since pragma once only applies to the current compilation unit, I don't think a.c is relevant.  I've removed it.

2) I created two versions of the src, header, and stddef files.  In one version the header is pragma once, in the other it is not (and has no include guard).

I manipulate the src file so that header is included twice, the first time there will be no content from stddef, the second time there could be.  Then I reference things from stddef.

In the pragma once case we expect to see errors since the declarations that we need will never be seen.  In the always case we don't expect to see any errors.

With this file I get the same behaviour regardless of the condition on PDOMWriter.java:601.
Comment 7 Andrew Eidsness CLA 2013-07-26 07:54:35 EDT
Sorry to keep flooding this bug with separate comments.  I'm still having trouble figuring out the intended behaviour.

Going back to my initial sample (except I've changed some of the terminology to match your sample), the code looks like:

    #include "h1a.h" // S1
    #undef D1
    #include "h1a.h" // S2

Which is persisted as:

    #include S1
    #undef D1
    #include S1
    #include S2

If I understand things properly, then the intent is to be like:

    #include "h1a.h" // S1
    #undef D1
    #include ( S1 isPragmaOnce ? S1 : S2 )

I.e., we still only want to include one file for the second directive, but the file that is chosen depends on whether that target file is pragma once.
Comment 8 Markus Schorn CLA 2013-07-29 00:50:56 EDT
(In reply to comment #7)
The intent really is to do include both S1 and S2:

    #include "h1a.h" // S1
    #undef D1
    #include "h1a.h" // S1
    #include "h1a.h" // S2

This is done to deal with header files like "stddef.h", which can incrementally add new declarations (depending on #NEED_X , #NEED_Y, ....): In case a pragma once header H1 includes "stddef.h" with other inclusions of "stddef.h" in the compilation unit before, we don't know which version we are actually interested in.
Note, that this mechanism is important, when the other inclusions happen <outside> of H1 as in the example I gave. In the particular example above it would not be important, because version S1 is already known to be included when you get to line 3. So the mechanism could be refined not to repeat the includes of versions known to be included (directly or indirectly) by the current header.

Above that, the mechanism is helpful for headers like 'stddef.h' that can incrementally add new declarations. However, it is disturbing for headers like yours where one version undoes the effect of another. The trouble is that we cannot automatically distinguish betweeen these kinds of headers.
Comment 9 Andrew Eidsness CLA 2013-07-29 07:17:33 EDT
Created attachment 233883 [details]
test code

Disturbing, yes  :-)  My sample was reduced from our implementations of stdlib.h, sys/platform.h, __SIZE_T (the macro that is undef'ed) and EXIT_SUCCESS (the macro that is conditionally defined).

I still don't understand why the behaviour is valid.  I've attached a small test project that looks at what happens for pragma once headers.  I've used gcc 4.4.3.  The behaviour of this compiler doesn't seem to match what the CDT is doing.

There are two features that are tested in this code.  I've used both a pragma once header and a unconditional header for comparison.

The first feature that is tested is behaviour when a macro changes value between the duplicate include statements.  In the always case this means that the second part of the header is processed and a new macro is introduced.  In the pragma once case that does not happen.  The second include uses the content from the first time the header was used in the compilation unit.

This means that if the header is pragma once, then S2 should never be included.

The second feature that is tested is undef'ing macros between versions.  The header always defines the macro.  In the always case the macro is re-defined, in the pragma once case it is not.

This means that if the header is marked pragma once, then S1 should not be included the second time.

Putting those two points together, I think the persisted sequence (when the header is pragma once should be:

#include "h1a.h" // S1
#undef D1

We don't want to include anything after that, because the declarations from S1 should not be reintroduced.

----

Slightly related, I don't understand why it is valid to treat headers with include guards the same as headers that are pragma once.  The include guard can be undef'ed, but the pragma once cannot be undone.
Comment 10 Markus Schorn CLA 2013-07-30 02:30:46 EDT
(In reply to comment #9)
We have tried to implement an exact solution but failed due to way too many versions for each header file. Therefore we settled with an heuristic approach that assumes that a file with an include guard always provides the same content regardless with which macro-dictonary it gets included. Since the approach is an heuristic you can certainly find examples where it does not work.
Comment 11 Andrew Eidsness CLA 2013-08-13 08:44:53 EDT
I see the value of the "treat include guard as pragma once" optimization.  Turning it off makes the indexer take 2-7 times longer (depending on the target project) and the .pdom file is about 50% bigger.

Leaving that aside, I've narrowed in on the source of my problem -- there are differences between getting data from the AST and getting it from the PDOM.

When the project is initially indexed (with 'Index - Rebuild') all data is pulled from the AST as it is constructed.  No errors are found.  However, when I open the file, the CReconciler reads data from the PDOM and finds an error.

There are two versions of the target header file.  During parsing, the proper version is selected and all bindings are resolved.  When reading from the PDOM, both versions are selected -- the incorrect version #undef's a required macro.

I think that PDOMWriter needs to be changed to stop including all loadedVersions of the include statement.  At the very least, I think it should be using the macro dictionary at the point of inclusion in order to filter out invalid versions.
Comment 12 Joseph Henry CLA 2013-09-18 14:04:33 EDT
Is there a proposed solution to this problem?

I would not mind trying to implement a solution if somebody could point me in the right direction.

I really need this bug fixed.
Comment 13 Joseph Henry CLA 2013-09-19 10:04:00 EDT
Created attachment 235628 [details]
Project that produces a problem due to #pragma once sematics

I have added an attachment to this bug. 

This project show a problem that I found that seems to be related to what is going on here.

If you load the project, and then open file S2.c and re-index, you will see that there is an undefined symbol, but if you go to definition of the symbol you will get 2 options.