Bug 246129 - Indexer: includes of the form "path/../file.hpp" not found correctly
Summary: Indexer: includes of the form "path/../file.hpp" not found correctly
Status: RESOLVED FIXED
Alias: None
Product: CDT
Classification: Tools
Component: cdt-parser (show other bugs)
Version: 5.0   Edit
Hardware: All All
: P3 normal (vote)
Target Milestone: 6.0   Edit
Assignee: Markus Schorn CLA
QA Contact:
URL:
Whiteboard:
Keywords: contributed
Depends on:
Blocks:
 
Reported: 2008-09-03 14:32 EDT by Jens Elmenthaler CLA
Modified: 2008-11-27 04:26 EST (History)
0 users

See Also:


Attachments
CDT project reproducing the issue (90.00 KB, application/octet-stream)
2008-09-03 14:33 EDT, Jens Elmenthaler CLA
no flags Details
Patch against HEAD (2.17 KB, patch)
2008-10-27 04:01 EDT, Jens Elmenthaler CLA
mschorn.eclipse: iplog+
Details | Diff
Unit test checking the desired behavior (platform dependent) (10.29 KB, patch)
2008-11-11 09:30 EST, Jens Elmenthaler CLA
mschorn.eclipse: iplog+
Details | Diff
combined patch including the additional fix (29.38 KB, patch)
2008-11-12 08:38 EST, Markus Schorn CLA
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Jens Elmenthaler CLA 2008-09-03 14:32:10 EDT
Build ID: CDT 5.0.0

Steps To Reproduce:
1.In the attached project, open the file include.cpp
2.Position the cursor over the #include and press F3
3.Move the cursor again to #include an press F3

This opens the include1/type.hpp again, instead of include2/type.hpp.
Comment 1 Jens Elmenthaler CLA 2008-09-03 14:33:58 EDT
Created attachment 111609 [details]
CDT project reproducing the issue
Comment 2 Markus Schorn CLA 2008-09-04 09:09:47 EDT
Interesting example :-).
CDT normalizes the include (as in URI.normalize()) before searching it on the include path. Note that gcc behaves the same:

g++ -I"D:\prj\bug246129\include1" -I"D:\prj\bug246129\include2" -O0 -g3 -Wall -c -fmessage-length=0 -MMD -MP -MF"src/include.d" -MT"src/include.d" -o"src/include.o" "../src/include.cpp"
../src/include.cpp: In function `int main()':
../src/include.cpp:5: error: `Type' undeclared (first use this function)
../src/include.cpp:5: error: (Each undeclared identifier is reported only once for each function it appears in.)
../src/include.cpp:5: error: expected `;' before "t"
Comment 3 Jens Elmenthaler CLA 2008-09-04 15:22:50 EDT
Sometimes you just can't control what people do, and you can't change the compiler, either.

We use the g++ in the version version 4.1.2 20070626 (Red Hat 4.1.2-14). And the given example builds successfully.
Comment 4 Jens Elmenthaler CLA 2008-10-27 04:01:00 EDT
Created attachment 116164 [details]
Patch against HEAD
Comment 5 Jens Elmenthaler CLA 2008-10-27 04:07:07 EDT
Attached a fix, which seems almost too simple, I guess.

But I tested it on a our large and diverse large code, and it removes a couple of thousands unresolved names, which is because the STL vector class is found now (this bug should affect every C++ open office developer on a recent RedHat).

Another argument why collapsing any "../" before looking it up in the file system is not correct:

ls include2/ext/../type.hpp
include2/ext/../type.hpp

ls include1/ext/../type.hpp
ls: include1/ext/../type.hpp: No such file or directory
Comment 6 Markus Schorn CLA 2008-10-27 04:43:44 EDT
(In reply to comment #5)
> Attached a fix, which seems almost too simple, I guess.
The paths need to be normalized at a later point such that IASTFileLocation.getFileName() returns paths with '..' removed. 

> Another argument why collapsing any "../" before looking it up in the file
> system is not correct:
> ls include2/ext/../type.hpp
> include2/ext/../type.hpp
> ls include1/ext/../type.hpp
> ls: include1/ext/../type.hpp: No such file or directory

I cannot follow this argument, it seems natural that 'type.hpp' can exist in 
'include2' but does not exist in 'include1'.

However you made me think, and with symbolic links (include2/ext -> somewhere)
it would be possible to get:
> ls include2/ext/../type.hpp
> include2/ext/../type.hpp
> ls include2/type.hpp
> ls: include2/type.hpp: No such file or directory

I find it unlikely that there are symbolic links within the include directories of gcc, so I am uncertain whether I fully understand the issue, yet. Please try to nail down the circumstances that lead to the unresolved includes in your setup.

To resolve the issue with the symbolic links the normalization cannot be done purly on the syntax of the path, a file-system operation would be necessary.
Comment 7 Jens Elmenthaler CLA 2008-10-27 06:24:38 EDT
(In reply to comment #6)
> The paths need to be normalized at a later point such that
> IASTFileLocation.getFileName() returns paths with '..' removed. 
Yes, I was assuming this, too. Just didn't know were to look at. My reasoning was the following:
- in include.cpp, hit f3 on type.hpp. Hit f3 again on <ext/../type.hpp>. 
The editor tooltiop shows cdt_bugs/include2/type.hpp. Well, where ever it happens, somebody else took care.
- run the indexer over my larger code base and check the number of unresolved names. Follow the the vector members (this was the type in our code base that was affected), again everything very resonably. The nameresolution improved significantly.
- I checked the refences to the the patched methods, it only affected resolving the full path to include files. So the fix would be correct of all (well, current) usages. In usages like this, you shouldn't treat include1/ext/../type.hpp and include2/ext/../type.hpp as being interchangeable, period.

I can tune my fix of course (unit test?), some mentoring provided.

> > Another argument why collapsing any "../" before looking it up in the file
> > system is not correct:
> > ls include2/ext/../type.hpp
> > include2/ext/../type.hpp
> > ls include1/ext/../type.hpp
> > ls: include1/ext/../type.hpp: No such file or directory
> I cannot follow this argument, it seems natural that 'type.hpp' can exist in 
> 'include2' but does not exist in 'include1'.
My argument was something like this: dont' treat include1/ext/../type.hpp and include2/ext/../type.hpp, include2/type.hpp as being interchangeable. You can only collapse ext/.. if the "ext" exists.
Other tools had to learn this lesson, too.

> Please try
> to nail down the circumstances that lead to the unresolved includes in your
> setup.
The original setup is as straight is the example. The gcc provides the STL vector in include2/type.hpp, and this how everybode wants it to be.

The OpenOffice SDK comes with its vector in include1/type.hpp, and expect all the user code to have the OpenOffice STL in its include path, such that their vector is found first. In addition they rely on the native stl include tree to have the "ext" directory (well no comment about this, but see below).
Here the (relevant) content of the OpenOffice vector:

#include <ext/../vector>

namespace std
{
    typedef vector<bool, std::allocator<bool> > bit_vector;
}

I don't know why they did this, but I have an assmumption: Looking at their history, they startet with their own STL implementation, and this came with the bit_vector above. They later switched to useing the native STL version, and want to be compabible to existing user code. With your back to the wall like this, I can see some reason.

Nevertheless the fact that something like this works, will sooner or later make others do the same.

> To resolve the issue with the symbolic links the normalization cannot be done
> purly on the syntax of the path, a file-system operation would be necessary.
Yes, reworking the link resolution, make sure sure you don't run into the same trap.
Comment 8 Markus Schorn CLA 2008-10-27 09:32:46 EDT
(In reply to comment #7)
I guess I caused some confusion (I thought that your fix removed unresolved includes, while it removed unresolved names). So I do understand your usecase, now.

In the meantime I cannot revert my knowledge, that there is an additional issue, because even if 'include/bla/../test.h' exists you cannot conclude that the file 'include/test.h' is meant to be included. When 'bla' is a symbolic link it would be '<parent-of-link-target>/test.h' Although this is a different issue I am wondering whether it should also be fixed. A fix for this would require a file-system operation, to correctly get rid of the '..'.(In reply to comment #7)
> (In reply to comment #6)
> ...
> - in include.cpp, hit f3 on type.hpp. Hit f3 again on <ext/../type.hpp>. 
> The editor tooltiop shows cdt_bugs/include2/type.hpp. Well, where ever it
> happens, somebody else took care.
Whenever someone constructs a path (new Path(location)) the '..' will be removed in the constructor of Path. (You cannot represent locations using .. in terms of IPath). I guess that is why they don't show up.

> I can tune my fix of course (unit test?), some mentoring provided.
I have not looked into the implementation, yet. However, the location
needs to be normalized after the existance has been tested. It might happen
somewhere anyways because of Path's constructor. All methods, where the location with the potential '..' gets passed to have to be reviewed. Also if the additional issue described above shall be fixed, Path must not be used before the '..' are removed in a proper way.
Comment 9 Jens Elmenthaler CLA 2008-10-28 03:33:16 EDT
Here some learnings:
The ../ is removed by the constructor of java.io.File, which in turn calls java.io.FileSystem.normalize() (see FileExistsCache.isFile, which all methods I traced, ran into).
This seems like the perfect solution (not considering symbolic links yet) to me, because it applies all the file system specific rules for existence of a file (which presumably is identical to what the late gcc and ls, etc use for its job).

(In reply to comment #8)
> In the meantime I cannot revert my knowledge, that there is an additional
> issue, because even if 'include/bla/../test.h' exists you cannot conclude that
> the file 'include/test.h' is meant to be included. When 'bla' is a symbolic
> link it would be '<parent-of-link-target>/test.h' Although this is a different
> issue I am wondering whether it should also be fixed.
Well, in my opionen selecting the header file path and resolving symbolic links belong to 2 different stages.

The first stage is only needed for #include directives, and does not remove "../" from paths. Its task is to translate the concatenation of a path candidate and the file base name to an OS compliant string (primarily changing '/' to the OS specific separator). The changed methods in ScannerUtil where only used by those clients. My proposal would be to rename the ScannerUtil methods into something like toOSString, as it does something similar as the IPath.toOSString does.

The second stage is required in general for any resource that info depending on its content is cached for (e.g. the index), because it needs be used as some kind of key into a map/database.
Its task is to make sure, that regardless how its logical path is written, the correct and unique physical path is determined, before looking up any of its associated information.
It is not just resolving symbolic links, but also determining whether it's a resource belonging to a project (in which case it should be treated as relative to the project root, but still is a physical path).
Collapsing ./ and ../ becomes the easiest part of this task, and FileSystem.normalize could possibly do the job, as well as take over the check for existence of the logical path.
Writing these lines, the ResourceLookup.findFileForLocation seems like a good place to own this responsibility.

Could this work?
Comment 10 Markus Schorn CLA 2008-10-28 05:11:50 EDT
(In reply to comment #9)
> Here some learnings:
> The ../ is removed by the constructor of java.io.File, which in turn calls
> java.io.FileSystem.normalize() (see FileExistsCache.isFile, which all methods I
> traced, ran into)....
The FileExistsCache behaves correctly, however it is not responsible for creating the IndexFileLocations where a normalized path should be used.

> This seems like the perfect solution (not considering symbolic links yet) to
> me, because it applies all the file system specific rules for existence of a
> file (which presumably is identical to what the late gcc and ls, etc use for
> its job).
Yes, we could use this method for normalization, however it is probably not necessary, see below.

> ...
> Well, in my opionen selecting the header file path and resolving symbolic links
> belong to 2 different stages.  ...
I fully agree.

> ...  My proposal would be to rename the ScannerUtil
> methods into something like toOSString, as it does something similar as the
> IPath.toOSString does.
Is the name correct?  toOSString uses '/' on unix and '\' on windows. ScannerUtil uses '/' on all platforms, I believe?

> The second stage ...
For the AST/index it is relevant what kind of locations are stored in the CodeReader that is computed by one of the code-reader factories. Currently
it will be the canonical location (File.getCanonicalPath()) for external files and the location as provided by the resources plugin for workspace files. Neither one of these will contain '..'.
--> I think your original simple patch will actually do it. We should have testcases though, that assert that the '..' don't end up in the AST or in the index.

Furthermore currently all the code-reader factories compute the normalized paths, which makes it hard to verify that the paths are correct in all cases (there are many code reader factories). It would be good to enhance the API of the CodeReaderFactory such that we can pass in the normalized location for creating the code reader.
I am starting to think that we do not need the second stage, because for files of the workspace the location that ends up in the AST/index is obtained from the resources system (see IndexLocationFactory.getWorkspaceIFL(...) and for external files the canonical file is used (see 
Comment 11 Jens Elmenthaler CLA 2008-10-28 08:42:38 EDT
> Is the name correct?  toOSString uses '/' on unix and '\' on windows.
> ScannerUtil uses '/' on all platforms, I believe?
No, the reconcilePath replaces each '\' and '/' by File.separatorChar.

> --> I think your original simple patch will actually do it. We should have
> testcases though, that assert that the '..' don't end up in the AST or in the
> index.
I will see how easy it is to add test cases. If I succeed I create another patch. If I have the choice between writing 100% safe code or a unit test that detects 100% of the relevant bugs in all future, I decide for the last.

> Furthermore currently all the code-reader factories compute the normalized
> paths, which makes it hard to verify that the paths are correct in all cases
> (there are many code reader factories). It would be good to enhance the API of
> the CodeReaderFactory such that we can pass in the normalized location for
> creating the code reader.
> I am starting to think that we do not need the second stage, because for files
> of the workspace the location that ends up in the AST/index is obtained from
> the resources system (see IndexLocationFactory.getWorkspaceIFL(...) and for
> external files the canonical file is used (see 
Well, this actually means, the second stage is already there;-)
Is IndexLocationFactory.getWorkspaceIFL resolving symbolic links, such that I always end at the same file?

Otherwise I could change a resource that is a target of a symbolic link, and the index is updated for the target location. But other clients could see the resource via a symbolic link. They don't see the change since there exist 2 index databases (consider this my naive assumption). I have the suspicion that something like this happens from time to time, but I haven't proven yet.

But this might already leave the scope of this bug.
Comment 12 Markus Schorn CLA 2008-10-28 12:17:27 EDT
(In reply to comment #11)
> > Is the name correct?  toOSString uses '/' on unix and '\' on windows.
> > ScannerUtil uses '/' on all platforms, I believe?
> No, the reconcilePath replaces each '\' and '/' by File.separatorChar.
ok.

> Well, this actually means, the second stage is already there;-)
yes, that's the right way to look at it.

> Is IndexLocationFactory.getWorkspaceIFL resolving symbolic links, such that I
> always end at the same file? ...
For external files, the canonical path is taken (this will resolve the
links). For workspace files the IFile is selected using ResourceLookup.selectFileForLocation(), which is deterministic. It does not resolve symbolic links, rather than that it uses the location as provided by
the resources plugin.
Whether or not all of the resource-change notifications are sent by the platform is a different question. If a change is notified, the index will be
updated.
Comment 13 Jens Elmenthaler CLA 2008-11-11 09:30:43 EST
Created attachment 117548 [details]
Unit test checking the desired behavior (platform dependent)

This patch only provides the unit test part of the fix. It tests:
- that the desired behaviour is achieved
- that no ../ is contained in the index
- that no ../ is contained in the ast

The approach for the fix is to rely on the normalization of the java.io.File implementation of the underlying virtual machine and OS.
The test runs on both behaviours and adapts accordingly.

It passes under (my XP) Windows, which does not check whether the parent cancelled out by /.. exists or not.

For (my RedHat 5) Linux, it currently fails because the IASTPreprocessorIncludeStatement.getPath is not normalized.

So, good tip to check that. I.e. the patch for the actual fix is not OK.
I will correct it.
Comment 14 Jens Elmenthaler CLA 2008-11-11 10:09:48 EST
I just forgot to say about the unit test page that it is a patch against the org.eclipse.cdt.core.tests plugin.
Comment 15 Markus Schorn CLA 2008-11-12 08:38:11 EST
Created attachment 117653 [details]
combined patch including the additional fix

I did see the failure on Windows (fFalseFriendsAccepted==true and the path in the preprocessor node contains '..'). I looked into the failure and cleaned up some of the code-reader stuff. Testcase is passing on my windows-machine now.

Please check whether the entire patch works in your scenarios.
Comment 16 Jens Elmenthaler CLA 2008-11-12 10:08:25 EST
Passes both my platforms (Windows XP, RedHat 5).
Comment 17 Markus Schorn CLA 2008-11-27 04:26:15 EST
Jens, thanks for the work and your patience, I have finally committed the patch.

Fixed in 6.0 > 20081127.