Re: [cdt-dev] Indexing performance

From: Vyacheslav Chigrin <vyacheslav.chigryn@xxxxxxxxx>
Date: Mon, 11 Feb 2013 01:04:25 +0400

Yes, I was also surprised by such a big difference between Linux and Windows. I'll try to reconfigure the antivirus, thank you!

I work on Windows most of the time, so I had not noticed that Eclipse is much faster on Linux.

Thank you for the description of how to find indexing problems; it helps me configure my workspace more properly.


Best regards,
Vyacheslav

On 02/10/2013 11:47 AM, Oberhuber, Martin wrote:

Actually, regarding the Windows times...

Is it possible that you have an on-access virus scanner scanning all your .c and .h files ?
That would explain the 50 min CPU time in the kernel...

Specifying "exclude folders" with the virus scanner for your workspaces may help a lot, also for compilation.

Thanks,
Martin
--
Martin Oberhuber, SMTS / Product Architect – Development Tools, Wind River
direct +43.662.457915.85  fax +43.662.457915.6


-----Original Message-----
From: cdt-dev-bounces@xxxxxxxxxxx [mailto:cdt-dev-bounces@xxxxxxxxxxx] On Behalf Of Oberhuber, Martin
Sent: Saturday, February 09, 2013 10:14 PM
To: Vyacheslav Chigrin; CDT General developers list.
Subject: Re: [cdt-dev] Indexing performance

Hello Vyacheslav,

With 10k unresolved includes and 100k syntax errors, your indexing quality is certainly not outstanding.
Compare against my project - I had almost twice as many sources/declarations/references, but only 1k unresolved includes and 40k syntax errors.

I suggest that you try some navigations to see whether the index is actually usable for you.
For instance, try some STL container (like a map) and see whether in an expression like
       myMap[key].myMethod()
you can navigate to myMethod and whether the calltree for myMethod does show that caller.
If not, then your STL headers have not been found; I'd consider your index not really usable, and you will need to work on your indexer setup (the -I and -D settings).

Your 1.5% unresolved names is not extremely bad though. Essentially it means that 1.5% of your navigation or calltree requests are going to fail because the respective symbol was not found. But 100k syntax errors is pretty bad: it suggests that some preprocessor macros were likely not understood, which might mean that a good deal of your index is garbage.
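
The unresolved-names percentage can be checked against the raw counts from the indexer statistics quoted further down in this thread. The following is a minimal Python sketch, not CDT code. The function name is made up for illustration, and the choice of denominator (declarations + references + unresolved names) is an assumption rather than a documented CDT formula, although it does reproduce both percentages reported in this thread.

```python
def unresolved_percent(declarations, references, unresolved):
    """Share of unresolved names among all names, in percent.

    Assumption: the denominator is declarations + references + unresolved.
    """
    total = declarations + references + unresolved
    return 100.0 * unresolved / total

# Counts taken from the Linux and Windows statistics lines in this thread.
linux = unresolved_percent(6_571_612, 30_354_126, 570_779)
windows = unresolved_percent(7_254_957, 33_186_193, 746_128)
print(f"Linux: {linux:.2f}%  Windows: {windows:.2f}%")
```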

Syntax errors should usually be shown with a yellow squiggly underline in the editor. Alternatively, you can log all syntax errors during parsing: save the attached file somewhere and then launch Eclipse like this:
         eclipse -debug $HOME/indexer-debug-options.txt > indexer-log.txt
With your 100k syntax errors, that log is going to grow pretty huge.

But anyways, in my experience the index quality is largely unrelated to index performance.
45 minutes of indexing time is absolutely reasonable for the size of your code base; I don't think that is going to change much when you improve index quality. Regarding the Windows time, I assume it's so much worse because the file system is so slow on Windows (it spends 50 minutes in the kernel, either accessing the file system or doing stupid spinlocks for multithreading; a very good example that Windows is not the right OS if you're looking for performance).

Personally, I find 45 minutes pretty good for the size of your code base; why would you want to further tune performance ? What actions / workflow feels too slow for you ?

Thanks,
Martin
--
Martin Oberhuber, SMTS / Product Architect – Development Tools, Wind River
direct +43.662.457915.85  fax +43.662.457915.6


-----Original Message-----
From: Vyacheslav Chigrin [mailto:vyacheslav.chigryn@xxxxxxxxx]
Sent: Saturday, February 09, 2013 9:52 PM
To: CDT General developers list.
Cc: Oberhuber, Martin
Subject: Re: [cdt-dev] Indexing performance

Hello, Martin

I performed the experiments you suggested on my system (Core i5 3.4 GHz, 6 GB RAM, SSD). I use Linux (Ubuntu 12.04) and Windows (Windows 7), with the project sources on a local drive (on Linux on a Linux partition, on Windows on a Windows partition). I have Eclipse Juno running on a 64-bit JRE, and I modified eclipse.ini by adding the values you suggested. The project does not define many C++ template classes, although the STL is used intensively.
Most of the code is C++, although some C files are also present. Some source files are generated during the build process; this may be why the number of indexed files differs between Windows and Linux (see below).

On Linux indexing took about 47 min. Here is the last line from the Error Log after indexing:

Indexed 'project' (30,428 sources, 53,681 headers) in 2,751.48 sec:
6,571,612 declarations; 30,354,126 references; 10,737 unresolved inclusions; 102,640 syntax errors; 570,779 unresolved names (1.52%)

The 'time' command output is:
real	47m18.385s
user	50m18.485s
sys	3m29.209s
Since the user time here is even greater than the real time (apparently because of multithreading), I suppose that the CPU, not the hard drive, is the limiting factor.
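
The comparison of user and real time can be made explicit. The following is a minimal Python sketch, not part of CDT, that parses the three `time` values quoted above and computes how much CPU time was consumed per second of wall-clock time. The helper name `parse_time_value` is made up for illustration. A ratio above 1 suggests that more than one core was busy on average, which fits the CPU-bound reading.

```python
import re

def parse_time_value(text):
    """Convert a `time` value such as '47m18.385s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text.strip())
    if match is None:
        raise ValueError(f"unexpected time format: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)

# Values reported for the Linux run in this message.
real = parse_time_value("47m18.385s")
user = parse_time_value("50m18.485s")
sys_time = parse_time_value("3m29.209s")

# CPU seconds consumed per wall-clock second.
cpu_per_wall = (user + sys_time) / real
print(f"CPU time / real time = {cpu_per_wall:.2f}")
```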

On Windows the last line is:
Indexed 'project' (31 752 sources, 32 657 headers) in 7 341,33 sec:
7 254 957 declarations; 33 186 193 references; 8 786 unresolved inclusions; 449 262 syntax errors; 746 128 unresolved names (1,81 %)

Indexing took about 2 hours, and Process Explorer shows that the javaw.exe process used the CPU for the following times:
Kernel: 0:50:20.850
User:   1:26:51.385
Total:  2:17:12.235
Here the total time is also close to the indexing time.
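
To put the Windows kernel time in proportion, the following minimal Python sketch (an illustration only; `parse_hms` is a made-up helper name) converts the Process Explorer values to seconds and compares the kernel share of CPU time on Windows with the sys share of the Linux run reported above.

```python
def parse_hms(text):
    """Convert 'H:MM:SS.mmm' to seconds."""
    hours, minutes, seconds = text.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)

# Windows values from Process Explorer, as quoted in this message.
kernel = parse_hms("0:50:20.850")
user = parse_hms("1:26:51.385")
total = parse_hms("2:17:12.235")
kernel_share = kernel / total

# Linux values from the `time` output in this message, in seconds.
linux_user = 50 * 60 + 18.485
linux_sys = 3 * 60 + 29.209
linux_sys_share = linux_sys / (linux_user + linux_sys)

print(f"Windows kernel share: {kernel_share:.1%}")
print(f"Linux sys share:      {linux_sys_share:.1%}")
```

With these inputs the kernel accounts for roughly a third of the CPU time on Windows, compared with well under a tenth on Linux.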

Do you think this behavior is caused by an improperly configured workspace (too many unresolved includes etc.)? Or maybe it is because of the SSD, with its faster response times?

Thank you,
Vyacheslav

On Fri, Feb 8, 2013 at 1:12 PM, Oberhuber, Martin <Martin.Oberhuber@xxxxxxxxxxxxx> wrote:

Hello Vyacheslav,

Before going anywhere deeper, I would suggest collecting some basic data:

    - What is your host OS ?
    - Where are the files located (local disk / NFS) ?
    - Is your indexer setup reasonably correct (include paths, preprocessor macros; #unresolved symbols statistics) ?
    - How many files are you looking at ?
    - Do you use many C++ templates ?
    - Running Eclipse under "time" when reparsing your project, what is the amount of user / sys / real time ?

To give you a reference point, I've been indexing a project with 150,000 files (mostly C but some C++) on a Linux box, all files local, with 0.33% unresolved symbols, in 90 minutes real time (user: 60 min, sys: 4 min).
Note that you need -vmargs -Xmx2048m -XX:MaxPermSize=512m for such a large project.
Details here: https://bugs.eclipse.org/bugs/show_bug.cgi?id=394151#c3

As you see, 30 % of the time goes into plain file access, waiting for the disk (90 min real - 60 min user).
Personally I do not think that there is much optimization potential
left in this scenario -- the index has to be a shared database, so I don't see much potential for improvements by multi-threading here.
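
The "30 %" estimate can be reproduced from the three numbers given above. This is a minimal Python sketch of that arithmetic, assuming a mostly single-threaded indexer, so that wall-clock time not covered by CPU time is time spent waiting. The function name is made up for illustration.

```python
def wait_fraction(real_min, user_min, sys_min=0.0):
    """Fraction of wall-clock time not accounted for by CPU time."""
    return (real_min - user_min - sys_min) / real_min

# Reference project: 90 min real, 60 min user, 4 min sys.
without_sys = wait_fraction(90, 60)
with_sys = wait_fraction(90, 60, 4)
print(f"{without_sys:.0%} without sys time, {with_sys:.0%} with sys time")
```

Either way the result is close to the 30 % quoted above.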

I think the easiest way for you to get to these numbers is this, using
the "time" command on Linux (on Windows you could probably use Task Manager and read the numbers):

    1. Set up your project in Eclipse. Make sure that the preference "Refresh Workspace on startup" is OFF.
    2. Window > Show View > Other : Error Log and look at the indexing statistics.
         - unresolved includes should not be too many, unresolved symbols should be < 5%
         - if these criteria are not met, you likely have incorrect config of macros/includes, and a massive index quality problem.
    3. Quit Eclipse
    4. time eclipse
    5. Right-click project > Index > Rebuild
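
Where the "time" command is not at hand, step 4 can be approximated with a small wrapper. This is a minimal Python sketch, not an Eclipse or CDT tool. `timed_run` is a made-up name, and whether the JVM's CPU time is attributed to the launched process depends on how the launcher starts it, so treat the numbers as an approximation.

```python
import os
import subprocess
import sys
import time

def timed_run(command):
    """Run a command and return (real, user, sys) seconds, like `time`.

    The child CPU times come from os.times(); on Windows those fields
    are reported as zero, so only the real time is meaningful there.
    """
    before = os.times()
    start = time.monotonic()
    subprocess.run(command, check=False)
    real = time.monotonic() - start
    after = os.times()
    user = after.children_user - before.children_user
    system = after.children_system - before.children_system
    return real, user, system

# Hypothetical usage, with the VM arguments mentioned earlier in this message:
# real, user, system = timed_run(
#     ["eclipse", "-vmargs", "-Xmx2048m", "-XX:MaxPermSize=512m"])

# Harmless demonstration: time a Python interpreter that does nothing.
real, user, system = timed_run([sys.executable, "-c", "pass"])
print(f"real {real:.2f}s  user {user:.2f}s  sys {system:.2f}s")
```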

I'm curious to see what numbers you have.

Thanks,
Martin
--
Martin Oberhuber, SMTS / Product Architect - Development Tools, Wind River
direct +43.662.457915.85  fax +43.662.457915.6


-----Original Message-----
From: cdt-dev-bounces@xxxxxxxxxxx [mailto:cdt-dev-bounces@xxxxxxxxxxx]
On Behalf Of Vyacheslav Chigrin
Sent: Thursday, February 07, 2013 10:01 PM
To: cdt-dev@xxxxxxxxxxx
Subject: [cdt-dev] Indexing performance

Hello,

I am using Eclipse CDT on a very large C++ project and I am very interested in improving indexing performance. Searching the web shows that a lot of work has already been done in this direction. I am very new to Eclipse development, so I am asking: is there a known good starting point for this task? Are there any known bottlenecks? Is parallel indexing being considered to make it faster on multi-core systems? I would be happy if I could do anything useful for the project.

Thank you,
Vyacheslav
_______________________________________________
cdt-dev mailing list
cdt-dev@xxxxxxxxxxx
https://dev.eclipse.org/mailman/listinfo/cdt-dev