Re: [egit-dev] speeding up re-indexing

On 05/30/2012 09:32 AM, Markus Duft wrote:
[snip]
>> phew. i traced some more, and i found out that neither mightBeRacilyClean() nor smudgeRacilyClean() are involved in smudging the index?!
>>
>> i added printlns() there and everything looks ok - only one single file coming by, but 33000 smudged?!
> 
> and also DirCacheEntry#setLength is not hit. it is called with 0 for every empty file, but not a single other one. for me it does not look like jgit does something wrong "intentionally". maybe the field with the size is accidentally overwritten somewhere? it's not smudging related code that causes smudging here...
> 
> i'm out of ideas now - seems you guys have to take over the rest... can somebody reproduce the problem? just rebase a commit on another branch, and according to "git ls-files --debug -s | grep "size: 0" | wc -l" all tracked files are smudged.
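
A note on the count above: `grep "size: 0"` also counts files that really are empty. Kevin's rule quoted further down (cached size 0 but SHA-1 different from the empty blob) can be applied mechanically. The following is a minimal sketch, not part of JGit or git. It assumes the `git ls-files --debug -s` output has the shape shown in `SAMPLE`, and the function name `smudged_paths` and the sample paths and values are made up for illustration.

```python
import hashlib
import re

# SHA-1 of the empty blob: git hashes the header "blob 0\0" plus zero content bytes.
EMPTY_BLOB_SHA1 = hashlib.sha1(b"blob 0\0").hexdigest()

# Assumed shape of an entry header line: "<mode> <sha1> <stage>\t<path>"
ENTRY_RE = re.compile(r"^(\d{6}) ([0-9a-f]{40}) (\d)\t(.*)$")
# Assumed shape of the debug line with the cached size: "  size: <n>\tflags: <hex>"
SIZE_RE = re.compile(r"^\s+size: (\d+)\b")


def smudged_paths(ls_files_output):
    """Return paths whose cached size is 0 although the blob is not empty."""
    smudged = []
    current = None  # (sha1, path) of the entry whose debug lines are being read
    for line in ls_files_output.splitlines():
        entry = ENTRY_RE.match(line)
        if entry:
            current = (entry.group(2), entry.group(4))
            continue
        size = SIZE_RE.match(line)
        if size and current is not None:
            sha1, path = current
            if int(size.group(1)) == 0 and sha1 != EMPTY_BLOB_SHA1:
                smudged.append(path)
            current = None
    return smudged


# Hypothetical sample: one truly empty file, one smudged entry, one normal entry.
SAMPLE = (
    "100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0\tempty.txt\n"
    "  ctime: 1338355920:0\n"
    "  mtime: 1338355920:0\n"
    "  dev: 2049\tino: 11\n"
    "  uid: 1000\tgid: 1000\n"
    "  size: 0\tflags: 0\n"
    "100644 3b18e512dba79e4c8300dd08aeb37f8e728b8dad 0\thello.txt\n"
    "  ctime: 1338355921:0\n"
    "  mtime: 1338355921:0\n"
    "  dev: 2049\tino: 12\n"
    "  uid: 1000\tgid: 1000\n"
    "  size: 0\tflags: 0\n"
    "100644 ce013625030ba8dba906f756967f9e9ca394464a 0\tother.txt\n"
    "  ctime: 1338355000:0\n"
    "  mtime: 1338355000:0\n"
    "  dev: 2049\tino: 13\n"
    "  uid: 1000\tgid: 1000\n"
    "  size: 6\tflags: 0\n"
)

if __name__ == "__main__":
    print(smudged_paths(SAMPLE))
```

To run it against a real repository, the text could come from `subprocess.run(["git", "ls-files", "--debug", "-s"], capture_output=True, text=True).stdout`. Paths that git prints in quoted form are not handled in this sketch.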

And the changes

https://git.eclipse.org/r/#/c/6137/
https://git.eclipse.org/r/#/c/6138/

make the problem go away. The index is not smudged anymore after a rebase (only a single file is, but that's not so dramatic, and probably ok). Not sure whether they just "hide" an underlying problem or really fix it...
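
For context on why a single smudged entry after the rebase is plausible: as Kevin describes further down in this thread, an entry gets smudged when the index is written and the file's timestamp is close to that of the index file. The sketch below is only a simplified model of that rule, assuming the "same or newer than the index timestamp" comparison from the racy-git document linked in the thread. The `Entry` class and the function names are hypothetical and are not JGit's API. Real implementations apply further checks that are not modelled here.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    path: str
    mtime: int  # cached modification time of the working-tree file (seconds)
    size: int   # cached file size; 0 doubles as the "smudged" marker


def is_racily_clean(entry, index_mtime):
    """Suspect if the file was modified at or after the time the index was written."""
    return entry.mtime >= index_mtime


def smudge(entries, index_mtime):
    """Zero the cached size of each racily clean entry; return how many changed."""
    count = 0
    for entry in entries:
        if is_racily_clean(entry, index_mtime) and entry.size != 0:
            entry.size = 0
            count += 1
    return count


if __name__ == "__main__":
    entries = [
        Entry("old.txt", mtime=100, size=10),      # older than the index: untouched
        Entry("just-written.txt", mtime=200, size=20),  # same second as the index
    ]
    print(smudge(entries, index_mtime=200), entries)
```

Under this model, a rebase that rewrites one file in the same second the index is saved leaves exactly that one entry smudged, while files that were not touched keep their cached size.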

Regards,
Markus

> 
> regards,
> Markus
> 
>>
>> markus
>>
>>>
>>> Markus
>>>
>>>>
>>>> i have this as starting point:
>>>>
>>>> A--B
>>>>
>>>> now i create a new branch from A and commit something
>>>>
>>>>  --C
>>>> /
>>>> A--B
>>>>
>>>> smudged count changes from ~195 (all empty files) to ~198.
>>>>
>>>> then i rebase C on B
>>>>
>>>> A--B--C'
>>>>
>>>> the smudged count increases to 33994. a more intelligent grep (i counted bin directories mistakenly last time) showed that this is every single file that is tracked... bad?
>>>>
>>>> re-indexing in the ide still does not take notably longer (maybe off by 2-5 seconds), but i suspect that my machine has a super-great filesystem cache (i have a 16-core workstation with 12G ram and fast discs).
>>>>
>>>> Regards,
>>>> Markus
>>>>
>>>> On 05/30/2012 06:47 AM, Kevin Sawicki wrote:
>>>>> It would be good to know if the latest master exhibits the same behavior.
>>>>>
>>>>> Kevin
>>>>>
>>>>> On Tue, May 29, 2012 at 9:41 PM, Markus Duft <markus.duft@xxxxxxxxxx> wrote:
>>>>>
>>>>>     master from 2.0.0.201204111131 - would an update to current master help..?
>>>>>
>>>>>     On 05/30/2012 06:40 AM, Kevin Sawicki wrote:
>>>>>     > What version of EGit/JGit are you currently using?
>>>>>     >
>>>>>     > On Tue, May 29, 2012 at 9:34 PM, Markus Duft <markus.duft@xxxxxxxxxx> wrote:
>>>>>     >
>>>>>     >     On 05/30/2012 06:27 AM, Kevin Sawicki wrote:
>>>>>     >     > Hopefully both fixes will still make it into the 2.0 release next month.
>>>>>     >
>>>>>     >     oh - interesting; i did a fetch and rebase on the repo (just the two straight forward), and it gave me back all 33000 smudged index entries?!
>>>>>     >
>>>>>     >     Markus
>>>>>     >
>>>>>     >     >
>>>>>     >     > On Tue, May 29, 2012 at 9:24 PM, Markus Duft <markus.duft@xxxxxxxxxx> wrote:
>>>>>     >     >
>>>>>     >     >     On 05/30/2012 06:22 AM, Kevin Sawicki wrote:
>>>>>     >     >     > Smudged index entries occur when the index is written and the timestamp of the index file is close to the timestamp of the file in the working directory.
>>>>>     >     >     >
>>>>>     >     >     > You can update them by running a git status on the command line.
>>>>>     >     >
>>>>>     >     >     Thanks for the hint. Sadly i cannot force all my developers to have a command line git, although i recommend it. Still, since we're building our own JGit/EGit versions with some minor workarounds anyway, i may apply one of the two if applicable.
>>>>>     >     >
>>>>>     >     >     Regards,
>>>>>     >     >     Markus
>>>>>     >     >
>>>>>     >     >     >
>>>>>     >     >     > You can read more about it here: https://raw.github.com/git/git/master/Documentation/technical/racy-git.txt
>>>>>     >     >     >
>>>>>     >     >     > There are a couple JGit fixes proposed to prevent these from being left in the index:
>>>>>     >     >     >
>>>>>     >     >     > https://git.eclipse.org/r/#/c/6137/
>>>>>     >     >     >
>>>>>     >     >     > https://git.eclipse.org/r/#/c/6138/
>>>>>     >     >     >
>>>>>     >     >     > Just to clarify, there are very valid cases for having smudged index entries.  The issue with JGit currently is that they aren't being updated and removed when applicable.
>>>>>     >     >     >
>>>>>     >     >     > Kevin
>>>>>     >     >     >
>>>>>     >     >     > On Tue, May 29, 2012 at 9:16 PM, Markus Duft <markus.duft@xxxxxxxxxx> wrote:
>>>>>     >     >     >
>>>>>     >     >     >     On 05/29/2012 05:20 PM, Kevin Sawicki wrote:
>>>>>     >     >     >     > Have you tried checking if your index contains smudged entries?
>>>>>     >     >     >
>>>>>     >     >     >     hm. not yet. how can smudged entries occur?
>>>>>     >     >     >
>>>>>     >     >     >     >
>>>>>     >     >     >     > This will trigger a full SHA-1 redigest each time the indexing occurs.
>>>>>     >     >     >
>>>>>     >     >     >     that could explain why it takes so long.
>>>>>     >     >     >
>>>>>     >     >     >     >
>>>>>     >     >     >     > From the command line: git ls-files --debug -s | grep -B5 "size: 0"
>>>>>     >     >     >
>>>>>     >     >     >     "a few" - approximately 33993 smudged entries - what can i do about it?
>>>>>     >     >     >
>>>>>     >     >     >     thanks for helping!
>>>>>     >     >     >     Markus
>>>>>     >     >     >
>>>>>     >     >     >     >
>>>>>     >     >     >     > If this command shows any output for files where the SHA-1 is not equal to "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391" then the index entry is smudged.
>>>>>     >     >     >     >
>>>>>     >     >     >     > Kevin
>>>>>     >     >     >     >
>>>>>     >     >     >     > On Tue, May 29, 2012 at 8:14 AM, Markus Duft <markus.duft@xxxxxxxxxx> wrote:
>>>>>     >     >     >     >
>>>>>     >     >     >     >     On 05/29/2012 03:41 PM, Baumgart, Jens wrote:
>>>>>     >     >     >     >     > I assume there is no "easy" way to speed it up.
>>>>>     >     >     >     >     > I could imagine JGit giving more details in its events (indexChanged and
>>>>>     >     >     >     >     > refsChanged). This would allow to avoid complete re-indexing in many cases.
>>>>>     >     >     >     >     > I am wondering why it's so slow for your repo. Re-indexing takes some
>>>>>     >     >     >     >     > seconds for big repos like the linux kernel.
>>>>>     >     >     >     >     > Do you store large binaries in your repo?
>>>>>     >     >     >     >
>>>>>     >     >     >     >     hm, not binaries. we have .xml.zip files (models) that are ~5-10MB in size, but what we also have are .xml files that are ~50MB in size, and not one but a dozen of them. could that matter?
>>>>>     >     >     >     >
>>>>>     >     >     >     >     any chance to find out /what/ takes so long?
>>>>>     >     >     >     >
>>>>>     >     >     >     >     Regards,
>>>>>     >     >     >     >     Markus
>>>>>     >     >     >     >
>>>>>     >     >     >     >     > --
>>>>>     >     >     >     >     > Jens
>>>>>     >     >     >     >     >
>>>>>     >     >     >     >     > On 29.05.12 15:34, "Markus Duft" <markus.duft@xxxxxxxxxx> wrote:
>>>>>     >     >     >     >     >
>>>>>     >     >     >     >     >> Hey!
>>>>>     >     >     >     >     >>
>>>>>     >     >     >     >     >> is there an "easy" (meaning not weeks of work) way to speed up the
>>>>>     >     >     >     >     >> re-indexing of repositories? it takes approx. 2 minutes for our repo
>>>>>     >     >     >     >     >> (~95177 files to scan) on a linux machine with _fast_ discs. Not to speak
>>>>>     >     >     >     >     >> of our poor windows developers with notebooks (~5-10 minutes!)
>>>>>     >     >     >     >     >>
>>>>>     >     >     >     >     >> Regards,
>>>>>     >     >     >     >     >> Markus
>>>>>     >     >     >     >     >> _______________________________________________
>>>>>     >     >     >     >     >> egit-dev mailing list
>>>>>     >     >     >     >     >> egit-dev@xxxxxxxxxxx
>>>>>     >     >     >     >     >> https://dev.eclipse.org/mailman/listinfo/egit-dev
>>>>>     >     >     >     >     >
>>>>>     >     >     >     >
>>>>>     >     >     >     >
>>>>>     >     >     >
>>>>>     >     >     >
>>>>>     >     >
>>>>>     >     >
>>>>>     >
>>>>>     >
>>>>>
>>>>>