Re: [jgit-dev] [egit-dev] speeding up re-indexing

On 05/30/2012 06:22 AM, Kevin Sawicki wrote:
> Smudged index entries occur when the index is written and the timestamp of the index file is close to the timestamp of the file in the working directory.
> 
> You can update them by running a git status on the command line.

This may be of interest to the list too. Kevin helped me out a little. It seems i had ~33000 smudged index entries, causing re-indexing to re-SHA-1 all the files. bad. after getting rid of those index entries, i'm back down to ~20 seconds for the repo, which is not perfect, but way better.

(Must i say that i would highly appreciate the two changes mentioned below getting into 2.0? ;))

Thanks Kevin!
Markus

> 
> You can read more about it here: https://raw.github.com/git/git/master/Documentation/technical/racy-git.txt
> 
> There are a couple JGit fixes proposed to prevent these from being left in the index:
> 
> https://git.eclipse.org/r/#/c/6137/
> 
> https://git.eclipse.org/r/#/c/6138/
> 
> Just to clarify, there are very valid cases for having smudged index entries.  The issue with JGit currently is that they aren't being updated and removed when applicable.
> 
> Kevin
> 
> On Tue, May 29, 2012 at 9:16 PM, Markus Duft <markus.duft@xxxxxxxxxx> wrote:
> 
>     On 05/29/2012 05:20 PM, Kevin Sawicki wrote:
>     > Have you tried checking if your index contains smudged entries?
> 
>     hm. not yet. how can smudged entries occur?
> 
>     >
>     > This will trigger a full SHA-1 redigest each time the indexing occurs.
> 
>     that could explain why it takes so long.
> 
>     >
>     > From the command line: git ls-files --debug -s | grep -B5 "size: 0"
> 
>     "a few" - approximately 33993 smudged entries - what can i do about it?
> 
>     thanks for helping!
>     Markus
> 
>     >
>     > If this command shows any output for files where the SHA-1 is not equal to "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391" then the index entry is smudged.
>     >
>     > Kevin
>     >
>     > On Tue, May 29, 2012 at 8:14 AM, Markus Duft <markus.duft@xxxxxxxxxx> wrote:
>     >
>     >     On 05/29/2012 03:41 PM, Baumgart, Jens wrote:
>     >     > I assume there is no "easy" way to speed it up.
>     >     > I could imagine JGit giving more details in its events (indexChanged and
>     >     > refsChanged). This would make it possible to avoid complete re-indexing in many cases.
>     >     > I am wondering why it's so slow for your repo. Re-indexing takes some
>     >     > seconds for big repos like the linux kernel.
>     >     > Do you store large binaries in your repo?
>     >
>     >     hm, not binaries. we have .xml.zip files (models) that are ~5-10MB in size, but what we also have are .xml files that are ~50MB in size, and not one but a dozen of them. could that matter?
>     >
>     >     any chance to find out /what/ takes so long?
>     >
>     >     Regards,
>     >     Markus
>     >
>     >     > --
>     >     > Jens
>     >     >
>     >     > On 29.05.12 15:34, "Markus Duft" <markus.duft@xxxxxxxxxx> wrote:
>     >     >
>     >     >> Hey!
>     >     >>
>     >     >> is there an "easy" (meaning not weeks of work) way to speed up the
>     >     >> re-indexing of repositories? it takes approx. 2 minutes for our repo
>     >     >> (~95177 files to scan) on a linux machine with _fast_ discs. Not to speak
>     >     >> of our poor windows developers with notebooks (~5-10 minutes!)
>     >     >>
>     >     >> Regards,
>     >     >> Markus
>     >     >> _______________________________________________
>     >     >> egit-dev mailing list
>     >     >> egit-dev@xxxxxxxxxxx
>     >     >> https://dev.eclipse.org/mailman/listinfo/egit-dev
>     >     >
>     >
>     >
> 
> 
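
The check Kevin describes in the quoted messages (size 0 in the index, but a SHA-1 other than the empty blob's) can be sketched as a small filter over the output of git ls-files --debug -s. This is only a rough sketch: the function name list_smudged is made up for illustration, and it assumes the debug output keeps the layout where each entry line ("mode sha stage<TAB>path") is followed by indented stat lines ending in "  size: N<TAB>flags: X".

```shell
# list_smudged (hypothetical helper): reads `git ls-files --debug -s` output
# on stdin and prints the path of every entry that looks smudged, i.e. whose
# cached size is 0 although its SHA-1 is not that of the empty blob.
list_smudged() {
  awk -v empty="e69de29bb2d1d6434b8b29ae775ad8c2e48c5391" '
    # Entry line: "<mode> <sha> <stage>\t<path>"; remember SHA-1 and path.
    /^[0-7]+ [0-9a-f]+ [0-3]\t/ {
      sha = $2
      path = substr($0, index($0, "\t") + 1)
      next
    }
    # Stat line of the same entry: a cached size of exactly 0.
    /^  size: 0\t/ {
      if (sha != empty) print path
    }
  '
}
```

Run from the top of a working tree, git ls-files --debug -s | list_smudged | wc -l should give a count comparable to the "approximately 33993" figure above, without having to pick the empty-blob entries out of the grep output by hand.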

