Bug 474064 - A performance issue when creating a git repository in a big file system
Summary: A performance issue when creating a git repository in a big file system
Status: RESOLVED FIXED
Alias: None
Product: EGit
Classification: Technology
Component: Core
Version: unspecified
Hardware: PC All
Importance: P3 normal
Target Milestone: 4.1
Assignee: Andrey Loskutov CLA
Keywords: needinfo
Reported: 2015-07-31 16:03 EDT by Snjezana Peco CLA
Modified: 2015-09-16 06:53 EDT

Attachments
A stack trace (20.46 KB, text/plain)
2015-08-03 13:21 EDT, Snjezana Peco CLA
Repo with lot of files (124.69 KB, application/zip)
2015-08-03 14:37 EDT, Andrey Loskutov CLA
screenshot with no activity on large folder (212.61 KB, image/png)
2015-08-05 14:46 EDT, Andrey Loskutov CLA
a screenshot (94.22 KB, image/png)
2015-08-06 11:56 EDT, Snjezana Peco CLA

Description Snjezana Peco CLA 2015-07-31 16:03:00 EDT
We have faced this issue when importing the wildfly repository (https://github.com/wildfly/wildfly) to Eclipse.

Steps to reproduce:

- initialize a git repository in a large file system
You don't need to add anything to that repository.
- create a simple Eclipse project in that file system

EGit will continuously scan the whole file system and CPU will be very busy. If you import some big project (wildfly, for instance) to such a configuration, you will probably lock Eclipse.


https://issues.jboss.org/browse/JBIDE-16379 contains more details about this issue.
Comment 1 Andrey Loskutov CLA 2015-07-31 16:12:42 EDT
Which EGit version was used? Can you try the nightly 4.1 builds, please? Can you create a few jstack dumps while the system is busy?
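
Several thread dumps taken a few seconds apart show whether the same worker thread stays busy in the same frames. A minimal Python sketch, assuming the JDK's `jstack` tool is on the PATH and the Eclipse process ID is known; the function name, the output file naming, and the `dump_cmd` parameter are illustrative choices, not part of EGit or the JDK:

```python
import subprocess
import time
from pathlib import Path


def capture_thread_dumps(pid, out_dir, count=3, interval_s=5.0, dump_cmd=("jstack",)):
    """Run `<dump_cmd> <pid>` `count` times and save each output to its own file.

    `dump_cmd` defaults to the JDK's jstack; it is a parameter only so the
    helper can be tried with a stand-in command when no JDK is installed.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for i in range(count):
        result = subprocess.run(
            [*dump_cmd, str(pid)], capture_output=True, text=True, check=True
        )
        target = out_dir / f"dump-{i + 1}.txt"
        target.write_text(result.stdout)
        written.append(target)
        if i < count - 1:
            time.sleep(interval_s)  # let the JVM make progress between dumps
    return written
```

For example, `capture_thread_dumps(12345, "dumps")` would write three files for a hypothetical process ID 12345; the real ID has to be looked up first, for instance with the JDK's `jps`.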
Comment 2 Andrey Loskutov CLA 2015-08-01 14:56:34 EDT
I can't reproduce this using the 4.1 nightly build.

After cloning the repo (I've cloned the master branch only), I import its root as a "general" project into Eclipse, and within a few seconds everything is there. I can use the history and work with the staging view with no delays at all.

The repository itself isn't that big; I would say it is small, and I have EGit running on *much* bigger repos (~7GB, with more .gitignore files than regular files on the master branch here :-)).

So after all this I need more info to proceed. First of all, you should try EGit 4.1 nightly in your environment (update site: http://download.eclipse.org/egit/updates-nightly), and if this still does not help, create jstack dumps and add exact steps to reproduce.
Comment 3 Snjezana Peco CLA 2015-08-03 13:19:52 EDT
I have tested EGit included in Mars (EGit 4.0.0), 4.1.0.201507271903, as well as many previous versions.

You don't need to either clone or import any git repositories.
You have to choose a large directory containing a lot of files and subdirectories that aren't included in any git repositories and then:

- initialize git in that directory from the command line (git init)
- create an Eclipse project using some Eclipse distribution including EGit with default settings.

You will see an EGit job in the Progress view taking a long time (depending on the size of the directory you have chosen).
If you import a large repository (such as wildfly, for instance), you will probably freeze Eclipse.
We faced that issue when we created some symbolic links in Linux. 
Please see https://issues.jboss.org/browse/JBIDE-16379 for more details.

EGit usually calls the File.exists() method from the IndexDiffCacheEntry class.
Comment 4 Snjezana Peco CLA 2015-08-03 13:21:42 EDT
Created attachment 255597
A stack trace
Comment 5 Snjezana Peco CLA 2015-08-03 13:24:52 EDT
(In reply to Snjezana Peco from comment #3)
> - create an Eclipse project using some Eclipse distribution including EGit
> with default settings.

- create an Eclipse project in that directory
Comment 6 Andrey Loskutov CLA 2015-08-03 14:14:43 EDT
(In reply to Snjezana Peco from comment #3)
> I have tested EGit included in Mars (EGit 4.0.0), 4.1.0.201507271903, as
> well as many previous versions.
> 
> You don't need either clone or import any git repositories. 
> You have to choose a large directory containing a lot of files and
> subdirectories that aren't included in any git repositories and

What is "large"? 10000 files in one directory? 10000 sub-directories nested inside one another?

> - initialize git in that directory from the command line (git init)
> - create an Eclipse project using some Eclipse distribution including EGit
> with default settings.

In which directory should I create a project? In the root of that "large" directory? Or any special place?

> You will see an EGit job in the Progress view taking a long time (depending
> on the size of the directory you have chosen.

By "directory size" do you mean all children recursively or only the first level children?

> If you import a large repository (as wildfly, for instance), you will
> probably freeze Eclipse. 

Does "freeze" mean the UI is frozen? Or high CPU load? From the stack trace you've attached, the UI is just fine.

> We faced that issue when we created some symbolic links in Linux. 

But you do not mention any links in any of the steps to reproduce. So do we need symbolic links to be added to the repo? How many? Are they special in any way, e.g. recursive or "broken"?

> Please see https://issues.jboss.org/browse/JBIDE-16379 for more details.

Please, if you want the bug fixed, put all relevant details here. BTW, the bug you've mentioned also does not provide steps to reproduce and is called "Using symbolic links in the workspace path causes an infinite loop for m2e builder". It talks about the wildfly repo, "auto-share" projects, validators, builders etc., but still does not have steps to reproduce.

> EGit usually calls the File.exists() method from the IndexDiffCacheEntry
> class.

Yes, and this should not freeze anyone, since all these calls are made from a non-UI thread.

Please share *exact* steps to reproduce.
Comment 7 Andrey Loskutov CLA 2015-08-03 14:37:37 EDT
Created attachment 255600
Repo with lot of files

I've attached an example repo to start playing with.
This is just a repository with a lot of empty files and a few scripts to create even more files. Please do whatever is needed to make your case reproducible with this repo, commit, and attach it back here.

From playing with that repo one can see that Eclipse spends most of the time refreshing the packages view (creating & updating SWT elements for nodes) or any other "explorer" one can use to browse files in that directory. EGit doesn't play any role until you open the staging view, which again spends a lot of time creating & updating SWT elements for nodes. If this is your problem, then it is not an EGit issue but an SWT/JFace related issue: they just don't scale on directories with >= 10000 files each.
Comment 8 Snjezana Peco CLA 2015-08-03 15:01:01 EDT
The only thing you need to do is to choose a large file system that doesn't contain any git repositories. I have tested a directory with 900000 files/subdirectories.
All you have to do is to initialize a git repo in that directory and create an Eclipse project in it. Do not add anything to that git repository. Just initialize it.
Comment 9 Snjezana Peco CLA 2015-08-03 15:02:56 EDT
The issue can't be reproduced in Eclipse without EGit.
Comment 10 Andrey Loskutov CLA 2015-08-03 17:37:03 EDT
(In reply to Snjezana Peco from comment #9)
> The issue can't be reproduced in Eclipse without EGit.

Can it be that you have missed one point in the steps to reproduce: one has to open the staging view? As I've already mentioned, SWT/JFace is not able to properly (meaning fast enough) handle tables/trees with a huge number of elements, and this is the case if you have a git repo with 900000 untracked elements inside. The staging view will try to create a table with that number of elements and this will make SWT/JFace busy for a while.

So can you please confirm that your actual issue is that having the staging view open while importing/creating new projects with 900000 untracked elements can cause a UI freeze? This is what I observe right now. If this view is not opened, everything is OK. Is this correct?
Comment 11 Eclipse Genie CLA 2015-08-03 18:06:31 EDT
New Gerrit change created: https://git.eclipse.org/r/53105
Comment 12 Snjezana Peco CLA 2015-08-03 18:53:35 EDT
You don't have to open any views. EGit refreshes the repository automatically which takes a long time.
Check the stack trace.
Comment 13 Andrey Loskutov CLA 2015-08-04 15:27:14 EDT
(In reply to Snjezana Peco from comment #12)
> You don't have to open any views. EGit refreshes the repository
> automatically which takes a long time.
> Check the stack trace.

I can't reproduce. I have really tried many times. Please provide *exact* steps to reproduce.

I've opened a separate bug 474258 to track issues with the staging view, but I can't proceed here any further.

Try to describe exact things you do:
1) Install plain Eclipse 4.5.0 SDK
2) Create "Huge" project
3) Use "xyz" script to create 100000 files
4) Select project in the package explorer view
5) Select "abc" node
6) Observe *something*. I don't observe *anything* bad.

Take the example from bug 474258: this is an empty repo with scripts which create 100000 files for you. I don't see *any* automatic Git refresh of the repository which would take a considerably long time. *All* UI freezes I can observe are coming from rendering that amount of data in JFace.
Comment 14 Snjezana Peco CLA 2015-08-05 07:40:56 EDT
> 1) Install plain Eclipse 4.5.0 SDK

and some EGit version

> 2) Create "Huge" project

No. All you have to do is to select a large file system and initialize a git repository. You can try your ~/.m2 if you use Maven.
For example:

cd ~/.m2
git init

> 3) Use "xyz" script to create 100000 files

No. 

> 4) Select a project in the package explorer view

I have created a new project (an Eclipse General project) in the directory selected in step 2

> 5) Select "abc" node

Not necessary.

> 6) Observe *something*. I don't observe *anything* bad.

Open the Progress view, generate a stack trace ...
If you have selected a large directory, you will notice an EGit job taking a long time.
You don't need to use EGit at all, it is only necessary to have it installed.
Most of the time is used to call File.exists (see the attached stack trace).
There is no UI freeze or slowing down at this moment, but Eclipse will become slow later on because this job runs continuously.
Basically, EGit will significantly slow down Eclipse if there is a git repository(ies) containing a lot of untracked files.
I suppose this is the reason why EGit excludes directories like  "/", "C:\", "/home", "/home/username".

Please take a look at my comment at https://issues.jboss.org/browse/JBIDE-16379?focusedCommentId=13011208&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13011208
Comment 15 Andrey Loskutov CLA 2015-08-05 07:54:04 EDT
(In reply to Snjezana Peco from comment #14)

> > 2) Create "Huge" project
> 
> No. All you have to do is to select a large file system and initialize a git
> repository. You can try your ~/.m2 if you use Maven.

No, I will not do it for a simple reason: it is not reproducible, since no one can reproduce what Maven does.

Please give me an example which can be reproduced, something like I've attached to bug 474258.

> You don't need to use EGit at all, it is only necessary to have it installed.
> Most of the time is used to call File.exists (see the attached stack trace).
> There is no UI freeze or slowing down in this moment, but Eclipse will
> become slow later on because this job runs continuously.

This is what I cannot observe with the example from bug 474258 tuned to create 200000 files in the directory. Looking at the JBoss issue you refer to: can it be that the "infinite loop for m2e builder" is the root cause? Then for sure, if the builder is *building* something, those file system changes must also be reflected by EGit, hence the permanent checks for File.exists. I have no m2e installed, and it does not seem to be needed according to the preconditions you listed, but can it be that m2e is actually *required* to reproduce the problem?

> Please take a look at my comment at
> https://issues.jboss.org/browse/JBIDE-
> 16379?focusedCommentId=13011208&page=com.atlassian.jira.plugin.system.
> issuetabpanels:comment-tabpanel#comment-13011208

This does not add any details to what we already had here. Please correct me if I'm wrong and missed something.
Comment 16 Snjezana Peco CLA 2015-08-05 11:33:43 EDT
The issue can be reproduced without m2e or JBoss Tools, but not without EGit.
You need to choose a large directory. 
Start Eclipse SDK+EGit in a new workspace and create a project in that directory.
Using the two above mentioned steps, the issue can be reproduced every time with different versions of EGit.
Comment 17 Andrey Loskutov CLA 2015-08-05 14:46:03 EDT
Created attachment 255653
screenshot with no activity on large folder

(In reply to Snjezana Peco from comment #16)
> The issue can be reproduced without m2e or JBoss Tools, but can't without
> EGit.
> You need to choose a large directory.

Snjezana, please, can you be *concrete*?

Please give me a reproducible definition of "large". Is this "10000 files in one single directory"? Or "200000 files in 10 directories"? Or "2000 files in 2000 directories"? Or "5 GB in 10 files"? As said, I've tried with your wildfly repo example and the "200000 files in 10 directories" case and could not see anything strange. You can repeat my steps 1:1 because I've provided you all the data, but I can't repeat your steps because "large" is not repeatable.

So today I've also tried with my Linux lib64 directory (2.1 GB, a lot of symlinks, ~25000 files in ~1500 folders) with the same effect: it "just works". Is that "large" or not? You can see in the attached picture that there is no CPU/disk activity at all.
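
Figures like the ones above (size, file count, folder count) can be collected the same way on both sides so that "large" becomes comparable. A minimal Python sketch, assuming nothing about EGit; `measure_tree` is a hypothetical helper name:

```python
import os


def measure_tree(root):
    """Return (file_count, dir_count, total_bytes) for everything under root.

    Symbolic links are counted as entries but not followed, so a link loop
    cannot make the walk run forever.
    """
    files = dirs = total_bytes = 0
    for current, dirnames, filenames in os.walk(root, followlinks=False):
        dirs += len(dirnames)
        for name in filenames:
            files += 1
            try:
                total_bytes += os.lstat(os.path.join(current, name)).st_size
            except OSError:
                pass  # entry vanished or is unreadable; still counted as a file
    return files, dirs, total_bytes
```

Reporting the three numbers for the directory in question, together with whether it contains symlinks, would answer the "10000 files in one directory or 2000 files in 2000 directories" question directly.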

I *will* fix the issue but I can't see it!
Comment 18 Snjezana Peco CLA 2015-08-06 11:54:28 EDT
> Snjezana, please, can you be *concrete*?

I said I tested a directory containing 900000+ files and directories. It occupies 26GB.
It doesn't matter what directory you will choose, but it is easier to reproduce the issue if it is bigger.
Can you see the "Computing Git status for repository X ..." job in the Progress view?
When I create a git repository and an Eclipse project in the mentioned directory, EGit starts this job that lasts for hours.

The issue can also be reproduced on Windows. 
Attached is a screenshot. The following is a stack trace:

"Worker-3" #30 prio=5 os_prio=0 tid=0x000000000da69000 nid=0x1750 runnable [0x0000000022eff000]
   java.lang.Thread.State: RUNNABLE
	at java.io.WinNTFileSystem.getBooleanAttributes(Native Method)
	at java.io.File.exists(Unknown Source)
	at org.eclipse.jgit.treewalk.FileTreeIterator$FileEntry.<init>(FileTreeIterator.java:173)
	at org.eclipse.jgit.treewalk.FileTreeIterator.entries(FileTreeIterator.java:144)
	at org.eclipse.jgit.treewalk.FileTreeIterator.<init>(FileTreeIterator.java:129)
	at org.eclipse.egit.core.AdaptableFileTreeIterator.<init>(AdaptableFileTreeIterator.java:74)
	at org.eclipse.egit.core.AdaptableFileTreeIterator.createSubtreeIterator(AdaptableFileTreeIterator.java:85)
	at org.eclipse.jgit.treewalk.AbstractTreeIterator.createSubtreeIterator(AbstractTreeIterator.java:535)
	at org.eclipse.jgit.treewalk.TreeWalk.enterSubtree(TreeWalk.java:924)
	at org.eclipse.jgit.treewalk.TreeWalk.next(TreeWalk.java:578)
	at org.eclipse.jgit.lib.IndexDiff.diff(IndexDiff.java:434)
	at org.eclipse.egit.core.internal.indexdiff.IndexDiffCacheEntry.calcIndexDiffDataFull(IndexDiffCacheEntry.java:534)
	at org.eclipse.egit.core.internal.indexdiff.IndexDiffCacheEntry.access$6(IndexDiffCacheEntry.java:523)
	at org.eclipse.egit.core.internal.indexdiff.IndexDiffCacheEntry$4.run(IndexDiffCacheEntry.java:290)
	at org.eclipse.core.internal.jobs.Worker.run(Worker.java:55)
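
In this trace the worker thread is inside `File.exists`, called from the `FileTreeIterator$FileEntry` constructor while `IndexDiff.diff` walks into each subtree. Read that way, a full status computation does at least one file-system attribute lookup per working-tree entry, so its cost grows with the number of entries, tracked or not. A rough Python analogue of that access pattern follows; it is not JGit code, `scan_like_full_status` is a hypothetical name, and the real iterator does more work per entry:

```python
import os
import time


def scan_like_full_status(root):
    """Visit every entry below root once, doing one existence check per entry.

    Returns (entries_checked, elapsed_seconds). The .git directory is skipped,
    on the assumption that a working-tree walk does not descend into it.
    """
    checked = 0
    start = time.perf_counter()
    stack = [root]
    while stack:
        directory = stack.pop()
        try:
            names = os.listdir(directory)
        except OSError:
            continue  # unreadable directory: nothing to check below it
        for name in names:
            if name == ".git":
                continue
            path = os.path.join(directory, name)
            os.path.exists(path)  # the per-entry lookup seen in the trace
            checked += 1
            if os.path.isdir(path) and not os.path.islink(path):
                stack.append(path)
    return checked, time.perf_counter() - start
```

Timing this on the 900000-entry directory and on a small one gives a lower bound for how long one full scan can take on that disk, independent of Eclipse.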
Comment 19 Snjezana Peco CLA 2015-08-06 11:56:25 EDT
Created attachment 255677
a screenshot
Comment 20 Max Rydahl Andersen CLA 2015-08-09 10:48:19 EDT
Snjezana, I think you must be missing some step since even I cannot reproduce it. 

As requested by Andrey, let's be *concrete*, as in list *all* the steps, not just "create project in that directory". You also mention wildfly repos and symbolic links, but none of these seem to be used/relevant in your steps?

Here is what I read from your comments:

1. Have a filesystem with a lot of files (>10000) like the root of your filesystem. Let's call that /bigfs

2. then do this:
  $ cd /bigfs
  $ git init

3. go into eclipse and create a generic eclipse project with filesystem location /bigfs

4. After clicking finish, egit will autodetect /bigfs/.git and freeze.
 

Is that what you see *exactly*?

Are you sure that this large filesystem does not need to have something present in order to fail, i.e. other nested .git projects or similar that could cause some confusion?
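
Steps 1 and 2 above can be made repeatable with a small generator script. A minimal Python sketch, assuming a synthetic tree of empty files is an acceptable stand-in for /bigfs (which this thread does not confirm); `make_big_tree` and its parameters are hypothetical names:

```python
import shutil
import subprocess
from pathlib import Path


def make_big_tree(root, dirs=10, files_per_dir=1000, git_init=True):
    """Create dirs * files_per_dir empty files under root, then optionally `git init`.

    Nothing is added or committed, so every file stays untracked.
    Returns the number of files created.
    """
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)
    created = 0
    for d in range(dirs):
        sub = root / f"dir{d:05d}"
        sub.mkdir(exist_ok=True)
        for f in range(files_per_dir):
            (sub / f"file{f:06d}.txt").touch()
            created += 1
    # Only initialize the repository when a git executable is available.
    if git_init and shutil.which("git"):
        subprocess.run(["git", "init"], cwd=root, check=True)
    return created
```

Calling `make_big_tree("/tmp/bigfs", dirs=900, files_per_dir=1000)` would give roughly the 900000 entries mentioned in comment 8. Step 3, creating the Eclipse project at that location, still has to be done by hand.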
Comment 21 Snjezana Peco CLA 2015-08-10 10:34:45 EDT
(In reply to Max Rydahl Andersen from comment #20)
> Snjezana, I think you must be missing some step since even I cannot
> reproduce it. 
> 
> As requested by Andrey lets be *concrete* as in list *all* the steps, not
> just "create project in that directory". You also mention wildfly repos and
> symbolic links - but none of these seem to be used/relevant in your steps ?
> 
> Here is what I read from your comments:
> 
> 1. Have a filesystem with a lot of files (>10000) like the root of your
> filesystem. Lets call that /bigfs
> 
> 2. then do this:
>   $ cd /bigfs
>   $ git init
> 
> 3. go into eclipse and create a generic eclipse project with filesystem
> location /bigfs
> 
> 4. After clicking finish, egit will autodetect /bigs/.git and freeze.
>  
> 
> Is that what you see *exactly* ? 
> 

Steps 1, 2 and 3 are correct, but I have never said they will freeze Eclipse. 
See comment #14

>  There is no UI freeze or slowing down in this moment, but Eclipse will become slow later on because this job runs continuously.
> Basically, EGit will significantly slow down Eclipse if there is a git repository(ies) containing a lot of untracked files.

Steps 1-3 cause EGit to autodetect /bigfs/.git and create the "Computing Git status for repository X ..." job that lasts for hours. This is reproducible.

Try steps 1-3, import wildfly, build, rebuild wildfly ... It is also important that you use /bigfs ...
You will certainly notice a significant slowdown.
It can sometimes cause a lock (that isn't a UI freeze, but Eclipse becomes unresponsive).
Step 4 isn't always reproducible.

> Are you sure that this large filesystem does not have to have something
> present to fail ? i..e like nested other .git projects or similar that could
> cause some confusion ?

Not sure. I have tested using an old backup that has .git directory only in the root. As I can see the issue (slowing down) appears when a git repository contains a lot of untracked files.
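
The number of untracked files can be measured from outside Eclipse with command-line git. A minimal Python sketch, assuming `git` is on the PATH; both function names are hypothetical, and on a very large tree the command itself may take a while:

```python
import subprocess


def count_untracked(porcelain_output):
    """Count entries that `git status --porcelain` marks as untracked ("??")."""
    return sum(1 for line in porcelain_output.splitlines() if line.startswith("?? "))


def untracked_in_repo(repo_dir):
    """Run git in repo_dir and count untracked files individually.

    --untracked-files=all lists each file instead of collapsing a wholly
    untracked directory into a single entry.
    """
    result = subprocess.run(
        ["git", "status", "--porcelain", "--untracked-files=all"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return count_untracked(result.stdout)
```

Attaching that count together with the directory layout would make the "a lot of untracked files" condition concrete.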


EGit has the "Refresh only when workspace is active" property (Window > Preferences > Team > Git). A user can have EGit refresh resources always or only when a workspace is active.
My suggestion is to add a property that would enable us to disable refreshing resources even if a workspace is active.
Comment 22 Eclipse Genie CLA 2015-09-12 18:27:29 EDT
New Gerrit change created: https://git.eclipse.org/r/55805
Comment 23 Eclipse Genie CLA 2015-09-13 19:13:35 EDT
Gerrit change https://git.eclipse.org/r/53105 was merged to [master].
Commit: http://git.eclipse.org/c/egit/egit.git/commit/?id=b2055bf5822dceb006fad020d6fe5ea2042ec3d7
Comment 24 Eclipse Genie CLA 2015-09-13 19:13:41 EDT
Gerrit change https://git.eclipse.org/r/55805 was merged to [master].
Commit: http://git.eclipse.org/c/egit/egit.git/commit/?id=dd2adb23465d983616ffe9ab121f4d78d32f984a
Comment 25 Matthias Sohn CLA 2015-09-13 19:15:41 EDT
submitted
Comment 26 Andrey Loskutov CLA 2015-09-14 00:48:45 EDT
We fixed the part with the staging view, but since we cannot observe the original issue we have no idea what else we should fix.

Feel free to reopen, but please provide a reproducible example with clear steps to reproduce, ideally by attaching a simple standalone example or script which can create or download the "big" filesystem. I would love to fix it, really.