Skip to main content

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] [List Home]
[jgit-dev] Fix EOL issue using .gitattributes and git config without performance drawback

As requested from Christian, here the comments fort he mailing list.
I hope these comments can help finding a good solution for the eol issue.
Whenever I can help, I will be happy to propose parts of a possible solution.

--

this patch adds basic attribute support as it's name says.
Therefore it looks good, but is no solution for the real issue eol, just a small step towards it.
That's ok.

However there are several questions that arise. 
These are not blocking this commit, but should partly be addressed
in order to have a good base to continue.

1) What is the transaction boundary? Currently a new AttributesNodeProvider MAY be created 
when needed. The effective global and info nodes are then cached/freezed on exactly that version when created on the working tree.
On the other hand a call to FileRepository.getConfig always checks for repo config changes using FileSnapshot.
The transaction therefore uses the life repo config and a potentially dirty attributes/ignore nodes config.
There are two options: real-time config data or freeze time config using a transactional concept (not JTA, just the theory of it).
What sense does it make, that the git config changes while a checkout is done? none.

My recommendation is the same as in the bug description: Use the git commands as the transaction boundary.
Therefore use the good practice pattern of the Callables already in jgit and delegate the current call to a callImpl in subclasses.
This allows for per-command refresh of whatever state of the repository. Even ThreadLocals.
Then the git config and attributes/ignore should only be refreshed before and/or after a git command.
(An exception is of course the instant ignore/attributes update while doing the treewalk)

When doing the realtime data strategy, what happens is that every call to getAttribute for every file in the repo will HAVE TO call
getConfig()....getAutoCRLF() and also getAttributes(path,...) and check on all involved .gitattributes files if they have changed.
This leads to very many filestat and may even though produce corrupt results due a just missed change in the filesystem while processing the working tree in a certain order.

For attribute macro support i can imagine that the AttributesNodeProvider can be the starting point.
But again, what is the concept here? Will every getAttributes calculation for every file in the repo create a new 
AttributesNodeProvider using repo.newAttributesNodeProvider or is there some freeze point?
If the former is the case then every attributes calculation on a path will have to create a new macro resolution context, which in some projects
may be rather expensive. Again why should that macro context change during a git command? (and again excluding the case when the treewalker detects a new .gitattribute/.gitignore on its first traversal).

Last but not least I think this patch should be committed in order to get ahead with the eol issue afterwards.
Do you want me to create a patch on top of Arthurs? using per git command freeze and performance caching?

What can always be of use are my tests EndOfLineRepositoryTest and AttributesHierarchyTest which are RepositoryTestCases and try to test the git logic regarding eol, attributes and macros.


Regards
Ivan

--

-----Ursprüngliche Nachricht-----
Von: Christian Halstrick (Code Review) [mailto:gerrit@xxxxxxxxxxx] 
Gesendet: Freitag, 6. November 2015 09:58
An: Matthias Sohn; Arthur Daussy
Cc: Christian Halstrick; Chris Aniszczyk; Eclipse Genie; Laurent Delaigue; Ivan Motsch
Betreff: Change 35377 in jgit/jgit[master]: Adds the git attributes computation on the treewalk

Christian Halstrick has posted comments on this change.

Change subject: Adds the git attributes computation on the treewalk ......................................................................


Patch Set 18:

(1 comment)

https://git.eclipse.org/r/#/c/35377/18/org.eclipse.jgit/src/org/eclipse/jgit/internal/storage/file/FileRepository.java
File org.eclipse.jgit/src/org/eclipse/jgit/internal/storage/file/FileRepository.java:

Line 499: 	static class AttributesNodeProviderImpl implements
> this patch adds basic attribute support as it's name says.
Ok, got your points. But since the "transactional" issues are different from the change here I don't want to mix it. I would suggest we discuss on the bug or even better on the jgit mailing list jgit-dev@xxxxxxxxxxx. Could you forward your comments to that list? Same text as here, just another forum to discuss.

Regarding the relevance to this change: have you noticed that this change doesn't have many problems with excessive config reads. In a 50.000 files repo the whole java process (including starting java,reading jar files) needs 64.000 io calls to compute a jgit status. We are mainly doing one stat per working tree file and we are not touching the config per once per working tree file.

Attachment: smime.p7s
Description: S/MIME cryptographic signature


Back to the top