Bug 496885 - Github IP validation incorrectly fails on merge commit
Summary: Github IP validation incorrectly fails on merge commit
Status: CLOSED MOVED
Alias: None
Product: Community
Classification: Eclipse Foundation
Component: GitHub (show other bugs)
Version: unspecified   Edit
Hardware: PC Mac OS X
: P2 critical (vote)
Target Milestone: ---   Edit
Assignee: Eclipse Web CLA
QA Contact:
URL:
Whiteboard:
Keywords:
: 545305 547153 (view as bug list)
Depends on: 540694
Blocks: 552628
  Show dependency tree
 
Reported: 2016-06-27 19:06 EDT by Jeen Broekstra CLA
Modified: 2021-12-23 06:42 EST (History)
8 users (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Jeen Broekstra CLA 2016-06-27 19:06:17 EDT
A Github pull request for the RDF4J project currently has a failed IP validation check that I think is incorrect.

The PR in question: https://github.com/eclipse/rdf4j/pull/198

The PR contains a commit by a non-member, but he does have signed CLA and the initial merge of his contribution to the source branch went without problems (see https://github.com/eclipse/rdf4j/pull/190). 

The error message is as follows:

 "The following users do not have valid CLA. Please see this page for details, 
  or to sign a CLA.

    noreply@github.com " 

and is lodged against the merge-commit that happend when I accepted the non-member's PR (see https://github.com/eclipse/rdf4j/pull/198/commits ). 

Now this is odd as the actual author of that merge-commit according to Github (as well as my own git client) is me, not 'noreply@github.com'.
Comment 1 Jeen Broekstra CLA 2016-06-30 00:18:48 EDT
Sorry to push, but can someone take a quick look? Development on these branches continues, and I'm worried that if it turns out we _did_ make a mistake in signing off after all, we will have a lot of trouble correcting the problem after the fact.
Comment 2 Jeen Broekstra CLA 2016-06-30 21:55:04 EDT
I have noticed that the specific merge commits on which this is an issue have me as the author, but noreply@github.com as the committer. for example:

   Commit: b09c8a7788363a8a9fb9253cce28e248c71a9f73 [b09c8a7]
  Parents: 66fa34b146, ff1d1d5709
   Author: Jeen Broekstra <jeen.broekstra@gmail.com>
     Date: 28 June 2016 at 08:27:09 NZST
Committer: GitHub <noreply@github.com>

I have no idea why GitHub suddenly sets this value for the committer. To be clear: this merge commit was done via the GitHub web site, by me merging a Pull Request. I looked at older merges, in all older cases I could find the Committer field was simply not set. 

Could it be that GitHub changed its behavior recently, and that this is now tripping up the validation script?
Comment 3 Jeen Broekstra CLA 2016-07-04 21:14:37 EDT
I sent a support request to github.com about the issue and got this as a reply:

"Sorry for the trouble.

This is something that we have recently added for web based actioned.

If a commit is merged using GitHub.com the committer is shown as GitHub.

There isn't currently a way to disable this but I can certainly pass your feedback onto the team to consider. "

So it seems that this is a new behavior in Github, and the IP validation script  needs to be updated to take this into account.
Comment 4 Jeen Broekstra CLA 2016-07-05 01:14:04 EDT
Upping to critical severity - it's currently holding up our next milestone build. 

As an aside: I _think_ I can merge these PRs regardless of the validation script. Given that this is now clearly a technical problem in the script and not an actual IP ownership problem, I will proceed to do that soon, unless I hear something to the contrary.
Comment 5 Eclipse Webmaster CLA 2016-07-05 11:58:47 EDT
Adding Wayne and Denis.

-M.
Comment 6 Wayne Beaton CLA 2016-07-05 12:08:07 EDT
I looks like our validation checker is getting tripped up by a change in behaviour in the GitHub API.

AFAICT, the two contributors in the pull request have valid CLAs and the comments are all correctly signed off. If you agree, just ignore the warning and merge the pull request.

Let's leave this bug open to ensure that the issue is addressed.

I recommend that we just ignore the "Commit" field and focus exclusively on the "Author" field when checking for the CLA. This is what we do with the Gerrit plug-in.
Comment 7 Jeen Broekstra CLA 2016-07-20 16:30:05 EDT
Problem persists and is getting more pressing. It has now also failed on a PR by a new contributor, with the same message ('noreply@github.com' not recognized). 

Because it fails at that point, I now can't easily verify that the contributor has signed the CLA.
Comment 8 Wayne Beaton CLA 2016-07-20 23:29:02 EDT
(In reply to Jeen Broekstra from comment #7)
> Because it fails at that point, I now can't easily verify that the
> author has signed the CLA.

The workaround while we sort this out is to check for the CLA yourself. There's a "validate" link on the CLA page [1]. The GitHub hook won't stop you from pushing if you believe that everything is in order. 

[1] https://www.eclipse.org/legal/CLA.php
Comment 9 Jeen Broekstra CLA 2016-11-03 20:49:04 EDT
I'm not sure if something changed at Github's end or if the validator was tweaked, but I haven't seen this problem happen in a while now. If you folks are happy that this has been solved, this can be closed AFAIC.
Comment 10 Jeen Broekstra CLA 2016-11-16 19:47:22 EST
Noticed that problem still occassionally occurs, after all - seems to happen specifically when a user edits something directly through the GitHub web interface.
Comment 11 Jonah Graham CLA 2017-03-24 06:11:35 EDT
This is affecting the Eclipse January project too. We set up GitHub so the PRs could not be merged in without passing ip-validation, but now we are faced with PRs that are essentially passing validation according to this bug report (see Comment 6) but we can't merge them.

For now we have asked contributors to manually rebase and force-push their PRs in this situation to remove the offending merge commit.
Comment 12 Christopher Guindon CLA 2019-02-22 11:55:14 EST
We are currently working a new version of the IP validation hook but it dosen't this problem:
Bug 540694 - Github IP validation needs to be more robust

The following is an excerpt of the Github webhook payload of a merge commit:

    "commit": {
      "author": {
        "name": "Anonymous user",
        "email": "test@test.org",
        "date": "2019-02-22T15:25:45Z"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "date": "2019-02-22T15:25:45Z"
      },
      "message": "Merge branch 'master' into test/master/more_menu",
     }


The IP validation hook will fail if a contributor (author) does not include the sign-off footer for each commit. In this instance, "message" does not include the sign-off-by footer: 

"Merge branch 'master' into test/master/more_menu"

Committers on a project won't have this problem since the sign-off footer is not required for them.

As far as I can't tell, the Github Webhook payload does not indicate that a commit is a merge commit. The only way that I could potentially solve this is by making some assumptions with the "message" value but that won't be 100% accurate.

Here's an example of a merge commit:
https://api.github.com/repos/EclipseFdn/hugo-solstice-theme/git/commits/18fa727298e05c3bd210bb5b5bc486e9e49e733e
Comment 13 Jeen Broekstra CLA 2019-03-12 10:02:33 EDT
*** Bug 545305 has been marked as a duplicate of this bug. ***
Comment 14 Jeen Broekstra CLA 2019-03-12 10:04:58 EDT
    Actually, as a committer, I am also running into this (or a very similar) issue.

    See for example this PR https://github.com/eclipse/rdf4j/pull/1341

    The ECA validation fails on the last commit in that PR (which is a merge commit caused by me merging another pull request, earlier). The details give the following notification:

        Anonymous (jeenbroekstra@****.noreply) is not covered by the necessary legal agreements in order to proceed.


    Output from `git log` for that merge commit is this:

        commit de2f9de7d328897977aaaf6d2ce5d735d58cadf8 (HEAD -> master,  origin/master, origin/HEAD)
        Merge: 836f8ba0 b564b240
        Author: Jeen Broekstra <jeenbroekstra@users.noreply.github.com>
        Date:   Wed Mar 13 00:44:43 2019 +110
Comment 15 Jeen Broekstra CLA 2019-03-12 10:21:32 EDT
> As far as I can't tell, the Github Webhook payload does not indicate that a commit is a merge commit. 

I'm no expert but isn't the fact that a commit has two parent entries instead of one an indication that it is a merge commit?
Comment 16 Christopher Guindon CLA 2019-03-12 11:27:49 EDT
(In reply to Jeen Broekstra from comment #15)
> > As far as I can't tell, the Github Webhook payload does not indicate that a commit is a merge commit. 
> 
> I'm no expert but isn't the fact that a commit has two parent entries
> instead of one an indication that it is a merge commit?

It seems that your solution could work. Here's what I am working with:
https://api.github.com/repos/eclipse/rdf4j/pulls/1341/commits

I do not consider myself an expert, I am not sure if two parent entries always === a merge commit.

I could also lookup users via their github id:

"email": "jeenbroekstra@users.noreply.github.com",

If such email is used, I could verify if jeenbroekstra is associated with an Eclipse Account and validate if we have all the legal documents to continue.

This would solve this problem for committers but not contributors since we require the sign-off footer for every commit made by a contributor.
Comment 17 Christopher Guindon CLA 2019-03-18 09:48:56 EDT
There is two issues issue here: 

1. Github will use an email address which is unknown to us:
github_id@users.noreply.github.com

The first part of the @users.noreply.github.com email address seems to be the Github username. We might be able to track the user if the Github ID is registered to an account on accounts.eclipse.org. 

2. The sign-off-by footer is not included for contributors.


@Wayne, do you think we should ignore merge commits?
Comment 18 Christopher Guindon CLA 2019-03-29 13:38:46 EDT
(In reply to Christopher Guindon from comment #17)
> There is two issues issue here: 
> 
> 1. Github will use an email address which is unknown to us:
> github_id@users.noreply.github.com
> 
> The first part of the @users.noreply.github.com email address seems to be
> the Github username. We might be able to track the user if the Github ID is
> registered to an account on accounts.eclipse.org. 
> 
> 2. The sign-off-by footer is not included for contributors.
> 
> 
> @Wayne, do you think we should ignore merge commits?

I spoke to Wayne about this and we decided the following:

1. We will try to match github_id@users.noreply.github.com email address with the Github id we have on file.

2. We will stop validating merge commits

With this said, the validation will fail if a commit is done by a contributor without a sign-off-by footer in the commit message since it's a requirement.
Comment 19 Christopher Guindon CLA 2019-05-13 09:21:34 EDT
*** Bug 547153 has been marked as a duplicate of this bug. ***
Comment 20 Christopher Guindon CLA 2019-08-02 08:02:31 EDT
I am upping the priority of this bug. This issue is occurring more often. I am aiming to fix this issue by the end of this quarter.
Comment 21 Martin Monperrus CLA 2019-09-26 05:34:58 EDT
This is important for us https://github.com/eclipse/repairnator/ where we have a process 100% on Gituhb.
Comment 22 Jan Supol CLA 2019-09-27 11:38:01 EDT
We would appreciate this to have working on our https://github.com/eclipse-ee4j/jersey project, too, as it would allow us to use GitHub tools for modifying PR, solving merge conflicts, etc.
Comment 23 Christopher Guindon CLA 2019-09-30 15:53:04 EDT
(In reply to Jan Supol from comment #22)
> We would appreciate this to have working on our
> https://github.com/eclipse-ee4j/jersey project, too, as it would allow us to
> use GitHub tools for modifying PR, solving merge conflicts, etc.

We have  patch under code review for this. 

The code is making 2 significant changes in an effort to address this issue:

1. We will try to match an Eclipse Account from a @users.noreply.github.com email address.

2. We will stop validating commits with 2 parents. 

From what I understand, a commit with 2 parent commits is the result of merging one branch to another in the way of "true merge".

If this is incorrect, please let us know.
Comment 24 Christopher Guindon CLA 2019-09-30 17:37:51 EDT
(In reply to Christopher Guindon from comment #23)
> (In reply to Jan Supol from comment #22)
> > We would appreciate this to have working on our
> > https://github.com/eclipse-ee4j/jersey project, too, as it would allow us to
> > use GitHub tools for modifying PR, solving merge conflicts, etc.
> 
> We have  patch under code review for this. 
> 
> The code is making 2 significant changes in an effort to address this issue:
> 
> 1. We will try to match an Eclipse Account from a @users.noreply.github.com
> email address.
> 
> 2. We will stop validating commits with 2 parents. 
> 
> From what I understand, a commit with 2 parent commits is the result of
> merging one branch to another in the way of "true merge".
> 
> If this is incorrect, please let us know.

This is now live!

For example, the following validation is now passing:
https://accounts.eclipse.org/legal/eca/validation/14613

This pr was done with a github.noreply email and we were able link that email address with an Eclipse account with a valid ECA on file.

Regarding, merge commits:

https://accounts.eclipse.org/legal/eca/validation/6421

This PR has a merge commit and the validator is no longer requiring the author to include a sign-off-by.

The validation is still failing here because the author of the first commit is not a valid Eclipse account:

https://api.github.com/repos/eclipse/omr/pulls/4001/commits

I am closing this bug but please re-open if you are experiencing issues with this update to our Github ECA app!
Comment 25 Jonah Graham CLA 2019-11-01 16:13:06 EDT
The "Tips" on a failed ECA validation says this is still in progress:

 Tips
Eclipse Foundation does not currently support changes made with GitHub no-reply e-mail addresses. Support for these e-mail addresses is currently in progress. More information can be found in bug 496885.
Additional information on contributing to Eclipse projects can be found on the Eclipse wiki under Development Resources/Handling Git Contributions, or under Development Resources/Contributing via Git for new committers.

e.g. https://accounts.eclipse.org/legal/eca/validation/18130
Comment 26 Christopher Guindon CLA 2019-11-01 16:17:22 EDT
(In reply to Jonah Graham from comment #25)
> The "Tips" on a failed ECA validation says this is still in progress:
> 
>  Tips
> Eclipse Foundation does not currently support changes made with GitHub
> no-reply e-mail addresses. Support for these e-mail addresses is currently
> in progress. More information can be found in bug 496885.
> Additional information on contributing to Eclipse projects can be found on
> the Eclipse wiki under Development Resources/Handling Git Contributions, or
> under Development Resources/Contributing via Git for new committers.
> 
> e.g. https://accounts.eclipse.org/legal/eca/validation/18130

Thanks for reporting this, we will need to remove that error message. This validation seems to have failed because we were unable to link the github id to an Eclipse Account.
Comment 27 Erik Boasson CLA 2020-03-06 04:48:27 EST
Re-opening based on the final sentence of comment #24:

> I am closing this bug but please re-open if you are experiencing issues with this update to our Github ECA app!

I have a PR at https://github.com/eclipse-cyclonedds/cyclonedds/pull/424 that fails the IP validation check.  The failure is caused by either or both of the two commits that have a GitHub user address as author

  Dan Rose (rotu@****.noreply)

but that are signed off with his actual email address.  I know the user name and address do represent the same individual, and the ECA validation succeeds for his address (as evidenced by other commits in the same PR).

Those two commits originate in two PRs:

  https://github.com/eclipse-cyclonedds/cyclonedds/pull/408
  https://github.com/eclipse-cyclonedds/cyclonedds/pull/413

both of which passed the ECA validation (and were merged by me), and so, in short, it looks like the "GitHub noreply author rewrite" problem is rearing its head again.

While there is no great urgency in merging this particular PR, I can't delay it indefinitely and so I suspect I may have to go ahead and merge it while stating explicitly that all is well based on the historical record.  But perhaps it is a triviality and it can be solved quickly.
Comment 28 Eclipse Webmaster CLA 2020-03-06 08:15:06 EST
Updating assignee.

-M.
Comment 29 Christopher Guindon CLA 2020-03-06 09:28:05 EST
Erik,

In an effort to make sure I understand the issue:

Dan made a commit using his xxxxxx@users.noreply.github.com email but used xxxxxx@digilabs.io in his sign-off msg?

https://api.github.com/repos/eclipse-cyclonedds/cyclonedds/pulls/424/commits
Comment 30 Erik Boasson CLA 2020-03-06 09:35:59 EST
Hi Christopher,

Thanks for looking into it so quickly!

I double-checked the two original PRs and the commits in those consistently use the digilabs.io address.  I checked to be sure, but no doubt they would have been flagged by the ECA validation otherwise.

They must have been rewritten by GitHub as a consequence of pressing the rebase/squash/merge button.
Comment 31 Christopher Guindon CLA 2020-03-06 10:55:56 EST
(In reply to Erik Boasson from comment #30)
> Hi Christopher,
> 
> Thanks for looking into it so quickly!
> 
> I double-checked the two original PRs and the commits in those consistently
> use the digilabs.io address.  I checked to be sure, but no doubt they would
> have been flagged by the ECA validation otherwise.
> 
> They must have been rewritten by GitHub as a consequence of pressing the
> rebase/squash/merge button.

Thanks for the info! 

If we deem that this is a problem, we should allow contributors to use their xxxxxx@users.noreply.github.com email as the commit_athor_email but allow them use their eclipse email address in the sign-off-by footer.

If I am not mistaken, the ECA validation assume that the email in the sign-off-by footer will match the commit_athor_email value.

Wayne, do you have any concerns with this change from an IP perspective?
Comment 32 Erik Boasson CLA 2020-03-17 05:23:23 EDT
Hi Christopher and Wayne,

> If we deem that this is a problem, we should allow contributors to use their
>  xxxxxx@users.noreply.github.com email as the commit_athor_email but allow
> them use their eclipse email address in the sign-off-by footer.

It is a bit of a problem if merging only PRs that pass the ECA validation leads to having a branch that then fails ECA validation if one wants to merge it (as is the case here). One doesn't know in advance, and one can't reasonably back out ...

> If I am not mistaken, the ECA validation assume that the email in the sign-off-by
> footer will match the commit_athor_email value.
>
> Wayne, do you have any concerns with this change from an IP perspective?

An alternative would be to accept an ECA validation failure caused by a commit originating in commit(s) that originally did pass the ECA validation.

(It does make me wonder what happens to historical commits if someone withdraws an ECA? -- only a vaguely related question, of course.)

In any case, I've been sitting on this merge request to update a long-running feature branch for 10 days or so and it is now a much more urgent issue.
Comment 33 Wayne Beaton CLA 2020-03-17 12:23:25 EDT
> Wayne, do you have any concerns with this change from an IP perspective?

As long as we can track a commit back to the author, we should be good-to-go from an IP perspective.

I do not like the prospect of GitHub changing author credentials. Can anybody confirm that this behaviour is actually observed?

I'm also not at all fussed about merge commits that contain no actual intellectual property (if that helps).
Comment 34 Erik Boasson CLA 2020-03-18 07:21:38 EDT
Thanks Wayne, that helps a lot.
Comment 35 Jeen Broekstra CLA 2020-10-07 18:31:18 EDT
Hi Wayne,

Thanks for your input on this. I can confirm that this is definitely the case. 

I have a recent example of a third party contributor on our repo. They put up a pull request, and all their commits were properly signed off, and the ECA check script accepted it. The pull request in question is here: https://github.com/eclipse/rdf4j/pull/2571. 

That pull request was then merged into our master branch using 'squash and merge'. When Github does this, it creates a new commit on the master branch, but sometimes (if the user has configured their Github account to keep their e-mail address private) this sets the Author field to a 'noreply' address. 

In this case, the git log for the squash-and-merge commit looks like this:

commit 219ed1d21626f7b8e2e903cdd945826e2baaccdf
Author: AB <6105462+knoan@users.noreply.github.com>
Date:   Wed Oct 7 02:13:48 2020 +0200

    Specify Triple.equals()/.hashCode() implementation (#2571)
    
    * GH-2564 Specify Triple.equals()/.hashCode() implementation
    
    Signed-off-by: Alessandro Bollini <22@metreeca.com>
    
    * GH-2564 Turn warning on Triple into an @implNote
    
    Signed-off-by: Alessandro Bollini <22@metreeca.com>


Note the address in the Author field. This has (I suspect) happened because the author has "keep email address private" enabled in their personal github profile. However, this is not something I, as a maintainer, can verify in advance before I click the merge button. 

I have also found some background discussion on a change where apparently github uses a user's _default_ author details on the author field in a squash-merge, rather than whatever author details they have configured for the specific repository. See https://github.com/isaacs/github/issues/1368 for an extended discsusion.
Comment 36 Frederic Gurr CLA 2021-12-23 06:42:37 EST
This issue has been migrated to https://gitlab.eclipse.org/eclipsefdn/helpdesk/-/issues/255.