Bug 303756 - Divorce Submission System from Foundation DB
Summary: Divorce Submission System from Foundation DB
Status: RESOLVED FIXED
Alias: None
Product: z_Archived
Classification: Eclipse Foundation
Component: Dash Submission System (show other bugs)
Version: unspecified   Edit
Hardware: PC Windows XP
: P4 normal (vote)
Target Milestone: ---   Edit
Assignee: Denis Roy CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on: 324277
Blocks:
  Show dependency tree
 
Reported: 2010-02-24 09:15 EST by Sharon Corbett CLA
Modified: 2013-02-25 16:01 EST (History)
6 users (show)

See Also:


Attachments
Early Patch (82.23 KB, patch)
2011-02-02 16:59 EST, Denis Roy CLA
no flags Details | Diff
Early Patch v2 (107.93 KB, patch)
2011-02-03 15:10 EST, Denis Roy CLA
no flags Details | Diff
Screenshot (61.45 KB, image/jpeg)
2011-02-03 15:12 EST, Denis Roy CLA
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Sharon Corbett CLA 2010-02-24 09:15:25 EST
I'd like to understand if the Submission System is still inputting entries into the Foundation DB.

The records the Submission System has created in the past is causing the DB to have duplicate records for those folks that become committers at a later date as all new committers must complete the online committer questionnaire which populates a DB record for them.  

This further complicates the issue of committer provisioning as the Portal doesn't always understand which record is to be used for the committer account.  When this happens, IP Team must override date in the Portal with information to get the correct record updated.  Webmaster than has the chore to determine which record is the correct record for the committer account. We also believe this is further causing some bugs to appear with respect to committer records in the DB.  As the Foundation DB is really the "bible" for committer records; there shouldn't be duplicate records for folks.

If the Submission System is no longer functioning, then perhaps Webmaster can run a script again on the duplicates and delete accordingly.

If the Submission System is still functioning or if it will in the future, perhaps we can figure out a way to avoid having duplicate records in the system.

Thanks,
Sharon
Comment 1 Wayne Beaton CLA 2010-02-24 11:00:33 EST
Gabe, can you help out here?
Comment 2 Gabe O'Brien CLA 2010-02-24 13:56:07 EST
The Submissions System does enter peoples name and email addresses into the Foundation DB.  It has been doing this from the very begging of it existence.  The idea was that we would have more people in the Foundation DB.

The issue with duplicates boils down to people having multiple emails address (and Bugzilla accounts).  When the Submission System looks up an email address and it can't find it, it make a new record in the Foundation DB.

There was some code started to allow committers to link all their email address together.  I am not sure what the state of that code, but I can try and find out what happen to it.

I can't think an automagical way we can handle the duplicated in the Foundation DB.  It would be possible to create a page where the duplicates could be found and removed by a human however.

It should also be possible to extend the Submission System to handle the duplicates better or to not insert new records into the Foundation DB at all.
Comment 3 Sharon Corbett CLA 2010-02-25 09:29:36 EST
Thanks for the information Gabe.  

I've included Webmaster on this bug as they may have some suggestions and/or know about the previous linking code you are referring to as well.

Even if this issue wasn't causing problems with committer provisioning, I still think it incorrect to have a system hold duplicate records for one person or one entity.  It just sets up for problems.
For example; 

Submission System creates record for droy

Questionnaire creates record for committer droy1

Portal becomes confused about which record is which.  Manual intervention required by IP Team and Webmaster in order to provision committer...

Webmaster:  Do you have anything you care to add here.

Regards,
Sharon
Comment 4 Sharon Corbett CLA 2010-02-25 09:33:29 EST
Infrazilla bug 532 also caused by submission system.  Thanks Denis for fixing, I was really concerned about that one (https://foundation.eclipse.org/infrazilla/show_bug.cgi?id=532).

Sharon
Comment 5 Denis Roy CLA 2010-02-25 16:35:37 EST
> Submission System creates record for droy
> 
> Questionnaire creates record for committer droy1


I think that a userid collision should be our first red flag.  Just adding a '1' at the end and ending it there is not good.  For instances like this, could a simple email be sent out to indicate a collision?  This would allow us to intervene manually and delete the dupe, update email addresses, etc.
Comment 6 Sharon Corbett CLA 2010-03-01 12:14:52 EST
Check out the saved SQL query in the DB entitled  "People created by Submissions System".  You'll soon notice that even the Submission System is creating duplicates - some even created on the same day....
Comment 7 Gabe O'Brien CLA 2010-03-01 12:50:24 EST
I looked at the foundation DB web interface and couldn't see any saved queries.
Comment 8 Denis Roy CLA 2010-03-01 13:00:40 EST
(In reply to comment #6)
> Check out the saved SQL query in the DB entitled  "People created by
> Submissions System".  You'll soon notice that even the Submission System is
> creating duplicates - some even created on the same day....

Sharon, saved queries are your own ... Can you just paste the SQL here?
Comment 9 Sharon Corbett CLA 2010-03-01 13:03:33 EST
my bad...I guess only I can see it.

Here is the query....

select P.PersonID, P.FName, P.LName, P.EMail from People AS P Where P.Comments LIKE "%Submissions System%"

Thanks,
Sharon
Comment 10 Gabe O'Brien CLA 2010-03-01 13:11:28 EST
Just as a note non of these people seemed to have the same email address.  I added 'order by email' to the query to make finding duplicate email addresses easier.

select P.PersonID, P.FName, P.LName, P.EMail from People AS P Where P.Comments
LIKE "%Submissions System%" order by EMail
Comment 11 Gabe O'Brien CLA 2010-03-01 13:38:51 EST
To help prune this list down I can cross reference this list with the submission system database.  I can look for any of these people who don't have any talks proposed for any of the past conferences (and econ2010).  These people could then be removed from the foundation DB (provided that don't have any other reason to be in the DB).
Comment 12 Sharon Corbett CLA 2010-03-04 10:31:12 EST
As many parts of the organization depend on the Database, the scope of this bug is being expanded to include membership's requirement as well.  

A meeting/call will take place later in March (following EclipseCon) to discuss the DB for moving forward with a review to resolving issues being experienced.

Thanks,
Sharon
Comment 13 Sharon Corbett CLA 2010-07-19 16:22:13 EDT
Denis and I have discussed - this issue is currently being set on the back burner for now....
Comment 14 Denis Roy CLA 2010-07-19 16:24:26 EDT
Moving to the Submission System.
Comment 15 Denis Roy CLA 2010-08-20 17:04:34 EDT
For reference, here is where work needs to be done:

http://dev.eclipse.org/viewcvs/index.cgi/org.eclipse.dash/submissions/classes/user.class.php?root=Technology_Project
Comment 16 Denis Roy CLA 2010-09-02 10:21:30 EDT
If we change the submission system as I propose in bug 303756 comment 1, this problem will go away.

I see no real reason to 'pollute' the Foundation DB with entries for People who simply want to create submissions, and we have already been centralizing on the Bugzilla login anyway.
Comment 17 Denis Roy CLA 2010-09-02 10:22:24 EDT
That reference should be bug 324277 comment 1.
Comment 18 Donald Smith CLA 2010-09-02 10:40:07 EDT
+1 -- I never liked the idea of putting EclipseCon Data into Foundation DB, and was told my view of what the FoundationDB should be for was too "narrow"...
Comment 19 Denis Roy CLA 2010-12-21 15:12:40 EST
In bug 324277 we're saying we should use the Bugzilla ID (an integer) as the primary means of identifying people instead of their email address (which changes in BZ but is hard to propagate) and instead of crafting up a user id (which is error prone).

To do that, we must divorce the Conference Submission System from our internal data.  This means the conference will refer to Bugzilla directly for user email addresses and names.
Comment 20 Denis Roy CLA 2011-01-31 15:34:28 EST
One thing to consider is the company affiliation of a submitter, which is currently fetched from live data in the Foundation's internal database.

https://www.eclipsecon.org/submissions/2011/view_talk.php?id=2393

We could still TRY and make the connection between the Bugzilla email address and some record in the Foundation DB, but that will certainly break for non-committers.

Or we can capture this in the bio pages and record it as a point-in-time value (ie, it would be stored with the user's data for that conference).

Does that even remotely make sense?
Comment 21 Denis Roy CLA 2011-02-01 16:28:43 EST
I am working on this now... (and bug 324277), so if we can keep changes to the Sub System to a minimum, that will make my life easier.
Comment 22 Denis Roy CLA 2011-02-01 21:27:15 EST
Progress is coming along nicely.  I can log in and create a submission without ever touching the Foundation DB.

The big work will be fixing all the references to PersonID ("droy") so that the integer BugzillaID is used instead.  The patch for this bug will be just massive.
Comment 23 Denis Roy CLA 2011-02-02 16:59:51 EST
Created attachment 188198 [details]
Early Patch

Made good progress today.  Just about all the "user" functionality is refactored, but it all needs tons of testing.  Tomorrow I'll focus on the conf queen/chair/pc functionality.

Here is a patch of what is done so far, in case anyone with a local copy wants to try it out.
Comment 24 Denis Roy CLA 2011-02-03 15:10:13 EST
Created attachment 188271 [details]
Early Patch v2

Didn't do as much progress as I would have liked today, but I started on the chair/pc stuff.  I'll continue today, but I have some questions for Anne in the next post.

In the meanwhile, here is an updated patch of where I am at.

So far, these SQL statements also need to be executed to bring the schema up to date:

use conferences;
drop table people; create table people (BugzillaUserID mediumint unsigned not null primary key, Picture blob null, PictureMime varchar(32) null);

alter table speakers add BugzillaUserID mediumint unsigned after PersonID;
alter table submission_authors add BugzillaUserID mediumint unsigned after PersonID;
alter table submission_comments add BugzillaUserID mediumint unsigned after PersonID;
alter table people_roles add BugzillaUserID mediumint unsigned after PersonID;
alter table watching_submissions add BugzillaUserID mediumint unsigned after PersonID;
alter table watching_everything add BugzillaUserID mediumint unsigned after PersonID;
alter table watching_categories add BugzillaUserID mediumint unsigned after PersonID;
alter table watching_types add BugzillaUserID mediumint unsigned after PersonID;
alter table watching_tags add BugzillaUserID mediumint unsigned after PersonID;
alter table watching_new_submissions add BugzillaUserID mediumint unsigned after PersonID;



# Getting Bugzilla ID from email address
select distinct s.PersonID, P.EMail, B.userid from speakers as s LEFT JOIN myfoundation_demo.People AS P ON P.PersonID = s.PersonID LEFT JOIN myfoundation_demo.profiles AS B on B.login_name = P.EMail;


# Updating PersonID tables to BugzillaUserID
update speakers as s, People_to_Bugzilla as b set s.BugzillaUserID = b.userid where s.PersonID = b.PersonID;
update submission_authors as s, People_to_Bugzilla as b set s.BugzillaUserID = b.userid where s.PersonID = b.PersonID;
update submission_comments as s, People_to_Bugzilla as b set s.BugzillaUserID = b.userid where s.PersonID = b.PersonID;
update people_roles as s, People_to_Bugzilla as b set s.BugzillaUserID = b.userid where s.PersonID = b.PersonID;
update watching_submissions as s, People_to_Bugzilla as b set s.BugzillaUserID = b.userid where s.PersonID = b.PersonID;
update watching_everything as s, People_to_Bugzilla as b set s.BugzillaUserID = b.userid where s.PersonID = b.PersonID;
update watching_categories as s, People_to_Bugzilla as b set s.BugzillaUserID = b.userid where s.PersonID = b.PersonID;
update watching_types as s, People_to_Bugzilla as b set s.BugzillaUserID = b.userid where s.PersonID = b.PersonID;
update watching_tags as s, People_to_Bugzilla as b set s.BugzillaUserID = b.userid where s.PersonID = b.PersonID;
update watching_new_submissions as s, People_to_Bugzilla as b set s.BugzillaUserID = b.userid where s.PersonID = b.PersonID;
Comment 25 Denis Roy CLA 2011-02-03 15:12:02 EST
Created attachment 188272 [details]
Screenshot

Anne:

Are there any options in that list that you know are no longer used?

The Scheduling Tool contains references to 2009, and it doesn't work on my dev environment.  Since the other buttons seem to rely on the schedule, I hesitate to continue.

Can you fill me in on how the schedule is done?  Point me to a doc?

Thanks
Comment 26 Denis Roy CLA 2011-03-28 14:10:31 EDT
I'll be revisiting this next week once the excitement of EclipseCon 2011 has begun to die down.
Comment 27 Denis Roy CLA 2011-04-07 13:26:31 EDT
The final patch has been committed.  I've also fixed some SQL injection vulnerabilities, as well as information leakage (user.php).

On the surface it all looks good, and I've tested plenty, but there will be issues.  Please file separate requests and I'll fix them as quickly as I can.