RE: [higgins-dev] attributes vs relationships (was Higgins data model)


From: "Duane Buss" <DBuss@xxxxxxxxxx>
Date: Thu, 30 Mar 2006 14:35:14 -0700
Delivered-to: higgins-dev@xxxxxxxxxxx
List-archive: <http://eclipse.org/pipermail/higgins-dev>

Reply contained only in the top because all the easily readable colors have been taken.

My assumptions jumping into the middle of this are as follows; if any of these are incorrect, please feel free to flame me.

1. The Higgins framework plans on consuming identity information from a wide variety of existing 'identity' data stores, each of which already has a data model that may or may not be open to change. In an ideal world all stores would eventually upgrade to Higgins' new and improved model, but the reality is that we will have to work with legacy systems composed of feral identities.
2. A digital identity may be composed of facets from multiple data stores.
3. Management of the identity facets may take place through means other than the Higgins framework.
4. Higgins references may be within an identity store or cross identity stores.
5. We are discussing the data model, not the interfaces or class diagrams.
6. The Higgins data model is intended to facilitate creation of digital subjects, without limiting the type of feral data stores from which attributes are retrieved.

Attributes vs Relationships

Many of the existing identity data stores from my first assumption support references within the data store, often as attributes. In Jim's examples below, objects refer to each other via attributes which contain an object identifier. In an SQL database any field (aka attribute) may be used to join tables, resulting in object references via an attribute value. Am I correct in assuming that the Context Provider would be required to represent those attributes as relationships rather than as attributes?

Depending on the store, these attributes, which are reference links, may not allow for concepts like properties on the link. If an application built on top of the Higgins framework were to attempt to add properties to the link, I can see at least two implementation options. First, the Context Provider could deny the operation; second, the Context Provider could have a data store for additional relationship properties. (Please don't rathole on the referential nightmare.)
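A rough sketch of those two options, just to make the difference concrete. The class and method names here are made up for illustration and are not part of any Higgins interface.

```python
class LinkPropertiesNotSupported(Exception):
    """Raised when the backing store cannot hold properties on a link."""


class DenyingProvider:
    """Option 1: the provider refuses to put properties on a link."""

    def set_link_property(self, link_id, name, value):
        raise LinkPropertiesNotSupported(link_id)


class SideStoreProvider:
    """Option 2: link properties live in a store the provider owns."""

    def __init__(self):
        # Hypothetical side store: link id -> {property name: value}.
        self._link_properties = {}

    def set_link_property(self, link_id, name, value):
        self._link_properties.setdefault(link_id, {})[name] = value

    def get_link_properties(self, link_id):
        # Return a copy so callers cannot mutate the side store directly.
        return dict(self._link_properties.get(link_id, {}))
```

The second option is where the referential trouble comes from: the identity store can delete or rename the link without the side store ever hearing about it.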

This brings up the issue of where Higgins data is actually stored. And while a glib 'wherever the context provider wants to store it' might allow us to proceed to other data model issues, it is a significant implementation detail. In the example above the link existed as an attribute; what if no link existed and it was being created by an application? The context provider could:

Store the link as an attribute within the identity store, and live with any limitations that brings
Store the link as a new object or reference to an entry in a table within the identity store. Since the identity store may be maintained using management tools which know nothing about Higgins' notion of relationships (assumption #3), these extra bits of data might be orphaned, tampered with or otherwise mismanaged by an unaware user/application.
Store the link information in some separate data store. This last option is the most flexible, but involves overhead maintaining the additional store, joining results, and dealing with referential integrity.

This problem is compounded when, instead of linking facets within a single identity store, we are composing a digital subject from multiple identity stores. Do both identity stores get links? Or do we store link information somewhere else?
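To illustrate the "somewhere else" answer, here is a minimal sketch in which two identity stores know nothing about each other and the links are kept in a separate structure. The store names, identifiers and field values are all invented for the example.

```python
# Two hypothetical identity stores, each keyed by its own local identifier.
hr_store = {"emp-17": {"name": "Jim", "title": "Engineer"}}
blog_store = {"jimse": {"interest": ["B-Movies", "Skiing"]}}
stores = {"hr": hr_store, "blog": blog_store}

# Separate link store: neither identity store knows about the other.
# Each link names (store, local id) on both ends.
links = [(("hr", "emp-17"), ("blog", "jimse"))]


def compose_subject(store_name, local_id):
    """Merge the starting facet with every facet linked to it."""
    start = (store_name, local_id)
    subject = {start: dict(stores[store_name][local_id])}
    for a, b in links:
        if start in (a, b):
            other = b if a == start else a
            other_store, other_id = other
            facet = stores[other_store].get(other_id)
            # A missing facet means the link is dangling: the referential
            # integrity cost of keeping links outside the stores.
            subject[other] = dict(facet) if facet is not None else None
    return subject
```

If someone removes "jimse" with the blog store's own tools, the link survives and `compose_subject` can only report the facet as missing, which is the overhead mentioned for the third storage option.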

As a procedural side note I would like to see the end results of some of these discussions summed up on the wiki (with references to the mailing list archives).

Duane

>>>

From:	"Paul Trevithick" <paul@xxxxxxxxxxxxxxxxx>
To:	<higgins-dev@xxxxxxxxxxx>
Date:	3/29/2006 2:52:47 pm
Subject:	RE: [higgins-dev] attributes vs relationships (was Higgins data model)

Replies in green.

-----Original Message-----
From: higgins-dev-bounces@xxxxxxxxxxx [mailto:higgins-dev-bounces@xxxxxxxxxxx] On Behalf Of Jim Sermersheim
Sent: Wednesday, March 29, 2006 12:51 PM
To: higgins-dev@xxxxxxxxxxx
Subject: RE: [higgins-dev] attributes vs relationships (was Higgins data model)

Replies in red

>>> On Wednesday, March 29, 2006 at 9:23:23 am, in message <01e601c6534d$19bb4b90$9601a8c0@VGCRB30>, "Paul Trevithick" <paul@xxxxxxxxxxxxxxxxx> wrote:

Hi Jim,

You present a scenario where Case 2 wins. But I think that was an unusual scenario. I think that the attribute/relationship distinction is clear and obvious most of the time. Do you disagree?

I'm not sure. I do know that in the world of directories, all relationships but one (hierarchy) are modeled via attributes and that even the built in Hierarchy mechanism only adds confusion (could have/should have been done with attributes). That's not to say I want the model to behave just like a directory, just anecdotal evidence.

FWIW, I wasn't trying hard to contrive that example, it's more that I have the feeling that this kind of thing will come up over and over where people will start out using attributes for something, and later find they need to switch over to using relationships in order to minimize data duplication.

Actually, thinking about it more, I think the example is pretty common. At livejournal.com (and I imagine other blog sites as well), one can list their interests. If an interest is shared by another user, or if there's a community for that interest, then a relationship (link) is formed. If not, the interest is only simple text. How is it stored in the back-end? I dunno for sure, but I suspect not as two quite different sets of data.

I need to think about this some more.

On a somewhat related matter.

Your use of the word type (as in type = "string") is interpreted by me to mean "the type of this attribute's value". Yet I would have expected that the value of an attribute was an object whose type was discoverable through reflection. In other words I would have expected you to write your examples like this:

{name = "interest", "B-Movies"}

where "B-Movies" was a String object. Is this an implementation-related issue, or is this just common practice in directory work? Or am I just missing something entirely?

Not being sure how attributes are going to be typed in the higgins model, I only did that as a means of clarification.

I see.

With directories, each named attribute has a separate schema definition which dictates its form. One has to use the attribute identifier to go look up the schema definition to discover the type/form (or just have a-priori knowledge of that attribute's type/form).
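As a sketch of that lookup style (as opposed to reflection on the value), assume a table that maps each attribute identifier to the form its values must take. The table, the second URI and the function name are assumptions made up for this example.

```python
# Hypothetical schema table: attribute identifier -> required value type.
SCHEMA = {
    "http://foo/bar/baz/interest": str,
    "http://foo/bar/baz/frequency": float,
}


def check_value(attribute_id, value):
    """Look up the attribute's schema entry and verify the value's form."""
    expected = SCHEMA.get(attribute_id)
    if expected is None:
        raise KeyError(f"no schema definition for {attribute_id}")
    return isinstance(value, expected)
```

Nothing on the value itself says what type it is; the consumer either consults the table or already knows the answer.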

Seems reasonable.

If higgins proposes to use reflection, we better make sure that the target programming languages support it (do we have a list of target programming languages yet?).

I should have said "discoverable through reflection or lookup of some kind". I was just trying to understand if you thought that there really would be a "type = "String"" property or not. You answered my question. There is no need for this in the model.

As for a list of target programming languages. We don't have one but it's probably true that assuming reflection is available is a bad idea.

(Further, I'm hoping that "interest" is really a URI like "http://foo/bar/baz/interest")

Yeah, that's the problem with writing quick examples, I let side details slide. I think everyone would agree that Attribute identifiers need to be unique (and a URI is one good way to do that).

-Paul

So, I'm not sure where we are with this. If we stick with Case 1, we can decide to ignore it, or we can state that all values of an attribute must be of the same type, and if that type has a need to (always or sometimes) link to another facet, it really should be a relationship.

Here's what I see as the rub in Case 1. As long as there's a way to "point at" another facet (using its identifier, for example), then there's nothing to stop anyone from coming up with an attribute (complex or simple) which has, as a field, a facet pointer. Once that practice is established, it will be confusing to know when to do that versus using relationship objects.

Yes, as I said above, I need to mull this all over a bit.

What prevents this kind of confusion in the graph-world?

Well let me pick one "graph-world". In the RDF world you don't have this confusion cuz everything is, in a sense, a relationship. E.g. {Tom isInterestedIn B-Movies}. "isInterestedIn" is the property (predicate), "Tom" is the subject, and "B-Movies" is the object. If two people are interested in B-Movies, the B-Movies object (technically a "resource") can be shared. And since "isInterestedIn" can act as a subject, you can even attach a Property to it: {isInterestedIn degreeOfInterest "obsessive"}.
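A small sketch of the same idea using plain tuples rather than a real RDF library. "Ann" is a made-up second person added only to show the shared object.

```python
# Each statement is a (subject, predicate, object) triple.
triples = {
    ("Tom", "isInterestedIn", "B-Movies"),
    ("Ann", "isInterestedIn", "B-Movies"),  # hypothetical second person
    # The predicate itself appears as a subject here.
    ("isInterestedIn", "degreeOfInterest", "obsessive"),
}


def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is stated."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def subjects(predicate, obj):
    """All subjects s such that (s, predicate, obj) is stated."""
    return {s for s, p, o in triples if p == predicate and o == obj}
```

Both people point at the same "B-Movies" value, and the statement about "isInterestedIn" is stored exactly like any other statement, so there is no separate attribute form to choose between.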

Jim

-----Original Message-----
From: higgins-dev-bounces@xxxxxxxxxxx [mailto:higgins-dev-bounces@xxxxxxxxxxx] On Behalf Of Jim Sermersheim
Sent: Wednesday, March 29, 2006 2:32 AM
To: higgins-dev@xxxxxxxxxxx
Subject: Re: [higgins-dev] attributes vs relationships (was Higgins data model)

After reading this, I dislike the word "like" as used. Replace it with "interest" and it reads better (purely aesthetic).

>>> On Tuesday, March 28, 2006 at 6:46:53 pm, in message <4429849D.D091.001C.0@xxxxxxxxxx>, "Jim Sermersheim" <jimse@xxxxxxxxxx> wrote:

One example that I have a hard time making fit into what I previously called "Case 1" is where an attribute is sometimes a link to another facet and other times not.

Say I want to represent my likes. I see this as an attribute. For example, I could list:

{name = "like", type = "string", stringVal = "B-Movies"}

{name = "like", type = "string", stringVal = "Skiing"}

{name = "like", type = "radioStation", callLetters = "KRCL", band = "FM", frequency = "90.9", preferredDJs = {name = "The Old Man", name = "Robert Nelson"}}

But hold on, I happen to note that there already exists another facet in my context which represents KRCL (the radio station). Rather than typing all that garbage into the attribute on my facet, I'd prefer to link to it. Of course, *only* linking to it causes me to lose my "preferredDJs" list. So now I want to associate a property with the link. Both Case 1 and Case 2 allow for this. The difference as I see it is that Case 1 now causes my list of likes to be spread across my attributes and relationships. The modified Case 2 follows:

Jim {

Attributes {

{name = "like", type = "string", stringVal = "B-Movies"}

{name = "like", type = "string", stringVal = "Skiing"}

{name = "like", type = "radioStationRelationship", relatedTo = "xyz://myContext/KRCL", preferredDJs = {name = "The Old Man", name = "Robert Nelson"}}

...

}

}

Case 1 looks something like:

Jim {

Attributes {

{name = "like", type = "string", stringVal = "B-Movies"}

{name = "like", type = "string", stringVal = "Skiing"}

...

}

Relationships {

{type = "like", from = "xyz://myContext/Jim", to = "xyz://myContext/KRCL", toType = "radioStation", preferredDJs = {name = "The Old Man", name = "Robert Nelson"}}

...

}

}

The interrogator of my likes in Case 2 enumerates the "like" attribute types, discovers their types, and processes. In processing a "relationship" type, it must dereference the target facet and add the appropriate properties.

The interrogator of my likes in Case 1 enumerates the "like" attribute types, discovers their types, and processes. It then enumerates the "like" relationship types and, for each, dereferences the target facet, discovers its type, processes data from that facet, and adds the target facet's properties to those on the link.
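The two interrogators might look roughly like the sketch below. It is only an illustration: the dict layout, the function names and the use of a list for preferredDJs are my own assumptions, and the KRCL facet is reduced to the fields from the example.

```python
# Facets in a hypothetical context, keyed by identifier.
context = {
    "xyz://myContext/KRCL": {
        "type": "radioStation",
        "callLetters": "KRCL", "band": "FM", "frequency": "90.9",
    },
}

# Case 2: likes are all attributes; some attribute values are links.
case2_attributes = [
    {"name": "like", "type": "string", "stringVal": "B-Movies"},
    {"name": "like", "type": "string", "stringVal": "Skiing"},
    {"name": "like", "type": "radioStationRelationship",
     "relatedTo": "xyz://myContext/KRCL",
     "preferredDJs": ["The Old Man", "Robert Nelson"]},
]

# Case 1: likes are split between attributes and relationships.
case1_attributes = case2_attributes[:2]
case1_relationships = [
    {"type": "like", "from": "xyz://myContext/Jim",
     "to": "xyz://myContext/KRCL", "toType": "radioStation",
     "preferredDJs": ["The Old Man", "Robert Nelson"]},
]


def likes_case2(attributes):
    """One pass over attributes, dereferencing link-valued ones."""
    result = []
    for attr in attributes:
        if attr["name"] != "like":
            continue
        if attr["type"] == "string":
            result.append(attr["stringVal"])
        else:
            # Copy the target facet, then add the properties on the link.
            target = dict(context[attr["relatedTo"]])
            target["preferredDJs"] = attr["preferredDJs"]
            result.append(target)
    return result


def likes_case1(attributes, relationships):
    """Two passes: attributes first, then relationships."""
    result = [a["stringVal"] for a in attributes if a["name"] == "like"]
    for rel in relationships:
        if rel["type"] == "like":
            target = dict(context[rel["to"]])
            target["preferredDJs"] = rel["preferredDJs"]
            result.append(target)
    return result
```

Both functions return the same list for this data. The difference is that the Case 1 caller has to know to look in two places.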

Note that I added toType to the relationship. This was to avoid having to dereference the target facet in order to know what properties to expect on the relationship object. Similarly, I used type = "radioStationRelationship" in Case 2. Both cases can be simplified (type = "relationship" in Case 2, and remove the toType in Case 1), but that causes the interrogator to dereference the target and read its type to know what to expect in terms of further properties.

If the group prefers Case 1 over Case 2, how can we make this example less awkward? I don't really like going the other possible direction to fix it (make facets for B-Movies and Skiing, and any other potential "like" out there).

Jim
