RE: [higgins-dev] attributes vs relationships (was Higgins data model)


From: "Paul Trevithick" <paul@xxxxxxxxxxxxxxxxx>
Date: Wed, 19 Apr 2006 17:46:58 -0400

Duane wrote:

<snip>

My assumptions jumping into the middle of this are as follows, if any of these are incorrect please feel free to flame me.

The Higgins framework plans on consuming identity information from a wide variety of existing 'identity' data stores, each of which already has a data model that may or may not be open to change. In an ideal world all stores would eventually upgrade to Higgins' new and improved model, but the reality is that we will have to work with legacy systems composed of feral identities.

Existing stores will use their existing internal models. Most new ones will too. Few will use the data model “natively”. (At least the data model I have in mind).

A digital identity may be composed of facets from multiple data stores.

yes

Management of the identity facets may take place through means other than the Higgins framework.

yes

Higgins references may be within an identity store or cross identity stores.

yes

We are discussing the data model not the interfaces or class diagrams.
The Higgins data model is intended to facilitate creation of digital subjects, without limiting the type of feral data stores from which attributes are retrieved.

yes

Attributes vs Relationships

Many of the existing identity data stores from my first assumption support references within the data store, often as attributes. In Jim's examples below, objects refer to each other via attributes which contain an object identifier. In an SQL database any field (aka attribute) may be used to join tables, resulting in object references via an attribute value. Am I correct in assuming that the Context Provider would be required to represent those attributes as relationships rather than as attributes?

The provider will be responsible for presenting them to the CPI using the appropriate data model semantics. In http://spwiki.editme.com/RevisedDataModelGoalsM4 we’ve added [6] that stipulates that there be only one canonical way to express a given semantic. There is a lot of data transformation going on here. The only good news is that the provider has complete control over what schema it declares that it supports (which might be a lot simpler than what it actually supports).
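
A minimal sketch of that kind of transformation, assuming a hypothetical legacy record in which a `manager` field holds another record's identifier. The names `REFERENCE_FIELDS` and `to_higgins_view` and the `xyz://legacy/` prefix are made up for illustration; none of them are part of any Higgins interface.

```python
# Hypothetical: fields in the legacy store whose values identify other records.
REFERENCE_FIELDS = {"manager"}


def to_higgins_view(record_id, record, prefix="xyz://legacy/"):
    """Split a flat legacy record into attributes and relationships."""
    attributes, relationships = [], []
    for name, value in record.items():
        if name in REFERENCE_FIELDS:
            # A reference stored as an attribute value is presented as a relationship.
            relationships.append(
                {"type": name, "from": prefix + record_id, "to": prefix + value}
            )
        else:
            attributes.append({"name": name, "value": value})
    return {"attributes": attributes, "relationships": relationships}


view = to_higgins_view("jim", {"cn": "Jim", "manager": "duane"})
```

The store itself is untouched; only the view the provider presents changes, which is one way to read "one canonical way to express a given semantic".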

Depending on the store, these attributes that act as reference links may not allow for concepts like properties on the link. If an application built on top of the Higgins framework were to attempt to add properties to the link, I can see at least two implementation options. First, the Context Provider could deny the operation; second, the Context Provider could have a data store for additional relationship properties. (Please don't rathole on the referential nightmare.)

The idea is that each Context Provider declares the schema(s) that it supports. In the data model proposal that I’m going to make tomorrow (it is always tomorrow it seems), the bar that defines the minimal level of support is set to pretty much zero. If a provider supports DigitalSubjects with a single string-valued literal attribute and nothing else, that’s fine.

This brings up the issue of where Higgins data is actually stored. And while a glib 'wherever the context provider wants to store it' might allow us to proceed to other data model issues, it is a significant implementation detail. In the example above the link existed as an attribute; what if no link existed and it was being created by an application? The context provider could:

- Store the link as an attribute within the identity store, and live with any limitations that brings.
- Store the link as a new object or reference to an entry in a table within the identity store. Since the identity store may be maintained using management tools which know nothing about Higgins' notion of relationships (assumption #3), these extra bits of data might be orphaned, tampered with or otherwise mismanaged by an unaware user/application.
- Store the link information in some separate data store. This last option is the most flexible, but involves overhead maintaining the additional store, joining results, and dealing with referential integrity.
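
To make the overhead of the last option concrete, here is a rough sketch assuming two in-memory dictionaries standing in for the identity store and a provider-owned link store. All names (`identity_store`, `link_store`, `links_from`, `dangling_links`) are hypothetical.

```python
# The identity store is managed elsewhere; the link store belongs to the provider.
identity_store = {"jim": {"cn": "Jim"}, "krcl": {"callLetters": "KRCL"}}
link_store = [{"type": "like", "from": "jim", "to": "krcl"}]


def links_from(subject_id):
    """Join step: return links from subject_id whose two ends both still exist."""
    return [
        link
        for link in link_store
        if link["from"] == subject_id
        and link["from"] in identity_store
        and link["to"] in identity_store
    ]


def dangling_links():
    """Referential-integrity check: links with a missing end."""
    return [
        link
        for link in link_store
        if link["from"] not in identity_store or link["to"] not in identity_store
    ]


before = links_from("jim")
del identity_store["krcl"]  # an outside management tool removes the target
after = links_from("jim")
orphans = dangling_links()
```

Every read needs the join, and a change made outside Higgins silently leaves an orphaned link that only an explicit check will find.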

This problem is compounded when, instead of linking facets within a single identity store, we are composing a digital subject from multiple identity stores. Do both identity stores get links? Or do we store link information somewhere else?

I’m not sure, but I think that with our new goals, there is now enough control over all of this afforded to the provider that these issues can be resolved within it. A provider can, if I follow your example, say that it doesn’t support a link. And an attempt to add one would be flagged as a schema violation.
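
As a sketch of what "flagged as a schema violation" could look like, assume a provider whose declared schema has string attributes and no relationship types at all. `SchemaViolation`, `MinimalProvider`, and `add_link` are invented names for illustration, not a proposed interface.

```python
class SchemaViolation(Exception):
    """Raised when an operation falls outside the provider's declared schema."""


class MinimalProvider:
    # Hypothetical declared schema: string-valued attributes, no relationship types.
    declared_schema = {"attribute_types": {"string"}, "relationship_types": set()}

    def __init__(self):
        self.links = []

    def add_link(self, link_type, source, target):
        if link_type not in self.declared_schema["relationship_types"]:
            raise SchemaViolation(
                f"relationship type {link_type!r} is not in the declared schema"
            )
        self.links.append((link_type, source, target))


provider = MinimalProvider()
try:
    provider.add_link("like", "xyz://myContext/Jim", "xyz://myContext/KRCL")
    rejected = False
except SchemaViolation:
    rejected = True
```

The application learns about the limitation from the declared schema or from the rejected call, and the provider never has to decide where a link would be stored.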

As a procedural side note I would like to see the end results of some of these discussions summed up on the wiki (with references to the mailing list archives).

Yes, we should do better at this. So far all we’ve been able to do is iterate the wiki pages in response to these threads.

Duane

>>>

From:	"Paul Trevithick" <paul@xxxxxxxxxxxxxxxxx>
To:	<higgins-dev@xxxxxxxxxxx>
Date:	3/29/2006 2:52:47 pm
Subject:	RE: [higgins-dev] attributes vs relationships (was Higgins data model)

Replies in green.

-----Original Message-----
From: higgins-dev-bounces@xxxxxxxxxxx [mailto:higgins-dev-bounces@xxxxxxxxxxx] On Behalf Of Jim Sermersheim
Sent: Wednesday, March 29, 2006 12:51 PM
To: higgins-dev@xxxxxxxxxxx
Subject: RE: [higgins-dev] attributes vs relationships (was Higgins data model)

Replies in red

>>> On Wednesday, March 29, 2006 at 9:23:23 am, in message <01e601c6534d$19bb4b90$9601a8c0@VGCRB30>, "Paul Trevithick" <paul@xxxxxxxxxxxxxxxxx> wrote:

Hi Jim,

You present a scenario where Case 2 wins. But I think that was an unusual scenario. I think that the attribute/relationship distinction is clear and obvious most of the time. Do you disagree?

I'm not sure. I do know that in the world of directories, all relationships but one (hierarchy) are modeled via attributes and that even the built in Hierarchy mechanism only adds confusion (could have/should have been done with attributes). That's not to say I want the model to behave just like a directory, just anecdotal evidence.

FWIW, I wasn't trying hard to contrive that example, it's more that I have the feeling that this kind of thing will come up over and over where people will start out using attributes for something, and later find they need to switch over to using relationships in order to minimize data duplication.

Actually, thinking about it more, I think the example is pretty common. At livejournal.com (and I imagine other blog sites as well), one can list their interests. If an interest is shared by another user, or if there's a community for that interest, then a relationship (link) is formed. If not, the interest is only simple text. How is it stored in the back-end? I dunno for sure, but I suspect not as two quite different sets of data.

I need to think about this some more.

On a somewhat related matter.

Your use of the word type (as in type = "string") is interpreted by me to mean "the type of this attribute's value". Yet I would have expected that the value of an attribute was an object whose type was discoverable through reflection. In other words I would have expected you to write your examples like this:

{name = "interest", "B-Movies"}

where "B-Movies" was a String object. Is this an implementation-related issue, or is this just common practice in directory work? Or am I just missing something entirely?

Not being sure how attributes are going to be typed in the higgins model, I only did that as a means of clarification.

I see.

With directories, each named attribute has a separate schema definition which dictates its form. One has to use the attribute identifier to go look up the schema definition to discover the type/form (or just have a-priori knowledge of that attribute's type/form).

Seems reasonable.

If higgins proposes to use reflection, we better make sure that the target programming languages support it (do we have a list of target programming languages yet?).

I should have said "discoverable through reflection or lookup of some kind". I was just trying to understand if you thought that there really would be a "type = "String"" property or not. You answered my question. There is no need for this in the model.

As for a list of target programming languages. We don't have one but it's probably true that assuming reflection is available is a bad idea.

(Further, I'm hoping that "interest" is really a URI like "http://foo/bar/baz/interest")

Yeah, that's the problem with writing quick examples, I let side details slide. I think everyone would agree that Attribute identifiers need to be unique (and a URI is one good way to do that).
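
A small sketch of the "lookup of some kind" approach, assuming a registry keyed by attribute URI. `ATTRIBUTE_SCHEMA` and `check_value` are hypothetical names; the URI is the placeholder used above.

```python
# Hypothetical schema registry: attribute identifier (a URI) -> declared value type.
ATTRIBUTE_SCHEMA = {
    "http://foo/bar/baz/interest": str,
}


def check_value(attribute_id, value):
    """Check a value against the type declared for its attribute identifier.

    The expected type comes from the schema lookup, not from a per-value
    "type" property carried alongside the value.
    """
    expected = ATTRIBUTE_SCHEMA.get(attribute_id)
    if expected is None:
        raise KeyError(f"no schema definition for {attribute_id}")
    return isinstance(value, expected)


ok = check_value("http://foo/bar/baz/interest", "B-Movies")
bad = check_value("http://foo/bar/baz/interest", 90.9)
```

With this arrangement the attribute instance only needs its identifier and its value, which matches the conclusion that the model does not need a type property.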

-Paul

So, I'm not sure where we are with this. If we stick with Case 1, we can decide to ignore it, or we can state that all values of an attribute must be of the same type, and if that type has a need to (always or sometimes) link to another facet, it really should be a relationship.

Here's what I see as the rub in Case 1. As long as there's a way to "point at" another facet (using its identifier, for example), then there's nothing to stop anyone from coming up with an attribute (complex or simple) which has a facet pointer as a field. Once that practice is established, it will be confusing to know when to do that versus using relationship objects.

Yes, as I said above, I need to mull this all over a bit.

What prevents this kind of confusion in the graph-world?

Well let me pick one "graph-world". In the RDF world you don't have this confusion cuz everything is, in a sense, a relationship. E.g. {Tom isInterestedIn B-Movies}. "isInterestedIn" is the property (predicate), "Tom" is the subject, and "B-Movies" is the object. If two people are interested in B-Movies, the B-Movies object (technically a "resource") can be shared. And since "isInterestedIn" can act as a subject, you can even attach a Property to it: {isInterestedIn degreeOfInterest "obsessive"}.
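
The triples in that example can be sketched with plain tuples. This is not an RDF library, just (subject, predicate, object) tuples in a set, and the second person ("Jim") is added here only to show the sharing.

```python
# Each statement is a (subject, predicate, object) tuple.
triples = {
    ("Tom", "isInterestedIn", "B-Movies"),
    ("Jim", "isInterestedIn", "B-Movies"),  # illustrative second person
    ("isInterestedIn", "degreeOfInterest", "obsessive"),
}


def objects(subject, predicate):
    """All objects of statements with the given subject and predicate."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


def subjects(predicate, obj):
    """All subjects of statements with the given predicate and object."""
    return sorted(s for s, p, o in triples if p == predicate and o == obj)


shared = subjects("isInterestedIn", "B-Movies")
about_predicate = objects("isInterestedIn", "degreeOfInterest")
```

Note that the third triple says something about the property "isInterestedIn" in general. Attaching a degree of interest to Tom's statement alone would take an extra step, for example giving that particular statement its own identifier.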

Jim

-----Original Message-----
From: higgins-dev-bounces@xxxxxxxxxxx [mailto:higgins-dev-bounces@xxxxxxxxxxx] On Behalf Of Jim Sermersheim
Sent: Wednesday, March 29, 2006 2:32 AM
To: higgins-dev@xxxxxxxxxxx
Subject: Re: [higgins-dev] attributes vs relationships (was Higgins data model)

After reading this, I dislike the word "like" as used. Replace it with "interest" and it reads better (purely aesthetic).

>>> On Tuesday, March 28, 2006 at 6:46:53 pm, in message <4429849D.D091.001C.0@xxxxxxxxxx>, "Jim Sermersheim" <jimse@xxxxxxxxxx> wrote:

One example that I have a hard time making fit into what I previously called "Case 1" is where an attribute is sometimes a link to another facet and other times not.

Say I want to represent my likes. I see this as an attribute. For example, I could list:

{name = "like", type = "string", stringVal = "B-Movies"}

{name = "like", type = "string", stringVal = "Skiing"}

{name = "like", type = "radioStation", callLetters = "KRCL", band = "FM", frequency = "90.9", preferredDJs = {name = "The Old Man", name = "Robert Nelson"}}

But hold on, I happen to note that there already exists another facet in my context which represents KRCL (the radio station). Rather than typing all that garbage into the attribute on my facet, I'd prefer to link to it. Of course, *only* linking to it causes me to lose my "preferredDJs" list. So now I want to associate a property with the link. Both Case 1 and Case 2 allow for this. The difference as I see it is that Case 1 now causes my list of likes to be spread across my attributes and relationships. The modified Case 2 follows:

Jim {

Attributes {

{name = "like", type = "string", stringVal = "B-Movies"}

{name = "like", type = "string", stringVal = "Skiing"}

{name = "like", type = "radioStationRelationship", relatedTo = "xyz://myContext/KRCL", preferredDJs = {name = "The Old Man", name = "Robert Nelson"}}

...

}

}

Case 1 looks something like:

Jim {

Attributes {

{name = "like", type = "string", stringVal = "B-Movies"}

{name = "like", type = "string", stringVal = "Skiing"}

...

}

Relationships {

{type = "like", from = "xyz://myContext/Jim", to = "xyz://myContext/KRCL", toType = "radioStation", preferredDJs = {name = "The Old Man", name = "Robert Nelson"}}

...

}

}

The interrogator of my likes in Case 2 enumerates the "like" attribute types, discovers their types, and processes. In processing a "relationship" type, it must dereference the target facet and add the appropriate properties.

The interrogator of my likes in Case 1 enumerates the "like" attribute types, discovers their types, and processes. It then enumerates the "like" relationship types and, for each, dereferences the target facet, discovers its type, processes data from that facet, and adds the target facet's properties to those on the link.

Note that I added toType to the relationship. This was to avoid having to dereference the target facet in order to know what properties to expect on the relationship object. Similarly, I used type = "radioStationRelationship" in Case 2. Both cases can be simplified (type = "relationship" in Case 2, and remove the toType in Case 1), but that causes the interrogator to dereference the target and read its type to know what to expect in terms of further properties.
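
The two interrogators can be sketched side by side. This assumes Python dictionaries standing in for facets and a `context` mapping for dereferencing; `resolve`, `likes_case1`, and `likes_case2` are hypothetical names, and the data mirrors the examples above.

```python
# Hypothetical context: facets keyed by identifier.
context = {
    "xyz://myContext/KRCL": {
        "type": "radioStation", "callLetters": "KRCL", "band": "FM", "frequency": "90.9",
    },
}
DJS = ["The Old Man", "Robert Nelson"]

case2 = {"attributes": [
    {"name": "like", "type": "string", "stringVal": "B-Movies"},
    {"name": "like", "type": "string", "stringVal": "Skiing"},
    {"name": "like", "type": "radioStationRelationship",
     "relatedTo": "xyz://myContext/KRCL", "preferredDJs": DJS},
]}

case1 = {
    "attributes": case2["attributes"][:2],
    "relationships": [
        {"type": "like", "from": "xyz://myContext/Jim", "to": "xyz://myContext/KRCL",
         "toType": "radioStation", "preferredDJs": DJS},
    ],
}


def resolve(target, link_props):
    """Dereference the target facet and merge in the properties carried on the link."""
    return {**context[target], **link_props}


def likes_case2(facet):
    """Case 2: one pass over the attributes; some of them are links."""
    likes = []
    for attr in facet["attributes"]:
        if attr["name"] != "like":
            continue
        if attr["type"] == "string":
            likes.append(attr["stringVal"])
        else:  # a relationship-typed attribute
            likes.append(resolve(attr["relatedTo"], {"preferredDJs": attr["preferredDJs"]}))
    return likes


def likes_case1(facet):
    """Case 1: one pass over the attributes, then a second pass over the relationships."""
    likes = [a["stringVal"] for a in facet["attributes"] if a["name"] == "like"]
    for rel in facet["relationships"]:
        if rel["type"] == "like":
            likes.append(resolve(rel["to"], {"preferredDJs": rel["preferredDJs"]}))
    return likes
```

Both functions return the same list for this data; the difference is that Case 1 needs two enumerations over two collections, while Case 2 needs a type check inside a single enumeration.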

If the group prefers Case 1 over Case 2, how can we make this example less awkward? I don't really like going the other possible direction to fix it (make facets for B-Movies and Skiing, and any other potential "like" out there).

Jim

References:
- RE: [higgins-dev] attributes vs relationships (was Higgins data model)
  - From: Duane Buss
