Duane wrote:
My assumptions jumping into the middle of this are as
follows; if any of these are incorrect, please feel free to flame me.
- The Higgins framework plans on consuming identity
information from a wide variety of existing 'identity' data stores, which
stores already have a data model which may or may not be open to
change. In an ideal world all stores would eventually
upgrade to Higgins' new and improved model, but the reality is we will have to
work with legacy systems composed of feral identities.
Existing stores will use their existing internal models. Most new ones will too. Few will
use the data model “natively”. (At least the data model I have in
mind).
- A digital identity may be composed of facets from
multiple data stores.
yes
- Management of the identity facets may take place
through means other than the Higgins framework.
yes
- Higgins references may be within an identity store or
cross identity stores.
yes
- We are discussing the data model not the interfaces or
class diagrams.
- The Higgins data model is intended to facilitate
creation of digital subjects, without limiting the type of feral data
stores from which attributes are retrieved.
yes
Attributes vs Relationships
Many of the existing identity data stores
from my first assumption support references within the data store,
often as attributes. In Jim's examples below, objects refer
to each other via attributes which contain an object identifier. In
an SQL database any field (aka attribute) may be used to join tables,
resulting in object references via an attribute
value. Am I correct in assuming that the Context
Provider would be required to represent those attributes as relationships rather
than as attributes?
The provider will be responsible for presenting them to the CPI using the
appropriate data model semantics. In http://spwiki.editme.com/RevisedDataModelGoalsM4
we’ve added [6] that stipulates that there be only one canonical way to
express a given semantic. There is a lot of data transformation going on here.
The only good news is that the provider has complete control over what schema
it declares that it supports (which might be a lot simpler than what it
actually supports).
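As a rough illustration of that transformation, here is a minimal Python sketch, assuming a store whose rows carry object identifiers in ordinary fields. The names `present` and `REFERENCE_FIELDS`, the `manager` field, and the record layout are all hypothetical and not part of any Higgins interface:

```python
# Hypothetical illustration only; not a Higgins API.
# Fields the provider knows to hold identifiers of other objects.
REFERENCE_FIELDS = {"manager"}

def present(row):
    """Split a raw store row into literal attributes and relationships."""
    attributes, relationships = {}, []
    for field, value in row.items():
        if field in REFERENCE_FIELDS:
            # A reference-valued field is surfaced as a relationship, so that
            # "points at another object" is expressed in only one way.
            relationships.append({"type": field, "from": row["id"], "to": value})
        else:
            attributes[field] = value
    return attributes, relationships

attrs, rels = present({"id": "u1", "name": "Tom", "manager": "u7"})
```

The underlying store is unchanged; only the view handed upward differs.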
Depending on the store, these attributes which
are reference links may not allow for concepts like properties on the
link. If an application built on top of the Higgins framework were
to attempt to add properties to the link I can see at least two implementation
options. First the Context provider could deny the operation,
second the Context Provider could have a data store for additional
relationship properties. (Please don't rathole on the
referential nightmare.)
The idea is that each Context Provider declares the schema(s) that it supports. In the data
model proposal that I’m going to make tomorrow (it is always tomorrow it
seems), the bar that defines the minimal level of support is set to pretty much
zero. If a provider supports DigitalSubjects with a single string-valued
literal attribute and nothing else, that’s fine.
This brings up the issue of where Higgins data is actually
stored. And while a glib 'wherever the context provider wants to
store it' might allow us to proceed to other data model issues, it is a
significant implementation detail. In the example above the
link existed as an attribute; what if no link existed and it was being created
by an application? The context provider could:
- Store the link as an attribute within the identity
store, and live with any limitations that brings
- Store the link as a new object or reference to an
entry in a table within the identity store. Since the identity
store may be maintained using management tools which know nothing about
Higgins' notion of relationships (assumption #3), these extra bits of
data might be orphaned, tampered with or otherwise mismanaged by an
unaware user/application.
- Store the link information in some separate data
store. This last option is the most flexible, but involves
overhead maintaining the additional store, joining results, and dealing
with referential integrity.
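The third option can be sketched in Python as follows. This is only an illustration under assumed structures: `identity_store`, `link_store`, and `resolve_links` are hypothetical names, and the dangling entry stands in for an object removed by a tool that knows nothing about the links:

```python
# Hypothetical: the identity store is left untouched; links live elsewhere.
identity_store = {"u1": {"name": "Tom"}, "u2": {"name": "KRCL"}}
link_store = [
    {"from": "u1", "to": "u2", "type": "like"},
    {"from": "u1", "to": "u9", "type": "like"},  # u9 was deleted out of band
]

def resolve_links(subject_id):
    """Join links against the identity store, separating dangling ones."""
    live, dangling = [], []
    for link in link_store:
        if link["from"] != subject_id:
            continue
        if link["to"] in identity_store:
            live.append(link["to"])
        else:
            dangling.append(link["to"])
    return live, dangling

live, dangling = resolve_links("u1")
```

The join and the dangling check are the overhead mentioned above; they run on every read.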
This problem is compounded when instead of linking facets
within a single identity store we are composing a digital subject from
multiple identity stores. Do both identity stores get links? Or do
we store link information somewhere else?
I’m not sure, but I think that with our new goals, there is now enough control over
all of this afforded to the provider that these issues can be resolved within
it. A provider can, if I follow your example, say that it doesn’t support
a link. And an attempt to add one would be flagged as a schema violation.
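A minimal Python sketch of that behaviour, assuming a provider that declares up front whether its schema includes links. The class name, the `supports_links` flag, and the `SchemaViolation` exception are hypothetical, chosen only for illustration:

```python
# Hypothetical illustration; class and exception names are assumptions.
class SchemaViolation(Exception):
    pass

class ContextProvider:
    def __init__(self, supports_links):
        # The provider declares what its schema allows.
        self.supports_links = supports_links
        self.links = []

    def add_link(self, source, target):
        if not self.supports_links:
            raise SchemaViolation("this provider's schema does not include links")
        self.links.append((source, target))

minimal = ContextProvider(supports_links=False)
try:
    minimal.add_link("xyz://myContext/Jim", "xyz://myContext/KRCL")
    rejected = False
except SchemaViolation:
    rejected = True
```

The application learns about the limitation from the declared schema or from the rejected call; nothing is silently dropped.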
As a procedural side note I would like
to see the end results of some of these discussions summed up on the wiki (with
references to the mailing list archives).
Yes, we should do better at this. So far all we’ve been able to do is
iterate the wiki pages in response to these threads.
From: "Paul Trevithick" <paul@xxxxxxxxxxxxxxxxx>
To: <higgins-dev@xxxxxxxxxxx>
Date: 3/29/2006 2:52:47 pm
Subject: RE: [higgins-dev] attributes vs relationships (was Higgins data model)
Replies in green.
-----Original Message-----
From: higgins-dev-bounces@xxxxxxxxxxx [mailto:higgins-dev-bounces@xxxxxxxxxxx] On Behalf Of Jim Sermersheim
Sent: Wednesday, March 29, 2006 12:51 PM
To: higgins-dev@xxxxxxxxxxx
Subject: RE: [higgins-dev] attributes vs relationships (was Higgins data model)
Replies in red
>>> On Wednesday, March 29, 2006 at 9:23:23 am, in message
<01e601c6534d$19bb4b90$9601a8c0@VGCRB30>, "Paul Trevithick"
<paul@xxxxxxxxxxxxxxxxx> wrote:
Hi Jim,
You present a scenario where Case 2 wins. But I think that was an
unusual scenario. I think that the attribute/relationship distinction is clear
and obvious most of the time. Do you disagree?
I'm not sure. I do know that in the world of directories, all
relationships but one (hierarchy) are modeled via attributes and that even
the built in Hierarchy mechanism only adds confusion (could have/should have
been done with attributes). That's not to say I want the model to behave just
like a directory, just anecdotal evidence.
FWIW, I wasn't trying hard to contrive that example, it's more that
I have the feeling that this kind of thing will come up over and over where
people will start out using attributes for something, and later find they need
to switch over to using relationships in order to minimize data duplication.
Actually, thinking about it more, I think the example is pretty
common. At livejournal.com (and I imagine other blog sites as well), one
can list their interests. If an interest is shared by another user, or if
there's a community for that interest, then a relationship (link) is formed. If
not, the interest is only simple text. How is it stored in the back-end? I
dunno for sure, but I suspect not as two quite different sets of data.
I need to
think about this some more.
On a somewhat related matter.
Your use of the word type (as in type = "string") is
interpreted by me to mean "the type of this attribute's value". Yet I
would have expected that the value of an attribute was an object whose type was
discoverable through reflection. In other words I would have expected you to
write your examples like this:
{name = "interest", "B-Movies"}
where "B-Movies" was a String object. Is this an
implementation-related issue? Is this just common practice in directory work?
Or am I just missing something entirely?
Not being sure how attributes are going to be typed in the Higgins
model, I only did that as a means of clarification.
I see.
With directories, each named attribute has a separate schema
definition which dictates its form. One has to use the attribute identifier to
go look up the schema definition to discover the type/form (or just have
a-priori knowledge of that attribute's type/form).
Seems reasonable.
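The directory-style lookup described here can be sketched in Python as below. The `SCHEMA` table, its keys, and `value_type` are hypothetical; the attribute identifier is borrowed from the URI example used elsewhere in this thread:

```python
# Hypothetical schema table keyed by attribute identifier.
SCHEMA = {
    "http://foo/bar/baz/interest": {"syntax": "string", "multi_valued": True},
}

def value_type(attribute_id):
    """Look up the declared form of an attribute instead of inspecting its value."""
    return SCHEMA[attribute_id]["syntax"]

interest_type = value_type("http://foo/bar/baz/interest")
```

No reflection is needed: the type comes from the schema definition, not from the value object.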
If Higgins proposes to use reflection, we better make sure that the
target programming languages support it (do we have a list of target
programming languages yet?).
I should have said "discoverable through reflection or lookup of some kind". I was just
trying to understand if you thought that there really would be a "type =
"String"" property or not. You answered my question. There is no
need for this in the model.
As for a list of target programming languages. We don't have one
but it's probably true that assuming reflection is available is a bad idea.
(Further, I'm hoping that "interest" is really a URI like
"http://foo/bar/baz/interest")
Yeah, that's the problem with writing quick examples, I let side
details slide. I think everyone would agree that Attribute identifiers need to
be unique (and a URI is one good way to do that).
-Paul
So, I'm not sure where we are with this. If we stick with Case 1, we
can decide to ignore it, or we can state that all values of an attribute must
be of the same type, and if that type has a need to (always or sometimes) link
to another facet, it really should be a relationship.
Here's what I see as the rub in Case 1. As long as there's a
way to "point at" another facet (using its identifier
for example), then there's nothing to stop anyone from coming up with an
attribute (complex or simple) which has as a field, a facet pointer. Once that
practice is established, it will be confusing to know when to do that versus
using relationship objects.
Yes, as I said above, I need to mull this all
over a bit.
What prevents this kind of confusion in the graph-world?
Well let me pick one "graph-world". In the RDF world you
don't have this confusion cuz everything is, in a sense, a relationship. E.g.
{Tom isInterestedIn B-Movies}. "isInterestedIn" is the property
(predicate), "Tom" is the subject, and "B-Movies" is the
object. If two people are interested in B-Movies, the B-Movies object
(technically a "resource") can be shared. And since
"isInterestedIn" can act as a subject, you can even attach a Property
to it: {isInterestedIn degreeOfInterest "obsessive"}.
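A small Python sketch of the same idea, using plain tuples in place of RDF statements. No RDF library is assumed, the helper names are hypothetical, and "Jim" is added here only as an assumed second person sharing the B-Movies resource:

```python
# Plain (subject, predicate, object) tuples standing in for RDF statements.
triples = [
    ("Tom", "isInterestedIn", "B-Movies"),
    ("Jim", "isInterestedIn", "B-Movies"),  # same object resource, shared
    ("isInterestedIn", "degreeOfInterest", "obsessive"),  # predicate as subject
]

def subjects_of(predicate, obj):
    """Everyone who stands in the given relationship to the object."""
    return [s for (s, p, o) in triples if p == predicate and o == obj]

def properties_of(resource):
    """All statements whose subject is the given resource."""
    return {p: o for (s, p, o) in triples if s == resource}

fans = subjects_of("isInterestedIn", "B-Movies")
about_predicate = properties_of("isInterestedIn")
```

There is a single shape of statement, so the attribute-versus-relationship question never arises.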
Jim
-----Original Message-----
From: higgins-dev-bounces@xxxxxxxxxxx [mailto:higgins-dev-bounces@xxxxxxxxxxx] On Behalf Of Jim Sermersheim
Sent: Wednesday, March 29, 2006 2:32 AM
To: higgins-dev@xxxxxxxxxxx
Subject: Re: [higgins-dev] attributes vs relationships (was Higgins data model)
After reading this, I dislike the
word "like" as used. Replace it with "interest" and it
reads better (purely aesthetic).
>>> On Tuesday, March 28, 2006 at 6:46:53 pm, in message
<4429849D.D091.001C.0@xxxxxxxxxx>, "Jim Sermersheim"
<jimse@xxxxxxxxxx> wrote:
One example that I have a hard time
making fit into what I previously called "Case 1" is where an
attribute is sometimes a link to another facet and other times not.
Say I want to represent my likes. I
see this as an attribute. For example, I could list:
{name = "like", type = "string", stringVal = "B-Movies"}
{name = "like", type = "string", stringVal = "Skiing"}
{name = "like", type = "radioStation", callLetters = "KRCL", band = "FM",
 frequency = "90.9", preferredDJs = {name = "The Old Man", name = "Robert Nelson"}}
But hold on, I happen to note that
there already exists another facet in my context which represents KRCL (the
radio station). Rather than typing all that garbage into the attribute on my
facet, I'd prefer to link to it. Of course, *only* linking to it causes me to
lose my "preferredDJs" list. So now I want to associate a property
with the link. Both Case 1 and Case 2 allow for this. The difference as I see
it is that Case 1 now causes my list of likes to be spread across my
attributes and relationships. The modified Case 2 follows:
{name = "like", type = "string", stringVal = "B-Movies"}
{name = "like", type = "string", stringVal = "Skiing"}
{name = "like", type = "radioStationRelationship", relatedTo = "xyz://myContext/KRCL",
 preferredDJs = {name = "The Old Man", name = "Robert Nelson"}}
Case 1 looks something like:
{name = "like", type = "string", stringVal = "B-Movies"}
{name = "like", type = "string", stringVal = "Skiing"}
{type = "like", from = "xyz://myContext/Jim", to = "xyz://myContext/KRCL",
 toType = "radioStation", preferredDJs = {name = "The Old Man", name = "Robert Nelson"}}
The interrogator of my likes in
Case 2 enumerates the "like" attribute types, discovers their types,
and processes. In processing a "relationship" type, it must
dereference the target facet and add the appropriate properties.
The interrogator of my likes in
Case 1 enumerates the "like" attribute types, discovers their types,
and processes. Then enumerates the "like" relationship types, and for
each, dereferences the target facet, discovers its type, processes data from
that facet, and adds the target facet's properties to those on the link.
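The two interrogation paths can be compared in a short Python sketch. Everything here is an assumption made for illustration: the dictionary layout, the `context` lookup table, and the function names are hypothetical, and the records simply mirror the examples above:

```python
# Hypothetical in-memory context; structures mirror the examples above.
context = {
    "xyz://myContext/KRCL": {"type": "radioStation", "callLetters": "KRCL",
                             "band": "FM", "frequency": "90.9"},
}
djs = ["The Old Man", "Robert Nelson"]

def likes_case2(attributes):
    """Case 2: one pass over the "like" attributes; some values are links."""
    out = []
    for a in attributes:
        if a["type"] == "string":
            out.append(a["stringVal"])
        else:  # relationship-typed attribute: dereference the target facet
            target = context[a["relatedTo"]]
            out.append({**target, "preferredDJs": a["preferredDJs"]})
    return out

def likes_case1(attributes, relationships):
    """Case 1: a pass over attributes, then a second pass over relationships."""
    out = [a["stringVal"] for a in attributes]
    for r in relationships:
        if r["type"] == "like":
            out.append({**context[r["to"]], "preferredDJs": r["preferredDJs"]})
    return out

strings = [{"name": "like", "type": "string", "stringVal": "B-Movies"},
           {"name": "like", "type": "string", "stringVal": "Skiing"}]
case2 = likes_case2(strings + [
    {"name": "like", "type": "radioStationRelationship",
     "relatedTo": "xyz://myContext/KRCL", "preferredDJs": djs}])
case1 = likes_case1(strings, [
    {"type": "like", "from": "xyz://myContext/Jim", "to": "xyz://myContext/KRCL",
     "toType": "radioStation", "preferredDJs": djs}])
```

Both paths arrive at the same merged list of likes; the difference is that Case 1 has to consult two separate collections to build it.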
Note that I added toType to
the relationship. This was to avoid having to dereference the target facet in
order to know what properties to expect on the relationship object. Similarly,
I used type = "radioStationRelationship" in Case 2. Both cases
can be simplified (type = "relationship" in Case 2, and remove the
toType in Case 1), but that causes the interrogator to dereference the target
and read its type to know what to expect in terms of further properties.
If the group prefers Case 1 over
Case 2, how can we make this example less awkward? I don't really like going
the other possible direction to fix it (make facets for B-Movies and Skiing,
and any other potential "like" out there).