[Data-modeling] software versions/releases in the computer domain?

Dan Milbrath dmilbrath at metaweb.com
Tue Apr 29 23:57:05 UTC 2008


Tom, et al...
Lots to respond to here. I've had some conversations about this offline, and I think Robert summarizes one approach quite nicely:
"Denormalization is the only approach we have at the moment -- that is, use two properties -- one for the current and another for the historical, which, incidentally, includes the current.  At some point we could have meta-structures that say these two properties have some semantic equivalence, but for now we just have two properties perhaps distinguished through a naming convention.  I would suggest trying it on your rosters and pose it to the data modeling group as a near-term solution.  Perhaps use the prefix "Current" and "Historical" as prefixes on the property name pairs. "

To this end, I've modeled current and historical rosters on the basketball team type as separate properties. See: http://freebase.com/tools/schema/basketball/basketball_team and populated examples here: http://www.freebase.com/view/en/rasheed_wallace and http://www.freebase.com/view/en/detroit_pistons
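
To make the two-property pattern concrete, here is a minimal Python sketch. It is not Freebase schema or MQL; the class and method names (`Team`, `sign`, `release`, `is_consistent`) and the second player are made up for illustration. The only rule taken from Robert's description is that the historical property always includes the current one.

```python
from dataclasses import dataclass, field


@dataclass
class Team:
    """Hypothetical stand-in for a team topic with two roster properties."""
    name: str
    current_roster: set = field(default_factory=set)
    historical_roster: set = field(default_factory=set)

    def sign(self, player):
        # Denormalized write: the player goes into both properties.
        self.current_roster.add(player)
        self.historical_roster.add(player)

    def release(self, player):
        # Only the current property changes; history keeps the player.
        self.current_roster.discard(player)

    def is_consistent(self):
        # Invariant: "historical ... incidentally, includes the current".
        return self.current_roster <= self.historical_roster


pistons = Team("Detroit Pistons")
pistons.sign("Rasheed Wallace")
pistons.sign("Example Player")      # hypothetical player
pistons.release("Example Player")
```

The cost of this approach is visible in `sign`: every write has to touch both properties, and nothing but convention keeps them in agreement.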

I completely understand the different perspectives here, but trying to model this 'correctly' quickly leads to paralysis.

Some other approaches I considered:
1. As discussed, we could combine the current and historical rosters, but at this time the Freebase UI doesn't have any special logic to display the current roster, which is what most folks want to see when looking at the team page. Per Robert's suggestion, I modeled them separately for now with the idea that the data and model could be revisited later.
2. I considered using 'employment history' from the 'employer' type instead for historical roster, but this seemed a little abstract. It could work but might be odd when the user is confronted with 'title' for 'point guard' and the like... and other properties that aren't a perfect fit here.
3. I also considered a more generalized 'historical roster position' type in the sports domain that all sports team types could use, but this would be limiting if we later decided to add historical positions that we'd want constrained by sport.
4. Another option is to pull the historical roster in as an incoming link from the player statistics type. This was actually my longer-term strategy. This type will contain players' historical statistics for each season (for instance, Rasheed Wallace blocked 129 shots for Detroit in the 2007/2008 season). Again, however, this would likely require some special sauce in the Freebase UI to display in a meaningful way on the team topic page.
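
As a rough illustration of option 4, the historical roster can be derived from per-season statistics records instead of being stored separately. This is a plain-Python sketch, not the actual player statistics type: `SeasonStats`, its fields, and `historical_roster` are assumed names, and the second record is a made-up placeholder.

```python
from collections import defaultdict
from typing import NamedTuple


class SeasonStats(NamedTuple):
    """Hypothetical shape of one player-season statistics record."""
    player: str
    team: str
    season: str
    blocks: int


def historical_roster(stats):
    """Group players by team and season, using only the stat records."""
    roster = defaultdict(lambda: defaultdict(set))
    for row in stats:
        roster[row.team][row.season].add(row.player)
    # Convert to plain dicts with sorted player lists for stable output.
    return {
        team: {season: sorted(players) for season, players in seasons.items()}
        for team, seasons in roster.items()
    }


records = [
    # From the example above: 129 blocks for Detroit in 2007/2008.
    SeasonStats("Rasheed Wallace", "Detroit Pistons", "2007/2008", 129),
    # Made-up placeholder record.
    SeasonStats("Example Player", "Example Team", "2006/2007", 0),
]
rosters = historical_roster(records)
```

Nothing is duplicated here, but the team page has to run this kind of grouping at display time, which is the UI work mentioned in option 4.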

Beyond this, I know I've left out other details that might be interesting (previous jersey numbers and position played). This same pattern needs to be settled on for coaches as well. The point is that the level of fidelity of information is kind of at the discretion of the schema modeler, who has to make her/his best guess at what will be useful to consumers of the information. 

On a related note, I believe Freebase will ultimately support the concept of 'present' in a date field. This should be more dynamic than simply inputting the date/time at the moment the user enters the data; it should represent the more abstract notion of a moving point in time. Rasheed Wallace is, for instance, on the Detroit Pistons right now and 2 days from now, basically until such time as he isn't. Others on the list may be able to comment on when such support will exist in MQL, but this is clearly a future feature that will help address this scenario as well.
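
One way to picture a moving 'present' is an open-ended date range: a roster spell with a start date and no end date stays valid on any later day until an end is recorded. The sketch below is plain Python and says nothing about how MQL will do it; `RosterSpell`, `active_on`, and `roster_as_of` are assumed names, and all dates are placeholders, not verified facts. It also answers Tom's "current as of ___" form of the question, since the query always takes an explicit date.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class RosterSpell:
    """Hypothetical record of one player's stint on one team."""
    player: str
    team: str
    start: date
    end: Optional[date] = None  # None stands for 'present' (still open)

    def active_on(self, day):
        # Open-ended spells match every day on or after the start.
        return self.start <= day and (self.end is None or day < self.end)


def roster_as_of(spells, team, day):
    """Return the sorted roster for a team, 'current as of' the given day."""
    return sorted(s.player for s in spells if s.team == team and s.active_on(day))


spells = [
    # Placeholder start date; no end date, so the spell is still open.
    RosterSpell("Rasheed Wallace", "Detroit Pistons", date(2004, 1, 1)),
    # Made-up player with a closed spell.
    RosterSpell("Example Player", "Detroit Pistons", date(2005, 1, 1), date(2006, 1, 1)),
]
today_roster = roster_as_of(spells, "Detroit Pistons", date(2008, 4, 29))
```

With this shape, a single roster property covers both the current and the historical view, and "current" is just the special case where the query date is today.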




----- Original Message -----
From: "Tom Morris" <tfmorris at gmail.com>
To: "Freebase data modeling mailing list" <data-modeling at freebase.com>
Sent: Tuesday, April 29, 2008 2:09:47 PM (GMT-0800) America/Los_Angeles
Subject: Re: [Data-modeling] software versions/releases in the computer domain?

On Tue, Apr 29, 2008 at 3:10 PM, Ed Laurent <spatial.db at gmail.com> wrote:

> Brendan, I think you've bumped into another instance of Freebase focusing on
> "current" information while ignoring everything that came before and not
> documenting the currentness (it's a word, look it up!) of the data. While
> this certainly keeps the schema simple and approachable to a wide audience,
> I'm not sold on it being the best default approach to modeling over the long
> term.

I'm almost certain it's not.  Any queries asking for the "current"
value should actually be asking for "current as of ___."  Similarly,
having something called "current value" or similar in schemas is just
tempting people to be lazy and not specify the dates over which the
value is/was valid.  Even if they just pick "today" for the date valid
and the system fills in today's date, we'll be ahead of the game as
compared to having it unspecified and having to guess at some point
in the future what "current" meant when the data was entered.

Tom
_______________________________________________
Data-modeling mailing list
Data-modeling at freebase.com
http://lists.freebase.com/mailman/listinfo/data-modeling


