[Data-modeling] Upcoming Schema Changes
Jeff Thompson
jeff at thefirst.org
Wed Feb 13 01:40:51 UTC 2008
I agree that "denormalization" (separate properties for the current value and the historical series) is
attractive simply because we can do it with the existing code. But is that the only attraction?
The question I'm pushing is: If we have a time-series of "places lived" for Robert Cook, why has
no one yet called for a separate property for "current living place" for Robert Cook? Is that because
they are happy to manually view the time series on the Freebase page, or manually cook up (no pun intended)
a property-specific query to get the current value from the time series? I'm trying to shift the burden
of proof on the question...
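
To make the "property-specific query" concrete, here is a rough client-side sketch in Python of pulling the current value out of a time series. It is not MQL and not anything Freebase provides. The `Residence` type, its `start`/`end` fields, the `current_entries` helper, and the sample data are all made up for illustration, and it assumes "current" means "has a start date but no end date".

```python
from typing import List, NamedTuple, Optional


class Residence(NamedTuple):
    """Hypothetical time-series entry; not a real Freebase type."""
    place: str
    start: Optional[str]  # ISO date string (YYYY-MM-DD), or None if unknown
    end: Optional[str]    # ISO date string, or None if there is no end date


def current_entries(series: List[Residence]) -> List[Residence]:
    """Return the entries that have a known start date and no end date."""
    return [e for e in series if e.start is not None and e.end is None]


# Made-up sample data, only to show the shape of the problem.
places_lived = [
    Residence("Place A", "1990-01-01", "1999-12-31"),
    Residence("Place B", "2000-01-01", None),
]

current = current_entries(places_lived)
```

Every client that wants "where does he live now" has to repeat some version of this filter, which is the burden a separate "current" property would remove.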
Christopher R. Maden wrote:
> Jeff Thompson <jeff at thefirst.org> wrote:
>> Concerning option #2, can MQL answer the following queries right now:
>> * Who is the spouse of Nicole Kidman? (i.e., the latest non-couch-jumping spouse)
>> http://www.freebase.com/view/en/nicole_kidman
>> * Where did Barack Obama get his degree? (i.e., the latest degree)
>> http://www.freebase.com/view/en/barack_obama
>> * Where does Robert Cook live? (i.e., now)
>> http://www.freebase.com/view/guid/9202a8c04000641f80000000008427e9
>
> Kind of. One can structure a query for all of Nicole Kidman’s domestic relationships which have a start date but no end date. Similarly, one can ask for the first of Obama’s degrees, reverse-sorted by date.
>
> However, this does not handle incomplete knowledge: for instance, we might know that Obama received these three degrees but have a date for only one of them. And the client doesn’t currently know how to distinguish time-series properties from others. It would be possible, as Robert suggests, to add something like the property hints that currently let the client know about disambiguating properties; these would suggest that a property is expected to be time-valued. That would, in turn, require standardizing on the way of representing the dates themselves...
>
> In short, this is a feature that we want, and have been batting around for some time, but we need some help defining the use cases so we can best meet your needs. Is the denormalization that Robert proposes prohibitively problematic? Is a certain amount of data contortion acceptable, to fit a standardized representation of time-series data? Are there other approaches we haven’t considered?
>
> ~Chris
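
The reverse-sort approach Chris describes, and the way it breaks down under incomplete knowledge, can be sketched in Python as below. This is only an illustration: the `Degree` type, the `latest_by_date` helper, and the sample entries are hypothetical, and it assumes every known date uses the same YYYY-MM-DD format so that string order matches date order.

```python
from typing import List, NamedTuple, Optional


class Degree(NamedTuple):
    """Hypothetical time-series entry; not a real Freebase type."""
    institution: str
    date: Optional[str]  # ISO date string (YYYY-MM-DD), or None if unknown


def latest_by_date(entries: List[Degree]) -> Optional[Degree]:
    """Return the entry with the latest date.

    Returns None when the list is empty or when any entry lacks a date,
    because an undated entry could be the latest one.
    """
    if not entries or any(e.date is None for e in entries):
        return None
    # Same-format ISO date strings sort in chronological order.
    return sorted(entries, key=lambda e: e.date, reverse=True)[0]


# Made-up sample data: one fully dated series, one with gaps.
complete = [Degree("School A", "1980-06-01"), Degree("School B", "1990-06-01")]
incomplete = [Degree("School A", None), Degree("School B", "1990-06-01")]

latest_complete = latest_by_date(complete)
latest_incomplete = latest_by_date(incomplete)
```

With the fully dated series the helper picks "School B"; with one date missing it declines to answer rather than guess.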