[Data-modeling] Modeling uncertainty
Tom Morris
tfmorris at gmail.com
Tue Feb 17 18:55:32 UTC 2009
I didn't realize that Jeff wasn't available. That probably makes this
a sub-optimal time to have the conversation.
It sounds like no one has done any work modeling this and that dates
are more interesting than locations, so let's continue with them.
Here's the first cut that I put together:
date_range_start - date - beginning date of the range of uncertainty
    which is known to include the actual date. The special value
    -MAXDATE indicates an open-ended range where the begin date is not
    known.
date_range_end - date - ending date of the range. The special value
    MAXDATE indicates an open-ended range where the end date is not
    known.
circa - boolean - date is approximate. Primarily for human
    consumption as opposed to calculations.
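
To make the three fields concrete, here is a rough sketch in Python. It
is only an illustration: the class name, the helper methods, and the use
of date.min / date.max as stand-ins for -MAXDATE / MAXDATE are my
assumptions, not anything that exists in Freebase.

```python
from dataclasses import dataclass
from datetime import date

# Assumption: date.min / date.max stand in for the -MAXDATE / MAXDATE sentinels.
MIN_DATE = date.min
MAX_DATE = date.max


@dataclass(frozen=True)
class UncertainDate:
    """Hypothetical value type holding the three proposed fields."""
    date_range_start: date = MIN_DATE  # earliest possible actual date
    date_range_end: date = MAX_DATE    # latest possible actual date
    circa: bool = False                # display hint only, not used in comparisons

    def __post_init__(self):
        if self.date_range_start > self.date_range_end:
            raise ValueError("date_range_start must not be after date_range_end")

    def start_known(self) -> bool:
        """False when the range is open-ended at the beginning."""
        return self.date_range_start != MIN_DATE

    def end_known(self) -> bool:
        """False when the range is open-ended at the end."""
        return self.date_range_end != MAX_DATE

    def contains(self, d: date) -> bool:
        """True if d falls inside the range of uncertainty (inclusive)."""
        return self.date_range_start <= d <= self.date_range_end
```

For example, UncertainDate(date(614, 1, 1), date(620, 12, 31)) would
cover "0614 - 0620", and leaving date_range_start at its default models
a range whose beginning is unknown.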
Estimated dates, mentioned by Kirrily, are different from dates which
are approximate or contained in an interval of uncertainty, so an
'estimated' boolean flag could be a useful addition. Ditto for
calculated dates which can be useful in certain historical situations
(e.g. a birth date calculated from "Yeoman Smith died September 12
1764 ae 47 y 6 m 3 d").
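
As a sketch of what "calculated" means here, the age-at-death arithmetic
could look something like the following. The function name is
hypothetical, and it assumes one particular convention (subtract years
and months first, then days) on Python's proleptic Gregorian calendar.
It does no Julian/Gregorian conversion, which is one more reason such a
date should be flagged as calculated rather than recorded as fact.

```python
import calendar
from datetime import date, timedelta


def birth_from_age_at_death(death: date, years: int, months: int, days: int) -> date:
    """Estimate a birth date by subtracting an age of years/months/days.

    Assumed convention: subtract years and months first, clamp the day to
    the length of the resulting month, then subtract the remaining days.
    """
    total_months = death.year * 12 + (death.month - 1) - (years * 12 + months)
    year, month_index = divmod(total_months, 12)
    month = month_index + 1
    day = min(death.day, calendar.monthrange(year, month)[1])
    return date(year, month, day) - timedelta(days=days)


# "died September 12 1764 ae 47 y 6 m 3 d"
print(birth_from_age_at_death(date(1764, 9, 12), 47, 6, 3))  # 1717-03-09
```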
To truly be useful, I don't think this can be a CVT, but rather needs
to be a primitive type which supports the relational operators that
MQL needs. From an internal implementation point of view it may be
useful to have an additional flag which indicates that start == end
(i.e. range semantics aren't being used).
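
To illustrate why relational operators get more interesting on ranges,
here is a minimal sketch. The names are made up for the example and say
nothing about how MQL would actually expose this. The point is only that
"before" splits into a certain and a possible variant once dates are
intervals.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DateRange:
    """Hypothetical primitive: an inclusive interval of possible dates."""
    start: date
    end: date

    @property
    def is_exact(self) -> bool:
        """The 'start == end' flag: range semantics are not in use."""
        return self.start == self.end

    def definitely_before(self, other: "DateRange") -> bool:
        """Every possible date in self precedes every possible date in other."""
        return self.end < other.start

    def possibly_before(self, other: "DateRange") -> bool:
        """At least one possible date in self precedes one in other."""
        return self.start < other.end

    def overlaps(self, other: "DateRange") -> bool:
        """The two ranges share at least one possible date."""
        return self.start <= other.end and other.start <= self.end
```

With exact dates (start == end) definitely_before and possibly_before
give the same answer, so ordinary comparison falls out as a special
case.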
On Tue, Feb 17, 2009 at 1:53 AM, Iain Sproat <iainsproat at gmail.com> wrote:
> I have been entering data for early medieval chiefs and this is a minefield
> of uncertainty (there's a reason it's called the dark ages!). Most dates
> are given as circa, and even where dates are given, these will conflict with
> other information given in another reference (the stories of medieval
> chroniclers tend not to be too reliable and disagree with one another).
> UI-wise it would be great to be able to put in a date of c 0510, or 0614 -
> 0620, and it automatically be noted as an uncertain date with a date range.
For serious historical work, you probably want to record the date in
the form in which it's given (freeform text) along with your
interpretation in machine-readable form. This allows other
researchers to double-check your interpretation, calendar conversions,
etc. (e.g. is the eighth month October, as its name implies, or
August?).
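
A rough sketch of keeping both forms side by side, using the two input
styles Iain mentions ("c 0510" and "0614 - 0620"). Everything here (the
record type, the function, the patterns, and working in whole years) is
an assumption for illustration; real input would need far more forms
than this.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RecordedDate:
    """Hypothetical record: the text as given plus one interpretation of it."""
    original_text: str  # exactly what the source or the user wrote
    start_year: int
    end_year: int
    circa: bool


_CIRCA = re.compile(r"^\s*c\.?\s*(\d{1,4})\s*$", re.IGNORECASE)
_RANGE = re.compile(r"^\s*(\d{1,4})\s*-\s*(\d{1,4})\s*$")
_YEAR = re.compile(r"^\s*(\d{1,4})\s*$")


def interpret(text: str) -> Optional[RecordedDate]:
    """Interpret a few simple year forms; return None if not understood."""
    m = _CIRCA.match(text)
    if m:
        year = int(m.group(1))
        return RecordedDate(text, year, year, circa=True)
    m = _RANGE.match(text)
    if m:
        first, last = int(m.group(1)), int(m.group(2))
        if first > last:
            return None  # refuse to guess at a reversed range
        return RecordedDate(text, first, last, circa=False)
    m = _YEAR.match(text)
    if m:
        year = int(m.group(1))
        return RecordedDate(text, year, year, circa=False)
    return None
```

Because original_text is stored untouched, someone who disagrees with
the interpretation can always go back to what was actually written.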
> On a similar matter, I'd like to see a way in the UI to provide references
> for all links so that someone can query the source material which provided
> the data. e.g. to attribute a date given to Origo Gentis Langobardorum or
> Historia gentis Langobardorum . Is this the purpose of the 'attribution'
> property in the link type?
Having attribution information a little bit more readily available
would be nice, but I think it'll just confuse most people if it's too
visible. To be useful, people would have to fill it in accurately.
Did they check the original manuscript of _Origo Gentis Langobardorum_
or a transcription (whose?) or a translation into English (again,
whose?), or did they just take Wikipedia's word that that's what the
text says (in which case they're citing Wikipedia, not the original
text)? If you look at the Mass Data Operation type (e.g. instance
http://www.freebase.com/view/en/television_infoboxes_29_nov_2006_example),
you can see that there are some fairly powerful ways of capturing
multiple pieces of information for an attribution, but, as far as I
know, there's no way to set this up currently in the client (i.e. say
"I'm now transcribing information from Vol. I pg 3" or even "this
structured data that I'm adding is just my transcription of what's in
the Wikipedia blurb sitting in front of me").
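
Just to show the kind of information I mean, here is a purely
hypothetical sketch of an attribution record. It is not the Mass Data
Operation schema or any existing Freebase type; all field names are
invented for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class SourceForm(Enum):
    """How the contributor actually encountered the text (illustrative values)."""
    ORIGINAL_MANUSCRIPT = "original manuscript"
    TRANSCRIPTION = "transcription"
    TRANSLATION = "translation"
    SECONDARY_SUMMARY = "secondary summary"


@dataclass(frozen=True)
class Attribution:
    """Hypothetical record of where an assertion came from."""
    work: str                                   # the work the claim traces back to
    form: SourceForm                            # manuscript, transcription, ...
    consulted_via: Optional[str] = None         # intermediary actually read, if any
    editor_or_translator: Optional[str] = None  # answers the "whose?" question
    locator: Optional[str] = None               # e.g. "Vol. I pg 3"

    def cited_source(self) -> str:
        """What is really being cited: the intermediary if there was one."""
        return self.consulted_via or self.work


via_wikipedia = Attribution(
    work="Origo Gentis Langobardorum",
    form=SourceForm.SECONDARY_SUMMARY,
    consulted_via="Wikipedia",
)
print(via_wikipedia.cited_source())  # Wikipedia
```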
Although I'd love to see this level of power available, I question how
useful it would be for most people given that they can't distinguish
between, for example, Chumby, the product, and Chumby Industries, the
company.
Tom