[Data-modeling] Modeling uncertainty

Scott Meyer sm at metaweb.com
Wed Feb 18 22:13:26 UTC 2009


spencer kelly wrote:
> hey, on a slightly different note, what about modelling 'known
> unknowns'? for example the cause of the 1914 toronto fire was never
> known...

See:

http://lists.freebase.com/pipermail/data-modeling/2008-October/001159.html

Basically, missing data in Freebase is "unknown".

The None topic is a bit of a quandary.  On the one hand, None is badly broken
(e.g., many people appear to share the same spouse because they all point
to None).  Real queries actually return nonsensical results because of None.
On the other hand, people seem to like to use it.
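To make the spouse example concrete, here is a minimal Python sketch. It assumes a toy in-memory triple store, not Freebase's actual query API, and the placeholder ID `/hypothetical/none` is made up for illustration. It shows how one shared placeholder topic turns unrelated people into apparent co-spouses under a simple grouping query, while leaving the value missing does not.

```python
from collections import defaultdict

# Hypothetical ID for a shared placeholder "None" topic; not a real Freebase ID.
NONE_TOPIC = "/hypothetical/none"

# Hypothetical (subject, property, object) triples.
triples = [
    ("alice", "spouse", "bob"),
    ("bob", "spouse", "alice"),
    ("carol", "spouse", NONE_TOPIC),
    ("dave", "spouse", NONE_TOPIC),
    ("erin", "spouse", NONE_TOPIC),
    # "frank" has no spouse triple at all: the value is simply missing.
]


def people_sharing_a_spouse(triples):
    """Return {spouse: [subjects]} for every spouse linked to more than one subject."""
    by_spouse = defaultdict(list)
    for subj, prop, obj in triples:
        if prop == "spouse":
            by_spouse[obj].append(subj)
    return {spouse: subs for spouse, subs in by_spouse.items() if len(subs) > 1}


# With the placeholder stored as a topic, the grouping "finds" a shared spouse.
print(people_sharing_a_spouse(triples))
# {'/hypothetical/none': ['carol', 'dave', 'erin']}

# Leaving the value missing instead (dropping the placeholder triples)
# avoids the false match.
missing_instead = [t for t in triples if t[2] != NONE_TOPIC]
print(people_sharing_a_spouse(missing_instead))
# {}
```

The design point is that a placeholder stored as an ordinary topic participates in joins like any real value, whereas an absent triple simply never matches.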

We can do "the right thing" detailed in the linked email, but it is
doubtful that it would be used correctly, and even if it were used
correctly, the results don't seem to be all that useful.  As it stands,
if I were looking for "fires of unknown cause", I'd look for missing
causes, as well as causes which had been edited more than once.  Adding
completeness to the mix just makes the query more complex, because
you have to look for changes in completeness as well.  You're
still stuck with the fact that everything is ultimately "unknown",
and you would have to invest time, proportional to the value of
the conclusion you're attempting to draw, validating the data
upon which it is based.
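The search heuristic described above can be sketched in a few lines of Python. This is a minimal sketch under assumptions: the `Fire` record, its `cause_edits` history, and `possibly_unknown_cause` are hypothetical stand-ins, not Freebase's schema or its edit-history interface. A fire is flagged if its cause is missing or has been set more than once.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Fire:
    name: str
    # Hypothetical edit history of the "cause" property, oldest first.
    # An empty list means no cause was ever asserted (missing, i.e. unknown).
    cause_edits: List[str] = field(default_factory=list)

    @property
    def current_cause(self) -> Optional[str]:
        return self.cause_edits[-1] if self.cause_edits else None


def possibly_unknown_cause(fires: List[Fire]) -> List[str]:
    """Flag fires whose cause is missing or has been edited more than once."""
    flagged = []
    for fire in fires:
        missing = fire.current_cause is None
        contested = len(fire.cause_edits) > 1
        if missing or contested:
            flagged.append(fire.name)
    return flagged


fires = [
    Fire("fire_a", ["lightning"]),                  # set once, never changed
    Fire("fire_b"),                                 # cause never recorded
    Fire("fire_c", ["arson", "electrical fault"]),  # cause set twice
]
print(possibly_unknown_cause(fires))  # ['fire_b', 'fire_c']
```

The result is only a list of candidates; each flagged record would still need checking against sources before drawing any conclusion from it.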

-Scott

