[Data-modeling] None as a topic
Scott Meyer
sm at metaweb.com
Fri Oct 31 22:33:57 UTC 2008
Jameson O'Guinn wrote:
> There has been a pretty intense debate over the status of the "none"
> topic in the past week. I would like to see what the data modeling
> community proposes as the best answer. The discussion can be found here:
>
> http://www.freebase.com/discuss/threads/guid/9202a8c04000641f8000000005c7ab84
>
> So far the solutions discussed have been: 1) keep the topic "none" as
> is; 2) delete the topic "none"; 3) make "none" a reserved word or a
> non-linked property (like an integer).
>
> Let me know if this isn't the place for this discussion.
This is a fine place for this discussion. We've been debating the
question internally as well. Here's a rough summary:
The "right" thing:
The theoretical issue at stake is completeness. If a property is "complete",
you may assume that absence of a property implies non-existence of whatever
that property represents. For example, if the property child is complete,
then having no instances of the child property implies having no
children, and two child properties means exactly two children, not three
or more.
A practical implementation of this is just another property which indicates
that a given property is complete on a given topic. So "no children" is
indicated by no child properties and a "child property is complete" property.
Having no child properties and no "child property is complete" property
indicates an unknown number of children.
This model allows one to make the distinction between complete and incomplete
information (of which None is just a specific case) in a systematic, well-formed
way.
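A minimal sketch of this scheme in Python may make the distinction concrete. The names here (Topic, complete, exact_count) are hypothetical and chosen for illustration only; they are not Freebase or Metaweb APIs. The "property is complete" marker is modeled as a set of property names stored alongside the property values.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Topic:
    """Hypothetical topic: property values plus per-property completeness markers."""
    name: str
    # Maps a property name (e.g. "child") to the list of asserted values.
    properties: dict = field(default_factory=dict)
    # Names of properties marked "complete" on this topic.
    complete: set = field(default_factory=set)


def exact_count(topic: Topic, prop: str) -> Optional[int]:
    """Return the exact number of values if prop is marked complete.

    Returns None when the property is not marked complete, meaning the
    true count is unknown (it is at least the number of asserted values).
    """
    if prop in topic.complete:
        return len(topic.properties.get(prop, []))
    return None


# No child properties, marked complete: known to have no children.
alice = Topic("alice", complete={"child"})
# No child properties, no marker: number of children unknown.
bob = Topic("bob")
# Two child properties, marked complete: exactly two children.
carol = Topic("carol", {"child": ["dave", "erin"]}, {"child"})
```

Under these assumptions, "no children" and "unknown" are told apart by the marker alone, so no special None topic is needed.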
OK, now the important question: Is completeness a useful feature?
This is much less clear. But the answer that we're tending towards is "No."
If completeness comes "with a type", then users will use the type itself
as the criterion for assuming completeness. Annotating properties
would just be a denormalization. For example, if we get voter registration
data from the government and load it correctly (hard!) then users of that data
could assume that, if a person doesn't have a voter registration, they may not
legally vote.
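As a rough illustration of completeness coming "with a type", the sketch below keys the assumption on the type of record rather than on a per-topic annotation. The set COMPLETE_TYPES, the function lacks_record, and the sample data are all assumptions made up for this example.

```python
from typing import Optional

# Hypothetical: record types whose data source is trusted to be exhaustive.
COMPLETE_TYPES = {"voter_registration"}


def lacks_record(records: dict, person: str, record_type: str) -> Optional[bool]:
    """Say whether a person definitely lacks a record of the given type.

    Returns True or False only when the type is assumed complete;
    returns None when absence of a record tells us nothing.
    """
    if record_type not in COMPLETE_TYPES:
        return None
    return person not in records.get(record_type, set())


# Illustrative data only: record type -> set of people having such a record.
records = {"voter_registration": {"alice"}, "child": {"alice"}}
```

Here the type itself carries the completeness assumption, which is why a per-property annotation would only repeat information the user already has.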
There are also pathologies: if everyone marks every property as "complete" this
whole exercise is a gigantic waste of storage. Completeness has "odd" interactions
with the type system. Barring excruciating debate on physical versus noumenal
aspects of personhood, a person's height can never be None (no height property,
but height is marked as complete). Time is a general complication: No living
person should have complete child properties. Etc.
Unless someone can come up with a really persuasive example of a distinction
between complete and incomplete data that is
1. independent of type
and
2. useful,
we are probably going to forswear any claim to apodictic certainty and
muddle along with incomplete information. The topic None will be converted
into an educational facility.
-Scott