[Data-modeling] "Public person" and privacy
Michael Scott
michael_scott at mac.com
Thu Jun 5 09:55:57 UTC 2008
there was a deathly silence on this - i wonder why
privacy cuts right to the heart of what Freebase exposes itself to -
the unlocked front door
i've lurked on these lists for some months now in an attempt to get a
measure of Freebase's prospects - how big its community is - to what
extent it will succeed in what it aims to be
from a corporate perspective i can see how the underlying product
could be a very attractive proposition - a malleable database that
would facilitate the unification of information across an enterprise -
no need to lock the front door if there is always someone at home -
there are natural constraints of good behaviour and supervision
associated with wanting to keep one's job - it's fairly obvious what
the policy should be and there are usually sufficient resources to
guarantee that it's more than just words
but as a public database the question of what Freebase contains
depends precisely on what resources are available to back up any
constraints - the "clear community standards" - because if these are
more or less just words then what does Freebase contain
from this perspective we could say that "public" is just another way
of saying "more open to abuse"
so let's say that Freebase becomes the world's number one public
database - terabytes of potentially dirty data - how does the world
handle that
you can see how Wikipedia devolves responsibility to eyeballs -
someone sees something is wrong and flags it or fixes it
but with Freebase the eyeballs are more than likely going to be
somewhere downstream at the other end of an application that is
extracting data from Freebase - at that point it is already too late -
the "it's a wiki fix it" mantra doesn't apply - the information has
been served up in a different context - and most likely in a context
that would prefer not to serve dirty data - like water from a tap in a
restaurant
so the question is how Freebase addresses, from a software perspective,
the potential dirtiness of its data
Polya has a nice little formula for calculating the probability of
there still being mistakes in a document after more than one person
has proofread it - perhaps something similar could be used to get
users to estimate the dirtiness of the data in Freebase
http://mathworld.wolfram.com/ProofreadingMistakes.html
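as a rough sketch of how that estimate works - if reviewer A finds a
mistakes, reviewer B finds b mistakes, and c of those were found by both,
the number missed by both is roughly (a - c)(b - c) / c - the function
names and counts below are made up for illustration, and it assumes the
two reviewers work independently on the same set of facts, which may not
hold for real Freebase users

```python
def polya_missed_estimate(found_by_a, found_by_b, found_by_both):
    """Estimate how many errors both reviewers missed: (a - c)(b - c) / c.

    Hypothetical helper, not part of any Freebase API.
    Assumes the two reviewers checked the same data independently.
    """
    if found_by_both <= 0:
        # With no overlap the estimate is undefined (division by zero).
        raise ValueError("need at least one error found by both reviewers")
    if found_by_both > min(found_by_a, found_by_b):
        raise ValueError("overlap cannot exceed either reviewer's count")
    return (found_by_a - found_by_both) * (found_by_b - found_by_both) / found_by_both


def polya_total_estimate(found_by_a, found_by_b, found_by_both):
    """Estimate the total number of errors, found or not: a * b / c."""
    if found_by_both <= 0:
        raise ValueError("need at least one error found by both reviewers")
    return found_by_a * found_by_b / found_by_both


# Illustrative counts only: A flags 20 bad facts, B flags 15, 10 in common.
missed = polya_missed_estimate(20, 15, 10)
total = polya_total_estimate(20, 15, 10)
print(missed, total)
```

with those made-up counts the two reviewers found 25 distinct errors
between them, and the estimate suggests about 5 more that neither spotted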
also - this lists all the Wikipedia edits by Metaweb
http://wikiscanner.virgil.gr/f.php?ip1=64.81.62.32-63
you can see where Robert Cook fixes some abuse about evil and killing
kittens - is there already a way to do this on Freebase - is there a
way to measure the degree to which any piece of data has been abused -
are there plans for it