[Data-modeling] "Public person" and privacy

Michael Scott michael_scott at mac.com
Thu Jun 5 09:55:57 UTC 2008


there was a deathly silence on this - i wonder why

privacy cuts right to the heart of what Freebase exposes itself to -   
the unlocked front door

i've lurked on these lists for some months now in an attempt to gauge
Freebase's prospects - how big its community is and to what extent it
will succeed in what it aims to be

from a corporate perspective i can see how the underlying product
could be a very attractive proposition - a malleable database that
would help unify information across an enterprise - there you don't
need to lock the front door because there is always someone at home -
wanting to keep one's job brings natural constraints of good behaviour
and supervision - it's fairly obvious what the policy should be and
there are usually sufficient resources to guarantee that it's more
than just words

but as a public database, what Freebase contains depends precisely on
what resources are available to back up its constraints - the "clear
community standards" - because if these are more or less just words,
then what does Freebase actually contain?

from this perspective we could say that "public" is just another way  
of saying "more open to abuse"

so let's say that Freebase becomes the world's number one public
database - terabytes of potentially dirty data - how does the world
handle that?

you can see how Wikipedia devolves responsibility to eyeballs -  
someone sees something is wrong and flags it or fixes it

but with Freebase the eyeballs are most likely going to be
downstream, at the other end of an application that is
extracting data from Freebase - at that point it is already too late -
the "it's a wiki, fix it" mantra doesn't apply - the information has
already been served up in a different context, and most likely in one
that would prefer not to serve dirty data - like water from a tap in a
restaurant

so the question is: from a software perspective, how does Freebase
address the potential dirtiness of its data?

Pólya has a nice little formula for estimating how many mistakes
remain in a document after two people have proofread it independently -
perhaps something similar could be used to get users to estimate the
dirtiness of the data in Freebase

	http://mathworld.wolfram.com/ProofreadingMistakes.html
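for the curious, the estimate on that page (credited to Pólya) goes like this: if reviewer A flags a errors, reviewer B flags b, and c of those were flagged by both, the total number of errors is estimated as ab/c and the number missed by both as (a - c)(b - c)/c. below is a minimal python sketch, assuming two people check the same batch of Freebase facts independently and that every bad fact is equally likely to be caught - the function name estimate_errors is my own invention, not anything Freebase provides

```python
def estimate_errors(found_by_a: int, found_by_b: int, found_by_both: int) -> tuple[float, float]:
    """Polya's two-reviewer estimate (hypothetical helper, not a Freebase API).

    Assumes both reviewers worked independently on the same batch and that
    every error was equally likely to be caught by each of them.
    Returns (estimated total errors, estimated errors missed by both).
    """
    a, b, c = found_by_a, found_by_b, found_by_both
    if c <= 0:
        # with no overlap the estimate ab/c is undefined
        raise ValueError("need at least one error found by both reviewers")
    if c > min(a, b):
        raise ValueError("overlap cannot exceed either reviewer's count")
    total = a * b / c
    # total minus the distinct errors actually found: ab/c - (a + b - c)
    missed = (a - c) * (b - c) / c
    return total, missed


# reviewer A flags 20 bad facts, B flags 15, and 12 are on both lists
total, missed = estimate_errors(20, 15, 12)
print(total, missed)  # 25.0 2.0
```

so with those numbers the batch would hold about 25 bad facts, 2 of which nobody caught - the estimate gets optimistic if the reviewers share blind spots, since then the overlap c is inflated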

also - this lists all the Wikipedia edits made from Metaweb's IP range

	http://wikiscanner.virgil.gr/f.php?ip1=64.81.62.32-63

you can see where Robert Cook fixes some abuse about evil and killing
kittens - is there already a way to do this on Freebase? is there a
way to measure the degree to which any piece of data has been abused?
are there plans for it?