[Developers] dealing with duplicates
Chris
odp at freenet.de
Tue Jun 5 21:17:28 UTC 2007
Apologies for posting on a developers list as a completely non-technical
person, but the convergence problem is simply too tempting to keep my
mouth shut ;-)
Alec Flett wrote:
"This is a great question - convergence problems are always going to
exist in freebase and it would be great to flush out some specific
patterns to deal with them. After all, it takes just one broken program
or person creating a second topic called "The Beatles" - even if some
process clear up duplicates in 6 hours, that's 6 hours of "confused"
data. I'd love to hear some suggestions on how to address this. "
I'd try to combine various external authoritative resources with the
authority of the human contributors.
For many topics, Freebase already has a link to the corresponding
Wikipedia article: it's currently used to display the Wikipedia
description for a topic if there's no user-defined description yet. If
you not only used these links for finding the Wikipedia descriptions, but
also stored them as an extra "Corresponding Wikipedia article" default
property and encouraged people to actively maintain these links, that
would give an external authority for a whole lot of topics... especially
if it were forbidden, or at least made a bit more difficult, to enter a
specific article more than once into this "Corresponding Wikipedia
article" field across the whole of Freebase. If someone tries to enter a
link to a Wikipedia article that has already been used elsewhere,
display a note: "already used for topic x, please make sure there's no
duplication". This would help people to find duplicates or overlaps, and
create a bit of healthy pressure to discuss problem cases too.
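
To make the idea a bit more concrete, here is a minimal sketch of such a
check in Python. Everything in it is an assumption for illustration only:
the class name, the in-memory dictionary, the topic ids and the title
normalization are all hypothetical and are not part of any Freebase API.

```python
from typing import Dict, Optional


class WikipediaLinkRegistry:
    """Hypothetical in-memory index: Wikipedia article title -> topic id."""

    def __init__(self) -> None:
        self._topic_by_article: Dict[str, str] = {}

    @staticmethod
    def _normalize(article: str) -> str:
        # Assumption: "The_Beatles" and "The Beatles" name the same article.
        return article.strip().replace("_", " ")

    def link(self, topic_id: str, article: str) -> Optional[str]:
        """Store the link, or return a warning if another topic already has it."""
        key = self._normalize(article)
        existing = self._topic_by_article.get(key)
        if existing is not None and existing != topic_id:
            # Do not overwrite: tell the contributor about the possible duplicate.
            return (f"already used for topic {existing}, "
                    "please make sure there's no duplication")
        self._topic_by_article[key] = topic_id
        return None
```

This variant refuses the second link and returns the note; a softer
variant could store the link anyway and only flag both topics for review.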
The IMDB entries which are already used in the Music domain could
probably be used like this, too.
Other candidates are links to Musicmoz/Chefmoz/ODP categories
(affiliation disclosure: I am editing at dmoz.org), or the Yahoo
directory. Or expert resources that are used only for specific domains.
A developer who wants to use Freebase content combined with an external
authority could choose any of these external resources as the main
authority. E.g. for searches on films, IMDB might be the best fit. Or,
depending on what you use the data for, you could assign a higher value
to a topic that has matches in several of these selected resources.
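
As a rough sketch of how a developer might do that, assuming each topic
carries a simple mapping from source name to external identifier (the
source names, function names and data layout below are hypothetical, not
anything Freebase provides):

```python
from typing import Dict, List, Sequence

# Hypothetical source names; any set of external resources could be used.
TRUSTED_SOURCES = ("wikipedia", "imdb", "dmoz")


def authority_score(external_links: Dict[str, str],
                    trusted: Sequence[str] = TRUSTED_SOURCES) -> int:
    """Count how many of the trusted sources this topic is linked to."""
    return sum(1 for source in trusted if external_links.get(source))


def rank_candidates(candidates: Dict[str, Dict[str, str]],
                    main_authority: str) -> List[str]:
    """Order topic ids: main-authority matches first, then by overall score."""
    return sorted(
        candidates,
        key=lambda topic_id: (
            bool(candidates[topic_id].get(main_authority)),
            authority_score(candidates[topic_id]),
        ),
        reverse=True,
    )
```

For a film search you would call rank_candidates(candidates, "imdb"), so
that among several topics with the same name, the one confirmed by IMDB
and by the most other sources comes first.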
Regards,
Chris (chris2 at Freebase)