[Developers] [Data-modeling] Data load issues
Tom Morris
tfmorris at gmail.com
Thu May 28 19:43:29 UTC 2009
Thanks for the update Reilly.
On Wed, May 27, 2009 at 5:34 PM, Reilly Hayes <rfh at metaweb.com> wrote:
> Reconciliation is hard, and reconciling people is particularly hard.
Yup, I know that because I've done a tiny bit of it myself. I'd like
to see it not get any harder. :-)
> There is bound to be error.
Understood.
> The effort to correct a false positive on
> reconciliation is about 20 times the effort to correct a false negative.
...
> There were a few thousand misses on reconciliation of hundreds of thousands
> of authors.
So the goal is to keep false negatives under 5% and you believe you've
achieved 2-3%? Anecdotally, I'd say I've been seeing a higher rate of
occurrence than that, but perhaps I've just been working in a "bad"
part of the database.
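For what it's worth, here is a rough sketch of the arithmetic behind
those figures. It is only one way to read the 20:1 effort ratio: if a
false positive costs 20 units to fix and a false negative costs 1, a
merge only pays off when the match is right more than 20 times in 21.
The function names and the 3,000 / 150,000 counts are illustrative
assumptions on my part, not numbers from the actual load.

```python
from fractions import Fraction


def merge_threshold(fp_cost, fn_cost):
    """Match probability above which merging has the lower expected cost.

    Merging costs (1 - p) * fp_cost on average; not merging costs
    p * fn_cost. Merging wins when p > fp_cost / (fp_cost + fn_cost).
    """
    return Fraction(fp_cost, fp_cost + fn_cost)


def miss_rate(misses, total):
    """Share of records that failed to reconcile (false negatives)."""
    return Fraction(misses, total)


# 20:1 effort ratio quoted above; the unit costs are an assumption.
threshold = merge_threshold(20, 1)
print(float(threshold))      # about 0.952
print(float(1 - threshold))  # tolerable chance of a wrong merge, just under 5%

# Hypothetical counts standing in for "a few thousand" misses out of
# "hundreds of thousands" of authors.
print(float(miss_rate(3000, 150000)))  # 0.02
```

With those made-up counts the miss rate comes out at 2%, which is in
the range I guessed above.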
> For this reason, please tread gently on the merge queue.
I'll keep this in mind, but you might want to think about how you get that
message out to the folks who actually do the flagging & voting since I
bet a lot of them don't follow either of these lists.
> Your point on poorly disambiguated topics is correct, given the data
> currently visible in Freebase. As it turns out, Book and Edition data *is*
> part of the load. We will be loading that after the authors are cleaned up.
That's kind of a chicken and egg thing since it's going to be hard for
folks to clean up the authors without the additional information. Any
chance you guys would consider adding a link back to Open Library so
people can see an author's works when deciding whether to flag or how
to vote? It would significantly reduce the amount of labor involved
for reviewers.
> If you would like to help, we're going to have some related tasks pop up in
> our human judgement capture system (RABJ) in the near future. I'll post the
> URL when they are ready.
After I read your note I started seeing untyped topics which asked me
"Is this a Person?" Is that RABJ or some other new widget? For what
it's worth, I'd be willing to do more than just say "no." If there
was an option to provide the actual type or some other diagnostic
feedback, I'd take it if I wasn't in a rush.
Tom