[Developers] [Data-modeling] Data load issues
Reilly Hayes
rfh at metaweb.com
Thu May 28 20:03:27 UTC 2009
On May 28, 2009, at 12:43 PM, Tom Morris wrote:
>
> So the goal is to keep false negatives under 5% and you believe you've
> achieved 2-3%? Anecdotally, I'd say I've been seeing a higher rate of
> occurrence than that, but perhaps I've just been working in a "bad"
> part of the database.
The goal is to keep the total number of false positives plus false
negatives to less than 1% of entities loaded. There were about 500,000
authors loaded from Open Library. We have 4,000 that *might* be
duplicates of previously loaded authors. We have another 10,000 to
20,000 that might be duplicates within the source dataset (Open
Library).
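
For reference, the figures above can be turned into shares of the load
with a few lines of Python. This is only a rough sketch: the counts are
the approximate ones quoted in this message, the variable and function
names are made up for illustration, and the results describe *possible*
duplicates, not confirmed errors.

```python
# Approximate figures quoted in the message; names here are illustrative only.
AUTHORS_LOADED = 500_000   # "about 500,000 authors loaded"
GOAL_ERROR_RATE = 0.01     # goal: false positives + false negatives < 1%


def share_of_load(count: int, total: int = AUTHORS_LOADED) -> float:
    """Return `count` as a fraction of the entities loaded."""
    return count / total


# Number of entities that a 1% rate corresponds to for this load.
error_budget = round(AUTHORS_LOADED * GOAL_ERROR_RATE)

# Possible duplicates of previously loaded authors.
prior_dupes = share_of_load(4_000)

# Possible duplicates within the source dataset (low and high estimates).
internal_low = share_of_load(10_000)
internal_high = share_of_load(20_000)

print(f"1% of the load: {error_budget} entities")
print(f"possible duplicates of earlier authors: {prior_dupes:.1%}")
print(f"possible duplicates within the source: "
      f"{internal_low:.1%} to {internal_high:.1%}")
```

How many of these candidates turn out to be real duplicates is not
stated here, so the shares are upper bounds on each category rather
than measured error rates.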
>
> That's kind of a chicken and egg thing since it's going to be hard for
> folks to clean up the authors without the additional information. Any
> chance you guys would consider adding a link back to Open Library so
> people can see an author's works when deciding whether to flag or how
> to vote? It would significantly reduce the amount of labor involved
> for reviewers.
We are not relying on the community for this cleanup. You will not
see these in the merge queue. We did not want to swamp the merge
queue with these.
>>
>
> After I read your note I started seeing untyped topics which asked me
> "Is this a Person?" Is that RABJ or some other new widget? For what
> it's worth, I'd be willing to do more than just say "no." If there
> was an option to provide the actual type or some other diagnostic
> feedback, I'd take it if I wasn't in a rush.
That is from RABJ's ancestor RinRABJ (which stands for RinRABJ is not
RABJ). We've taken self-referential acronyms to a whole new level by
adding time-travel to the equation. The RinRABJ service was built on
top of the RABJ prototype.
-r
>
>
> Tom
> _______________________________________________
> Data-modeling mailing list
> Data-modeling at freebase.com
> http://lists.freebase.com/mailman/listinfo/data-modeling