[Freebase-discuss] Identical ISBNs?
Brian Karlak
zenkat at metaweb.com
Wed Apr 7 21:56:38 UTC 2010
On Apr 7, 2010, at 2:21 PM, Vinuth Madinur wrote:
> If there are duplicate entries pointing to an ISBN node, then maybe
> an application can use "publication_date" and time stamps to figure
> out a strategy to merge their content and present to their users,
> rather than showing duplicate entries.
Yes, definitely.
Part of the trick is identifying places where this occurs.
Technically, you should be able to use MQL:
[{
  "key": {"namespace": "/soft/isbn", "value": null},
  "!/book/book_edition/isbn": {"return": "count"},
  "sort": "-!/book/book_edition/isbn.count"
}]
Unfortunately, MQL queries like this are inefficient and usually time
out on the production server. However, we can run them batchwise on
our Hadoop cluster every night, and those ISBNs with multiple entries
could be put on a RABJ queue:
http://wiki.freebase.com/wiki/RABJ
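As a rough illustration of what the nightly batch step would compute, here is a minimal Python sketch. It assumes the job already has (edition id, ISBN) pairs extracted from a dump; the function name and the input shape are hypothetical, and this is not the actual Hadoop job.

```python
from collections import defaultdict

def find_duplicate_isbns(editions):
    """Group edition ids by ISBN and keep ISBNs claimed by more than one edition.

    `editions` is an iterable of (edition_id, isbn) pairs (a hypothetical
    input shape, e.g. rows pulled from a dump). Returns a dict mapping each
    suspect ISBN to its sorted edition ids, most-claimed ISBN first.
    """
    by_isbn = defaultdict(set)
    for edition_id, isbn in editions:
        # A set ignores the same edition being listed twice for one ISBN.
        by_isbn[isbn].add(edition_id)

    dupes = {isbn: sorted(ids) for isbn, ids in by_isbn.items() if len(ids) > 1}
    # Order by descending count, like the "-...count" sort in the MQL query.
    return dict(sorted(dupes.items(), key=lambda kv: (-len(kv[1]), kv[0])))
```

Each key of the returned dict is a candidate entry for the queue.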
Applications could then pull suspect ISBNs off of the queue and
display them to users for resolution. It sounds like Thad was
interested in helping out with a queue like this ... any other takers?
Brian