[Freebase-discuss] Identical ISBNs?

Brian Karlak zenkat at metaweb.com
Wed Apr 7 21:56:38 UTC 2010


On Apr 7, 2010, at 2:21 PM, Vinuth Madinur wrote:

> If there are duplicate entries pointing to an ISBN node, then maybe  
> an application can use "publication_date" and time stamps to figure  
> out a strategy to merge their content and present to their users,  
> rather than showing duplicate entries.

Yes, definitely.

Part of the trick is identifying places where this occurs.   
Technically, you should be able to use MQL:

[{
   "key": {"namespace":"/soft/isbn", "value":null},
   "!/book/book_edition/isbn": {"return":"count"},
   "sort":"-!/book/book_edition/isbn.count"
}]

Unfortunately, MQL queries like this are inefficient and usually time  
out on the production server.  However, we can run them batchwise on  
our Hadoop cluster every night, and those ISBNs with multiple entries  
could be put on a RABJ queue:

	http://wiki.freebase.com/wiki/RABJ

Applications could then pull suspect ISBNs off of the queue and  
display them to users for resolution.  It sounds like Thad was  
interested in helping out with a queue like this ... any other takers?
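
For anyone who wants to experiment before a queue exists, the nightly batch step could look roughly like the sketch below. It assumes the batch job can produce (edition id, ISBN) pairs; the function name, the sample ids and ISBN labels, and the in-memory deque are all made up for illustration and are not the RABJ API or our actual Hadoop job.

```python
from collections import defaultdict, deque


def find_duplicate_isbns(editions):
    """Return (isbn, [edition ids]) for every ISBN shared by 2+ editions.

    `editions` is an iterable of (edition_id, isbn) pairs. The result is
    ordered with the most-duplicated ISBNs first, mirroring the descending
    sort on the count in the MQL query.
    """
    by_isbn = defaultdict(list)
    for edition_id, isbn in editions:
        by_isbn[isbn].append(edition_id)

    # Keep only ISBNs that more than one edition points at.
    duplicates = [(isbn, ids) for isbn, ids in by_isbn.items() if len(ids) > 1]

    # Highest count first; ties broken by ISBN so the order is stable.
    duplicates.sort(key=lambda item: (-len(item[1]), item[0]))
    return duplicates


# Hypothetical sample data: the ids and ISBN labels are placeholders.
editions = [
    ("/m/edition_a", "isbn-0001"),
    ("/m/edition_b", "isbn-0001"),
    ("/m/edition_c", "isbn-0002"),
    ("/m/edition_d", "isbn-0003"),
    ("/m/edition_e", "isbn-0003"),
    ("/m/edition_f", "isbn-0001"),
]

# Stand-in for a review queue: an application would take entries from the
# front and show the competing editions to a user for resolution.
review_queue = deque(find_duplicate_isbns(editions))
```

An application would then take entries from the front of review_queue one at a time. The grouping itself is a single pass over the pairs, which is why it suits a batch job better than a live MQL query.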

Brian