[Developers] how to make best use of Freebase suggest for reconciling collections of names

Raymond Yee raymond.yee at gmail.com
Wed Jun 24 17:47:58 UTC 2009


[My apologies if you get duplicates of this email -- I sent this out 
already under another email address but didn't see it come through....]

Hi everyone,

I'm finding it a challenge to efficiently match entities that are 
identified by no more than a single string (such as the names of US 
government agencies, e.g., "Department of State") to Freebase items.  
I'd like to describe the approach I'm taking and get your feedback on how 
to make it better.

I'd like to write a Freebase Acre app that will take as input a list of 
strings and, after an interactive process involving the user, return 
the list of strings with Freebase ids for the matches.  (This app is 
modeled roughly on http://mqlx.com/reconciliation/recon.html.)  The 
steps involved will be:

1) feed each of the strings to the Freebase search API 
(http://www.freebase.com/api/service/search?help) to come up with the 
"best match".  (Naively, I'd just take the match with the highest 
relevance:score -- but I'd like to figure out how to distinguish a 
match that is head and shoulders above the others from one that is 
just a bit better than the rest.)

2) present the best matches in an input box tied to the Freebase suggest 
jQuery plugin (http://suggest.freebaseapps.com/) so that the user can 
quickly inspect what Freebase is suggesting.  If the user is unhappy 
with the choice, the user can look through other suggestions, create a 
new Freebase item, or flag the item as having no match.

3) return the complete list of matches.
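
The "head and shoulders" question in step 1 could be approached by 
comparing the top score with the runner-up.  Below is a minimal sketch 
in Python (not Acre), under stated assumptions: each search result is 
taken to be a dict carrying a "relevance:score" number and an "id", 
and the function name, labels, and both thresholds are hypothetical 
placeholders that would need tuning against real data.

```python
SCORE_KEY = "relevance:score"  # field named in the email; the rest of the result shape is assumed


def classify_match(results, min_score=50.0, min_ratio=2.0):
    """Pick the top-scoring result and label how decisive the win is.

    Returns (best_result, label), where label is:
      "clear"     - top score >= min_score and >= min_ratio * runner-up score
      "ambiguous" - there is a top result, but it does not stand out
      "none"      - the search returned nothing
    min_score and min_ratio are arbitrary illustrative defaults.
    """
    ranked = sorted(results, key=lambda r: r[SCORE_KEY], reverse=True)
    if not ranked:
        return None, "none"

    best = ranked[0]
    # With a single result there is no runner-up; treat its score as 0.
    runner_up_score = ranked[1][SCORE_KEY] if len(ranked) > 1 else 0.0

    is_clear = (
        best[SCORE_KEY] >= min_score
        and best[SCORE_KEY] >= min_ratio * runner_up_score
    )
    return best, ("clear" if is_clear else "ambiguous")


if __name__ == "__main__":
    # Made-up ids and scores, purely for illustration.
    sample = [
        {"id": "/example/a", SCORE_KEY: 120.0},
        {"id": "/example/b", SCORE_KEY: 30.0},
    ]
    print(classify_match(sample))
```

Only the "ambiguous" and "none" cases would then need close attention 
in step 2; "clear" matches could be pre-filled and skimmed.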

I'm curious to know whether this approach is basically sound.  If so, I 
plan to implement it and look for ways to make the process efficient.  
For example:

a) I'd like to make it easy for the user to see which strings Freebase 
has matched with "high confidence", so that she can scan through the 
list quickly.  Do people have suggestions about how to measure "high 
confidence" beyond a high relevance score?

b) I'd like input and output mechanisms that tie into, say, a Google 
spreadsheet.  I often find it convenient to work with spreadsheets with 
columns of attributes, one of which I'd like to be the Freebase ID.  
I'd like to point this app at a Google spreadsheet (or upload an Excel 
file, an OpenOffice.org Calc file, or a CSV or TSV file) and get back 
the same data with the Freebase ID filled in.  (This is much like how 
the reconciliation interface works.)
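
For the CSV case in (b), the round trip might look roughly like the 
following Python sketch.  Everything here is an assumption made for 
illustration: the function name, the "freebase_id" and "match_status" 
column names, and the lookup callable, which stands in for whatever 
search-plus-user-review step actually produces a match and is expected 
to return an (id or None, status label) pair for a given name.

```python
import csv
import io


def add_freebase_ids(csv_text, name_column, lookup,
                     id_column="freebase_id", status_column="match_status"):
    """Return CSV text with an id column and a status column appended.

    lookup is a hypothetical callable: name -> (freebase_id_or_None, status).
    Row order and all existing columns are preserved.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = list(reader.fieldnames or [])
    for extra in (id_column, status_column):
        if extra not in fieldnames:
            fieldnames.append(extra)

    rows = []
    for row in reader:
        freebase_id, status = lookup(row[name_column])
        row[id_column] = freebase_id or ""  # leave blank when there is no match
        row[status_column] = status
        rows.append(row)

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


if __name__ == "__main__":
    # Stand-in lookup with a made-up id; a real one would call the search API.
    def fake_lookup(name):
        known = {"Department of State": ("/example/state", "clear")}
        return known.get(name, (None, "none"))

    text = "agency,budget\nDepartment of State,1\nUnknown Bureau,2\n"
    print(add_freebase_ids(text, "agency", fake_lookup))
```

Keeping a status column next to the id is one way to address (a) as 
well: the user can filter or sort the spreadsheet on it and look only 
at the rows that were not matched cleanly.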

BTW, does the output of the reconciliation service reduce to the same 
thing as the search API's output if I feed the reconciliation API the 
right parameters?

Thanks!
-Raymond

