[Developers] how to make best use of Freebase suggest for reconciling collections of names
Raymond Yee
raymond.yee at gmail.com
Wed Jun 24 17:47:58 UTC 2009
[My apologies if you get duplicates of this email -- I sent this out
already under another email address but didn't see it come through....]
Hi everyone,
I'm finding it a challenge to efficiently match entities to Freebase
items when the entities are identified by no more than a single string
(such as the names of US government agencies -- e.g., "Department of
State").
I'd like to describe an approach I'm taking and get your feedback on how
to make it better.
I'd like to write a Freebase Acre app that will take as input a list of
strings and return the list of strings with Freebase ids for matches --
after an interactive process involving the user. (This app is modeled
roughly on http://mqlx.com/reconciliation/recon.html) The steps
involved will be:
1) feed each of the strings to the Freebase search API
(http://www.freebase.com/api/service/search?help) to come up with the
"best match". (Naively, I'd just use the match with the highest
relevance:score -- but I'd like to figure out approaches for
distinguishing between matches that are head and shoulders above the
other matches vs. ones that are just a bit better than the rest.)
2) present the best matches in an input box tied to the Freebase suggest
jQuery plugin (http://suggest.freebaseapps.com/) so that the user can
hopefully quickly inspect what Freebase is suggesting. If the user is
unhappy with the choice, the user can look through other suggestions,
create a new Freebase item, or flag the item as having no match.
3) return the complete list of matches.
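One way to approach the "head and shoulders" question in step 1 is to
compare the top score with the runner-up. The following is only a
minimal sketch, not the search API's own behavior: it assumes the
search results have already been parsed into a list of dicts that each
carry a "relevance:score" number. The function name pick_best and the
min_ratio threshold of 1.5 are hypothetical choices for illustration.

```python
from typing import Dict, List, Optional, Tuple

SCORE_KEY = "relevance:score"  # field name quoted in the message above


def pick_best(candidates: List[Dict],
              min_ratio: float = 1.5) -> Tuple[Optional[Dict], bool]:
    """Return (best candidate, is_clear_winner).

    A candidate counts as a clear winner when its score is at least
    min_ratio times the runner-up's score. The threshold is an
    assumption and would need tuning against real data.
    """
    ranked = sorted(candidates,
                    key=lambda c: c.get(SCORE_KEY, 0.0),
                    reverse=True)
    if not ranked:
        return None, False  # nothing matched at all
    best = ranked[0]
    top = best.get(SCORE_KEY, 0.0)
    if len(ranked) == 1:
        return best, True  # no competing candidate
    runner_up = ranked[1].get(SCORE_KEY, 0.0)
    if runner_up <= 0:
        return best, top > 0  # avoid dividing by zero
    return best, top / runner_up >= min_ratio
```

Strings whose best match is not a clear winner are the ones most worth
showing to the user first in step 2.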
I'm curious to know whether this approach is basically sound. If so, I
plan to implement it and look for ways to make the process efficient.
For example:
a) I'd like to find ways to make it easy for the user to see which
strings have been matched with "high confidence" by Freebase so that
she can scan through the list quickly. Do people have suggestions
about how to measure "high confidence" beyond a high relevance score?
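One possible signal beyond the relevance score is how closely the
candidate's name matches the input string. The sketch below is an
assumption for illustration, not something Freebase provides: it uses
Python's standard difflib to compare the two names and combines that
with a caller-supplied flag saying whether the top score clearly beat
the runner-up. The names confidence_label and min_similarity, and the
0.9 cutoff, are hypothetical.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def name_similarity(query: str, candidate_name: str) -> float:
    """Similarity in [0, 1] between the input string and a candidate name."""
    return SequenceMatcher(None,
                           normalize(query),
                           normalize(candidate_name)).ratio()


def confidence_label(query: str,
                     candidate_name: str,
                     clear_winner: bool,
                     min_similarity: float = 0.9) -> str:
    """Label a match "high", "medium", or "low".

    "high" needs both signals: the names are nearly identical and the
    top relevance score clearly beat the runner-up.
    """
    similar = name_similarity(query, candidate_name) >= min_similarity
    if similar and clear_winner:
        return "high"
    if similar or clear_winner:
        return "medium"
    return "low"
```

The label could drive row colouring in the review list so that "high"
rows can be skimmed and "low" rows get attention first.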
b) I'd like to have input and output mechanisms that tie into, say, a
Google spreadsheet. I often find it convenient to work with spreadsheets
with columns of attributes -- one of which I'd like to be the Freebase
ID. I'd like to point this app to a Google spreadsheet (or upload an
Excel, OpenOffice.org Calc, CSV, or TSV file) and then have as output
the data with the Freebase ID filled in (much like how the
reconciliation interface works).
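For the CSV case, the round trip described in b) might look like the
sketch below. It is only an illustration under stated assumptions: the
lookup callable stands in for whatever matching process produces an ID
for a name (it is not a Freebase API), and the function name
add_freebase_ids and the default column name "freebase_id" are
hypothetical.

```python
import csv
import io
from typing import Callable, Optional


def add_freebase_ids(csv_text: str,
                     name_column: str,
                     lookup: Callable[[str], Optional[str]],
                     id_column: str = "freebase_id") -> str:
    """Return the CSV text with an ID column filled in from lookup().

    lookup is a caller-supplied function mapping a name to an ID, or
    None when there is no match; unmatched rows get an empty cell.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = list(reader.fieldnames or [])
    if id_column not in fieldnames:
        fieldnames.append(id_column)  # add the new column at the end

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row[id_column] = lookup(row[name_column]) or ""
        writer.writerow(row)
    return out.getvalue()
```

A TSV file could be handled the same way by passing delimiter="\t" to
both the reader and the writer.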
BTW, does the output of the reconciliation service reduce to the same
output as the search API if I feed the reconciliation API the right
parameters?
Thanks!
-Raymond