[Developers] how to make best use of Freebase suggest for reconciling collections of names

Tom Morris tfmorris at gmail.com
Wed Jun 24 18:30:07 UTC 2009


That sounds like a useful tool.  So useful, in fact, that it should be
provided as part of the basic offering of Freebase.

One of the things that you don't describe at all is what role Types
play in this process.  Are you looking for entities of a specific type
(government agencies)?  Obviously the Freebase topics are going to be
a mixture of typed, untyped, and non-existent, so you'll need to deal
with them all on that end, but if you are looking for a specific
target type or types, you can help yourself.

On Wed, Jun 24, 2009 at 1:47 PM, Raymond Yee <raymond.yee at gmail.com> wrote:

>
> a) I'd like to find ways to make it easy for the user to know which matches have been matched with "high confidence" by Freebase  so that she can scan through the list quickly....Do people have suggestions about how to measure "high confidence" beyond a high relevance score?
>

I looked at two things when I did this: 1) the absolute score and 2) the
delta between the top score and the rest, compared to the standard
deviation for the series.  That will help, but search's scoring isn't
tunable and it's attempting to be all things to all people, so it may
not return results that are maximally useful for your application.
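
As a rough illustration of those two checks, here is a minimal Python
sketch.  It assumes the relevance scores for one query are already in
hand as a plain list of numbers.  The function name, both thresholds,
and the handling of very short result lists are placeholders chosen for
the example, not values taken from Freebase search, and "the rest" is
read here as the runner-up score.

```python
from statistics import pstdev


def is_high_confidence(scores, min_score=50.0, min_gap_sigmas=2.0):
    """Return True if the top score looks like a clear winner.

    scores         -- relevance scores for one query, in any order
    min_score      -- hypothetical absolute threshold; tune for your data
    min_gap_sigmas -- hypothetical threshold on (top - runner-up) / std dev
    """
    if not scores:
        return False

    ranked = sorted(scores, reverse=True)
    top = ranked[0]

    # Check 1: the absolute score of the best candidate.
    if top < min_score:
        return False

    # With one or two candidates the standard deviation says very little,
    # so accept a lone strong candidate and leave a pair for manual review.
    if len(ranked) < 3:
        return len(ranked) == 1

    # Check 2: the gap to the runner-up, measured in standard deviations
    # of the whole series of scores.
    spread = pstdev(ranked)
    if spread == 0:
        return False  # every candidate tied
    return (top - ranked[1]) / spread >= min_gap_sigmas
```

A list such as [100, 20, 18, 15] passes both checks, while a tight
cluster such as [60, 58, 57, 55] fails the second one and would be left
for the user to resolve by hand.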

One of the ways that I worked around this was to basically consider
the search score to be one input to an overall scoring system that
takes other factors into account.  White lists and black lists of
types can be a help (e.g. if it's typed as a Play, it's probably not a
government agency -- unless it's a farce, of course :-)).  I've also
resorted to going back to the original Wikipedia article and analyzing
it for useful scoring hints.
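
One possible shape for such an overall score, sketched in Python.  It
assumes each candidate is a dict with "id", "score", and "types" keys.
That layout, the function names, the multipliers, and the type IDs in
the usage note are all illustrative assumptions rather than anything
the search service defines.

```python
def combined_score(search_score, topic_types, wanted_types, unwanted_types,
                   bonus=1.5, penalty=0.25):
    """Fold white-list and black-list type hints into the search score.

    bonus and penalty are hypothetical multipliers; untyped topics keep
    their search score unchanged.
    """
    types = set(topic_types)
    score = float(search_score)
    if types & set(wanted_types):
        score *= bonus      # at least one white-listed type is present
    if types & set(unwanted_types):
        score *= penalty    # at least one black-listed type is present
    return score


def rerank(candidates, wanted_types, unwanted_types):
    """Sort candidates by combined score, best first."""
    return sorted(
        candidates,
        key=lambda c: combined_score(c["score"], c["types"],
                                     wanted_types, unwanted_types),
        reverse=True,
    )
```

For example, with "/government/government_agency" on the white list and
"/theater/play" on the black list, a play that scored 90 in search drops
below an agency that scored 70 and below an untyped topic that scored 60.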

> b) have input and output mechanisms that tie into, say, Google spreadsheet.  I often find it convenient to work with spreadsheets with columns of attributes -- one of which I'd like to have is the Freebase ID.  I'd like to point this app to a Google spreadsheet (or upload an Excel or OpenOffice.org calc file or CSV or TSV file) and then have as an output the data with the Freebase ID filled out....(much like how the reconciliation interface works)
>

Depending on how big your data set is, you're probably going to want to
have multiple people working on it.  This means that it would be
useful to have a shared queue that they work from.
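
For the CSV case of the spreadsheet round trip quoted above, a minimal
Python sketch might look like the following.  The column names and the
lookup callable are hypothetical: lookup stands in for whatever
reconciliation step you end up with, and is assumed to return an ID
string or None.

```python
import csv


def fill_freebase_ids(infile, outfile, lookup,
                      name_column="name", id_column="freebase_id"):
    """Copy a CSV, adding or filling an ID column.

    lookup is a caller-supplied function mapping a name to an ID string,
    or to None when there is no confident match.
    """
    reader = csv.DictReader(infile)
    fieldnames = list(reader.fieldnames or [])
    if id_column not in fieldnames:
        fieldnames.append(id_column)

    writer = csv.DictWriter(outfile, fieldnames=fieldnames,
                            lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Rows that already carry an ID are left alone, so the file can
        # be passed through more than once.
        if not row.get(id_column):
            row[id_column] = lookup(row[name_column]) or ""
        writer.writerow(row)
```

Unmatched rows keep an empty ID cell, which makes them easy to filter
into a list for people to review by hand.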

Not really relevant to your particular application, but something the
whole data import process could really use is a much more powerful
table parser - something which is at least equivalent to what Excel or
Calc can do.  Using a spreadsheet tool as a front end would allow you
to do this part there, but you really should be able to just dump HTML
tables or CSV files or anything somewhat structured into the list
import box and massage it to get reasonable results without resorting
to a separate tool.

Something similar to this has been on my To Do list for a while, but
a) I haven't had time to work on it and b) I fundamentally believe
that such basic pieces of plumbing should be provided as part of the
basic system.  Having said that, if you get started on it, I'll help
to the extent that I can.

Tom

