[Developers] how to make best use of Freebase suggest for reconciling collections of names

Shawn Simister narphorium at gmail.com
Wed Jun 24 19:47:12 UTC 2009


I too have done some work in this area, although I haven't worked on it 
much lately. I think that anyone with a programming background who gets 
deep enough into Freebase eventually gravitates toward this sort of 
tool-set. In fact, last year the Metaweb folks drew up some pretty 
ambitious plans <https://bugs.freebase.com/browse/CLI-3291> for 
something very similar to what you're describing. Over time, those plans 
got scaled back to produce the reconciliation tool that you linked to. 
However, it looks <https://bugs.freebase.com/browse/CLI-3718> like the 
full-fledged spreadsheet loader is still in the works, so I'm still hopeful.

As Tom said, you can get a lot of mileage from whitelists and blacklists of 
types to reconcile against. You can also use things like Wikipedia, 
IMDB, and NNDB keys as a proxy for some sort of "notability score", since 
each of those sites has its own notability requirements. Lastly, 
regular expressions can be used to filter out specific naming 
structures. For example, people's names don't often end with 
organization suffixes like Inc., Corp., Association, etc.
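
Here is a rough Python sketch of how those three filters might fit 
together. It is only an illustration under some assumptions: the 
candidate dictionaries (with "types" and "key_sources" fields), the 
source labels, and the function names are all hypothetical and do not 
reflect the actual search API response format. The type ids are only 
examples of what a type list might contain.

```python
import re

# Organization suffixes from the example above; extend for your own data.
ORG_SUFFIX = re.compile(r"\b(?:Inc|Corp|Association)\.?$", re.IGNORECASE)

# Hypothetical labels for sites that enforce their own notability rules.
NOTABILITY_SOURCES = {"wikipedia", "imdb", "nndb"}


def looks_like_organization(name):
    """True if the name ends with a known organization suffix."""
    return bool(ORG_SUFFIX.search(name.strip()))


def notability_score(candidate):
    """Count how many notability-gated sources have a key for the candidate."""
    return len(NOTABILITY_SOURCES & set(candidate.get("key_sources", [])))


def filter_candidates(name, candidates, allowed_types=None, blocked_types=None):
    """Apply type whitelist/blacklist, then rank by notability score."""
    allowed = set(allowed_types or [])
    blocked = set(blocked_types or [])
    # A name that ends like an organization is unlikely to be a person.
    if looks_like_organization(name):
        blocked = blocked | {"/people/person"}
    kept = []
    for candidate in candidates:
        types = set(candidate.get("types", []))
        if allowed and not (types & allowed):
            continue  # none of the whitelisted types present
        if types & blocked:
            continue  # carries a blacklisted type
        kept.append(candidate)
    # Stable sort: ties keep their original (e.g. relevance) order.
    return sorted(kept, key=notability_score, reverse=True)


candidates = [
    {"id": "a", "types": ["/people/person"], "key_sources": ["wikipedia"]},
    {"id": "b", "types": ["/organization/organization"],
     "key_sources": ["wikipedia", "nndb"]},
    {"id": "c", "types": ["/organization/organization"], "key_sources": []},
]
ranked = filter_candidates("Acme Corp.", candidates)
```

Because the sort is stable, candidates with equal notability keep 
whatever order they arrived in, so the original relevance ranking still 
acts as a tie-breaker.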

I'd love to discuss this in more detail, and since you, Tom and I are 
all going to be at Hack Day, I'd like to propose that we do a very 
informal session on entity reconciliation. Maybe we could get Colin and 
Reilly and any other interested parties to join in and share their 
experiences building these sorts of reconciliation services.

Shawn

Raymond Yee wrote:
> [My apologies if you get duplicates of this email -- I sent this out 
> already under another email address but didn't see it come through....]
>
> Hi everyone,
>
> I'm finding it a challenge to efficiently match entities to Freebase 
> items when the entities are identified by no more than a single string 
> (such as the names of US government agencies -- e.g., "Department of 
> State"). I'd like to describe an approach I'm taking and get your 
> feedback on how to make it better.
>
> I'd like to write a Freebase Acre app that will take as input a list 
> of strings and return the list of strings with Freebase ids for 
> matches -- after an interactive process involving the user.  (This app 
> is modeled roughly on http://mqlx.com/reconciliation/recon.html)  The 
> steps involved will be:
>
> 1) feed each of the strings to the freebase search api 
> (http://www.freebase.com/api/service/search?help) to come up with the 
> "best match"  (Naively, I'd just use the best match with the highest 
> relevance:score -- but I'd like to figure out approaches for 
> distinguishing between matches that are head and shoulders beyond the 
> other matches vs ones that are just a bit better than the rest....)
>
> 2) present the best matches in an input box tied to the Freebase 
> suggest jQuery plugin (http://suggest.freebaseapps.com/) so that the 
> user can hopefully quickly inspect what Freebase is suggesting.  If 
> the user is unhappy with the choice, the user can look through other 
> suggestions, create a new Freebase item, or flag the item as having no 
> match.
>
> 3) return the complete list of matches.
>
> I'm curious to know whether this approach is basically sound.  If so, 
> I plan to implement it and look for ways to make the process 
> efficient.  For example:
>
> a) I'd like to find ways to make it easy for the user to know which 
> strings have been matched with "high confidence" by Freebase so that 
> she can scan through the list quickly. Do people have suggestions 
> about how to measure "high confidence" beyond a high relevance score?
>
> b) have input and output mechanisms that tie into, say, Google 
> spreadsheet.  I often find it convenient to work with spreadsheets 
> with columns of attributes -- one of which I'd like to have is the 
> Freebase ID.  I'd like to point this app to a Google spreadsheet (or 
> upload an Excel or OpenOffice.org calc file or CSV or TSV file) and 
> then have as an output the data with the Freebase ID filled 
> out....(much like how the reconciliation interface works)
>
> BTW, does the output of the reconciliation service reduce down to the 
> same as the search api if I were to feed the reconciliation api the 
> right parameters?
>
> Thanks!
> -Raymond
