[Developers] how to make best use of Freebase suggest for reconciling collections of names

Raymond Yee raymond.yee at gmail.com
Wed Jun 24 20:18:43 UTC 2009


Hi Shawn,

Thanks for pointing me to https://bugs.freebase.com/browse/CLI-3291 
and https://bugs.freebase.com/browse/CLI-3718 -- I'd love to hear the 
Metaweb staff's thoughts on reconciliation tools.

Thanks also for pointing out how we can use various keys.  I sometimes 
have another scenario: after I reconcile the IDs, I have keys of my 
own that I'd like to feed back to Freebase in the process.  For 
example, my blog post 
http://blog.dataunbound.com/2009/06/18/a-first-pass-at-an-org-chart-for-the-us-federal-government/ 
points to the dataset that I'd like to reconcile to Freebase:  a list of 
US federal government agencies in OPML format that I created 
(http://labs.dataunbound.com/doc/2009/06/OMB_A_11_C.xml).  All of these 
agencies have OMB agency/bureau codes that *might provide* useful 
keys into US government agencies.  After I do the reconciliation, I 
might want to insert these OMB codes into Freebase....

I'm happy that you, Tom, and I will all be at Hack Day!  Yes, let's do 
a session on that topic, since to me it is probably the key issue for 
how far I'll ultimately get in using Freebase.

-Raymond

Shawn Simister wrote:
> I too have done some work in this area, although I haven't worked on 
> it much lately. I think that anyone with a programming background who 
> gets deep enough into Freebase eventually gravitates toward this sort 
> of tool-set. In fact, last year the Metaweb folks drew up some pretty 
> ambitious plans <https://bugs.freebase.com/browse/CLI-3291> for 
> something very similar to what you're describing. Over time, those 
> plans got scaled back to produce the reconciliation tool that you 
> linked to. However, it looks 
> <https://bugs.freebase.com/browse/CLI-3718> like the full-fledged 
> spreadsheet loader is still in the works, so I'm still hopeful.
>
> As Tom said, you can get a lot of mileage from whitelists/blacklists 
> of types to reconcile against. You can also use things like Wikipedia, 
> IMDB, and NNDB keys as a proxy for some sort of "notability score" since 
> each of those sites has its own notability requirements. Lastly, 
> regular expressions can be used to filter out specific naming 
> structures. For example, people's names don't often end with 
> organization suffixes like Inc., Corp., Association, etc.
>
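The type and name filters described above can be sketched in Python roughly as follows. This is only an illustration, not code from this thread: the candidate shape (a dict with a "types" list), the function names, and the suffix list are all assumptions made for the example.

```python
import re

# Suffixes that suggest an organization rather than a person.
# The list is illustrative (taken from the examples above), not exhaustive.
ORG_SUFFIX = re.compile(r"\b(?:Inc|Corp|Association)\.?$", re.IGNORECASE)


def looks_like_organization(name):
    """Return True if the name ends with a known organization suffix."""
    return bool(ORG_SUFFIX.search(name.strip()))


def filter_by_type(candidates, allowed=(), blocked=()):
    """Keep candidates that pass a type whitelist and blacklist.

    Each candidate is assumed to be a dict with a "types" list of type ids
    (a hypothetical shape). An empty whitelist means "allow everything".
    """
    allowed, blocked = set(allowed), set(blocked)
    kept = []
    for candidate in candidates:
        types = set(candidate.get("types", []))
        if types & blocked:
            continue  # at least one blacklisted type
        if allowed and not types & allowed:
            continue  # no whitelisted type present
        kept.append(candidate)
    return kept
```

When reconciling a list of people, for instance, names for which looks_like_organization() returns True could be skipped or flagged before any search is made.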
> I'd love to discuss this in more detail, and since you, Tom, and I are 
> all going to be at Hack Day, I'd like to propose that we do a very 
> informal session on entity reconciliation. Maybe we could get Colin 
> and Reilly and any other interested parties to join in and share their 
> experiences building these sorts of reconciliation services.
>
> Shawn
>
> Raymond Yee wrote:
>> [My apologies if you get duplicates of this email -- I sent this out 
>> already under another email address but didn't see it come through....]
>>
>> Hi everyone,
>>
>> I'm finding it a challenge to efficiently match entities that are 
>> identified by no more than a single string (such as the names of US 
>> government agencies -- e.g., "Department of State") to Freebase items.  
>> I'd like to describe an approach I'm taking and get your feedback on 
>> how to make it better.
>>
>> I'd like to write a Freebase Acre app that will take as input a list 
>> of strings and return the list of strings with Freebase ids for 
>> matches -- after an interactive process involving the user.  (This 
>> app is modeled roughly on http://mqlx.com/reconciliation/recon.html)  
>> The steps involved will be:
>>
>> 1) feed each of the strings to the Freebase search API 
>> (http://www.freebase.com/api/service/search?help) to come up with the 
>> "best match".  (Naively, I'd just use the match with the highest 
>> relevance:score -- but I'd like to figure out approaches for 
>> distinguishing between matches that are head and shoulders above the 
>> other matches vs. ones that are just a bit better than the rest....)
>>
>> 2) present the best matches in an input box tied to the Freebase 
>> suggest jQuery plugin (http://suggest.freebaseapps.com/) so that the 
>> user can hopefully quickly inspect what Freebase is suggesting.  If 
>> the user is unhappy with the choice, the user can look through other 
>> suggestions, create a new Freebase item, or flag the item as having 
>> no match.
>>
>> 3) return the complete list of matches.
>>
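One possible way to tell a "head and shoulders" match from a marginal one in step 1 is to compare the top score with the runner-up. The sketch below is only an assumption-laden illustration: it supposes the search results have already been fetched as a list of dicts carrying a "relevance:score" value, and the function name and the min_ratio threshold are hypothetical choices that would need tuning against real data.

```python
def classify_match(results, min_ratio=1.5):
    """Pick the best result and say whether it clearly beats the rest.

    `results` is assumed to be a list of dicts, each with a numeric
    "relevance:score" entry (hypothetical shape). Returns (best, confident):
    `best` is the top-scoring dict or None, and `confident` is True when
    the top score is at least `min_ratio` times the runner-up's score.
    """
    ranked = sorted(results, key=lambda r: r["relevance:score"], reverse=True)
    if not ranked:
        return None, False  # nothing to match against
    best = ranked[0]
    best_score = best["relevance:score"]
    if best_score <= 0:
        return best, False  # a non-positive score is never "confident"
    if len(ranked) == 1:
        return best, True  # only one candidate, so no competition
    runner_up = ranked[1]["relevance:score"]
    confident = runner_up <= 0 or best_score / runner_up >= min_ratio
    return best, confident
```

Matches flagged as confident could be pre-filled in step 2, with the remainder highlighted for closer inspection by the user.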
>> I'm curious to know whether this approach is basically sound.  If so, 
>> I plan to implement it and look for ways to make the process 
>> efficient.  For example:
>>
>> a) I'd like to find ways to make it easy for the user to know which 
>> matches have been matched with "high confidence" by Freebase  so that 
>> she can scan through the list quickly....Do people have suggestions 
>> about how to measure "high confidence" beyond a high relevance score?
>>
>> b) have input and output mechanisms that tie into, say, Google 
>> spreadsheet.  I often find it convenient to work with spreadsheets 
>> with columns of attributes -- one of which I'd like to have is the 
>> Freebase ID.  I'd like to point this app to a Google spreadsheet (or 
>> upload an Excel or OpenOffice.org calc file or CSV or TSV file) and 
>> then have as an output the data with the Freebase ID filled 
>> out....(much like how the reconciliation interface works)
>>
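For the CSV case in (b), filling in an ID column could look something like the sketch below, which uses only Python's standard csv module. The column names ("name", "freebase_id"), the function name, and the `matches` mapping from name to ID are assumptions for illustration; Google spreadsheet or Excel input would need a separate reader.

```python
import csv
import io


def add_id_column(csv_text, matches, name_column="name", id_column="freebase_id"):
    """Return CSV text with an extra column holding the reconciled ID.

    `matches` maps a name string to its reconciled ID (hypothetical data
    produced by an earlier reconciliation step). Rows without a match get
    an empty ID cell so they are easy to spot and revisit.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = list(reader.fieldnames or [])
    if id_column not in fieldnames:
        fieldnames.append(id_column)

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row[id_column] = matches.get(row[name_column], "")
        writer.writerow(row)
    return out.getvalue()
```

Working on text in memory keeps the example self-contained; the same function can be wrapped around open() calls for files on disk, and TSV could be handled by passing a tab delimiter to the reader and writer.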
>> BTW, does the output of the reconciliation service reduce down to the 
>> same as the search api if I were to feed the reconciliation api the 
>> right parameters?
>>
>> Thanks!
>> -Raymond
>>
>>
>> ------------------------------------------------------------------------
>>
>> _______________________________________________
>> Developers mailing list
>> Developers at freebase.com
>> http://lists.freebase.com/mailman/listinfo/developers
>>   
>
