[Developers] how to make best use of Freebase suggest for reconciling collections of names

Reilly Hayes rfh at metaweb.com
Wed Jun 24 21:32:40 UTC 2009


Yes,  there is a spreadsheet loader.  It's in early internal alpha.   
It works quite differently from the Jira task you've identified, as it  
is a freestanding tool that is not integrated with the freebase.com  
UI.  There are some issues that need to be resolved before it is  
stable enough for use by users.  It will appear on labs as soon as it  
is ready.

I'll be sure to have Peter Burns (who wrote it) look you up at hack day.

-r

p.s. The OMB codes sound like they should be in an authority  
namespace.  If you create a private namespace for these keys we can  
migrate that to an authority namespace.


On Jun 24, 2009, at 1:18 PM, Raymond Yee wrote:

> Hi Shawn,
>
> Thanks for pointing me to https://bugs.freebase.com/browse/CLI-3291
> and https://bugs.freebase.com/browse/CLI-3718 -- I'd love to hear  
> from the Metaweb staff concerning their thoughts re reconciliation  
> tools.
>
> Thanks also for pointing out how we can use various keys.  I  
> sometimes have another scenario in which after I do the  
> reconciliation of IDs, I have keys that I'd like to feed back to  
> freebase in the process.  For example, my blog post http://blog.dataunbound.com/2009/06/18/a-first-pass-at-an-org-chart-for-the-us-federal-government/ 
>  points to the dataset that I'd like to reconcile to Freebase:  a  
> list of US federal government agencies in OPML format that I created  
> (http://labs.dataunbound.com/doc/2009/06/OMB_A_11_C.xml).  All of these  
> agencies have OMB agency/bureau codes that *might provide*  
> useful keys into US government agencies.  After I do the  
> reconciliation, I might want to insert these OMB codes into  
> Freebase....
>
> I'm happy that you, Tom, and I will all be at Hack Day!    Yes,  
> let's do a session on that topic since it is, to me, probably the key  
> issue for how far I'll ultimately get in using Freebase.
>
> -Raymond
>
> Shawn Simister wrote:
>>
>> I too have done some work in this area, although I haven't worked  
>> on it much lately. I think that anyone with a programming  
>> background who gets deep enough into Freebase eventually gravitates  
>> toward this sort of tool-set. In fact, last year the Metaweb folks  
>> drew up some pretty ambitious plans for something very similar to  
>> what you're describing. Over time, those plans got scaled back to  
>> produce the reconciliation tool that you linked to. However it  
>> looks like the full-fledged spreadsheet loader is still in the  
>> works so I'm still hopeful.
>>
>> As Tom said, you can get a lot of mileage from whitelists/blacklists  
>> of types to reconcile against. You can also use things like  
>> Wikipedia, IMDB, NNDB keys as a proxy for some sort of "notability  
>> score" since each of those sites has its own notability  
>> requirements. Lastly, regular expressions can be used to filter out  
>> specific naming structures. For example, people's names don't often  
>> end with organization suffixes like Inc., Corp., Association, etc.
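
A minimal sketch of the suffix filter Shawn describes, in Python. The suffix list and both function names are made up for illustration; the only thing taken from the thread is the idea that a name ending in Inc., Corp., Association, etc. is unlikely to be a person, so /people/person can be dropped from the types to reconcile against.

```python
import re

# Hypothetical suffix list; extend it to suit your own data.
ORG_SUFFIXES = ["Inc", "Corp", "Co", "Ltd", "LLC", "Association", "Foundation"]

# Match a suffix at the very end of the name, preceded by a comma or
# whitespace, with an optional trailing period ("Acme Corp.", "Foo, Inc").
ORG_SUFFIX_RE = re.compile(
    r"(?:,|\s)\s*(?:" + "|".join(re.escape(s) for s in ORG_SUFFIXES) + r")\.?\s*$",
    re.IGNORECASE,
)


def looks_like_organization(name):
    """Return True if the name ends with one of the organization suffixes."""
    return ORG_SUFFIX_RE.search(name) is not None


def drop_person_type(name, candidate_types):
    """Remove /people/person from the candidate types for org-like names."""
    if looks_like_organization(name):
        return [t for t in candidate_types if t != "/people/person"]
    return list(candidate_types)
```

This only narrows the candidate types; it does not decide a match, and a name such as "Jane Doe Co" would be misclassified, so it is best treated as one signal among several.
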
>>
>> I'd love to discuss this in more detail and since you, Tom, and I  
>> are all going to be at Hack Day, I'd like to propose that we do a  
>> very informal session on entity reconciliation. Maybe we could get  
>> Colin and Reilly and any other interested parties to join in and  
>> share their experiences building these sorts of reconciliation  
>> services.
>>
>> Shawn
>>
>> Raymond Yee wrote:
>>>
>>> [My apologies if you get duplicates of this email -- I sent this  
>>> out already under another email address but didn't see it come  
>>> through....]
>>>
>>> Hi everyone,
>>>
>>> I'm finding it a challenge to efficiently match entities that are  
>>> identified by no more than a single string (such as the names of  
>>> US government agencies -- e.g., "Department of State") to Freebase  
>>> items.  I'd like to describe an approach I'm taking and get your  
>>> feedback on how to make it better.
>>>
>>> I'd like to write a Freebase Acre app that will take as input a  
>>> list of strings and return the list of strings with Freebase ids  
>>> for matches -- after an interactive process involving the user.   
>>> (This app is modeled roughly on http://mqlx.com/reconciliation/recon.html.)
>>> The steps involved will be:
>>>
>>> 1) feed each of the strings to the freebase search api
>>> (http://www.freebase.com/api/service/search?help) to come up with the
>>> "best match"  (Naively, I'd just use the match with the highest
>>> relevance:score -- but I'd like to figure out approaches for
>>> distinguishing between matches that are head and shoulders above
>>> the other matches vs ones that are just a bit better than the
>>> rest....)
>>>
>>> 2) present the best matches in an input box tied to the Freebase  
>>> suggest jQuery plugin (http://suggest.freebaseapps.com/) so that  
>>> the user can hopefully quickly inspect what Freebase is  
>>> suggesting.  If the user is unhappy with the choice, the user can  
>>> look through other suggestions, create a new Freebase item, or  
>>> flag the item as having no match.
>>>
>>> 3) return the complete list of matches.
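
One rough way to tell a "head and shoulders" match from a marginal one in step 1 is to compare the top score with the runner-up. The sketch below is only an illustration in Python: it assumes the search results have already been reduced to (id, score) pairs with numeric scores where higher is better, and the function name, the labels, and the 1.5 ratio threshold are all arbitrary choices, not anything the search API defines.

```python
def classify_match(candidates, min_ratio=1.5):
    """Pick the top-scoring candidate and label how clear-cut the win is.

    candidates: list of (freebase_id, score) pairs, score assumed numeric.
    Returns (best_id, label), where label is "none", "high" or "review".
    """
    if not candidates:
        return None, "none"

    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    best_id, best_score = ranked[0]

    # A lone candidate has no competitor to compare against.
    if len(ranked) == 1:
        return best_id, "high"

    runner_up_score = ranked[1][1]
    # "High" means the winner beats the runner-up by at least min_ratio.
    if runner_up_score <= 0 or best_score / runner_up_score >= min_ratio:
        return best_id, "high"
    return best_id, "review"
```

Items labeled "review" (or "none") are the ones worth surfacing to the user in step 2; the "high" ones can be pre-filled and skimmed. The threshold would need tuning against real data.
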
>>>
>>> I'm curious to know whether this approach is basically sound.  If  
>>> so, I plan to implement it and look for ways to make the process  
>>> efficient.  For example:
>>>
>>> a) I'd like to find ways to make it easy for the user to know  
>>> which items have been matched with "high confidence" by  
>>> Freebase so that she can scan through the list quickly....  Do  
>>> people have suggestions about how to measure "high confidence"  
>>> beyond a high relevance score?
>>>
>>> b) have input and output mechanisms that tie into, say, Google  
>>> spreadsheet.  I often find it convenient to work with spreadsheets  
>>> with columns of attributes -- one of which I'd like to have is the  
>>> Freebase ID.  I'd like to point this app to a Google spreadsheet  
>>> (or upload an Excel or OpenOffice.org Calc file or CSV or TSV  
>>> file) and then have as an output the data with the Freebase ID  
>>> filled in....(much like how the reconciliation interface works)
>>>
>>> BTW, does the output of the reconciliation service reduce down to  
>>> the same as the search api if I were to feed the reconciliation  
>>> api the right parameters?
>>>
>>> Thanks!
>>> -Raymond
>>>
>>>
>>>
>>> _______________________________________________
>>> Developers mailing list
>>> Developers at freebase.com
>>> http://lists.freebase.com/mailman/listinfo/developers
>>>
