[Developers] libraries/techniques for extracting data from the Wikipedia to feed to freebase

Christopher R. Maden crism at metaweb.com
Thu Feb 26 16:36:53 UTC 2009


Tom Morris wrote:
> I would definitely like to see some of the framework for Wikipedia
> interpretation released so that it could be improved/expanded, but
> it's possible that Freebase considers this part of its "secret sauce."
>  It certainly needs improvement, but I don't have the machine learning
> chops to know how hard it would be.

Nearly all of the secret sauce is in WEX (Freebase Wikipedia Extraction),
which we publish regularly.

What remains is mostly a matter of someone having the time to define the
mappings between the Wikipedia data and Freebase properties. Sometimes
that is messier than it first seems, and sometimes it requires
revisiting the schema itself.
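
To make "defining a mapping" a little more concrete, here is a minimal
sketch in Python. Everything in it is an illustrative assumption: the
infobox field names, the converter functions, the table layout and the
helper map_infobox are hypothetical and do not reflect how WEX or the
internal mapping tool actually work. The two property IDs are borrowed
from Freebase's /people/person type. The sketch also shows where the
mess comes from: a value the converter cannot parse is set aside for
human review instead of being written to Freebase.

```python
import re
from typing import Callable, Optional

# HYPOTHETICAL mapping sketch: infobox field -> (Freebase property, converter).
# Field names, converters and table shape are illustrative assumptions only.


def parse_birth_date(raw: str) -> Optional[str]:
    """Turn '{{birth date|1952|3|11}}' (or '{{birth date and age|...}}') into ISO 'YYYY-MM-DD'."""
    m = re.search(
        r"\{\{\s*birth date[^|]*\|\s*(\d{4})\s*\|\s*(\d{1,2})\s*\|\s*(\d{1,2})",
        raw,
        re.IGNORECASE,
    )
    if not m:
        return None  # free text like "c. 1952": needs a human to decide
    year, month, day = (int(g) for g in m.groups())
    return f"{year:04d}-{month:02d}-{day:02d}"


def first_wikilink(raw: str) -> Optional[str]:
    """Return the target of the first [[wikilink]], e.g. '[[Cambridge]], England' -> 'Cambridge'."""
    m = re.search(r"\[\[([^|\]]+)", raw)
    return m.group(1).strip() if m else None


MAPPINGS: dict[str, tuple[str, Callable[[str], Optional[str]]]] = {
    "birth_date": ("/people/person/date_of_birth", parse_birth_date),
    "birth_place": ("/people/person/place_of_birth", first_wikilink),
}


def map_infobox(fields: dict[str, str]) -> tuple[dict[str, str], list[str]]:
    """Apply MAPPINGS; return (property -> value, fields left for review)."""
    mapped: dict[str, str] = {}
    unmapped: list[str] = []
    for name, raw in fields.items():
        if name not in MAPPINGS:
            unmapped.append(name)  # no mapping defined yet
            continue
        prop, convert = MAPPINGS[name]
        value = convert(raw)
        if value is None:
            unmapped.append(name)  # mapping exists, but this value is too messy
        else:
            mapped[prop] = value
    return mapped, unmapped
```

For example, map_infobox({"birth_date": "{{birth date|1952|3|11}}",
"birth_place": "[[Cambridge]], England"}) yields both properties, while a
birth_date of "c. 1952" would land in the review list. Keeping the table
separate from the converters is one way a community could contribute
mappings without touching the extraction code.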

Filing Jira tasks and voting for them is a good way to help set priorities.

Of course, the best way to get it done is for the data community to be 
involved; unfortunately, although WEX is public, I think the mapping 
tool is still too grotty to inflict on the public.

~Chris
-- 
Christopher R. Maden
Data Architect
Freebase.com: <URL: http://www.freebase.com/ >
Metaweb Technologies, Inc. <URL: http://www.metaweb.com/ >

