[Developers] freebase and dbpedia

Robert Cook robert at metaweb.com
Mon Nov 10 17:14:46 UTC 2008


Also, I should point out that most Wikipedia templates are quite noisy
and pose serious problems for automatic extraction.  This makes sense
-- information in these templates is intended as static content in a
document, not as structured machine-readable data.  There are major
variations in the formatting of dates, numeric values (metric versus
imperial units varying from instance to instance, for example),
parenthetical asides, multivalued subfields, etc.  It surprises me
that some of these templates are as clean as they are -- we really are
lucky to enjoy the efforts of so many compulsive organizers in the
Wikipedia community.  But these clean templates are the exception, not
the rule.
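
To make the kinds of variation concrete, here is a minimal Python
sketch of conservative value normalization.  It is not the Freebase
extraction code; the function names, the list of date layouts, and the
unit table are all assumptions chosen for illustration.  The idea is
simply to accept a value only when it matches a known pattern and to
return None otherwise, rather than guess.

```python
import re
from datetime import datetime

# Hypothetical set of date layouts; real infoboxes use many more.
DATE_FORMATS = ["%Y-%m-%d", "%B %d, %Y", "%d %B %Y"]

# Hypothetical unit table: conversion factors to metres.
UNIT_TO_METRES = {"m": 1.0, "cm": 0.01, "ft": 0.3048, "in": 0.0254}


def strip_asides(raw):
    """Remove parenthetical asides such as '(age 55)'."""
    return re.sub(r"\s*\([^)]*\)", "", raw).strip()


def parse_date(raw):
    """Return a date if the value matches a known layout, else None."""
    text = strip_asides(raw)
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None  # unknown layout: reject rather than guess


def parse_length_m(raw):
    """Return a length in metres for a single 'number unit' value, else None."""
    text = strip_asides(raw).lower()
    match = re.fullmatch(r"(\d+(?:\.\d+)?)\s*(m|cm|ft|in)", text)
    if match is None:
        return None  # e.g. compound values like '6 ft 2 in' are rejected
    value, unit = match.groups()
    return float(value) * UNIT_TO_METRES[unit]
```

With these definitions, parse_date("January 10, 1953") and
parse_date("1953-01-10 (age 55)") give the same date, while
parse_date("circa 1950") returns None.  Rejecting anything that does
not match keeps precision high at the cost of yield.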

Freebase has taken a conservative approach to data quality, so our data
growth from Wikipedia has been gradual.  We are getting better at
mapping these templates, and, indeed, the Freebase corpus itself is
now being used to increase the certainty of extraction and thus the
yield.  In the last few weeks, we have improved the quality of our
extraction algorithms and added hundreds of new template and category
mappings, and we are continuing to do so.

R

On Nov 10, 2008, at 3:02 AM, John Giannandrea wrote:

>
> Ravi Iyer wrote:
>> I'm considering using the guid links to merge the data, but I
>> thought I'd ask what plans were already in place to import the
>> dbpedia data into freebase first.
>
> The dbpedia RDF specifically links to the Freebase data via an
> owl:sameAs predicate.
>
> The reason you see more RDF assertions at
> http://dbpedia.org/page/Bobby_Rahal
> than at
> http://rdf.freebase.com/rdf/guid.9202a8c04000641f800000000031a897
> is that dbpedia is exporting more raw data, such as Wikipedia
> categories and all the infobox properties, at a low level.
>
> Freebase only has the data that has been specifically hand-mapped to
> the Freebase schema.  For example, because we don't have a racecar
> driver type yet, we have not mapped available data like
> p:firstWin • 1982 (xsd:integer) into Freebase data.  Over time we
> expect more of this data to get mapped automatically into
> corresponding Freebase types.
>
> -jg


