[Developers] libraries/techniques for extracting data from the Wikipedia to feed to freebase

Raymond Yee raymond.yee at gmail.com
Thu Feb 26 05:03:11 UTC 2009


Does anyone out there have a lot of experience scraping Wikipedia for
facts? The applications are many, but some examples I have in mind
right now include:

1) chemical elements and their properties, e.g. boiling points

2) American politicians at the federal, state, and municipal levels

3) visual artists and their works
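No particular library is named here, so what follows is only a minimal sketch of one common approach, assuming the raw wikitext of an article has already been fetched: pull a single field out of its infobox with a regular expression. The template name `Infobox element`, the parameter name `boiling point K`, and the sample text are all hypothetical placeholders, not real article content. Real infoboxes vary between articles and over time, and this naive parser ignores nested templates and multi-line values.

```python
from __future__ import annotations

import re

# Made-up wikitext for illustration; the template name "Infobox element" and
# the parameter name "boiling point K" are assumptions, not real article text.
SAMPLE_WIKITEXT = """
{{Infobox element
| name = Exampleium
| atomic number = 999
| boiling point K = 1234
}}
"""

# Matches one "| key = value" line; keys cannot contain "=", "|" or newlines.
PARAM_LINE = re.compile(r"^\|[ \t]*([^=|\n]+?)[ \t]*=[ \t]*(.*?)[ \t]*$", re.MULTILINE)


def infobox_params(wikitext: str) -> dict[str, str]:
    """Collect "| key = value" lines into a dict (naive: one line per value)."""
    return {m.group(1): m.group(2) for m in PARAM_LINE.finditer(wikitext)}


def boiling_point_kelvin(wikitext: str, key: str = "boiling point K") -> float | None:
    """Return the leading number of the given infobox field, or None."""
    value = infobox_params(wikitext).get(key)
    if not value:
        return None
    # Keep only the leading number, e.g. "1234 K<ref>...</ref>" -> 1234.0
    number = re.match(r"[-+]?\d+(?:\.\d+)?", value)
    return float(number.group()) if number else None


print(boiling_point_kelvin(SAMPLE_WIKITEXT))  # 1234.0
```

A proper wikitext parser would be more robust than a regex, but the regex keeps the idea visible: each fact of interest maps to one named infobox parameter.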

One thing that has surprised me about Freebase is the patchiness of
its data. I wanted to plot the boiling points of all the elements
against their atomic numbers, but many elements are missing boiling
points. If you go to

http://is.gd/kVb1

and hit "Read>>", you'll get a list of elements without boiling
points (as of 2009-02-26T04:53:34.3750Z, that is).
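The elements on that list are exactly the gaps in the plot. As a rough sketch, assuming the query results arrive as a list of records (the field names `name`, `atomic_number` and `boiling_point` are hypothetical, and the values are placeholders rather than real measurements), splitting plottable points from missing ones is a single pass:

```python
from __future__ import annotations

# Hypothetical query-style records; field names are assumptions and the
# numbers are placeholders, not real boiling points.
records = [
    {"name": "Element A", "atomic_number": 1, "boiling_point": 20.0},
    {"name": "Element B", "atomic_number": 2, "boiling_point": None},
    {"name": "Element C", "atomic_number": 3, "boiling_point": 1600.0},
]


def split_by_boiling_point(rows: list[dict]) -> tuple[list[tuple[int, float]], list[str]]:
    """Return ((Z, bp) pairs sorted by Z, names of elements lacking a bp)."""
    points: list[tuple[int, float]] = []
    missing: list[str] = []
    for row in rows:
        bp = row.get("boiling_point")
        if bp is None:
            missing.append(row["name"])
        else:
            points.append((row["atomic_number"], bp))
    points.sort()  # order by atomic number for plotting
    return points, missing


points, missing = split_by_boiling_point(records)
print(points)   # [(1, 20.0), (3, 1600.0)]
print(missing)  # ['Element B']
```

The sorted pairs can go to any plotting tool, and the `missing` list doubles as the to-do list for a Wikipedia scraper.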

So what I'd like to do is use a set of Wikipedia parsers to extract
the data I find useful and push it into Freebase for some projects I
have in mind. My brief experience with DBpedia is that it's no better
for chemical elements, but I might just be misunderstanding it.
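Before pushing anything, it may help to separate true gaps from disagreements between the two sources. The helper below is a hypothetical sketch, not a Freebase tool: it assumes values from both sides have already been loaded into dicts keyed by element name, does no network I/O, and deliberately leaves out the write step, since any specific write API here would be an assumption.

```python
from __future__ import annotations

import math


def plan_updates(
    existing: dict[str, float | None],
    scraped: dict[str, float],
    rel_tol: float = 1e-6,
) -> tuple[dict[str, float], dict[str, tuple[float, float]]]:
    """Split scraped values into gap fills and conflicts, keyed by element.

    fills:     keys where the existing store has no value (or no entry at all)
               but the scraper found one.
    conflicts: keys where both sides have a value and they differ beyond rel_tol.
    """
    fills: dict[str, float] = {}
    conflicts: dict[str, tuple[float, float]] = {}
    for key, new in scraped.items():
        old = existing.get(key)
        if old is None:
            fills[key] = new
        elif not math.isclose(old, new, rel_tol=rel_tol):
            conflicts[key] = (old, new)
    return fills, conflicts


# Placeholder values, not real boiling points.
fills, conflicts = plan_updates(
    {"a": None, "b": 100.0, "c": 50.0},
    {"a": 10.0, "b": 100.0, "c": 55.0},
)
print(fills)      # {'a': 10.0}
print(conflicts)  # {'c': (50.0, 55.0)}
```

Only the fills are safe to write automatically; conflicts probably deserve a human look, since either source could be the one in error.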

Does Freebase have any tools it could release that we could adapt
for specific purposes to push more data into Freebase?

Thanks,
-Raymond

