[Developers] libraries/techniques for extracting data from the Wikipedia to feed to freebase
Raymond Yee
raymond.yee at gmail.com
Thu Feb 26 05:03:11 UTC 2009
Does anyone out there have a lot of experience scraping Wikipedia for
facts? The applications are many, but some examples I have in mind
right now include:
1) data about chemical elements -- e.g. their boiling points
2) American politicians at the federal, state, and municipal levels
3) visual artists and their works
One thing that has surprised me about Freebase is the patchiness of
its data. I wanted to plot the boiling points of all the elements
against their atomic numbers, but many of the elements are missing
boiling points. If you go to
http://is.gd/kVb1
and hit "Read>>", you'll get a list of the elements without boiling
points (as of 2009-02-26T04:53:34.3750Z, that is).
So what I'd like to do is use a set of Wikipedia parsers to extract
the data I find useful and push it into Freebase for some projects I
have in mind. My quick experience with DBpedia is that it's no better
for chemical elements either -- but I might just be misunderstanding it.
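
To make the idea concrete, here is a rough Python sketch of the kind of
parser I mean. It is only an illustration under assumptions: the sample
wikitext, the "Infobox element" template name, and the "boiling point K"
field name are made up for the example and may not match what real
articles use, and the function names are hypothetical. It only pulls
"key = value" pairs out of an infobox; it does not fetch pages or talk
to Freebase.

```python
# Hypothetical infobox wikitext; the template and field names are
# assumptions for illustration, not copied from a real article.
SAMPLE_WIKITEXT = """{{Infobox element
| name = helium
| number = 2
| boiling point K = 4.22
}}"""


def parse_infobox_fields(wikitext):
    """Collect 'key = value' pairs from lines that start with '|'."""
    fields = {}
    for line in wikitext.splitlines():
        line = line.strip()
        if not line.startswith("|") or "=" not in line:
            continue
        # Split on the first '=' only, so values may contain '='.
        key, _, value = line[1:].partition("=")
        fields[key.strip()] = value.strip()
    return fields


def boiling_point_kelvin(wikitext, field="boiling point K"):
    """Return the boiling point as a float, or None if absent/unparseable."""
    raw = parse_infobox_fields(wikitext).get(field)
    if raw is None:
        return None
    try:
        return float(raw)
    except ValueError:
        return None


if __name__ == "__main__":
    print(boiling_point_kelvin(SAMPLE_WIKITEXT))
```

Real infoboxes have nested templates, references, and multi-line
values, so a line-by-line split like this would need hardening before
it could be trusted on actual articles.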
Does Freebase have any tools it can release that we could adapt to
push more data into Freebase for specific purposes like these?
Thanks,
-Raymond