[Developers] Full article text
Stephen Lau
stevel at songbirdnest.com
Mon Mar 31 19:33:28 UTC 2008
John Giannandrea wrote:
> Stephen Lau wrote:
>
>> Is there a way to get the full article text, instead of just an
>> excerpt? E.g., for the Radiohead article, I get the following URL:
>> http://www.freebase.com/api/trans/raw/guid/9202a8c04000641f800000000004c272
>> which cuts off after a certain amount of text. Does Freebase cache
>> the full text of the article from Wikipedia (or at least the full
>> abstract text)?
>>
>
> Hi,
> We don't mirror the entire Wikipedia article directly, but we do
> provide the Wikipedia key so that you can get the article from
> Wikipedia.
> For example:
>
> /topic/en/radiohead has property "/wikipedia/topic/en_id" : "38252"
>
> Which allows you to fetch:
> http://en.wikipedia.org/wiki/index.html?curid=38252
>
> As others have pointed out, Wikipedia has several ways to get the
> full text, for example:
> http://en.wikipedia.org/wiki/index.html?curid=38252&action=render
>
> More options are available via their API.
> http://en.wikipedia.org/w/api.php
>
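A minimal sketch of building the two article URLs quoted above from the "/wikipedia/topic/en_id" value. The helper name wikipediaArticleUrl is hypothetical; only the URL shapes come from John's reply.

```javascript
// Hypothetical helper: builds the Wikipedia URL for a page id (curid).
// Pass render = true to request the content-only rendering
// (action=render) instead of the full page.
function wikipediaArticleUrl(curid, render) {
  var url = "http://en.wikipedia.org/wiki/index.html?curid=" +
            encodeURIComponent(curid);
  return render ? url + "&action=render" : url;
}
```

Called with the Radiohead id, wikipediaArticleUrl("38252", true) gives the second URL quoted above.
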
Many thanks Alexander, Brendan, & John for the quick replies...

I didn't know about the Wikipedia api.php; that's definitely handy. My
issue, as Brendan noted, is that the text it returns is wikitext.
Ideally I'd like rendered HTML of just the content, with an easy way
to strip out parts I don't need, such as the Infobox. I was
hoping to do this without having to parse HTML myself, but it looks
like that is unavoidable.
(In an ideal world, Freebase would give me the full text between the
Infobox and the Contents. No worries though; that should hopefully be
an easy JavaScript regex.)
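
A rough sketch of what that regex approach might look like. It assumes the rendered HTML marks the Infobox as a table whose class includes "infobox" and gives the Contents box id="toc"; both are assumptions about Wikipedia's markup rather than anything confirmed in this thread, and the function name extractLeadSection is made up for illustration.

```javascript
// Sketch only. Assumed markup (not guaranteed by Wikipedia):
//   Infobox:  <table class="infobox ..."> ... </table>
//   Contents: some element carrying id="toc"
function extractLeadSection(html) {
  // Keep only what precedes the Contents box, if one is present.
  var tocIndex = html.search(/<[a-z]+[^>]*\bid="toc"/i);
  var lead = tocIndex === -1 ? html : html.slice(0, tocIndex);

  // Drop the Infobox table. The non-greedy match stops at the first
  // </table>, so an Infobox that contains nested tables will not be
  // removed cleanly; a real HTML parser would be needed for that.
  return lead.replace(
    /<table[^>]*\bclass="[^"]*\binfobox\b[^"]*"[\s\S]*?<\/table>/i,
    ""
  );
}
```

The nested-table limitation is the main reason a regex is only a stopgap here; it should be fine for simple pages but is likely to break on more elaborate Infoboxes.
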
Thanks again for the pointers guys, much appreciated.
cheers,
steve
--
stephen lau | stevel at songbirdnest.com | www.whacked.net