[Developers] Full article text
brendan
brendan at metaweb.com
Mon Mar 31 19:56:52 UTC 2008
It looks to me like this:
http://en.wikipedia.org/wiki/index.html?curid=38252&action=render
returns a chunk of HTML (without the html and body tags),
and most, if not all, of the stuff that you would want to strip
out is enclosed in a div or span with a class. It seems like there
should be a way to apply some CSS to this data to hide the
stuff you want to hide and prettify the stuff you want displayed.
If anyone comes up with a nice pattern for this, do share; it seems
like a really useful one.
Brendan
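
A minimal sketch of that pattern in Python, in case it helps. It builds
the action=render URL from the Wikipedia page id and wraps the returned
fragment in a page whose stylesheet hides selected classes. The function
names and the class list (infobox, toc, editsection) are my own
assumptions for illustration; check them against the markup you actually
get back. Fetching the fragment is left out.

```python
from urllib.parse import urlencode

# Class names to hide. These are assumptions for illustration only;
# inspect the fragment you actually receive and adjust the list.
HIDDEN_CLASSES = ("infobox", "toc", "editsection")


def render_url(curid, host="en.wikipedia.org"):
    """Build the action=render URL quoted in this thread for a page id."""
    query = urlencode({"curid": curid, "action": "render"})
    return "http://%s/wiki/index.html?%s" % (host, query)


def hide_rules(class_names):
    """Return one CSS rule that hides every element with a listed class."""
    if not class_names:
        return ""
    selectors = ", ".join("." + name for name in class_names)
    return selectors + " { display: none; }"


def wrap_fragment(fragment, class_names=HIDDEN_CLASSES):
    """Wrap a fragment that lacks html/body tags in a full page with the hiding style."""
    return (
        "<!DOCTYPE html>\n"
        "<html><head><meta charset=\"utf-8\">\n"
        "<style>" + hide_rules(class_names) + "</style>\n"
        "</head><body>\n" + fragment + "\n</body></html>\n"
    )
```

Note that this only hides the elements when a browser displays the page;
the hidden markup is still in the document, so it does not replace
parsing if you need the plain text itself.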
> On Mar 31, 2008, at 12:33 PM, Stephen Lau wrote:
> John Giannandrea wrote:
>> Stephen Lau wrote:
>>
>>> Is there a way to get the full article text, instead of just an
>>> excerpt? E.g. for the Radiohead article, I get the following URL:
>>> http://www.freebase.com/api/trans/raw/guid/9202a8c04000641f800000000004c272
>>> which cuts off after a certain amount of text. Does Freebase cache
>>> the full text of the article from Wikipedia (or at least the full
>>> abstract text)?
>>>
>>
>> Hi
>> We don't mirror the entire Wikipedia article directly, but we do
>> provide the Wikipedia key so that you can get the article from
>> Wikipedia.
>> For example:
>>
>> /topic/en/radiohead has property "/wikipedia/topic/en_id" : "38252"
>>
>> Which allows you to fetch:
>> http://en.wikipedia.org/wiki/index.html?curid=38252
>>
>> As others have pointed out, Wikipedia has several ways to get its
>> full text:
>> http://en.wikipedia.org/wiki/index.html?curid=38252&action=render
>>
>> More options are available via their API.
>> http://en.wikipedia.org/w/api.php
>>
>
> Many thanks Alexander, Brendan, & John for the quick replies...
> I didn't know about the Wikipedia api.php, that's definitely handy.
> My issue, as Brendan noted, is that the text it returns is wikitext.
> Ideally I'd like an HTML rendering of just the content, with an easy
> way to strip out content I don't need, like the Infobox, etc. I was
> hoping to do this without having to parse HTML myself, but it looks
> like that is an unavoidable task.
>
> (In an ideal world, Freebase would provide me the full text between
> the Infobox and the Contents. No worries though, that should be a
> (hopefully) easy JavaScript regex.)
>
> Thanks again for the pointers guys, much appreciated.
>
> cheers,
> steve
>
> --
> stephen lau | stevel at songbirdnest.com | www.whacked.net
>
> _______________________________________________
> Developers mailing list
> Developers at freebase.com
> http://lists.freebase.com/mailman/listinfo/developers