[Developers] WEX (wpid/fbid lookup) database
Sam Halliday
sam.halliday at gmail.com
Fri Nov 7 11:48:00 UTC 2008
Aah... Link Export, I feel blind :-)
Are the wikipedia redirects resolved? In the WEX dumps, one of the
beautiful things is that the redirects always point to their final
destinations.
I've encountered MQL key-quoting before... I'll try my best to decode
it into UTF-8 but I'd really appreciate it if you had optimised/tested
encode/decode methods for Java to save me writing them!
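
For reference, a minimal Python sketch of decoding and encoding MQL key-quoted names (Python to match the script quoted below; it ports to Java directly). It assumes the quoting convention is a "$" followed by four hex digits of the character's Unicode code point, and that only A-Z, a-z, 0-9, "_" and "-" are left unquoted. The function names mql_key_decode and mql_key_encode are hypothetical, not part of any Freebase library, and characters outside the Basic Multilingual Plane are not handled.

```python
import re

# Assumption: a quoted character is "$" followed by exactly four hex digits.
_QUOTED = re.compile(r"\$([0-9A-Fa-f]{4})")
# Assumption: these characters are left unquoted in a key.
_SAFE = re.compile(r"[A-Za-z0-9_-]")


def mql_key_decode(key):
    """Replace each $XXXX escape with the character at code point 0xXXXX."""
    return _QUOTED.sub(lambda m: chr(int(m.group(1), 16)), key)


def mql_key_encode(text):
    """Quote every character outside the assumed safe set as $XXXX."""
    out = []
    for ch in text:
        if _SAFE.match(ch):
            out.append(ch)
        elif ord(ch) <= 0xFFFF:
            out.append("$%04X" % ord(ch))
        else:
            # Code points above U+FFFF do not fit in four hex digits.
            raise ValueError("cannot quote non-BMP character: %r" % ch)
    return "".join(out)


print(mql_key_decode("Blade_Runner_$0028film$0029"))  # Blade_Runner_(film)
```

Applied to the val column of the /wikipedia/en keys in the quad dump, mql_key_decode would turn the quoted key back into the article or redirect title as Unicode text, which can then be written out as UTF-8.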
On 7 Nov 2008, at 09:37, Alexander Marks wrote:
> Using /wikipedia/en instead of /wikipedia/en_id in that sample code
> will give you the names of the wikipedia article, plus the names of
> all of its redirects (they will be MQL key-quoted, however, so let
> me know if you need help decoding them). The latest quad dump is
> always linked here: http://download.freebase.com/datadumps/ under
> the "2. Link Export" heading.
>
> Al
>
> ----- Original Message -----
> From: "Sam Halliday" <sam.halliday at gmail.com>
> To: "For discussions about MQL, Freebase API and apps built on
> Freebase" <developers at freebase.com>
> Sent: Friday, November 7, 2008 12:52:42 AM GMT -08:00 US/Canada Pacific
> Subject: Re: [Developers] WEX (wpid/fbid lookup) database
>
> Good to know about the WEX coming soon!
>
> Sorry, typo in my original e-mail... it's the wpname I am most
> interested in, not the wpid. If future freebase dumps included wpname
> and the wikipedia redirects, I'd be very very happy.
>
> For future reference, where is the freebase-datadump-quadruples.tsv
> file? I can't find it in the freebase data dump.
>
> On 6 Nov 2008, at 23:11, Alexander Marks wrote:
>
>> Hey Sam. We're working on a new WEX dump that will be out soon. Note
>> that you can, however, generate the guid->wpid map that you want
>> using the quad dump with something like this python script:
>>
>> dump = open("freebase-datadump-quadruples.tsv", "r")
>> out = open("guid2wpid.tsv", "w")
>> for line in dump:
>>     # each quad is: source, property, destination, value
>>     src, prop, dst, val = line.rstrip("\n").split("\t")
>>     if prop == "/type/object/key" and dst == "/wikipedia/en_id":
>>         out.write("%s\t%s\n" % (src, val))
>> out.close()
>> dump.close()
>>
>> which will give you a format like this:
>>
>> /guid/9202a8c04000641f8000000000009e89 3746
>> /guid/9202a8c04000641f8000000000032ded 25493
>> ...
>>
>> As for Wikipedia article redirects, you will always need the WEX
>> dump for that, although replacing /wikipedia/en_id with
>> /wikipedia/en above might get you part of the data you want. The
>> /wikipedia/en Freebase namespace contains both the name of the
>> Wikipedia article and the names of all redirects to that article.
>>
>> Hope that helps,
>>
>> Al
>>
>> ----- Original Message -----
>> From: "Sam Halliday" <sam.halliday at gmail.com>
>> To: developers at freebase.com
>> Sent: Tuesday, November 4, 2008 2:30:09 AM GMT -08:00 US/Canada Pacific
>> Subject: [Developers] WEX (wpid/fbid lookup) database
>>
>> Hi all,
>>
>> I noticed that freebase released a new database dump for October,
>> but no corresponding WEX dump. My interests are not in the WEX part
>> of the latter database, but in the piece that links wpids to fbuids
>> and gathers all the Wikipedia redirects... I am confused why this
>> is not a part of the freebase download in the first place.
>>
>> I used freebase for a project on the assumption that new data would
>> be available on a quarterly basis. Is this not the case for the WEX
>> data?
>>
>> I'd also like to request that the wpid and redirect tables be
>> included in the freebase data dumps in the future, and not
>> exclusively in the WEX data. I understand that the WEX can be built
>> from the wikipedia data, but the wpid/fbuid lookup cannot... it must
>> be freebase that do this.
>> _______________________________________________
>> Developers mailing list
>> Developers at freebase.com
>> http://lists.freebase.com/mailman/listinfo/developers