[Developers] Freebase WEX Data

Alexander Marks al at metaweb.com
Wed Apr 16 20:50:51 UTC 2008


Kendra -- WEX is not designed for reassembling markup into HTML, which is what it sounds like you are trying to do. The XML column for a "Template:%" article does not include the markup needed to *interpret* a template macro, so that content is of little use for reconstruction. WEX is designed to make it simple to query the values of template calls in articles, for data mining. The data dumps that Wikipedia itself provides (http://meta.wikimedia.org/wiki/Data_dumps) might be better suited to your application. Is that helpful? I'm curious: what exactly is your project?
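
To illustrate the kind of query WEX is aimed at, here is a minimal Python sketch that pulls template calls and their parameter values out of an article's XML. The element and attribute names (template, param, name) follow the WEX sample quoted later in this thread; the sample article content, the positional parameter names "1" and "2", and the helper name template_calls are assumptions made for illustration, not part of WEX itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical article XML, shaped like the WEX sample quoted in this thread.
# The Dmoz call and its parameter values are made up for the example.
SAMPLE = (
    '<articles><article><paragraph>'
    '<template name="Dmoz">'
    '<param name="1">Arts/Television</param>'
    '<param name="2">Doctor Who</param>'
    '</template>'
    '</paragraph></article></articles>'
)

def template_calls(xml_text):
    """Return a list of (template_name, {param_name: value}) pairs,
    one per <template> element found anywhere in the XML."""
    root = ET.fromstring(xml_text)
    calls = []
    for tmpl in root.iter("template"):
        # Collect only the direct <param> children of this template call.
        params = {p.get("name"): (p.text or "").strip()
                  for p in tmpl.findall("param")}
        calls.append((tmpl.get("name"), params))
    return calls

calls = template_calls(SAMPLE)
print(calls)
```

This yields the values passed to each template call, which is enough for data mining but not for rendering, since the template's own definition is not part of the output.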

Al

----- Original Message -----
From: "Kendra Kuhl" < kendra.kuhl at juggle.com > 
Date: April 15, 2008 7:24:52 AM PDT 
To: < developers at freebase.com > 
Subject: [Developers] Freebase WEX Data 
Reply-To: "For discussions about MQL, Freebase API and apps built on Freebase" < developers at freebase.com > 


I'm hoping somebody can help me or point me in the right direction. 

I'm trying to put Wikipedia articles back together from the WEX dump using XSLT. While working out how to handle templates, I found that the template articles seem to be missing the data I need to do that. For example (see below), to reconstruct the dmoz template I would go to the Template:Dmoz article, get the wiki markup for the template, and fill it in with the data from the article that calls the template. But the Template:Dmoz article in the WEX dump doesn't contain the template markup. The same goes for the Template:Doctor_Who_RG article. The Dr. Who article contains more of the template information, but still not enough to expand the template on the article side. 

Information I expect to be in the Template:Dmoz article: 

<text xml:space="preserve"><noinclude>{{Pp-semi-protected|small=yes}}</noinclude>{{#switch: {{{3|}}} |#default=[http://www.dmoz.org/{{{1}}}/ {{{2|{{PAGENAME}}}}}] at the [[Open Directory Project]] |user=[http://www.dmoz.org/profiles/{{{1}}}.html {{{2|{{PAGENAME}}}}}] at the [[Open Directory Project]] }}<noinclude>{{Documentation}}</noinclude></text> 

Information showing in the Template:Dmoz article in the WEX dump: 

<articles xmlns:xhtml=" " loadtime="0 sec" rendertime="0.002 sec" totaltime="0.002 sec"><article><paragraph><extension extension_name="noinclude"><template name="Pp-semi-protected">\n<param name="small">yes</param>\n</template><template name="Pp-semi-protected">\n<param name="small">yes</param>\n</template></extension><extension extension_name="noinclude"><template name="Documentation">\n</template><template name="Documentation">\n</template></extension></paragraph></article></articles> 

This is what I see as missing: 

{{#switch: {{{3|}}} |#default=[http://www.dmoz.org/{{{1}}}/ {{{2|{{PAGENAME}}}}}] at the [[Open Directory Project]] |user=[http://www.dmoz.org/profiles/{{{1}}}.html {{{2|{{PAGENAME}}}}}] at the [[Open Directory Project]] }} 
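
To make concrete what that missing markup does, here is a rough hand translation of the #switch expression into Python. It is only a sketch: the function name expand_dmoz and the params dictionary are hypothetical, parameter 1 is treated as required, and the spaces around the URLs in the quoted markup are assumed to be artifacts of the list archive rather than part of the template.

```python
def expand_dmoz(params, page_name):
    """Hand-written approximation of the Template:Dmoz markup quoted above.

    params    -- dict of template call parameters keyed "1", "2", "3"
    page_name -- title of the calling article, standing in for {{PAGENAME}}
    Returns wiki markup for an external link, not HTML.
    """
    target = params["1"]                       # {{{1}}}, assumed to be present
    label = params.get("2", page_name)         # {{{2|{{PAGENAME}}}}}
    mode = params.get("3", "").strip()         # {{{3|}}}, empty when omitted

    if mode == "user":                         # |user= branch of the #switch
        url = f"http://www.dmoz.org/profiles/{target}.html"
    else:                                      # |#default= branch
        url = f"http://www.dmoz.org/{target}/"
    return f"[{url} {label}] at the [[Open Directory Project]]"

link = expand_dmoz({"1": "Arts/Television/Programs"}, "Doctor Who")
print(link)
```

Without the switch expression in the dump, nothing on the article side says which URL pattern or default label to use, which is why the parameter values alone are not enough to rebuild the link.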

So my questions are these: 

1. Am I missing something? Is there another spot in the WEX dump that contains the information to put the template back together again? I have looked, but nothing jumps out at me. How does Metaweb / Freebase handle it? 
2. Has anybody else attempted the XSLT reconstruction of the WEX data? If so, are you willing to share? I can share what I have come up with already, but I'm pretty new to XSLT. 

Thanks in advance!! 

Kendra

_______________________________________________
Developers mailing list 
Developers at freebase.com 
http://lists.freebase.com/mailman/listinfo/developers 



