[Developers] Metaweb Perl module
Kurt Bollacker
kurt at metaweb.com
Tue Aug 14 19:04:50 UTC 2007
On Sun, Aug 12, 2007 at 01:32:41PM +0200, Hayden Stainsby wrote:
> I started off looking at a concept similar to Shawn's, but found that
> it meant giving up MQL's ability to send multiple queries in one
> envelope. So I'm working on a structure for sending a query that is a
> little more complex (one method call or so), but I think that even a
> programmer fairly new to Perl should be able to get it up and running
> fairly quickly.
>
> In brief Perl-ish syntax:
>
> $handle = Metaweb->new( server / login details here );
> $handle->add_read_query( query here );
> $handle->send_read_envelope;
>
> You then access your queries by name. The important point being that
> you can send multiple named queries (or a single query using the
> default name if you don't specify a name) using the same calls.
>
> You then also need a separate method/function to access the trans
> service to fetch pictures and body text. But that's pretty
> straightforward: all you need to use there is the translation type and the
> guid of the object you're fetching.
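The multi-query envelope idea above can be sketched in Python. This is a hypothetical illustration only: the class `ReadEnvelope`, the default name `"default"`, and the response shape it assumes (each query name mapped to an object with a `result` key) are assumptions made for the example, not Hayden's module or a documented Metaweb API. The network call is left to a `transport` callable you supply, and a fake transport lets the sketch run offline.

```python
class ReadEnvelope(object):
    """Hypothetical batch of named MQL read queries sent in one envelope."""

    DEFAULT_NAME = "default"  # assumed name used when none is given

    def __init__(self):
        self.queries = {}

    def add_read_query(self, query, name=DEFAULT_NAME):
        # Each name maps to its own inner envelope holding one query
        if name in self.queries:
            raise ValueError("duplicate query name: %r" % (name,))
        self.queries[name] = {"query": query}

    def send(self, transport):
        # transport: callable taking the {name: envelope} dict and returning
        # the decoded response, assumed to look like {name: {"result": ...}}
        response = transport(self.queries)
        return {name: response[name]["result"] for name in self.queries}


def fake_transport(envelopes):
    # Offline stand-in for the service: echo each query back as its result
    return {name: {"result": env["query"]} for name, env in envelopes.items()}


batch = ReadEnvelope()
batch.add_read_query({"id": "/en/paris", "name": None})  # uses the default name
batch.add_read_query([{"type": "/music/artist", "name": None}], name="artists")
results = batch.send(fake_transport)
print(results["artists"])
```

Passing the transport in, rather than hard-coding an HTTP call, keeps the batching logic testable without a server.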
A Metaweb API feature I found very helpful was abstracting away
cursors. For example, in this Python code:
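######################################################################
# Forgive me if there are bugs
######################################################################
import copy

def read_gen(read, env, pagesize=100, maxhits=None):
    '''Use a Python generator to abstract away the use
    of cursors in MQL.
    read: Callable that sends an envelope to the mqlread service
          and returns the decoded response ('result' and 'cursor').
    env: A valid MQL query envelope.
    pagesize: Chunk size for the cursor. Probably can be left alone.
    maxhits: Maximum hits desired. Optional.
    '''
    # Work on a copy so the caller's envelope is left untouched
    env = copy.deepcopy(env)
    # Let's use cursors
    env['cursor'] = True
    # The limit goes on the query object (or on its first element
    # if the query is wrapped in a list)
    query = env['query'][0] if isinstance(env['query'], list) else env['query']
    count = 0
    while maxhits is None or count < maxhits:
        if maxhits is not None:
            # Make sure we don't ask for more than maxhits
            query['limit'] = min(pagesize, maxhits - count)
        else:
            query['limit'] = pagesize
        r = read(env)
        if len(r['result']) == 0:
            # We've run out of hits
            break
        for i in r['result']:
            # Next iteration for our generator
            yield i
        count += len(r['result'])
        # Get the cursor for the next page; a false cursor means
        # this was the last page
        env['cursor'] = r['cursor']
        if not env['cursor']:
            break

# A list query, so that many topics can match
env = {"query": [{"id": None, "name": []}]}
# mqlread: your own (hypothetical) function that sends an envelope
for hit in read_gen(mqlread, env, maxhits=50000):
    print(hit)
######################################################################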
When a query envelope is passed to read_gen, it returns a generator,
so hits can be consumed one at a time in an ordinary for loop. In the
test usage here, one can iterate to get the names of 50K topics.
Notice that the query author did not need to specify "limit" or know
anything about cursors, which is especially helpful to new users. I'd
encourage anyone writing a Perl, Ruby, or other-language API for
Freebase to add this feature.
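The same trick carries over to any language or cursor-based service. Below is a minimal sketch of just the core loop, with the MQL envelope details stripped out. `paginate`, `fetch_page(cursor, limit)` (assumed to return `(items, next_cursor)`, with a false cursor meaning "no more pages") and the in-memory `fake_fetch` service are all hypothetical, written so the pattern can be tried offline; this is an illustration, not any existing client's code.

```python
from itertools import islice


def paginate(fetch_page, pagesize=100):
    """Yield items one at a time, hiding cursors and page limits."""
    cursor = True  # as in read_gen, True asks for a fresh cursor
    while True:
        items, cursor = fetch_page(cursor, pagesize)
        for item in items:
            yield item
        # Stop on an empty page or when the service reports no next cursor
        if not items or not cursor:
            break


# --- Offline fake service for trying the pattern out ---
DATA = ["topic-%d" % n for n in range(250)]  # made-up result set
calls = []  # records (cursor, limit) for each simulated request


def fake_fetch(cursor, limit):
    # This fake encodes the cursor as the string offset of the next page
    # and returns False once the data is exhausted (an assumption).
    calls.append((cursor, limit))
    start = 0 if cursor is True else int(cursor)
    page = DATA[start:start + limit]
    end = start + len(page)
    return page, (str(end) if end < len(DATA) else False)


# Take only the first 150 hits; later pages are never requested
first_150 = list(islice(paginate(fake_fetch, pagesize=100), 150))
print(len(first_150), len(calls))
```

Capping the number of hits with `itertools.islice` does the job of read_gen's maxhits argument; because the generator is lazy, only two simulated requests are made for 150 hits here.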
Kurt :-)