[Developers] Metaweb Perl module

Kurt Bollacker kurt at metaweb.com
Tue Aug 14 19:04:50 UTC 2007


On Sun, Aug 12, 2007 at 01:32:41PM +0200, Hayden Stainsby wrote:
> I started off looking at a concept similar to Shawn's, but found that  
> it meant giving up MQL's ability to send multiple queries in one  
> envelope. So I'm working on a structure for sending a query that is a  
> little more complex (one method call or so), but I think that even a  
> programmer fairly new to Perl should be able to get it up and running  
> fairly quickly.
> 
> In brief Perl-ish syntax:
> 
> $handle = Metaweb->new( server / login details here );
> $handle->add_read_query( query here );
> $handle->send_read_envelope;
> 
> You then access your queries by name. The important point being that  
> you can send multiple named queries (or a single query using the  
> default name if you don't specify a name) using the same calls.
> 
> You then also need a separate method/function to access the trans  
> service to fetch pictures and body text. But that's pretty  
> straightforward: all you need to use there is the translation type and the  
> guid of the object you're fetching.
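For comparison, the named-query envelope described above might look roughly like this in Python. This is a minimal sketch of the idea, not Hayden's module: the class name `ReadEnvelope` is hypothetical, its `add_read_query` method only mirrors the Perl call named above, and the wire format (one JSON object mapping each query name to its own `{"query": ...}` envelope, answered by an object keyed by the same names) is an assumption. No network call is made; the sketch only builds the outgoing JSON and picks named results out of a decoded response.

```python
import json


class ReadEnvelope:
    """Collect named MQL read queries so they can be sent in one request.

    Hypothetical helper.  The assumed wire format is a JSON object that
    maps each query name to its own inner envelope, {"query": ...}.
    """

    DEFAULT_NAME = "default"

    def __init__(self):
        self._queries = {}

    def add_read_query(self, query, name=DEFAULT_NAME):
        # Re-using a name replaces the earlier query stored under it
        self._queries[name] = {"query": query}

    def to_json(self):
        # Serialize every named query into a single outer envelope
        return json.dumps(self._queries)

    def result(self, response, name=DEFAULT_NAME):
        # Pick one named answer out of an already-decoded response
        return response[name]["result"]


envelope = ReadEnvelope()
# One explicitly named query plus one that falls back to the default name
envelope.add_read_query([{"type": "/music/artist", "name": None, "limit": 3}],
                        name="artists")
envelope.add_read_query({"id": "/common/topic", "name": None})
print(envelope.to_json())
```

After decoding the server's reply (for example with `json.loads`), `envelope.result(response, "artists")` would return that query's answer and `envelope.result(response)` the default one, which is the "access your queries by name" behaviour described above.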

A Metaweb API feature I found very helpful was abstracting away
cursors, as in this Python code:

######################################################################
# Forgive me if there are bugs
######################################################################
def read_gen(read, env, pagesize=100, maxhits=None):
    ''' Use a python generator to abstract away the use
        of cursors in MQL.
            read: Function that sends one MQL read envelope and
                  returns the response envelope as a dict.
             env: Is a valid MQL query envelope.  Its query must be
                  a list so that MQL can return more than one result.
        pagesize: Chunk size for the cursor.  Probably can be left alone.
         maxhits: Maximum hits desired.  Optional.
    '''
    # Let's use cursors
    env['cursor'] = True
    count = 0
    while maxhits is None or count < maxhits:
        if maxhits is not None:
            # Make sure we don't ask for more than maxhits
            maxpagehits = min(pagesize, maxhits - count)
        else:
            maxpagehits = pagesize
        # Set the page size in the query itself
        env['query'][0]['limit'] = maxpagehits
        r = read(env)
        if len(r['result']) == 0:
            # We've run out of hits
            break
        for i in r['result']:
            # Next iteration for our generator
            yield i
        count += len(r['result'])
        if not r['cursor']:
            # MQL returns a false cursor on the last page
            break
        # Get the cursor for the next page
        env['cursor'] = r['cursor']


env = {"query": [{"id": None, "name": []}]}

# mqlread stands in for whatever function your client library uses
# to send one envelope to the mqlread service (hypothetical name)
for hit in read_gen(mqlread, env, maxhits=50000):
    print(hit)
######################################################################

When a query envelope is passed to read_gen, it returns a generator.
In the test usage here, one can iterate over the IDs and names of up to
50K topics.  Notice that the query author did not need to specify "limit"
or know anything about cursors, which is especially helpful to new users.
I'd encourage anyone writing a Perl, Ruby, or any other language API for
Freebase to add this feature.
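One design alternative, sketched here under stated assumptions: the generator can do nothing but follow cursors, and the caller can cap the number of hits with the standard `itertools.islice`. The function `make_fake_read` is a hypothetical in-memory stand-in for the MQL read service, assumed to answer with `{"result": [...], "cursor": ...}` and a false cursor on the last page; it exists only so the sketch runs without a network connection. The trade-off is that `islice` cannot shrink `limit` on the final page, so part of the last page may be fetched and then thrown away; avoiding that is what the `maxpagehits` calculation in the code above is for.

```python
from itertools import islice


def follow_cursor(read, query, pagesize=100):
    """Yield every result of a list-style MQL query, one page at a time.

    `read` is any function that takes an envelope dict and returns a
    response dict with "result" and "cursor" keys (an assumption here).
    """
    # Copy the query so the caller's dict never gains a "limit" key
    page_query = [dict(query[0], limit=pagesize)]
    cursor = True  # True asks MQL to start a new cursor
    while cursor:
        response = read({"query": page_query, "cursor": cursor})
        page = response["result"]
        if not page:
            break  # nothing left, whatever the cursor says
        yield from page
        cursor = response["cursor"]  # false once the last page is sent


def make_fake_read(rows):
    """Hypothetical in-memory stand-in for the MQL read service."""
    def fake_read(env):
        start = 0 if env["cursor"] is True else int(env["cursor"])
        limit = env["query"][0]["limit"]
        page = rows[start:start + limit]
        end = start + len(page)
        # Mimic MQL by returning a false cursor on the final page
        return {"result": page,
                "cursor": str(end) if end < len(rows) else False}
    return fake_read


rows = [{"id": "/hypothetical/%d" % n, "name": []} for n in range(250)]
read = make_fake_read(rows)

# The caller, not the generator, decides how many hits it wants
first_120 = list(islice(follow_cursor(read, [{"id": None, "name": []}]), 120))
print(len(first_120))  # prints 120
```

Keeping the hit cap out of the generator makes it a little simpler and reusable with any iterator tool, at the cost of that occasional over-fetch.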

								Kurt :-)
