[Developers] 503 error when trying to do a large write to Sandbox
Tyler Pirtle
tyler at metaweb.com
Sat Feb 23 00:51:17 UTC 2008
Hey Alex,
This is an issue that we already know about. I'll do my best to walk you through what happened.
First, the 503 part.
Basically, your request completed in full the first time. We returned the HTTP response prematurely; this is a known
bug and it's being worked on.
The latter part involving duplicates being written is also a known bug.
So while we're working on that, the best advice I can give you is what you've already figured out:
Break your writes up into small ones. This serves two very useful purposes:
1) It lets you checkpoint your operations: if you're going to do 5,000, 10,000 or more writes and you break them
up into multiple requests, you can pause between requests at any time.
2) It also lets you test your writes more cleanly: do a hundred or so, then check what you've done.
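As a rough illustration of the batching idea, here is a minimal sketch. The names chunked, write_in_batches, write_fn and start_at are made up for this example and are not part of any Metaweb library; write_fn stands in for whatever call actually performs the write. It sends the queries in fixed-size batches and, if a batch fails, reports the index to resume from.

```python
def chunked(items, size):
    """Yield (start_index, slice) pairs, each slice at most `size` long."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield start, items[start:start + size]


def write_in_batches(queries, write_fn, size=100, start_at=0):
    """Send `queries` to `write_fn` in batches of `size`.

    `write_fn` is a hypothetical stand-in for the real write call: it takes
    a list of queries and returns a list of results.

    Returns (results, next_start). If a batch raises, the run stops and
    next_start is the index of the first query in the failed batch, so the
    run can be resumed later by passing it back as `start_at`.
    """
    results = []
    for offset, batch in chunked(queries[start_at:], size):
        index = start_at + offset
        try:
            results.extend(write_fn(batch))
        except Exception as exc:
            print("batch starting at %d failed: %s" % (index, exc))
            return results, index
    return results, len(queries)
```

Because of the premature-response bug described above, a batch that appears to fail may in fact have been written, so it is worth checking what landed before resuming from the returned index.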
One other thing: if you happen to see an error code, anything in the 500 range, please by all means post a message
back to this list. I try to be as responsive as possible to these.
(If you're really ambitious, post the full HTTP response headers and I can get you answers very quickly. In
particular, we return a header called "X-Metaweb-TID"; that one helps a lot.)
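One possible way to collect that information for a post, sketched under the assumption that your HTTP client gives you the status code and the response headers as a dict. The function name describe_failure is made up for this example; only the header name "X-Metaweb-TID" comes from this thread.

```python
def describe_failure(status, headers):
    """Build a short plain-text report of a failed request.

    `headers` is assumed to be a dict of response header names to values.
    HTTP header names are case-insensitive, so they are lowercased before
    looking up the transaction id header.
    """
    lowered = dict((name.lower(), value) for name, value in headers.items())
    tid = lowered.get("x-metaweb-tid", "(not present)")
    lines = [
        "HTTP status: %s" % status,
        "X-Metaweb-TID: %s" % tid,
        "All response headers:",
    ]
    for name in sorted(lowered):
        lines.append("  %s: %s" % (name, lowered[name]))
    return "\n".join(lines)
```

Pasting the returned text into a message to the list gives the status code, the transaction id and the full headers in one place.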
Thanks, sorry for the troubles.
Tyler
Alexander Botero-Lowry wrote:
> Hi,
>
> Monday night I tried to import 2561 records as a single query on Sandbox.
> After fixing some bugs in how I was formatting my date, I finally got the
> query to execute but I received a 503. I tried the query a few more times
> and then checked sandbox to make sure nothing had happened and lo and
> behold some of the entries showed up! At that point I realized that I
> wasn't sure all of them were there so I rewrote my importer script to
> do it in 100 block increments, and I got an error that there were 2
> unique entries, which I looked up and determined was the result of
> create=unless_exists not being able to disambiguate! So it seems like
> somehow, even though I was using create=unless_exists the entry got
> added twice. I'm not exactly sure how transactions work internally
> so I can't really speculate further on how that happened.
>
> Follows is my importer:
>
> #!/usr/bin/env python
>
> import metaweb
>
> USERNAME=''
> PASSWORD=''
>
> MONTH_MAP = {'Jan':1, 'Feb':2, 'Mar':3, 'Apr':4, 'May':5, 'Jun':6, 'Jul':7, 'Aug':8, 'Sep':9, 'Oct':10, 'Nov':11,
> 'Dec':12}
> TYPEID = '/user/alexbl/default_domain/exchange_rate'
> SOURCE_CURR = {'name':'US $', 'type':'/finance/currency'}
> TARGET_CURR = {'name':'Australian dollar', 'type':'/finance/currency'}
>
> def generate_query(a):
>     data = a.split()
>     # FIXME: find a better way to do this
>     rate_date = data[0].split('-')
>     rate_date[2] = '19'+rate_date[2]
>     rate_date[1] = "%02d" % (MONTH_MAP[rate_date[1]])
>     rate_date[0] = "%02d" % int(rate_date[0])
>     data[0] = '-'.join(reversed(rate_date))
>
>     q = {'create':'unless_exists',
>          'id':None,
>          'type':[TYPEID],
>          'source_of_exchange':SOURCE_CURR,
>          'target_of_exchange':TARGET_CURR,
>          'amount':float(data[1]),
>          'date_of_rate':data[0]
>         }
>     return q
>
> if __name__ == '__main__':
>     query = [ generate_query(a) for a in file('USD-AUD-90-99.txt') ]
>     credentials = metaweb.login(USERNAME, PASSWORD)
>     for a in range(100, len(query), 100):
>         result = metaweb.write(query[a:a+100], credentials)
>         for r in result:
>             print r['create'], r['id']
>
>
> Before I added the stepper, it was simply directly doing what's inside the
> for loop with the query list.
>
> The data format is like:
> 2-Jan-90 0.7855
> 3-Jan-90 0.7818
> ...
>
> I will most likely write a script to query the ids and then detype them and then
> do an import again with the 100 block steps version to see if that's the only problem.
>
> Luckily this was all on sandbox :)
>
> Thanks,
> Alex
>
> _______________________________________________
> Developers mailing list
> Developers at freebase.com
> http://lists.freebase.com/mailman/listinfo/developers