[Developers] Handling query timeouts
Scott Meyer
sm at metaweb.com
Fri Aug 22 17:23:58 UTC 2008
Rishav Rastogi wrote:
> Can anyone suggest a good way to handle query timeouts?
1. Be aware of what is hard. Basically, this boils down to two
things: queries involving sorting or counting, and queries which
are underconstrained relative to the number of links to
the objects involved. For sorting and counting we either
need to look at all the possible answers to a query or, in
the case of sorting, use a sorted index (which precludes use
of other indexes). For complex queries, the liability is stumbling
upon a very popular node which can cause an explosion in
the number of candidates we have to check. So, if you're working
in a domain which is relatively small, say types, you can ask
relatively complex questions. But asking for everything within
6 degrees of Kevin Bacon is hard. It might work for some actors
but not for well-known ones.
2. If you get a timeout, the first thing to do is a little divide
and conquer experimentation. For example, if you're asking for
a hundred results, try asking for 50, 20, 10, 1. If result 11 is
some pathological case, finding exactly where things go wrong
might allow you to add a constraint which will eliminate the
bad guy. You can also pull the query apart and evaluate sub-clauses
individually. Adding a count to each subclause can be very
illuminating: if subclause A has a million candidates and
subclause B has 5 million, you can reasonably suspect that
joining A and B might hurt. If you have sorts, try the query
without sorting.
3. Build some sort of contingency plan into your application.
Many Freebase applications allow unconstrained navigation
around the graph. Some experience from #2 above will suggest
useful ways in which queries can be simplified. If the complex
query times out, shift to a simpler view: unsorted, fewer properties,
more constraints, etc. New features such as estimated counts
can help do this.
4. Please do complain. We put every query that anyone has ever
complained about into our regression tests and we use the
developers mailing list archive to motivate development plans.
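
To make #2 and #3 a bit more concrete, here is a rough Python sketch,
not anything we ship. It assumes your application already has some
function that sends a query and raises an error on timeout; the names
run_query, QueryTimeout, largest_working_limit and run_with_fallbacks
are all made up for illustration, and the query is assumed to be a
plain dict with "limit" and "sort" keys.

```python
class QueryTimeout(Exception):
    """Hypothetical error raised by your own query runner on a timeout."""


def largest_working_limit(run_query, query, limits=(100, 50, 20, 10, 1)):
    """Step 2: retry with smaller limits; return the first that succeeds."""
    for limit in limits:
        trial = dict(query, limit=limit)  # copy the query, override the limit
        try:
            run_query(trial)
        except QueryTimeout:
            continue  # still too expensive, try a smaller limit
        return limit
    return None  # even the smallest limit timed out


def run_with_fallbacks(run_query, query, fallback_limit=10):
    """Step 3: try the full query, then unsorted, then unsorted and smaller."""
    unsorted = {k: v for k, v in query.items() if k != "sort"}
    candidates = [query, unsorted, dict(unsorted, limit=fallback_limit)]
    for candidate in candidates:
        try:
            # Return the variant that worked so the UI can show what it got.
            return candidate, run_query(candidate)
        except QueryTimeout:
            continue
    raise QueryTimeout("all simplified variants timed out")
```

If largest_working_limit reports that 10 works but 20 does not, the
troublesome result is somewhere between 11 and 20, which is where to
look for a constraint to add. The list of fallback variants is only an
example; what you learn from experimenting should decide what goes in it.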
Hope that helps,
-Scott