[Data-modeling] Geosearch service (was Re: Modeling uncertainty)

Tom Morris tfmorris at gmail.com
Thu Feb 26 23:25:22 UTC 2009


I don't claim to have a magic solution to this problem.  I'd just
prefer not to record incorrect or misleading data.  More comments
inline below.

On Thu, Feb 26, 2009 at 4:57 PM, John Giannandrea <jg at metaweb.com> wrote:
>
> Well I can think of three options to capture this:
>
> 1/ use the next highest level of location, so if it's 'near'
> /en/oxford, say the location is /en/oxfordshire;
>     this should always be available as the
> /location/location/containedby property
>
> 2/ we add a property to /location/location called 'approximate' and
> then have to create lots of 'nearby oxford' topics.
>
> 3/ every property that has an expected type of location is
> denormalized so there is an approximate_location property too.
>
> Of the three of these, #1 seems least ugly and most correct.

Solution #1 fails in border cases.  I knew this would come up, so I
should have planned ahead with a better example, but if we switch from
Oxford to Reading or Henley, "near Reading" could mean Oxfordshire or
Berkshire.
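To make the border case concrete, here is a minimal sketch of option #1, assuming a toy containment table. The IDs, the `CONTAINED_BY` and `NEARBY_REGIONS` tables, and both helper functions are hypothetical illustrations, not Freebase data or APIs. Replacing "near X" with X's single containing region is faithful for Oxford but not for Reading, where "near" can reach into Oxfordshire as well as Berkshire.

```python
# Hypothetical, simplified containment data. Real data would come from
# /location/location/containedby, which may list several parents.
CONTAINED_BY = {
    "/en/oxford": ["/en/oxfordshire"],
    "/en/reading": ["/en/berkshire"],
}

# Hypothetical: regions a reader might have in mind for "near X".
NEARBY_REGIONS = {
    "/en/oxford": {"/en/oxfordshire"},
    "/en/reading": {"/en/berkshire", "/en/oxfordshire"},
}


def generalize_near(place_id):
    """Option #1: record 'near X' as X's containing region."""
    parents = CONTAINED_BY.get(place_id, [])
    if len(parents) != 1:
        raise ValueError(f"no unique containing region for {place_id}")
    return parents[0]


def is_faithful(place_id):
    """True if the generalized region covers every region 'near X' could mean."""
    return NEARBY_REGIONS[place_id] <= {generalize_near(place_id)}
```

With this toy data, `is_faithful("/en/oxford")` holds while `is_faithful("/en/reading")` does not, which is exactly the case where #1 would record misleading data.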

I'd need to work it through, but I think I lean towards #2.  You'd
only be creating the "near Reading" style locations when they're
actually used (and they really are different locations then), so it
shouldn't affect most users and shouldn't be a storage burden.
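A rough sketch of the "create only when used" idea behind #2, assuming an in-memory store. `ApproximateLocation`, `LocationStore`, and the `#near` key scheme are hypothetical names for illustration; they are not existing Freebase types or properties. The point is that a "near Reading" topic exists only after someone first asks for it, and later uses share the same topic.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ApproximateLocation:
    """Hypothetical 'near <anchor>' topic, flagged with approximate=True."""
    anchor_id: str
    approximate: bool = True


@dataclass
class LocationStore:
    """Hypothetical store; topics are keyed by a made-up '<id>#near' scheme."""
    topics: dict = field(default_factory=dict)

    def near(self, anchor_id):
        # Create the "near X" topic lazily, on first use only.
        key = f"{anchor_id}#near"
        if key not in self.topics:
            self.topics[key] = ApproximateLocation(anchor_id)
        return self.topics[key]
```

Calling `store.near("/en/reading")` twice returns the same object and adds one entry, so storage grows only with the "near" locations people actually use.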

Ed's suggestion of quad trees (or hex trees) is great for searching
(and I presume the Freebase database uses something like this
internally), but I'm not sure it provides a user-accessible way of
dealing with this.  If it turns out that the only practical solution
is to force the person entering data to explicitly choose a radius
value for "near", then a circle, quad, or hex is probably equally
good.
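If the data-entry person does pick an explicit radius, the circle version is easy to sketch: a point counts as "near" if its great-circle (haversine) distance from the anchor is within that radius. The coordinates below are approximate town-centre values, and `is_near` and the 15 km radius are illustrative assumptions only, not anything Freebase provides.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

# Approximate (lat, lon) town centres, for illustration only.
READING = (51.454, -0.978)
HENLEY = (51.536, -0.903)
OXFORD = (51.752, -1.258)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def is_near(anchor, point, radius_km):
    """True if point lies within the user-chosen radius of anchor."""
    return haversine_km(*anchor, *point) <= radius_km
```

With a 15 km radius, Henley is "near Reading" and Oxford is not, regardless of which county either falls in. A quad or hex cell would answer the same question with a coarser, grid-aligned boundary.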

Tom

