[Data-modeling] "area" property of "/location/location" type

Jonathan W. Lowe jlowe at giswebsite.com
Wed Apr 2 20:41:05 UTC 2008


Thoughts:

For this particular case, Jeff P's proposal (to make
the /location/location/area property unique until a more sophisticated
modeling need arises) sounds practical and relatively painless (given
only 24 cases containing multiple area values).

Robert's examples of earthquake magnitude and changing company name
don't seem "over-modeled" at all -- they seem *clear*.  Yes, an
application developer has to invest time to understand the detail of
those models, but can then pick and choose what data to use from them
depending on his application requirements.

By contrast, the /location/location/area property is not a CVT; it's just
a floating point number.  And the /location/location type description
provides "guidelines for filling in location properties" for some of
that type's properties (e.g. "geolocation", "contains", "adjoins", etc.)
but not for "area".  Simple typing with no guidelines suggests a simple
property to hold "definitive values", right?  So, given this seemingly
simple model for "area", I was confused by topics containing multiple
area values -- all without timestamp or source associations to
distinguish any one area from any other in the list.  I interpreted
these instances to be mistakes that could be prevented in future by
setting the "area" property to be unique.
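
To make the "mistake" interpretation concrete, here is a rough sketch of how one might test whether a topic's multiple area values are really the same measurement. It assumes the duplicates are either rounded copies or the same area expressed in square miles versus square kilometres. The function name, the 1% tolerance, and the two-unit assumption are all hypothetical and not part of the Freebase schema.

```python
import math

# Exact conversion factor: one mile is 1.609344 km, so this is 1.609344 ** 2.
SQ_MI_TO_SQ_KM = 2.589988110336


def areas_agree(values, rel_tol=0.01):
    """Return True if every value matches the first one within rel_tol,
    either directly or after a sq mi <-> sq km conversion.

    Hypothetical helper: the tolerance and the two-unit assumption are
    illustrative only.
    """
    if len(values) < 2:
        return True
    ref = values[0]
    for v in values[1:]:
        # Try the value as-is, then as if it were recorded in the other unit.
        candidates = (v, v * SQ_MI_TO_SQ_KM, v / SQ_MI_TO_SQ_KM)
        if not any(math.isclose(ref, c, rel_tol=rel_tol) for c in candidates):
            return False
    return True
```

A topic whose values pass this kind of check could be collapsed to a single value; one that fails would need a human to look at it before the property is made unique.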

This question came up while I was modeling US Census 2000 Blocks, each
of which has an area and a population that can be combined in the
application tier to calculate population density.  If it's decided that
area should remain non-unique, then my application will arbitrarily use
the first value in the list to perform this calculation.
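
A minimal sketch of that fallback, assuming the application receives the area values as a plain list of floats and the population as an integer. The function name and the choice to return None for a missing or non-positive area are my own assumptions for illustration.

```python
from typing import Optional, Sequence


def population_density(population: int,
                       areas: Sequence[float]) -> Optional[float]:
    """People per unit of area, using the first area value in the list.

    Returns None when there is no area value or the first one is not
    positive, since a density cannot be computed in that case.
    """
    if not areas:
        return None
    area = areas[0]  # arbitrary choice: take whatever value comes first
    if area <= 0:
        return None
    return population / area
```

For example, population_density(1200, [0.5, 0.51]) uses 0.5 and ignores 0.51, which is exactly the arbitrariness that a unique "area" property would remove.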

- Jonathan

On Wed, 2008-04-02 at 11:40 -0700, Robert Cook wrote:
> Again, I think this is an example of two distinct use cases:  1) The  
> definitive value that's there for people who really don't want to be  
> bothered with all of the detail (most of the world); and 2) All of the  
> possible variants, be they measurements taken at a different time,  
> with a different methodology or by different people that's there for  
> the enthusiasts (those who care enough about the data to ensure it's  
> complete and accurate).  Take for example the over-modeling I did on  
> Earthquakes:
> 
> http://www.freebase.com/view/en/loma_prieta_earthquake
> 
> The Magnitude is blown out into a CVT and can have multiple magnitude  
> values on different scales from different sources.  This is certainly  
> interesting to seismologists and geo-geeks, but for somebody who wants  
> an ordered list of the top 30 most destructive earthquakes, the detail  
> simply gets in the way.
> 
> The same could be said for the "current" vs "complete" list of board  
> members of a company, members of a sports team, products of a  
> manufacturer, or population of a country.  Jeff T pointed out that I  
> went as far as creating a specific time-series property to capture the  
> changing name of a company:
> 
> <http://www.freebase.com/view/guid/9202a8c04000641f8000000005b7ab1f>
> (See "Previous names")
> 
> So far my (kind of lame) solution is to use two distinct properties.   
> This is an OK stopgap, but, like all denormalizations, it's confusing  
> and semantically sub-optimal.  The "simple" property really should be  
> generated by a query on the complete property, although it's not clear  
> how that could support the "top 30 earthquakes" example above in a  
> performant way.  For now, though, we have to balance when to have the  
> simple representation and when we should add a CVT enriched property  
> that can hold all of the possible representations.
> 
> Thoughts?
> 
> R
> 
> On Apr 2, 2008, at 10:47 AM, Jeff Prucher wrote:
> 
> >> -----Original Message-----
> >> From: data-modeling-bounces at freebase.com
> >> [mailto:data-modeling-bounces at freebase.com] On Behalf Of
> >> Kirrily Robert
> >> Sent: Wednesday, April 02, 2008 10:04 AM
> >> To: Freebase data modeling mailing list
> >> Subject: Re: [Data-modeling] "area" property of
> >> "/location/location" type
> >>
> >>
> >> ----- "Jonathan W. Lowe" <jlowe at giswebsite.com> wrote:
> >>> The "/location/location" type has an "area" property that accepts
> >>> multiple values.  Should this property instead be restricted to one
> >>> value?
> >>>
> >>> Unless someone can identify a location having more than one valid
> >>> area, I recommend a schema change that restricts
> >>> /location/location/area to one value.
> >>
> >> Time series!  Uhhh... forget I said that.  Please.  PLEASE?
> >
> > Well, that and the fact that different sources can have different,
> > equally valid, area measurements for the same location at the same
> > time, depending on methodology. But neither of those reasons is why
> > the current property is non-unique; the real reason is that I forgot
> > to check the box. Me, I'm tempted to just make it unique until we
> > actually have some time-series data to input. Right now, it's
> > practically all WP infobox loads, which implies reasonably current
> > area.  (There are 24 locations with multiple values, all of which
> > appear to be differences in rounding, conversion, or using the
> > different measurement units, which will have to be cleaned up before
> > we can make the property unique.)
> >
> > Jeff P
> >
> > _______________________________________________
> > Data-modeling mailing list
> > Data-modeling at freebase.com
> > http://lists.freebase.com/mailman/listinfo/data-modeling


