[Data-modeling] Location question
Ed Laurent
spatial.db at gmail.com
Sun Jul 12 07:09:01 UTC 2009
You ask a couple of great questions and you described the problem well.
To summarize, there are three primary issues: 1) the
"Location<http://www.freebase.com/type/schema/location/location?domain=%2Flocation>"
type needs some work to more explicitly define relationships among
locations, 2) locations are rarely defined concretely, and 3) there are
often multiple topics for the same concept of a location that differ in one
or more ways.
Issue 1: A couple weeks ago I started an "Enhanced
location<http://www.freebase.com/view/base/enhancedlocation>"
base to build and test potential properties of the Location type to improve
its spatial semantics. Currently, I have "Location
intersection<http://www.freebase.com/type/schema/base/enhancedlocation/location_intersection>"
and "Location union<http://www.freebase.com/type/schema/base/enhancedlocation/location_union>"
types to describe locations as either occurring within the area of
intersection of two or more overlapping locations or within the combined
area of two or more adjacent or overlapping locations. That's about as far
as I've gotten with these types and I welcome all feedback and assistance
with them.
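As a rough sketch of what the two types are meant to express, here is how intersection and union behave if a location is modeled as nothing more than the set of places it contains. That model, and the function and variable names, are assumptions made for illustration. They are not the Freebase schema or API. The state names come from the "Northwestern United States" example in the quoted message below.

```python
# Assumption: a location is represented only by the set of places it contains.
def location_intersection(*locations):
    """Places that fall inside every one of the given locations."""
    return frozenset.intersection(*map(frozenset, locations))

def location_union(*locations):
    """Places that fall inside at least one of the given locations."""
    return frozenset.union(*map(frozenset, locations))

# Two readings of "Northwestern United States" taken from the quoted message.
narrow_northwest = {"Oregon", "Washington"}
broad_northwest = {"Oregon", "Washington", "Idaho", "Montana", "Wyoming"}

core = location_intersection(narrow_northwest, broad_northwest)
widest = location_union(narrow_northwest, broad_northwest)
```

Here `core` holds only the states every reading agrees on, and `widest` holds every state that any reading includes.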
Issue 2: This will always be a problem. Perhaps there are standard
descriptions of the entire world, but I am unaware of them (CIA World
Factbook <https://www.cia.gov/library/publications/the-world-factbook/>?),
and they would necessarily be time mediated. One way to describe such
standards would be to import all the location topics with their definitions
and link them to their sources, which would have a property for the date.
For example, location entities of a political map could be described using FGDC
compliant metadata <http://www.fgdc.gov/csdgmgraphical/index.html>. I've
started modeling FGDC compliant metadata in my MapCentral
base<http://www.freebase.com/view/base/mapcentral> but have not yet made it to Entity
and Attribute Information <http://www.fgdc.gov/csdgmgraphical/entatt.htm>,
which could include the location topics under the Enumerated
domain<http://www.fgdc.gov/csdgmgraphical/entatt/detail/attrib/adv/enumer.htm> (I think). See 2001
NLCD<http://www.freebase.com/edit/topic/en/2001_national_land_cover_data_metadata?domain=%2Fbase%2Fmapcentral> for
an example of how I'm describing maps.
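To make the "time mediated" idea concrete, here is a minimal sketch of location definitions linked to dated sources, with a lookup for the definition in force on a given date. The class, its fields, and all of the data are hypothetical placeholders. None of it is FGDC metadata or Freebase schema.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class LocationDefinition:
    topic: str            # the location topic being defined
    source: str           # where the definition came from
    source_date: date     # the date property on the source
    contains: frozenset   # places the source says the topic contains

def definition_as_of(definitions, topic, when):
    """Return the most recent definition of `topic` dated on or before `when`."""
    candidates = [d for d in definitions
                  if d.topic == topic and d.source_date <= when]
    return max(candidates, key=lambda d: d.source_date, default=None)

# Placeholder data, invented for illustration only.
definitions = [
    LocationDefinition("Region X", "Source A", date(2000, 1, 1),
                       frozenset({"P", "Q"})),
    LocationDefinition("Region X", "Source B", date(2005, 1, 1),
                       frozenset({"P", "Q", "R"})),
]

current = definition_as_of(definitions, "Region X", date(2009, 7, 12))
earlier = definition_as_of(definitions, "Region X", date(2003, 1, 1))
```

With this placeholder data, `current` is the Source B definition and `earlier` is the Source A definition. A date before any source returns `None`.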
Issue 3: A system is needed to organize similar but different location
topics to describe how they are similar and/or different. I'm using the "Code
category<http://www.freebase.com/type/schema/base/landcover/code_category?domain=%2Fbase%2Flandcover>"
type of my Land Cover base <http://landcover.freebase.com/> to do this for
land cover classes. See "Spruce-Fir
forest<http://www.freebase.com/edit/topic/en/spruce_fir_forest>"
for an example. Here, I have a generic concept of a land cover category and
am linking it to all the defined land cover codes/classes that fall under
that umbrella, which are linked to their classification
systems<http://www.freebase.com/type/schema/base/landcover/classification_system?domain=%2Fbase%2Flandcover>,
which in turn are linked to relevant
publication(s)<http://www.freebase.com/type/schema/base/landcover/classification_publication>.
The "Classification
code<http://www.freebase.com/type/schema/base/landcover/classification_code>"
type provides properties to describe how the land cover codes/classes are
similar and/or different (i.e., equals, overlaps, contains, contained by).
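The four relationships can be sketched with plain set comparisons, assuming each code/class is reduced to the set of finer-grained things it covers. That reduction, the `relate` function, the "disjoint" fallback, and the sample classes are all assumptions for illustration. The actual "Classification code" properties may be modeled differently.

```python
def relate(a, b):
    """Name the relationship of class `a` to class `b`."""
    a, b = set(a), set(b)
    if a == b:
        return "equals"
    if a > b:            # a is a proper superset of b
        return "contains"
    if a < b:            # a is a proper subset of b
        return "contained by"
    if a & b:            # some shared members, neither includes the other
        return "overlaps"
    return "disjoint"    # added fallback, not one of the four listed relations

# Hypothetical classes from two made-up classification systems.
system_a_evergreen = {"spruce", "fir", "pine"}
system_b_spruce_fir = {"spruce", "fir"}
system_c_mixed = {"fir", "maple"}

print(relate(system_a_evergreen, system_b_spruce_fir))
print(relate(system_b_spruce_fir, system_a_evergreen))
print(relate(system_b_spruce_fir, system_c_mixed))
```

The same comparison would apply to location topics that differ in what they contain.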
Maybe there are some answers to your questions here. I'm not sure, but I am very
interested in helping to work this out.
-Ed
On Sun, Jul 12, 2009 at 1:54 AM, Richard Newman <rnewman at twinql.com> wrote:
> (Apologies if this has been discussed to death; a quick scan through
> the archive didn't satisfy my curiosity. I began phrasing this as an
> email to my only FB list, developers, but decided that it was a data-
> modeling issue… so here I am.)
>
> I recently ended up down a rabbit hole exploring common
> categorizations of the United States — e.g., "Mountain States",
> "Pacific Northwest" — and I'd appreciate some insight, opinion, or
> brutal rejection from those more knowledgeable.
>
> The Census Bureau Divisions (e.g., "Northwest") are clear cut. The
> common categorizations (e.g., "Pacific Northwest") on the other hand,
> based as they are on convention and wooly usage, pose some interesting
> challenges, which apply to notional regions around the world. They're
> clearly locations — they are regions that contain other locations,
> just like the Census Bureau Divisions — but they're not definite (one
> cannot definitively say that Pendleton, OR is in the Palouse, because
> some people exclude Oregon from that region).
>
> This leads to at least two expressiveness problems:
>
> * The "Northwestern United States" is always considered to include
> Oregon and Washington… but also *sometimes* Idaho, Montana, Wyoming,
> Southeast Alaska, and parts of Northern California (presumably
> depending on who's talking). I don't see a good type or property in
> Freebase to express varying levels of "considered membership" of any
> kind (location containment or otherwise). I suppose I could put
> memberships in my own domain (rendering them personal), or a base
> (rendering them subjective), but surely some definitions of these
> regions belong in the Commons? Linkage between these categories and
> their states is a valuable navigation tool.
>
> * The previous bullet mentioned "parts of Northern California" and
> "Southeast Alaska". I feel uneasy reifying these entities ("the parts
> of California sometimes considered to be in the Northwestern United
> States" is a pretty bad name for a topic), even in my own domain, and
> doing so still leaves the problem of choosing which parts to include!
>
> The general approach to wooly containment seems to have been "dart
> throwing", judging by examples:
>
> <
> http://www.freebase.com/view/en/northern_california/-/location/location/contains
> >
>
> -- Northern California appears to consist of the Bay Area, SoMa
> warranting its own mention! No Redding, which seems odd. Is there a
> better way?
>
>
> I realize that this is data modeling, and these are the breaks.
> However, this occurred in the context of playing again with Parallax*,
> in which I see no way to define a set of entities apart from as the
> range of a property of some other entity (not even through
> intersection) — i.e., if I want to map cities in just Oregon and
> Washington, the precise constituents of "Northwestern United States"
> suddenly become very important, because it might be my only way to get
> a handle on that collection. That experience suggests that this might
> be an issue that shouldn't just be swept under the rug, particularly
> given the broad applicability and importance of geographic data.
>
> The alternatives to solving this are presumably either to omit
> containment links for these categories (not much use when I pick
> "Pacific Northwest" in Parallax and see no contained locations at
> all), or to assert only those things which are reasonable certainties.
> I'm not particularly happy with that approach, for the simple reason
> that it leads to definitions that appear artificially narrow (because
> only the intersection of divergent definitions has been included) and
> differ from a reading of the English text imported from Wikipedia.
>
> Thoughts?
>
> -R
>
> * Kicks the hell out of Wolfram Alpha for this kind of exploration, by
> the way; Alpha has difficulty doing any kind of work with sets. Good
> job, David!