[Data-modeling] Location question

Tom Morris tfmorris at gmail.com
Mon Jul 13 21:52:51 UTC 2009


One more tidbit that I forgot to include, which may affect how
locations in Freebase are perceived, is that all Wikipedia article
titles have any parenthetical phrases stripped when they are imported.
Generally this is a good thing, but for certain types of locations it
causes problems because the resulting names confuse users.

For example, New England cities and towns which have two markedly
different areas (due to density differences or some other significant
characteristic) typically also have a U.S. Census Bureau "Census
Designated Place" (CDP) with a name of the form "<foo> CDP".  So there
might be both Portland and Portland CDP.  Unfortunately, Wikipedia
converts these to Portland (city) and Portland (CDP), which Freebase
then strips, creating Portland and Portland.  The two topics then
either both get left around, creating one type of confusion, or get
merged together, creating an entirely different kind of confusion.
What really needs to happen is for "Portland," the CDP, to be renamed
back to "Portland CDP", since that's its official name.
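To make the collision concrete, here is a minimal Python sketch. It
assumes the import simply drops one trailing parenthetical from each
title; the helper names (strip_parenthetical, cdp_aware_name,
find_collisions) are hypothetical, and this is not Freebase's actual
import code. It shows two titles collapsing to one name, and how
keeping "CDP" as part of the name avoids the clash.

```python
import re
from collections import defaultdict

# Assumed rule (hypothetical): drop one trailing parenthetical,
# e.g. "Portland (city)" -> "Portland". Not Freebase's real code.
_TRAILING_PAREN = re.compile(r"\s*\([^()]*\)\s*$")


def strip_parenthetical(title: str) -> str:
    """Remove a trailing parenthetical disambiguator from a title."""
    return _TRAILING_PAREN.sub("", title)


def cdp_aware_name(title: str) -> str:
    """Hypothetical fix: turn "<foo> (CDP)" into "<foo> CDP" instead of dropping it."""
    m = re.fullmatch(r"(.*\S)\s*\(CDP\)", title)
    if m:
        return f"{m.group(1)} CDP"
    return strip_parenthetical(title)


def find_collisions(titles, namer):
    """Group titles by the name `namer` gives them; keep names shared by 2+ titles."""
    groups = defaultdict(list)
    for t in titles:
        groups[namer(t)].append(t)
    return {name: ts for name, ts in groups.items() if len(ts) > 1}


titles = ["Portland (city)", "Portland (CDP)"]
print(find_collisions(titles, strip_parenthetical))
# {'Portland': ['Portland (city)', 'Portland (CDP)']}
print(find_collisions(titles, cdp_aware_name))
# {}
```

Matching only the literal "(CDP)" suffix keeps the fix narrow: other
disambiguators such as "(city)" are still stripped as before.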

Tom

