[Data-modeling] how much work done on modeling of personal names -- even for surname + given name?

Robert Cook robert at metaweb.com
Wed Mar 4 19:24:04 UTC 2009



On Mar 4, 2009, at 10:40 AM, Tom Morris <tfmorris at gmail.com> wrote:

> On Wed, Mar 4, 2009 at 1:23 PM, Robert Cook <robert at metaweb.com> wrote:
>> We've talked about this in the past but were overwhelmed by the edge cases
>> and didn't see that much value for all of the work.  That may have changed.
>> This was the thinking:
>> We could add a "surname" property to /people/person which would
>> expect /people/family_name
>> We could add another property, "given name(s)" to person which would
>> expect /givennames/given_name
>>    - this probably should be moved to /people/
>>    - if a person has more than one known given name, then they would be
>>      ordered appropriately
>
> You can't decouple given names and surnames and get accurate results.
> All elements of a single name need to be tied together and then the
> aliasing of multiple names layered on top of that.  Otherwise you
> can't correctly model Mary Smith and Mrs. Mary Smith Jones.  She's
> never known as Mary Smith Smith.

It's true that this would not explicitly model the name change, but it
could still be done in a useful way.  The real ambiguity in this case
exists in the real world.  Does Mary Smith Jones consider "Jones" to be
her middle name or part of her composite family name?  This may differ
from person to person.  Worse is the carrying forward of family names
to offspring as given (or family?) names.

I think it would be fairly straightforward to create a model that has
high utility, but nearly impossible to create one that is "right" and
that anybody would actually use.
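
To make the simple version concrete, here is a minimal Python sketch,
not the actual Freebase schema.  The class and field names (Person,
surname, given_names) and the sample people are hypothetical; it only
shows one surname plus an ordered list of given names, with a fallback
to the existing display name when no structured data has been entered.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Person:
    # Hypothetical stand-in for a /people/person topic.
    name: str                          # display name as it exists today
    surname: Optional[str] = None      # would point at a family-name topic
    given_names: List[str] = field(default_factory=list)  # kept in order


def surname_sort_key(person: Person):
    # Fall back to the display name when no surname has been entered yet.
    primary = person.surname if person.surname else person.name
    return (primary.casefold(), [g.casefold() for g in person.given_names])


people = [
    Person("Mary Smith", surname="Smith", given_names=["Mary"]),
    Person("John Paul Jones", surname="Jones", given_names=["John", "Paul"]),
    Person("Madonna"),  # no structured name data yet
]

ordered = [p.name for p in sorted(people, key=surname_sort_key)]
print(ordered)
```

Records without structured data still sort, just less accurately, which
is the "start simply and improve later" trade-off.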

I've seen on Freebase that the quickest way to kill a practical schema
addition is to get mired in too much complexity.  We can start simply,
get data in, and improve the model as needed, refactoring or
denormalizing as we go.

Put another way, an imperfect model with data always trumps a  
"correct" one with no data.

>
>
> You'll also want a place to put honorific prefixes (Dr., Prof., Gen.,
> Rev., etc) and generational suffixes (Jr, Sr, III), particularly if
> you're going to be constructing "full" names out of their component
> pieces.  Nicknames are used in most English speaking cultures.  Other
> cultures have similar things like German ruf names or call names.

These seem orthogonal and could be added if there is data.
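
As a rough sketch of what "orthogonal" could mean here, prefixes and
suffixes can be optional extras that the core name parts do not depend
on.  The function name full_name and its parameters are hypothetical,
and the given-names-then-surname ordering is an assumption that only
fits some cultures.

```python
from typing import List, Optional


def full_name(given_names: List[str], surname: str,
              prefix: Optional[str] = None,
              suffix: Optional[str] = None) -> str:
    # Assumes Western ordering: prefix, given names, surname, suffix.
    parts = []
    if prefix:
        parts.append(prefix)        # e.g. "Dr.", "Prof.", "Rev."
    parts.extend(given_names)       # already in the intended order
    parts.append(surname)
    if suffix:
        parts.append(suffix)        # e.g. "Jr.", "Sr.", "III"
    return " ".join(parts)


print(full_name(["Mary"], "Smith"))
print(full_name(["John"], "Smith", prefix="Dr.", suffix="Jr."))
```

Leaving both optional means existing data stays valid when they are
added later.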
>
>
>> This raises a bigger question -- what other things would people like to do
>> with it?  See distributions of given names over time?  Or geographic
>> distribution of family names?
>
> This seems like the domain of applications.  Once it's possible to get
> the data, people can do whatever they want with it, but to start, with
> just being able to sort, like Raymond wants to do, or provide accurate
> data for that government form with the Surname field would be a big
> step forward.
>
> Tom

