[Data-modeling] how much work done on modeling of personal names -- even for surname + given name?

Tom Morris tfmorris at gmail.com
Sat Mar 7 00:21:53 UTC 2009


Responding to Scott and Glenn below ...

On Thu, Mar 5, 2009 at 3:09 PM, Scott Meyer <sm at metaweb.com> wrote:
> Tom Morris wrote:
>
>> I agree that modeling in the abstract is a fool's game, but I'm a
>> little confused by the assertion that there's no structured name data
>> available.  I run into it at every turn.
>
> Right, and it is irksome to give up that structure to load things
> into Freebase.  However, how useful would it be to have a mixture
> of more and less structured data?

I don't think you escape the migration problem unless you decide never
to fix the current situation.

> If we created the "obvious" CVT with first_name and last_name and
> used that (in addition to the current name) in cases where we have
> structured data, how valuable would that be?

Presumably you meant given_names and family_name. :-)  I think it
would be more useful than what exists today, but less useful than what
Ed is working on sketching out.
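
To make the "obvious" CVT concrete, here is a minimal sketch in Python. It is not Freebase schema: the class names StructuredName and Person and the field structured_name are hypothetical, and only given_names and family_name come from the discussion above. The structured record is optional and sits next to the existing flat name.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class StructuredName:
    """Hypothetical compound value holding the parts of a personal name."""
    given_names: Tuple[str, ...]  # one or more given names, in order
    family_name: str


@dataclass(frozen=True)
class Person:
    """Hypothetical person record: the flat name is always present,
    the structured name only where source data supplied it."""
    name: str
    structured_name: Optional[StructuredName] = None


def has_structured_name(person: Person) -> bool:
    # True only for records that were loaded with structured name data.
    return person.structured_name is not None


ada = Person("Ada Lovelace", StructuredName(("Ada",), "Lovelace"))
alan = Person("Alan Turing")  # loaded from a source with no name structure
```

Because the structured part is optional, every consumer has to decide what to do when it is absent.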

> Would you constrain query results to people having structured name
> data?  If not, then you're stuck trying to figure out a sort order;
> if so, you may miss results.

I hate to say it, but I suspect applications are going to need some
help from the underlying layers to make this refactoring work.  Having
said that, once the machinery is figured out it will be useful in
other circumstances.
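
A small sketch of the dilemma Scott describes, assuming hypothetical records stored as plain dicts with an optional family_name and given_names next to the flat name. Filtering to structured records drops people; falling back to the flat name keeps everyone but mixes two sort orders.

```python
def sort_key(person):
    """Sort by family name where known, else fall back to the flat name.

    The dict keys 'name', 'family_name' and 'given_names' are
    illustrative assumptions, not an existing schema.
    """
    family = person.get("family_name")
    if family:
        given = " ".join(person.get("given_names", []))
        return (family.casefold(), given.casefold())
    # No structure available: the flat name is all we have to sort on.
    return (person["name"].casefold(), "")


people = [
    {"name": "Ada Lovelace", "given_names": ["Ada"], "family_name": "Lovelace"},
    {"name": "Alan Turing"},  # unstructured record
    {"name": "Grace Hopper", "given_names": ["Grace"], "family_name": "Hopper"},
]

# Option 1: constrain to structured names (consistent order, missing results).
structured_only = sorted(
    (p for p in people if p.get("family_name")), key=sort_key
)

# Option 2: keep everyone (complete results, inconsistent order).
with_fallback = sorted(people, key=sort_key)
```

With the fallback, the unstructured record sorts under "A" for "Alan" while the others sort by family name, which is the kind of inconsistency an application cannot easily fix on its own.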

On Thu, Mar 5, 2009 at 3:45 PM, glenn mcdonald
<gmcdonald at itasoftware.com> wrote:

>> Tom: The disadvantage is that it enforces a single global collating
>> sequence.
>
> Well, it doesn't "enforce" anything. It provides one more sorting
> option than you have now, so you at least have straight lexical sort
> (the current name) and the most culturally appropriate name sort (the
> new sortname). And by definition you can populate this new field with
> *something* for all people, and then fix the ones that need fixing.

OK, let me rephrase that.  It only *allows* a single collating
sequence.  The culturally appropriate collating sequence isn't known
until application run time.  It's dependent on the culture of the
application & viewer, not the culture associated with the name or the
person who holds the name.
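
To illustrate why one stored sortname cannot serve every viewer, here is a deliberately simplified sketch. The two key functions are hand-rolled approximations that I am assuming for illustration (German dictionary order treats "ä" like "a"; the Swedish alphabet places "å", "ä", "ö" after "z"). A real application would more likely use a proper collation library than these toy rules.

```python
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"


def german_key(name):
    """Approximate German dictionary order: umlauts sort as base vowels."""
    folded = name.casefold()
    for umlaut, base in (("ä", "a"), ("ö", "o"), ("ü", "u")):
        folded = folded.replace(umlaut, base)
    return folded


def swedish_key(name):
    """Approximate Swedish order: å, ä, ö come after z.

    Characters outside the alphabet get index -1 and sort first.
    """
    return [SWEDISH_ALPHABET.find(ch) for ch in name.casefold()]


# Hypothetical registry: the collation is picked from the viewer's
# locale at run time, not from anything stored with the name.
COLLATORS = {"de": german_key, "sv": swedish_key}


def sort_for_viewer(names, viewer_locale):
    return sorted(names, key=COLLATORS[viewer_locale])


names = ["Zetterberg", "Ärlig", "Andersson"]
german_view = sort_for_viewer(names, "de")
swedish_view = sort_for_viewer(names, "sv")
```

The same three names come out in different orders for the two viewers, so a single precomputed sort field can match at most one of them.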

Tom

