[Data-modeling] how much work done on modeling of personal names -- even for surname + given name?

Scott Meyer sm at metaweb.com
Sat Mar 7 01:01:30 UTC 2009


Tom Morris wrote:
> Responding to Scott and Glenn below ...
> 
> On Thu, Mar 5, 2009 at 3:09 PM, Scott Meyer <sm at metaweb.com> wrote:
>> Tom Morris wrote:
>>
>>> I agree that modeling in the abstract is a fool's game, but I'm a
>>> little confused by the assertion that there's no structured name data
>>> available.  I run into it at every turn
>> Right, and it is irksome to give up that structure to load things
>> into Freebase, however, how useful would it be to have a mixture
>> of more and less structured data?
> 
> I don't think you escape the migration problem unless you decide never
> to fix the current situation.

Ah, by "fix the current situation" you mean "create a new structured
name CVT and _remove_ name?"

>> If we created the "obvious" CVT with first_name and last_name and
>> used that (in addition to the current name) in cases where we have
>> structured data, how valuable would that be?
> 
> Presumably you meant given_names and family_name. :-)  I think it
> would be more useful than what exists today, but less useful than what
> Ed is working on sketching out.
> 
>> Would you constrain query results to people having structured name
>> data?  If not, then you're stuck trying to figure out a sort order,
>> if so, you may miss results.
> 
> I hate to say it, but I suspect applications are going to need some
> help from the underlying layers to make this refactoring work.  Having
> said that, once the machinery is figured out it will be useful in
> other circumstances.

You're aiming at some sort of name pseudo-property which would
be created out of raw materials stored in a hypothetical name
CVT?
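
To make the question concrete, here is a minimal sketch of such a
derived name, assuming a hypothetical CVT with given_names and
family_name fields. StructuredName, Person, display_name and
sort_key are illustrative names only, not existing Freebase types
or APIs. It also shows the mixed-data case: topics without
structured data fall back to the plain name, both for display and
for sorting.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class StructuredName:
    """Hypothetical name CVT; field names are assumptions for illustration."""
    given_names: str
    family_name: str


@dataclass
class Person:
    name: str                                    # existing unstructured name
    structured: Optional[StructuredName] = None  # present only for some topics


def display_name(p: Person, family_first: bool = False) -> str:
    """Derive a display string, falling back to the plain name."""
    if p.structured is None:
        return p.name
    s = p.structured
    if family_first:
        parts = [s.family_name, s.given_names]
    else:
        parts = [s.given_names, s.family_name]
    return " ".join(parts)


def sort_key(p: Person) -> Tuple[str, str]:
    """Sort by family name when known, otherwise by the raw name string."""
    if p.structured is None:
        return (p.name.casefold(), "")
    s = p.structured
    return (s.family_name.casefold(), s.given_names.casefold())


people = [
    Person("Ada Lovelace", StructuredName("Ada", "Lovelace")),
    Person("Cher"),  # no structured data available
]
ordered = sorted(people, key=sort_key)
```

The fallback keeps every topic in the result set, at the cost of
a sort order that mixes family-name keys with raw-name keys.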

I think that the difficulty with this sort of thing is not the
underlying string concatenation; it is the correctly localized
UI code that guides someone through the entry of a structured
name, analogous to what we do with dates.  The date stuff
is tractable because the date format is well defined and has
a very limited set of terminals (numbers from 1 - 31, 1 - 12,
month names, etc.) and a limited set of localizations.  Names
are much worse on every score.
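
As a rough illustration of how small the date vocabulary is, the
sketch below checks date parts using only Python's standard
calendar module. valid_date_parts and MONTH_NAMES are hypothetical
names chosen for this example; nothing here reflects how Freebase
handles dates.

```python
import calendar


def valid_date_parts(year: int, month: int, day: int) -> bool:
    """Check month and day against their small, fixed ranges."""
    if not 1 <= month <= 12:
        return False
    # monthrange returns (weekday of first day, number of days in month)
    days_in_month = calendar.monthrange(year, month)[1]
    return 1 <= day <= days_in_month


# The complete set of month-name terminals for the current locale.
# calendar.month_name has an empty entry at index 0, so skip it.
MONTH_NAMES = list(calendar.month_name)[1:]
```

There is no comparably short check for names: the set of valid
terminals is open-ended and the ordering rules vary by culture.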

-Scott

