[Developers] [Data-modeling] Data load issues

Tom Morris tfmorris at gmail.com
Fri May 29 22:25:30 UTC 2009


On Thu, May 28, 2009 at 4:42 PM, Kirrily Robert <kirrily at metaweb.com> wrote:
> On 28/05/2009, at 1:03 PM, Reilly Hayes wrote:
>>
>> We are not relying on the community for this cleanup.  You will not
>> see these in the merge queue.  We did not want to swamp the merge
>> queue with these.
>>
>
> What about if the community discover them on their own and flag them
> for merge, though?  I think that's the situation we're dealing with
> here.

Precisely.  If I'm working on topics about one-design sailboat classes
and go to add Uffa Fox as the designer of a class and find two Uffa
Foxes, my first reaction, based on the uniqueness of the name, is that
they've got to be duplicates.  When I look at the two topics to try
and confirm my suspicion, I find one (apparently) completely blank.
No little note saying "work in progress - don't touch," no link to
OpenLibrary, no birth/death dates (such as WorldCat might typically
have) to help disambiguate, no real indication as to how to
proceed.

Given 1.3+ million, mostly notable, people and 0.5 million authors,
there's bound to be a huge overlap since "notable" people are pretty
likely to be authors and vice versa.  However, a lot of folks who are
principally known as naval architects or scientists or engineers
aren't necessarily going to be typed as Author (and might not even be
typed as Person), making automated disambiguation almost impossible
without going back to do some analysis of the Wikipedia article.
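
To illustrate why a blank topic defeats automated matching, here is a
minimal Python sketch.  Everything in it is an assumption made for
illustration: the record layout, the field names ("id", "name",
"types", "birth", "death"), the ids, and the sample values are
hypothetical and are not the actual Freebase or OpenLibrary schema.
It groups topics by normalized name and reports whether each same-name
pair can be decided from dates alone.

```python
from collections import defaultdict


def normalize(name):
    # Lowercase and collapse runs of whitespace so trivial variants group together.
    return " ".join(name.lower().split())


def classify_pair(a, b):
    """Classify two same-name topics using only birth/death dates.

    Returns 'match' if every date present on both sides agrees,
    'conflict' if any shared date differs, and 'undecidable' if the
    two topics have no date field in common.
    """
    shared = [k for k in ("birth", "death") if a.get(k) and b.get(k)]
    if not shared:
        return "undecidable"
    if any(a[k] != b[k] for k in shared):
        return "conflict"
    return "match"


def candidate_duplicates(topics):
    # Bucket topics by normalized name, then compare every pair in a bucket.
    by_name = defaultdict(list)
    for topic in topics:
        by_name[normalize(topic["name"])].append(topic)

    results = []
    for group in by_name.values():
        for i in range(len(group)):
            for j in range(i + 1, len(group)):
                verdict = classify_pair(group[i], group[j])
                results.append((group[i]["id"], group[j]["id"], verdict))
    return results


# Hypothetical records: one populated topic and one blank, newly loaded author.
topics = [
    {"id": "t1", "name": "Uffa Fox", "types": ["Person"], "birth": 1898, "death": 1972},
    {"id": "t2", "name": "Uffa  Fox", "types": ["Author"]},
]

pairs = candidate_duplicates(topics)
print(pairs)
```

With no dates on the blank topic, the pair comes back as "undecidable".
A human looking at the two topics is in the same position, and the
type information does not help either, since one record is typed only
as Person and the other only as Author.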

To get people to treat these newly loaded OpenLibrary author topics
differently today, they need to be identifiable and people need to be
told what to do differently with them.

Tom

