[Data-modeling] Languages and Rosetta

Tom Morris tfmorris at gmail.com
Tue May 5 05:39:37 UTC 2009


Thanks for the quick response guys, and congratulations on the
successful IPO -- oh wait, different Rosetta. :-)

On Mon, May 4, 2009 at 8:37 PM, Kurt Bollacker <kurt at spaceship.com> wrote:
>
>> What's the deal with the Rosetta base and bot?  Is this intended to
>> replace the current Language commons?
>
> No.  It is intended to co-exist and supplement the /language domain.

Co-exist in terms of schema/types or topics or both?  It's a very
visible subject (obviously), so I think we need to make it easy for
users to make correct choices.  The fact that Languoid is the primary
type for your schema will obviously help keep civilians from picking
it, but things aren't as clear cut for topics.

>> I've flagged a few things for merger, but now I've come across
>> something that I can't even flag
>> http://www.freebase.com/view/base/rosetta/group/Chinese (because of
>> permissions perhaps?)
>
> oooh..  You picked a good one.  Chinese is one of the few linguistic
> entities that the linguists consider to be a "macrolanguage" or
> "language group" rather than a specific language. The Freebase topic
> for Chinese:
>
> http://www.freebase.com/view/en/chinese_language
>
> includes both the "language" and "group" properties in an intuitive
> (but non-rigorous to linguists) way.

For the record, I was trying to merge
  http://www.freebase.com/view/base/rosetta/group/Chinese
  http://www.freebase.com/view/en/chinese_language

as a Language Family.

> A good indicator of this is the
> fact that this topic is typed both "Human Language" and "Language
> Family", which is taxonomically impossible.  However, for the real
> world, this entropy is OK.

Actually, from my reading of the Wikipedia article, upon which the
Freebase typing is based, it's pretty clear that it should be a
Language Family, not a Human Language.  I think that before I started, it
was only typed as a Human Language, not a Language Family, but from
what I've seen in my travels through Freebase, I think you'll find
that almost everything has been typed Human Language - dialects,
languages, families, you name it.  Given the lack of guidance and the
amateurs doing the typing, that's not too surprising, but I don't
think it's an irrecoverable situation.  As long as the linguists'
definitions aren't too completely wacky, I don't see why users
couldn't be convinced to use them (particularly if the type hierarchy
is already completely populated).

> The taxonomy of languoids in the Rosetta Base has been created by
> linguists, and is purposefully trying to stay out of the way of common
> usage of languages when there is potential confusion.

Unfortunately, that's really hard to do because of the way the
autocomplete works.  Unless it's clear from the description that a
topic or type is from someplace other than the commons, there's no way
for a user to tell.

> I'd leave it alone until we are sure that the merge-meisters at
> Metaweb have done their stuff with the 600 pending requests.

OK, I'll put things on the back burner until the merge happens.  I
haven't seen anything except the ones that I flagged show up in the
Freebase public queue, so I presume these are all off in some magic
internal-only Metaweb queue currently.

Tom

