[Data-modeling] Languages and Rosetta
Kurt Bollacker
kurt at spaceship.com
Tue May 5 00:37:28 UTC 2009
I'll fess up. Rosetta bot is me.
On Mon, May 04, 2009 at 06:37:21PM -0400, Tom Morris wrote:
> What's the deal with the Rosetta base and bot? Is this intended to
> replace the current Language commons?
No. It is intended to co-exist and supplement the /language domain.
> They/it/whatever appear to have taken a more rigorous and regular
> approach than Freebase's existing combination of Wikipedia article
> plus typing by "I always wanted to play a linguist on the Web" users
> (like me), which is good, but there also appears to be an awful lot of
> duplication, which is bad.
I've handed over 600 hand-vetted mergeable language pairs to the
folks in charge of such matters, but I do not know if all those merges
have happened yet. Also, I have about 300-400 additional potential
mergers that have not yet been looked at.
> I've flagged a few things for merger, but now I've come across
> something that I can't even flag
> http://www.freebase.com/view/base/rosetta/group/Chinese (because of
> permissions perhaps?)
oooh.. You picked a good one. Chinese is one of the few linguistic
entities that the linguists consider to be a "macrolanguage" or
"language group" rather than a specific language. The Freebase topic
for Chinese:
http://www.freebase.com/view/en/chinese_language
includes both the "language" and "group" properties in an intuitive
(but non-rigorous to linguists) way. A good indicator of this is the
fact that this topic is typed both "Human Language" and "Language
Family", which is taxonomically impossible. However, for the real
world, this entropy is OK.
The taxonomy of languoids in the Rosetta Base has been created by
linguists, and is purposefully trying to stay out of the way of common
usage of languages when there is potential confusion. As it matures,
perhaps some of /base/rosetta will be promoted to /language as is
desirable. On the other hand, even now the /base/rosetta/languoid
type provides a taxonomy that is interesting to browse.
> Should I just leave this stuff for some type of global
> reconciliation/cleanup later? Which types/topics/schemas should we
> be using in modeling stuff that needs languages?
I'd leave it alone until we are sure that the merge-meisters at
Metaweb have done their stuff with the 600 pending requests.
Kurt :-)