[Data-modeling] Languages and Rosetta
Laura Welcher
laura at longnow.org
Tue May 5 01:21:12 UTC 2009
To chime in, the "macrolanguage" codes in ISO 639-3 do indeed refer to
groups of languages, rather than individual languages -- your example of
Chinese is probably the best known case of this. There are 30 macrolanguage
codes in the set (for a table of them see
http://www.sil.org/iso639-3/macrolanguages.asp). They are included in the
ISO set for historical reference (they were entities in previous codesets)
and for utility.
SIL, the organization that maintains the codeset, explains macrolanguage
this way:
"In various parts of the world, there are clusters of closely-related
language varieties that, based on the criteria discussed above, can be
considered distinct individual languages, yet in certain usage contexts a
single language identity for all is needed. Typical situations in which this
need can occur include the following:
- There is one variety that is more developed and that tends to be used
for wider communication by speakers of various closely-related languages; as
a result, there is a perceived common linguistic identity across these
languages. For instance, there are several distinct spoken Arabic languages,
but Standard Arabic is generally used in business and media across all of
these communities, and is also an important aspect of a shared
ethno-religious unity. As a result, a perceived common linguistic identity
exists.
- There is a common written form used for multiple closely-related
languages. For instance, multiple Chinese languages share a common written
form.
- There is a transitional socio-linguistic situation in which
sub-communities of a single language community are diverging, creating a
need for some purposes to recognize distinct languages while, for other
purposes, a single common identity is still valid. For instance, in some
contexts it is necessary to make a distinction between Bosnian, Croatian and
Serbian languages, yet there are other contexts in which these distinctions
are not discernible in language resources that are in use."
(see http://www.sil.org/iso639-3/scope.asp#M)
In practice, since macrolanguages are always groupings of closely related
languages, the Rosetta base will have a matching group entity (so, for
example, http://www.freebase.com/view/base/rosetta/group/Chinese ).
However, my sense is that the Freebase entity Chinese is actually
semantically quite different from the Rosetta group Chinese, which refers to
a particular taxonomic level as well as an entity with a particular kind of
linguistic meaning.
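To make the macrolanguage idea concrete in data-modeling terms, here is a
minimal Python sketch. The table name MACROLANGUAGES and the helper
functions are hypothetical (they are not part of Freebase or the Rosetta
base), and only a handful of well-known ISO 639-3 codes are listed, so the
member sets are deliberately incomplete.

```python
# Hypothetical lookup table: macrolanguage code -> some member language codes.
# Only a few well-known ISO 639-3 codes are shown; the real sets are larger.
MACROLANGUAGES = {
    "zho": {"cmn", "yue", "wuu"},  # Chinese: Mandarin, Yue (Cantonese), Wu
    "ara": {"arb", "arz"},         # Arabic: Standard Arabic, Egyptian Arabic
    "hbs": {"bos", "hrv", "srp"},  # Serbo-Croatian: Bosnian, Croatian, Serbian
}


def is_macrolanguage(code):
    """Return True if the code names a group rather than one language."""
    return code in MACROLANGUAGES


def macrolanguage_of(code):
    """Return the macrolanguage an individual language belongs to, or None."""
    for macro, members in MACROLANGUAGES.items():
        if code in members:
            return macro
    return None


def same_identity(a, b):
    """Return True if two codes share a single common language identity."""
    def resolve(code):
        if is_macrolanguage(code):
            return code
        return macrolanguage_of(code) or code
    return resolve(a) == resolve(b)
```

With a table like this, Bosnian and Serbian can be treated as distinct
languages in one context (different codes) and as one identity in another
(same_identity("bos", "srp") is true), which is the dual usage the SIL text
describes.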
On Mon, May 4, 2009 at 5:37 PM, Kurt Bollacker <kurt at spaceship.com> wrote:
>
>
> > I've flagged a few things for merger, but now I've come across
> > something that I can't even flag
> > http://www.freebase.com/view/base/rosetta/group/Chinese (because of
> > permissions perhaps?)
>
> oooh.. You picked a good one. Chinese is one of the few linguistic
> entities that the linguists consider to be a "macrolanguage" or
> "language group" rather than a specific language. The Freebase topic
> for Chinese:
>
> http://www.freebase.com/view/en/chinese_language
>
> includes both the "language" and "group" properties in an intuitive
> (but non-rigorous to linguists) way. A good indicator of this is the
> fact that this topic is typed both "Human Language" and "Language
> Family", which is taxonomically impossible. However, for the real
> world, this entropy is OK.
>
>
--
Laura Welcher, Ph.D.
The Long Now Foundation
Director of Development and The Rosetta Project
laura at longnow.org
ph.415.561.6582
fx.415.561.6297
http://www.longnow.org
http://www.rosettaproject.org