[Developers] MQL bug WRT case sensitivity

Scott Meyer sm at metaweb.com
Tue Mar 31 22:32:42 UTC 2009


Kurt Bollacker wrote:

>> All the IPA characters seem to have unique codes.
> 
> Google for "ASCII IPA" (e.g. find SAMPA and Kirshenbaum). It's been
> practice for some to use pure ASCII to encode IPA, which treats upper
> and lower case as different.  The problem I have here is that I
> seem to have an unholy mix of IPA representations. Don't blame me, the
> linguists made me do it!

And the reason you can't translate automatically from ASCII IPA into
Unicode IPA is that the ASCII IPA is mixed with plain English?
 
>> Assuming that I'm right about the character encoding issues
>> I'm not sure how to represent IPA in Freebase.  /lang/ipa
>> seems wrong since it would make the obvious "pronunciation"
>> property awkward to work with.  Under the current MQL,
>> you'd have to ask for it with an explicit "lang" : "/lang/ipa"
>> constraint.  Otoh, storing it as some other language, e.g.
>> "/lang/en", seems just as bad, although technically correct if
>> you're describing the (er, an?) English pronunciation of an
>> English word.
> 
> Unfortunately, I have many "Alternate Names" for Languages that are
> mostly (English) words (in ASCII) mixed in with representations of IPA
> for the rest.  The most manageable way I know to handle this is to
> make them all aliases in English.

...and there are too many English words which are also valid ASCII IPA?
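
To make the question concrete, here is a minimal sketch, assuming the
ASCII IPA in question is X-SAMPA. The table is a small hand-picked
subset, and the function name is hypothetical (nothing here is part of
Freebase or MQL). It shows two things: case carries meaning in ASCII
IPA, and a plain English word fed through the same mapping gets mangled.

```python
# Partial X-SAMPA -> Unicode IPA table (illustrative subset only).
XSAMPA_TO_IPA = {
    "S": "\u0283",  # esh, as in "ship"
    "T": "\u03b8",  # theta, as in "thin"
    "D": "\u00f0",  # eth, as in "this"
    "N": "\u014b",  # eng, as in "sing"
    "Z": "\u0292",  # ezh, as in "vision"
    "I": "\u026a",  # small capital I, as in "bit"
    "@": "\u0259",  # schwa
}


def xsampa_to_ipa(text):
    """Map each character through the table; unknown characters pass through."""
    return "".join(XSAMPA_TO_IPA.get(ch, ch) for ch in text)


# Case matters: "SIp" and "sip" are different transcriptions.
ship = xsampa_to_ipa("SIp")
sip = xsampa_to_ipa("sip")

# Case folding before conversion collapses the two.
folded = xsampa_to_ipa("SIp".lower())

# An ordinary English name is also "valid" input, and comes out wrong.
mangled = xsampa_to_ipa("Thai")
```

Without some out-of-band marker saying which strings are IPA and which
are English, a converter like this has no way to tell when to leave the
input alone.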

-Scott


