[Developers] MQL bug WRT case sensitivity

Kurt Bollacker kurt at spaceship.com
Tue Mar 31 01:24:28 UTC 2009


On Mon, Mar 30, 2009 at 04:57:58PM -0700, Scott Meyer wrote:
> Kurt Bollacker wrote:
> > On Mon, Mar 30, 2009 at 03:57:54PM -0700, Warren Harris wrote:
> >> On Mar 30, 2009, at 4:02 PM, Kurt Bollacker wrote:
> >>
> >>> Are there other good choices?
> >> I see that for the IPA example, you're storing this stuff under the  
> >> "alias" property. Is there any way you could store it as a new  
> >> property of type rawstring? That would get around the case- 
> >> insensitivity issue. 
> > 
> > I could, but then the usability in the client would be reduced, and
> > I'd be splitting aliases between /common/topic/alias and some new
> > property.  I think a "Right Model(tm)" would be to have the aliases be
> > in their own language (e.g. "/lang/ipa"), which would specifically be
> > case sensitive.
> 
> I'm a bit unclear on the need for case insensitivity in IPA.  Your
> original failure/example was a problem with the x-like characters
> known as "uvular fricative" and "velar fricative" (χ, 0x03C7 and
> x, 0x0263, if your browser supports unicode).

My specific example was an upper vs. a lower case ASCII character in a
string that happened to contain unicode.   
 
> I don't see any mention of ascii/latin x either upper or lower case
> in the following
> 
> http://www.phon.ucl.ac.uk/home/wells/ipa-unicode.htm
> http://www.linguistlist.org/unicode/ipa.html
> 
> All the IPA characters seem to have unique codes.

Google for "ASCII IPA" (e.g. find SAMPA and Kirshenbaum).  It's been
practice for some to use pure ASCII to encode IPA, which treats upper
and lower case as different.  The problem I have here is that I
seem to have an unholy mix of IPA representations. Don't blame me, the
linguists made me do it!
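
To illustrate why case matters in ASCII encodings of IPA, here is a
minimal sketch.  It assumes the X-SAMPA convention for a handful of
symbols; the table and the helper names are made up for illustration
and are not the actual alias data.  It shows that a case-insensitive
comparison merges symbols that stand for different sounds.

```python
# A few X-SAMPA symbols that differ only by ASCII case (illustrative
# subset, assuming the X-SAMPA convention; not the real alias data).
XSAMPA_TO_IPA = {
    "x": "\u0078",  # x, voiceless velar fricative
    "X": "\u03c7",  # chi, voiceless uvular fricative
    "s": "\u0073",  # s, voiceless alveolar fricative
    "S": "\u0283",  # esh, voiceless postalveolar fricative
    "g": "\u0261",  # script g, voiced velar plosive
    "G": "\u0263",  # gamma, voiced velar fricative
}


def case_insensitive_key(text):
    """Hypothetical stand-in for a case-insensitive string match."""
    return text.casefold()


def collisions(symbols):
    """Group symbols that become identical once case is ignored."""
    groups = {}
    for symbol in symbols:
        groups.setdefault(case_insensitive_key(symbol), []).append(symbol)
    return {key: group for key, group in groups.items() if len(group) > 1}


clashes = collisions(XSAMPA_TO_IPA)
for key, group in clashes.items():
    sounds = [XSAMPA_TO_IPA[symbol] for symbol in group]
    print(key, "<-", group, "which encode", sounds)
```

Every pair in this table collapses to a single key, so a store that
matches aliases without regard to case cannot tell them apart.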
 
> Assuming that I'm right about the character encoding issues
> I'm not sure how to represent IPA in Freebase.  /lang/ipa
> seems wrong since it would make the obvious "pronunciation"
> property awkward to work with.  Under the current MQL,
> you'd have to ask for it with an explicit "lang" : "/lang/ipa"
> constraint.  Otoh, storing it as some other language, fx.
> "/lang/en", seems just as bad, although technically correct if
> you're describing the (er an?) English pronunciation of an
> English word.

Unfortunately, I have many "Alternate Names" for Languages that are
mostly (English) words (in ASCII) mixed in with representations of IPA
for the rest.  The most manageable way I know to handle this is to
make them all aliases in English.
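
For reference, the explicit "lang" constraint mentioned in the quoted
text could look roughly like the sketch below.  It only builds the
query as JSON and does not contact any service.  The topic id is a
placeholder, "/lang/ipa" is the hypothetical language discussed in
this thread, and the exact query shape is an assumption about how MQL
handles text values.

```python
import json


def alias_query(topic_id, lang):
    """Build an MQL-style read query for a topic's aliases in one language.

    Assumption: text values are addressed by "value" and "lang" keys,
    and a null value asks for the field to be filled in.
    """
    return [{
        "id": topic_id,
        "/common/topic/alias": [{"value": None, "lang": lang}],
    }]


# "/en/example_language" is a placeholder id; "/lang/ipa" does not exist
# as far as this thread says, it is only the proposed language.
query = alias_query("/en/example_language", "/lang/ipa")
print(json.dumps(query, indent=2))
```

Swapping the second argument for "/lang/en" gives the query for the
all-aliases-in-English approach described above.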
 
> Consider the case of a word which is spelled the same
> and means the same in two different languages but has
> different pronunciations.  Pretty typical when one
> language borrows a word from another.

Don't even get started.  This little project of mine could be a
rathole factory if I let it.
 

							Kurt :-)


