[Developers] MQL bug WRT case sensitivity

Tom Morris tfmorris at gmail.com
Fri Mar 27 11:20:57 UTC 2009


On Fri, Mar 27, 2009 at 7:04 AM, Philip Kendall
<philip-freebase at shadowmagic.org.uk> wrote:
> On Fri, Mar 27, 2009 at 06:58:21AM -0400, Tom Morris wrote:
>> If you're working with multinational characters, you should also be
>> aware of this bug
>> https://bugs.freebase.com/browse/GD-570
>
> which is (at least at present) a super secret Metaweb-only bug as well.
> I'm guessing you were the reporter so you can see it.

Oops!  Sorry about that.  Here's a transcript of the one that I can
see.  Not sure what the other says...
------------
Description
[This is miscategorized, but I can't figure out where the heck the MQL
category is]

This query produces no results:

[ { "alias" : "Aníbal ACEVEDO-VILÁ",
    "type" : "/common/topic"}]

but this one

[{ "alias" : "Aníbal ACEVEDO-VILá",
    "type" : "/common/topic" }]

returns the expected topic /en/anibal_acevedo_vila

David Stafford - 18/Mar/09 04:45 PM
What you're asking for is unicode support in graphd. This is tracked
by GD-106 and the fix for GD-106 should cover your case.

Tom Morris - 18/Mar/09 05:10 PM
I get a big red Permission Violation when I try to access that. Can
someone give me permission to see it?

What character set(s) is/are supported today?

David Stafford - 18/Mar/09 06:40 PM
I added you as a watcher to GD-106 which should give you access to the
bug. The plan is to ignore diacriticals when matching characters. Thus
a, A, Á, ä, ã, á, and so forth will be considered equivalent when
testing for equality. This should provide more reasonable behavior for
Latin-based alphabets. For non-Latin-based alphabets, our behavior
will remain the same. Do you have any specific needs WRT Unicode
behavior?
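
A minimal sketch of the kind of folding described in this comment, assuming
it can be approximated by Unicode decomposition (NFD), dropping combining
marks, and case folding. The helper name fold_base is hypothetical, and this
is an illustration only, not how graphd implements or plans to implement it.

```python
import unicodedata


def fold_base(text: str) -> str:
    """Hypothetical helper: reduce text to an unaccented, case-folded form."""
    # Split each accented character into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks (accents, diaereses, tildes, ...).
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Fold case so that "A" and "a" compare equal.
    return stripped.casefold()


# The variants listed in the comment all collapse to one key.
variants = ["a", "A", "Á", "ä", "ã", "á"]
keys = {fold_base(v) for v in variants}

# The two aliases from the bug report compare equal after folding.
same_alias = fold_base("Aníbal ACEVEDO-VILÁ") == fold_base("Aníbal ACEVEDO-VILá")
```

Under this approach the match is deliberately lossy: any two strings that
differ only in case or in diacritics become indistinguishable.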

Tom Morris - 19/Mar/09 12:43 PM
Still can't see that bug. I suspect that some users might want to
control whether characters get folded to their lower case form,
retaining the diacritics, versus being folded all the way back to the
base unaccented character, but the big hammer approach would be a good
start.

The discussion of 'strength' at http://unicode.org/reports/tr10/ might
help provide some insight into the requirements.
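
To make the two levels of folding concrete, here is a rough sketch in Python.
It assumes the choice can be modelled as a single flag; the names fold_case,
fold_case_and_marks, and same_text are hypothetical. In the terms of the
'strength' discussion in TR10, ignoring case while keeping accents roughly
corresponds to a secondary-strength comparison, and ignoring both to a
primary-strength one, but this sketch does not implement the collation
algorithm itself.

```python
import unicodedata


def fold_case(text: str) -> str:
    """Fold case only; diacritics are retained."""
    return unicodedata.normalize("NFC", text.casefold())


def fold_case_and_marks(text: str) -> str:
    """Fold case and strip diacritics, back to the base unaccented letters."""
    decomposed = unicodedata.normalize("NFD", text.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def same_text(a: str, b: str, ignore_diacritics: bool) -> bool:
    """Hypothetical comparison with a caller-selected folding level."""
    fold = fold_case_and_marks if ignore_diacritics else fold_case
    return fold(a) == fold(b)


# Differ only in case: equal at both levels.
case_only = same_text("ACEVEDO-VILÁ", "acevedo-vilá", ignore_diacritics=False)
# Differ in the accent as well: equal only when diacritics are ignored.
accent_strict = same_text("ACEVEDO-VILÁ", "acevedo-vila", ignore_diacritics=False)
accent_loose = same_text("ACEVEDO-VILÁ", "acevedo-vila", ignore_diacritics=True)
```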

