[Data-modeling] Writing UTF-8 in MQL strings
Kurt Bollacker
kurt at spaceship.com
Fri Mar 27 18:22:50 UTC 2009
I've been writing language data to the sandbox, and I'm trying to create
names for languages that use UTF-8 characters. Consider the "More"
language at:
http://sandbox.freebase.com/view/en/more_language
The accented version of the English /type/object/name looks wrong
here, but doing the MQL query:
{"id":"/en/more_language","name":{"value":null,"lang":"/lang/en"}}
gives me the result (JSON encoded to keep it all ASCII here):
{"id": "/en/more_language",
"name": {"lang": "/lang/en",
"value": "M\u00c3\u00b2or\u00c3\u00a9 Language"}}
I believe these are the right characters in UTF-8 encoding (at least
according to http://www.utf8-chartable.de/unicode-utf8-table.pl).
The German display name looks right in the explore view, but:
{"id":"/en/more_language","name":{"value":null,"lang":"/lang/de"}}
gives me:
{"id": "/en/more_language",
"name": {"lang": "/lang/de",
"value": "M\u00f2or\u00e9"}}
which is the correct encoding for ISO Latin 1, not UTF-8. All of this
makes me believe that I've done a correct MQL write of the name, but
for some reason Firefox 3 on Mac and Linux and Safari 4 (all of which
I've set for UTF-8 encoding in the preferences) display this
incorrectly. A check in Firefox under Tools->Page Info tells me this
page is UTF-8 encoded. I also did all my encoding with cjson and
Python 2.6's json module, with identical results. Also, for these two
queries, the metaweb.py and freebase-api up at Google Code seem to act
identically.
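One way to compare the two results is to decode both JSON values and
look at the code points they contain. A JSON \uXXXX escape names a
Unicode code point rather than a byte, so the sketch below prints the
code points of each name and the UTF-8 bytes of the German one. It is
only an illustration: it assumes Python 3 (not the Python 2.6 used
above), and the describe helper is a hypothetical name introduced here.

```python
import json

# The two "value" strings, copied from the MQL results above as JSON text.
en_raw = '"M\\u00c3\\u00b2or\\u00c3\\u00a9 Language"'
de_raw = '"M\\u00f2or\\u00e9"'

en_name = json.loads(en_raw)
de_name = json.loads(de_raw)


def describe(text):
    """Return (list of code points, UTF-8 bytes as hex) for a string.

    Hypothetical helper, not part of metaweb.py or freebase-api.
    """
    code_points = ["U+%04X" % ord(ch) for ch in text]
    utf8_hex = text.encode("utf-8").hex()
    return code_points, utf8_hex


en_points, en_hex = describe(en_name)
de_points, de_hex = describe(de_name)

# The English name decodes to 16 characters, the German one to 5.
print(len(en_name), en_points)
print(len(de_name), de_points)

# UTF-8 bytes of the German name; c3 b2 and c3 a9 appear here as bytes.
print(de_hex)
```

In this sketch the byte values c3 b2 and c3 a9 show up in the UTF-8
encoding of the German name, while in the English name the same values
appear as separate code points (U+00C3, U+00B2, U+00C3, U+00A9).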
My question is, what went wrong? Did I encode the name incorrectly,
are the browsers just buggy, or is the client doing something wrong?
This question is not academic: there are a large number of
incorrectly encoded names in www.freebase.com (not just sandbox) that
I want to correct, so I need to get this right.
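For the bulk cleanup, a rough sketch of one possible check follows. It
assumes the bad names are UTF-8 bytes that were at some point read as
ISO Latin 1 and stored again, which may not hold for every name. The
function name undo_double_encoding is hypothetical, the code assumes
Python 3, and it does not perform any MQL read or write.

```python
def undo_double_encoding(text):
    """Try to reverse a UTF-8-read-as-Latin-1 round trip.

    Returns the repaired string, or the input unchanged when the
    round trip fails (which suggests the string was not damaged in
    this particular way). Hypothetical helper for illustration only.
    """
    try:
        # Map each character back to the byte it came from, then
        # interpret those bytes as UTF-8.
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


# The English name from the MQL result is changed by the round trip...
print(undo_double_encoding("M\u00c3\u00b2or\u00c3\u00a9 Language"))

# ...while the German name and plain ASCII come back unchanged.
print(undo_double_encoding("M\u00f2or\u00e9"))
print(undo_double_encoding("More"))
```

This will not catch names that were misread through a different code
page (Windows-1252, for example), so each proposed change is worth
reviewing before it is written back.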
Thanks..... Kurt :-)