[Data-modeling] Writing UTF-8 in MQL strings

Kurt Bollacker kurt at spaceship.com
Fri Mar 27 18:22:50 UTC 2009


I've been writing language data to sandbox, and I'm trying to create
names for languages that use UTF-8 characters.  Consider the "More"
language at: 

http://sandbox.freebase.com/view/en/more_language

The accented version of the English /type/object/name looks wrong
here, but doing the MQL query:

{"id":"/en/more_language","name":{"value":null,"lang":"/lang/en"}}

gives me the result (JSON encoded to keep it all ASCII here):

{"id": "/en/more_language", 
 "name": {"lang": "/lang/en", 
          "value": "M\u00c3\u00b2or\u00c3\u00a9 Language"}}

I believe these are the right characters in UTF-8 encoding (at least
according to http://www.utf8-chartable.de/unicode-utf8-table.pl).
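One way to test that belief is to decode the JSON and list what each
escape stands for.  This is a minimal sketch, not a diagnosis: the
variable names are made up for illustration, and it relies only on the
fact that a JSON \uXXXX escape names a Unicode code point rather than a
byte.

```python
import json

# The English result from the query above, exactly as returned.
english_json = r'{"value": "M\u00c3\u00b2or\u00c3\u00a9 Language"}'
value = json.loads(english_json)["value"]

# Code points of the first seven characters (the part before " Language").
code_points = ["U+%04X" % ord(ch) for ch in value[:7]]
print(code_points)

# UTF-8 bytes of a single o-grave (U+00F2), for comparison with the chart.
utf8_bytes = [hex(b) for b in "\u00f2".encode("utf-8")]
print(utf8_bytes)
```

If the code point list contains U+00C3 followed by U+00B2 where a single
U+00F2 was intended, then the stored name holds one character per UTF-8
byte rather than one character per letter.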

The German display name looks right in the explore view, but:

{"id":"/en/more_language","name":{"value":null,"lang":"/lang/de"}}

gives me:

{"id": "/en/more_language", 
 "name": {"lang": "/lang/de", 
          "value": "M\u00f2or\u00e9"}}

which is the correct encoding for ISO Latin 1, not UTF-8. All of this
makes me believe that I've done a correct MQL write of the name, but
for some reason Firefox 3 on Mac and Linux and Safari 4 (all of which
I've set for UTF-8 encoding in the preferences) display this
incorrectly.  A check in Firefox under Tools->Page Info tells me this
page is UTF-8 encoded.  I also did all my encoding with cjson and
Python 2.6's json module, with identical results.  Also, for these two
queries, metaweb.py and the freebase-api library on Google Code seem to
behave identically.
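The two results can also be compared directly.  The sketch below tests
one possible explanation, which is an assumption and not something the
queries confirm: that the English name was written as UTF-8 bytes and
each byte was then stored as its own character (as if read as ISO
Latin 1).  The variable names are illustrative only.

```python
import json

# The two "value" strings returned by the queries above.
english = json.loads(r'"M\u00c3\u00b2or\u00c3\u00a9 Language"')
german = json.loads(r'"M\u00f2or\u00e9"')

# Assumption under test: take the German string, encode it as UTF-8,
# then misread every byte as a separate Latin-1 character.
misread = german.encode("utf-8").decode("latin-1")

# If the assumption holds, the misread German name matches the start
# of the English name character for character.
print(english.startswith(misread))
print(len(german), len(misread))
```

If the first line printed is True, the German value is the name as
intended and the English value is the same name after one extra
byte-to-character step.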

My question is, what went wrong?  Did I encode the name incorrectly,
are the browsers just buggy, or is the client doing something wrong?
This question is not academic: there are a large number of
incorrectly encoded names on www.freebase.com (not just sandbox) that
I want to correct, so I need to get this right.
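For the bulk correction, a guarded repair function might look like the
sketch below.  It assumes the bad names are all of the "one character
per UTF-8 byte" kind, which would need to be verified first, and
fix_double_encoded is a hypothetical helper, not part of metaweb.py or
freebase-api.  It does no MQL writes; it only computes a candidate
replacement string.

```python
def fix_double_encoded(name):
    """Return a repaired copy of name, or name unchanged if the
    byte-per-character assumption does not apply to it."""
    try:
        # Map each character back to one byte, then decode as UTF-8.
        return name.encode("latin-1").decode("utf-8")
    except UnicodeError:
        # Either a character is above U+00FF, or the bytes are not
        # valid UTF-8: treat the name as already correct.
        return name


candidates = ["M\u00c3\u00b2or\u00c3\u00a9 Language", "M\u00f2or\u00e9", "More"]
for old in candidates:
    new = fix_double_encoded(old)
    print(ascii(old), "->", ascii(new))
```

A name that legitimately contains a sequence such as U+00C3 U+00A9 would
also be rewritten, so the candidates are worth reviewing by hand before
anything is written back.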

Thanks.....							Kurt :-)







