[Data-modeling] Writing UTF-8 in MQL strings

Christopher R. Maden crism at maden.org
Sat Mar 28 04:13:46 UTC 2009


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Kurt Bollacker wrote:
> in the MQL Reference Guide, one can see that
> 
>   "The text of a /type/text value must be a string of Unicode
>    characters, encoded using the UTF-8 encoding."
> 
> Thus, MQL writes are supposed to be done with UTF-8 encoded
> characters, *NOT* unicode code points.  If you do that, however, you
> run into the same incorrect string display that I did.  So the problem
> is either:
> 
>  - The documentation is wrong and MQL does not support UTF-8, but
>    rather 16-bit unicode code points directly. 
>  
>       OR
> 
>  - The client is buggy in its display of UTF-8 characters.

Or that you misunderstood...

The *bytes* must be UTF-8.  That does not mean the characters should
spell out the values of those bytes; it means the bytes on the wire
must be in UTF-8.

There is a levels-of-encoding problem here.  To communicate the string
“Mòoré” you can either:

• send those characters as a UTF-8 byte stream (\x4D \xC3 \xB2 \x6F \x72
\xC3 \xA9 on the wire)

• encode them using JSON encoding, becoming the string
“M\u00f2or\u00e9”, and then send those characters as a UTF-8 byte stream
(\x4D \x5C \x75 \x30 \x30 \x66 \x32 \x6F \x72 \x5C \x75 \x30 \x30 \x65
\x39 on the wire)

The byte stream must be UTF-8.  The characters are then interpreted
according to JSON syntax; the quotation marks are stripped, the \uhhhh
encoding is decoded, etc.
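
As a rough illustration of the two options above, here is a minimal
Python sketch.  It uses only the standard json module, not any MQL
client; the variable names are made up for the example.  Both wire
forms include the surrounding quotation marks of the JSON string, and
both decode back to the same five characters.

```python
import json

# The five characters of the example string (M, U+00F2, o, r, U+00E9).
text = "M\u00f2or\u00e9"

# Option 1: keep the characters as-is in the JSON string,
# then serialize the JSON text as UTF-8.
raw_wire = json.dumps(text, ensure_ascii=False).encode("utf-8")

# Option 2: JSON-escape the non-ASCII characters as \uhhhh,
# then serialize the (now pure ASCII) JSON text as UTF-8.
escaped_wire = json.dumps(text, ensure_ascii=True).encode("utf-8")

# The receiver decodes the bytes as UTF-8 first, then applies JSON syntax.
assert json.loads(raw_wire.decode("utf-8")) == text
assert json.loads(escaped_wire.decode("utf-8")) == text
```

The two byte streams differ, but after UTF-8 decoding and JSON
parsing the receiver ends up with the same string either way.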

If you use JSON encoding, the \u notation must give the Unicode code
points, not the UTF-8 byte values.  That is independent of the wire
serialization of the MQL statements.
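
To show what goes wrong when the \u escapes carry UTF-8 byte values
instead of code points, here is a small Python sketch.  It is only a
guess at the kind of mistake that produces a garbled display, not a
diagnosis of any particular client; it uses the standard json module
and hypothetical variable names.

```python
import json

text = "M\u00f2or\u00e9"

# Correct: escape the code points U+00F2 and U+00E9.
correct = '"M\\u00f2or\\u00e9"'

# Mistaken: escape each UTF-8 byte (C3 B2 and C3 A9) as though it
# were a code point of its own.
mistaken = '"' + "".join(
    ch if ord(ch) < 0x80
    else "".join("\\u%04x" % b for b in ch.encode("utf-8"))
    for ch in text
) + '"'

assert json.loads(correct) == text

# Each accented letter turns into two unrelated characters.
garbled = json.loads(mistaken)
assert garbled != text

# Reinterpreting those characters as bytes recovers the original.
assert garbled.encode("latin-1").decode("utf-8") == text
```

The mistaken form parses as valid JSON, so nothing fails; the string
simply comes out as seven characters instead of five.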

~Chris
- --
Chris Maden, text nerd  <URL: http://crism.maden.org/ >
“All I ask of living is to have no chains on me,
 And all I ask of dying is to go naturally.” — Laura Nyro
GnuPG Fingerprint: C6E4 E2A9 C9F8 71AC 9724 CAA3 19F8 6677 0077 C319
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAknNo+4ACgkQGfhmdwB3wxnmHgCg0d+Xe2prJR+UXW6gkVfieB3m
4twAn0sGRbAY/PLRSWTBd+MyibMpihHH
=BYIv
-----END PGP SIGNATURE-----