[Data-modeling] Writing UTF-8 in MQL strings
Christopher R. Maden
crism at maden.org
Fri Mar 27 21:55:41 UTC 2009
Kurt Bollacker wrote:
> I've been writing language data to sandbox, and I'm trying to create
> names for languages that use UTF-8 characters. Consider the "More"
> language at:
No, they use Unicode characters.
UTF-8 is a way of representing those characters as a series of bytes.
In JSON, an escaped character should be written with its Unicode code
point, not as the bytes of its UTF-8 encoding.
The string you are looking for is “Mòoré” or “M\u00f2or\u00e9”. The “u”
in “\u” stands for “Unicode.”
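To make the distinction concrete, here is a minimal Python sketch. It is only an illustration, not anything specific to MQL: it assumes the standard json module, and the "wrong" string is a hypothetical example of what escaping the UTF-8 bytes one by one would produce, not necessarily what was actually sent.

```python
import json

# A JSON string literal whose \u escapes name Unicode code points.
json_text = '"M\\u00f2or\\u00e9"'
name = json.loads(json_text)          # the five characters M, ò, o, r, é

# The code point of each character, in the usual U+XXXX notation.
code_points = [f"U+{ord(ch):04X}" for ch in name]

# UTF-8 is one way of serializing those characters as bytes.
# ò (U+00F2) becomes C3 B2 and é (U+00E9) becomes C3 A9.
utf8_bytes = name.encode("utf-8")

# Hypothetical mistake: writing each UTF-8 byte as if it were a code point.
# The result is a different, seven-character string.
wrong = json.loads('"M\\u00c3\\u00b2or\\u00c3\\u00a9"')

print(name, code_points, utf8_bytes, wrong)
```

The parser reads each \u escape as a character, so the byte-wise form yields four unrelated characters where two were intended.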
It happens that the first 256 code points of Unicode are identical to the
256 characters of ISO Latin 1 (ISO 8859-1), which may have confused you.
The Latin 1 values are nevertheless the correct ones here, because they
are also the Unicode code points.
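The Latin 1 coincidence can be checked with a short Python sketch, assuming Python's built-in "latin-1" codec as a stand-in for ISO 8859-1:

```python
# Each byte value 0..255, decoded as Latin 1, is the Unicode character
# with the same number as its code point.
for value in range(256):
    assert ord(bytes([value]).decode("latin-1")) == value

# So the Latin 1 bytes of the name are numerically its code points.
latin1_bytes = "Mòoré".encode("latin-1")
code_points = [ord(ch) for ch in "Mòoré"]

print(list(latin1_bytes) == code_points)
```

This holds only for Latin 1. The UTF-8 bytes of any character above U+007F differ from its code point.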
HTH,
Chris
-- 
Chris Maden, text nerd <URL: http://crism.maden.org/ >
“All I ask of living is to have no chains on me,
And all I ask of dying is to go naturally.” — Laura Nyro
GnuPG Fingerprint: C6E4 E2A9 C9F8 71AC 9724 CAA3 19F8 6677 0077 C319