[Developers] mql_escape and UTF-8
Shug Boabby
shug.boabby at gmail.com
Wed Jul 30 15:08:12 UTC 2008
Hi all,
I had a few posts last week regarding the Wikipedia ID and Freebase
name. I'm now running into the problem that the /wikipedia/en key and
the actual Wikipedia name use completely different encoding schemes.
The (actual) Wikipedia name is a URL-encoded string that may
optionally use underscores instead of spaces.
The /wikipedia/en key uses MW Hex encoding and uses underscores
instead of spaces.
I am coding in Java. Using the URLEncoder/URLDecoder classes (and a
regex to deal with spaces), I am comfortably able to convert the
Wikipedia names between UTF-8 and a URL-safe version.
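
For reference, the Wikipedia-name side can be sketched roughly as follows.
This is a minimal sketch, not production code: the class and method names
(WikipediaNames, toUrlForm, fromUrlForm) are made up for illustration, and
it assumes titles are percent-encoded as UTF-8 with underscores standing in
for spaces.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical helper: converts between a readable title and its URL-safe form.
final class WikipediaNames {

    // "Zürich Airport" -> "Z%C3%BCrich_Airport"
    static String toUrlForm(String title) {
        try {
            // Swap spaces for underscores first; URLEncoder leaves '_' untouched
            // and would otherwise turn each space into '+'.
            return URLEncoder.encode(title.replace(' ', '_'), "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // "Z%C3%BCrich_Airport" -> "Zürich Airport"
    static String fromUrlForm(String name) {
        try {
            // URLDecoder treats '+' as a space, so protect literal plus signs
            // (assumption: a bare '+' in a title means a plus, not a space).
            String decoded = URLDecoder.decode(name.replace("+", "%2B"), "UTF-8");
            return decoded.replace('_', ' ');
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }
}
```

With this, WikipediaNames.toUrlForm("C++") gives "C%2B%2B", and
fromUrlForm accepts either "C%2B%2B" or "C++".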
However, I am unable to convert the MW Hex keys to/from UTF-8 because
I cannot find any existing code to do the conversion (I am also unsure
what this apparently custom encoding is called). Does anybody know of
any existing code to encode/decode MW Hex to/from UTF-8 in Java?
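
To make the question concrete, here is roughly what I imagine such a
converter would look like. It is only a sketch under assumptions I have not
verified against any specification: that a key keeps ASCII letters, digits,
'_' and '-' as they are, and writes every other character as '$' followed
by four uppercase hex digits of its UTF-16 code unit (so '(' would appear
as $0028). The class name MqlKeys and its methods are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical converter for "$XXXX"-style keys (format is an assumption).
final class MqlKeys {

    private static final Pattern ESCAPE = Pattern.compile("\\$([0-9A-Fa-f]{4})");

    // "Foo_$0028bar$0029" -> "Foo_(bar)"; underscores are left as they are.
    static String decode(String key) {
        StringBuilder out = new StringBuilder(key.length());
        Matcher m = ESCAPE.matcher(key);
        int last = 0;
        while (m.find()) {
            out.append(key, last, m.start());                    // literal run
            out.append((char) Integer.parseInt(m.group(1), 16)); // escaped unit
            last = m.end();
        }
        out.append(key, last, key.length());
        return out.toString();
    }

    // "Foo (bar)" -> "Foo_$0028bar$0029"
    static String encode(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == ' ') {
                out.append('_');                                 // space -> underscore
            } else if (isPlain(c)) {
                out.append(c);
            } else {
                out.append('$').append(String.format("%04X", (int) c));
            }
        }
        return out.toString();
    }

    // Characters assumed to be allowed unescaped in a key.
    private static boolean isPlain(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
            || (c >= '0' && c <= '9') || c == '_' || c == '-';
    }
}
```

Java strings are UTF-16 internally, so the decoded result can be written
out as UTF-8 with getBytes("UTF-8") or passed through URLEncoder to get
the Wikipedia form. Characters outside the Basic Multilingual Plane would
come out as two escaped surrogates here; whether real keys do that is
something I do not know.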
Also, I believe this subtle point should be documented in more detail
alongside any examples making use of /wikipedia/en because it means
that /wikipedia/en is definitely *not* the Wikipedia name.