[Developers] mql_escape and UTF-8

Shug Boabby shug.boabby at gmail.com
Wed Jul 30 17:16:01 UTC 2008


Argh, OK, I see why you did it... but I must admit this really really
stung me because it basically means that the /wikipedia/en key is
*not* the Wikipedia Name. Somebody please let me know if they write a
Java encoder/decoder for the $000 style encoding to UTF-8 (and the
more awkward inverse case). It's not that difficult to do, but it is
time consuming.
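
For reference, here is a rough sketch of what such a codec might look like. It is not Metaweb's code, and it rests on assumptions that should be checked against the MQL key documentation: that every escape is a `$` followed by exactly four hex digits naming one UTF-16 code unit, and that only ASCII letters, digits, `_` and `-` are left unescaped. The class name MqlKeyCodec and its methods are made up for illustration.

```java
// Sketch only. Assumed rules (not confirmed by the thread):
//  - an escape is '$' plus exactly four hex digits, one UTF-16 code unit
//  - only [A-Za-z0-9_-] are written unescaped
final class MqlKeyCodec {

    private MqlKeyCodec() {}

    /** Decodes an escaped key such as "Garc$00EDa" into a Java String. */
    static String decode(String key) {
        StringBuilder out = new StringBuilder(key.length());
        int i = 0;
        while (i < key.length()) {
            char c = key.charAt(i);
            if (c != '$') {
                out.append(c);
                i++;
                continue;
            }
            if (i + 5 > key.length()) {
                throw new IllegalArgumentException("Truncated escape at index " + i);
            }
            // Accumulate the four hex digits into one code unit.
            int unit = 0;
            for (int j = i + 1; j < i + 5; j++) {
                int digit = hexValue(key.charAt(j));
                if (digit < 0) {
                    throw new IllegalArgumentException("Bad hex digit at index " + j);
                }
                unit = unit * 16 + digit;
            }
            out.append((char) unit);
            i += 5;
        }
        return out.toString();
    }

    /** Encodes a Java String into the escaped key form (the inverse case). */
    static String encode(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (isPlain(c)) {
                out.append(c);
            } else {
                out.append('$').append(String.format("%04X", (int) c));
            }
        }
        return out.toString();
    }

    // Characters assumed to be legal in a key without escaping.
    private static boolean isPlain(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                || (c >= '0' && c <= '9') || c == '_' || c == '-';
    }

    // Returns 0..15 for an ASCII hex digit, or -1 for anything else.
    private static int hexValue(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        return -1;
    }
}
```

Java strings are UTF-16 internally, so the UTF-8 step is separate: call getBytes(StandardCharsets.UTF_8) on the decoded String, or build the String from UTF-8 bytes before encoding. The sketch also ignores any rules about which characters may begin or end a key.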

2008/7/30 Christopher R. Maden <crism at metaweb.com>:
> Shug Boabby wrote:
>> Thanks Chris... I think I'd already worked all that out, but I was
>> just wondering if anybody had actually written a Java encoder/decoder
>> between UTF-8/MW Hex. I realise it should be simple to convert the
>> $000 syntax, but it is troublesome to have to write this code myself.
>> I really wish you'd decided to just use the URL encoding scheme as
>> that would require no additional work on our side of things (despite
>> it looking ugly). It's just not standard enough (although, admittedly,
>> prettier).
>
> Besides the fact that URL-encoding is often broken, we deliberately
> chose a variant syntax to avoid double escaping.
>
> If we stored Garc%C3%ADa as a key in the Freebase graph, then the URL to
> access it would involve Garc%25C3%25ADa, which is just egregious.
>
> We could have allowed literal high characters in our keys, but there was
> a feeling that the keys should be as programmatically usable as possible
> in their native form, which meant keeping them to ASCII only.
>
> ~Chris
> --
> Christopher R. Maden
> Data Architect
> Freebase.com: <URL: http://www.freebase.com/ >
> Metaweb Technologies, Inc. <URL: http://www.metaweb.com/ >
> _______________________________________________
> Developers mailing list
> Developers at freebase.com
> http://lists.freebase.com/mailman/listinfo/developers
>

