[Developers] mqlread http response header (content-type)

Tim Kientzle tim at metaweb.com
Fri Sep 19 17:35:43 UTC 2008


The charset is not the problem here.  The problem is more basic:  Our
JSON encoder isn't correctly encoding the 0x1A control characters.  (I
misread the spec earlier; JSON doesn't permit unescaped control
characters inside strings, and apparently your Java JSON decoder is
strict about this.)

The problem is that this quoted JSON string in the response
(<sub> stands for the 0x1a control character):

   "Friesengeist 2: Regelm<sub>ässige Zerstö<sub>rungen"

is being encoded as this series of bytes in the HTTP body (using
C-style escapes):

   "Friesengeist 2: Regelm\x1a\xc3\xa4ssige Zerst\xc3\xb6\x1arungen"

The \xc3\xa4 and \xc3\xb6 sequences are valid UTF-8 codes, but the
bare \x1a characters should be escaped to satisfy JSON requirements;
we should be generating this:

   "Friesengeist 2: Regelm\\u001a\xc3\xa4ssige Zerst\xc3\xb6\\u001arungen"
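
For illustration only (this is not our encoder), here is a minimal
sketch using Python's standard json module.  It assumes the stored
title contains U+001A before the "ä" and after the "ö", as in the
bytes above.  The names title, body, broken_body and strict_parse are
made up for the example.

```python
import json

# The title as stored: U+001A (SUB) sits before "ä" and after "ö".
title = "Friesengeist 2: Regelm\x1a\u00e4ssige Zerst\u00f6\x1arungen"

# A conforming encoder escapes the control character as \u001a.
# ensure_ascii=False leaves the non-ASCII letters as literal UTF-8.
encoded = json.dumps(title, ensure_ascii=False)
body = encoded.encode("utf-8")

# Roughly what the buggy output looks like: raw 0x1A inside the quotes.
broken_body = ('"' + title + '"').encode("utf-8")


def strict_parse(data: bytes):
    """Return the decoded string, or None if the JSON is rejected."""
    try:
        return json.loads(data.decode("utf-8"))
    except json.JSONDecodeError:
        return None


print(body)                       # escaped form, safe for strict parsers
print(strict_parse(body) == title)
print(strict_parse(broken_body))  # strict parsing rejects the raw 0x1A
```

Python's parser is strict by default, so it behaves much like the Java
decoder described in this thread: the escaped body round-trips, while
the body with the bare control character is rejected.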

In particular, changing the content-type or charset declaration won't
help here.  The default character encoding for application/json is
still UTF-8 (per RFC 4627), so you would see the same problem even if
we changed the content-type or dropped the charset modifier.  The
underlying JSON encoder would still be doing the wrong thing and your
Java JSON decoder would still choke.
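
To make the two layers concrete, here is a small Python sketch (an
illustration under assumed data, not our server code).  It takes a
correctly escaped body and shows that charset decoding never touches
the six-character \u001a escape; only the JSON parser turns it into
the control character.  The shortened string and the variable names
are made up for the example.

```python
import json

# A correctly encoded body: the ASCII escape \u001a plus UTF-8 for "ä".
body = b'"Regelm\\u001a\xc3\xa4ssige"'

# Layer 1: charset decoding turns bytes into text.
# The escape sequence is plain ASCII and passes through unchanged.
text = body.decode("utf-8")

# Decoding with the wrong charset garbles "ä" but still leaves the
# escape alone, so the charset cannot be what produces a raw 0x1A.
wrong = body.decode("latin-1")

# Layer 2: only the JSON parser converts the escape into U+001A.
value = json.loads(text)

print(text)
print(wrong)
print(repr(value))
```

If a raw 0x1A shows up before the JSON parser has run, it was already
in the bytes on the wire, whatever charset the header declares.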

I've filed a bug; this should be fixed in the next release cycle in a  
couple of weeks.

Thank you very much for reporting this.

Tim Kientzle

On Sep 19, 2008, at 10:21 AM, Augusto Callejas wrote:

> alec-
>
> there is no reason why i need "application/json".
> i'm more concerned that the current response includes "char-set=utf-8"
>
> from my previous reply to the mailing list:
>
> my problem is with the mqlread HTTP response header Content-type,  
> which is
> returning char-set=”utf-8”.
> i contend that the content type should be either “application/json”
> or “text/plain” WITHOUT char-set=”utf-8”.
> by not specifying the char-set, the “\u001a” characters above would  
> pass
> thru the HTTP layer without being decoded,
> and thus allow any JSON library to decode them properly.
>
> there seems to be a conflation between encoding on the HTTP layer and
> encoding on the JSON layer.
>
> “\u001a” --(HTTP utf-8)--> string with one character 0x1a  --(JSON)-->
> invalid character in JSON (should be encoded)
>
> “\u001a”--(HTTP w/o utf-8) --> “\u001a” --(JSON)--> string with one
> character 0x1a
>
> you can see this with the following query:
>
> =====
> {"q":{"query":
> {
>  "master_property" : "/type/object/name",
>  "operation" : "delete",
>  "source" : {
>    "guid" : "#9202a8c04000641f8000000006e31736"
>  },
>  "target_value" : {
>    "value" : null
>  },
>  "timestamp" : "2008-08-30T18:56:23.0000Z",
>  "type" : "/type/link",
>  "valid" : null
> }
> }}
> =====
>
> augusto.
>
>
> On 9/19/08 9:49 AM, "Alec Flett" <alecf at metaweb.com> wrote:
>
>>
>> Below is the long answer. The short answer is "the docs are out of
>> date" :)
>>
>> Much of the reasoning is practical -  application/json made it really
>> hard to debug because browsers refuse to display application/json
>> directly in the browser - application/json makes the browser prompt
>> the user with a save/launch dialog. Not to mention that certain APIs
>> (like image upload) can only be run in IFRAMEs in the browser, which
>> means those APIs must return text/plain so they can be intercepted  
>> and
>> parsed by JS.
>>
>> When you're developing against a JSON API, it's much easier to see
>> this stuff visible instead of launching a separate app like notepad /
>> textedit.
>>
>> When considering the above, there didn't seem to be any actual value
>> in making it return application/json - nobody we talked to could come
>> up with a reason that making it application/json would be any better
>> than it returning text/plain
>>
>> Augusto - is there a particular reason you need application/json?
>>
>> Alec
>>
>> On Sep 18, 2008, at 11:53 AM, Christine Weibel wrote:
>>
>>>
>>> Hmm.  mqlread did used to return a content-type of application/json,
>>> but there was a deliberate change to text/plain a while ago.  it is
>>> my recollection that was done for some browser compatibility issue?
>>> perhaps alecf remembers this.
>>>
>>>
>>>
>>> ----- Original Message -----
>>> From: "Augusto Callejas" <acallejas at appliedminds.com>
>>> To: "For discussions about MQL, Freebase API and apps built on
>>> Freebase" <developers at freebase.com>
>>> Sent: Thursday, September 18, 2008 11:39:29 AM (GMT-0800) America/
>>> Los_Angeles
>>> Subject: Re: [Developers] mqlread http response header (content- 
>>> type)
>>>
>>> according to the documentation:
>>>
>>> http://www.freebase.com/view/guid/9202a8c04000641f800000000544e139
>>>
>>> =====
>>> 4.2.2. mqlread Output
>>>
>>> mqlread returns an HTTP response with a Content-Type header of
>>> application/json (or text/plain if the callback parameter was
>>> specified).
>>> The body of the response is a JSON-serialized envelope object that
>>> holds a
>>> MQL result object (or objects if multiple queries were submitted).  
>>> The
>>> format of mqlread response envelopes is specified in Section 4.2.3
>>> =====
>>>
>>> however i did not specify a callback parameter, so how do i get
>>> mqlread to
>>> return its response as "application/json" and not text/plain;
>>> charset="utf-8".
>>>
>>> thanks,
>>> augusto.
>>>
>>>
>>> On 9/17/08 9:36 AM, "Augusto Callejas" <acallejas at appliedminds.com>
>>> wrote:
>>>
>>>> hi-
>>>>
>>>> when i perform a mqlread query, i get back HTTP response headers
>>>> that look
>>>> like:
>>>>
>>>> =====
>>>> Date: Wed, 17 Sep 2008 00:51:35 GMT
>>>> ,
>>>> Server: Apache
>>>> ,
>>>> X-Metaweb-Success: 1/1
>>>> ,
>>>> Content-Length: 35616
>>>> ,
>>>> Content-Type: text/plain; charset="utf-8"
>>>> ,
>>>> ...
>>>> Connection: Keep-Alive
>>>> =====
>>>>
>>>>
>>>> shouldn't the "Content-Type" header value be "application/json"?
>>>>
>>>> http://www.iana.org/assignments/media-types/application/
>>>>
>>>> i'm having a problem with the 'charset="utf-8"' in the header.
>>>> the java http client library that i'm using (
>>>> http://hc.apache.org/httpclient-3.x/) reads the response (see  
>>>> example
>>>> below), and decodes any unicode characters (ie. \u001a) into their
>>>> actual
>>>> character value.  however, this is before i get a chance to decode
>>>> it in my
>>>> json library.  however, the unicode characters have already been
>>>> decoded at
>>>> the http level, but not at the json level.
>>>>
>>>> any thoughts?
>>>>
>>>> thanks,
>>>> augusto.
>>>>
>>>>
>>>> query
>>>> =====
>>>> {"q":{"query":
>>>> {
>>>> "master_property" : "/type/object/name",
>>>> "operation" : "delete",
>>>> "source" : {
>>>>   "guid" : "#9202a8c04000641f8000000006e31736"
>>>> },
>>>> "target_value" : {
>>>>   "value" : null
>>>> },
>>>> "timestamp" : "2008-08-30T18:56:23.0000Z",
>>>> "type" : "/type/link",
>>>> "valid" : null
>>>> }
>>>> }}
>>>> =====
>>>>
>>>>
>>>> response
>>>> =====
>>>> {
>>>> "master_property" : "/type/object/name",
>>>> "operation" : "delete",
>>>> "source" : {
>>>>   "guid" : "#9202a8c04000641f8000000006e31736"
>>>> },
>>>> "target_value" : {
>>>>   "value" : "Friesengeist 2: Regelm\u001aässige Zerstö\u001arungen"
>>>> },
>>>> "timestamp" : "2008-08-30T18:56:23.0000Z",
>>>> "type" : "/type/link",
>>>> "valid" : null
>>>> }
>>>> =====
>>>>
>>>>
>>>> _______________________________________________
>>>> Developers mailing list
>>>> Developers at freebase.com
>>>> http://lists.freebase.com/mailman/listinfo/developers


