[Developers] codes in topic name in topic.tsv?

Christopher R. Maden crism at metaweb.com
Sun Aug 24 23:27:58 UTC 2008


AJ Chen <canovaj at gmail.com> wrote:
> there are lots of codes like <C2><C3> in topic names in the topic.tsv file.
> what are they? code for special chars?

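All of the data in Freebase is (or at least should be) UTF-8 encoded, so codes like those are most likely the individual bytes of multi-byte UTF-8 characters, shown by a tool that is not decoding the file as UTF-8.  I am assuming that you meant <C2><C3> only as an example, since that is not a legal UTF-8 byte sequence; if you actually encounter an illegal UTF-8 byte sequence, it would be worth reporting as a bug.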
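
If it helps to check whether the dump really contains illegal sequences, here is a minimal sketch in Python.  It is only an illustration, not an official Freebase tool: the function name find_invalid_utf8 and the two sample rows are made up for the example, and the column layout in them is a guess rather than the real topic.tsv format.

```python
def find_invalid_utf8(lines):
    """Return (line_number, byte_offset) for each line that is not valid UTF-8.

    `lines` is any iterable of bytes objects, for example a file opened
    in binary mode.
    """
    bad = []
    for number, raw in enumerate(lines, start=1):
        try:
            raw.decode("utf-8")
        except UnicodeDecodeError as err:
            # err.start is the offset of the first offending byte in the line.
            bad.append((number, err.start))
    return bad


# Hypothetical sample rows; the real topic.tsv columns may differ.
sample = [
    b"Caf\xc3\xa9\t/en/cafe\n",   # <C3><A9> is a legal sequence (e with acute accent)
    b"Bad\xc2\xc3\t/en/bad\n",    # <C2><C3> is not legal: C3 is not a continuation byte
]
print(find_invalid_utf8(sample))

# To scan the real file, open it in binary mode so nothing is decoded first:
#     with open("topic.tsv", "rb") as f:
#         print(find_invalid_utf8(f))
```

Lines that decode cleanly are fine and only need a UTF-8-aware viewer; anything the check reports would be a candidate for a bug report.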

It is a documentation bug that the download page did not make the UTF-8 encoding clear.  Sorry about that.

~Chris
-- 
Christopher R. Maden
Data Architect
Freebase.com: <URL: http://www.freebase.com/ >
Metaweb Technologies, Inc. <URL: http://www.metaweb.com/ >

