[Developers] $0027 encodings in wikipedia topic names

Drew Perttula drewp at bigasterisk.com
Wed Feb 11 09:42:51 UTC 2009


curl http://rdf.freebase.com/rdf/en.barack_obama | grep value.value

   <http://rdf.freebase.com/ns/type.value.value> "Barry_O$0027Bama"],
   <http://rdf.freebase.com/ns/type.value.value> "Pres$002E_Obama"],
...

What is that $0027 encoding? I've never seen that style before. I also 
think the underscores might be spaces, but I'm not sure if that's always 
true. (I.e. maybe sometimes they are real underscores, and you can't 
tell which case is which.)

In that same file, the objects of 
<http://rdf.freebase.com/ns/type.object.name> are normal-looking Unicode 
strings without $ escapes.

For the record, I'm currently using this Python to undo the escaping:
     import re
     s = s.replace("_", " ")
     # four hex digits after the $, e.g. $0027 or $002E
     s = re.sub(r'\$([0-9A-Fa-f]{4})', lambda g: unichr(int(g.group(1), 16)), s)
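
Here is a self-contained Python 3 sketch of the same idea, run against the two values from the curl output. It rests on two assumptions that nothing in this message confirms: that an escape is always "$" followed by exactly four hex digits, and that every underscore stands for a space. The name unescape_key is made up for illustration.

```python
import re

# Assumption: an escape is "$" plus exactly four hex digits (e.g. $0027, $002E).
_ESCAPE = re.compile(r"\$([0-9A-Fa-f]{4})")


def unescape_key(s):
    """Undo $XXXX escapes and turn underscores into spaces (hypothetical helper)."""
    # Swap underscores first, so that an underscore produced by decoding
    # $005F (if such an escape ever appears) is not turned into a space.
    s = s.replace("_", " ")
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), s)


print(unescape_key("Barry_O$0027Bama"))  # Barry O'Bama
print(unescape_key("Pres$002E_Obama"))   # Pres. Obama
```

If some underscores turn out to be literal, the replace step is wrong for those keys, and the escaped string alone may not be enough to tell the two cases apart.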



