[Developers] Steps Toward a Complete History Export in Freebase

Hostile Fork hostilefork at gmail.com
Thu Jun 4 21:08:23 UTC 2009


Thanks to everyone for the detailed responses so far!  It appears
that MQL can do more than I expected when it comes to getting recent
deltas.  There's an elegance to not needing a specialized API for
this, especially since MQL already has comprehensive filtering
abilities.

However, if today's corpus contains 178 million links, then issuing
millions of individual queries is probably a bad way for people to
download the full history (especially since value types would
presumably need to be retrieved too).  I'd also imagine MQL's JSON
would be too verbose for the history download files.  So it might be
nice if an "all changes between (time1) and (time2)" REST API offered
transactions in the same format as the dumps.
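
To make the range-based idea concrete, here is a minimal Python sketch
that splits a long period into consecutive windows, each of which
could serve as one (time1), (time2) pair.  The function name
time_windows and the one-day default step are assumptions made for
illustration only; no such Freebase endpoint is implied to exist.

```python
from datetime import datetime, timedelta

# Timestamp layout used by the MQL query quoted in this thread.
STAMP = "%Y-%m-%dT%H:%M:%SZ"

def time_windows(start, end, step=timedelta(days=1)):
    """Yield (time1, time2) string pairs covering [start, end) in step-sized pieces.

    Hypothetical helper: the name and the windowing scheme are illustrative.
    """
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    lo = datetime.strptime(start, STAMP)
    stop = datetime.strptime(end, STAMP)
    while lo < stop:
        # Clamp the last window so it never runs past the requested end.
        hi = min(lo + step, stop)
        yield lo.strftime(STAMP), hi.strftime(STAMP)
        lo = hi

windows = list(time_windows("2009-05-29T00:00:00Z", "2009-06-01T00:00:00Z"))
for time1, time2 in windows:
    print(time1, time2)
```

Each window's end is the next window's start, so a client treating
the ranges as half-open would not fetch the same change twice.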

It's not *technically* necessary for Freebase to be the ones offering  
this service, so long as MQL can be post-processed to get the whole  
thing.  A third party (with a sufficiently relaxed API quota) *could*  
do the queries and then reflect from their own archive to anyone  
interested.  Their scrape would then become the "trust dump" that  
Spencer mentions.

But Freebase is in the best position to set standards and keep
things coherent, especially in tricky cases like selective deletion:

	http://en.wikipedia.org/wiki/Wikipedia:Selective_deletion

Has legal pressure or other "bad content issues" led to history  
revisions like this yet?  How might such a retraction be published to  
those operating on histories they obtained prior to the deletions?   
(Without notification of these deletions, people operating on  
histories would have an inconsistent view of the current state from  
Freebase.)

Tx!
    ---Brian

P.S. For anyone else following along and trying out the queries, I  
discovered this one works fine if you give it a timestamp that's not  
June 0th, 2000 :)  (e.g. "timestamp<=": "2009-06-01T00:00:00Z")

 > [{
 >   "type":         "/type/link",
 >   "creator": "/user/<username to inspect>",
 >   "source": {
 >     "*": null
 >   },
 >   "target": {
 >     "*": null
 >   },
 >   "timestamp>=": "2009-05-29T00:00:00Z",
 >   "timestamp<=": "2000-06-00T00:00:00Z",
 >   "timestamp": null,
 >   "operation": null,
 >   "valid": null,
 >   "sort":         "-timestamp"
 > }]
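
For anyone who wants to generate the query above rather than edit it
by hand, here is a minimal Python sketch that builds the same
structure and rejects impossible dates such as "2000-06-00" before
anything is sent.  The function name link_query and the
"/user/example" placeholder are hypothetical; the sketch only builds
and prints the JSON and does not contact any Freebase service.

```python
import json
from datetime import datetime

# Timestamp layout used by the quoted query.
STAMP = "%Y-%m-%dT%H:%M:%SZ"

def link_query(creator, since, until):
    """Build the /type/link query from this thread (hypothetical helper)."""
    for stamp in (since, until):
        # strptime raises ValueError for a day of "00", e.g. "2000-06-00T00:00:00Z".
        datetime.strptime(stamp, STAMP)
    # Fixed-width timestamps in this layout compare correctly as strings.
    if since > until:
        raise ValueError("since must not be later than until")
    return [{
        "type": "/type/link",
        "creator": creator,
        "source": {"*": None},
        "target": {"*": None},
        "timestamp>=": since,
        "timestamp<=": until,
        "timestamp": None,
        "operation": None,
        "valid": None,
        "sort": "-timestamp",
    }]

# "/user/example" is a placeholder, not a real account.
query = link_query("/user/example", "2009-05-29T00:00:00Z", "2009-06-01T00:00:00Z")
print(json.dumps(query, indent=2))
```

Passing the original upper bound of "2000-06-00T00:00:00Z" raises a
ValueError here, which catches the problem described in the P.S.
before the query is ever issued.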



