[Developers] Steps Toward a Complete History Export in Freebase
Hostile Fork
hostilefork at gmail.com
Thu Jun 4 21:08:23 UTC 2009
Thanks to everyone for the detailed responses so far!! It appears
that MQL can do more than I expected when it comes to getting recent
deltas. There's an elegance to not needing a specialized API for
this, especially since MQL already has comprehensive filtering
abilities.
However: if today's corpus contains 178 million links, then issuing
millions of individual queries is probably a bad way for people to
download the full history (especially since value types will need to
be retrieved too?). I'd also imagine MQL's JSON would be too verbose
for the history download files. So it might be nice if an "all
changes between (time1) and (time2)" REST API offered transactions in
the same format as the dumps.
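As a rough illustration of the "all changes between (time1) and (time2)" idea expressed with plain MQL, here is a minimal Python sketch that splits a time range into consecutive windows and builds one /type/link query per window. It only builds the query structures and sends nothing. The helper names are hypothetical, the strict "timestamp<" upper bound is an assumption (chosen so adjacent windows don't overlap), and a real client would still have to page through each window's results.

```python
from datetime import datetime, timedelta, timezone

ISO = "%Y-%m-%dT%H:%M:%SZ"  # same timestamp form as the query in the P.S.

def link_window_query(start, end):
    """Hypothetical helper: MQL query (as Python data) for /type/link writes in [start, end)."""
    return [{
        "type": "/type/link",
        "source": {"*": None},
        "target": {"*": None},
        "timestamp>=": start.strftime(ISO),
        "timestamp<": end.strftime(ISO),  # assumed strict bound so windows never overlap
        "timestamp": None,
        "operation": None,
        "valid": None,
        "sort": "timestamp",
    }]

def window_queries(start, end, step):
    """Hypothetical helper: yield one query per half-open window covering [start, end)."""
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    lo = start
    while lo < end:
        hi = min(lo + step, end)  # the last window may be shorter than step
        yield link_window_query(lo, hi)
        lo = hi

if __name__ == "__main__":
    import json
    start = datetime(2009, 5, 29, tzinfo=timezone.utc)
    end = datetime(2009, 6, 1, tzinfo=timezone.utc)
    for q in window_queries(start, end, timedelta(days=1)):
        print(json.dumps(q))
```

Smaller windows keep each query bounded at the cost of more round trips, which is exactly the overhead a bulk export in the dump format would avoid.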
It's not *technically* necessary for Freebase to be the ones offering
this service, so long as the MQL results can be post-processed to
reconstruct the whole history. A third party (with a sufficiently
relaxed API quota) *could* run the queries and then serve the results
from their own archive to anyone interested. Their scrape would then
become the "trust dump" that Spencer mentions.
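A third-party mirror would also want something more compact than raw MQL JSON for its archive. A minimal Python sketch, assuming link results shaped like the output of the query in the P.S. (with source and target each carrying an "id" key) and a made-up tab-separated column order; this is not the official dump format, and to_archive_line is a hypothetical name.

```python
def _escape(value):
    """Escape backslashes, tabs and newlines so each record stays on one line."""
    return value.replace("\\", "\\\\").replace("\t", "\\t").replace("\n", "\\n")

def to_archive_line(link):
    """Flatten one MQL /type/link result into a tab-separated line.

    Hypothetical column order: timestamp, operation, valid, creator,
    source id, target id. Missing fields become empty strings.
    """
    valid = link.get("valid")
    fields = [
        link.get("timestamp") or "",
        link.get("operation") or "",
        "" if valid is None else str(valid).lower(),  # True -> "true"
        link.get("creator") or "",
        (link.get("source") or {}).get("id") or "",
        (link.get("target") or {}).get("id") or "",
    ]
    return "\t".join(_escape(f) for f in fields)

if __name__ == "__main__":
    # Made-up example record, for illustration only.
    print(to_archive_line({
        "timestamp": "2009-05-30T12:00:00Z",
        "operation": "insert",
        "valid": True,
        "creator": "/user/example",
        "source": {"id": "/en/a"},
        "target": {"id": "/en/b"},
    }))
```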
But Freebase is in the best position to set standards and maintain
coherence, especially in tricky cases like selective deletion:
http://en.wikipedia.org/wiki/Wikipedia:Selective_deletion
Has legal pressure or other "bad content issues" led to history
revisions like this yet? If so, how might such a retraction be
published to those working with histories they obtained before the
deletions? (Without notification of these deletions, people working
from those histories would have a view of the current state that is
inconsistent with Freebase's.)
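On the consumer side, one conceivable answer is a published list of retracted link identifiers that anyone holding an older history could filter against. The sketch below assumes such a feed exists and that each history entry carries an "id" key; both the feed and apply_retractions are hypothetical.

```python
from collections.abc import Iterable

def apply_retractions(history: list, retracted_ids: Iterable) -> tuple:
    """Drop every history entry whose (hypothetical) 'id' is in the retraction list.

    Returns the filtered history and how many entries were removed, so a
    consumer can log what changed in its local copy.
    """
    retracted = set(retracted_ids)  # set membership keeps this O(len(history))
    kept = [entry for entry in history if entry["id"] not in retracted]
    return kept, len(history) - len(kept)

if __name__ == "__main__":
    local_history = [{"id": "link1"}, {"id": "link2"}, {"id": "link3"}]
    kept, removed = apply_retractions(local_history, ["link2"])
    print(kept, removed)
```

Filtering by identifier only works if retractions and history entries share a stable id, which is one reason a standard coming from Freebase itself would help.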
Tx!
---Brian
P.S. For anyone else following along and trying out the queries, I
discovered that this one works fine if you give it a timestamp that
isn't June 0th, 2000 :) (e.g. "timestamp<=": "2009-06-01T00:00:00Z")
> [{
>   "type": "/type/link",
>   "creator": "/user/<username to inspect>",
>   "source": {
>     "*": null
>   },
>   "target": {
>     "*": null
>   },
>   "timestamp>=": "2009-05-29T00:00:00Z",
>   "timestamp<=": "2000-06-00T00:00:00Z",
>   "timestamp": null,
>   "operation": null,
>   "valid": null,
>   "sort": "-timestamp"
> }]
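A cheap way to catch the June 0th mistake before a query ever reaches the server is to parse both bounds locally. A minimal Python sketch, assuming the bounds use the exact YYYY-MM-DDTHH:MM:SSZ form seen in the quoted query; check_range is a hypothetical helper, not part of any Freebase client.

```python
from datetime import datetime

ISO = "%Y-%m-%dT%H:%M:%SZ"

def check_range(query):
    """Validate the timestamp bounds of an MQL query dict before sending it.

    Raises ValueError if either bound is not a real calendar date
    (e.g. day 00) or if the lower bound falls after the upper bound.
    """
    lo = datetime.strptime(query["timestamp>="], ISO)  # raises on invalid dates
    hi = datetime.strptime(query["timestamp<="], ISO)
    if lo > hi:
        raise ValueError(f"empty range: {lo.isoformat()} is after {hi.isoformat()}")
    return lo, hi
```

Against the quoted query, parsing the upper bound raises ValueError because day 00 does not exist. With "2000-06-01T00:00:00Z" it would instead fail the ordering check, since the lower bound is in 2009; "2009-06-01T00:00:00Z" passes both.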