[Developers] Steps Toward a Complete History Export in Freebase
Hostile Fork
hostilefork at gmail.com
Fri May 29 22:19:37 UTC 2009
Hello Freebase + developers...!
As an open source activist, I'm naturally skeptical of a proprietary
engine being the foundation for Web 3.0+. But Metaweb's technical
approach to the semantic web problem fits my intuition perfectly, and
I can't find anyone else doing such a stellar job on the
execution. Plus, true to the "Free" in the Freebase name...the
data set is available for anyone to import, index, and even serve
through an API.
However: when I spoke with the Graphd team in person, I raised a
concern about "community trust," because what gets published isn't a full export.
*Please correct me if I'm wrong*, but the finest-grained
download looks like this:
http://download.freebase.com/datadumps/quad-sample.txt
Nothing in it records the date/time at which each transaction was entered. This
means Freebase is the only entity that can analyze the chronology, or
implement the crucial "as_of_time" feature explained here:
http://blog.freebase.com/2009/02/02/mql-monday-looking-back-into-the-past-with-as_of_time/
Nor is there any indication of which user made each assertion in that log.
Imagine a bad piece of data is noticed: only Freebase is in a position
to investigate the other changes made by that account. Thus
their ability to reconcile and analyze the corpus is privileged.
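To illustrate why those two missing fields matter, here is a rough Python sketch. It assumes a hypothetical log record (`LoggedQuad`) carrying, loosely, the four columns of the quad dump plus a timestamp, a creator account, and an add/delete operation. None of those extra fields exist in the current download, and the ids and users below are made up. With them, anyone could replay the log to approximate an as_of_time query, or pull up every change made by a suspect account. This is not how Freebase implements as_of_time internally, just one way a mirror could emulate it.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# HYPOTHETICAL record: loosely the four quad-dump columns, plus the
# timestamp, creator and operation that the public dump does not carry.
@dataclass(frozen=True)
class LoggedQuad:
    source: str
    prop: str
    destination: str
    value: str
    timestamp: datetime
    creator: str
    op: str  # "add" or "delete" (assumed encoding)

def state_as_of(log, as_of):
    """Replay the log in time order; return the assertions live at `as_of`."""
    live = set()
    for q in sorted(log, key=lambda q: q.timestamp):
        if q.timestamp > as_of:
            break  # log is sorted, so nothing later can apply
        fact = (q.source, q.prop, q.destination, q.value)
        if q.op == "add":
            live.add(fact)
        elif q.op == "delete":
            live.discard(fact)
    return live

def changes_by(log, creator):
    """Every logged change made by one account, oldest first."""
    return sorted((q for q in log if q.creator == creator),
                  key=lambda q: q.timestamp)

def ts(day):
    return datetime(2009, 5, day, tzinfo=timezone.utc)

# Made-up example data (ids and users are illustrative only).
example_log = [
    LoggedQuad("/en/berlin", "/location/location/containedby", "/en/germany", "",
               ts(1), "/user/alice", "add"),
    LoggedQuad("/en/berlin", "/location/location/containedby", "/en/france", "",
               ts(2), "/user/mallory", "add"),
    LoggedQuad("/en/berlin", "/location/location/containedby", "/en/france", "",
               ts(5), "/user/alice", "delete"),
]

if __name__ == "__main__":
    print(len(state_as_of(example_log, ts(3))))  # -> 2: bad assertion still live
    print(len(state_as_of(example_log, ts(6))))  # -> 1: bad assertion deleted
    for q in changes_by(example_log, "/user/mallory"):
        print(q.timestamp.isoformat(), q.op, q.destination)
```

Once the bad assertion is spotted, `changes_by` is the "investigate the other changes made by that account" step, which today only Freebase can perform.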
Lastly, these data sets are provided as monolithic files, released on
an arbitrary 3-month cycle. The most recent export was on March 23,
2009...and is now more than two months old. That's far too long for a
competing (or complementary) service based on the data to wait. As
anyone developing apps under Internet expectations can attest, even a
one-minute lag for updates is too long!!
The good news is that there's a very simple solution to all of this.
Just establish a REST API which returns all the user modifications
that Freebase records for itself between (time1) and (time2). It's
perfectly fine to require a login for these queries, and to establish
quotas...so long as these are not deliberately designed to cripple a
mirroring effort.
Make sense? I'm happy to volunteer my time to assist in the
specification + documentation of such an API.
Thanks!
--Brian
http://hostilefork.com