[Developers] Loading Freebase into a Star Schema
Scott Meyer
sm at metaweb.com
Wed Feb 25 23:08:43 UTC 2009
Paul Houle wrote:
> I recently, with some effort, managed to load the Freebase January
> 2009 database into a Star Schema. Loading the 241 M facts took about a
> week with a seriously untuned system. Here's an article about the results:
>
> http://gen5.info/q/2009/02/25/putting-freebase-in-a-star-schema/
That's a great article.
Just for grins, I tried your example query (20 most popular
predicates) against graphd directly:
read cost="" ( result=((guid $count))
pagesize=20
sort=(-$count)
(<-typeguid $count=count live=dontcare newest>=0))
ok cost="tu=46251 ts=56 tr=46307 ..."
( (9202a8c04000641f800000000000000c 28374366)
(9202a8c04000641f8000000000000002 24491505)
(9202a8c04000641f800000000000000a 18006216)
(9202a8c04000641f80000000000000ca 14783688)
(9202a8c04000641f800000000000012e 5556711)
(9202a8c04000641f80000000011ae8b6 5418047)
(9202a8c04000641f800000000000012b 4721173)
(9202a8c04000641f80000000011ae8a5 4595978)
(9202a8c04000641f8000000003c9406e 4383153)
(9202a8c04000641f80000000000005de 3045908)
(9202a8c04000641f80000000000005bc 2848071)
(9202a8c04000641f80000000000005e5 1136296)
(9202a8c04000641f80000000000000d6 1089713)
(9202a8c04000641f80000000000000ce 1089351)
(9202a8c04000641f80000000000000d2 1088757)
(9202a8c04000641f80000000011ae8ad 1085601)
(9202a8c04000641f80000000000000f1 1035347)
(9202a8c04000641f8000000000000106 1035065)
(9202a8c04000641f80000000045320e1 1026395)
(9202a8c04000641f80000000000005c4 968540)
)
That's 46.3 seconds of real time (the tr=46307 in the cost string,
read as milliseconds), compared to your reported result of
"about a minute".
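For anyone who wants to pull the timings out of a cost string programmatically, here is a minimal Python sketch. It assumes that tu, ts, and tr are user, system, and real time in milliseconds, which is consistent with the 46.3 seconds quoted above but is an assumption here. The helper name parse_cost is hypothetical and not part of graphd.

```python
import re

def parse_cost(cost: str) -> dict:
    """Extract name=integer pairs from a graphd-style cost string.

    Anything that is not of the form name=digits (such as the
    trailing "...") is ignored.
    """
    return {name: int(value) for name, value in re.findall(r"(\w+)=(\d+)", cost)}

# The cost string from the reply above, elision kept as-is.
cost = parse_cost("tu=46251 ts=56 tr=46307 ...")

# Assumption: the values are milliseconds.
real_seconds = cost["tr"] / 1000
print(f"real time: {real_seconds:.1f} s")
```

Under that reading, tu and ts add up to tr in this particular response.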
If you're just looking for a measure of relative popularity,
these results are fine. They do, however, include versioned
or deleted links. If you want exact results, we have to weed out
versioned and deleted links (something that the data dump does
for you), and this makes us much slower: about 216 seconds. Of course,
we're doing this ad hoc, without any "CREATE INDEX...", so 216 seconds
(or 46) isn't too shabby.
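The difference between the approximate and the exact count can be sketched in Python. This is only an illustration under stated assumptions: the Link record, its fields, and the count_predicates helper are hypothetical stand-ins, not graphd's storage model or API. The field names echo the live and newest constraints in the query, with newest == 0 taken to mean "current version" and live == False taken to mean "deleted".

```python
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class Link:
    typeguid: str  # predicate identifier
    live: bool     # assumed: False means the link was deleted
    newest: int    # assumed: 0 is the current version, >0 is an older one

def count_predicates(links, exact=False, top=20):
    """Count links per typeguid and return the `top` most common.

    With exact=False every link is counted, including deleted and
    superseded ones. With exact=True only live, current links count.
    """
    counts = Counter(
        link.typeguid
        for link in links
        if not exact or (link.live and link.newest == 0)
    )
    return counts.most_common(top)

# Tiny made-up data set, for illustration only.
links = [
    Link("a", True, 0),
    Link("a", True, 1),   # older version of a link
    Link("a", False, 0),  # deleted link
    Link("b", True, 0),
]
approximate = count_predicates(links)
exact = count_predicates(links, exact=True)
print(approximate)
print(exact)
```

The ranking comes out much the same either way when deleted and versioned links are a small fraction of the total, which is why the cheaper count is fine as a measure of relative popularity.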
As you get other specific results, we'd love to see them.
-Scott