[Developers] corruption in first column of freebase-datadump-quadruples.tsv.bz2 for 2008/07/01

Alexander Marks al at metaweb.com
Sun Jul 20 02:09:56 UTC 2008


Thanks for spotting this. I've grep'd out those 17 corrupt records and updated the TSV on our downloads site. Kurt has identified the origin of the problem, so the next dump will have a complete fix. If you already have this version, there's no need to re-download; just run:

grep '^/guid/9' freebase-datadump-quadruples.tsv > freebase-datadump-quadruples.tsv.fixed
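For anyone who would rather validate the whole first column than just its first character, here is a minimal Python sketch of a stricter filter. It assumes a well-formed first column is "/guid/" followed by exactly 32 lowercase hex digits, which is inferred from the intact rows in the report and is not a documented format. The names is_clean and filter_dump are hypothetical.

```python
import re

# Assumption: a valid first column is "/guid/" plus exactly 32 lowercase
# hex digits, inferred from the intact sample rows (not a documented spec).
GUID_RE = re.compile(rb"/guid/[0-9a-f]{32}")


def is_clean(line: bytes) -> bool:
    """Return True if the first tab-separated column looks like a GUID."""
    first_column = line.split(b"\t", 1)[0]
    return GUID_RE.fullmatch(first_column) is not None


def filter_dump(src_path: str, dst_path: str) -> int:
    """Copy clean rows from src_path to dst_path; return how many were dropped."""
    dropped = 0
    # Binary mode keeps every surviving row byte-for-byte identical.
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        for line in src:
            if is_clean(line):
                dst.write(line)
            else:
                dropped += 1
    return dropped
```

Calling filter_dump("freebase-datadump-quadruples.tsv", "freebase-datadump-quadruples.tsv.fixed") should keep the same rows as the grep for this dump, and the returned count can be compared against the number of corrupt records expected.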

Al

----- Original Message -----
From: "Robin H. Johnson" <robbat2 at gentoo.org>
To: developers at freebase.com
Sent: Saturday, July 19, 2008 10:29:41 AM (GMT-0800) America/Los_Angeles
Subject: [Developers] corruption in first column of freebase-datadump-quadruples.tsv.bz2 for 2008/07/01

There's some weird non-GUID data in the first column.
Easiest way to find it is:
# grep '^/guid/[^9]' freebase-datadump-quadruples.tsv.20080701 -C1

These strings appear in the first column, and aren't referenced anywhere else.

The output from the above command follows:

/guid/9202a8c04000641f80000000087fcb24	/type/content/length		2932
/guid/\nTried to play the station on Z	/user/zsi_editorial/editorial/comment/quality		66
/guid/\nTried to play the station on Z	/user/zsi_editorial/editorial/comment/reviewed		2008-06-29T12:17:35.0000Z
/guid/9202a8c04000641f8000000000000624	/type/object/type	/type/property	
--
/guid/9202a8c04000641f80000000087fd16c	reverse_of:/location/location/geolocation	/guid/9202a8c04000641f80000000087fd082	
/guid/\nPlaylist download failed!" 920	/user/zsi_editorial/editorial/comment/reviewed		2008-06-23T08:08:37.0006Z
/guid/\nPlaylist download failed!" 920	/user/zsi_editorial/editorial/comment/quality		59
/guid/9202a8c04000641f8000000000000413	/type/usergroup/member	/guid/9202a8c04000641f80000000042b2e87	
--
/guid/9202a8c04000641f80000000087fd0a5	/common/document/in_reply_to	/guid/9202a8c04000641f80000000087903f1	
/guid/\nfor her Fresh Brew by becoming	/type/object/name	/lang/en	Caffeinated Ponderings
/guid/9202a8c04000641f8000000000000258	/type/property/schema	/guid/9202a8c04000641f8000000000000023	
--
/guid/9202a8c04000641f80000000087fd134	/business/employment_tenure/from		1978
/guid/check Blog for updates" 9202a8c0	/type/object/name	/lang/en	Sea Kayak Podcasts.com
/guid/9202a8c04000641f800000000000020b	/type/property/master_property	/guid/9202a8c04000641f800000000000000a	
--
/guid/9202a8c04000641f80000000087fd08e	/metropolitan_transit/transit_stop/transit_lines	/guid/9202a8c04000641f80000000087fd018	
/guid/\nat System.Web.UI.Page.ProcessR	/user/zsi_editorial/editorial/comment/quality		12
/guid/\nat System.Web.UI.Page.ProcessR	/user/zsi_editorial/editorial/comment/reviewed		2008-06-23T03:28:44.0000Z
/guid/9202a8c04000641f8000000000000333	/type/object/type	/type/usergroup	
--
/guid/9202a8c04000641f80000000087fc8e5	/business/employment_tenure/person	/guid/9202a8c04000641f80000000087fc8e7	
/guid/\nImage does not appear on devic	/user/zsi_editorial/editorial/comment/reviewed		2008-06-23T08:15:02.0004Z
/guid/\nImage does not appear on devic	/user/zsi_editorial/editorial/comment/quality		73
/guid/9202a8c04000641f8000000000000418	reverse_of:/community/discussion_thread/topic	/guid/9202a8c04000641f8000000007b1af05	
--
/guid/9202a8c04000641f80000000087fd033	/type/object/type	/common/topic	
/guid/\nThe freebase entry for Wallstr	/user/zsi_editorial/editorial/comment/reviewed		2008-07-01T06:38:23.0000Z
/guid/\nThe freebase entry for Wallstr	/user/zsi_editorial/editorial/comment/quality		77
/guid/9202a8c04000641f800000000000038f	/type/permission/controls	/guid/9202a8c04000641f8000000000000398	
--
/guid/9202a8c04000641f80000000087fcfa4	/type/content/text_encoding	/guid/9202a8c04000641f800000000000388e	
/guid/\nPlayback on the device stutter	/user/zsi_editorial/editorial/comment/quality		49
/guid/\nPlayback on the device stutter	/user/zsi_editorial/editorial/comment/reviewed		2008-06-27T06:30:45.0007Z
/guid/9202a8c04000641f800000000000049f	/type/permission/controls	/guid/9202a8c04000641f80000000000008f3	
--
/guid/9202a8c04000641f80000000087fd070	/metropolitan_transit/transit_stop/transit_lines	/guid/9202a8c04000641f80000000087fd018	
/guid/\nE'Mu" 9202a8c04000641f80000000	/type/object/key	/guid/9202a8c04000641f8000000001143432	ARTIST212836
/guid/\nE'Mu" 9202a8c04000641f80000000	/type/object/key	/guid/9202a8c04000641f800000000114342d	735ae537-9825-40b0-af80-3e342ddd5a55
/guid/9202a8c04000641f800000000000012b	/type/object/key	/boot	has_left_order
--
/guid/9202a8c04000641f80000000087fc680	reverse_of:/metropolitan_transit/transit_stop/service_hours	/guid/9202a8c04000641f8000000000cafd67	
/guid/\n-Sales Agents" 9202a8c04000641	/type/object/name	/lang/en	Professional Development for Women and Minorities
/guid/9202a8c04000641f8000000000000343	/type/object/name	/lang/en	Domain owners
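
To summarize output like the above without reading the context lines by eye, a small Python sketch can tally the distinct malformed first-column values. As an assumption based only on the intact rows shown, a well-formed value is taken to be "/guid/" followed by exactly 32 lowercase hex digits. The function name corrupt_subjects is hypothetical.

```python
import re
from collections import Counter

# Assumption: valid first columns are "/guid/" plus 32 lowercase hex digits.
GUID_RE = re.compile(rb"/guid/[0-9a-f]{32}")


def corrupt_subjects(lines):
    """Count first-column values that are not well-formed GUIDs.

    `lines` is any iterable of bytes rows, e.g. a file opened in "rb" mode.
    """
    counts = Counter()
    for line in lines:
        first_column = line.split(b"\t", 1)[0]
        if GUID_RE.fullmatch(first_column) is None:
            counts[first_column] += 1
    return counts
```

Passing the open dump file to corrupt_subjects and iterating over the result's most_common() lists each bad value with the number of rows it appears in.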


-- 
Robin Hugh Johnson
Gentoo Linux Developer & Infra Guy
E-Mail     : robbat2 at gentoo.org
GnuPG FP   : 11AC BA4F 4778 E3F6 E4ED  F38E B27B 944E 3488 4E85


