[Developers] Upcoming data download

Will Fitzgerald will at powerset.com
Fri Jun 27 16:53:32 UTC 2008


I'm doing some work on the March 2008 data, and I've noticed that in some
cases the tab-separated value files have names (column 1) that themselves
contain a tab.

For example, line 1949205 of data/common/document.tsv contains the name
"Subject:        Character Race vs Species", with a tab character
separating "Subject:" and "Character".
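A minimal Python sketch for finding such rows. It assumes every row of the file should have a fixed number of tab-separated fields, so a name containing a tab shows up as a row with one field too many. The two-column sample and `expected_fields=2` are hypothetical placeholders, not the real layout of document.tsv.

```python
import io
from typing import Iterable, List, Tuple


def find_ragged_rows(lines: Iterable[str], expected_fields: int) -> List[Tuple[int, List[str]]]:
    """Return (1-based line number, fields) for each row whose field count
    differs from expected_fields, e.g. because a value contains a raw tab."""
    ragged = []
    for lineno, line in enumerate(lines, start=1):
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) != expected_fields:
            ragged.append((lineno, fields))
    return ragged


if __name__ == "__main__":
    # Hypothetical two-column sample standing in for data/common/document.tsv;
    # the real file's column layout is an assumption here.
    sample = io.StringIO(
        "Of Mice and Men\tdoc-1\n"
        "Subject:\tCharacter Race vs Species\tdoc-2\n"
    )
    ragged = find_ragged_rows(sample, expected_fields=2)
    for lineno, fields in ragged:
        print(lineno, fields)
    # The text after the embedded tab lands in the second field;
    # collecting the distinct values gives the list of unique bad entries.
    print(sorted({fields[1] for _, fields in ragged}))
```

On the real file you would pass an open handle instead, e.g. `open("data/common/document.tsv", encoding="utf-8")` (the encoding is also an assumption), with the actual column count.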

If this could be fixed in the upcoming data release (today?), that would be
great. I've found only about 14 unique instances of this problem. The 'bad'
entries that end up in the second column (the text after the embedded tab)
are:

 Points to the discogs page for this album.
 Points to the discogs page for this artist.
 This link type is used for Amazon ASINs.
Character Race vs Species
Indicates a webpage with a biography of an artist
Indicates the official home page for an artist.
Of Mice and Men (Houston Grand Opera feat. conductor: Patrick Summers) (disc 2)
Points to the discogs page for this artist.
This link type is used for Amazon ASINs
This link type is used for Amazon ASINs.
downloads\, Pearl Jam FAQ
has an official homepage at
Max Wedltscheko tter
???????? ??????????.
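On the producer side, one possible fix (an assumption, not how the release is actually generated) is to backslash-escape tabs, newlines and backslashes in each field before joining with real tabs, in the style of PostgreSQL's COPY text format. The function names below are hypothetical.

```python
from typing import List


def escape_tsv_field(value: str) -> str:
    """Backslash-escape characters that would break a TSV row.
    Backslashes are escaped first so the encoding stays reversible."""
    return (
        value.replace("\\", "\\\\")
        .replace("\t", "\\t")
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def format_tsv_row(fields: List[str]) -> str:
    """Join escaped fields with real tabs and terminate the row with a newline."""
    return "\t".join(escape_tsv_field(f) for f in fields) + "\n"


if __name__ == "__main__":
    row = format_tsv_row(["Subject:\tCharacter Race vs Species", "doc-2"])
    # The embedded tab is now a backslash followed by 't',
    # so the row splits back into exactly two fields.
    print(row.rstrip("\n").split("\t"))
```

Consumers would then need to unescape. Since entries such as "downloads\, Pearl Jam FAQ" already contain backslashes, escaping them changes existing values; simply replacing embedded tabs with a space avoids that, at the cost of being lossy.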

Will Fitzgerald


