[Data-modeling] Multi-disc release normalization prototype
Christopher R. Maden
crism at metaweb.com
Thu Apr 17 05:40:04 UTC 2008
OK: 1.7 million triples later, a whole bunch of stuff is changed on sandbox.
<URL:
http://sandbox.freebase.com/view/guid/9202a8c04000641f8000000007e4a109 >
is a Mass Data Operation with a brief description of what happened.
I took all the releases which MusicBrainz gave a title ending in “(disc
1)” or “(bonus disc)” or similar things and juggled them around. See
for example Pink Floyd’s _The Wall_ <URL:
http://sandbox.freebase.com/view/en/the_wall > or The Beatles’ _The
Beatles_ <URL:
http://sandbox.freebase.com/view/guid/9202a8c04000641f8000000002f7114f >.
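
The title matching described above might be sketched roughly as follows. This is only an illustration, not the code that was actually run: the pattern DISC_SUFFIX and the function split_disc_suffix are hypothetical names, and the pattern covers only the "(disc N)" and "(bonus disc)" forms quoted above, not the "similar things".

```python
import re

# Hypothetical pattern: a trailing parenthetical such as "(disc 1)" or
# "(bonus disc)". Other suffix variants are deliberately not handled.
DISC_SUFFIX = re.compile(
    r"\s*\((?P<label>(?:bonus\s+)?disc(?:\s+\d+)?)\)\s*$",
    re.IGNORECASE,
)


def split_disc_suffix(title):
    """Return (base_title, disc_label); disc_label is None without a suffix."""
    match = DISC_SUFFIX.search(title)
    if match is None:
        return title, None
    # Everything before the suffix is treated as the album title.
    return title[:match.start()], match.group("label")
```

A title with no such suffix comes back unchanged with a label of None, so single-disc releases would be left alone.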
Take a look at your favorite multi-disc album or box set. What you
should see is a single album; its releases should be similarly simple.
However, those releases should also be Multi-Part Musical Release
instances, and should have Musical Release Components that are the
individual discs.
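
As a rough picture of that shape, here is a minimal sketch that groups disc rows into one release with components. The class and field names are invented for illustration and are not the Freebase schema; grouping on an exact (artist, title) match is an assumption of this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class ReleaseComponent:
    label: str         # e.g. "disc 1"
    source_title: str  # title as originally imported


@dataclass
class MultiPartRelease:
    artist: str
    title: str
    components: list = field(default_factory=list)


def group_discs(rows):
    """Group (artist, base_title, disc_label, source_title) rows.

    Rows are merged only when artist and base title match exactly
    (an assumption of this sketch).
    """
    releases = {}
    for artist, base_title, label, source_title in rows:
        key = (artist, base_title)
        release = releases.setdefault(key, MultiPartRelease(artist, base_title))
        release.components.append(ReleaseComponent(label, source_title))
    return list(releases.values())
```

Feeding in two rows for the same artist and base title yields one MultiPartRelease with two ReleaseComponent entries.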
Please let me know if you find any serious problems!
There are two things I already know about:
1) When MusicBrainz has slightly different titles for the discs of one
release, we did not reconcile them. There are too many similarly-named
albums out there, and I wanted to err on the conservative side. As
MusicBrainz’s data gets cleaner, repeated runs of this reconciliation
will pick that data up.
2) I did not change the names of discs. In some cases, a past
reconciliation merged a disc with the Wikipedia article for its album;
this happened to disc 2 of both _The Wall_ and _The Beatles_. The
shorter name has persisted through a bunch of gardening operations. A
future version of this process will correct the disc names in most
cases.
~Chris
--
Christopher R. Maden
Data Architect
Metaweb Technologies, Inc.
<URL: http://www.metaweb.com/ >