[Data-modeling] Library of Congress and Dewey Classifications

Praveen Paritosh paritosh at metaweb.com
Tue Mar 18 21:19:17 UTC 2008


Jeff Prucher wrote:
>   
> After poking around a bit more, it does seem like the LoC data is less
> completely weird than the Dewey data. What variation I've found (and from an
> admittedly small sample) is that sometimes the entire code is different at
> different libraries, but within that variation, the codes are stable at the
> edition level.  E.g., for a 1976 edition of "The Wealth of Nations", one
> library has "AC7 .S59 1976" and another has "HB161 .S65 1976", but I didn't
> see any with something like "HB161 .S64". So maybe LC classification should
> stay on the book edition.
> 
> The codes, regardless of how we do this, should definitely be
> machine-readable strings, rather than text strings (as they are now).  I'm
> not sure about modeling the entirety of Dewey/LoC classifications as topics
> -- we could do it (modulo any copyright issues -- I haven't looked into
> that), but I wonder which would be more useful -- a simple string or a
> topic?

For the top-level 1,000 Dewey Decimal Classification (DDC) numbers, 
we have both a number and a label[1], e.g.,
	121 Epistemology (Theory of knowledge)

For the vast majority of the rest (e.g., the 170,000 DDC numbers that 
we currently have but have not loaded), we do not know the corresponding 
labels, which seem to be copyrighted information that OCLC[2] provides.

However, because we have alternate subject information for those 
books, one can map between /book/written_work/subjects and DDC 
numbers. The subjects themselves are topics, so it seems unnecessary 
to model DDC numbers as topics as well.
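
To make the mapping idea concrete, here is a rough sketch in Python. It 
assumes each book record carries a DDC number as a plain string plus a 
list of subject topic names. The record layout, the sample values, and 
the function names are all made up for illustration and are not our 
actual schema or data. It simply counts which subjects co-occur with 
each DDC number and picks the most frequent one.

```python
from collections import Counter, defaultdict

# Hypothetical sample records: each book has a DDC number (a plain string)
# and a list of subject topic names. Values are illustrative only.
books = [
    {"ddc": "121", "subjects": ["Epistemology"]},
    {"ddc": "121", "subjects": ["Epistemology", "Philosophy"]},
    {"ddc": "330", "subjects": ["Economics"]},
]


def ddc_to_subjects(records):
    """Count how often each subject co-occurs with each DDC number."""
    counts = defaultdict(Counter)
    for record in records:
        counts[record["ddc"]].update(record["subjects"])
    return counts


def best_subject(counts, ddc):
    """Return the most frequent subject for a DDC number, or None if unseen."""
    if ddc not in counts or not counts[ddc]:
        return None
    return counts[ddc].most_common(1)[0][0]


mapping = ddc_to_subjects(books)
print(best_subject(mapping, "121"))
```

With something like this, an unlabeled DDC number gets a usable subject 
topic from the books that carry it, without needing the OCLC labels.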

[1] http://www.tnrdlib.bc.ca/dewey.html
[2] http://www.oclc.org/dewey/

Thanks,
Praveen.

