[Data-modeling] Complete rethinking of the publishing schemas, or oops!

Jeff Prucher jeff at metaweb.com
Mon Mar 31 18:02:19 UTC 2008


I haven't removed anything -- you can still add named contents using the
"published work" and "publication" co-types as always. I'm basically
introducing a denormalization to reduce the input of bad data (e.g.,
contributors of an essay, story, or introduction being listed as authors of
the whole work).  Annoyingly, I found a couple examples of this in Freebase,
and I totally failed to note the topics, so now I have no idea what they
were. Ah well.
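
To make the denormalization concrete, here is a minimal sketch in Python. It is not the Freebase schema itself: the `Book` class, its field names, and the `all_contributors` helper are hypothetical stand-ins for the "author", "contributing authors", and "contents" properties, and the sample names are placeholders. It only illustrates that a contributor can be recorded with or without a named piece, and never has to be listed as an author of the whole work.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Book:
    """Hypothetical stand-in for a book topic; not the real Freebase type."""
    name: str
    # Authors of the whole work.
    authors: List[str] = field(default_factory=list)
    # Denormalized list: contributors whose pieces need not be named.
    contributing_authors: List[str] = field(default_factory=list)
    # Named contents as (title, author) pairs, still available when known.
    contents: List[Tuple[str, str]] = field(default_factory=list)


def all_contributors(book: Book) -> List[str]:
    """Merge both sources of contributor names, keeping first-seen order."""
    names: List[str] = []
    for name in book.contributing_authors + [a for _, a in book.contents]:
        if name not in names:
            names.append(name)
    return names


# Placeholder data, invented purely for illustration.
anthology = Book(
    name="Example Anthology",
    authors=["Writer A"],
    contributing_authors=["Writer B"],
    contents=[("Story One", "Writer C"), ("Essay Two", "Writer B")],
)
contributors = all_contributors(anthology)
```

In this sketch "Writer B" and "Writer C" both show up as contributors, while `authors` stays limited to whoever wrote the work as a whole.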

Jeff

> -----Original Message-----
> From: data-modeling-bounces at freebase.com 
> [mailto:data-modeling-bounces at freebase.com] On Behalf Of Ed Laurent
> Sent: Thursday, March 27, 2008 7:14 PM
> To: Freebase data modeling mailing list
> Subject: Re: [Data-modeling] Complete rethinking of the publishing
> schemas, or oops!
> 
> This looks really good Jeff and addresses many of the issues 
> we have discussed in the past, especially those having to do 
> with reciprocation of authorship. However, I'm not that 
> thrilled with the "contributing authors" property. My 
> impression is that it overcomes the problem of requiring a 
> name for the contribution by removing the ability to add the 
> name if it is available. This is a big problem for me. I will 
> often need to reference the author and chapter title of a 
> contribution to an edited book. It appears this is not 
> possible using your new schema.
> 
> One possibility is to remove the "contributing authors" 
> property of the Book type and create several specialty book 
> types that contain Book as a co-type. For example, "Edited 
> book" could contain a chapter property that is linked to a 
> Chapter type with author disambiguation (or vice versa). The 
> disambiguation could get us back full circle to your issue of 
> not caring about authorship in some situations (e.g., 
> reference book). Maybe "Reference book" could be another 
> specialty form of book?
> 
> -Ed
> 
> 
> 
> On Thu, Mar 27, 2008 at 6:37 PM, Jeff Prucher 
> <jeff at metaweb.com> wrote:
> 
> 
> 	Freebase is working on a large import of book data, and in the process
> 	people have discovered some problems with the current models for books,
> 	book editions, and authors. (By "current models", I mean the ones that
> 	we pushed out last week, sad to say.) Fortunately, I think that the
> 	revision I'm proposing here is actually better, anyway. I'm just sorry
> 	I didn't think of it earlier.
> 	
> 	The biggest problem is the CVT (compound value type) I introduced
> 	between "author" and "written work". This turns out to burn a large
> 	amount of data for comparatively little gain; it's the sort of thing
> 	that would be fine if we were only going to have books numbering in the
> 	thousands, but the current load is probably going to be much larger, and
> 	as Freebase expands, it will just keep growing. So saving a little space
> 	now seems like a good option. The new model I'm proposing keeps the
> 	"author" and "written work" types, but links them directly via two
> 	simple properties, rather than one property with a CVT: on "written
> 	work", the properties are "author" and "editor"; the reverse properties
> 	on "author" are "works written" and "works edited". This maintains the
> 	distinction between the two roles, but without the CVT. The other
> 	authorial types we were linking from the CVT (poet, reviewer,
> 	playwright, etc.) refer more to the final product than the type of
> 	person doing the writing, so I don't think they're strictly necessary.
> 	(It was a fairly arbitrary set, anyway -- the difference between a poet
> 	and a playwright is probably not significantly greater than that between
> 	a novelist and journalist, say.) So in the newest model, the mode of
> 	authorship is entirely omitted (except for editor) from the
> 	author/written work relationship. The mode of authorship in a given
> 	instance can be determined by the co-types on the written work, if so
> 	desired.
> 	
> 	To accommodate the data load better, I've also added a new property to
> 	"book edition", which will link it directly to the "author" type, rather
> 	than having the only link to "author" be through the "book" type. This
> 	is a bit of a denormalization, but allows us to accurately associate
> 	book editions with authors without necessarily having to reconcile
> 	different editions of the same book together. (This reconciliation is
> 	desirable, don't get me wrong, but it can be very difficult.) This
> 	property essentially mimics the way "musical artist" and "musical track"
> 	are related in Freebase -- artists are linked directly to both albums
> 	(which contain tracks) and the tracks themselves. It also makes the book
> 	schema more easily compatible with MARC, and probably other
> 	bibliographic schemata, which do not reconcile separate book editions
> 	together, and additionally will hopefully make it easier for naïve users
> 	to input the books on their shelves directly without having to figure
> 	out the whole book/book edition relationship. Ideally, of course, I'd
> 	like to see all book editions reconciled to their books, but this can be
> 	done post-hoc either by automated processes or geeky bibliophiles -- I
> 	mean, the community. :)
> 	
> 	A final property being added to book, which is completely incidental to
> 	the other problems, is "contributing authors". This is another
> 	denormalization of sorts -- in the original schema, the only way to
> 	indicate that someone had contributed to a book was via the "contents"
> 	property that connects "published work" and "publication". This requires
> 	that the user know the name of the work that is collected in the book,
> 	which is not always available, and in some cases (such as textbooks and
> 	some reference books) not really applicable.
> 	
> 	I'm thinking about leaving the illustrated work/illustration
> 	instance/illustrator CVT relationship as it is, since I'm not convinced
> 	removing the CVT will really help matters that much, but I'd like to
> 	hear people's thoughts on that as well.
> 	
> 	The affected types are:
> 	http://sandbox.freebase.com/view/schema/book/author
> 	http://sandbox.freebase.com/view/schema/book/written_work
> 	http://sandbox.freebase.com/view/schema/book/book_edition
> 	http://sandbox.freebase.com/view/schema/book/book
> 	
> 	I put in some sample data to show the new relationships:
> 	http://sandbox.freebase.com/view/en/jonathan_lethem
> 	
> 	Please let me know what you think.
> 	
> 	Thanks,
> 	
> 	Jeff Prucher
> 	Type Librarian & Ontologist
> 	Metaweb Technologies, Inc.
> 	
> 
> 
> 
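
The direct author/editor links proposed in the original message quoted above could be sketched roughly as follows. This is only an illustration under stated assumptions: the `Store` class, its `link` and `get` methods, and the placeholder topic names are hypothetical and are not part of Freebase or its API. Only the four property names ("author", "editor", "works written", "works edited") come from the thread. The point is that each forward property on a written work has a plain reverse property on the author, with no intermediate CVT node.

```python
from collections import defaultdict

# Forward properties on "written work" mapped to their reverse
# properties on "author", as named in the proposal.
REVERSE = {"author": "works written", "editor": "works edited"}


class Store:
    """Hypothetical in-memory link store; not the Freebase API."""

    def __init__(self):
        # topic -> property -> set of linked topics
        self.links = defaultdict(lambda: defaultdict(set))

    def link(self, work, prop, person):
        """Record the forward link and its reverse in one step."""
        self.links[work][prop].add(person)
        self.links[person][REVERSE[prop]].add(work)

    def get(self, topic, prop):
        """Return the linked topics for one property, sorted by name."""
        return sorted(self.links[topic][prop])


# Placeholder topics, invented purely for illustration.
store = Store()
store.link("Work A", "author", "Person X")
store.link("Work A", "author", "Person Y")
store.link("Work B", "editor", "Person X")

written_by_x = store.get("Person X", "works written")
edited_by_x = store.get("Person X", "works edited")
authors_of_a = store.get("Work A", "author")
```

The author and editor roles stay distinct because they are separate properties; the mode of authorship (poet, playwright, and so on) is not stored on the link at all, which matches the proposal to read it off the written work's co-types instead.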


