[Data-modeling] Complete rethinking of the publishing schemas, or oops!
Jeff Prucher
jeff at metaweb.com
Mon Mar 31 18:02:19 UTC 2008
I haven't removed anything -- you can still add named contents using the
"published work" and "publication" co-types as always. I'm basically
introducing a denormalization to reduce the input of bad data (e.g.,
contributors of an essay, story, or introduction being listed as authors of
the whole work). Annoyingly, I found a couple examples of this in Freebase,
and I totally failed to note the topics, so now I have no idea what they
were. Ah well.
Jeff
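
To make the two routes concrete, here is a minimal sketch of a book that records contributors both ways: through named contents and through the denormalized "contributing authors" property. The class names, field names, and sample data are all hypothetical stand-ins for illustration, not the actual Freebase schema or API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PublishedWork:
    """A named piece collected in a publication (essay, story, introduction)."""
    name: str
    authors: List[str] = field(default_factory=list)


@dataclass
class Book:
    """Hypothetical stand-in for the "book" type discussed in this thread."""
    name: str
    # Authors of the whole work only.
    authors: List[str] = field(default_factory=list)
    # Denormalized shortcut: contributors whose piece may have no known name.
    contributing_authors: List[str] = field(default_factory=list)
    # Named contents, still available when the title of the piece is known.
    contents: List[PublishedWork] = field(default_factory=list)


def all_contributors(book: Book) -> List[str]:
    """Merge both routes into one de-duplicated list, preserving order."""
    merged: List[str] = []
    from_contents = [a for work in book.contents for a in work.authors]
    for name in book.contributing_authors + from_contents:
        if name not in merged:
            merged.append(name)
    return merged


# Hypothetical sample data.
anthology = Book(
    name="Hypothetical Anthology",
    contributing_authors=["Contributor A"],
    contents=[
        PublishedWork("Hypothetical Essay", ["Contributor B"]),
        PublishedWork("Hypothetical Story", ["Contributor A"]),
    ],
)
contributors = all_contributors(anthology)
```

In this sketch the contributors never appear in `authors`, which is the bad-data case the new property is meant to discourage.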
> -----Original Message-----
> From: data-modeling-bounces at freebase.com
> [mailto:data-modeling-bounces at freebase.com] On Behalf Of Ed Laurent
> Sent: Thursday, March 27, 2008 7:14 PM
> To: Freebase data modeling mailing list
> Subject: Re: [Data-modeling] Complete rethinking of the
> publishing schemas, or oops!
>
> This looks really good, Jeff, and addresses many of the issues
> we have discussed in the past, especially those having to do
> with reciprocation of authorship. However, I'm not that
> thrilled with the "contributing authors" property. My
> impression is that it overcomes the problem of requiring a
> name for the contribution by removing the ability to add the
> name if it is available. This is a big problem for me. I will
> often need to reference the author and chapter title of a
> contribution to an edited book. It appears this is not
> possible using your new schema.
>
> One possibility is to remove the "contributing authors"
> property of the Book type and create several specialty book
> types that contain Book as a co-type. For example, "Edited
> book" could contain a chapter property that is linked to a
> Chapter type with author disambiguation (or vice versa). The
> disambiguation could get us back full circle to your issue of
> not caring about authorship in some situations (e.g.,
> reference book). Maybe "Reference book" could be another
> specialty form of book?
>
> -Ed
>
> On Thu, Mar 27, 2008 at 6:37 PM, Jeff Prucher <jeff at metaweb.com> wrote:
>
> Freebase is working on a large import of data for books, and in the
> process people have discovered some problems with the current models
> for books, book editions, and authors. (By "current models", I mean
> the ones that we pushed out last week, sad to say.) Fortunately, I
> think that the revision I'm proposing here is actually better, anyway.
> I'm just sorry I didn't think of it earlier.
>
> The biggest problem is the CVT I introduced between "author" and
> "written work". This turns out to burn a large amount of data for
> comparatively little gain; it's the sort of thing that would be fine
> if we were only going to have books numbering in the thousands, but
> the current load is probably going to be much larger, and as Freebase
> expands, it will just keep growing. So saving a little space now seems
> like a good option. The new model I'm proposing keeps the "author" and
> "written work" types, but links them directly via two simple
> properties, rather than one property with a CVT: on "written work",
> the properties are "author" and "editor"; the reverse properties on
> "author" are "works written" and "works edited". This maintains the
> distinction between the two roles, but without the CVT. The other
> authorial types we were linking from the CVT (poet, reviewer,
> playwright, etc.) refer more to the final product than the type of
> person doing the writing, so I don't think they're strictly necessary.
> (It was a fairly arbitrary set, anyway -- the difference between a
> poet and a playwright is probably not significantly greater than that
> between a novelist and journalist, say.) So in the newest model, the
> mode of authorship is entirely omitted (except for editor) from the
> author/written work relationship. The mode of authorship in a given
> instance can be determined by the cotypes on the written work, if so
> desired.
>
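
The change described above can be sketched as a small transformation: each mediating (CVT-style) record is collapsed into a direct "author" or "editor" property on the written work, and the reverse properties on the author are derived from those. The record layout, key names, and sample data here are assumptions made for illustration only; the thread does not spell out the actual CVT's fields.

```python
from collections import defaultdict

# Hypothetical old-style data: one mediating record per author/work pair,
# carrying the mode of authorship as a role.
old_cvt_records = [
    {"author": "Author A", "work": "Work 1", "role": "author"},
    {"author": "Author B", "work": "Work 1", "role": "editor"},
    {"author": "Author A", "work": "Work 2", "role": "poet"},
]


def to_direct_links(records):
    """Collapse mediating records into direct properties on each work.

    Only the author/editor distinction is kept; every other role
    (poet, playwright, ...) is folded into plain "author".
    """
    works = defaultdict(lambda: {"author": [], "editor": []})
    for rec in records:
        prop = "editor" if rec["role"] == "editor" else "author"
        works[rec["work"]][prop].append(rec["author"])
    return dict(works)


def reverse_links(works):
    """Derive the reverse properties on author from the work properties."""
    authors = defaultdict(lambda: {"works_written": [], "works_edited": []})
    for title, props in works.items():
        for name in props["author"]:
            authors[name]["works_written"].append(title)
        for name in props["editor"]:
            authors[name]["works_edited"].append(title)
    return dict(authors)


works = to_direct_links(old_cvt_records)
authors = reverse_links(works)
```

Three mediating records become plain links on two works, and the "poet" role is dropped from the relationship, which matches the idea that the mode of authorship can be read off the work's cotypes instead.
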
> To accommodate the data-load better, I've also added a new property to
> "book edition", which will link it directly to the "author" type,
> rather than having the only link to "author" be through the "book"
> type. This is a bit of a denormalization, but allows us to accurately
> associate book editions with authors without necessarily having to
> reconcile different editions of the same book together. (This
> reconciliation is desirable -- don't get me wrong -- but it can be
> very difficult.) This property essentially mimics the way "musical
> artist" and "musical track" are related in Freebase -- artists are
> linked directly to both albums (which contain tracks) and the tracks
> themselves. It also makes the book schema more easily compatible with
> MARC, and probably other bibliographic schemata, which do not
> reconcile separate book editions together, and additionally will
> hopefully make it easier for naïve users to input the books on their
> shelves directly without having to figure out the whole book/book
> edition relationship. Ideally, of course, I'd like to see all book
> editions reconciled to their books, but this can be done post-hoc
> either by automated processes or geeky bibliophiles -- I mean, the
> community. :)
>
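
As a rough illustration of the denormalization just described, the sketch below gives each edition its own author link, so editions can be found by author even when they have not yet been reconciled to a book. The class, field names, and sample data are hypothetical and do not reflect the real property names in the sandbox schema.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class BookEdition:
    """Hypothetical stand-in for the "book edition" type."""
    title: str
    authors: List[str]          # direct link to "author" (the new property)
    book: Optional[str] = None  # link to the abstract book; None until reconciled


def editions_by_author(editions: List[BookEdition]) -> Dict[str, List[str]]:
    """Index edition titles by author without going through the book."""
    index: Dict[str, List[str]] = {}
    for edition in editions:
        for name in edition.authors:
            index.setdefault(name, []).append(edition.title)
    return index


def unreconciled(editions: List[BookEdition]) -> List[str]:
    """Titles of editions still waiting to be attached to a book post hoc."""
    return [e.title for e in editions if e.book is None]


# Hypothetical sample data: two editions of the same work, one reconciled.
editions = [
    BookEdition("Hypothetical Novel (hardcover)", ["Author A"], book="Hypothetical Novel"),
    BookEdition("Hypothetical Novel (paperback)", ["Author A"]),
]
index = editions_by_author(editions)
pending = unreconciled(editions)
```

Both editions are reachable from the author right away, and the paperback simply shows up as pending until someone (or some process) reconciles it.
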
> A final property being added to book, which is completely incidental
> to the other problems, is "contributing authors". This is another
> denormalization of sorts -- in the original schema, the only way to
> indicate that someone had contributed to a book was via the "contents"
> property that connects "published work" and "publication". This
> requires that the user know the name of the work that is collected in
> the book, which is not always available, and in some cases (such as
> textbooks and some reference books) not really applicable.
>
> I'm thinking about leaving the illustrated work/illustration
> instance/illustrator CVT relationship as it is, since I'm not
> convinced removing the CVT will really help matters that much, but I'd
> like to hear people's thoughts on that as well.
>
> The affected types are:
> http://sandbox.freebase.com/view/schema/book/author
> http://sandbox.freebase.com/view/schema/book/written_work
> http://sandbox.freebase.com/view/schema/book/book_edition
> http://sandbox.freebase.com/view/schema/book/book
>
> I put in some sample data to show the new relationships:
> http://sandbox.freebase.com/view/en/jonathan_lethem
>
> Please let me know what you think.
>
> Thanks,
>
> Jeff Prucher
> Type Librarian & Ontologist
> Metaweb Technologies, Inc.