[Data-modeling] Complete rethinking of the publishing schemas, or oops!
Ed Laurent
spatial.db at gmail.com
Fri Mar 28 02:13:54 UTC 2008
This looks really good, Jeff, and addresses many of the issues we have
discussed in the past, especially those having to do with reciprocation of
authorship. However, I'm not that thrilled with the "contributing authors"
property. My impression is that it overcomes the problem of requiring a name
for the contribution by removing the ability to add the name if it is
available. This is a big problem for me. I will often need to reference the
author and chapter title of a contribution to an edited book. It appears
this is not possible using your new schema.
One possibility is to remove the "contributing authors" property of the Book
type and create several specialty book types that contain Book as a co-type.
For example, "Edited book" could contain a chapter property that is linked
to a Chapter type with author disambiguation (or vice versa). The
disambiguation could bring us full circle, back to your point about not
caring about authorship in some situations (e.g., reference books). Maybe
"Reference book" could be another specialty form of book?
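A rough Python sketch of this suggestion follows, meant only as an illustration. Everything in it is an assumption and is not part of the sandbox schema:

- the class names (`Author`, `Book`, `Chapter`, `EditedBook`),
- the property names,
- the use of subclassing to stand in for co-typing.

The chapter title is made optional. That keeps a contributor's credit when the title is unknown, while still allowing an author/chapter-title reference when it is available.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Author:
    name: str


@dataclass
class Book:
    title: str
    authors: List[Author] = field(default_factory=list)


@dataclass
class Chapter:
    # Hypothetical type: the title is optional, so a contribution can be
    # recorded even when only the contributor's name is known.
    title: Optional[str]
    authors: List[Author] = field(default_factory=list)


@dataclass
class EditedBook(Book):
    # Subclassing stands in for co-typing: every EditedBook is also a Book.
    editors: List[Author] = field(default_factory=list)
    chapters: List[Chapter] = field(default_factory=list)

    def contributors(self) -> List[Author]:
        """Authors credited on at least one chapter, in first-seen order."""
        seen: List[Author] = []
        for chapter in self.chapters:
            for author in chapter.authors:
                if author not in seen:
                    seen.append(author)
        return seen

    def citations_for(self, author: Author) -> List[str]:
        """Author/chapter-title references, only for chapters with a title."""
        return [
            f'{author.name}, "{chapter.title}", in {self.title}'
            for chapter in self.chapters
            if author in chapter.authors and chapter.title is not None
        ]


# Usage with made-up data:
roe = Author("Jane Roe")
doe = Author("John Doe")
volume = EditedBook(
    title="Essays on Mapping",
    editors=[Author("Sam Editor")],
    chapters=[Chapter("Projections", [roe]), Chapter(None, [doe])],
)
print([a.name for a in volume.contributors()])  # ['Jane Roe', 'John Doe']
print(volume.citations_for(roe))  # ['Jane Roe, "Projections", in Essays on Mapping']
```

A "Reference book" type could be sketched the same way, as another `Book` subclass that simply carries no chapter-level authorship.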
-Ed
On Thu, Mar 27, 2008 at 6:37 PM, Jeff Prucher <jeff at metaweb.com> wrote:
> Freebase is working on doing a large import of data for books, and in the
> process people have discovered some problems with the current models for
> books, book editions, and authors. (By "current models", I mean the ones
> that we pushed out last week, sad to say.) Fortunately, I think that the
> revision I'm proposing here is actually better, anyway. I'm just sorry I
> didn't think of it earlier.
>
> The biggest problem is the CVT I introduced between "author" and "written
> work". This turns out to burn a large amount of data for comparatively
> little gain; it's the sort of thing that would be fine if we were only
> going to have books numbering in the thousands, but the current load is
> probably going to be much larger, and as Freebase expands, it will just
> keep growing. So saving a little space now seems like a good option. The
> new model I'm proposing keeps the "author" and "written work" types, but
> links them directly via two simple properties, rather than one property
> with a CVT: on "written work", the properties are "author" and "editor";
> the reverse properties on "author" are "works written" and "works edited".
> This maintains the distinction between the two roles, but without the CVT.
> The other authorial types we were linking from the CVT (poet, reviewer,
> playwright, etc.) refer more to the final product than to the type of
> person doing the writing, so I don't think they're strictly necessary. (It
> was a fairly arbitrary set, anyway -- the difference between a poet and a
> playwright is probably not significantly greater than that between a
> novelist and a journalist, say.) So in the newest model, the mode of
> authorship is entirely omitted (except for editor) from the author/written
> work relationship. The mode of authorship in a given instance can be
> determined by the cotypes on the written work, if so desired.
>
> To accommodate the data load better, I've also added a new property to
> "book edition", which will link it directly to the "author" type, rather
> than having the only link to "author" be through the "book" type. This is
> a bit of a denormalization, but it allows us to accurately associate book
> editions with authors without necessarily having to reconcile different
> editions of the same book together. (This reconciliation is desirable,
> don't get me wrong, but it can be very difficult.) This property
> essentially mimics the way "musical artist" and "musical track" are
> related in Freebase -- artists are linked directly to both albums (which
> contain tracks) and the tracks themselves. It also makes the book schema
> more easily compatible with MARC (and probably other bibliographic
> schemata), which does not reconcile separate book editions together, and
> will hopefully make it easier for naïve users to input the books on their
> shelves directly without having to figure out the whole book/book edition
> relationship. Ideally, of course, I'd like to see all book editions
> reconciled to their books, but this can be done post hoc either by
> automated processes or geeky bibliophiles -- I mean, the community. :)
>
> A final property being added to "book", which is completely incidental to
> the other problems, is "contributing authors". This is another
> denormalization of sorts -- in the original schema, the only way to
> indicate that someone had contributed to a book was via the "contents"
> property that connects "published work" and "publication". This requires
> that the user know the name of the work that is collected in the book,
> which is not always available, and in some cases (such as textbooks and
> some reference books) not really applicable.
>
> I'm thinking about leaving the illustrated work/illustration
> instance/illustrator CVT relationship as it is, since I'm not convinced
> removing the CVT will really help matters that much, but I'd like to hear
> people's thoughts on that as well.
>
> The affected types are:
> http://sandbox.freebase.com/view/schema/book/author
> http://sandbox.freebase.com/view/schema/book/written_work
> http://sandbox.freebase.com/view/schema/book/book_edition
> http://sandbox.freebase.com/view/schema/book/book
>
> I put in some sample data to show the new relationships:
> http://sandbox.freebase.com/view/en/jonathan_lethem
>
> Please let me know what you think.
>
> Thanks,
>
> Jeff Prucher
> Type Librarian & Ontologist
> Metaweb Technologies, Inc.
>
> _______________________________________________
> Data-modeling mailing list
> Data-modeling at freebase.com
> http://lists.freebase.com/mailman/listinfo/data-modeling
>
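The model proposed in Jeff's message can be illustrated with a small in-memory Python sketch. It covers three parts of the proposal:

- the direct "author"/"editor" properties and their reverses,
- the direct edition-to-author link,
- "contributing authors".

This is an illustration under assumptions only. The class and attribute names are hypothetical renderings of the properties named in the message. The new "book edition" property is not named there, so `authors` on `BookEdition` is a placeholder. The reverse properties are kept in sync by hand here; this does not describe how Freebase maintains them. The point the sketch shows is that each author/work link becomes one direct pair of references, with no intermediate CVT node.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Optional


# eq=False keeps identity comparison, since these objects reference each other.
@dataclass(eq=False)
class Author:
    name: str
    # Reverse properties; repr=False avoids printing the reference cycles.
    works_written: List[WrittenWork] = field(default_factory=list, repr=False)
    works_edited: List[WrittenWork] = field(default_factory=list, repr=False)
    # Reverse side of the (hypothetically named) edition -> author link.
    book_editions: List[BookEdition] = field(default_factory=list, repr=False)


@dataclass(eq=False)
class WrittenWork:
    title: str
    authors: List[Author] = field(default_factory=list)
    editors: List[Author] = field(default_factory=list)

    def add_author(self, person: Author) -> None:
        # One direct link each way: "author" and its reverse "works written".
        self.authors.append(person)
        person.works_written.append(self)

    def add_editor(self, person: Author) -> None:
        # "editor" and its reverse "works edited".
        self.editors.append(person)
        person.works_edited.append(self)


@dataclass(eq=False)
class Book(WrittenWork):
    # Contributors credited without naming the piece each one wrote.
    contributing_authors: List[Author] = field(default_factory=list)


@dataclass(eq=False)
class BookEdition:
    title: str
    # May stay None until the edition is reconciled with its book.
    book: Optional[Book] = None
    authors: List[Author] = field(default_factory=list)

    def add_author(self, person: Author) -> None:
        # Denormalized link: edition -> author without going through a Book.
        self.authors.append(person)
        person.book_editions.append(self)


# Usage with made-up data:
writer = Author("A. Writer")
novel = Book("Example Novel")
novel.add_author(writer)
anthology = Book("Example Anthology")
anthology.add_editor(writer)
paperback = BookEdition("Example Novel (paperback)")
paperback.add_author(writer)  # linked before any book/edition reconciliation

print([w.title for w in writer.works_written])  # ['Example Novel']
print([w.title for w in writer.works_edited])   # ['Example Anthology']
print(paperback.book is None)                   # True
```

Compared with a CVT, no node is created per author/work pair. The trade-off is that a per-link role such as "poet" has nowhere to live, which matches the message's choice to infer the mode of authorship from the written work's cotypes.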