[Data-modeling] Possible revision to the way journal articles are modeled

Jeff Prucher jeff at metaweb.com
Wed Sep 3 18:52:09 UTC 2008


Thanks for the pointer; I hadn't looked at this one before. There are
roughly a zillion different ontologies out there for modeling bibliographic
data, and rather than trying to base a model on one of them, I think the
goal is to make Freebase's model compatible with as many of them as
possible.  (This is broadly true for all schemas in Freebase, not just
publishing.)  At a glance, it looks like the NLM's model is broadly
compatible, with the understanding that there are a bunch of things they
capture that don't apply to Freebase (like markup for article content). The
major issues I see are that they have "issue title" and "volume title" tags.
I'm in the midst of proposing that we do away with the "journal issue" type
entirely, which would lose that distinction, and we don't model volumes at
all, except as a plain-text property.  But we could also decide to simply
omit this data in imports.
 
Jeff


  _____  

From: data-modeling-bounces at freebase.com
[mailto:data-modeling-bounces at freebase.com] On Behalf Of Mohammad Al-Ubaydli
Sent: Wednesday, September 03, 2008 2:07 AM
To: Freebase data modeling mailing list
Subject: Re: [Data-modeling] Possible revision to the way journal
articles are modeled


Hi Jeff,
My apologies: I am new and so have not seen your previous discussion
thread, but I just wanted to make sure that you had already seen and
considered the National Library of Medicine's journal archiving DTD. This
website has very good documentation about the tags and hierarchy:

http://dtd.nlm.nih.gov/archiving/tag-library/2.3/index.html

The DTD has been adopted by most major publishing houses for use across all
their Science, Technology and Medicine journals. I would recommend using it
as the standard on which to model, except where data entry presents a
difficulty, because it would allow mass imports from PubMed and from
journals much more easily in the future.

(I used to work at NLM so would be happy to answer more questions if you
wanted to pursue this further.)

Best,
mohammad

Mohammad Al-Ubaydli, MD
e me at mo.md
w www.mo.md


On Wed, Sep 3, 2008 at 1:13 AM, Jeff Prucher <jeff at metaweb.com> wrote:


I'm reposting this because I've gotten some feedback at the related
discussion thread
(http://www.freebase.com/discuss/threads/book#/guid/9202a8c04000641f8000000008cdbc2f)
and I'd like to see if anyone else has other thoughts before we
make a decision on this.

Original message:

Spurred on by an aside that spatialed made in some discussion post a while
back (I can't locate the post), I'm considering revising the way that we
model journal issues.  Currently, each issue has its own topic, which links
to both the journal and the articles contained in that issue. (See
http://www.freebase.com/type/schema/book/journal_issue)  The main problems
with this format are that it is cumbersome to enter data, and also that most
bibliographic sources are concerned primarily with the article and the
journal, relegating the issue to a series of strings (volume, issue, date).
This latter problem might make integration with standard bibliographic schemas
a bit awkward, although it wouldn't be insurmountable.

As an experiment, though, I thought I'd try to see what a model that
eliminated the issue entirely looked like.  Here are the results (this links
to the filter view of the new CVT):
https://sandbox.freebase.com/type/view/book/journal_publication.

I've replaced the issue type with a CVT that connects the article and the
journal, and includes the standard bibliographic data of volume, issue,
date, date extra, and pages ("date extra" is something I had to make up for
journals that aren't published on a schedule that translates into
mm/dd/yyyy).  Journal articles have both Scholarly Work and Written Work as
included types, although a journal article can also be a review, editorial,
letter or other type of writing.
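
As a rough illustration of the shape of that CVT, here is a sketch in
Python rather than as a Freebase schema. The class name, field names, and
sample values are made up for this example and are not the actual property
names or data in the sandbox type.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class JournalPublication:
    """Hypothetical stand-in for the CVT that links an article to a journal."""

    article: str                      # the journal article topic
    journal: str                      # the journal topic
    volume: Optional[str] = None      # plain text, e.g. "12"
    issue: Optional[str] = None       # plain text, e.g. "3"
    date: Optional[str] = None        # used when the schedule maps onto a date
    date_extra: Optional[str] = None  # free text for schedules that do not
    pages: Optional[str] = None       # plain text, e.g. "101-117"


# Invented sample record: a quarterly with no usable mm/dd/yyyy date.
example = JournalPublication(
    article="An Example Article",
    journal="An Example Journal",
    volume="12",
    issue="3",
    date_extra="Fall 2008",
    pages="101-117",
)
```

The point of the sketch is that the issue no longer exists as its own
object; everything that used to identify it lives as plain values on the
link between article and journal.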

The only real disadvantage that I see to this is that constructing the
contents of a given issue will be harder -- users will have to query on a
combination of several fields (volume, issue, etc.) to find what they're
looking for.
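
To make that trade-off concrete, here is a small Python sketch of
rebuilding an issue's contents by grouping on the combination of fields.
It is not an MQL query against Freebase; the Publication record, the
contents_by_issue helper, and the sample data are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Publication(NamedTuple):
    """Hypothetical article-to-journal link carrying the issue as strings."""

    article: str
    journal: str
    volume: str
    issue: str


IssueKey = Tuple[str, str, str]  # (journal, volume, issue)


def contents_by_issue(records: Iterable[Publication]) -> Dict[IssueKey, List[str]]:
    """Group article titles by the (journal, volume, issue) combination."""
    grouped: Dict[IssueKey, List[str]] = defaultdict(list)
    for record in records:
        key = (record.journal, record.volume, record.issue)
        grouped[key].append(record.article)
    return dict(grouped)


# Invented sample data for illustration only.
sample = [
    Publication("Article A", "Example Journal", "12", "3"),
    Publication("Article B", "Example Journal", "12", "3"),
    Publication("Article C", "Example Journal", "12", "4"),
]
toc = contents_by_issue(sample)
```

Here toc[("Example Journal", "12", "3")] holds the two articles from that
issue. Note that the grouping only works if volume and issue strings are
entered consistently ("3" versus "No. 3" would land in different groups),
which is part of what makes this harder than following a link from an
issue topic.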

I'd love to hear what people think about this.

Thanks,

Jeff Prucher
Type Librarian & Ontologist
Metaweb Technologies, Inc.

_______________________________________________
Data-modeling mailing list
Data-modeling at freebase.com
http://lists.freebase.com/mailman/listinfo/data-modeling


