[Data-modeling] keeping Freebase topics and Wikipedia pages in sync; uncertainty in who is the composer

Gordon Mackenzie gordon at metaweb.com
Tue Jul 21 23:07:06 UTC 2009


Sounds like a reasonable course. We tend to do weekly updates from
Wikipedia to Freebase.

~ Gordon

<<< gordon at metaweb.com >>>



On Jul 21, 2009, at 3:50 PM, Raymond Yee wrote:

> I've been putting in some work on the J. S. Bach base that I started a
> while ago (http://jsbach.freebase.com/) -- specifically, identifying
> Freebase topics corresponding to Bach cantatas and associating them with
> the type /music/composition, the composer /en/johann_sebastian_bach, and
> a BWV (/base/jsbach/bach_composition/bwv).  To help in the reconciliation
> process, I scraped the Wikipedia page
> http://en.wikipedia.org/wiki/List_of_cantatas_by_Johann_Sebastian_Bach
> looking for cantatas that have their own Wikipedia pages, and figured out
> their "curid", with which I could then identify them in Freebase.
>
> For example:
>
> a) BWV 1 corresponds to
> http://en.wikipedia.org/wiki/Wie_sch%C3%B6n_leuchtet_der_Morgenstern
> which has a curid of 1505635 (the curid is discoverable in the page
> source as var wgArticleId = "1505635"; or via the Wikipedia API)
> b) with the curid, you can then look up
> http://www.freebase.com/view/wikipedia/en_id/1505635 -- which is the
> same as http://www.freebase.com/view/en/wie_schon_leuchtet_der_morgenstern
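The curid lookup in a) and b) can be sketched in Python roughly as follows. This is only a minimal illustration, not what Raymond actually ran: the function names extract_curid and freebase_view_url are hypothetical, and the regular expression assumes the page source contains the wgArticleId assignment in the form quoted above (with or without quotes around the number).

```python
import re
from typing import Optional

# Matches the assignment quoted in the message: var wgArticleId = "1505635";
# Treating the quotes as optional is an assumption about page-source variants.
WG_ARTICLE_ID = re.compile(r'wgArticleId\s*=\s*"?(\d+)"?')


def extract_curid(page_source: str) -> Optional[int]:
    """Return the Wikipedia curid found in the page source, or None if absent."""
    match = WG_ARTICLE_ID.search(page_source)
    return int(match.group(1)) if match else None


def freebase_view_url(curid: int) -> str:
    """Build the Freebase view URL keyed on an English Wikipedia curid."""
    return f"http://www.freebase.com/view/wikipedia/en_id/{curid}"


# Example using the snippet for BWV 1 quoted in the message.
sample_source = 'var wgArticleId = "1505635";'
curid = extract_curid(sample_source)
if curid is not None:
    print(freebase_view_url(curid))
```

The sketch works on page source that has already been fetched, so it makes no assumptions about how the Wikipedia pages are downloaded.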
>
> For those approximately 80 cantatas with Wikipedia pages, I've now made
> the ties to Bach and the BWV field
> (http://www.freebase.com/view/base/jsbach/views/bach_composition).  Of
> course, that still leaves a lot of cantatas that don't have either
> Wikipedia pages or Freebase topics (e.g.,
> http://www.jsbach.org/bwv45.html).  Ideally (in my mind at least), there
> should be a Wikipedia page for each Bach work, a Freebase topic, and a
> clear tie between them.  There is, in fact, a proposal to create at
> least Wikipedia stubs for each cantata
> (http://en.wikipedia.org/wiki/Talk:List_of_cantatas_by_Johann_Sebastian_Bach#Proposal_to_write_stubs_on_each_cantata).
>
>
> How should I proceed?  My proposed course of action is:
>
> 1) I go ahead with creating Freebase topics for the cantatas that
> currently have no Freebase IDs.
>
> 2) Start filling out the data as I can find them or as I can recruit
> help to fill them in for all the cantatas.
>
> 3) As I gather enough data to create Bach stub articles on
> Wikipedia, do so.
>
> 4) Wait for Freebase to discover the new Bach cantata pages and then
> flag them for merging.
>
> Does that make sense?  (I was thinking of focusing on creating the
> Wikipedia articles first, hoping that they don't get deleted, and then
> waiting for Freebase to pick them up....)
>
> In doing this upload of Bach cantata data into Freebase, I ran into the
> issue of how to deal with works misattributed to J.S. Bach.  According to
> http://en.wikipedia.org/wiki/BWV:  "The BWV catalogue is occasionally
> updated, with newly discovered works added at its end, though spurious
> works do not have their numbers removed."  An example is BWV 15:
> http://www.freebase.com/view/en/denn_du_wirst_meine_seele_nicht_in_der_holle_lassen
> -- "BWV 15, is a church cantata spuriously attributed to Johann
> Sebastian Bach but most likely composed by Johann Ludwig Bach."  What I
> did to model this is:
>
> 1) still have /base/jsbach/bach_composition/bwv = 15 for
> /en/denn_du_wirst_meine_seele_nicht_in_der_holle_lassen -- but not the
> type /base/jsbach/bach_composition
>
> and
>
> 2) go ahead with setting /music/composition/composer to Johann
> Ludwig Bach
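To make the two modeling steps above concrete, here is a small Python sketch of the resulting records. It is only an illustration under stated assumptions: the helper model_composition and the plain-dict layout are hypothetical and are not the Freebase write API. The type and property paths are the ones named in the message, and the composer is given by name because the message does not state a Freebase ID for Johann Ludwig Bach.

```python
from typing import Any, Dict

# Type and property paths taken from the message.
JS_BACH_TYPE = "/base/jsbach/bach_composition"
BWV_PROPERTY = "/base/jsbach/bach_composition/bwv"
COMPOSER_PROPERTY = "/music/composition/composer"


def model_composition(topic_id: str, bwv: int, composer: str,
                      by_js_bach: bool) -> Dict[str, Any]:
    """Hypothetical helper: describe one BWV entry as a plain dict.

    The BWV number is always kept, since spurious works keep their
    catalogue numbers; the J. S. Bach type is only added for works
    actually attributed to him.
    """
    types = ["/music/composition"]
    if by_js_bach:
        types.append(JS_BACH_TYPE)
    return {
        "id": topic_id,
        "type": types,
        BWV_PROPERTY: bwv,
        COMPOSER_PROPERTY: composer,
    }


# BWV 15: keeps its number, but is typed and credited per steps 1) and 2).
bwv15 = model_composition(
    "/en/denn_du_wirst_meine_seele_nicht_in_der_holle_lassen",
    bwv=15,
    composer="Johann Ludwig Bach",  # name only; no Freebase ID is given in the message
    by_js_bach=False,
)

# BWV 1 for comparison: a work attributed to J. S. Bach.
bwv1 = model_composition(
    "/en/wie_schon_leuchtet_der_morgenstern",
    bwv=1,
    composer="/en/johann_sebastian_bach",
    by_js_bach=True,
)
```

One consequence of this layout is that a listing of all BWV numbers still includes BWV 15, while a listing restricted to the /base/jsbach/bach_composition type does not.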
>
> What do you think?
>
> Thanks,
> -Raymond
>
> _______________________________________________
> Data-modeling mailing list
> Data-modeling at freebase.com
> http://lists.freebase.com/mailman/listinfo/data-modeling


