[Developers] dealing with duplicates

Alec Flett alecf at metaweb.com
Wed May 23 18:08:06 UTC 2007


This is a great question - convergence problems are always going to 
exist in Freebase and it would be great to flesh out some specific 
patterns to deal with them. After all, it takes just one broken program 
or person creating a second topic called "The Beatles" - even if some 
process clears up duplicates in 6 hours, that's 6 hours of "confused" data.

I'd love to hear some suggestions on how to address this. My ideas 
follow, but I'd be really curious if other developers have other ideas 
about how to solve this problem.

Would it help to have a centralized authority on "official" topics? How 
would that authority work? Is there a way to have a decentralized 
authority so that Music experts could define the "official" Beatles and 
Chemistry experts could define the "official" Uranium? Would having 
these multiple authorities make queries unnecessarily complex? Maybe MQL 
could be told to only return "official" topics if so asked? Other ideas?

My own suggestion, which will work today and may not work in the future, 
is to look at the "key" reference to see what namespaces a topic appears 
in. For example, well-known topics from Wikipedia will include 
"/wikipedia/en" - making Wikipedia a sort of authority.

Take a look at the results of this query:
{
  "q":{
    "query":[{
      "id":null,
      "key":[{
        "namespace":null,
        "optional": true
      }],
      "name":"The Beatles",
      "type":"/music/artist"
    }]
  }
}

Unfortunately there are multiple topics for this band, but you'll notice 
that only one of them has lots of keys in /wikipedia/en. This is nice, 
but what happens if your topic isn't in /wikipedia/en? Unfortunately, 
at the moment there is no "central authority" declaring which topic is 
the official one for a particular "thing" - what if Wikipedia doesn't 
know anything about it but some other namespace (musicbrainz?) does?
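
One way to act on this on the client side, as a rough sketch only: assume 
the first query above comes back as a list of topics, each with an "id" 
and a "key" list of {"namespace": ...} entries. That response shape, the 
helper names, the sample ids and the "/musicbrainz" namespace id are all 
my assumptions, not something the API guarantees. The idea is to try a 
list of namespaces in order of trust and keep the topics that appear in 
the first one that matches anything:

```python
from typing import Any

Topic = dict[str, Any]


def namespaces_of(topic: Topic) -> set[str]:
    """Collect the namespace ids found in a topic's "key" entries."""
    keys = topic.get("key") or []
    if isinstance(keys, dict):  # a single key object instead of a list
        keys = [keys]
    return {k["namespace"] for k in keys if k.get("namespace")}


def pick_by_namespace(topics: list[Topic], preferred: list[str]) -> list[Topic]:
    """Return the topics found in the first preferred namespace that matches any."""
    for ns in preferred:
        hits = [t for t in topics if ns in namespaces_of(t)]
        if hits:
            return hits
    return []  # no candidate appears in any trusted namespace


# Hypothetical sample shaped like the assumed query result.
sample = [
    {"id": "#topic-a", "key": [{"namespace": "/wikipedia/en"},
                               {"namespace": "/wikipedia/en"}]},
    {"id": "#topic-b", "key": []},
    {"id": "#topic-c", "key": [{"namespace": "/musicbrainz"}]},
]

best = pick_by_namespace(sample, ["/wikipedia/en", "/musicbrainz"])
print([t["id"] for t in best])
```

If more than one topic comes back, the duplicates live inside the same 
namespace and something else has to break the tie.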

It could be said that a topic existing in any namespace is better than 
one in no namespace at all. That could give you this query:

{
  "q":{
    "query":[{
      "id":null,
      "key":{"namespace":null, "limit":1},
      "name":"The Beatles",
      "type":"/music/artist"
    }]
  }
}

This says "Give me all The Beatles who are music artists that appear in 
any namespace".

But this doesn't help if there's some new topic that's not in ANY 
namespace.... hrm.
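
For that case about the only thing left is Steve's "the one with data" 
heuristic, used as a tie-breaker. A minimal sketch, assuming each 
candidate is a dict like the results quoted below plus a "key" list (the 
function names and sample ids are made up for illustration): topics with 
at least one key sort first, then topics with more populated properties.

```python
from typing import Any

Topic = dict[str, Any]


def populated_count(topic: Topic) -> int:
    """Count list-valued properties (other than "key") that are non-empty."""
    return sum(
        1
        for name, value in topic.items()
        if name != "key" and isinstance(value, list) and len(value) > 0
    )


def rank_candidates(topics: list[Topic]) -> list[Topic]:
    """Order candidates: keyed topics first, then by how much data they carry."""
    return sorted(
        topics,
        key=lambda t: (bool(t.get("key")), populated_count(t)),
        reverse=True,
    )


# Hypothetical candidates; the ids are placeholders, not real topic ids.
candidates = [
    {"id": "#empty", "key": [],
     "/music/artist/album": [], "/music/artist/member": []},
    {"id": "#members-only", "key": [],
     "/music/artist/album": [], "/music/artist/member": [None, None]},
    {"id": "#keyed", "key": [{"namespace": "/wikipedia/en"}],
     "/music/artist/album": ["Abbey Road"], "/music/artist/member": [None]},
]

print([t["id"] for t in rank_candidates(candidates)])
```

As Steve points out, this won't always single out one topic: two 
candidates can tie, so treat the ordering as a hint rather than an answer.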

Alec

> On May 23, 2007, at 8:02 AM, Steve Sak wrote:
>
>   
>> I perform a query to get the values of all the properties for
>> "type":"/music/artist" "name":"The Beatles" and get three results from
>> different creators.  How should I go about determining which one to use,
>> besides the obvious "the one with data", which won't always be the
>> unique identifying factor?
>>
>> {
>>   "qname": {
>>     "query":[{
>>       "id": null,
>>       "type":"/music/artist",
>>       "name":"The Beatles",
>>       "/music/artist/origin":[],
>>       "/music/artist/active_start":[],
>>       "/music/artist/active_end":[],
>>       "/music/artist/genre":[],
>>       "/music/artist/label":[],
>>       "/music/artist/similar_artist":[],
>>       "/music/artist/home_page":[],
>>       "/music/artist/acquire_webpage":[],
>>       "/music/artist/member":[],
>>       "/music/artist/album":[],
>>       "/music/artist/contribution":[],
>>       "/music/artist/track":[],
>>       "/music/artist/artist_similar":[]
>>     }]
>>   }
>> }
>>
>>  ({
>>   "status": "200 OK",
>>   "qname": {
>>     "status": "/mql/status/ok",
>>     "result": [
>>       {
>>         "/music/artist/active_start": [
>>           "1957"
>>         ],
>>         "name": "The Beatles",
>>         "/music/artist/artist_similar": [],
>>         "/music/artist/acquire_webpage": [],
>>         "/music/artist/home_page": [
>>           "Official homepage, but lacking discography."
>>         ],
>>         "/music/artist/album": [
>>           "Please Please Me",
>>           "From Me to You",
>>           "Introducing... The Beatles",
>>           "She Loves You",
>>           "With the Beatles",
>>           "I Want to Hold Your Hand",
>>           "Meet the Beatles",
>>           "Introducing the Beatles",
>>           "Second Album",
>>           "Something New",
>>           "A Hard Day's Night",
>>           "I Feel Fine",
>>           "Beatles for Sale",
>>           "Beatles '65",
>>           "Beatles VI",
>>           "Help! / I'm Down",
>>           "Help!",
>>           "Rubber Soul",
>>           "We Can Work It Out / Day Tripper",
>>           "Paperback Writer / Rain",
>>           "Yesterday... and Today",
>>           "Revolver",
>>           "Strawberry Fields Forever / Penny Lane",
>>           "Sgt. Pepper's Lonely Hearts Club Band",
>>           "Magical Mystery Tour",
>>           "Hello, Goodbye",
>>           "Magical Mystery Tour",
>>           "Lady Madonna",
>>           "Hey Jude",
>>           "The Beatles (disc 1)",
>>           "The Beatles",
>>           "Yellow Submarine",
>>           "Get Back",
>>           "Abbey Road",
>>           "Hey Jude",
>>           "Let It Be",
>>           "1962-1966 (disc 1)",
>>           "1962-1966 (disc 2)",
>>           "1967-1970 (disc 1)",
>>           "1967-1970 (disc 2)",
>>           "The Beatles at the Hollywood Bowl",
>>           "Love Songs",
>>           "Sgt. Pepper's Lonely Hearts Club Band",
>>           "Past Masters, Volume One",
>>           "Past Masters, Volume Two",
>>           "Rockin' at the Star-Club",
>>           "The Early Tapes of the Beatles",
>>           "Live at the BBC (disc 1)",
>>           "Live at the BBC (disc 2)",
>>           "Anthology 1 (disc 1)",
>>           "Anthology 1 (disc 2)",
>>           "Free as a Bird",
>>           "Real Love",
>>           "Anthology 2 (disc 1)",
>>           "Anthology 2 (disc 2)",
>>           "Anthology 3 (disc 1)",
>>           "Anthology 3 (disc 2)",
>>           "The Best Of [26 Unforgetable Hit Songs]",
>>           "Yellow Submarine Songtrack",
>>           "1",
>>           "Let It Be... Naked (disc 1)",
>>           "Let It Be... Naked (disc 2)",
>>           "The Capitol Albums, Volume 1 (disc 1: Meet the Beatles!)",
>>           "The Capitol Albums, Volume 1 (disc 2: The Beatles' Second Album)",
>>           "The Capitol Albums, Volume 1 (disc 3: Something New)",
>>           "The Capitol Albums, Volume 1 (disc 4: Beatles '65)",
>>           "The Capitol Albums, Volume 2 (disc 1: The Early Beatles)",
>>           "The Capitol Albums, Volume 2 (disc 2: Beatles VI)",
>>           "The Capitol Albums, Volume 2 (disc 3: Help!)",
>>           "16 Superhits, Volume 1",
>>           "16 Superhits, Volume 2",
>>           "16 Superhits, Volume 3",
>>           "16 Superhits, Volume 4",
>>           "1962 Live at Star Club in Hamburg",
>>           "1962 Live Recordings",
>>           "1962-1966 (Red Album)",
>>           "1962-1970",
>>           "A Collection of Beatles Oldies (UK Mono LP)",
>>           "A Hard Day's Night",
>>           "Alternate Rubber Soul",
>>           "Anthology (disc 3)",
>>           "Beatles Tapes III: The 1964 World Tour",
>>           "Beatles VI (Stereo and Mono)",
>>           "Best Selection 1962-1968 Part 3",
>>           "Best, Volume 4: 1964",
>>           "Best, Volume 9: 1966",
>>           "Christmas",
>>           "Complete Rooftop Concert 1",
>>           "EP Collection (disc 1)",
>>           "EP Collection (disc 10)",
>>           "EP Collection (disc 11: Yesterday)",
>>           "EP Collection (disc 12)",
>>           "EP Collection (disc 13)",
>>           "EP Collection (disc 14)",
>>           "EP Collection (disc 15)",
>>           "EP Collection (disc 2: Twist and Shout)",
>>           "EP Collection (disc 3)",
>>           "EP Collection (disc 4)",
>>           "EP Collection (disc 5)",
>>           "EP Collection (disc 6)"
>>         ],
>>         "/music/artist/active_end": [
>>           "1970-04-10"
>>         ],
>>         "/music/artist/track": [
>>           "And I Love Her",
>>           "All My Loving",
>>           "Twist and Shout",
>>           "Help!",
>>           "In My Life",
>>           "Strawberry Fields Forever",
>>           "A Day in the Life",
>>           "Revolution",
>>           "The Ballad of John & Yoko",
>>           "Julia",
>>           "Don't Let Me Down",
>>           "Something",
>>           "If I Needed Someone",
>>           "Here Comes the Sun",
>>           "Taxman",
>>           "Think for Yourself",
>>           "For You Blue",
>>           "While My Guitar Gently Weeps",
>>           "Rock and Roll",
>>           "My Bonnie (feat. Tony Sheridan)",
>>           "Beatles Movie Medley",
>>           "I Want to Hold Your Hand",
>>           "Love Me Do",
>>           "Can't Buy Me Love",
>>           "Something (feat. Al Pitrelli, Brad Gillis, John Petrucci, Michael Lee Firkins, Reb Beach, Steve Morse)",
>>           "My Bonnie (feat. Tony Sheridan)",
>>           "Twist and Shout",
>>           "Tomorrow Never Knows (UNKLEsounds edit)",
>>           "The Girl I Love",
>>           "She Loves You",
>>           "Can't Buy Me Love",
>>           "Help",
>>           "Sgt. Pepper's Lonely Hearts Club Band",
>>           "Oh! Darling",
>>           "Let It Be",
>>           "Oh! Darling",
>>           "Let It Be",
>>           "Ain't She Sweet",
>>           "My Bonnie (feat. Tony Sheridan)",
>>           "A Hard Day's Night",
>>           "Love Me Do",
>>           "Help!",
>>           "A Day in the Life (DM mix)",
>>           "While My Guitar Gently Weeps",
>>           "I Want to Hold Your Hand",
>>           "Paperback Writer",
>>           "Yellow Submarine",
>>           "Ain't She Sweet",
>>           "Cry for a Shadow (feat. Tony Sheridan)",
>>           "The Girl I Love",
>>           "I Saw Her Standing There",
>>           "Ain't She Sweet",
>>           "Hey Jude",
>>           "Revolution",
>>           "Get Back",
>>           "Don't Let Me Down",
>>           "The Ballad of John and Yoko",
>>           "Old Brown Shoe",
>>           "From Me to You",
>>           "She Loves You",
>>           "I Want to Hold Your Hand",
>>           "Can't Buy Me Love",
>>           "A Hard Day's Night",
>>           "I Feel Fine",
>>           "Ticket to Ride",
>>           "Ain't She Sweet",
>>           "Let It Be",
>>           "Till There Was You",
>>           "Money",
>>           "Let It Be",
>>           "Michelle",
>>           "Girl",
>>           "Hey Jude",
>>           "And I Love Her",
>>           "Yesterday",
>>           "Ain't She Sweet",
>>           "Yesterday"
>>         ],
>>         "/music/artist/member": [
>>           null,
>>           null,
>>           null,
>>           null,
>>           null,
>>           null
>>         ],
>>         "/music/artist/label": [],
>>         "/music/artist/origin": [],
>>         "/music/artist/similar_artist": [],
>>         "/music/artist/contribution": [],
>>         "type": "/music/artist",
>>         "id": "#9202a8c04000641f800000000003ac10",
>>         "/music/artist/genre": [
>>           null
>>         ]
>>       },
>>       {
>>         "/music/artist/active_start": [],
>>         "name": "the Beatles",
>>         "/music/artist/artist_similar": [],
>>         "/music/artist/acquire_webpage": [],
>>         "/music/artist/home_page": [],
>>         "/music/artist/album": [],
>>         "/music/artist/active_end": [],
>>         "/music/artist/track": [],
>>         "/music/artist/member": [],
>>         "/music/artist/label": [],
>>         "/music/artist/origin": [],
>>         "/music/artist/similar_artist": [],
>>         "/music/artist/contribution": [],
>>         "type": "/music/artist",
>>         "id": "#9202a8c04000641f8000000004f21a14",
>>         "/music/artist/genre": []
>>       },
>>       {
>>         "/music/artist/active_start": [],
>>         "name": "The Beatles",
>>         "/music/artist/artist_similar": [],
>>         "/music/artist/acquire_webpage": [],
>>         "/music/artist/home_page": [],
>>         "/music/artist/album": [],
>>         "/music/artist/active_end": [],
>>         "/music/artist/track": [],
>>         "/music/artist/member": [
>>           null,
>>           null,
>>           null,
>>           null
>>         ],
>>         "/music/artist/label": [],
>>         "/music/artist/origin": [],
>>         "/music/artist/similar_artist": [],
>>         "/music/artist/contribution": [],
>>         "type": "/music/artist",
>>         "id": "#9202a8c04000641f8000000004fbb1fc",
>>         "/music/artist/genre": []
>>       }
>>     ]
>>   }
>> })
>>
>>
>> _______________________________________________
>> Developers mailing list
>> Developers at freebase.com
>> http://lists.freebase.com/mailman/listinfo/developers
>>     
