[Data-modeling] Denormalised data

Jeff Prucher jeff at metaweb.com
Wed Apr 8 20:02:44 UTC 2009


 

> -----Original Message-----
> From: data-modeling-bounces at freebase.com 
> [mailto:data-modeling-bounces at freebase.com] On Behalf Of Scott Meyer
> Sent: Wednesday, April 08, 2009 10:10 AM
> To: Freebase Data Modeling
> Subject: Re: [Data-modeling] Denormalised data
> 
> Philip Kendall wrote:
> > Both /cvg/computer_videogame and /cvg/game_version deliberately 
> > contain developer, publisher and release date fields.
> > 
> > There's a certain lack of documentation around these fields, but the
> > convention seems to be that the developer/publisher/date of the
> > original version of the game go in the /cvg/computer_videogame
> > properties, and the developer/publisher/date of any conversions/ports
> > go in the /cvg/game_version properties.
> > 
> > However, this leads to a question as to what is "best practice" when
> > filling in the /cvg/game_version for the original version of the game:
> > should they be filled in with the same values as were put in
> > /cvg/computer_videogame (thus meaning that the two could get out of
> > sync) or should they be left empty (thus meaning that any apps have to
> > know about this structure)?
> > 
> > Any views?
> 
> Yeah, fix the data model.
> 
> This need for documentation, convention, gardening bots to 
> enforce convention, etc., is exactly why denormalization 
> (duplication of data) is a problem.  The problem for 
> application developers is "How do I ask about all versions of 
> a game?"  Can I just grab all the versions or do I need to 
> make a special case for "the original version"?  Currently, 
> the answer is the worst possible one: get all versions and 
> then merge the "original version" information from the video 
> game topic (carefully!) into the list of versions as it may 
> or may not be there.
> 
> Since /cvg/game_version is a CVT, it seems like the 
> reasonable thing to do to represent an "original version" is 
> to create a new property, 
> /cvg/computer_videogame/original_version, which also refers 
> to something of type /cvg/game_version. Typically this would 
> refer to a CVT which is also referred to by the 
> /cvg/computer_videogame/versions property so the cost is one 
> extra primitive multiplied by 17,000 video games.
> 
> If you want the "original version" just ask for it, no 
> sorting of versions or special cases. If you want all 
> versions (including the original version) that is just a property too.
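
To make the proposal concrete, here is a rough sketch in Python rather than
MQL. The class and field names (GameVersion, VideoGame, original_version,
versions) are only stand-ins for the Freebase types and properties discussed
above, and the sample values are placeholders, not real data.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class GameVersion:
    # Hypothetical stand-in for a /cvg/game_version CVT.
    platform: str
    developer: str
    publisher: str
    release_date: str


@dataclass
class VideoGame:
    # Hypothetical stand-in for a /cvg/computer_videogame topic.
    name: str
    versions: List[GameVersion] = field(default_factory=list)
    # Points at one of the objects already in `versions`; no fields are copied.
    original_version: Optional[GameVersion] = None


# Placeholder data for illustration only.
original = GameVersion("Platform A", "Dev A", "Pub A", "1984")
port = GameVersion("Platform B", "Dev B", "Pub B", "1986")
game = VideoGame("Example Game", versions=[original, port], original_version=original)

# "All versions" is a single property read, with no special case.
all_versions = game.versions
# "The original version" is also a single property read, with no sorting.
first = game.original_version
```

The point of the sketch is that the developer, publisher and date live in
exactly one place (the version), and the game only holds references to it.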

I think this is a good approach for this sort of thing in general: we did
something like this in the opera domain already. But this will not be quite
the hoped-for panacea -- although it would be less messy than the current
situation -- since it also requires documentation, conventions, and possibly
bots in order to be useful. The convention is that the original version of a
game should appear in both the Original Version property and the Versions
property. Obviously, documentation will be necessary to explain this to
users. And bots might be necessary to make sure all instances of "original
version" are also instances of "versions".  Otherwise we're back where we
started (well, a bit better off), with users needing to query two
properties to find all versions of a game (and furthermore to strip out
duplicates for the versions that were correctly listed in both properties).
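
The kind of check such a bot would run, and the fallback an app needs while
the data is inconsistent, might look roughly like this. It is only a sketch
in Python: the dict layout, the ids and the function names are made up for
illustration and are not Freebase APIs.

```python
def missing_from_versions(game):
    """Return the original version's id if it is not also listed in versions."""
    original = game.get("original_version")
    if original is not None and original not in game.get("versions", []):
        return original
    return None


def all_versions(game):
    """Union of both properties, keeping order and dropping duplicates."""
    seen = set()
    result = []
    candidates = list(game.get("versions", []))
    if game.get("original_version") is not None:
        candidates.append(game["original_version"])
    for version_id in candidates:
        if version_id not in seen:
            seen.add(version_id)
            result.append(version_id)
    return result


# Hypothetical records: one follows the convention, one does not.
consistent = {"original_version": "v1", "versions": ["v1", "v2"]}
inconsistent = {"original_version": "v1", "versions": ["v2"]}
```

If the bot keeps every game in the "consistent" shape, apps can read the
Versions property alone and never need all_versions() at all.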

Jeff


