[Data-modeling] Thoughts on disease/treatment

Benjamin Good ben.mcgee.good at gmail.com
Thu Jun 12 20:25:58 UTC 2008


On Jun 12, 2008, at 11:16 AM, Dan Ruderman wrote:

> Hi Ben,
>
>
> Benjamin Good wrote:
>> 1) the GO import could be made much more useful through better integration
>> with the rest of freebase.  For example, http://www.freebase.com/view/en/biological_reproduction
>>  should very likely be linked to GO:reproduction (http://www.freebase.com/view/guid/9202a8c04000641f800000000520ead4
>> ).  How might this be automated ?
>>
> I think this lies at the core of the reconciliation problem, and may
> require curation by hand unless we can find some organization which
> has already placed GO in this broader context.

I tend to agree.  The ideal case would be for an organization like the
GO Consortium itself, or the OBO Foundry that it begot, to take charge
of this task.  That said, I bet there are quite a few examples
where a direct correspondence could be determined automatically at
import.  (This has got to be something the data harvesting teams at
Freebase know quite a lot about.)
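
To make the "automatic at import" idea concrete, here is a rough sketch
in Python.  It only matches on normalized names, and everything in it
is an assumption on my part: the function names are made up, the GO
entries and the /en/cytoplasm id are toy data, and a real import would
pull candidates from Freebase rather than from a dict.

```python
import re

def normalize(name):
    """Lowercase a label and collapse punctuation/underscores to single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", name.lower()).strip()

def match_by_name(go_terms, freebase_topics):
    """Split GO terms into unique name matches and ambiguous ones.

    Both arguments map an id to a display name. Terms with no candidate
    at all are simply left out; those still need a curator.
    """
    index = {}
    for topic_id, topic_name in freebase_topics.items():
        index.setdefault(normalize(topic_name), []).append(topic_id)

    matches, ambiguous = {}, {}
    for go_id, go_name in go_terms.items():
        candidates = index.get(normalize(go_name), [])
        if len(candidates) == 1:
            matches[go_id] = candidates[0]
        elif candidates:
            ambiguous[go_id] = candidates
    return matches, ambiguous

# Toy data for illustration only.
go_terms = {"GO:0000003": "reproduction", "GO:0005737": "cytoplasm"}
freebase_topics = {
    "/en/cytoplasm": "Cytoplasm",
    "/en/biological_reproduction": "Biological reproduction",
}

matches, ambiguous = match_by_name(go_terms, freebase_topics)
print(matches)
```

Note that "reproduction" does not line up with "Biological
reproduction" under exact name matching, so that pair would still fall
to hand curation (or to something fuzzier than this sketch).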


>
>> 2) is there a mechanism to try to update the freebase data when the
>> GO is updated?
>>
> I had uploaded what was current as of 2007-02-18.  I had attempted to
> track version information through the "Data Source" property on
> each GO group, but it looks like that did not work as I had expected.
> It would probably not be that difficult to determine the changes in
> the GO since then and add/deprecate groups (and hopefully reference
> information about the new version somehow).
>
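
Working out the changes between two GO releases could start as a plain
set comparison on term ids.  A minimal sketch, assuming each release
has already been parsed into a dict of id to name; the function name
and the sample ids/names below are placeholders, not real GO content.

```python
def diff_releases(old_terms, new_terms):
    """Compare two {term_id: name} dicts and report what changed."""
    old_ids, new_ids = set(old_terms), set(new_terms)
    return {
        # Present only in the new release: groups to add.
        "added": sorted(new_ids - old_ids),
        # Present only in the old release: groups to deprecate.
        "removed": sorted(old_ids - new_ids),
        # Same id, different label: groups to rename.
        "renamed": sorted(
            term_id for term_id in old_ids & new_ids
            if old_terms[term_id] != new_terms[term_id]
        ),
    }

# Placeholder releases for illustration only.
old_release = {"T:1": "alpha", "T:2": "beta", "T:3": "gamma"}
new_release = {"T:1": "alpha", "T:2": "beta (revised)", "T:4": "delta"}

changes = diff_releases(old_release, new_release)
print(changes)
```

This says nothing about changed relationships (isa/partof edges),
which would need the same kind of comparison on edge sets.
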
>> 3) users of things like the GO really expect and depend on inferences
>> - especially hierarchical ones (isa, partof).  What is a good pattern
>> for representing hierarchy and executing queries across hierarchies
>> within the context of freebase?
>>
> One of the most attractive features of Freebase is the ease
> with which hierarchies can be represented in terms of
> properties.  I leveraged this concept when creating the
> schema for GO.  This mechanism extends naturally to queries.
> Could you be a bit more specific in stating the type of hierarchy
> and queries which interest you?  Perhaps that could help the
> list participants to generate some specific examples.

Sure.  I'd like to find all gene groups (and all genes, once the
associations get into Freebase) that have been annotated with any
Cellular Component or with any component of the cytoplasm.  This is
basically the same as a query for all of the organism classifications
below a particular classification.  I can see how to request
hierarchical chains when I know how many levels I want to traverse,
but I would also like to be able to handle queries when I don't know
in advance how far down they will go.  This could be solved through an
iterative request but, as it seems like such a general case, it would
be great if there were some sort of standard way to deal with it.  Here
is MQL for getting several levels down from 'bird' or 'Aves'.

[
   {
     "lower_classifications" : [
       {
         "lower_classifications" : [
           {
             "lower_classifications" : [
               {
                 "lower_classifications" : [
                   {
                     "lower_classifications" : []
                   }
                 ],
                 "name" : null
               }
             ],
             "name" : null
           }
         ],
         "name" : null
       }
     ],
     "name" : null,
     "scientific_name" : "Aves",
     "type" : "/biology/organism_classification"
   }
]
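
For comparison, here is roughly what the two client-side workarounds
look like in Python.  This is a sketch under assumptions: nested_query
just builds a query shaped like the one above for a chosen depth, and
descendants does the iterative version against a fetch_children
callback that I made up.  In real use that callback would issue one
MQL read per node; here it reads from a toy dict so the example runs
on its own.

```python
import json

def nested_query(root_name, depth):
    """Build a query with `depth` nested lower_classifications levels."""
    node = {"name": None, "lower_classifications": []}
    for _ in range(depth - 1):
        node = {"name": None, "lower_classifications": [node]}
    node["scientific_name"] = root_name
    node["type"] = "/biology/organism_classification"
    return [node]

def descendants(root, fetch_children):
    """Collect everything below `root`, one level per pass.

    `fetch_children(name)` is a hypothetical callback returning the
    names directly below `name`. The `seen` set guards against cycles
    in messy data.
    """
    seen = {root}
    frontier = [root]
    while frontier:
        next_frontier = []
        for name in frontier:
            for child in fetch_children(name):
                if child not in seen:
                    seen.add(child)
                    next_frontier.append(child)
        frontier = next_frontier
    return seen - {root}

# Toy hierarchy standing in for the real service.
TOY_TAXONOMY = {
    "Aves": ["Neognathae", "Palaeognathae"],
    "Neognathae": ["Galliformes", "Passeriformes"],
    "Passeriformes": ["Corvidae"],
}

def toy_fetch(name):
    return TOY_TAXONOMY.get(name, [])

print(json.dumps(nested_query("Aves", 5), indent=2))
print(sorted(descendants("Aves", toy_fetch)))
```

The iterative version costs one round trip per node (or per level, if
the requests are batched), which is exactly why a standard way to ask
for "everything below X" would be nicer.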

In an answer to a previous inquiry about subsumption in Freebase, John
Giannandrea said "So why dont we support strict type inheritance?
Well because real world data is messy."  OK, cool, I'm not
suggesting that Freebase should support inheritance at the level of
Types, but why not explicitly support the general case of what he
calls "phylogeny patterns"?  Though real-world data is indeed very
messy, there is a lot of very valuable, fairly clean data (e.g. the
GO, the FMA, etc.) that would be much more easily merged into Freebase
given a definition (even just a consensus agreement) of what the
general-purpose broader-than/narrower-than relationship should be and
how it should be queried in MQL.

-Ben