[Data-modeling] Thoughts on disease/treatment
Benjamin Good
ben.mcgee.good at gmail.com
Thu Jun 12 20:25:58 UTC 2008
On Jun 12, 2008, at 11:16 AM, Dan Ruderman wrote:
> Hi Ben,
>
>
> Benjamin Good wrote:
>> 1) the GO import could be made much more useful through better integration
>> with the rest of freebase. For example, http://www.freebase.com/view/en/biological_reproduction
>> should very likely be linked to GO:reproduction (http://www.freebase.com/view/guid/9202a8c04000641f800000000520ead4
>> ). How might this be automated ?
>>
> I think this lies at the core of the reconciliation problem, and may
> require curation by hand unless we can find some organization which
> has already placed GO in this broader context.
I tend to agree. The ideal case would be for an organization like the
GO consortium itself or the OBO foundry that it begot to take charge
of this task. That being said, I bet there are quite a few examples
where direct correspondence could be determined automatically at
import. (This has got to be something the data harvesting teams at
freebase know quite a lot about).
>
>> 2) is there a mechanism to try to update the freebase data when the GO
>> is updated ?
>>
> I had uploaded what was current as of 2007-02-18. I had attempted to
> track version information through the "Data Source" property on
> each GO group, but it looks like that did not work as I had expected.
> It would probably not be that difficult to determine the changes in
> the GO since then and add/deprecate groups (and hopefully reference
> information about the new version somehow).
>
>> 3) users of things like the GO really expect and depend on inferences
>> - especially hierarchical ones (isa, partof). What is a good pattern
>> for representing hierarchy and executing queries across hierarchies
>> within the context of freebase?
>>
> One of the most attractive features of Freebase is the ease
> with which hierarchies can be represented in terms of
> properties. I leveraged this concept when creating the
> schema for GO. This mechanism extends naturally to queries.
> Could you be a bit more specific in stating the type of hierarchy
> and queries which interest you? Perhaps that could help the
> list participants to generate some specific examples.
Sure. I'd like to find all gene groups (and all genes when the
associations get into freebase) that have been annotated with any
Cellular Component or, say, any component of the cytoplasm. This is
basically the same as a query for all of the organism classifications
lower-than a particular classification. I can see how to request
hierarchical chains when I know how many levels I want to traverse,
but I would also like to be able to handle queries when I don't know
in advance how far down they will go. This could be solved through an
iterative request but, as it seems like such a general case, it would
be great if there was some sort of standard way to deal with it. Here
is MQL for getting several levels down from 'bird' or 'Aves':
[
  {
    "lower_classifications" : [
      {
        "lower_classifications" : [
          {
            "lower_classifications" : [
              {
                "lower_classifications" : [
                  {
                    "lower_classifications" : []
                  }
                ],
                "name" : null
              }
            ],
            "name" : null
          }
        ],
        "name" : null
      }
    ],
    "name" : null,
    "scientific_name" : "Aves",
    "type" : "/biology/organism_classification"
  }
]
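
For what it's worth, the iterative approach for unknown depth might look roughly like the Python sketch below. It is only an illustration under stated assumptions: `fetch_children` is a hypothetical callable standing in for a single one-level MQL read of "lower_classifications" (it is not a Freebase API), and `TOY_TREE` is made-up stand-in data, not real query output.

```python
from collections import deque


def descendants(root, fetch_children):
    """Collect everything below `root`, asking for one level at a time.

    `fetch_children` is a hypothetical callable: given a name, it returns
    the names directly beneath it (one request per node, in practice).
    """
    seen = {root}           # guards against cycles or repeated nodes
    queue = deque([root])   # nodes whose children have not been fetched yet
    found = []
    while queue:
        current = queue.popleft()
        for child in fetch_children(current):
            if child not in seen:
                seen.add(child)
                found.append(child)
                queue.append(child)
    return found


# Stand-in data for illustration only; a real version would query the service.
TOY_TREE = {
    "Aves": ["Neognathae", "Palaeognathae"],
    "Neognathae": ["Galloanserae", "Neoaves"],
    "Galloanserae": ["Anseriformes", "Galliformes"],
}


def toy_fetch(name):
    """Return the direct children of `name` in the toy tree (empty if a leaf)."""
    return TOY_TREE.get(name, [])


if __name__ == "__main__":
    for name in descendants("Aves", toy_fetch):
        print(name)
```

The loop stops on its own once a level comes back empty, so the caller never has to guess the depth. The cost is one round trip per node, which is exactly why a standard server-side way of doing this would be preferable.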
In an answer to a previous inquiry about subsumption in freebase, John
Giannandrea said "So why dont we support strict type inheritance?
Well because real world data is messy." OK, cool, I'm not
suggesting that Freebase should support inheritance at the level of
Types, but why not explicitly support the general case of what he
calls "phylogeny patterns" ? Though real world data is indeed very
messy, there is a lot of very valuable, fairly clean data (e.g. the
GO, the FMA, etc.) that would be much more easily merged into freebase
given a definition (even just a consensus agreement) of what the
general-purpose broader-than/narrower-than relationship should be and
how it should be interacted with in MQL.
-Ben