[Data-modeling] Animal breeds and owners

Paul Houle paul at ontology2.com
Fri Feb 6 22:34:59 UTC 2009


John Giannandra wrote:
> Paul, that's a nice site.
>
>   
    Thanks.  It's actually got a lot of problems.  For one thing, it's
really a modified installation of WordPress, and it's reached a point
where the navigation mechanisms have broken down.  The head end could
probably add 50,000 more pictures of birds, reptiles, fish, you name
it, in a week, but then the site would become completely impossible to use.

    We're working on a second-generation publishing platform for similar 
kinds of sites (on different topics),  and I'll be launching the first 
of the new sites soon.  One of these days the animals will get ported to 
the new system.
> w.r.t. ITIS numbers, we have about 40K ITIS TSNs mapped in Freebase (and about 23K NCBI numbers).
> So you can view something like:  
>     http://freebase.com/view/biology/itis/180528
>
> And you can download the data directly at 
>     http://download.freebase.com/datadumps/2009-01-13/browse/biology/organism_classification.tsv
>
> There are currently ~84K organism classifications in Freebase, and the 40K mostly reflects the overlap between ITIS and Wikipedia when we last did the import.  There has been some discussion about importing all of ITIS even though we don't have blurbs and images for most of those taxa.  Let me know if that would be useful for your project, or if there is particular data that you would like to contribute or see contributed to Freebase.
>   
    We did an alignment between ITIS and Wikipedia for Animalphotos.
There are errors in both ITIS and Wikipedia, and of course there is
some disagreement between taxonomists.  My (inexpert) opinion from
looking at the diffs is that Wikipedia tends to be more reflective of
recent thinking, but I've found some places where Wikipedia is wrong.
Animalphotos uses ITIS as the skeleton for the animal taxonomy because
the integrity of the tree is better:  there was funkiness in the tree we
extracted from Wikipedia that we didn't want to deal with.
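
    For illustration only (this is not the code used for the site), a
diff and a tree-integrity check of the kind described above could be
sketched as below.  It assumes each taxonomy has been reduced to a
taxon-to-parent mapping; the two toy trees, the invented taxa
"Orphanus" and "Missingia", and the function names are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Tree = Dict[str, Optional[str]]  # taxon name -> parent name (None for the root)

# Hypothetical toy data, made up for illustration; not real ITIS or Wikipedia output.
ITIS_LIKE: Tree = {
    "Animalia": None,
    "Chordata": "Animalia",
    "Aves": "Chordata",
    "Reptilia": "Chordata",
}
WIKI_LIKE: Tree = {
    "Animalia": None,
    "Chordata": "Animalia",
    "Reptilia": "Chordata",
    "Aves": "Reptilia",        # parent differs from the other source
    "Orphanus": "Missingia",   # invented taxon whose parent is not in the tree
}


def parent_diffs(a: Tree, b: Tree) -> List[Tuple[str, Optional[str], Optional[str]]]:
    """Return (taxon, parent_in_a, parent_in_b) for shared taxa whose parents differ."""
    shared = a.keys() & b.keys()
    return sorted((t, a[t], b[t]) for t in shared if a[t] != b[t])


def broken_nodes(tree: Tree) -> List[str]:
    """Return taxa whose chain of parents hits a missing node or loops back on itself."""
    bad = []
    for taxon in tree:
        seen = set()
        node: Optional[str] = taxon
        while node is not None:
            if node in seen or node not in tree:
                bad.append(taxon)
                break
            seen.add(node)
            node = tree[node]
    return sorted(bad)


if __name__ == "__main__":
    print(parent_diffs(ITIS_LIKE, WIKI_LIKE))
    print(broken_nodes(ITIS_LIKE))
    print(broken_nodes(WIKI_LIKE))
```

    The first function only reports where the sources disagree; deciding
which side is right still takes a human.  The second is one possible way
to measure "integrity":  a tree with no broken nodes can serve as a
skeleton, while one with dangling parents or cycles needs repair first.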

    Recently we've built a taxonomy of car makes and models based on
government databases and Wikipedia.  It identifies many car models that
were not tagged as car models in Freebase the last time I looked.  Our
process starts with a set of objects that have a known identity,
expands it to a larger set where the precision isn't so good, and then
uses various filtering processes to clean up the taxonomy.  I think
we'll have something acceptable after the next processing phase -- we'd
be happy to contribute data of this sort to Freebase.
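
    As a rough sketch of the seed / expand / filter shape described
above (not our actual pipeline), assume the expansion step follows
links from known objects to related page titles and the filters are
simple predicates.  The link table, the list of makes, and both filter
rules are hypothetical stand-ins chosen to keep the example small.

```python
from typing import Callable, Dict, Iterable, List, Set

# Hypothetical inputs, for illustration only.
SEEDS: Set[str] = {"Ford Mustang", "Honda Civic"}   # objects with known identity
LINKS: Dict[str, List[str]] = {                      # assumed "related title" links
    "Ford Mustang": ["Ford Thunderbird", "Ford Motor Company"],
    "Honda Civic": ["Honda Accord", "Soichiro Honda"],
}
KNOWN_MAKES: Set[str] = {"Ford", "Honda"}


def expand(seeds: Set[str], links: Dict[str, List[str]]) -> Set[str]:
    """Grow the seed set by one hop of links; recall goes up, precision goes down."""
    candidates = set(seeds)
    for seed in seeds:
        candidates.update(links.get(seed, []))
    return candidates


def starts_with_known_make(name: str) -> bool:
    """Toy filter: keep titles whose first word is a known make."""
    return name.split()[0] in KNOWN_MAKES


def not_a_company(name: str) -> bool:
    """Toy filter: drop titles that look like the manufacturer itself."""
    return not name.endswith("Company")


def build_taxonomy(
    seeds: Set[str],
    links: Dict[str, List[str]],
    filters: Iterable[Callable[[str], bool]],
) -> Set[str]:
    """Expand the seeds, then keep only candidates that pass every filter."""
    checks = list(filters)
    return {c for c in expand(seeds, links) if all(check(c) for check in checks)}


if __name__ == "__main__":
    models = build_taxonomy(SEEDS, LINKS, [starts_with_known_make, not_a_company])
    print(sorted(models))
```

    Keeping the filters as a list of independent predicates means another
cleanup pass can be added for the next processing phase without touching
the expansion step.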

 

