[Data-modeling] Products with ingredients

Robert Cook robert at metaweb.com
Tue Jun 16 22:29:44 UTC 2009


On Jun 16, 2009, at 3:10 PM, Jeff Prucher wrote:

> I've been working on a model for Products With Ingredients (catchy  
> name, eh?) over on sandbox:
> <https://www.sandbox-freebase.com/view/business/product_with_ingredients>
>
> It's pretty minimal, with two types: Product and Ingredient. The  
> "product with ingredients" type can be used both with a consumer  
> product
> (<https://www.sandbox-freebase.com/view/guid/9202a8c04000641f800000000c461acb>)
> or with a brand or product line
> (<https://www.sandbox-freebase.com/view/en/corn_flakes>), depending on
> where the ingredients make the most sense (i.e., all packages of Corn
> Flakes have the same ingredients, so putting the type at the Brand
> level makes the most sense).
>
> There are two things I'm seeing with my example data that don't  
> quite work in the model, though, and I'm not quite sure what the  
> best way to resolve them is. One is the Corn Flakes ingredient  
> "Milled corn". Should the Ingredient topic be "Milled Corn", should  
> it just be "Corn", or do we need a CVT to allow people to modify the  
> ingredient ("Corn", "milled")?  The toothpaste has this ingredient  
> also: "sodium lauryl sulfate (from coconut oil)", which I think is  
> the same issue.

I would err on the side of simpler data input (to increase the chances
that the schema is actually used).  For that reason, I think that
"milled corn" is fine.  If queries need to find all corn-based
ingredients, we can either refactor the data once we have a lot of
it (perhaps using your suggested modifier property), or create a
phylogeny pattern that, for instance, encodes that "milled corn" is
a type of "corn", so that MQL queries could use this structure.
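
To make the phylogeny idea concrete, here is a rough sketch in Python
rather than MQL.  Everything in it is an assumption for illustration:
the PARENT table, the function names, and the sample products are made
up, not part of the sandbox schema.  It only shows that a "kind of"
link lets a query for "corn" find products that list "milled corn".

```python
# Hypothetical "is a kind of" links between ingredient topics.
# These entries are illustrative, not taken from Freebase.
PARENT = {
    "milled corn": "corn",
    "corn syrup": "corn",
    "corn": "grain",
}


def is_kind_of(ingredient, ancestor, parent=PARENT):
    """Return True if `ingredient` is `ancestor` or descends from it."""
    seen = set()  # guards against accidental cycles in the links
    current = ingredient
    while current is not None and current not in seen:
        if current == ancestor:
            return True
        seen.add(current)
        current = parent.get(current)
    return False


def products_containing(ancestor, products):
    """Names of products with at least one ingredient that is a kind of `ancestor`."""
    return [
        name
        for name, ingredients in products.items()
        if any(is_kind_of(i, ancestor) for i in ingredients)
    ]


# Made-up sample data; only "milled corn" comes from the thread.
PRODUCTS = {
    "Corn Flakes": ["milled corn", "sugar", "salt"],
    "Toothpaste": ["sodium lauryl sulfate", "water"],
}

if __name__ == "__main__":
    print(products_containing("corn", PRODUCTS))
```

The point is that data entry stays simple (people just type "milled
corn") and the extra structure lives in separate links that can be
added later.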

Either way, hew to the existing data and we'll solve the query  
problems as we go.

>
> The other one is ingredients within ingredients: the toothpaste tube  
> lists this ingredient: "fruit extracts (strawberry, banana, and  
> other natural flavors)". Treat as four separate ingredients, and  
> punt on the relationship? I'm tempted toward this one -- if you're  
> looking for potential allergens, or animal-based ingredients, or the  
> like, you don't care whether the offending item is in a main  
> ingredient or is an ingredient of an ingredient.

This is probably a good guideline - if there are sub-ingredients, they  
should probably be broken out when the data is added.  The only  
problem here is that ordering matters -- on the original contents  
list, there is more of item N than item N+1 in the product.  If you  
break them out, it's unclear where they should end up in the list.
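
One possible way to break sub-ingredients out without throwing away
the label order is sketched below.  This is an assumption on my part,
not something the current schema does: each flattened ingredient keeps
the rank of the top-level label entry it came from, so sub-ingredients
share their parent's rank instead of being given an invented position.
The flatten() helper and the sample label are hypothetical.

```python
def flatten(label):
    """Flatten a label into (ingredient, label_rank) pairs.

    `label` is a list whose items are either a plain ingredient name or
    a (name, [sub-ingredients]) tuple.  Sub-ingredients inherit the rank
    of the entry they were listed under.
    """
    flat = []
    for rank, item in enumerate(label, start=1):
        if isinstance(item, str):
            flat.append((item, rank))
        else:
            name, subs = item
            flat.append((name, rank))  # keep the grouping ingredient itself
            for sub in subs:
                flat.append((sub, rank))
    return flat


# Made-up label order; only the "fruit extracts" entry comes from the thread.
LABEL = [
    "water",
    ("fruit extracts", ["strawberry", "banana", "other natural flavors"]),
    "sodium lauryl sulfate",
]

if __name__ == "__main__":
    for ingredient, rank in flatten(LABEL):
        print(rank, ingredient)
```

This still does not say how much strawberry there is relative to
banana, but an allergen-style query sees every item, and the original
ordering of the top-level entries can be recovered from the rank.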

(As an aside, I can see that the ordering was lost in your corn flakes
example.  This is a bug in the client: when you add multiple property
values at once, their ordering is lost.)

R


