[Data-modeling] Products with ingredients
Robert Cook
robert at metaweb.com
Tue Jun 16 22:29:44 UTC 2009
On Jun 16, 2009, at 3:10 PM, Jeff Prucher wrote:
> I've been working on a model for Products With Ingredients (catchy
> name, eh?) over on sandbox:
> <https://www.sandbox-freebase.com/view/business/product_with_ingredients>
>
> It's pretty minimal, with two types: Product and Ingredient. The
> "product with ingredients" type can be used both with a consumer
> product (<https://www.sandbox-freebase.com/view/guid/9202a8c04000641f800000000c461acb>)
> or with a brand or product line (<https://www.sandbox-freebase.com/view/en/corn_flakes>),
> depending on where the ingredients make the most sense (i.e.,
> all packages of Corn Flakes have the same ingredients, so putting
> the type at the Brand level makes the most sense).
>
> There are two things I'm seeing with my example data that don't
> quite work in the model, though, and I'm not quite sure what the
> best way to resolve them is. One is the Corn Flakes ingredient
> "Milled corn". Should the Ingredient topic be "Milled Corn", should
> it just be "Corn", or do we need a CVT to allow people to modify the
> ingredient ("Corn", "milled")? The toothpaste has this ingredient
> also: "sodium lauryl sulfate (from coconut oil)", which I think is
> the same issue.
I would err on the side of simpler data entry, to increase the chances
that the schema actually gets used. For that reason, I think that
"milled corn" is fine. If queries need to find all corn-based
ingredients, we can handle that once we have a lot of data: either
refactor it, perhaps using your suggested modifier property, or create
a phylogeny pattern that encodes, for instance, that "milled corn" is
a kind of "corn", so that MQL queries could use this structure.
Either way, hew to the existing data and we'll solve the query
problems as we go.
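A minimal sketch of the phylogeny idea, written in plain Python rather than MQL so the traversal is easy to see: each ingredient topic carries a hypothetical "is a kind of" link to a broader ingredient, and a search for corn-based products walks those links upward. The names `BROADER`, `ancestors`, `is_kind_of` and `products_containing`, the "grain" category, and the abbreviated ingredient lists (e.g. "sugar") are assumptions for illustration, not part of the sandbox schema.

```python
from typing import Dict, List

# Hypothetical "is a kind of" links between ingredient topics (assumed
# structure; only "milled corn" and the SLS entry come from this thread).
BROADER: Dict[str, str] = {
    "milled corn": "corn",
    "corn": "grain",
    "sodium lauryl sulfate (from coconut oil)": "sodium lauryl sulfate",
}

def ancestors(ingredient: str) -> List[str]:
    """Follow the broader-than links upward, stopping at a root or a cycle."""
    chain: List[str] = []
    seen = {ingredient}
    current = BROADER.get(ingredient)
    while current is not None and current not in seen:
        chain.append(current)
        seen.add(current)
        current = BROADER.get(current)
    return chain

def is_kind_of(ingredient: str, category: str) -> bool:
    """True if the ingredient is the category itself or one of its descendants."""
    return ingredient == category or category in ancestors(ingredient)

def products_containing(products: Dict[str, List[str]], category: str) -> List[str]:
    """Names of products with at least one ingredient in the given category."""
    return [name for name, ingredients in products.items()
            if any(is_kind_of(i, category) for i in ingredients)]

# Abbreviated, partly illustrative ingredient lists.
products = {
    "Corn Flakes": ["milled corn", "sugar"],
    "Toothpaste": ["sodium lauryl sulfate (from coconut oil)", "fruit extracts"],
}
print(products_containing(products, "corn"))
```

Storing only the direct link keeps data entry as simple as typing "milled corn"; the broader-than chain can be filled in later without touching the existing product records.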
>
> The other one is ingredients within ingredients: the toothpaste tube
> lists this ingredient: "fruit extracts (strawberry, banana, and
> other natural flavors)". Treat as four separate ingredients, and
> punt on the relationship? I'm tempted toward this one -- if you're
> looking for potential allergens, or animal-based ingredients, or the
> like, you don't care whether the offending item is in a main
> ingredient or is an ingredient of an ingredient.
This is probably a good guideline: if there are sub-ingredients, they
should be broken out when the data is added. The only problem is that
ordering matters. On the original contents list, the product contains
more of item N than of item N+1, so if you break the sub-ingredients
out, it's unclear where they should end up in the list.
(As an aside, I can see that the ordering was lost in your Corn Flakes
example. This is a bug in the client: when you add multiple property
values at once, their ordering is lost.)
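One possible convention for breaking out sub-ingredients without inventing an order, sketched in Python under the assumption that each flattened entry records the 1-based position of its top-level label entry plus the parent it came from. Sub-ingredients simply share their parent's rank, since the label says nothing about how they compare with the other top-level items. `Ingredient`, `flatten` and the (rank, name, parent) layout are hypothetical, and the two-entry toothpaste label is abbreviated, with its order chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class Ingredient:
    name: str
    # Sub-ingredients in label order, e.g. the contents of a parenthetical.
    parts: List["Ingredient"] = field(default_factory=list)

def flatten(label: List[Ingredient]) -> List[Tuple[int, str, Optional[str]]]:
    """Break nested ingredients out into a flat list of (rank, name, parent).

    rank is the 1-based position of the top-level entry on the label;
    sub-ingredients inherit their parent's rank (an assumed convention).
    """
    flat: List[Tuple[int, str, Optional[str]]] = []

    def walk(item: Ingredient, rank: int, parent: Optional[str]) -> None:
        # Record the item itself, then its parts depth-first in label order.
        flat.append((rank, item.name, parent))
        for part in item.parts:
            walk(part, rank, item.name)

    for rank, item in enumerate(label, start=1):
        walk(item, rank, None)
    return flat

# Abbreviated toothpaste label; entry order is illustrative.
label = [
    Ingredient("sodium lauryl sulfate (from coconut oil)"),
    Ingredient("fruit extracts", [
        Ingredient("strawberry"),
        Ingredient("banana"),
        Ingredient("other natural flavors"),
    ]),
]
for entry in flatten(label):
    print(entry)
```

An allergen search can ignore the parent column entirely, in line with the point that it doesn't matter whether an item is a main ingredient or an ingredient of an ingredient, while the shared rank keeps the label's known ordering intact for any later refactoring.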
R