[Data-modeling] Modeling datastructures

Daniel Peebles pumpkingod at gmail.com
Thu Oct 23 16:54:01 UTC 2008


Hi all,

I'm pretty new to all this Freebase stuff, but yesterday I got myself
extremely involved in setting up a Data Structure type, with related
information and so on. You can find what I have so far at
http://www.freebase.com/view/user/pumpkin/computer_science/data_structure
(and please join my computer science domain if you're interested in
helping me with this!)

My main question has to do with modeling the operations on a given
data structure. If you look at what I added to splay tree, you'll see
I have a list of operations that the splay tree supports, along with
time complexities for each of them. When the complexity is obvious,
all is fine, but on the same page you can see my information for a
Trie. There, the complexity is in terms of the length of the input
string, rather than the total size of the data structure.

Ideally, my data structure type would have a property called "relevant
variables", which could map from a name to a description (n -> "Number
of strings contained", m -> "Size of string being operated on"), and
then the big-O notation in the "data structure operation profile" CVT
would require there to be a mathematical expression in terms of only
the variables in the data structure associated with it.

This seems awfully complex, and I'm not sure I can place a constraint
on the variable source in the expression to ensure it didn't come from
another data structure. The other issue is that I don't have a
"mathematical expression" basic type, and to make my own type would be
monstrously complicated and ugly.

So it boils down to, does anyone have any ideas on how to implement
the time complexities better? It looks really ugly to me to write
O(size of inserted string) as a machine-readable string, but I can't
see any better way to do it. The end-goal is obviously to have a
structured database of datastructures, the operations they support,
and the time complexity of that operation, so that people can quickly
find the best tool for the job (and just for the sake of good data
modeling). Is this too specific an application of the extremely
general data modeling features of freebase?

Thanks,
Daniel Peebles


More information about the Data-modeling mailing list