[Data-modeling] Modeling datastructures

Daniel Peebles pumpkingod at gmail.com
Thu Oct 23 19:32:22 UTC 2008


That seems like a workable solution, yeah. I still have the issue of
not knowing what variables mean in the expression. If say something is
O(mn), in isolation, that means almost nothing. When it's O(n), people
generally assume n represents the size of the datastructure in
question, so that's not so ambiguous most of the time, but even then,
with things like Tries, the definition of "size of the datastructure"
is rather nebulous.

As an aside, might I suggest a new primitive type for "mathematical
expression"? I could construct one with an epic abuse of types, but it
would be nice to have a simple built-in type that takes some standard
notation and stores it in a meaningful structured format (that could
then be converted to LaTeX or something for pretty output on pages).
It seems like such a type might be useful as the amount of scientific
information grows on the site, but maybe that's out of scope?

The main reason I bring this up is that it would be nice to store
these big-O strings as more than just text. If the system could reason
on the variables that constitute an expression, then maybe if we can
eventually specify constraints on properties, it would be possible to
require that meaningful variables be used in the equation. Similarly,
functions (it would verify that "log" is of type Function, or
something) and operators.

I do agree that this would be moving the definition of the variables
to text somewhere else, but it still remains a more structured
approach and a more machine-processable one.

But again, this is mostly me dreaming about the ultimate knowledge
system, without knowing too deeply how this all works. It may all just
be crazy talk :P

Thanks,
Dan

On Thu, Oct 23, 2008 at 3:17 PM, Faye Harris <faye at metaweb.com> wrote:
> Hi Daniel,
>
> For modeling Big O notation, I suggest creating a type and make it the
> expected type for Data Structure properties like "space complexity".
> Then create a topic for each common Big O order. If you add a property
> "algorithmic efficiency" to an Algorithm type, you can use the same Big
> O Order type as the expected type.
>
> The type allows you to have properties and reusability, and support
> reverse links. Having the Big O values as topics buys you an article
> where you can describe what each value means. If you mark the type as
> enumerated, users will be able to see a drop down and select a
> pre-populated value when they need to fill out a Big O property. Since
> there aren't that many common orders, I think it makes sense.
>
> To summarize, this is what we'd have:
>
> Type: Big O Order (or whatever you want to name it)
> Display: Enumerated
> Properties:
> 1) Data structures with this space complexity (expected type: Data
> Structure)
> 2) Algorithms with this runtime efficiency (expected type: Algorithm)
> Create a topic for each common order: O(1), O(n), O(log n), O(n log n),
> O(n^2), etc.
>
> For your existing Data Structure type, set:
> Property:
> Space complexity, expected type Big O Order (make sure it's a reverse
> property of the Big O Order type's #1 property to link them together).
>
> And do something similar with algorithms. Viola!
>
> -- Faye
>
>
>
> Daniel Peebles wrote:
>> Hi all,
>>
>> I'm pretty new to all this Freebase stuff, but yesterday I got myself
>> extremely involved in setting up a Data Structure type, with related
>> information and so on. You can find what I have so far at
>> http://www.freebase.com/view/user/pumpkin/computer_science/data_structure
>> (and please join my computer science domain if you're interested in
>> helping me with this!)
>>
>> My main question has to do with modeling the operations on a given
>> data structure. If you look at what I added to splay tree, you'll see
>> I have a list of operations that the splay tree supports, along with
>> time complexities for each of them. When the complexity is obvious,
>> all is fine, but on the same page you can see my information for a
>> Trie. There, the complexity is in terms of the length of the input
>> string, rather than the total size of the data structure.
>>
>> Ideally, my data structure type would have a property called "relevant
>> variables", which could map from a name to a description (n -> "Number
>> of strings contained", m -> "Size of string being operated on"), and
>> then the big-O notation in the "data structure operation profile" CVT
>> would require there to be a mathematical expression in terms of only
>> the variables in the data structure associated with it.
>>
>> This seems awfully complex, and I'm not sure I can place a constraint
>> on the variable source in the expression to ensure it didn't come from
>> another data structure. The other issue is that I don't have a
>> "mathematical expression" basic type, and to make my own type would be
>> monstrously complicated and ugly.
>>
>> So it boils down to, does anyone have any ideas on how to implement
>> the time complexities better? It looks really ugly to me to write
>> O(size of inserted string) as a machine-readable string, but I can't
>> see any better way to do it. The end-goal is obviously to have a
>> structured database of datastructures, the operations they support,
>> and the time complexity of that operation, so that people can quickly
>> find the best tool for the job (and just for the sake of good data
>> modeling). Is this too specific an application of the extremely
>> general data modeling features of freebase?
>>
>> Thanks,
>> Daniel Peebles
>> _______________________________________________
>> Data-modeling mailing list
>> Data-modeling at freebase.com
>> http://lists.freebase.com/mailman/listinfo/data-modeling
>>
>>
>
> _______________________________________________
> Data-modeling mailing list
> Data-modeling at freebase.com
> http://lists.freebase.com/mailman/listinfo/data-modeling
>


More information about the Data-modeling mailing list