[Data-modeling] Modeling datastructures
Jeff Prucher
jeff at metaweb.com
Thu Oct 23 19:48:51 UTC 2008
I think Faye is suggesting that your Big-O values be topics, rather than
text strings, which would enable you to define the variables in the
description of the type, which should, in turn, reduce the ambiguity of the
expressions.
Jeff
> -----Original Message-----
> From: data-modeling-bounces at freebase.com
> [mailto:data-modeling-bounces at freebase.com] On Behalf Of
> Daniel Peebles
> Sent: Thursday, October 23, 2008 12:32 PM
> To: Freebase data modeling mailing list
> Subject: Re: [Data-modeling] Modeling datastructures
>
> That seems like a workable solution, yeah. I still have the
> issue of not knowing what variables mean in the expression.
> If say something is O(mn), in isolation, that means almost
> nothing. When it's O(n), people generally assume n represents
> the size of the datastructure in question, so that's not so
> ambiguous most of the time, but even then, with things like
> Tries, the definition of "size of the datastructure"
> is rather nebulous.
>
> As an aside, might I suggest a new primitive type for
> "mathematical expression"? I could construct one with an epic
> abuse of types, but it would be nice to have a simple
> built-in type that takes some standard notation and stores it
> in a meaningful structured format (that could then be
> converted to LaTeX or something for pretty output on pages).
> It seems like such a type might be useful as the amount of
> scientific information grows on the site, but maybe that's
> out of scope?
>
> The main reason I bring this up is that it would be nice to
> store these big-O strings as more than just text. If the
> system could reason on the variables that constitute an
> expression, then maybe if we can eventually specify
> constraints on properties, it would be possible to require
> that meaningful variables be used in the equation. Similarly,
> functions (it would verify that "log" is of type Function, or
> something) and operators.
>
> I do agree that this would be moving the definition of the
> variables to text somewhere else, but it still remains a more
> structured approach and a more machine-processable one.
>
> But again, this is mostly me dreaming about the ultimate
> knowledge system, without knowing too deeply how this all
> works. It may all just be crazy talk :P
>
> Thanks,
> Dan
>
> On Thu, Oct 23, 2008 at 3:17 PM, Faye Harris <faye at metaweb.com> wrote:
> > Hi Daniel,
> >
> > For modeling Big O notation, I suggest creating a type and
> make it the
> > expected type for Data Structure properties like "space complexity".
> > Then create a topic for each common Big O order. If you add
> a property
> > "algorithmic efficiency" to an Algorithm type, you can use the same
> > Big O Order type as the expected type.
> >
> > The type allows you to have properties and reusability, and support
> > reverse links. Having the Big O values as topics buys you
> an article
> > where you can describe what each value means. If you mark
> the type as
> > enumerated, users will be able to see a drop down and select a
> > pre-populated value when they need to fill out a Big O
> property. Since
> > there aren't that many common orders, I think it makes sense.
> >
> > To summarize, this is what we'd have:
> >
> > Type: Big O Order (or whatever you want to name it)
> > Display: Enumerated
> > Properties:
> > 1) Data structures with this space complexity (expected type: Data
> > Structure)
> > 2) Algorithms with this runtime efficiency (expected type:
> Algorithm)
> > Create a topic for each common order: O(1), O(n), O(log n), O(n log
> > n), O(n^2), etc.
> >
> > For your existing Data Structure type, set:
> > Property:
> > Space complexity, expected type Big O Order (make sure it's
> a reverse
> > property of the Big O Order type's #1 property to link them
> together).
> >
> > And do something similar with algorithms. Viola!
> >
> > -- Faye
> >
> >
> >
> > Daniel Peebles wrote:
> >> Hi all,
> >>
> >> I'm pretty new to all this Freebase stuff, but yesterday I
> got myself
> >> extremely involved in setting up a Data Structure type,
> with related
> >> information and so on. You can find what I have so far at
> >>
> http://www.freebase.com/view/user/pumpkin/computer_science/data_struc
> >> ture (and please join my computer science domain if you're
> interested
> >> in helping me with this!)
> >>
> >> My main question has to do with modeling the operations on a given
> >> data structure. If you look at what I added to splay tree,
> you'll see
> >> I have a list of operations that the splay tree supports,
> along with
> >> time complexities for each of them. When the complexity is
> obvious,
> >> all is fine, but on the same page you can see my information for a
> >> Trie. There, the complexity is in terms of the length of the input
> >> string, rather than the total size of the data structure.
> >>
> >> Ideally, my data structure type would have a property called
> >> "relevant variables", which could map from a name to a
> description (n
> >> -> "Number of strings contained", m -> "Size of string
> being operated
> >> on"), and then the big-O notation in the "data structure operation
> >> profile" CVT would require there to be a mathematical
> expression in
> >> terms of only the variables in the data structure
> associated with it.
> >>
> >> This seems awfully complex, and I'm not sure I can place a
> constraint
> >> on the variable source in the expression to ensure it didn't come
> >> from another data structure. The other issue is that I
> don't have a
> >> "mathematical expression" basic type, and to make my own
> type would
> >> be monstrously complicated and ugly.
> >>
> >> So it boils down to, does anyone have any ideas on how to
> implement
> >> the time complexities better? It looks really ugly to me to write
> >> O(size of inserted string) as a machine-readable string,
> but I can't
> >> see any better way to do it. The end-goal is obviously to have a
> >> structured database of datastructures, the operations they
> support,
> >> and the time complexity of that operation, so that people
> can quickly
> >> find the best tool for the job (and just for the sake of good data
> >> modeling). Is this too specific an application of the extremely
> >> general data modeling features of freebase?
> >>
> >> Thanks,
> >> Daniel Peebles
> >> _______________________________________________
> >> Data-modeling mailing list
> >> Data-modeling at freebase.com
> >> http://lists.freebase.com/mailman/listinfo/data-modeling
> >>
> >>
> >
> > _______________________________________________
> > Data-modeling mailing list
> > Data-modeling at freebase.com
> > http://lists.freebase.com/mailman/listinfo/data-modeling
> >
> _______________________________________________
> Data-modeling mailing list
> Data-modeling at freebase.com
> http://lists.freebase.com/mailman/listinfo/data-modeling
>
More information about the Data-modeling
mailing list