[Data-modeling] Units and Physical Dimension

Scott Meyer sm at metaweb.com
Wed Apr 29 22:27:39 UTC 2009


movgp0 at gmail.com wrote:

Thanks for taking the time to write this up.  Always amazing
to see what lurks beneath the surface of even the simplest
data modeling question.

> TROUBLES =======================================

> (1) There is a potentially unlimited number of units, but only a given
> subset is predefined. Exotic units are not supported.

Actually, a unit is simply an identity.  Properties with identical
units are assumed to be directly comparable.  Anyone can make up a
unit (exotic or garden variety) and create a property which uses
that unit.

> (2) The same unit is handled as a completely different kind of unit if it has
> another prefix. In fact, the prefix defines the scale of the unit and
> not another unit. This not only restricts the way a user can input
> values, but it also causes an exponential growth in the number of units.

Yes, at this point all that Freebase knows is that milligrams
are not directly comparable to grams.  We have imagined a system
in which the milligram unit is related to the gram unit by some
sort of algebraic relationship (i.e. objects and properties) which
would allow a computer program to determine that grams could
be converted to milligrams by multiplying by 1000, but we haven't
found a need for it.  Example:

   {
      "type" : "/algebra/equality",
      "operand" : [
         {
            "type" : "/algebra/product",
            "operand" : "/en/milligram",
            "scalar_operand" : 1000
         },
         "/en/gram"
      ]
   }

Note that any such system would preserve the existing unit
identities, so no data would need to be changed.
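
A minimal sketch of how a program might use such a relationship once
it has been read out of the graph. Everything here is an assumption
for illustration: the TO_BASE table, the convert() helper and the
/en/kilogram entry are hypothetical, and none of this is an existing
Freebase API.

```python
from fractions import Fraction

# Hypothetical table: unit id -> (base unit id, size of one unit in base units).
# Only /en/gram and /en/milligram come from the example above.
TO_BASE = {
    "/en/gram": ("/en/gram", Fraction(1)),
    "/en/milligram": ("/en/gram", Fraction(1, 1000)),
    "/en/kilogram": ("/en/gram", Fraction(1000)),
}

def convert(value, from_unit, to_unit):
    """Convert value between two units that share a base unit."""
    from_base, from_size = TO_BASE[from_unit]
    to_base, to_size = TO_BASE[to_unit]
    if from_base != to_base:
        raise ValueError(f"{from_unit} and {to_unit} are not comparable")
    # Go through the base unit: scale up to base, then down to the target.
    return value * from_size / to_size

print(convert(5, "/en/gram", "/en/milligram"))  # 5000
```

Exact fractions are used so that a round trip between units does not
pick up floating point error.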

Another wrinkle here is that our database does not "understand"
units.  As far as graphd is concerned, scalar properties are text
strings, some with formatting conventions which allow them to
be treated as integer or floating point numbers.  Units are
just some meta-data that graphd is happy to remember for you,
so that you can ask questions like "show me objects with
a property measured in meters where the measurement is
greater than 10".  If we had the algebraic relationships
between units stored in the graph, a computer program could
modify that query to include "... measured in centimeters ...
greater than 1000" and so forth.
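
A rough sketch of that query rewriting, assuming the relationships
have already been fetched into a lookup table. The UNITS_PER_METER
table and the expand_query() helper are hypothetical names, and the
output is a plain description of each clause rather than a real
graphd query.

```python
from fractions import Fraction

# Hypothetical lookup: how many of each unit make up one meter.
UNITS_PER_METER = {
    "meter": Fraction(1),
    "centimeter": Fraction(100),
    "millimeter": Fraction(1000),
}

def expand_query(unit, threshold):
    """Return an equivalent threshold for every unit comparable to `unit`."""
    in_meters = Fraction(threshold) / UNITS_PER_METER[unit]
    return {u: in_meters * per_meter for u, per_meter in UNITS_PER_METER.items()}

clauses = expand_query("meter", 10)
for unit, limit in clauses.items():
    print(f"measured in {unit}s, greater than {limit}")
```

The single "meters greater than 10" condition becomes one condition
per known unit, which can then be combined with "or".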

Applying that meta-data to actual values in order to present information
to human beings is left to the application.

> (3) Quantities may be given in other units. One can give electrical
> capacity in units of cm (CGS-system) rather than F (SI-system).
> The means of unit systems is only to provide units that can represent
> values in common dimensions in practical terms. They also provide
> suggestions when these units should get applied.
> I.e. the SI system suggests (and doesn't force) to use "s" for time and
> "m" for length. But you can also measure time in (light-)metres and
> length in (light-)seconds.

We expect that applications will choose and convert to display units.
At the moment such conversions would have to be hard-coded based on the
unit identities in question.

> (4) There is a potentially unlimited number of (physical) dimensions.
> But there is only a small subset defined in the Freebase system. Also
> these dimensions are defined in a way that causes an exponentially
> growing number of "Unit Of ..." classes.

Algebraic combinations of base units ought to solve that problem
if it ever materializes.  Do you have some examples of physical
data that is "of general interest" and best recorded with tensor
dimensions?  What data does our current system stop you from
loading?

Having many units is only a problem when it becomes hard to find
the right one.  The unit object is always going to exist whether it is
an identity with a tensor name string or some algebraic operator.
As long as the data described by the unit is of general value,
the overhead of creating a unit for property metadata is incidental.

> Support for complex numbers would be fine.

I'm sure it would be.  :-)  What general purpose data requires
complex numbers?

> * Physical dimensions are different to mathematical dimensions.
> * A physical dimension is defined by a polynomial of the base dimensions
>   1, L, M, T, I, θ, N, and J.
> * A physical dimension may not have a name. Classes for them are pointless.

If you're really proposing that we store an arbitrary polynomial
with each and every scalar value, sorry, we can't do that and
never will.  If you want to store a physical dimension polynomial
with every property (it would apply to all instances of the property),
I think that it would be easy to extend Freebase in that direction.
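
One way the per-property variant could look, as a sketch only: the
dimension is kept as one integer exponent per base dimension and is
attached to the property, not to each value. The Dimension type, the
helper functions and the /example/... property ids are all
hypothetical.

```python
from collections import namedtuple

# One integer exponent per base dimension; all zero means dimensionless ("1").
Dimension = namedtuple("Dimension", "L M T I Theta N J", defaults=(0,) * 7)

def times(a, b):
    """Dimension of a product: exponents add."""
    return Dimension(*(x + y for x, y in zip(a, b)))

def over(a, b):
    """Dimension of a quotient: exponents subtract."""
    return Dimension(*(x - y for x, y in zip(a, b)))

LENGTH = Dimension(L=1)
TIME = Dimension(T=1)

# Hypothetical property ids, each stored once with its dimension.
PROPERTY_DIMENSION = {
    "/example/height": LENGTH,
    "/example/wingspan": LENGTH,
    "/example/top_speed": over(LENGTH, TIME),
}

def comparable(prop_a, prop_b):
    """Two properties can be compared only if their dimensions match."""
    return PROPERTY_DIMENSION[prop_a] == PROPERTY_DIMENSION[prop_b]

print(comparable("/example/height", "/example/wingspan"))   # True
print(comparable("/example/height", "/example/top_speed"))  # False
```

Unnamed dimensions need no class of their own in this scheme; they
are just another tuple of exponents.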

Regards,

-Scott


