[Developers] Problem writing floating point value twice (or "What value of epsilon does MQL use?")

Warren Harris warren at metaweb.com
Wed Jul 29 19:59:26 UTC 2009


Aside from the epsilon comparison issue, there is probably a bug here
in that our Python layer will often munge the number of significant
digits before the value ever gets to our database, where the comparison
is performed. I plan to fix this in the next major release of MQL.

Warren
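
A minimal sketch of the kind of digit loss described above. It assumes
the intermediate layer formats floats to 12 significant digits, which is
what Python 2's str() did. The helper name format_12_digits is
hypothetical, and the real MQL code path may differ.

```python
# Assumption: floats are rendered with 12 significant digits on the way
# to the database (Python 2's str() behaved this way). This is only an
# illustration, not the actual MQL code.
def format_12_digits(x):
    return "%.12g" % x

submitted = 0.0012141000000000003

stored_text = format_12_digits(submitted)  # text that would be stored
full_text = repr(submitted)                # text that round-trips the float

print(stored_text)
print(full_text)
print(stored_text == full_text)            # the two texts are not the same
print(float(stored_text) == submitted)     # and neither are the floats
```

Under that assumption the stored text has already lost the trailing
digits, so a later exact comparison against the full value cannot match.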


On Jul 29, 2009, at 12:17 PM, Scott Meyer wrote:

> Tom Morris wrote:
>> I guess maybe I should define my use of "epsilon."  In any given
>> floating point representation, not all numbers can be represented
>> exactly, even if they are in range.  Because of this and the way
>> processors and software libraries do calculations, rounding, etc., it's
>> bad practice to check for exact equality of floating point numbers by
>> comparing their binary representation.  Instead, the difference
>> between the two numbers is calculated and compared to some small
>> value, often referred to as "epsilon," and if the difference is less
>> than this value, the numbers are considered equal.
>>
>> It would appear that MQL is not using this technique or perhaps the
>> value of epsilon is set too low to accommodate variability introduced
>> by the software stack and the various conversions that are done.
>>
>> Is my interpretation accurate (and thus this should be filed as a bug)
>> or is something else going on here?
>
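
The epsilon comparison Tom describes can be sketched as follows. The
function name nearly_equal and the default tolerances are assumptions
chosen for illustration. Nothing here reflects what MQL actually does.

```python
def nearly_equal(a, b, rel_eps=1e-9, abs_eps=0.0):
    """Return True if a and b differ by no more than a small tolerance.

    rel_eps scales with the magnitude of the inputs; abs_eps is a floor
    that matters when both values are close to zero.
    """
    diff = abs(a - b)
    return diff <= max(rel_eps * max(abs(a), abs(b)), abs_eps)

new_value = 0.0012141000000000003
existing = 0.0012141

print(new_value == existing)              # exact comparison
print(nearly_equal(new_value, existing))  # tolerance-based comparison
```

The standard library's math.isclose(a, b, rel_tol=..., abs_tol=...)
implements the same idea.
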
> In the database, we store floating point values as strings so there's no
> representational limit.  Equality means "exactly equal," not "close to".
>
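
To illustrate what exact equality on decimal strings implies, here is a
small sketch using Python's decimal module. Treating the stored text as
an exact decimal is an assumption about the semantics Scott describes,
not a description of the database's implementation.

```python
from decimal import Decimal

stored = "0.0012141"                # text assumed to be in the database
incoming = "0.0012141000000000003"  # text of the value written again

# Both comparisons are exact: no digits are dropped on either side.
print(stored == incoming)
print(Decimal(stored) == Decimal(incoming))

# The values differ only in the 19th decimal place, which is enough to
# make them unequal when equality means "exactly equal".
print(Decimal(incoming) - Decimal(stored))
```
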
>> On Tue, Jul 28, 2009 at 10:20 AM, Tom Morris <tfmorris at gmail.com> wrote:
>>> If I run this query twice
>>>
>>> [{"guid": "#9202a8c04000641f80000000087c7629",
>>> "type": "/location/location",
>>> "area": {
>>>   "connect": "insert",
>>>   "value":   0.0012141000000000003
>>> }}]
>>>
>>> The second run will return the following error
>>>
>>>   "info": {
>>>     "key":      "value",
>>>     "newvalue": 0.0012141000000000003,
>>>     "value":    0.0012141
>>>   },
>>>   "message": "Found existing value for unique property, try update",
>>>
>>> where I'd expect it to return "present."
>>>
>>> Is there any way around this behavior other than pre-reading and
>>> comparing myself (doubling the latency)?
>
> Case 1: your area computation really is accurate to 18 significant digits
>
> I think that we're doing exactly the right thing.  Picking some
> general purpose epsilon is virtually guaranteed to make the guy
> who has lovingly hand-crafted a fixed point area computation which is
> really accurate to 18 SD apoplectic with rage.
>
> Case 2: your area computation isn't really that accurate
>
> How about clamping to a modest 5 SD?
>
> I suppose we could come up with guidelines for how many significant
> digits should go into particular properties; even 5 seems like
> overkill for the purposes of describing real estate.
>
> -Scott
> _______________________________________________
> Developers mailing list
> Developers at freebase.com
> http://lists.freebase.com/mailman/listinfo/developers
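
Scott's suggestion of clamping to 5 significant digits before writing
could look something like the sketch below. The helper name
clamp_significant is hypothetical, and it would run on the client side
before the value is put into the query.

```python
def clamp_significant(x, digits=5):
    """Round x to the given number of significant digits."""
    # The "g" format keeps `digits` significant digits and drops
    # trailing zeros; float() turns the text back into a number.
    return float(f"{x:.{digits}g}")

first_write = clamp_significant(0.0012141000000000003)
second_write = clamp_significant(0.0012141)

print(first_write)
print(first_write == second_write)  # both writes now carry the same value
```

With the value clamped the same way on every write, a repeated insert
sends identical digits each time, provided no other layer alters them.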


