[Developers] multiple create unless_exists in one query

Tim Sturge tsturge at metaweb.com
Tue Jan 15 22:00:42 UTC 2008


This is old, but it's interesting enough that I thought I should put in
my 2 cents:

I never thought anyone would notice what happens here. In fact, the 
situation is, if anything, even worse than Alec's example.

Basically, MQL does a "read the graph" pass and then a "consistency 
check" pass. The consistency check pass is necessary to make sure, for 
example, that you don't try to do something like

"query": [{ "id": "/some/name",
"name": { "value": "Fred", "connect": "update" }
}, { "id": "/another/name",
"name": { "value": "Bob", "connect": "update" }
}]

The problem is, what if /some/name and /another/name turn out to 
reference the same object? You can't name it both "Fred" and "Bob", so 
you need to fail the query as inconsistent.
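
As a rough illustration of that check (a sketch only, not MQL's actual implementation), the two passes could look something like the following. The function name `find_conflicts`, the `ALIASES` table, and the `#guid1` identifier are all hypothetical and exist only for this example.

```python
# Hypothetical sketch: the names and the id-to-guid table are illustrative,
# not MQL internals.
def find_conflicts(clauses, resolve_id):
    """Return (guid, property, first_value, second_value) for clashing updates."""
    planned = {}    # (guid, property) -> value requested so far
    conflicts = []
    for clause in clauses:
        guid = resolve_id(clause["id"])      # "read the graph" pass
        for prop, spec in clause.items():
            # Only look at {"value": ..., "connect": "update"} directives.
            if not (isinstance(spec, dict) and spec.get("connect") == "update"):
                continue
            key = (guid, prop)
            if key in planned and planned[key] != spec["value"]:
                conflicts.append((guid, prop, planned[key], spec["value"]))
            else:
                planned[key] = spec["value"]
    return conflicts

# Assumption for the example: both ids name the same underlying object.
ALIASES = {"/some/name": "#guid1", "/another/name": "#guid1"}

query = [
    {"id": "/some/name", "name": {"value": "Fred", "connect": "update"}},
    {"id": "/another/name", "name": {"value": "Bob", "connect": "update"}},
]

conflicts = find_conflicts(query, ALIASES.get)
print(conflicts)  # one conflict: the same object asked to be "Fred" and "Bob"
```

If the two ids resolved to different objects, the list would be empty and the write could go ahead.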

With "create": "unless_exists" there are similar issues but they are 
even more horrible. Suppose you ask for:

"query": [ {"id":null,
"name": "Fred",
"type": "/film/actor",
"create": "unless_exists" },
{"id":null,
"name": "Fred",
"type": "/music/artist",
"create": "unless_exists" }]

Simple enough; you get back two different Freds: the actor and the artist. 
But now let's try a slightly longer version of the same query:

"query": [ {"id":null,
"name": "Fred",
"type": "/film/actor",
"create": "unless_exists" },
{"id":null,
"name": "Fred",
"type": "/music/artist",
"create": "unless_exists" },
{"id": null,
"name": "Fred",
"type": [ { "id": "/music/artist" }, { "id": "/film/actor" } ],
"create": "unless_exists" }
]

You create the first two as before, but when you create the third you 
have duplicated the first two! The only way to make this query work is a 
single object satisfying all 3 clauses, and a single object will then 
also work for the first query.

So the problem is that the first query can be satisfied by one object or 
two. So if you ask to create more than one thing, it's very hard to 
decide if you intended them to be distinct or not.
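
To make the ambiguity concrete, here is a small sketch (again, only an illustration with made-up helper names `satisfies` and `query_satisfied`, and a simplified object model of a name plus a set of types). It checks which sets of objects would satisfy the clauses from the two queries above.

```python
# Hypothetical model: an object is a name plus a set of type ids.
def satisfies(obj, clause):
    """True if obj has the clause's name and every type the clause asks for."""
    wanted = clause["type"]
    wanted = {wanted} if isinstance(wanted, str) else {t["id"] for t in wanted}
    return obj["name"] == clause["name"] and wanted <= obj["types"]

def query_satisfied(objects, clauses):
    """True if every clause is matched by at least one object."""
    return all(any(satisfies(o, c) for o in objects) for c in clauses)

first_query = [
    {"name": "Fred", "type": "/film/actor"},
    {"name": "Fred", "type": "/music/artist"},
]
third_clause = {
    "name": "Fred",
    "type": [{"id": "/music/artist"}, {"id": "/film/actor"}],
}

two_freds = [
    {"name": "Fred", "types": {"/film/actor"}},
    {"name": "Fred", "types": {"/music/artist"}},
]
one_fred = [{"name": "Fred", "types": {"/film/actor", "/music/artist"}}]

print(query_satisfied(two_freds, first_query))                   # True
print(query_satisfied(one_fred, first_query))                    # True
print(query_satisfied(two_freds, first_query + [third_clause]))  # False
print(query_satisfied(one_fred, first_query + [third_clause]))   # True
```

Both one Fred and two Freds satisfy the first query, but only the single Fred satisfies the longer one.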

I took one look at this, threw up my hands and decided to create 
everything that was not already present at query time. It was a simple 
rule, easy to explain, and until I saw this, I didn't think anyone would 
run into the complexities underneath unless they were deliberately 
poking at the edge cases.
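
A minimal sketch of that rule, assuming a toy in-memory graph keyed by made-up ids (`#new0`, `#new1`, ...) and a hypothetical `run_unless_exists` helper. It is not how MQL is implemented; in particular, picking the first match when several objects fit is an arbitrary choice for the sketch, and what MQL does in that situation is not modeled here.

```python
# Hypothetical sketch of "create everything not already present at query time".
def run_unless_exists(graph, clauses):
    """Resolve every clause against the graph first, then do all the writes."""
    # Pass 1: all reads see the graph as it was at query time.
    matches = [
        next((guid for guid, obj in graph.items()
              if obj["name"] == c["name"] and obj["type"] == c["type"]), None)
        for c in clauses
    ]
    # Pass 2: every clause that found nothing gets its own new object.
    results = []
    for clause, guid in zip(clauses, matches):
        if guid is None:
            guid = f"#new{len(graph)}"    # made-up id scheme for the example
            graph[guid] = {"name": clause["name"], "type": clause["type"]}
            results.append(("created", guid))
        else:
            results.append(("existed", guid))
    return results

graph = {}
clauses = [{"name": "foo2", "type": "/user/avh/default_domain/foo"}] * 2

first = run_unless_exists(graph, clauses)
second = run_unless_exists(graph, clauses)
print(first)   # two separate objects are created
print(second)  # both clauses now find an existing object
```

Because neither clause sees the other's write, two identical clauses produce two objects on the first run, and re-running the same query does not return the same pair of objects.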

You are really looking for the rule "this query will succeed when re-run 
returning the same objects" coupled with the rule "the maximum number of 
consistent objects are created", although even then there are cases 
which are ambiguous.

How would you feel if the behaviour changed so that the query:

"query": [{
    "type": "/user/avh/default_domain/foo",
    "name": "foo2",
    "create":"unless_exists",
    "id":null
},{
    "type": "/user/avh/default_domain/foo",
    "name": "foo2",
    "create":"unless_exists",
    "id":null
}]

failed outright, with the message "the missing objects cannot all be 
created simultaneously"? That's probably the most consistent thing to do.

Tim





Arthur van Hoff wrote:
> Hi Alec,
>
> I think you make a good point. However, I still would argue that the
> behavior that is currently implemented is not useful. Take the query you
> outlined. In the current implementation it would result in the wrong
> result UNLESS Fred already existed. If Fred does not exist you would end
> up with two different Freds, which clearly is not what you intended.
>
> Your example (I had to change the name because Fred already existed):
>
> "query":{
>     "id": null,
>     "children": {
>         "id": null,
>         "name": "Fredx's Kid",
>         "create": "unless_exists",
>         "parents": {
>             "id": null,
>             "name": "Fredx",
>             "create": "unless_exists"
>         }
>     },
>     "name": "Fredx",
>     "type": "/people/person",
>     "create": "unless_exists"
> }
>
> Result:
>
> {
>   "status": "200 OK", 
>   "code": "/api/status/ok", 
>   "result": {
>     "create": "created", 
>     "type": "/people/person", 
>     "children": {
>       "create": "created", 
>       "parents": {
>         "create": "created", 
>         "id": "/guid/9202a8c04000641f8000000006e601ce", 
>         "name": "Fredx"
>       }, 
>       "id": "/guid/9202a8c04000641f8000000006e601cb", 
>       "name": "Fredx's Kid"
>     }, 
>     "name": "Fredx", 
>     "id": "/guid/9202a8c04000641f8000000006e601c7"
>   }
> }
>
> Note the poor child with two dads.
>  
>
> -----Original Message-----
> From: developers-bounces at freebase.com
> [mailto:developers-bounces at freebase.com] On Behalf Of Alec Flett
> Sent: Thursday, December 20, 2007 9:58 AM
> To: For discussions about MQL,Freebase API and apps built on Freebase
> Subject: Re: [Developers] multiple create unless_exists in one query
>
> Arthur van Hoff wrote:
>   
>> It is fair to say that there ought to be an easy and obvious way to
>> handle multiple inserts (in the same query) of the same topic.
>>
>>   
>>     
> This is actually a pretty significant request - I can see how it would 
> make your life a lot easier if MQL could just "do the right thing", but 
> ultimately I'm not sure it's clear what the right thing is in the 
> general case, because it's not clear which part of the query MQL should 
> resolve first.
>
> Even if we did have some very specific, deterministic rules about the 
> order of resolution, those rules would probably have to be quite complex
> and it would be much harder for a developer to wrap their brain around 
> those rules. Kurt was able to explain the current rules in one sentence 
> ("all verifying reads ...are done before any writes are done") and 
> that's a huge win for the usability of MQL.
>
>
> Here's an in-depth look at it...given the write query:
>
> {"id": null,
>   "name": "Fred",
>   "type": "/people/person",
>   "create": "unless_exists",
>   "children": {"id": null,
>                       "name": "Fred's Kid",
>                       "create": "unless_exists",
>                       "parents": {"id": null,
>                                          "name": "Fred",
>                                          "create": "unless_exists"}}}
>
> There is maybe some ambiguity here. The first reference to "Fred" is 
> looking for a person "Fred" with a potential child of "Fred's Kid" - the
> 2nd reference to "Fred" is also looking for a person with a potential 
> child of "Fred's Kid" - the parent/child here is actually a constraint 
> on the unless_exists.
>
> So the real question becomes, do we resolve the first Fred, then assume 
> he has been created, then Fred's Kid, then Fred again? How does MQL 
> recognize that these are the 'same' person? Really, it would be because 
> they have the same constraints, and you'd have a sort of circular 
> dependency.
>
> Let's say we resolve these in a depth-first traversal of the query. When
> we resolve the 2nd Fred, do we have to also account for the fact that 
> Fred's Kid is now attached to the Fred that we just created, so that 
> that constraint is met? What if "Fred" existed, but "Fred's Kid" did 
> not? Would the 2nd reference to Fred match or not match the existing 
> "Fred"?
>
> And what if MQL did a breadth-first search resolution in a wider query 
> instead of depth-first, with redundant references to the same uncreated 
> object? That could result in a different set of ambiguities about how 
> the unless_exists are resolved.
>
> Or, do we decide that we're going to try to resolve all /people/person
> objects named "Fred" first? In that case, you'd probably get today's
> behavior since no Fred is attached to any Fred's Kid, and neither
> /people/person would match. But what if by some luck we tried to resolve
> "Fred's Kid" first, and then "Fred" - would all the constraints match?
>
> Hopefully this illustrates that the developer would need a more 
> in-depth knowledge of MQL's resolution ordering than one already does, 
> and someone might be posting with a different kind of confusion about 
> which clauses get resolved first. With today's behavior, there isn't 
> much to understand - all clauses get resolved "simultaneously" and then 
> the write is done.
>
> I think this is one of those areas where on a query-by-query basis, a 
> human can easily decide what makes the most sense, but from MQL's 
> perspective, there is a lot more ambiguity here.
>
> Alec
>  
>   
>> This is not perfect, but it will work, and should not affect
>> performance significantly.
>>
>> 								Kurt :-)
>>
>>  
>>   
>>     
>>> Below is an even simpler example that illustrates the problem using
>>> "create":"unless_exists" at the top level. It creates multiple foo2s if
>>> no foo2 existed. Perhaps this can be fixed so that both cases produce a
>>> single foo2, which seems the most natural outcome.
>>>
>>> "query": [{
>>>     "type": "/user/avh/default_domain/foo",
>>>     "name": "foo2",
>>>     "create":"unless_exists",
>>>     "id":null
>>> },{
>>>     "type": "/user/avh/default_domain/foo",
>>>     "name": "foo2",
>>>     "create":"unless_exists",
>>>     "id":null
>>> }]
>>>     
>>>       
>> Just do this query twice:
>>
>> {
>>   "query" : {
>>     "create" : "unless_exists",
>>     "id" : null,
>>     "name" : "foo002",
>>     "type" : "/user/avh/default_domain/foo"
>>   }
>> }
>>
>> 1st RESULT:
>> {
>>   "code" : "/api/status/ok",
>>   "result" : {
>>     "create" : "created",
>>     "id" : "/guid/9202a8c04000641f8000000006e601c3",
>>     "name" : "foo002",
>>     "type" : "/user/avh/default_domain/foo"
>>   }
>> }
>>
>> 2nd RESULT:
>>
>> {
>>   "code" : "/api/status/ok",
>>   "result" : {
>>     "create" : "existed",
>>     "id" : "/guid/9202a8c04000641f8000000006e601c3",
>>     "name" : "foo002",
>>     "type" : "/user/avh/default_domain/foo"
>>   }
>> }
>>
>> _______________________________________________
>> Developers mailing list
>> Developers at freebase.com
>> http://lists.freebase.com/mailman/listinfo/developers


