[Developers] GUIDs in data dump
Scott Meyer
sm at metaweb.com
Tue Jul 1 01:13:33 UTC 2008
Will Fitzgerald wrote:
> Thanks, Alec. Let's consider this a bit more. As you know, we're relying on
> Freebase for a lot of our name data, and we would like to guarantee as much
> as possible the stability of a handle to a particular entity over time.
>
> Let's consider the comedian and actor Steve Martin (just to take a name at
> random ...). We'd like a handle for this person that remains stable.
>
> In the March 2008 download, his id is '/topic/en/steve_martin'
> In the June 2008 download, his id will be '/en/steve_martin'
>
> because of changing "/topic/en" to "/en" [1].
>
> You wrote in [1]:
>
> "To be clear, this is a one-time change while we're still in alpha. We
> really, really don't expect to do another system-wide change like
> this, especially once we're out of alpha."
>
> This all implies that ids are pretty stable, but not guaranteed stable.
In an alpha product, yes. In hindsight, I think that it would
have been better to have the code honor the old '/topic' form
indefinitely. We need to get used to jumping through that hoop
and we should have just started with "/topic/en."
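For consumers holding ids from the March 2008 dump, the old form can be honored on the client side. The following is only a minimal sketch, assuming the sole difference is the "/topic/en/" versus "/en/" prefix described above; the function name and constants are hypothetical and not part of any Freebase API.

```python
# Hypothetical client-side helper: map the legacy '/topic/en/...' form
# onto the current '/en/...' form. Only the prefix rewrite mentioned in
# this thread is handled; other ids pass through untouched.
LEGACY_PREFIX = "/topic/en/"
CURRENT_PREFIX = "/en/"


def normalize_id(freebase_id: str) -> str:
    """Return the current form of an id, accepting the legacy form too."""
    if freebase_id.startswith(LEGACY_PREFIX):
        return CURRENT_PREFIX + freebase_id[len(LEGACY_PREFIX):]
    return freebase_id
```

With this, normalize_id("/topic/en/steve_martin") and normalize_id("/en/steve_martin") give the same handle, so keys stored from either dump compare equal.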
> So, I have a few questions:
>
> (1) Are ids meant to be unique (that is, there is an ideal 1-1 mapping
> between ids and entities)?
The mapping isn't 1-1: an entity can have many ids. However, each
id maps to only one entity. Over time that entity may change (it may
get a different guid) but inasmuch as is possible the significance of
that entity will remain the same.
Obviously, in cases where we have to split entities apart, we can't
make that guarantee, but guids are no help in that situation either.
> (2) Will ids change only in a 'system-wide change'?
That is actually the case that won't happen again.
> (3) Will ids that are 'guidy' change? (for example,
> guid/9202a8c04000641f80000000006c3aa3, the id for "Norway women's national
> football team"; or 'guid/9202a8c04000641f8000000000a92044' for "Adam Bomb").
> (This is orthogonal to merges and splits).
No. However, as topics are merged, some guids will win and some will
lose. If you hang onto a guid, you need to be prepared to recognize
the case in which it has been merged with something else (it will
have a single "merged-with" property).
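A client that stores guids can follow merges before using a stored value. This is a rough sketch under stated assumptions: the merged_with dict is a hypothetical local table built from whatever "merged-with" properties you have seen (losing guid mapped to winning guid), and the guid strings in the usage note are made-up placeholders.

```python
# Hypothetical local bookkeeping: merged_with maps a guid that lost a
# merge to the guid it was merged with. Each guid has at most one such
# entry, matching the single "merged-with" property described above.
def resolve_guid(guid: str, merged_with: dict) -> str:
    """Follow merged-with links until reaching a guid that was not merged."""
    seen = set()
    while guid in merged_with:
        if guid in seen:
            # Defensive check: a cycle would mean the local table is corrupt.
            raise ValueError("cycle in merged-with table at " + guid)
        seen.add(guid)
        guid = merged_with[guid]
    return guid
```

For example, with merged_with = {"guid/a": "guid/b", "guid/b": "guid/c"}, a stored "guid/a" resolves to "guid/c", and a guid that was never merged resolves to itself.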
We understand that there is a problem with using ids as foreign keys.
Specifically, since the id-to-guid mapping is many to one, we have
to choose one when you ask for an id, and our choice may change
as (better) ids get added to the system. We're working on ways
to make the choice of ids more transparent, but that is
tangential to the problem. A solution is to use a component
of the id path as a foreign key and to request that component
explicitly (via the enumeration feature) rather than making a
generic { "id" : null } request.
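The idea of keying on one component of the id path can be illustrated on local data. The sketch below does not show the query syntax itself; it assumes you already hold the full list of ids for an entity, and the function name and namespace argument are hypothetical.

```python
# Hypothetical helper: given every id known for one entity, pick the key
# that sits directly under one chosen namespace, so the foreign key does
# not depend on which id the system happens to prefer at the moment.
from typing import Iterable, Optional


def key_in_namespace(ids: Iterable[str], namespace: str) -> Optional[str]:
    """Return the key directly under `namespace`, or None if there is none."""
    prefix = namespace.rstrip("/") + "/"
    for candidate in ids:
        if candidate.startswith(prefix):
            key = candidate[len(prefix):]
            # Accept only a direct child of the namespace, not a deeper path.
            if key and "/" not in key:
                return key
    return None
```

Given the ids "/topic/en/steve_martin" and "/en/steve_martin" for the same entity, asking for namespace "/en" yields "steve_martin" regardless of the order in which the ids are listed.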
Regards,
-Scott