[Developers] Question about wpid and freebase id
Christopher R. Maden
crism at metaweb.com
Wed Jul 23 15:13:15 UTC 2008
Shug Boabby wrote:
> I am currently using the Freebase API in client-side code, and some
> server side code. I wish to use a local database to resolve various
> freebase ids/guids to Wikipedia URLs (actually, I wish to have a
> backup in case Freebase is not available). However, although the WEX
> dumps have the wpid table, I am not able to find the Wikipedia Names
> anywhere.
Wikipedia names are not reliable identifiers, even though Wikipedia uses
them in its URLs. This is why Freebase generates Wikipedia URLs of the
form <URL: http://en.wikipedia.org/wiki/index.html?curid=16073 >; that
works regardless of what article 16073 has been renamed to. It will
break only if article 16073 has been deleted. If a new article has since
been created under the old name, *usually* it isn’t what you want.
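
For a local fallback, a URL of that form can be built directly from a
wpid value. The following is only a minimal sketch; the function name
and the lang parameter are made up for illustration, and it assumes the
wpid you read from the WEX dump is the numeric Wikipedia article ID:

```python
def wikipedia_url_from_wpid(wpid: int, lang: str = "en") -> str:
    """Build a curid-style Wikipedia URL from a numeric article ID.

    The function name and the lang parameter are illustrative only.
    """
    if wpid <= 0:
        raise ValueError("wpid must be a positive integer")
    # Same URL shape as the example above; it does not depend on the
    # article's current title, so renames do not break it.
    return f"http://{lang}.wikipedia.org/wiki/index.html?curid={wpid}"


print(wikipedia_url_from_wpid(16073))
```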
> 1 - The closest thing I have found are the Freebase Names... are these
> related to the Wikipedia names? Are there any cases where a Freebase
> Name will not map to a Wikipedia Name (and vice versa)?
When Freebase topics are populated from Wikipedia articles, they are
given the same name, minus any disambiguating parenthetical phrases; the
article “San Francisco (film)” becomes a topic called “San Francisco”
(and usually typed as a Film). Likewise “San Francisco (Midicronica
song),” “San Francisco (typeface),” and “San Francisco (magazine).”
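
If you want to approximate that naming rule locally, something like the
sketch below may be close. It is an assumption about the rule, not the
importer's actual code: it strips a single trailing parenthetical and
nothing else, and the function name is hypothetical.

```python
import re

# One trailing "(...)" group, plus the whitespace before it.
_TRAILING_PAREN = re.compile(r"\s*\([^()]*\)$")


def strip_disambiguator(article_title: str) -> str:
    """Drop a trailing disambiguating parenthetical from an article title."""
    return _TRAILING_PAREN.sub("", article_title)


for title in ("San Francisco (film)", "San Francisco (typeface)", "San Francisco"):
    print(repr(title), "->", repr(strip_disambiguator(title)))
```

Note that the mapping is many-to-one: several articles collapse to the
same topic name, so a name alone cannot be turned back into an article.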
> 2 - Are there any instances where a Wikipedia article may redirect to
> another Wikipedia article, but these 2 articles have different
> freebase guids or ids? One could imagine the scenario where 2 distinct
> articles once existed on Wikipedia (and Freebase assigned guids/ids to
> each), but were later merged into one and the other changed to a
> redirect. (Assume that Wikipedia has remained static since the last
> data dump). My gut instinct is that they may have different freebase
> guids, but the freebase ids should be the same.
There are cases like this. And usually the Freebase IDs should *not* be
the same. Wikipedia articles often conflate multiple ideas. One
example is that someone makes a Wikipedia article for the soundtrack of
a movie and Freebase imports it. Someone else at Wikipedia decides that
the soundtrack article isn’t significant enough to warrant its own
article and merges it into the article on the movie. Freebase still
wants to preserve distinct topics for the film and its soundtrack, as
they have very different properties.
When the Wikipedia articles have been merged for a good reason, then the
Freebase topics should be merged as well.
> 3 - Is it possible to obtain the "freebase id" (not necessarily the
> guid) from the data dumps?
You can look at the /type/object/key properties of an object and follow
them back to the root node. This is what MQL does when constructing an
ID. However, there is not necessarily a single canonical ID for a
topic; the film _Moonraker_ is known as
/wikipedia/en/Moonraker_$0028film$0029, /wikipedia/en/Manuela,
/user/hal/netflix/movie/772477, and /authority/imdb/title/tt0079574,
among others.
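
The walk back to the root can be sketched as below. Everything about
the data layout here is hypothetical: the KEYS dictionary, the node
names, and the function names are stand-ins for whatever you load from
the dump, not a real dump schema. The decoding helper assumes that a key
escapes a character as "$" followed by four hex digits, as in the
Moonraker key above.

```python
import re
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical stand-in for dump data:
# node -> list of (namespace node, key value) pairs.
KEYS: Dict[str, List[Tuple[str, str]]] = {
    "root": [],
    "ns_wikipedia": [("root", "wikipedia")],
    "ns_wikipedia_en": [("ns_wikipedia", "en")],
    "ns_authority": [("root", "authority")],
    "ns_imdb": [("ns_authority", "imdb")],
    "ns_imdb_title": [("ns_imdb", "title")],
    "moonraker": [
        ("ns_wikipedia_en", "Moonraker_$0028film$0029"),
        ("ns_imdb_title", "tt0079574"),
    ],
}


def ids_for(node: str, keys: Dict[str, List[Tuple[str, str]]],
            root: str = "root",
            _seen: FrozenSet[str] = frozenset()) -> List[str]:
    """Return every ID reachable by following keys from node up to root."""
    if node == root:
        return [""]
    if node in _seen:  # guard against cycles in the namespace graph
        return []
    ids = []
    for namespace, key in keys.get(node, []):
        for prefix in ids_for(namespace, keys, root, _seen | {node}):
            ids.append(f"{prefix}/{key}")
    return ids


def decode_key(key: str) -> str:
    """Turn $XXXX escapes back into characters (assumed 4 hex digits)."""
    return re.sub(r"\$([0-9A-Fa-f]{4})",
                  lambda m: chr(int(m.group(1), 16)), key)


for freebase_id in ids_for("moonraker", KEYS):
    print(freebase_id)
print(decode_key("Moonraker_$0028film$0029"))
```

Because a topic can carry several keys, and each namespace can too, the
function returns a list rather than a single ID; which one you treat as
preferred is up to your application.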
~Chris
--
Christopher R. Maden
Data Architect
Freebase.com: <URL: http://www.freebase.com/ >
Metaweb Technologies, Inc. <URL: http://www.metaweb.com/ >