[Developers] Question about wpid and freebase id
Shug Boabby
shug.boabby at gmail.com
Wed Jul 23 15:47:56 UTC 2008
Thanks for the explanation, Chris.
Despite completely agreeing that Wikipedia "Named" IDs are not
reliable, I am nonetheless tied-in to using them by the project I am
working on.
What I need to do is to query Freebase with a guid and get back the
Wikipedia Name (or id), resolved to the final target if any redirects
are involved. I believe that the first result does this, so I have
thus far used "limit" : "1" and it seems to work... can you confirm
that this is the case?
Another type of query that I would like to make is to give the
Wikipedia Name to the Freebase API and receive back the Wikipedia Name
(or id) that it resolves to if redirects are involved... at the moment
I can only do this with 2 queries (first to get the guid, second the
same as described in the first paragraph). Is it possible to do this
in a single query?
In future releases of the WEX dump files, I would very much like to
see a TSV file containing:-
- Wikipedia IDs to Wikipedia Names (I realise this is obtainable by
parsing the XML)
- redirects using the Wikipedia Names or ids and not the Freebase
Names! (Incidentally, why are they the way they are? Surely the
Freebase names introduce another layer of ambiguity... and what is the
point of the integer in the first column?)
Is this something you might consider?
2008/7/23 Christopher R. Maden <crism at metaweb.com>:
> Shug Boabby wrote:
>> I am currently using the Freebase API in client-side code, and some
>> server side code. I wish to use a local database to resolve various
>> freebase ids/guids to Wikipedia URLs (actually, I wish to have a
>> backup in case Freebase is not available). However, although the WEX
>> dumps have the wpid table, I am not able to find the Wikipedia Names
>> anywhere.
>
> Wikipedia names are not reliable, even though they use them in their
> URLs. This is why Freebase generates Wikipedia URLs of the form <URL:
> http://en.wikipedia.org/wiki/index.html?curid=16073 >; that works
> regardless of what article 16073 has been renamed. It will only break
> if article 16073 has been deleted. In the case that there is a new
> article with the same name, *usually* it isn't what you want.
>
>> 1 - The closest thing I have found are the Freebase Names... are these
>> related to the Wikipedia names? Are there any cases where a Freebase
>> Name will not map to a Wikipedia Name (and vice versa)?
>
> When Freebase topics are populated from Wikipedia articles, they are
> given the same name, minus any disambiguating parenthetical phrases; the
> article "San Francisco (film)" becomes a topic called "San Francisco"
> (and usually typed as a Film). Likewise "San Francisco (Midicronica
> song)," "San Francisco (typeface)," and "San Francisco (magazine)."
>
>> 2 - Are there any instances where a Wikipedia article may redirect to
>> another Wikipedia article, but these 2 articles have different
>> freebase guids or ids? One could imagine the scenario where 2 distinct
>> articles once existed on Wikipedia (and Freebase assigned guids/ids to
>> each), but were later merged into one and the other changed to a
>> redirect. (Assume that Wikipedia has remained static since the last
>> data dump). My gut instinct is that they may have different freebase
>> guids, but the freebase ids should be the same.
>
> There are cases like this. And usually the Freebase IDs should *not* be
> the same. Wikipedia articles often conflate multiple ideas. One
> example is that someone makes a Wikipedia article for the soundtrack of
> a movie and Freebase imports it. Someone else at Wikipedia decides that
> the soundtrack article isn't significant enough to warrant its own
> article and merges it into the article on the movie. Freebase still
> wants to preserve distinct topics for the film and its soundtrack, as
> they have very different properties.
>
> When the Wikipedia articles have been merged for a good reason, then the
> Freebase topics should be merged as well.
>
>> 3 - Is it possible to obtain the "freebase id" (not necessarily the
>> guid) from the data dumps?
>
> You can look at the /type/object/key properties of an object and follow
> them back to the root node. This is what MQL does when constructing an
> ID. However, there is not necessarily a single canonical ID for a
> topic; the film _Moonraker_ is known as
> /wikipedia/en/Moonraker_$0028film$0029, /wikipedia/en/Manuela,
> /user/hal/netflix/movie/772477, and /authority/imdb/title/tt0079574,
> among others.
>
> ~Chris
> --
> Christopher R. Maden
> Data Architect
> Freebase.com: <URL: http://www.freebase.com/ >
> Metaweb Technologies, Inc. <URL: http://www.metaweb.com/ >
> _______________________________________________
> Developers mailing list
> Developers at freebase.com
> http://lists.freebase.com/mailman/listinfo/developers
>
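
As an illustration of the two identifier forms in Chris's reply, here is a
minimal Python sketch. It assumes that the "$0028" and "$0029" sequences in
/wikipedia/en/Moonraker_$0028film$0029 are "$" followed by a four-digit
hexadecimal code point (which matches "(" and ")" in that example), and that
the curid URL pattern quoted above holds for any Wikipedia article id. The
function names are hypothetical and are not part of any Freebase library.

```python
import re

# "$" followed by exactly four hex digits, e.g. "$0028" (assumed key escape form).
_ESCAPE = re.compile(r"\$([0-9A-Fa-f]{4})")


def decode_key(key):
    """Turn an escaped key such as 'Moonraker_$0028film$0029' back into text."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), key)


def wikipedia_url_from_curid(wpid):
    """Build a rename-proof Wikipedia URL from a numeric article id (wpid)."""
    return "http://en.wikipedia.org/wiki/index.html?curid=%d" % int(wpid)


if __name__ == "__main__":
    print(decode_key("Moonraker_$0028film$0029"))
    print(wikipedia_url_from_curid(16073))
```

Decoding the key gives the Wikipedia Name with underscores in place of spaces,
while the curid URL avoids the name altogether, which is the more robust
choice when only the wpid table from the WEX dump is available locally.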