[Data-modeling] What's the best way to represent acronyms/initialisms?
Jeff Prucher
jeff at metaweb.com
Tue Jan 22 18:42:40 UTC 2008
I agree with Ed that common usage should determine whether a topic name
should be the expanded form or an acronym/initialism/abbreviation, although
this is obviously a judgement call, and anyone who feels differently is free
to rename the topic. This ability to rename the topic argues against
something like the "abbreviation" type
(http://www.freebase.com/view/schema/user/skud/default_domain/abbreviation),
since if someone renames the NASA topic to "National Aeronautics and Space
Administration", the topic is no longer an abbreviation.
I have two other thoughts. One would be to create a type for initialisms and
acronyms. It would exist separately from any topics that happened to share
that acronym, so there would be one topic for "NASA", the space agency, and
one for "NASA", the acronym. (This would get less weird for shared
initialisms like "SF", which can refer to San Francisco, science fiction,
and who-knows-what-all.) There could be a property that linked to a CVT --
one property of the CVT would be the expansion of the acronym as a text
string, the second property would be an optional link to the corresponding
Freebase topic. (It'd be a CVT so that there was no confusion about which
string went with which topic for shared acronyms.) The main problem with
this is that it would be confusing, and the acronym topic for NASA (or
whatever) would start to accrue other types since people would be certain to
make the wrong selection from autocomplete on occasion. We'd probably also
have to deal with a lot of merge requests between the acronym topic and the
more general topic.
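To make the shape of that idea concrete, here is a minimal sketch in Python.
It is not Freebase schema or API code; the class names (Topic,
AcronymExpansion, Acronym) are hypothetical, and the "square feet" entry is
an illustrative assumption added only to show an expansion with no linked
topic.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Topic:
    """An ordinary topic, e.g. the space agency or the city."""
    name: str
    aliases: List[str] = field(default_factory=list)


@dataclass
class AcronymExpansion:
    """CVT-like record: one expansion string plus an optional topic link.

    Keeping both on one record avoids confusion about which string
    goes with which topic when an acronym is shared.
    """
    expansion: str
    topic: Optional[Topic] = None


@dataclass
class Acronym:
    """The acronym itself, kept separate from the topics that share it."""
    name: str
    expansions: List[AcronymExpansion] = field(default_factory=list)

    def topics(self) -> List[Topic]:
        # Only expansions that actually link to a topic contribute here.
        return [e.topic for e in self.expansions if e.topic is not None]


san_francisco = Topic("San Francisco", aliases=["SF"])
science_fiction = Topic("Science fiction", aliases=["SF"])

sf = Acronym("SF", [
    AcronymExpansion("San Francisco", san_francisco),
    AcronymExpansion("science fiction", science_fiction),
    AcronymExpansion("square feet"),  # hypothetical: expansion without a topic
])

for entry in sf.expansions:
    linked = entry.topic.name if entry.topic else "(no topic)"
    print(f"{sf.name}: {entry.expansion} -> {linked}")
```

Note that the topics keep "SF" as an alias in this sketch, so the acronym
record would be an addition to the aliases and not a replacement for them.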
That said, we'd still want the aliases on the topics that aren't of type
"acronym" to include the acronyms or expanded names, since, as Shawn points
out, it affects searches.
My current take on this (and I'm open to other opinions) is that my proposal
here is an interesting thought-experiment, but is probably not the right way
to handle this.
Jeff
_____
From: data-modeling-bounces at freebase.com
[mailto:data-modeling-bounces at freebase.com] On Behalf Of Ed Laurent
Sent: Monday, January 21, 2008 9:21 PM
To: Freebase data modeling mailing list
Subject: Re: [Data-modeling] What's the best way to represent
acronyms/initialisms?
It would be nice to have a naming standard and universal acronym type that
fits all situations. I've been naming the topic as its full name or acronym
depending on which is more commonly used (e.g., ETM+
<http://www.freebase.com/view/guid/9202a8c04000641f8000000006acbd1d>). I
always add the full name as a synonym if the topic name is an acronym. I
add the acronym as a synonym if the topic is named using the full name but
the acronym is commonly used. I try to use Kirrily's acronym type when
possible but I've also listed it as a machine readable string sometimes.
Either way, I always add an acronym property to the type if one is ever used
(e.g., satellite sensor
<http://www.freebase.com/view/schema/user/spatialed/land_cover/satellite_sensor>).
This approach is not really a standard because I flip between
Kirrily's type and machine readable string. It's also subjective whether or
not the user thinks the acronym rises to the level of a synonym. However,
the info is there for when/if a standard is set and this approach seems to
work pretty well.
-Ed
On Jan 21, 2008 11:55 PM, Shawn Simister <narphorium at gmail.com> wrote:
Kirrily Robert wrote:
----- "Shawn Simister" <narphorium at gmail.com> wrote:
I have the finished pipe working if anyone wants to try it out. It's an
acronym lookup tool.
http://pipes.yahoo.com/narphorium/acronym
There aren't many acronyms listed in Freebase yet so only the major ones
give any results right now.
That's very cool! Reminds me I was thinking of creating an FB type for
acronyms and initialisms. Might make your pipes a little less maze-like if
that existed.
K.
Thanks for the suggestions, I too was thinking that an acronym type could
help out. As you may have seen, the pipe got a little complicated when I
tried to factor in topics where the acronym is the main title of the topic
and the full name is one of the aliases (e.g., DVD, NASA). A specialized type
would definitely help in those cases.
On the other hand, using the alias property might be easier to understand
for a new user editing a topic and, as I understand it, the alias properties
also help to improve the overall searchability of Freebase.
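The two cases described above (the acronym is the topic name and the full
name is an alias, or the reverse) can be sketched as a single lookup over
names and aliases. This is only an illustration in Python, not the Yahoo
Pipes implementation and not a Freebase API; the TOPICS list and the
expand() function are hypothetical stand-ins.

```python
from typing import List, Tuple

# Hypothetical in-memory stand-in for topics: (topic name, aliases).
TOPICS: List[Tuple[str, List[str]]] = [
    ("NASA", ["National Aeronautics and Space Administration"]),
    ("San Francisco", ["SF"]),
    ("Science fiction", ["SF"]),
]


def expand(acronym: str, topics: List[Tuple[str, List[str]]]) -> List[str]:
    """Return candidate expansions for an acronym using names and aliases."""
    results: List[str] = []
    for name, aliases in topics:
        if name == acronym:
            # Case 1: the acronym is the topic name, so the aliases are the
            # candidate expansions. Without a dedicated acronym type there
            # is no way to tell an expansion from any other alias.
            results.extend(aliases)
        elif acronym in aliases:
            # Case 2: the acronym is an alias, so the topic name is the
            # expansion.
            results.append(name)
    return results


print(expand("NASA", TOPICS))
print(expand("SF", TOPICS))
```

The comment in the first branch is where a specialized type would help: with
aliases alone, every alias of an acronym-named topic looks like an expansion.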
I've extracted about 8,000 acronym/initialism definitions from Wikipedia
which could be added to existing Freebase topics. Finding a standard way of
representing this data would be the logical next step. Any thoughts?
Shawn
P.S. Congrats on your new job!
_______________________________________________
Data-modeling mailing list
Data-modeling at freebase.com
http://lists.freebase.com/mailman/listinfo/data-modeling