[Data-modeling] What's the best way to represent acronyms/initialisms?
Jeff Prucher
jeff at metaweb.com
Thu Jan 24 22:00:19 UTC 2008
You bring up an interesting (at least from a lexicographical standpoint)
point about abbreviations, which is that abbreviations can have multiple
definitions, even if they are abbreviations of the same phrase. But I'm not
sure it affects the model particularly. Using Shawn's model, the
abbreviation "OOC" would appear as an abbreviation on two topics -- "Out of
character" (for RPGs) and "Out of character" (for fanfic). Querying
against string values would reveal that "OOC" had multiple expansions.
Or, using the schema that I proposed downthread (it's still attached, if
you scroll down), you would just need to create two property instances for
"out of character" -- one for each of the corresponding topics. (I still
don't think that schema is the best way to do this.)
Looking up all the abbreviations used by a given field/interest group would
still be beyond either of these models, though.
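A minimal sketch of the string-value lookup described above, written in Python rather than against any real Freebase API. The Topic class, the topic ids, and the expansions() helper are all hypothetical; the only things taken from the thread are the "OOC" example and the idea of an abbreviation property on each topic.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Topic:
    topic_id: str  # hypothetical id, not a real Freebase id
    name: str
    abbreviations: List[str] = field(default_factory=list)


# Two distinct topics that happen to share both a name and an abbreviation.
topics = [
    Topic("/example/ooc_rpg", "Out of character", ["OOC"]),
    Topic("/example/ooc_fanfic", "Out of character", ["OOC"]),
    Topic("/example/nasa", "NASA", ["NASA"]),
]


def expansions(abbreviation: str, all_topics: List[Topic]) -> List[Topic]:
    """Return every topic that carries the given abbreviation string."""
    return [t for t in all_topics if abbreviation in t.abbreviations]


ooc_matches = expansions("OOC", topics)
for match in ooc_matches:
    print(match.topic_id, match.name)
```

Nothing in this sketch records which field or interest group a topic belongs to, which is why the per-group lookup stays out of reach.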
Jeff
_____
From: data-modeling-bounces at freebase.com [mailto:data-modeling-bounces at freebase.com] On Behalf Of Faye Li
Sent: Thursday, January 24, 2008 12:26 PM
To: Freebase data modeling mailing list
Subject: Re: [Data-modeling] What's the best way to
represent acronyms/initialisms?
Hi,
I'm late to the discussion, although I've been watching this thread with interest.
Abbreviations/acronyms are inherently vague and the same grouping of
letters can stand for different things in different contexts. It would be
nice to record the complete name for an abbreviation along with the domain
or context where the complete name makes sense.
For example, OOC (out of character) is a term used in both the role-playing
gaming community as well as the domain of fan fiction, yet means different
things to the two groups. If I were looking up the term OOC, it'd be nice
to see a hint that constrains each complete name/definition to the domain
where the definition is recognized.
It's not unlike the way Fictional Character is modeled to include a
"Fictional Universe" property, thus clearly differentiating "The Doctor"
character in the "Doctor Who" series from "The Doctor" character (aka
"Emergency Medical Hologram") in the "Star Trek: Voyager" universe.
Use cases I have in mind that will benefit from this modeling approach
include:
1) Looking up the complete name/explanation for a multi-defined
abbreviation/acronym, perhaps with the ability to constrain by context
2) Looking up all abbreviations/acronyms in a domain (like looking up all
characters in a Fictional Universe)
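The two use cases above can be sketched roughly as follows. This is an illustration only, assuming each abbreviation entry stores its complete name together with a context; the AbbreviationEntry class, the context labels, and both helper functions are hypothetical names, not part of any proposed schema.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class AbbreviationEntry:
    abbreviation: str
    complete_name: str
    context: str  # domain in which this expansion is recognized


ENTRIES = [
    AbbreviationEntry("OOC", "out of character", "Role-playing games"),
    AbbreviationEntry("OOC", "out of character", "Fan fiction"),
]


def look_up(abbreviation: str, context: Optional[str] = None) -> List[AbbreviationEntry]:
    """Use case 1: expansions of an abbreviation, optionally limited to one context."""
    return [
        e for e in ENTRIES
        if e.abbreviation == abbreviation and (context is None or e.context == context)
    ]


def abbreviations_in(context: str) -> List[str]:
    """Use case 2: all abbreviations recognized in a given context."""
    return sorted({e.abbreviation for e in ENTRIES if e.context == context})


all_ooc = look_up("OOC")
fanfic_ooc = look_up("OOC", context="Fan fiction")
fanfic_abbreviations = abbreviations_in("Fan fiction")
```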
-- Faye
Shawn Simister wrote:
Good point. I've updated the schema accordingly.
Robert Cook wrote:
Shawn -- I think this is the right model, although I would use "text"
instead of "machine readable string" as the expected type. Text literals
can have translations and would accommodate situations where the
abbreviation is different in other languages. For example, the topic
"AIDS" would be "SIDA" in several romance languages.
http://www.freebase.com/view/explore/topic/en/aids (scroll down to
"outgoing properties")
Сида (/type/text)
AIDS (/type/text)
에이즈 (/type/text)
SIDA (/type/text)
These are /type/object/name property values, but could easily be
/common/abbreviated_type/abbreviation values.
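As a rough illustration of why a language-tagged text value helps here, the sketch below stores one abbreviation per language and picks the one matching a requested language. The Text class and abbreviation_for() helper are hypothetical, and the language codes attached to each value are my own assumption; the explore page quoted above lists the strings without them.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Text:
    value: str
    lang: str  # language code; assumed here, not taken from the thread


aids_abbreviations = [
    Text("AIDS", "en"),
    Text("SIDA", "es"),
    Text("SIDA", "fr"),
    Text("에이즈", "ko"),
]


def abbreviation_for(values: List[Text], lang: str, fallback: str = "en") -> Optional[str]:
    """Return the abbreviation in the requested language, else in the fallback language."""
    for wanted in (lang, fallback):
        for text in values:
            if text.lang == wanted:
                return text.value
    return None


print(abbreviation_for(aids_abbreviations, "es"))
```

A plain machine-readable string would have no place to hang the language, so all four values would look like competing abbreviations for the same topic.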
R
On Jan 23, 2008, at 12:23 AM, Shawn Simister wrote:
Thanks for all the feedback. I had no idea there was so much interest in
this topic.
Robert's idea of creating an Abbreviated Topic co-type would be my
preference of everything that has been discussed so far. It's simple, easy
to use and I also like the idea of expanding its scope to include all
abbreviations. The key to this approach would be to have the Freebase
autocomplete pick up on these new abbreviation values in the same way that
it currently treats the alias property. While I think that a CVT would
provide a lot of flexibility, it's just too confusing to attract new
abbreviation entries from casual users.
Maybe some of the commonly abbreviated types like Organization could
include Abbreviated Topic in their schema to encourage its use. In fact,
some types like Unit Profile already have abbreviation properties that
could be factored out.
I've published a draft version of the proposed Abbreviated Topic
<http://freebase.com/view/schema/user/narphorium/default_domain/abbreviated_topic>
type in my default domain just to make sure we're on the same page
about what this would look like.
Shawn
Robert Cook wrote:
I've often thought that the "also known as" (alias) field of /common/topic
is too broad and there might be other properties for capturing alternate
names. One idea would be to have a property called "abbreviation" that is
of type text. This property could be on a new type
/common/abbreviated_topic and would capture acronyms, initialisms (that are
typically thought of as acronyms) and abbreviations.
Topics like "National Educational Association" would be co-typed
"Abbreviated Topic", and the abbreviation property would contain "NEA".
For "NASA" it would contain "NASA". To capture "National Aeronautics and
Space Administration" there would be a property
/common/abbreviated_topic/complete_name
Of course, this introduces a denormalization. There would be two
properties with "NASA" - the display name of the topic and the new
abbreviation property. I personally think this is fine and I anticipate
that this pattern will happen elsewhere (perhaps /people/person/given_name
and /people/person/family_name; also /common/topic/common_misspellings).
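To make the co-typing idea concrete, here is a small Python sketch. It is not Freebase code: the Topic class and add_abbreviation() helper are hypothetical, and only the type and property ids (/common/topic, /common/abbreviated_topic and its abbreviation and complete_name properties) come from the proposal above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set

ABBREVIATED_TOPIC = "/common/abbreviated_topic"
ABBREVIATION = ABBREVIATED_TOPIC + "/abbreviation"
COMPLETE_NAME = ABBREVIATED_TOPIC + "/complete_name"


@dataclass
class Topic:
    name: str  # display name
    types: Set[str] = field(default_factory=set)
    properties: Dict[str, List[str]] = field(default_factory=dict)


def add_abbreviation(topic: Topic, abbreviation: str,
                     complete_name: Optional[str] = None) -> None:
    """Co-type the topic as an Abbreviated Topic and record its values."""
    topic.types.add(ABBREVIATED_TOPIC)
    topic.properties.setdefault(ABBREVIATION, []).append(abbreviation)
    if complete_name is not None:
        topic.properties.setdefault(COMPLETE_NAME, []).append(complete_name)


nea = Topic("National Educational Association", {"/common/topic"})
add_abbreviation(nea, "NEA")

nasa = Topic("NASA", {"/common/topic"})
add_abbreviation(nasa, "NASA", "National Aeronautics and Space Administration")

# The denormalization: "NASA" is both the display name and an abbreviation value.
duplicated = nasa.name in nasa.properties[ABBREVIATION]
```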
Shawn -- would this work for imagined applications?
R
On Jan 22, 2008, at 10:42 AM, Jeff Prucher wrote:
I agree with Ed that common usage should determine whether a topic name
should be the expanded form or an acronym/initialism/abbreviation, although
this is obviously a judgement call, and anyone who feels differently is
free to rename the topic. This ability to rename the topic argues against
something like the "abbreviation" type
(http://www.freebase.com/view/schema/user/skud/default_domain/abbreviation),
since if someone renames the NASA topic to "National Aeronautics and Space
Administration", the topic is no longer an abbreviation.
I have two other thoughts. One would be to create a type for initialisms
and acronyms. It would exist separately from any topics that happened to
share that acronym, so there would be one topic for "NASA", the space
agency, and one for "NASA", the acronym. (This would get less weird for
shared initialisms like "SF", which can refer to San Francisco, science
fiction, and who-knows-what-all.) There could be a property that linked to
a CVT -- one property of the CVT would be the expansion of the acronym as a
text string, the second property would be an optional link to the
corresponding Freebase topic. (It'd be a CVT so that there was no
confusion about which string went with which topic for shared acronyms.)
The main problem with this is that it would be confusing, and the acronym
topic for NASA (or whatever) would start to accrue other types since people
would be certain to make the wrong selection from autocomplete on occasion.
We'd probably also have to deal with a lot of merge requests between the
acronym topic and the more general topic.
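For comparison, the separate acronym type with a CVT might look something like the sketch below. Everything here is illustrative: the Expansion class stands in for the CVT, the topic id is made up, and leaving "science fiction" unlinked is only there to show the optional topic link.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Expansion:
    """Stands in for the CVT: pairs an expansion string with an optional topic link."""
    text: str
    topic_id: Optional[str] = None  # hypothetical id of the corresponding topic


@dataclass
class Acronym:
    """A topic for the acronym itself, separate from the things it names."""
    name: str
    expansions: List[Expansion] = field(default_factory=list)


sf = Acronym("SF", [
    Expansion("San Francisco", topic_id="/example/san_francisco"),
    Expansion("science fiction"),  # no topic linked yet
])


def linked_topics(acronym: Acronym) -> List[str]:
    """Return the ids of topics that expansions of this acronym point to."""
    return [e.topic_id for e in acronym.expansions if e.topic_id is not None]


print(linked_topics(sf))
```

Because each string and its link live in the same record, there is no ambiguity about which expansion goes with which topic, at the cost of a second "SF" or "NASA" topic that users can pick by mistake.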
That said, we'd still want the aliases on the topics that aren't of type
"acronym" to include the acronyms or expanded names, since, as Shawn points
out, it affects searches.
My current take on this (and I'm open to other opinions) is that my
proposal here is an interesting thought-experiment, but is probably not the
right way to handle this.
Jeff
_____
From: data-modeling-bounces at freebase.com [mailto:data-modeling-bounces at freebase.com] On Behalf Of Ed Laurent
Sent: Monday, January 21, 2008 9:21 PM
To: Freebase data modeling mailing list
Subject: Re: [Data-modeling] What's the best way to
represent acronyms/initialisms?
It would be nice to have a naming standard and universal acronym type that
fits all situations. I've been naming the topic as its full name or acronym
depending on which is more commonly used (e.g., ETM+
<http://www.freebase.com/view/guid/9202a8c04000641f8000000006acbd1d> ). I
always add the full name as a synonym if the topic name is an acronym. I
add the acronym as a synonym if the topic is named using the full name but
the acronym is commonly used. I try to use Kirrily's acronym type when
possible but I've also listed it as a machine readable string sometimes.
Either way, I always add an acronym property to the type if one is ever
used (e.g., satellite sensor
<http://www.freebase.com/view/schema/user/spatialed/land_cover/satellite_sensor>).
This approach is not really a standard because I flip between
Kirrily's type and machine readable string. It's also subjective whether or
not the user thinks the acronym rises to the level of a synonym. However,
the info is there for when/if a standard is set and this approach seems to
work pretty well.
-Ed
_______________________________________________
Data-modeling mailing list
Data-modeling at freebase.com
http://lists.freebase.com/mailman/listinfo/data-modeling