[Data-modeling] What's the best way to represent acronyms/initialisms?
Robert Cook
robert at metaweb.com
Thu Jan 24 23:04:59 UTC 2008
Just to really beat this one to death, the model that Shawn created
would handle Faye's use case. You could construct an MQL query to find
a domain-specific abbreviation by querying for the text value on
"Abbreviated topic" and looking for specific co-types in a particular
domain (like medicine). For instance, "find me the disease that matches
the abbreviation 'HIV'".
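
A rough sketch of what that query might look like, built as a Python
dict and printed as JSON (MQL queries are JSON). The type id
/common/abbreviated_topic and its "abbreviation" property are the ones
proposed in this thread; /medicine/disease as the co-type id and the
helper name abbreviation_query are assumptions for illustration, and
nothing here is sent to Freebase.

```python
import json


def abbreviation_query(abbreviation, co_type):
    """Build an MQL-style query for topics with an abbreviation and a co-type.

    Hypothetical helper: the ids used here are assumptions based on the
    schema proposed in this thread, not a confirmed published schema.
    """
    return [{
        "id": None,    # null asks MQL to return the topic id
        "name": None,  # null asks MQL to return the display name
        "type": "/common/abbreviated_topic",
        "abbreviation": abbreviation,
        # The "a:" prefix lets the query constrain "type" a second time.
        "a:type": co_type,
    }]


# "Find me the disease that matches the abbreviation 'HIV'."
query = abbreviation_query("HIV", "/medicine/disease")
print(json.dumps(query, indent=2))
```

Swapping the co-type for a type in another domain should, in
principle, narrow an abbreviation like "OOC" to one community in the
same way, provided the topics carry a suitable co-type.
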
R
On Jan 24, 2008, at 2:00 PM, Jeff Prucher wrote:
> You bring up an interesting (at least from a lexicographical
> standpoint) point about abbreviations, which is that abbreviations
> can have multiple definitions, even if they are abbreviations of the
> same phrase. But I'm not sure it affects the model particularly.
> Using Shawn's model, the abbreviation "OOC" would appear as an
> abbreviation on two topics -- "Out of character" (for RPGs) and "Out
> of character" (for fanfic). Querying against string values would
> reveal that "OOC" had multiple expansions.
>
> Or, using the schema that I proposed downthread (it's still
> attached, if you scroll down), you would just need to create two
> property instances for "out of character" -- one for each of the
> corresponding topics. (I still don't think that schema is the best
> way to do this.)
>
> Looking up all the abbreviations used by a given field/interest
> group would still be beyond either of these models, though.
>
> Jeff
>
> From: data-modeling-bounces at freebase.com [mailto:data-modeling-bounces at freebase.com] On Behalf Of Faye Li
> Sent: Thursday, January 24, 2008 12:26 PM
> To: Freebase data modeling mailing list
> Subject: Re: [Data-modeling] What's the best way to
> represent acronyms/initialisms?
>
> Hi,
>
> I'm late to the discussion although I've been watching this thread
> with interest.
>
> Abbreviations/acronyms are inherently vague and the same grouping of
> letters can stand for different things in different contexts. It
> would be nice to record the complete name for an abbreviation along
> with the domain or context where the complete name makes sense.
>
> For example, OOC (out of character) is a term used in both the role-
> playing gaming community as well as the domain of fan fiction, yet
> means different things to the two groups. If I were looking up the
> term OOC, it'd be nice to see a hint that constrains each complete
> name/definition to the domain where the definition is recognized.
>
> It's not unlike the way Fictional Character is modeled to include a
> "Fictional Universe" property, thus clearly differentiating "The
> Doctor" character in the "Doctor Who" series from "The Doctor"
> character (aka "Emergency Medical Hologram") in the "Star Trek:
> Voyager" universe.
>
> Use cases I have in mind that will benefit from this modeling
> approach include:
> 1) Looking up the complete name/explanation for a multi-defined
> abbreviation/acronym, perhaps with the ability to constrain by context
> 2) Looking up all abbreviations/acronyms in a domain (like looking
> up all characters in a Fictional Universe)
>
> -- Faye
>
>
> Shawn Simister wrote:
>>
>> Good point. I've updated the schema accordingly.
>>
>> Robert Cook wrote:
>>>
>>> Shawn -- I think this is the right model, although I would use
>>> "text" instead of "machine readable string" as the expected type.
>>> Text literals can have translations and would accommodate
>>> situations where the abbreviation is different in other
>>> languages. For example, the topic "AIDS" would be "SIDA" in
>>> several Romance languages.
>>>
>>> http://www.freebase.com/view/explore/topic/en/aids (scroll down to
>>> "outgoing properties")
>>>
>>> Сида (/type/text)
>>> AIDS (/type/text)
>>> 에이즈 (/type/text)
>>> SIDA (/type/text)
>>>
>>> These are /type/object/name property values, but could easily be
>>> /common/abbreviated_type/abbreviation values.
>>>
>>> R
>>>
>>> On Jan 23, 2008, at 12:23 AM, Shawn Simister wrote:
>>>
>>>> Thanks for all the feedback. I had no idea there was so much
>>>> interest in this topic.
>>>>
>>>> Robert's idea of creating an Abbreviated Topic co-type would be
>>>> my preference of everything that has been discussed so far. It's
>>>> simple, easy to use and I also like the idea of expanding its
>>>> scope to include all abbreviations. The key to this approach
>>>> would be to have the Freebase autocomplete pick up on these new
>>>> abbreviation values in the same way that it currently treats the
>>>> alias property. While I think that a CVT would provide a lot of
>>>> flexibility, it's just too confusing to attract new abbreviation
>>>> entries from casual users.
>>>>
>>>> Maybe some of the commonly abbreviated types like Organization
>>>> could include Abbreviated Topic in their schema to encourage its
>>>> use. In fact, some types like Unit Profile already have
>>>> abbreviation properties that could be factored out.
>>>>
>>>> I've published a draft version of the proposed Abbreviated Topic
>>>> type in my default domain just to make sure we're on the same
>>>> page about what this would look like.
>>>>
>>>> Shawn
>>>>
>>>> Robert Cook wrote:
>>>>>
>>>>> I've often thought that the "also known as" (alias) field of
>>>>> /common/topic is too broad and there might be other properties
>>>>> for capturing alternate names. One idea would be to have a
>>>>> property called "abbreviation" that is of type text. This
>>>>> property could be on a new type /common/abbreviated_topic and
>>>>> would capture acronyms, initialisms (that are typically thought
>>>>> of as acronyms) and abbreviations.
>>>>>
>>>>> Topics like "National Education Association" would be co-typed
>>>>> "Abbreviated Topic", and the abbreviation property would contain
>>>>> "NEA". For "NASA" it would contain "NASA". To capture
>>>>> "National Aeronautics and Space Administration" there would be a
>>>>> property /common/abbreviated_topic/complete_name
>>>>>
>>>>> Of course, this introduces a denormalization. There would be
>>>>> two properties with "NASA" - the display name of the topic and
>>>>> the new abbreviation property. I personally think this is fine
>>>>> and I anticipate that this pattern will happen elsewhere
>>>>> (perhaps /people/person/given_name and
>>>>> /people/person/family_name; also /common/topic/common_misspellings).
>>>>>
>>>>> Shawn -- would this work for imagined applications?
>>>>>
>>>>> R
>>>>>
>>>>>
>>>>> On Jan 22, 2008, at 10:42 AM, Jeff Prucher wrote:
>>>>>
>>>>>> I agree with Ed that common usage should determine whether a
>>>>>> topic name should be the expanded form or an acronym/initialism/
>>>>>> abbreviation, although this is obviously a judgement call, and
>>>>>> anyone who feels differently is free to rename the topic. This
>>>>>> ability to rename the topic argues against something like the
>>>>>> "abbreviation" type (http://www.freebase.com/view/schema/user/skud/default_domain/abbreviation),
>>>>>> since if someone renames the NASA topic to "National
>>>>>> Aeronautics and Space Administration", the topic is no longer
>>>>>> an abbreviation.
>>>>>>
>>>>>> I have two other thoughts. One would be to create a type for
>>>>>> initialisms and acronyms. It would exist separately from any
>>>>>> topics that happened to share that acronym, so there would be
>>>>>> one topic for "NASA", the space agency, and one for "NASA", the
>>>>>> acronym. (This would get less weird for shared initialisms
>>>>>> like "SF", which can refer to San Francisco, science fiction,
>>>>>> and who-knows-what-all.) There could be a property that linked
>>>>>> to a CVT -- one property of the CVT would be the expansion of
>>>>>> the acronym as a text string, the second property would be an
>>>>>> optional link to the corresponding Freebase topic. (It'd be a
>>>>>> CVT so that there was no confusion about which string went with
>>>>>> which topic for shared acronyms.) The main problem with this
>>>>>> is that it would be confusing, and the acronym topic for NASA
>>>>>> (or whatever) would start to accrue other types since people
>>>>>> would be certain to make the wrong selection from autocomplete
>>>>>> on occasion. We'd probably also have to deal with a lot of
>>>>>> merge requests between the acronym topic and the more general
>>>>>> topic.
>>>>>>
>>>>>> That said, we'd still want the aliases on the topics that
>>>>>> aren't of type "acronym" to include the acronyms or expanded
>>>>>> names, since, as Shawn points out, it affects searches.
>>>>>>
>>>>>> My current take on this (and I'm open to other opinions) is
>>>>>> that my proposal here is an interesting thought-experiment, but
>>>>>> is probably not the right way to handle this.
>>>>>>
>>>>>> Jeff
>>>>>>
>>>>>> From: data-modeling-bounces at freebase.com [mailto:data-modeling-bounces at freebase.com] On Behalf Of Ed Laurent
>>>>>> Sent: Monday, January 21, 2008 9:21 PM
>>>>>> To: Freebase data modeling mailing list
>>>>>> Subject: Re: [Data-modeling] What's the best way to
>>>>>> represent acronyms/initialisms?
>>>>>>
>>>>>> It would be nice to have a naming standard and universal
>>>>>> acronym type that fits all situations. I've been naming the
>>>>>> topic as its full name or acronym depending on which is more
>>>>>> commonly used (e.g., ETM+). I always add the full name as a
>>>>>> synonym if the topic name is an acronym. I add the acronym
>>>>>> as a synonym if the topic is named using the full name but the
>>>>>> acronym is commonly used. I try to use Kirrily's acronym
>>>>>> type when possible but I've also listed it as a machine
>>>>>> readable string sometimes. Either way, I always add an acronym
>>>>>> property to the type if one is ever used (e.g., satellite
>>>>>> sensor). This approach is not really a standard because I flip
>>>>>> between Kirrily's type and machine readable string. It's also
>>>>>> subjective whether or not the user thinks the acronym rises to
>>>>>> the level of a synonym. However, the info is there for when/if
>>>>>> a standard is set and this approach seems to work pretty well.
>>>>>>
>>>>>> -Ed
>>>>>>
>>>>>>
>>>>
>>>> _______________________________________________
>>>> Data-modeling mailing list
>>>> Data-modeling at freebase.com
>>>> http://lists.freebase.com/mailman/listinfo/data-modeling