[Data-modeling] What's the best way to represent acronyms/initialisms?

Robert Cook robert at metaweb.com
Thu Jan 24 23:04:59 UTC 2008


Just to really beat this one to death, the model that Shawn created
would handle Faye's use case.  You could construct an MQL query to
find a domain-specific abbreviation by querying for the text value on
"Abbreviated topic" and looking for specific co-types in a particular
domain (like medicine).  For instance, "find me the disease that
matches the abbreviation 'HIV'".
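
A rough sketch of that lookup, written as plain Python over an in-memory
list rather than as real MQL. The Topic class, the sample data, and every
type id other than the proposed /common/abbreviated_topic are assumptions
made up for illustration, not part of any Freebase schema.

```python
from dataclasses import dataclass, field

# Hypothetical type ids: only the "Abbreviated Topic" co-type idea comes
# from this thread; the /example/... ids are invented for the sketch.
ABBREVIATED = "/common/abbreviated_topic"
DISEASE = "/example/disease"
RPG_TERM = "/example/rpg_term"
FANFIC_TERM = "/example/fanfic_term"


@dataclass
class Topic:
    name: str
    types: set
    abbreviations: list = field(default_factory=list)


# Sample data: "OOC" appears as an abbreviation on two separate topics.
TOPICS = [
    Topic("HIV", {ABBREVIATED, DISEASE}, ["HIV"]),
    Topic("Out of character (RPGs)", {ABBREVIATED, RPG_TERM}, ["OOC"]),
    Topic("Out of character (fanfic)", {ABBREVIATED, FANFIC_TERM}, ["OOC"]),
]


def find_by_abbreviation(abbreviation, co_type=None):
    """Names of abbreviated topics carrying this abbreviation,
    optionally restricted to topics that also have co_type."""
    return [
        t.name
        for t in TOPICS
        if ABBREVIATED in t.types
        and abbreviation in t.abbreviations
        and (co_type is None or co_type in t.types)
    ]


def abbreviations_for(co_type):
    """Sorted abbreviations found on abbreviated topics with co_type."""
    return sorted(
        {
            a
            for t in TOPICS
            if ABBREVIATED in t.types and co_type in t.types
            for a in t.abbreviations
        }
    )
```

Calling find_by_abbreviation("HIV", DISEASE) corresponds to the "find me
the disease" query, leaving out co_type returns every expansion of a
shared abbreviation such as "OOC", and abbreviations_for() lists the
abbreviations attached to topics of one co-type.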

R

On Jan 24, 2008, at 2:00 PM, Jeff Prucher wrote:

> You bring up an interesting (at least from a lexicographical  
> standpoint) point about abbreviations, which is that abbreviations  
> can have multiple definitions, even if they are abbreviations of the  
> same phrase. But I'm not sure it affects the model particularly.  
> Using Shawn's model, the abbreviation "OOC" would appear as an  
> abbreviation on two topics -- "Out of character" (for RPGs) and "Out  
> of character" (for fanfic).  Querying against string values would  
> reveal that "OOC" had multiple expansions.
>
> Or, using the schema that I proposed downthread (it's still  
> attached, if you scroll down), you would just need to create two  
> property instances for "out of character" -- one for each of the  
> corresponding topics. (I still don't think that schema is the best  
> way to do this.)
>
> Looking up all the abbreviations used by a given field/interest  
> group would still be beyond either of these models, though.
>
> Jeff
>
> From: data-modeling-bounces at freebase.com [mailto:data-modeling-bounces at freebase.com]
> On Behalf Of Faye Li
> Sent: Thursday, January 24, 2008 12:26 PM
> To: Freebase data modeling mailing list
> Subject: Re: [Data-modeling] What's the best way to  
> represent acronyms/initialisms?
>
> Hi,
>
> I'm late to the discussion although I've been watching this thread with
> interest.
>
> Abbreviations/acronyms are inherently vague and the same grouping of  
> letters can stand for different things in different contexts. It  
> would be nice to record the complete name for an abbreviation along  
> with the domain or context where the complete name makes sense.
>
> For example, OOC (out of character) is a term used in both the role- 
> playing gaming community as well as the domain of fan fiction, yet  
> means different things to the two groups. If I were looking up the  
> term OOC, it'd be nice to see a hint that constrains each complete  
> name/definition to the domain where the definition is recognized.
>
> It's not unlike the way Fictional Character is modeled to include a  
> "Fictional Universe" property, thus clearly differentiating "The  
> Doctor" character in the "Doctor Who" series from "The Doctor"  
> character (aka "Emergency Medical Hologram") in the "Star Trek:
> Voyager" universe.
>
> Use cases I have in mind that will benefit from this modeling  
> approach include:
> 1) Looking up the complete name/explanation for a multi-defined
> abbreviation/acronym, perhaps with the ability to constrain by context
> 2) Looking up all abbreviations/acronyms in a domain (like looking  
> up all characters in a Fictional Universe)
>
> -- Faye
>
>
> Shawn Simister wrote:
>>
>> Good point. I've updated the schema accordingly.
>>
>> Robert Cook wrote:
>>>
>>> Shawn -- I think this is the right model, although I would use  
>>> "text" instead of "machine readable string" as the expected type.   
>>> Text literals can have translations and would accommodate  
>>> situations where the abbreviation is different in other  
>>> languages.  For example, the topic "AIDS" would be "SIDA" in
>>> several Romance languages.
>>>
>>> http://www.freebase.com/view/explore/topic/en/aids (scroll down to  
>>> "outgoing properties")
>>>
>>> Сида (/type/text)
>>> AIDS (/type/text)
>>> 에이즈 (/type/text)
>>> SIDA (/type/text)
>>>
>>> These are /type/object/name property values, but could easily be
>>> /common/abbreviated_type/abbreviation values.
>>>
>>> R
>>>
>>> On Jan 23, 2008, at 12:23 AM, Shawn Simister wrote:
>>>
>>>> Thanks for all the feedback. I had no idea there was so much  
>>>> interest in this topic.
>>>>
>>>> Robert's idea of creating an Abbreviated Topic co-type would be  
>>>> my preference of everything that has been discussed so far. It's  
>>>> simple, easy to use and I also like the idea of expanding its  
>>>> scope to include all abbreviations. The key to this approach  
>>>> would be to have the Freebase autocomplete pick up on these new  
>>>> abbreviation values in the same way that it currently treats the  
>>>> alias property. While I think that a CVT would provide a lot of  
>>>> flexibility, it's just too confusing to attract new abbreviation  
>>>> entries from casual users.
>>>>
>>>> Maybe some of the commonly abbreviated types like Organization  
>>>> could include Abbreviated Topic in their schema to encourage its  
>>>> use. In fact, some types like Unit Profile already have  
>>>> abbreviation properties that could be factored out.
>>>>
>>>> I've published a draft version of the proposed Abbreviated Topic  
>>>> type in my default domain just to make sure we're on the same  
>>>> page about what this would look like.
>>>>
>>>> Shawn
>>>>
>>>> Robert Cook wrote:
>>>>>
>>>>> I've often thought that the "also known as" (alias) field of
>>>>> /common/topic is too broad and there might be other properties
>>>>> for capturing alternate names.  One idea would be to have a  
>>>>> property called "abbreviation" that is of type text.  This  
>>>>> property could be on a new type /common/abbreviated_topic and  
>>>>> would capture acronyms, initialisms (that are typically thought  
>>>>> of as acronyms) and abbreviations.
>>>>>
>>>>> Topics like "National Education Association" would be co-typed
>>>>> "Abbreviated Topic", and the abbreviation property would contain  
>>>>> "NEA".  For "NASA" it would contain "NASA".  To capture  
>>>>> "National Aeronautics and Space Administration" there would be a  
>>>>> property /common/abbreviated_topic/complete_name
>>>>>
>>>>> Of course, this introduces a denormalization.  There would be  
>>>>> two properties with "NASA" - the display name of the topic and  
>>>>> the new abbreviation property.  I personally think this is fine  
>>>>> and I anticipate that this pattern will happen elsewhere  
>>>>> (perhaps /people/person/given_name and
>>>>> /people/person/family_name; also /common/topic/common_misspellings).
>>>>>
>>>>> Shawn -- would this work for imagined applications?
>>>>>
>>>>> R
>>>>>
>>>>>
>>>>> On Jan 22, 2008, at 10:42 AM, Jeff Prucher wrote:
>>>>>
>>>>>> I agree with Ed that common usage should determine whether a
>>>>>> topic name should be the expanded form or an
>>>>>> acronym/initialism/abbreviation, although this is obviously a
>>>>>> judgement call, and anyone who feels differently is free to
>>>>>> rename the topic.  This
>>>>>> ability to rename the topic argues against something like the  
>>>>>> "abbreviation" type (http://www.freebase.com/view/schema/user/skud/default_domain/abbreviation 
>>>>>> ), since if someone renames the NASA topic to "National  
>>>>>> Aeronautics and Space Administration", the topic is no longer  
>>>>>> an abbreviation.
>>>>>>
>>>>>> I have two other thoughts. One would be to create a type for  
>>>>>> initialisms and acronyms. It would exist separately from any  
>>>>>> topics that happened to share that acronym, so there would be  
>>>>>> one topic for "NASA", the space agency, and one for "NASA", the  
>>>>>> acronym.  (This would get less weird for shared initialisms  
>>>>>> like "SF", which can refer to San Francisco, science fiction,  
>>>>>> and who-knows-what-all.)  There could be a property that linked  
>>>>>> to a CVT -- one property of the CVT would be the expansion of  
>>>>>> the acronym as a text string, the second property would be an  
>>>>>> optional link to the corresponding Freebase topic.  (It'd be a  
>>>>>> CVT so that there was no confusion about which string went with  
>>>>>> which topic for shared acronyms.)  The main problem with this  
>>>>>> is that it would be confusing, and the acronym topic for NASA  
>>>>>> (or whatever) would start to accrue other types since people  
>>>>>> would be certain to make the wrong selection from autocomplete  
>>>>>> on occasion.  We'd probably also have to deal with a lot of  
>>>>>> merge requests between the acronym topic and the more general  
>>>>>> topic.
>>>>>>
>>>>>> That said, we'd still want the aliases on the topics that  
>>>>>> aren't of type "acronym" to include the acronyms or expanded  
>>>>>> names, since, as Shawn points out, it affects searches.
>>>>>>
>>>>>> My current take on this (and I'm open to other opinions) is  
>>>>>> that my proposal here is an interesting thought-experiment, but  
>>>>>> is probably not the right way to handle this.
>>>>>>
>>>>>> Jeff
>>>>>>
>>>>>> From: data-modeling-bounces at freebase.com [mailto:data-modeling-bounces at freebase.com]
>>>>>> On Behalf Of Ed Laurent
>>>>>> Sent: Monday, January 21, 2008 9:21 PM
>>>>>> To: Freebase data modeling mailing list
>>>>>> Subject: Re: [Data-modeling] What's the best way to  
>>>>>> represent acronyms/initialisms?
>>>>>>
>>>>>> It would be nice to have a naming standard and universal  
>>>>>> acronym type that fits all situations. I've been naming the  
>>>>>> topic as its full name or acronym depending on which is more  
>>>>>> commonly used (e.g., ETM+). I always add the full name as a  
>>>>>> synonym if the topic name is an acronym. I add the acronym
>>>>>> as a synonym if the topic is named using the full name but the
>>>>>> acronym is commonly used. I try to use Kirrily's acronym
>>>>>> type when possible but I've also listed it as a machine  
>>>>>> readable string sometimes. Either way, I always add an acronym  
>>>>>> property to the type if one is ever used (e.g., satellite  
>>>>>> sensor).  This approach is not really a standard because I flip  
>>>>>> between Kirrily's type and machine readable string. It's also  
>>>>>> subjective whether or not the user thinks the acronym rises to  
>>>>>> the level of a synonym. However, the info is there for when/if  
>>>>>> a standard is set and this approach seems to work pretty well.
>>>>>>
>>>>>> -Ed
>>>>>>
>>>>>>
>>>>
>>>> _______________________________________________
>>>> Data-modeling mailing list
>>>> Data-modeling at freebase.com
>>>> http://lists.freebase.com/mailman/listinfo/data-modeling
