WordNet SQL Explanation

Question

I'm trying to get a simple synonym database up and running, so I can find synonyms of words the user entered (nothing else!). For this I grabbed a copy of the WordNet SQL thesaurus (http://wnsql.sourceforge.net/), but now I'm presented with all these tables, and I can't find any simple explanation of their content anywhere:

adjpositions
adjpositiontypes
casedwords
lexdomains
lexlinks
linktypes
morphmaps
morphs
postypes
samples
semlinks
senses
synsets
vframemaps
vframes
vframesentencemaps
vframesentences
words

Could someone tell me what these tables contain and which of them I need, since I can't decipher their content from their data?

Recommended Answer

WordNet is a super cool word database. I have been researching it myself. I'll list my findings below - and hopefully it will help you to understand the tables better.

The Synset Table

The synsets table is one of the most important tables in the database. It holds all the definitions in WordNet. Each row in the synsets table has a synsetid, a definition, a pos (part-of-speech) field and a lexdomainid (which links to the lexdomains table). There are 117373 synsets in the WordNet database.

The Words Table

WordNet also has a “words” table with only two fields: a wordid and a “lemma”. The words table holds all the lemmas (base words) in the WordNet database. There are 146625 entries in this table.

So how are these two tables linked? The answer: the senses table!

The Senses Table

The senses table links words (in the words table) with definitions (in the synsets table). The entries in the senses table are referred to as “word-sense pairs”, because each pairing of a wordid with a synset is one complete meaning of a word: a “sense of the word”. There are a total of 206,354 word senses in the WordNet database.
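The words → senses → synsets relationship can be sketched with Python's built-in sqlite3 module. This is a minimal sketch, assuming simplified tables with only the columns named in this answer (the real wnsql senses table has more columns); the rows are made up for illustration, not copied from WordNet.

```python
import sqlite3

# Assumed, simplified versions of the wnsql tables (real tables have more columns).
SCHEMA = """
CREATE TABLE words   (wordid INTEGER PRIMARY KEY, lemma TEXT);
CREATE TABLE synsets (synsetid INTEGER PRIMARY KEY, pos TEXT,
                      lexdomainid INTEGER, definition TEXT);
CREATE TABLE senses  (senseid INTEGER PRIMARY KEY, wordid INTEGER,
                      synsetid INTEGER);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with illustrative (made-up) rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO words VALUES (?, ?)",
                     [(1, "bank"), (2, "depository")])
    conn.executemany("INSERT INTO synsets VALUES (?, ?, ?, ?)",
                     [(100, "n", 1, "land beside a river"),
                      (200, "n", 2, "a business that holds deposits")])
    # One row per word-sense pair: "bank" has two senses and
    # shares synset 200 with "depository".
    conn.executemany("INSERT INTO senses VALUES (?, ?, ?)",
                     [(1, 1, 100), (2, 1, 200), (3, 2, 200)])
    return conn


def senses_of(conn: sqlite3.Connection, lemma: str) -> list:
    """Return (synsetid, definition) for every sense of `lemma`."""
    return conn.execute(
        """SELECT s.synsetid, y.definition
             FROM words w
             JOIN senses s  ON s.wordid = w.wordid
             JOIN synsets y ON y.synsetid = s.synsetid
            WHERE w.lemma = ?
            ORDER BY s.synsetid""",
        (lemma,),
    ).fetchall()


conn = build_demo_db()
for synsetid, definition in senses_of(conn, "bank"):
    print(synsetid, definition)
```

Because senses holds one row per (word, synset) pair, a word with several meanings simply has several rows, and a synset shared by several words is how synonymy is represented.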

The Lexdomains Table

The lexdomains table is referenced by the synsets table (through its lexdomainid column) and defines which lexical domain a synset, and therefore each word-sense pair in it, belongs to. There are 45 lexical domains in the lexdomains table. The lexdomains table is therefore WordNet’s way of “tagging” word-sense pairs. However, this tagging is quite limited, because each synset (and so each word-sense pair) can belong to only ONE lexical domain.

The 45 lexical domains include:

Adjectives: all, pert, ppl (participial adjectives)

Adverbs: all

Nouns: tops, act, animal, artifact, attribute, body, cognition, communication, event, feeling, food, group, location, motive, object, person, phenomenon, plant, possession, process, quantity, linkdef, shape, state, substance, time

Verbs: body, change, cognition, communication, competition, consumption, contact, creation, emotion, motion, perception, possession, social, stative, weather
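Since each synsets row carries a lexdomainid, finding a synset's domain is a single join. A minimal sketch, assuming a lexdomains table with columns (lexdomainid, lexdomainname) and names following WordNet's lexicographer-file style such as "noun.artifact"; the column names, ids and rows are assumptions for illustration.

```python
import sqlite3

# Assumed columns: lexdomains(lexdomainid, lexdomainname); synsets carries lexdomainid.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE lexdomains (lexdomainid INTEGER PRIMARY KEY, lexdomainname TEXT);
CREATE TABLE synsets    (synsetid INTEGER PRIMARY KEY, pos TEXT,
                         lexdomainid INTEGER, definition TEXT);
""")
# Illustrative rows only; ids and definitions are made up.
conn.executemany("INSERT INTO lexdomains VALUES (?, ?)",
                 [(1, "noun.artifact"), (2, "verb.motion")])
conn.executemany("INSERT INTO synsets VALUES (?, ?, ?, ?)",
                 [(10, "n", 1, "a seat for one person"),
                  (11, "n", 1, "movable articles used to make a room usable"),
                  (20, "v", 2, "move while supporting something")])


def domain_of(synsetid: int):
    """Each synset has a single lexdomainid, so at most one name comes back."""
    row = conn.execute(
        """SELECT d.lexdomainname
             FROM synsets s JOIN lexdomains d ON d.lexdomainid = s.lexdomainid
            WHERE s.synsetid = ?""",
        (synsetid,),
    ).fetchone()
    return row[0] if row else None


def synsets_per_domain() -> list:
    """Count synsets per lexical domain."""
    return conn.execute(
        """SELECT d.lexdomainname, COUNT(*)
             FROM synsets s JOIN lexdomains d ON d.lexdomainid = s.lexdomainid
            GROUP BY d.lexdomainname
            ORDER BY d.lexdomainname"""
    ).fetchall()


print(domain_of(20))
print(synsets_per_domain())
```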

The Casedwords Table

Some words in the words table are naturally written with a capital letter, e.g. “A-team”. Since the words table stores all words in lowercase, WordNet uses this table to store the capitalized form of the word. There are 40313 entries in this table.
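To show users the properly capitalized form, you can prefer the casedwords entry when one exists and fall back to the lowercase lemma otherwise. A minimal sketch, assuming casedwords has columns (casedwordid, wordid, cased); those column names and the sample rows are assumptions.

```python
import sqlite3

# Assumed columns for casedwords: (casedwordid, wordid, cased). Rows are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE words      (wordid INTEGER PRIMARY KEY, lemma TEXT);
CREATE TABLE casedwords (casedwordid INTEGER PRIMARY KEY, wordid INTEGER, cased TEXT);
""")
conn.executemany("INSERT INTO words VALUES (?, ?)", [(1, "a-team"), (2, "chair")])
conn.execute("INSERT INTO casedwords VALUES (1, 1, 'A-team')")


def display_form(lemma: str):
    """Return the capitalized form if casedwords has one, else the stored lemma."""
    row = conn.execute(
        """SELECT COALESCE(c.cased, w.lemma)
             FROM words w LEFT JOIN casedwords c ON c.wordid = w.wordid
            WHERE w.lemma = ?""",
        (lemma,),
    ).fetchone()
    return row[0] if row else None


print(display_form("a-team"))
print(display_form("chair"))
```

The LEFT JOIN keeps words that have no capitalized variant; if a word had several cased forms, this sketch would return just one of them.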

There are many other tables in the WordNet DB; once I have researched them, I'll post again.

Finding Your Synonyms

To answer your question regarding synonyms, you need to do the following.

Let's say you want to find the synonyms for the word "carry". First, search the words table for a lemma matching "carry"; this yields the wordid 21253. Then search the senses table to find all word-sense pairs for "carry". This yields 41 results, each listing the wordid 21253, a senseid (the index of the word-sense pair) and a synsetid.
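The first two steps can be sketched as two small queries. This assumes the simplified words and senses columns described in this answer; wordid 21253 and synsetid 202083512 come from the text, while the other senseids and synsetids are made up, and only 2 of the 41 senses are mocked.

```python
import sqlite3

# Illustrative data: only wordid 21253 and synsetid 202083512 come from the answer.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE words  (wordid INTEGER PRIMARY KEY, lemma TEXT);
CREATE TABLE senses (senseid INTEGER PRIMARY KEY, wordid INTEGER, synsetid INTEGER);
""")
conn.execute("INSERT INTO words VALUES (21253, 'carry')")
conn.executemany("INSERT INTO senses VALUES (?, ?, ?)",
                 [(1, 21253, 202083512), (2, 21253, 200000001)])


def wordid_for(lemma: str):
    """Step 1: look up the wordid; lemmas are stored lowercase, so normalise input."""
    row = conn.execute("SELECT wordid FROM words WHERE lemma = ?",
                       (lemma.lower(),)).fetchone()
    return row[0] if row else None


def word_senses(wordid: int) -> list:
    """Step 2: every senses row for this wordid is one word-sense pair."""
    return conn.execute(
        "SELECT senseid, synsetid FROM senses WHERE wordid = ? ORDER BY senseid",
        (wordid,),
    ).fetchall()


wid = wordid_for("Carry")
print(wid)
print(word_senses(wid))
```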

Next, query the synsets table for each returned synsetid so you can read the associated definition field.
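Rather than issuing one query per synsetid, the definitions can be fetched in a single query with a parameterized IN list. A minimal sketch, assuming the synsets columns described in this answer; the definition for 202083512 is quoted from the answer, the second row and the lexdomainid values are placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE synsets (synsetid INTEGER PRIMARY KEY, pos TEXT, "
             "lexdomainid INTEGER, definition TEXT)")
# Placeholder rows; only the first definition is taken from the answer.
conn.executemany("INSERT INTO synsets VALUES (?, ?, ?, ?)",
                 [(202083512, "v", 1, "transmit or serve as the medium for transmission"),
                  (200000001, "v", 2, "placeholder definition")])


def definitions(synsetids: list) -> dict:
    """Fetch definitions for many synsetids in one query instead of one per id."""
    if not synsetids:
        return {}
    # One "?" per id keeps the query parameterized (no string-built values).
    placeholders = ", ".join(["?"] * len(synsetids))
    rows = conn.execute(
        f"SELECT synsetid, definition FROM synsets WHERE synsetid IN ({placeholders})",
        list(synsetids),
    ).fetchall()
    return dict(rows)


print(definitions([202083512]))
```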

Lastly, to find the synonyms for each of those synsets, search the senses table for other word-sense pairs that share the same synsetid.

Example: one of the 41 word-sense pairs for the word "carry" has the synsetid 202083512. If you look up the definition for this synsetid, you will find “transmit or serve as the medium for transmission”.

To find all the synonyms for this definition, search the senses table for the same synsetid 202083512. This yields the synonyms channel, conduct, convey, impart and transmit (note: you will need to left join the words table to get the actual lemmas).
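The whole lookup can be collapsed into one self-join on senses. A minimal sketch, assuming the simplified words and senses columns from this answer; the lemmas for synsetid 202083512 are those listed in the answer, but every wordid except 21253 is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE words  (wordid INTEGER PRIMARY KEY, lemma TEXT);
CREATE TABLE senses (senseid INTEGER PRIMARY KEY, wordid INTEGER, synsetid INTEGER);
""")
# Lemmas from the answer; wordids other than 21253 are invented.
lemmas = {21253: "carry", 1: "channel", 2: "conduct", 3: "convey",
          4: "impart", 5: "transmit"}
conn.executemany("INSERT INTO words VALUES (?, ?)", lemmas.items())
conn.executemany("INSERT INTO senses (wordid, synsetid) VALUES (?, ?)",
                 [(wid, 202083512) for wid in lemmas])

SYNONYMS_SQL = """
SELECT s2.synsetid, w2.lemma
  FROM words w1
  JOIN senses s1 ON s1.wordid = w1.wordid      -- senses of the input word
  JOIN senses s2 ON s2.synsetid = s1.synsetid  -- every pair in those synsets
  JOIN words w2  ON w2.wordid = s2.wordid      -- their lemmas
 WHERE w1.lemma = ? AND w2.wordid <> w1.wordid -- drop the input word itself
 ORDER BY s2.synsetid, w2.lemma
"""


def synonyms(lemma: str) -> dict:
    """Map each synsetid of `lemma` to the other lemmas in that synset."""
    result = {}
    for synsetid, other in conn.execute(SYNONYMS_SQL, (lemma.lower(),)):
        result.setdefault(synsetid, []).append(other)
    return result


print(synonyms("carry"))
```

Grouping by synsetid keeps synonyms for different meanings apart; synsets in which the input word is the only member simply do not appear in the result.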

I hope this helps demystify WordNet for you. I'm finding it to be quite cool!

Other Recommended Answer

Paul Preibisch explained several core tables; here are short explanations of the rest of them:

adjpositiontypes - defines the three positions that adjectives can take in English: predicate, attributive and immediately postnominal.

adjpositions - links specific words (adjectives) with their allowed position types from the adjpositiontypes table.

linktypes - defines all relation (link) types used in WordNet, about two dozen of them. Both the lexlinks and semlinks tables use this table to define the type of each link. Some link types are marked as recursive, meaning that if "furniture" is, for example, a hypernym of "chair", then "chair" is a hyponym of "furniture".

lexlinks - lexical links, i.e. relations between words. Example:
sad - sadness (derivation)

semlinks - semantic links, i.e. relations between synsets. Example:
chair - furniture (hypernym)
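Because semlinks stores a linkid rather than a relation name, queries join linktypes to filter by relation. A minimal sketch, assuming columns semlinks(synset1id, synset2id, linkid) and linktypes(linkid, link); those names, the ids, and the chair → furniture → furnishing chain are illustrative assumptions. The second function walks hypernyms transitively with SQLite's recursive CTE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE linktypes (linkid INTEGER PRIMARY KEY, link TEXT);
CREATE TABLE semlinks  (synset1id INTEGER, synset2id INTEGER, linkid INTEGER);
""")
conn.executemany("INSERT INTO linktypes VALUES (?, ?)",
                 [(1, "hypernym"), (2, "hyponym")])
# Illustrative synset ids: 10 = chair, 11 = furniture, 12 = furnishing.
conn.executemany("INSERT INTO semlinks VALUES (?, ?, ?)",
                 [(10, 11, 1), (11, 10, 2), (11, 12, 1), (12, 11, 2)])


def related(synsetid: int, link: str) -> list:
    """Synsets reached from `synsetid` by one link of the named type."""
    rows = conn.execute(
        """SELECT l.synset2id
             FROM semlinks l JOIN linktypes t ON t.linkid = l.linkid
            WHERE l.synset1id = ? AND t.link = ?
            ORDER BY l.synset2id""",
        (synsetid, link),
    ).fetchall()
    return [r[0] for r in rows]


def all_hypernyms(synsetid: int) -> list:
    """Follow hypernym links transitively with a recursive CTE.

    UNION (rather than UNION ALL) skips rows already produced, which also
    stops the recursion if the data ever contains a cycle.
    """
    rows = conn.execute(
        """WITH RECURSIVE up(synsetid) AS (
               SELECT l.synset2id
                 FROM semlinks l JOIN linktypes t ON t.linkid = l.linkid
                WHERE l.synset1id = ? AND t.link = 'hypernym'
               UNION
               SELECT l.synset2id
                 FROM semlinks l
                 JOIN linktypes t ON t.linkid = l.linkid
                 JOIN up ON l.synset1id = up.synsetid
                WHERE t.link = 'hypernym'
           )
           SELECT synsetid FROM up ORDER BY synsetid""",
        (synsetid,),
    ).fetchall()
    return [r[0] for r in rows]


print(related(10, "hypernym"))
print(all_hypernyms(10))
```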

morphs - connected to "words" table, contains irregular word forms. One word can have multiple morphs, and one morph can be an irregular form for multiple words, so you additionally have the morphmaps table. Examples:
abacus (word) - abaci (morph)
abhor (word) - abhorred, abhorring (morphs)
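For a synonym lookup, morphs and morphmaps let you turn an irregular form the user typed back into a lemma before searching the words table. A minimal sketch, assuming columns morphs(morphid, morph) and morphmaps(wordid, pos, morphid); the column names and ids are assumptions, the word pairs are the examples above. Regular inflections are not stored here, so they would need separate handling.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE words     (wordid INTEGER PRIMARY KEY, lemma TEXT);
CREATE TABLE morphs    (morphid INTEGER PRIMARY KEY, morph TEXT);
CREATE TABLE morphmaps (wordid INTEGER, pos TEXT, morphid INTEGER);
""")
conn.executemany("INSERT INTO words VALUES (?, ?)", [(1, "abacus"), (2, "abhor")])
conn.executemany("INSERT INTO morphs VALUES (?, ?)",
                 [(1, "abaci"), (2, "abhorred"), (3, "abhorring")])
conn.executemany("INSERT INTO morphmaps VALUES (?, ?, ?)",
                 [(1, "n", 1), (2, "v", 2), (2, "v", 3)])


def base_forms(form: str) -> list:
    """Map input to lemma(s): irregular forms via morphmaps, else the input itself."""
    form = form.lower()
    rows = conn.execute(
        """SELECT DISTINCT w.lemma
             FROM morphs m
             JOIN morphmaps mm ON mm.morphid = m.morphid
             JOIN words w      ON w.wordid = mm.wordid
            WHERE m.morph = ?
            ORDER BY w.lemma""",
        (form,),
    ).fetchall()
    # A morph can map to several words, so a list is returned.
    return [r[0] for r in rows] or [form]


print(base_forms("abaci"))
print(base_forms("Abhorring"))
```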

postypes - defines the "parts of speech". It contains only the following values:
n - noun, v - verb, a - adjective, r - adverb, s - adjective satellite.
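When filtering synonyms by part of speech, it helps to decode these codes and to treat "a" and "s" as one class, since satellite adjectives are still adjectives. A minimal sketch; the helper names are hypothetical, only the codes come from the list above.

```python
# pos codes as listed for the postypes table.
POS_NAMES = {
    "n": "noun",
    "v": "verb",
    "a": "adjective",
    "r": "adverb",
    "s": "adjective satellite",
}


def pos_name(code: str) -> str:
    """Translate a one-letter pos code; raises KeyError for unknown codes."""
    return POS_NAMES[code]


def same_broad_pos(code1: str, code2: str) -> bool:
    """Compare pos codes, treating 'a' and 's' as the same (adjective) class."""
    def broad(c: str) -> str:
        return "a" if c == "s" else c
    return broad(code1) == broad(code2)


print(pos_name("s"))
print(same_broad_pos("a", "s"))
```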

samples - sample sentences for synsets. One synset can have multiple samples.

vframemaps & vframes - vframes define a kind of standard "verb template". vframemaps links words (verbs) with the vframes in which they can appear.
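To list the templates a verb can appear in, join vframemaps to vframes. A minimal sketch, assuming columns vframemaps(synsetid, wordid, frameid) and vframes(frameid, frame), with frame texts modelled on WordNet's generic frames where "----" stands for the verb; column names, ids and mappings are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE words      (wordid INTEGER PRIMARY KEY, lemma TEXT);
CREATE TABLE vframes    (frameid INTEGER PRIMARY KEY, frame TEXT);
CREATE TABLE vframemaps (synsetid INTEGER, wordid INTEGER, frameid INTEGER);
""")
conn.execute("INSERT INTO words VALUES (1, 'convey')")
# Illustrative frame ids and mappings; "----" is the verb slot.
conn.executemany("INSERT INTO vframes VALUES (?, ?)",
                 [(1, "Somebody ----s something"),
                  (2, "Something ----s something")])
conn.executemany("INSERT INTO vframemaps VALUES (?, ?, ?)",
                 [(202083512, 1, 1), (202083512, 1, 2)])


def example_frames(lemma: str) -> list:
    """Fill each frame the verb may appear in by substituting the lemma for '----'.

    Naive substitution: "convey" -> "conveys" works, but "carry" would give
    "carrys", so real code needs proper inflection.
    """
    rows = conn.execute(
        """SELECT DISTINCT f.frameid, f.frame
             FROM words w
             JOIN vframemaps m ON m.wordid = w.wordid
             JOIN vframes f    ON f.frameid = m.frameid
            WHERE w.lemma = ?
            ORDER BY f.frameid""",
        (lemma,),
    ).fetchall()
    return [frame.replace("----", lemma) for _, frame in rows]


print(example_frames("convey"))
```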

vframesentencemaps & vframesentences - similar to the previous two tables, except that here you have entire sentences as verb templates.

Other Recommended Answer

To properly understand the meaning of the various terms in WordNet, you should read its extensive documentation. For synonyms, you'll primarily need the synsets table. The actual database tables in the project you've downloaded are described on the project's schema page.