Petroleum-themed web pages are parsed into a DOM tree, and a word co-occurrence model based on that DOM tree is proposed.
Source: Research on a Petroleum-Oriented Topic Search Engine (面向石油的主题搜索引擎研究)
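The DOM-based co-occurrence idea can be sketched in Python as follows. The source does not say which DOM unit defines a co-occurrence. This sketch therefore assumes that two words co-occur when they appear in the same text node, and it skips `<script>` and `<style>` content. The names `TextNodeCollector` and `dom_cooccurrence` are hypothetical. The sketch also assumes words are separated by whitespace or punctuation, so Chinese pages would first need a word segmenter.

```python
from collections import Counter
from html.parser import HTMLParser
from itertools import combinations
import re


class TextNodeCollector(HTMLParser):
    """Collects non-empty text nodes of an HTML page, skipping <script>/<style>."""

    def __init__(self):
        super().__init__()
        self.nodes = []       # one entry per visible text node
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.nodes.append(text)


def dom_cooccurrence(html):
    """Count unordered word pairs that occur in the same DOM text node.

    Assumption (hypothetical model): one text node is one co-occurrence window.
    """
    parser = TextNodeCollector()
    parser.feed(html)
    parser.close()
    counts = Counter()
    for node in parser.nodes:
        # Distinct words of this node, sorted so each pair has one canonical order.
        words = sorted(set(re.findall(r"\w+", node.lower())))
        counts.update(combinations(words, 2))
    return counts
```

Using a larger unit, such as a whole block element, instead of a single text node would be a straightforward variation of the same idea.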
This paper extracts text features with a method based on word co-occurrence frequency, which takes the semantics of the document into account.
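Here is a minimal sketch of co-occurrence-frequency features. It assumes, for illustration and not as the paper's stated setting, that two words co-occur when they lie within `window` tokens of each other, and that a document's features are its `k` most frequent co-occurring pairs. The names `window_cooccurrence` and `top_features` are hypothetical.

```python
from collections import Counter


def window_cooccurrence(tokens, window=2):
    """Count unordered pairs of distinct words within `window` positions of each other."""
    counts = Counter()
    for i, w in enumerate(tokens):
        # Look only forward so each position pair is counted once.
        for v in tokens[i + 1 : i + 1 + window]:
            if v != w:
                counts[tuple(sorted((w, v)))] += 1
    return counts


def top_features(tokens, k=3, window=2):
    """Return the k most frequent co-occurring pairs as the document's features."""
    return [pair for pair, _ in window_cooccurrence(tokens, window).most_common(k)]
```

For the tokens `oil well oil well gas` with `window=2`, the pair (`oil`, `well`) is counted three times and becomes the top feature.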
Traditional co-occurring-word extraction methods, however, rely on a single statistic, so their results are very imprecise and require substantial manual cleanup.
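The sketch below illustrates why a single statistic can mislead. It scores a word pair with four common association measures derived from a 2×2 contingency table: raw frequency, pointwise mutual information (PMI), the Dice coefficient, and chi-square. These measures are examples chosen for illustration, because the source does not name the statistics involved. The name `association_scores` is hypothetical.

```python
import math


def association_scores(n_xy, n_x, n_y, n):
    """Score words x and y with several association statistics.

    n_xy: windows containing both x and y
    n_x, n_y: windows containing x (resp. y)
    n: total number of windows
    """
    # 2x2 contingency table: a = both, b = x only, c = y only, d = neither.
    a = n_xy
    b = n_x - n_xy
    c = n_y - n_xy
    d = n - n_x - n_y + n_xy
    pmi = math.log2((n_xy * n) / (n_x * n_y)) if n_xy > 0 else float("-inf")
    dice = 2 * n_xy / (n_x + n_y)
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    chi2 = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return {"freq": n_xy, "pmi": pmi, "dice": dice, "chi2": chi2}
```

Consider two pairs drawn from 100 windows. The first pair occurs together 10 times, and each of its words appears 20 times. The second pair occurs together only twice, and each of its words appears only in those two windows. The first pair has the higher raw frequency, while the second has the higher PMI. Ranking by either statistic alone therefore orders the two pairs differently.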
Translation equivalents can be extracted automatically from a bilingual corpus. Four common co-occurrence models compute a word translation probability from the co-occurrence frequency of any two words, and translation-equivalent pairs are selected according to that probability.
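The bilingual extraction step can be sketched as follows. The sketch assumes a sentence-aligned corpus whose sentences are already word-segmented with spaces, so Chinese text would need segmentation first. The source does not name its four co-occurrence models, so this sketch uses only the Dice coefficient, one common choice. For each source word, it picks the highest-scoring target word at or above a hypothetical threshold `min_score`. The name `extract_equivalents` is also hypothetical.

```python
from collections import Counter


def extract_equivalents(aligned_pairs, min_score=0.5):
    """Map each source word to the target word with the highest Dice score.

    aligned_pairs: list of (source_sentence, target_sentence) strings whose
    words are separated by spaces (assumption: corpus is pre-segmented).
    """
    src_freq, tgt_freq, joint = Counter(), Counter(), Counter()
    for src, tgt in aligned_pairs:
        s_words, t_words = set(src.split()), set(tgt.split())
        src_freq.update(s_words)
        tgt_freq.update(t_words)
        # Count each (source, target) pair once per aligned sentence pair.
        joint.update((s, t) for s in s_words for t in t_words)

    best = {}
    for (s, t), n_st in joint.items():
        # Dice coefficient: 2 * joint count / (source count + target count).
        dice = 2 * n_st / (src_freq[s] + tgt_freq[t])
        if dice >= min_score and dice > best.get(s, (None, 0.0))[1]:
            best[s] = (t, dice)
    return {s: t for s, (t, _) in best.items()}
```

Take the three sentence pairs `("oil well", "石油 井")`, `("oil price", "石油 价格")` and `("gas well", "天然气 井")`. From these, the function maps `oil` to `石油`, `well` to `井`, `price` to `价格` and `gas` to `天然气`. Swapping the Dice line for another association measure would give a different co-occurrence model with the same structure.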