Search results 1-12: 12 records found for “computational linguistics corpora”. Query time: 0.093 seconds.
Mining Parallel Corpora from Sina Weibo and Twitter
Mining Parallel Corpora Sina Weibo Twitter
2016/7/7
Microblogs such as Twitter, Facebook, and Sina Weibo (China’s equivalent of Twitter) are a remarkable linguistic resource. In contrast to content from edited genres such as newswire, microblogs cont...
Reflections on the Penn Discourse TreeBank, Comparable Corpora, and Complementary Annotation
Penn Discourse TreeBank Comparable Corpora Complementary Annotation
2015/9/14
The Penn Discourse Treebank (PDTB) was released to the public in 2008. It remains the largest manually annotated corpus of discourse relations to date. Its focus on discourse relations that are either...
Evaluating Centering for Information Ordering Using Corpora
Evaluating Centering Information Ordering Using Corpora
2015/9/7
In this article we discuss several metrics of coherence defined using centering theory and investigate the usefulness of such metrics for information ordering in automatic text generation. We estimate...
Constructing Corpora for the Development and Evaluation of Paraphrase Systems
Paraphrase Systems Constructing Corpora
2015/9/6
Automatic paraphrasing is an important component in many natural language processing tasks. In this article we present a new parallel corpus with paraphrase annotations. We adopt a definition o...
Orthographic Errors in Web Pages: Toward Cleaner Web Corpora
Orthographic Errors Web Pages Cleaner Web Corpora
2015/9/1
Since the Web by far represents the largest public repository of natural language texts, recent experiments, methods, and tools in the area of corpus linguistics often use the Web as a corpus. For app...
Improving Machine Translation Performance by Exploiting Non-Parallel Corpora
Machine Translation Performance Exploiting Non-Parallel Corpora
2015/8/31
We present a novel method for discovering parallel sentences in comparable, non-parallel corpora. We train a maximum entropy classifier that, given a pair of sentences, can reliably determine whether o...
Parallel Text Processing: Alignment and Use of Translation Corpora
Translation Corpora Alignment
2015/8/26
One can’t help but be fascinated by two sentences in parallel translation, the selfsame meaning diffused, distributed, diverging across alternative expressions. In his Le Ton beau de Marot: In Prais...
Exclamatives and heightened emotion: Extracting pragmatic generalizations from large corpora
corpus pragmatics exclamatives expressives logistic regression
2015/6/15
Exclamatives like What a dump!, Wow!, and Boy, you’ve grown! are, when uttered in context, rich in information about the speaker’s attitudes. Drawing on evidence from about 100,000 online product rev...
AUTOMATIC ACQUISITION OF A LARGE SUBCATEGORIZATION DICTIONARY FROM CORPORA
AUTOMATIC ACQUISITION LARGE SUBCATEGORIZATION DICTIONARY CORPORA
2015/6/12
This paper presents a new method for producing a dictionary of subcategorization frames from unlabelled text corpora. It is shown that statistical filtering of the results of a finite state parser run...
Who Leads Whom: Topical Lead-Lag Analysis across corpora
Who Leads Whom Topical Lead Lag Analysis across corpora
2015/6/10
Understanding the lead/lag of communities in the context of a given topic is an interesting problem in computational social science. In this work, we study the particular problem of whether research g...
Unsupervised morphological analysis of small corpora: First experiments with Kilivila
Unsupervised morphological analysis small corpora First experiments with Kilivila
2015/4/21
Language documentation involves linguistic analysis of the collected material, which is typically done manually. Automatic methods for language processing usually require large corpora. The method pre...
Extracting of Translation Unit from Chinese-English Parallel Corpora
Hong Kong Legal Documents Corpus Hong Kong Special Administration Region
2009/2/8
The field of machine translation has changed remarkably little since its earliest days in the fifties. So far, useful machine translation could only be obtained in very restricted domains. We believe one ...