Educational Testing Service
Text, Language and Computation
Abstract We describe our submissions to the WMT11 shared MT evaluation task: MTeRater and MTeRater-Plus. Both are machine-learned metrics that use features from e-rater®, an automated essay scoring engine designed to assess writing...
Abstract Automatic Machine Translation (MT) evaluation metrics have traditionally been evaluated by the correlation of the scores they assign to MT output with human judgments of translation performance. Different types of human...
Abstract This annotation study is designed to help us gain an increased understanding of paraphrase strategies used by native and nonnative English speakers and how these strategies might affect test takers' essay scores. Toward that end,...
Abstract While there are a number of subjectivity lexicons available for research purposes, none can be used commercially. We describe the process of constructing subjectivity lexicon(s) for recognizing sentiment polarity in essays...
The intent of this article is to introduce readers to the area of natural language processing, commonly referred to as NLP. However, rather than just describing the salient concepts of NLP, this article uses the Python programming...
Abstract We describe our first attempts to re-engineer the curriculum of our introductory NLP course by using two important building blocks: (1) Access to an easy-to-learn programming language and framework to build hands-on programming...
Abstract Most work on evaluation of named-entity recognition has been done in the context of competitions, as a part of Information Extraction. There has been little work on any form of extrinsic evaluation, and how one tagger compares...
Abstract Statistical n-gram language modeling is a very important technique in Natural Language Processing (NLP) and Computational Linguistics used to assess the fluency of an utterance in any given language. It is widely employed in...
Abstract Most state-of-the-art statistical machine translation systems use log-linear models, which are defined in terms of hypothesis features and weights for those features. It is standard to tune the feature weights in order to...
Abstract Hopper and Thompson (1980) defined a multi-axis theory of transitivity that goes beyond simple syntactic transitivity and captures how much "action" takes place in a sentence. Detecting these features requires a deep...
Abstract Many problems in natural language processing can be viewed as variations of the task of measuring the semantic textual similarity between short texts. However, many systems that address these tasks focus on a single task and may...
Abstract Many writing assessments use generic prompts about social issues. However, we currently lack an understanding of how test takers respond to such prompts. In the absence of such an understanding, automated scoring systems may not...
Abstract This paper describes a new evaluation metric, TER-Plus (TERp) for automatic evaluation of machine translation (MT). TERp is an extension of Translation Edit Rate (TER). It builds on the success of TER as an evaluation metric and...
The preservation of meaning between inputs and outputs is perhaps the most ambitious and, often, the most elusive goal of systems that attempt to process natural language. Nowhere is this goal of more obvious importance than for the tasks...
Abstract The issue of sentence ordering is an important one for natural language tasks such as multi-document summarization, yet there has not been a quantitative exploration of the range of acceptable sentence orderings for short texts....
Abstract The frequent occurrence of divergences (structural differences between languages) presents a great challenge for statistical word-level alignment and machine translation. This paper describes the adaptation of DUSTer, a...
Abstract The task of paraphrasing is inherently familiar to speakers of all languages. Moreover, the task of automatically generating or extracting semantic equivalences for the various units of language (words, phrases, and sentences) is...
Information needs are complex, evolving, and difficult to express or capture (Taylor, 1962), a fact that is often overlooked by modern information retrieval systems. TREC, through the HARD track, has been attempting to introduce elements...