Paul Deane
Educational Testing Service, Research and Development, Department Member
Research Interests:
The ability to distinguish statistically different populations of speakers or writers can be an important asset in many NLP applications. In this paper, we describe a method of using document similarity measures to describe differences in behavior between native and non-native speakers of English in a writing task.
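The approach described above can be illustrated with a minimal sketch, assuming a simple bag-of-words cosine measure as the document similarity function; the paper's actual measures and data may differ, and the essays and reference text below are invented for demonstration:

```python
# Hedged sketch: score each essay by its similarity to a reference document,
# then compare score distributions across the two populations.
# All texts here are invented; the paper's actual similarity measures differ.
from collections import Counter
import math

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    common = set(a) & set(b)
    dot = sum(a[w] * b[w] for w in common)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

reference = Counter("the essay discusses the topic with clear examples".split())
native = [Counter("the essay discusses the topic clearly".split())]
nonnative = [Counter("essay talk about topic".split())]

native_scores = [cosine(e, reference) for e in native]
nonnative_scores = [cosine(e, reference) for e in nonnative]
```

Differences in the resulting score distributions (rather than any single score) are what would characterize the two populations.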
This paper presents a novel method of generating word similarity scores, using a term-by-n-gram context matrix compressed with Singular Value Decomposition (SVD), a statistical data analysis method that extracts the most significant components of variation from a large data matrix and has previously been used in methods such as Latent Semantic Analysis to identify latent semantic variables in text. We present the results of applying these scores to standard synonym benchmark tests, and argue on the basis of these results that our similarity metric captures an aspect of word usage largely orthogonal to that addressed by other methods, such as Latent Semantic Analysis. In particular, this method appears to capture similarity with respect to the participation of words in grammatical constructions, at a level of generalization corresponding to broad syntactico-semantic classes such as body-part terms, kin terms, and the like. Aside from assessing word similarity, this method has promising applications in language modeling and automatic lexical acquisition.
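The core idea of the abstract above can be sketched in a few lines. This is an illustrative toy, not the paper's implementation: the vocabulary, contexts, and counts are invented, and the real matrix is built from n-gram contexts over a large corpus.

```python
# Sketch of SVD-compressed word similarity over a term-by-context matrix.
# Terms, contexts, and counts are invented for illustration only.
import numpy as np

# Rows: terms; columns: n-gram contexts (e.g. "my _ hurts", "my _ said").
terms = ["hand", "arm", "uncle", "aunt"]
counts = np.array([
    [8.0, 1.0, 0.0],   # "hand": mostly body-part contexts
    [7.0, 2.0, 0.0],   # "arm": likewise
    [0.0, 1.0, 9.0],   # "uncle": mostly kin-term contexts
    [0.0, 2.0, 8.0],   # "aunt": likewise
])

# Compress with truncated SVD: keep the k strongest components of variation.
k = 2
U, s, Vt = np.linalg.svd(counts, full_matrices=False)
word_vectors = U[:, :k] * s[:k]          # reduced-rank term representations

def similarity(w1, w2):
    """Cosine similarity between two terms in the compressed space."""
    v1 = word_vectors[terms.index(w1)]
    v2 = word_vectors[terms.index(w2)]
    return float(v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2)))
```

On this toy matrix, `similarity("hand", "arm")` comes out high and `similarity("hand", "uncle")` low, mirroring the broad syntactico-semantic classes (body-part terms vs. kin terms) the abstract mentions.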
Most work on the evaluation of named-entity recognition has been done in the context of competitions, as a part of Information Extraction. There has been little work on any form of extrinsic evaluation, or on how one tagger compares with another on the major classes: PERSON, ORGANIZATION, and LOCATION. We report on a comparison of three state-of-the-art named-entity taggers: Stanford, LBJ, and IdentiFinder. The taggers were compared with respect to: 1) agreement rate on the classification of entities by class, and 2) percentage ...
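The first comparison mentioned above, agreement rate by class, can be sketched as follows. The entity list and the tagger outputs below are fabricated for demonstration; they are not results from the paper.

```python
# Illustrative agreement-rate computation between two NE taggers.
# Entities and class assignments are fabricated, not the paper's data.
entities = ["Obama", "Google", "Paris", "ETS", "Deane"]
tagger_a = {"Obama": "PERSON", "Google": "ORGANIZATION", "Paris": "LOCATION",
            "ETS": "ORGANIZATION", "Deane": "PERSON"}
tagger_b = {"Obama": "PERSON", "Google": "ORGANIZATION", "Paris": "ORGANIZATION",
            "ETS": "ORGANIZATION", "Deane": "PERSON"}

def agreement_rate(t1, t2, ents):
    """Fraction of entities to which both taggers assign the same class."""
    agree = sum(1 for e in ents if t1[e] == t2[e])
    return agree / len(ents)

print(agreement_rate(tagger_a, tagger_b, entities))  # 0.8
```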
Systems and methods are provided for detecting collocation errors in a text sample using a reference database derived from a corpus. Collocation candidates are identified within the text sample based upon syntactic patterns in the text sample. Whether a given collocation candidate contains a collocation error is detected, the detecting including: determining a first association measure using the reference database for the given collocation candidate; determining whether the first association measure satisfies a predetermined condition and ...
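The detection step described in this patent abstract can be sketched with a toy reference database. PMI is used here only as a stand-in for the unspecified "first association measure," and all counts and thresholds are invented:

```python
# Hedged sketch: flag a collocation candidate whose association measure in
# a reference corpus fails a predetermined condition (here, a PMI threshold).
# PMI is one possible association measure; counts below are invented.
import math

TOTAL = 1_000_000   # toy reference-corpus size (tokens)
word_count = {"make": 5000, "do": 8000, "mistake": 1200, "homework": 900}
pair_count = {("make", "mistake"): 400, ("do", "homework"): 350,
              ("do", "mistake"): 2, ("make", "homework"): 1}

def pmi(verb, noun):
    """Pointwise mutual information of a verb-noun candidate."""
    joint = pair_count.get((verb, noun), 0) / TOTAL
    if joint == 0:
        return float("-inf")
    return math.log2(joint / ((word_count[verb] / TOTAL) *
                              (word_count[noun] / TOTAL)))

def has_collocation_error(verb, noun, threshold=1.0):
    """True if the association measure fails the predetermined condition."""
    return pmi(verb, noun) < threshold

print(has_collocation_error("make", "mistake"))  # False: strong collocation
print(has_collocation_error("do", "mistake"))    # True: weak association
```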
Thesis (Ph.D.), University of Chicago, Department of Linguistics, June 1987. Includes bibliographical references (leaves 351–359).
In educational settings, assessment targets determine the need for local validation. Instructional improvement, for example, is validated by examining the relationship between curricular innovations and improvements in criterion measures such as course grades. In such cases, as both the educational measurement community (Cizek, 2008; Shepard, 2006) and the writing assessment community (Good, Osborne, and Birchfield, 2012; Huot, 1996; Lynne, 2004) recognize, assessments are most meaningful when they are site based, locally controlled, context sensitive, rhetorically informed, accountable, meaningful, and fair. In the context of a first-year writing course, there are multiple reasons and occasions for measurement. Before a student enrolls, some information may be available and used for placement; but placement decisions are not perfect, and it is important to identify students who may require additional instructional support (Complete College America, 2012). At course completion, ...
... cf. Brugman (1980), Deane (1987a), Johnson (1987), Lakoff (1987), Lakoff and Johnson (1980), Lindner (1981), Norvig and Lakoff (1987)). In other words, semantic relatedness is a function of the structure of human memory. ...
This paper describes the first prototype of an automated tool for detecting collocation errors in texts written by non-native speakers of English. Candidate strings are extracted by pattern matching over POS-tagged text. Since learner texts often contain spelling and morphological errors, the tool attempts to automatically correct them in order to reduce noise. For a measure of collocation strength, we use the rank-ratio statistic calculated over one billion words of native-speaker texts. Two human annotators evaluated the system's performance ...
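The candidate-extraction step described above can be sketched by regex matching over a tag sequence. The tag set, pattern, and sentence are simplified illustrations; the rank-ratio statistic itself is not reproduced here:

```python
# Minimal sketch of candidate extraction by pattern matching over
# POS-tagged text. Tags, pattern, and sentence are simplified examples.
import re

# (token, POS) pairs, as a tagger might emit for a learner sentence.
tagged = [("she", "PRP"), ("did", "VBD"), ("a", "DT"),
          ("big", "JJ"), ("mistake", "NN"), ("yesterday", "RB")]

# Encode the tag sequence as a string so patterns can be ordinary regexes.
tag_string = " ".join(tag for _, tag in tagged)

# Verb + optional determiner + optional adjective + noun.
pattern = re.compile(r"VB[DZPNG]?( DT)?( JJ)? NN")

candidates = []
match = pattern.search(tag_string)
if match:
    # Map the matched tag span back to token indices.
    start = tag_string[:match.start()].count(" ")
    end = start + match.group().count(" ") + 1
    span = tagged[start:end]
    # Keep only the lexical head words (verb and noun) as the candidate pair.
    verb = next(tok for tok, tag in span if tag.startswith("VB"))
    noun = next(tok for tok, tag in span if tag == "NN")
    candidates.append((verb, noun))

print(candidates)  # [('did', 'mistake')]
```

Each extracted pair would then be scored against the native-speaker reference counts; low-scoring pairs such as ("did", "mistake") become collocation-error candidates.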
... Saint-Dizier and Viegas provide a brief review of lexical semantics covering classic concepts in several key frameworks, including Jackendoff's lexical ... Ann Copestake, and Alex Lascarides; 'A non-monotonic approach to lexical semantics' by Daniel Kayser and Hocine Abir. ...
