The ability to distinguish statistically different populations of speakers or writers can be an important asset in many NLP applications. In this paper, we describe a method of using document similarity measures to describe differences in behavior between native and non-native speakers of English in a writing task.
    ABSTRACT This paper presents a novel method of generating word similarity scores, using a term by n-gram context matrix which is compressed using Singular Value Decomposition, a statistical data analysis method that extracts the most significant components of variation from a large data matrix, and which has previously been used in methods like Latent Semantic Analysis to identify latent semantic variables in text. We present the results of applying these scores to standard synonym benchmark tests, and argue on the basis of these results that our similarity metric represents an aspect of word usage which is largely orthogonal to that addressed by other methods, such as Latent Semantic Analysis. In particular, it appears that this method captures similarity with respect to the participation of words in grammatical constructions, at a level of generalization corresponding to broad syntacticosemantic classes such as body part terms, kin terms and the like. Aside from assessing word similarity, this method has promising applications in language modeling and automatic lexical acquisition.
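The core pipeline the abstract describes — build a term-by-context count matrix, compress it with truncated SVD, and compare words by their reduced vectors — can be sketched as follows. This is a minimal illustration, not the authors' implementation; the toy counts, the choice of k, and cosine similarity as the comparison are all assumptions for the example.

```python
# Minimal sketch of SVD-compressed word similarity over a
# term-by-context count matrix. Illustrative only: the toy data,
# k=2, and cosine comparison are assumptions, not the paper's setup.
import numpy as np

def svd_word_similarity(matrix, k=2):
    """Compress a term-by-context matrix with truncated SVD and return
    pairwise cosine similarities over the reduced word vectors."""
    U, s, Vt = np.linalg.svd(matrix, full_matrices=False)
    reduced = U[:, :k] * s[:k]          # word vectors in k latent dimensions
    norms = np.linalg.norm(reduced, axis=1, keepdims=True)
    unit = reduced / np.where(norms == 0, 1.0, norms)
    return unit @ unit.T                # cosine similarity matrix

# Toy term-by-n-gram count matrix: rows = words, columns = contexts.
# "arm" and "leg" share contexts (same broad class); "uncle" does not.
counts = np.array([
    [4.0, 3.0, 0.0, 0.0],   # "arm"
    [3.0, 4.0, 0.0, 0.0],   # "leg"
    [0.0, 0.0, 5.0, 2.0],   # "uncle"
])
sim = svd_word_similarity(counts, k=2)
```

After compression, words that occur in overlapping grammatical contexts end up with high cosine similarity, which is the intuition behind the broad syntactico-semantic classes (body parts, kin terms) the abstract mentions.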
    Abstract Most work on evaluation of named-entity recognition has been done in the context of competitions, as a part of Information Extraction. There has been little work on any form of extrinsic evaluation, and how one tagger compares with another on the major classes: PERSON, ORGANIZATION, and LOCATION. We report on a comparison of three state-of-the-art named entity taggers: Stanford, LBJ, and IdentiFinder. The taggers were compared with respect to: 1) Agreement rate on the classification of entities by class, and 2) Percentage ...
    Systems and methods for detecting collocation errors in a text sample using a reference database from a corpus are provided. Collocation candidates are identified within the text sample based upon syntactic patterns in the text sample. Whether a given collocation candidate contains a collocation error is detected, the detecting including: determining a first association measure using the reference database for the given collocation candidate; determining whether the first association measure satisfies a predetermined condition and ...
    Thesis (Ph. D.)--University of Chicago, Department of Linguistics, June 1987. Includes bibliographical references (leaves 351-359).
In educational settings, assessment targets determine the need for local validation. Instructional improvement, for example, is validated by examining the relationship between curricular innovations and improvements in criterion measures such as course grades. In such cases, as both the educational measurement community (Cizek, 2008; Shepard, 2006) and the writing assessment community (Good, Osborne, and Birchfield, 2012; Huot, 1996; Lynne, 2004) recognize, assessments are most meaningful when they are site based, locally controlled, context sensitive, rhetorically informed, accountable, meaningful, and fair. In the context of a first-year writing course, there are multiple reasons and occasions for measurement. Before a student enrolls, some information may be available and used for placement; but placement decisions are not perfect, and it is important to identify students who may require additional instructional support (Complete College America, 2012). At course completion, ove...
    ... cf. Brugman (1980), Deane (1987a), Johnson (1987), Lakoff (1987), Lakoff and Johnson (1980), Lindner (1981), Norvig and Lakoff (1987)). In other words, semantic relatedness is a function of the structure of human memory. ...
    This paper describes the first prototype of an automated tool for detecting collocation errors in texts written by non-native speakers of English. Candidate strings are extracted by pattern matching over POS-tagged text. Since learner texts often contain spelling and morphological errors, the tool attempts to automatically correct them in order to reduce noise. For a measure of collocation strength, we use the rank-ratio statistic calculated over one billion words of native-speaker texts. Two human annotators evaluated the system's performance ...
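The rank-ratio statistic mentioned above compares how highly a context ranks among the contexts of a particular word with how highly it ranks overall. A simplified sketch of that idea, with illustrative toy counts (the exact formula, ranking ties, and smoothing used by the tool are assumptions here, not taken from the paper):

```python
# Simplified rank-ratio-style association score for collocation
# strength. Illustrative only: the precise definition, tie handling,
# and data are assumptions, not the tool's exact statistic.
from collections import Counter

def rank_ratio(word, context, cooc_counts, context_totals):
    """Ratio of the context's expected rank (by global frequency) to its
    observed rank (among contexts of `word`). Higher = stronger association."""
    by_word = sorted(cooc_counts[word], key=cooc_counts[word].get, reverse=True)
    observed = by_word.index(context) + 1
    by_global = sorted(context_totals, key=context_totals.get, reverse=True)
    expected = by_global.index(context) + 1
    return expected / observed

# Toy counts: "tea" ranks 1st among collocates of "strong"
# but only 2nd by overall frequency, so the ratio exceeds 1.
cooc = {"strong": Counter({"tea": 10, "coffee": 8, "wind": 2})}
totals = Counter({"coffee": 500, "tea": 300, "wind": 100})
score = rank_ratio("strong", "tea", cooc, totals)
```

A candidate collocation whose observed rank is much better than its expected rank is strongly associated in native-speaker text; learner collocations scoring poorly would be flagged as potential errors.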
    ... Saint-Dizier and Viegas provide a brief review of lexical semantics covering classic concepts in several key frameworks, including Jackendoff's lexical ... Ann Copestake, and Alex Lascarides; 'A non-monotonic approach to lexical semantics' by Daniel Kayser and Hocine Abir. ...
    The report is the first systematic evaluation of the sentence equivalence item type introduced by the GRE® revised General Test. We adopt a validity framework to guide our investigation based on Kane’s approach to validation, whereby a hierarchy of inferences that should be documented to support score meaning and interpretation is evaluated. We present evidence relevant to the generalization inference as well as evidence of construct representation. We analyzed the pool of sentence equivalence items in three studies. The first and second studies focused on the generalization inference and sought to document the construction principles behind the sentence equivalence items, specifically the nature of the vocabulary tested. The third study focused on construct representation and evaluated the contribution of the stem, the keys, and the distractors to item difficulty. We concluded that the vocabulary tested by the sentence equivalence items is appropriate given the purpose of the GRE, namely, to assist in the selection of graduate students. The difficulty of the items was shown to be, in part, a function of the familiarity of the vocabulary as well as the context in which the vocabulary is tested, which we argue is positive validity evidence.