Matthias von Davier
    This article explores the distinction between typological and factorial latent variables in personality theory. Traditionally, many personality variables have been considered factorial in nature, even though examples of typological constructs date back to Hippocrates. Recently, some typological constructs have been reconceptualized, due in part to the availability of more rigorous methodological tools for identifying types (or nominal latent traits). These tools ...
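Latent class analysis is one widely used tool for identifying types in this sense: each person belongs to one of a few unobserved classes, and item responses are independent within a class. The following is a minimal EM sketch under those assumptions (binary items, local independence). The function name `latent_class_em` and the simulated data are hypothetical, and this is not necessarily the method used in the article.

```python
import numpy as np

def latent_class_em(X, n_classes=2, n_iter=200, seed=0):
    """Fit a latent class model (a nominal latent variable, i.e. 'types') to
    binary responses X (persons x items) by EM, assuming local independence.
    Returns class proportions, class-specific item probabilities, posterior
    class memberships from the last E-step, and the log-likelihood trace."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    n, j = X.shape
    pi = np.full(n_classes, 1.0 / n_classes)                # class sizes
    theta = rng.uniform(0.25, 0.75, size=(n_classes, j))    # P(item = 1 | class)
    trace = []
    for _ in range(n_iter):
        # E-step: log P(x | class) summed over items, plus log class size
        log_lik = X @ np.log(theta).T + (1.0 - X) @ np.log(1.0 - theta).T
        log_joint = log_lik + np.log(pi)
        top = log_joint.max(axis=1, keepdims=True)          # log-sum-exp for stability
        log_marg = top + np.log(np.exp(log_joint - top).sum(axis=1, keepdims=True))
        post = np.exp(log_joint - log_marg)
        trace.append(float(log_marg.sum()))
        # M-step: update class sizes and conditional item probabilities
        nk = post.sum(axis=0)
        pi = nk / n
        theta = np.clip((post.T @ X) / nk[:, None], 1e-6, 1 - 1e-6)
    return pi, theta, post, trace

# Hypothetical data: two types with response probabilities 0.8 vs 0.2 on 6 items
rng = np.random.default_rng(1)
types = rng.integers(0, 2, size=500)
p_true = np.where(types[:, None] == 0, 0.8, 0.2) * np.ones((1, 6))
X = (rng.uniform(size=(500, 6)) < p_true).astype(int)
pi, theta, post, trace = latent_class_em(X)
```

Because EM never decreases the likelihood, `trace` should be non-decreasing. Sharply different rows of `theta` are what distinguish two types from a single continuous factor.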
    31B Some Notes on Models for Cognitively Based Skills Diagnosis. Shelby J. Haberman, Matthias von Davier. Available online 11 October 2006.
    The self-monitoring (SM) questionnaire developed by Snyder (1974) consists of self-report items such as 'I'm not always the person I appear to be', which have usually been analyzed quantitatively, i.e., by summing the item responses after coding all items in the same direction. In hundreds of empirical ...
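The conventional quantitative scoring described above can be sketched as follows. It assumes items coded 0/1 and a hypothetical list of reverse-keyed item positions; the actual keying of the SM items is not reproduced here.

```python
import numpy as np

def sum_score(responses, reverse_keyed):
    """Quantitative scoring: recode reverse-keyed binary items so that 1
    always indicates the keyed direction, then sum across items."""
    r = np.asarray(responses, dtype=int).copy()
    r[:, reverse_keyed] = 1 - r[:, reverse_keyed]
    return r.sum(axis=1)

# Hypothetical data: 3 respondents, 4 true/false items (1 = "true");
# items at positions 1 and 3 are assumed to be reverse-keyed.
responses = [[1, 0, 1, 0],
             [0, 1, 0, 1],
             [1, 1, 0, 0]]
scores = sum_score(responses, reverse_keyed=[1, 3])
print(scores)  # [4 0 2]
```

A single sum score treats self-monitoring as a continuous quantity, which is the assumption a typological analysis would call into question.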
    Diagnostic models combine multiple binary latent variables in an attempt to produce a latent structure that provides more information about test takers' performance than unidimensional latent variable models do. Recent developments in diagnostic modeling emphasize the possibility that multiple skills may interact conjunctively within the item function while individual skills may still retain separable additive effects. These extensions, from the conjunctive deterministic-input-noisy-and (DINA) model to its generalized version (G-DINA) and from the compensatory/additive general diagnostic model (GDM) to the log-linear cognitive diagnostic model (LCDM), aim to integrate models with conjunctive skills and models that assume compensatory functioning of multiple skill variables. More recently, it was proven mathematically that the fully conjunctive DINA model, which combines all required skills in a single binary function, may be recast as a compensatory special case of the GDM. This can be accomplished in more than one form, such that the resulting transformed skill-space definitions and design (Q) matrices differ from each other but are mathematically equivalent to the DINA model, producing identical model-based response probabilities. In this report, I extend this equivalency result to the LCDM and show that a mathematically equivalent, constrained GDM can be defined that yields identical parameter estimates based on a transformed set of compensatory skills.
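The general idea of the equivalence can be illustrated for a single item that requires two skills. The sketch below assumes the usual LCDM parameterization (intercept, two main effects, one interaction on the logit scale). It adds the product a1*a2 as a third, transformed skill, constrained to equal the conjunction of the first two. On that augmented skill vector, a purely additive (compensatory) item function reproduces the LCDM, and DINA is the case with zero main effects. This is a simplified illustration, not necessarily the construction used in the report; all function names are hypothetical.

```python
import itertools
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    return math.log(p / (1.0 - p))

def lcdm_prob(alpha, lam0, lam1, lam2, lam12):
    """LCDM item function for two skills: main effects plus interaction."""
    a1, a2 = alpha
    return logistic(lam0 + lam1 * a1 + lam2 * a2 + lam12 * a1 * a2)

def compensatory_prob(alpha, lam0, slopes):
    """Additive item function on the transformed skills (a1, a2, a1*a2).
    The third skill is constrained to equal the conjunction of the first two."""
    a1, a2 = alpha
    transformed = (a1, a2, a1 * a2)
    return logistic(lam0 + sum(s * t for s, t in zip(slopes, transformed)))

def dina_prob(alpha, guess, slip):
    """DINA: 1 - slip if all required skills are mastered, else guess."""
    eta = alpha[0] * alpha[1]
    return 1.0 - slip if eta == 1 else guess

# DINA expressed as an LCDM with zero main effects
guess, slip = 0.2, 0.1
lam0, lam12 = logit(guess), logit(1.0 - slip) - logit(guess)
for alpha in itertools.product([0, 1], repeat=2):
    p_dina = dina_prob(alpha, guess, slip)
    p_lcdm = lcdm_prob(alpha, lam0, 0.0, 0.0, lam12)
    p_comp = compensatory_prob(alpha, lam0, (0.0, 0.0, lam12))
    print(alpha, round(p_dina, 4), round(p_lcdm, 4), round(p_comp, 4))
```

All three columns agree for every skill pattern. The constraint on the augmented skill space (only patterns with a3 = a1*a2 are allowed) is what makes the compensatory model a constrained one.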
    This article summarizes a number of features of probabilistic models that provide diagnostic information on participants to supplement estimates of ability or other attributes. These features are commonly referred to as fit diagnostics (Molenaar, 1983). Fit diagnostics indicate how a model fails to predict certain individual responses. Instead of discussing the drawbacks of classical test theory
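One widely used person-fit diagnostic of this kind is the standardized log-likelihood statistic, often denoted l_z. The sketch below computes it under the Rasch model and treats the ability value as known. This is a simplifying assumption: with an estimated ability, the statistic is only approximately standard normal, and corrected variants exist. The item difficulties are hypothetical, and the article may discuss other statistics.

```python
import math

def rasch_prob(theta, b):
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def lz_statistic(x, theta, difficulties):
    """Standardized log-likelihood person-fit statistic for binary responses x
    under the Rasch model at ability theta (treated as known).
    Large negative values flag patterns the model predicts poorly."""
    l0 = expected = variance = 0.0
    for xi, b in zip(x, difficulties):
        p = rasch_prob(theta, b)
        log_p, log_q = math.log(p), math.log(1.0 - p)
        l0 += xi * log_p + (1 - xi) * log_q                 # observed log-likelihood
        expected += p * log_p + (1 - p) * log_q              # its expectation
        variance += p * (1 - p) * (log_p - log_q) ** 2       # its variance
    return (l0 - expected) / math.sqrt(variance)

b = [-2.0, -1.0, 1.0, 2.0]                 # hypothetical difficulties
lz_ok = lz_statistic([1, 1, 0, 0], 0.0, b)   # passes easy, fails hard items
lz_bad = lz_statistic([0, 0, 1, 1], 0.0, b)  # the reverse: aberrant pattern
print(round(lz_ok, 2), round(lz_bad, 2))
```

The model-consistent pattern yields a positive value and the reversed pattern a strongly negative one, even though both have the same sum score.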
    In large-scale educational assessments, such as the Programme for International Student Assessment (PISA) and the Trends in International Mathematics and Science Study (TIMSS), a primary concern is the estimation of population-level characteristics of a number of latent variables and of the relationships between these latent variables and other variables. Typically, these studies are undertaken in contexts in which there are constraints
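PISA and TIMSS report population-level results using plausible values, which are multiple draws from each student's posterior distribution of the latent variable. A statistic is computed once per plausible value, and the results are combined with Rubin's rules. The following is a minimal sketch of that combination step. It assumes the per-plausible-value sampling variances are already available (in practice they come from replicate weights), and the numbers are hypothetical.

```python
import statistics

def combine_plausible_values(estimates, sampling_variances):
    """Combine a statistic computed once per plausible value (M draws)
    using Rubin's rules.

    estimates: the statistic (e.g. a country mean) computed with each PV.
    sampling_variances: its sampling variance computed with each PV."""
    m = len(estimates)
    point = statistics.fmean(estimates)
    within = statistics.fmean(sampling_variances)   # average sampling variance
    between = statistics.variance(estimates)        # variance across PVs (m - 1 denominator)
    total = within + (1 + 1 / m) * between          # adds measurement uncertainty
    return point, total

point, total = combine_plausible_values([500, 502, 498, 501, 499], [4.0] * 5)
print(point, total)
```

Here the between-PV variance is 10 / 4 = 2.5, so the total variance is 4 + 1.2 × 2.5 = 7. Ignoring the between-PV part would understate the uncertainty due to measuring the latent variable with error.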
    In this article, we investigate the creation of comparable score scales across countries in international assessments. We examine potential improvements to the score scale calibration procedures currently used in international large-scale assessments. Our approach seeks to improve fairness in scoring international large-scale assessments, whose calibration procedures often ignore item misfit. We also seek to obtain improved model-data fit when calibrating international score scales. To this end, we examine two alternative score scale calibration procedures: (a) a language-based score scale and (b) a more parsimonious international scale in which international parameters are used for a large proportion of items, and country-specific parameters are used for the subset of items that misfit the international scale. In our analyses, we used data from all 40 countries participating in the Progress in International Reading Literacy Study (PIRLS). Our findings revealed that current score scale calibr...
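The logic of option (b) can be sketched as a fit-based decision per country and item. The sketch below compares each country's observed item characteristic curve with the curve implied by the international 2PL parameters, using a weighted root mean squared deviation (RMSD). It assigns country-specific parameters only where the deviation exceeds a cutoff. The fit index, the 0.1 cutoff, the use of exact observed curves instead of posterior-based pseudo-counts, and all names are assumptions for illustration, not the article's exact procedure.

```python
import numpy as np

def icc_2pl(theta, a, b):
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def rmsd(observed, theta_nodes, weights, a, b):
    """Weighted root mean squared deviation between a country's observed
    proportion correct at each ability node and the international 2PL curve."""
    expected = icc_2pl(theta_nodes, a, b)
    w = weights / weights.sum()
    return float(np.sqrt(np.sum(w * (observed - expected) ** 2)))

def assign_parameters(observed_by_country, theta_nodes, weights, intl_params, threshold=0.1):
    """Keep international parameters where an item fits in a country;
    otherwise flag the (country, item) pair for country-specific calibration."""
    plan = {}
    for country, observed in observed_by_country.items():
        for item, (a, b) in intl_params.items():
            fit = rmsd(observed[item], theta_nodes, weights, a, b)
            plan[(country, item)] = "international" if fit <= threshold else "country-specific"
    return plan

# Hypothetical example: item2 is 1 logit harder in country B than internationally
nodes = np.linspace(-3, 3, 13)
weights = np.exp(-nodes ** 2 / 2)          # unnormalized normal ability weights
intl = {"item1": (1.0, 0.0), "item2": (1.2, 0.5)}
observed = {
    "A": {item: icc_2pl(nodes, a, b) for item, (a, b) in intl.items()},
    "B": {"item1": icc_2pl(nodes, 1.0, 0.0), "item2": icc_2pl(nodes, 1.2, 1.5)},
}
plan = assign_parameters(observed, nodes, weights, intl)
print(plan)
```

Most item parameters remain shared across countries, so the scale stays comparable. Only misfitting items receive unique parameters, which is what makes this approach more parsimonious than fully separate scales.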
    Multidimensional item response models can be based on multivariate normal ability distributions or on multivariate polytomous ability distributions. For the case of simple structure, in which each item corresponds to a unique dimension of the ability vector, the two-parameter logistic model is applied to empirical data to illustrate that, at least for the example under study, comparable results can be achieved with either approach. Comparability involves quality of model fit, similarity of parameter estimates, and the computational time required. In both cases, numerical work can be performed quite efficiently. For the multivariate normal ability distribution, multivariate adaptive Gauss-Hermite quadrature can be employed to greatly reduce computational labor. For a polytomous ability distribution, log-linear models permit efficient computations.
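Both approaches reduce the marginal probability of a response pattern to a weighted sum over ability nodes; only the nodes and weights differ. The one-dimensional sketch below shows this. It uses plain (non-adaptive) Gauss-Hermite quadrature for a standard normal ability, and a discrete ability distribution whose log-probabilities are quadratic in ability (a simple log-linear model). Adaptive quadrature, which recenters and rescales the nodes for each person, is not shown. Item parameters and function names are hypothetical.

```python
import itertools
import numpy as np
from numpy.polynomial.hermite import hermgauss

def normal_nodes(n_points):
    """Gauss-Hermite nodes and weights rescaled for a standard normal ability."""
    x, w = hermgauss(n_points)
    return np.sqrt(2.0) * x, w / np.sqrt(np.pi)

def loglinear_nodes(levels, beta1, beta2):
    """Discrete (polytomous) ability distribution on fixed levels whose
    log-probabilities are linear in theta and theta**2."""
    levels = np.asarray(levels, dtype=float)
    logits = beta1 * levels + beta2 * levels ** 2
    p = np.exp(logits - logits.max())
    return levels, p / p.sum()

def pattern_prob(x, a, b, nodes, weights):
    """Marginal probability of binary pattern x under the 2PL, summing
    the conditional pattern probability over ability nodes."""
    x = np.asarray(x)[:, None]
    p = 1.0 / (1.0 + np.exp(-np.asarray(a)[:, None] * (nodes[None, :] - np.asarray(b)[:, None])))
    cond = np.prod(np.where(x == 1, p, 1.0 - p), axis=0)   # local independence
    return float(np.sum(weights * cond))

a, b = [1.0, 1.5, 0.8], [-0.5, 0.0, 0.7]
for name, (nodes, weights) in {
    "normal (Gauss-Hermite)": normal_nodes(21),
    "log-linear discrete": loglinear_nodes(np.linspace(-3, 3, 7), 0.0, -0.5),
}.items():
    total = sum(pattern_prob(x, a, b, nodes, weights)
                for x in itertools.product([0, 1], repeat=3))
    print(name, round(total, 10))
```

In both cases, the pattern probabilities sum to 1. With `beta1 = 0` and `beta2 = -0.5`, the discrete distribution is a coarse, discretized normal shape, which helps explain why the two approaches can give comparable results.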
    von Davier, A. A., Carstensen, C. H., & von Davier, M. (2006). Linking Competencies in Educational Settings and Measuring Growth (No. RR-06-12). Princeton, NJ: ETS. Author address: Educational Testing Service, Princeton, NJ, US. Reader's note: The report is a collection and description, mostly kept very brief, of methods related to linking and scaling. However, the descriptions are so compact that, to really understand them, I would have to consult the original literature. The descriptions are probably very good, though, if one has already understood how the individual methods work. A noteworthy remark on longitudinal studies and how they are modeled in practice (p. 23): Therefore, it appears that the inferential ambitions of the models usually exceed the capacity of the data to support these inferences, and, at the same time, the consequences of these inferences might be serio
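As an example of the kind of linking method such reports survey, the sketch below shows mean-sigma linking. It uses the difficulty estimates of common (anchor) items from two separate calibrations to find the linear transformation θ_ref = A·θ_new + B. Item parameters are then transformed so that b_ref = A·b_new + B and a_ref = a_new / A. Whether and how the report presents this particular method is not claimed here. The numbers are hypothetical.

```python
import statistics

def mean_sigma_constants(b_new, b_ref):
    """Mean-sigma linking constants A, B that place the new calibration on the
    reference scale, from the anchor items' difficulties in both calibrations."""
    A = statistics.stdev(b_ref) / statistics.stdev(b_new)
    B = statistics.fmean(b_ref) - A * statistics.fmean(b_new)
    return A, B

def transform_2pl(a, b, A, B):
    """Put 2PL item parameters onto the reference scale
    (abilities transform analogously: theta -> A * theta + B)."""
    return a / A, A * b + B

# Hypothetical anchor items whose reference difficulties are 2 * b_new + 0.5
b_new = [-0.75, -0.25, 0.0, 0.5]
b_ref = [-1.0, 0.0, 0.5, 1.5]
A, B = mean_sigma_constants(b_new, b_ref)
print(A, B)
```

Mean-sigma uses only the first two moments of the anchor difficulties. Characteristic-curve methods, which match whole test characteristic curves instead, are a common alternative when difficulty estimates are noisy.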
    The need for equating arises when two or more tests on the same construct or subject area can yield different scores for the same examinee. The goal of test equating is to allow the scores on different forms of the same test to be used and interpreted interchangeably. Item response theory (IRT; Hambleton, Swaminathan, & Rogers, 1991; Lord, 1980; Thissen
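One standard IRT-based approach is true-score equating. For a number-correct score on form X, find the ability at which the expected score on X equals that score, and report the expected score on form Y at the same ability. The following is a minimal sketch under the Rasch model. It assumes both forms' difficulties are already on a common scale (e.g., after linking), and the difficulties are hypothetical.

```python
import math

def rasch_true_score(theta, difficulties):
    """Expected number-correct score on a form at ability theta (Rasch)."""
    return sum(1.0 / (1.0 + math.exp(-(theta - b))) for b in difficulties)

def true_score_equate(score_x, diffs_x, diffs_y, lo=-10.0, hi=10.0, tol=1e-10):
    """IRT true-score equating: solve T_X(theta) = score_x by bisection
    (valid because the test characteristic curve is increasing in theta),
    then return T_Y(theta). Only scores strictly between 0 and the number
    of items have a finite solution."""
    if not 0 < score_x < len(diffs_x):
        raise ValueError("score must lie strictly between 0 and the maximum score")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if rasch_true_score(mid, diffs_x) < score_x:
            lo = mid
        else:
            hi = mid
    theta = (lo + hi) / 2
    return rasch_true_score(theta, diffs_y)

# Form Y is (hypothetically) half a logit harder than form X
same = true_score_equate(2.0, [-1.0, 0.0, 1.0], [-1.0, 0.0, 1.0])
harder = true_score_equate(2.0, [-1.0, 0.0, 1.0], [-0.5, 0.5, 1.5])
print(round(same, 4), round(harder, 4))
```

Equating a form to itself returns the input score, while a harder form Y maps a score of 2 on X to a lower expected score on Y.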
    The chapter gives an overview of Rasch models for the measurement of change across repeated observations of the same individuals and items. The models described herein include extensions of the original Rasch model that allow one to analyze multidimensional latent constructs and to incorporate heterogeneity of change across individuals. In particular, the use of mixture-distribution Rasch models in longitudinal research allows one to model quantitative interindividual differences in a latent trait at each occasion, together with qualitative interindividual differences in the course of development. A mover-stayer mixed-Rasch model can be specified as a special case that reflects the assumption that change over time occurs for some latent subpopulation but not for another. An empirical example illustrates that the mover-stayer mixed-Rasch model can provide a parsimonious and viable account of observed heterogeneity of change.
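The mover-stayer idea can be illustrated with a deliberately simplified two-class mixture. The sketch assumes the same items are given at two occasions and that ability at occasion 1 is known. Stayers keep that ability at occasion 2, while movers shift by a fixed amount. The sketch then computes a person's posterior probability of being a mover. The known ability, the fixed shift `delta`, and all names are hypothetical simplifications, not the chapter's model specification.

```python
import math

def rasch_loglik(x, theta, difficulties):
    """Log-likelihood of binary responses x at ability theta (Rasch)."""
    ll = 0.0
    for xi, b in zip(x, difficulties):
        p = 1.0 / (1.0 + math.exp(-(theta - b)))
        ll += math.log(p if xi else 1.0 - p)
    return ll

def mover_posterior(x1, x2, theta, delta, pi_mover, difficulties):
    """Posterior probability of the 'mover' class: stayers have ability theta
    at both occasions, movers have theta at occasion 1 and theta + delta at
    occasion 2. pi_mover is the prior (mixing) proportion of movers."""
    ll_occ1 = rasch_loglik(x1, theta, difficulties)   # identical in both classes
    ll_stay = ll_occ1 + rasch_loglik(x2, theta, difficulties)
    ll_move = ll_occ1 + rasch_loglik(x2, theta + delta, difficulties)
    log_move = math.log(pi_mover) + ll_move
    log_stay = math.log(1.0 - pi_mover) + ll_stay
    return 1.0 / (1.0 + math.exp(log_stay - log_move))

b = [-1.5, -0.5, 0.5, 1.5]   # hypothetical item difficulties
post_improved = mover_posterior([0, 0, 0, 1], [1, 1, 1, 1], 0.0, 1.5, 0.5, b)
post_same = mover_posterior([1, 1, 0, 0], [1, 1, 0, 0], 0.0, 1.5, 0.5, b)
print(round(post_improved, 3), round(post_same, 3))
```

A person whose responses improve is more likely to be classified as a mover, and a person with an unchanged pattern as a stayer. In a full mixed-Rasch analysis, the class proportions, item parameters, and ability distributions would be estimated rather than fixed.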
    This study shows how to address the problem of trait-unrelated response styles (RS) in rating scales using multidimensional item response theory. The aim is to test and correct data for RS in order to provide fair assessments of personality. Expanding on an approach presented by Böckenholt (2012), observed rating data are decomposed into multiple response processes based on a multinomial processing tree. The data come from a questionnaire consisting of 50 items of the International Personality Item Pool measuring the Big Five dimensions, administered to 2,026 U.S. students with a 5-point rating scale. It is shown that this approach can be used to test whether RS exist in the data and that RS can be differentiated from trait-related responses. Although the extreme RS measure appears unidimensional after excluding only 1 item, a unidimensional measure of midpoint RS is obtained only after excluding 10 items. Both RS measures show high cross-scale correlations and high item response theory-based (marginal) reliabilities. Cultural differences were found in the tendency to give extreme responses. Moreover, it is shown how to score rating data to correct for RS once RS have been shown to exist in the data.
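The decomposition step can be sketched with a common coding of a three-node tree for 5-point scales. A midpoint node asks whether the response is the middle category. A direction node (the trait-related process) asks whether the response is agree or disagree. An extremity node asks whether the response is extreme or moderate. The exact tree and coding used in the study may differ, and the function name is hypothetical. Each resulting set of binary pseudo-items can then be modeled as its own IRT dimension.

```python
# Pseudo-item codes (midpoint, direction, extremity) for each 5-point response.
# None marks a node that is not reached on the path to that response.
TREE_CODES = {
    1: (0, 0, 1),        # strongly disagree: not midpoint, disagree, extreme
    2: (0, 0, 0),        # disagree: not midpoint, disagree, moderate
    3: (1, None, None),  # midpoint: direction and extremity not reached
    4: (0, 1, 0),        # agree: not midpoint, agree, moderate
    5: (0, 1, 1),        # strongly agree: not midpoint, agree, extreme
}

def decompose(ratings):
    """Split each respondent's ratings into midpoint, direction,
    and extremity pseudo-item vectors."""
    mid, direction, extreme = [], [], []
    for row in ratings:
        codes = [TREE_CODES[r] for r in row]
        mid.append([c[0] for c in codes])
        direction.append([c[1] for c in codes])
        extreme.append([c[2] for c in codes])
    return mid, direction, extreme

m, d, e = decompose([[1, 3, 5], [2, 4, 3]])
print(m, d, e, sep="\n")
```

Separating the nodes this way lets midpoint and extreme responding be modeled as dimensions distinct from the trait, which is what makes it possible to test for RS and to correct for them.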