 Psychometric software

Psychometric software is software used for the psychometric analysis of data from tests, questionnaires, or inventories that reflect latent psychoeducational variables. While some psychometric analyses can be performed with standard statistical software such as SPSS, most require specialized tools.
Sources
Because only a few commercial businesses (most notably Assessment Systems Corporation and Scientific Software International) develop specialized psychometric tools, there exist many free tools developed by researchers and educators. Important websites for free psychometric software include:
 CASMA at the University of Iowa, USA
 REMP at the University of Massachusetts, USA
 Software from Brad Hanson
 Software from John Uebersax
 Software from J. Patrick Meyer
 Software directory at the Institute for Objective Measurement
Classical test theory
Classical test theory is an approach to psychometric analysis that has weaker assumptions than item response theory and is more applicable to smaller sample sizes.
CITAS
CITAS (Classical Item and Test Analysis Spreadsheet) is a free Excel workbook designed to provide scoring and statistical analysis of classroom tests. Item responses (ABCD) and keys are typed or pasted into the workbook, and the output populates automatically. Unlike other programs, CITAS does not require any "running" or experience in psychometric analysis, making it accessible to school teachers and professors. It is available as a free download.
jMetrik
jMetrik [3] is free and open source software for conducting a comprehensive psychometric analysis. It was developed by J. Patrick Meyer at the University of Virginia. Current methods include classical item analysis, differential item functioning (DIF) analysis, confirmatory factor analysis, item response theory, IRT equating, and nonparametric item response theory. The item analysis includes proportion, point biserial, and biserial statistics for all response options. Reliability coefficients include Cronbach's alpha, Guttman's lambda, the Feldt-Gilmer coefficient, the Feldt-Brennan coefficient, decision consistency indices, the conditional standard error of measurement, and reliability if item deleted. The DIF analysis is based on nonparametric item characteristic curves and the Mantel-Haenszel procedure. DIF effect sizes and ETS DIF classifications are included in the output. Confirmatory factor analysis is limited to the common factor model for congeneric, tau-equivalent, and parallel measures. Fit statistics are reported along with factor loadings and error variances. IRT methods include the Rasch, partial credit, and rating scale models. IRT equating methods include the mean/mean, mean/sigma, Haebara, and Stocking-Lord procedures.
jMetrik also includes basic descriptive statistics and a graphics facility that produces bar charts, pie charts, histograms, kernel density estimates, and line plots.
jMetrik is a pure Java application that runs on 32-bit and 64-bit versions of Windows, Mac, and Linux operating systems. jMetrik requires Java 1.6 on the host computer. jMetrik is available as a free download from www.ItemAnalysis.com.
Iteman
Iteman is a commercial program specifically designed for classical test analysis, producing rich text (RTF) reports with graphics, narratives, and embedded tables. It calculates the proportion and point biserial of each item, as well as high/low subgroup proportions and detailed graphics of item performance. It also calculates typical descriptive statistics, including the mean, standard deviation, reliability, and standard error of measurement, for each domain and for the overall test. It is only available from Assessment Systems Corporation [4].
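The statistics named in this section (item proportion correct, point biserial, Cronbach's alpha, and the standard error of measurement) can be illustrated with a minimal Python sketch. It is not the code of any program listed here; the function names are hypothetical, items are assumed to be scored 0/1, population (not sample) variances are used, and the point biserial is the uncorrected version in which the item is included in the total score.

```python
import math
from statistics import mean, pstdev, pvariance


def item_analysis(scores):
    """Return (proportion correct, point biserial) for each item.

    scores: list of examinee rows, each a list of 0/1 item scores
    (an assumed input layout, chosen for illustration).
    """
    n_items = len(scores[0])
    totals = [sum(row) for row in scores]
    sd_total = pstdev(totals)
    results = []
    for j in range(n_items):
        item = [row[j] for row in scores]
        p = mean(item)  # proportion correct (classical item difficulty)
        sd_item = pstdev(item)
        if sd_item == 0 or sd_total == 0:
            r_pb = float("nan")  # undefined when there is no variance
        else:
            # Point biserial = Pearson correlation of item and total score
            cov = mean(x * t for x, t in zip(item, totals)) - p * mean(totals)
            r_pb = cov / (sd_item * sd_total)
        results.append((p, r_pb))
    return results


def cronbach_alpha(scores):
    """Cronbach's alpha from item variances and total-score variance."""
    k = len(scores[0])
    totals = [sum(row) for row in scores]
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    return (k / (k - 1)) * (1 - sum(item_vars) / pvariance(totals))


def standard_error_of_measurement(scores):
    """SEM = SD of total scores * sqrt(1 - reliability)."""
    totals = [sum(row) for row in scores]
    return pstdev(totals) * math.sqrt(1 - cronbach_alpha(scores))
```

Programs differ in details such as whether the item is removed from the total before correlating, so results from this sketch need not match any particular package exactly.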
Lertap
Lertap (Laboratory of Educational Research Test Analysis Program) is a comprehensive software package for classical test analysis developed for use with Microsoft Excel. It includes test, item, and option statistics, classification consistency and mastery test analysis, procedures for cheating detection, and extensive graphics (e.g., trace lines for item options, conditional standard errors of measurement, scree plots, boxplots of group differences, histograms, scatterplots).
Differential item functioning (DIF) is supported in the Excel 2007 and Excel 2010 versions of Lertap. Mantel-Haenszel methods are used; graphs of results are provided.
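As an illustration of the Mantel-Haenszel approach to DIF, the following Python sketch computes the common odds ratio across matched score levels and the delta-scale effect size commonly reported as MH D-DIF. It is not Lertap's implementation; the function name and the input layout (one 2x2 table per score level) are assumptions made for the example, and significance testing is omitted.

```python
import math


def mantel_haenszel_dif(strata):
    """Mantel-Haenszel common odds ratio and delta-scale effect size.

    strata: list of (A, B, C, D) counts, one tuple per matched score level:
      A = reference group correct,  B = reference group incorrect,
      C = focal group correct,      D = focal group incorrect.
    """
    num = 0.0
    den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # skip empty score levels
        num += a * d / n
        den += b * c / n
    alpha_mh = num / den  # values above 1 favor the reference group
    delta_mh = -2.35 * math.log(alpha_mh)  # MH D-DIF on the ETS delta scale
    return alpha_mh, delta_mh


# Example: two score levels with identical performance in both groups
alpha, delta = mantel_haenszel_dif([(10, 10, 10, 10), (20, 5, 20, 5)])
```

An odds ratio of 1 (delta of 0) indicates no DIF; negative delta values indicate that the item is harder for the focal group after matching on total score.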
Lertap will produce ASCII data files ready for input to Xcalibre and BILOG-MG.
Several sample datasets for use with Lertap and/or other item and test analysis programs are available [5]; these involve both cognitive tests and affective (or rating) scales. Technical papers related to the application of Lertap are also available [6].
Lertap was developed by Larry Nelson at Curtin University; commercial versions are available from Assessment Systems Corporation [7].
TAP
TAP (the Test Analysis Program) is a free program for basic classical analysis developed by Gordon Brooks at Ohio University. It is available as a free download.
ViStaCITA
ViStaCITA (Classical Item and Test Analysis) is a module included in the Visual Statistics System (ViSta) that focuses on graphically oriented methods applied to psychometric analysis. It is freely available at [8]. It was developed by Ruben Ledesma, J. Gabriel Molina, Pedro M. Valero-Mora, and Forrest W. Young.
Item response theory calibration
Item response theory (IRT) is a psychometric approach which assumes that the probability of a certain response is a direct function of an underlying trait or traits. Various functions have been proposed to model this relationship, and the different calibration packages reflect this. Several software packages have been developed for additional analysis such as equating; they are listed in the next section.
BILOG-MG
BILOG-MG is a software program for IRT analysis of dichotomous (correct/incorrect) data, including fit and differential item functioning. It is commercial, and only available from Scientific Software International [9] or Assessment Systems Corporation [10].
ICL
ICL (IRT Command Language) performs IRT calibrations, including the 1-, 2-, and 3-parameter logistic models as well as the partial credit model and generalized partial credit model. It can also generate response data. As the name implies, it is completely command driven, with no graphical user interface. It is available as a free download.
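The logistic models mentioned above, and the idea of generating response data from them, can be sketched in Python as follows. This is not ICL syntax or ICL's algorithm; the function names are hypothetical, the scaling constant D = 1.7 that some programs apply is omitted, and the 1- and 2-parameter models are obtained as special cases by fixing c = 0 (and a to a common value).

```python
import math
import random


def p_3pl(theta, a, b, c=0.0):
    """Probability of a correct response under the 3-parameter logistic model.

    theta: examinee ability; a: discrimination; b: difficulty;
    c: lower asymptote (pseudo-guessing). With c = 0 this is the 2PL.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def simulate_responses(thetas, items, seed=0):
    """Generate a 0/1 response matrix (examinees x items).

    items: list of (a, b, c) tuples. A seeded generator is used so that
    the simulated data are reproducible.
    """
    rng = random.Random(seed)
    return [
        [1 if rng.random() < p_3pl(theta, *item) else 0 for item in items]
        for theta in thetas
    ]


# Example: three examinees answering two items
data = simulate_responses([-1.0, 0.0, 1.0], [(1.0, 0.0, 0.2), (1.5, 0.5, 0.0)])
```

Calibration runs in the opposite direction: given a response matrix like `data`, the software estimates the item parameters (and often the abilities) that make the observed responses most likely.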
jMetrik
jMetrik [11] is free and open source software for conducting a comprehensive psychometric analysis, developed by J. Patrick Meyer at the University of Virginia. Detailed information is listed above under Classical test theory. Its IRT methods include the Rasch, partial credit, and rating scale models, and its IRT equating methods include the mean/mean, mean/sigma, Haebara, and Stocking-Lord procedures.
MULTILOG
MULTILOG is an extension of BILOG to data with polytomous (multiple) responses. It is commercial, and only available from Scientific Software International [12] or Assessment Systems Corporation [13].
PARSCALE
PARSCALE is a program designed specifically for polytomous IRT analysis. It is commercial, and only available from Scientific Software International [14] or Assessment Systems Corporation [15].
PARAM3PL
PARAM3PL [16] is a free program for the calibration of the 3-parameter logistic IRT model. It was developed by Lawrence Rudner at the Education Resources Information Center (ERIC). The latest release was version 0.89 in June 2007. It is available from ERIC.
TESTFACT
TESTFACT [17] features include:
 Marginal maximum likelihood (MML) exploratory factor analysis and classical item analysis of binary data
 Computation of tetrachoric correlations, the principal factor solution, classical item descriptive statistics, and fractile tables and plots
 Up to 10 factors using numerical quadrature: up to 5 for nonadaptive and up to 10 for adaptive quadrature
 Up to 15 factors using Monte Carlo integration techniques
 Varimax (orthogonal) and PROMAX (oblique) rotation of factor loadings
 An important form of confirmatory factor analysis known as "bifactor" analysis, in which the factor pattern consists of one main factor plus group factors
 Simulation of responses to items based on user-specified parameters
 Correction for guessing and not-reached items
 Imposition of constraints on item parameter estimates
 Handling of omitted and not-presented items
 Detailed online HELP documentation, including syntax and annotated examples
WINMIRA 2001
WINMIRA 2001 is a program for analyses with the Rasch model for dichotomous and polytomous ordinal responses, with latent class analysis, and with the mixture distribution Rasch model for dichotomous^{[1]} and polytomous item responses.^{[2]} The software provides conditional maximum likelihood (CML) estimation of item parameters, MLE and WLE estimates of person parameters, person- and item-fit statistics, and information criteria (AIC, BIC, CAIC) for model selection. The software also performs a parametric bootstrap procedure for selecting the number of mixture components. A free student version is available from Matthias von Davier's webpage at http://www.vondavier.com/ [18]; a commercial version is available through ASSESS.COM at [19].
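The information criteria named above can be illustrated with their textbook definitions. The Python sketch below is an assumption-laden example rather than WINMIRA's own computation: the function name is hypothetical, and the sample size is taken to be the number of persons.

```python
import math


def information_criteria(log_lik, n_params, n_persons):
    """Textbook AIC, BIC, and CAIC for a fitted model.

    log_lik: maximized log-likelihood; n_params: number of estimated
    parameters; n_persons: sample size (assumed here to be the person count).
    """
    deviance = -2.0 * log_lik
    return {
        "AIC": deviance + 2.0 * n_params,
        "BIC": deviance + n_params * math.log(n_persons),
        "CAIC": deviance + n_params * (math.log(n_persons) + 1.0),
    }


# Example: compare a 1-class and a 2-class solution (made-up numbers)
one_class = information_criteria(log_lik=-5120.4, n_params=20, n_persons=1000)
two_class = information_criteria(log_lik=-5075.9, n_params=41, n_persons=1000)
```

For each criterion, the model with the smaller value is preferred; the criteria differ in how strongly they penalize additional parameters, so they can disagree about the number of mixture components.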
Winsteps
Winsteps is a program designed for analysis with the Rasch model, a one-parameter item response theory model. The Rasch model differs from the 1PL model in that each individual in the person sample is parameterized for item estimation, and in that it is prescriptive and criterion-referenced rather than descriptive and norm-referenced.^{[3]} It is commercially available from Winsteps, Inc. [20]. A previous DOS-based version, BIGSTEPS, is also available.
Xcalibre
Xcalibre is a commercial program that performs marginal maximum likelihood estimation of both dichotomous (1PL-Rasch, 2PL, 3PL) and polytomous IRT models, using text files for both input and output. The interface is point-and-click; no command code is required. It is only available from Assessment Systems Corporation [21].
Additional item response theory software
Because of the complexity of IRT, there exist few software packages capable of calibration. However, many software programs exist for specific ancillary IRT analyses such as equating and scaling. Examples of such software follow.
eqboot
eqboot is an open source, syntax-based Java application for conducting IRT equating and computing the bootstrap standard error of equating, developed by J. Patrick Meyer. The program runs on any 32- or 64-bit operating system that has the Java Runtime Environment (JRE) version 1.6 or higher installed. At the moment, the program supports equating with binary items only. eqboot will compute equating constants using the mean/mean, mean/sigma, Haebara,^{[4]} and Stocking-Lord^{[5]} procedures. It will also compute the standard error of equating if the user provides a comma-delimited file of bootstrapped item parameter estimates from both forms, a comma-delimited file of bootstrapped ability estimates for Form X examinees, and a comma-delimited file of bootstrapped ability estimates for Form Y examinees. Options allow the user to specify the criterion function for the Haebara and Stocking-Lord methods.^{[6]} In addition, the examinee distribution over which the criterion function is minimized may be set to the observed theta estimates, a histogram of theta estimates, a kernel density estimate of theta estimates, or uniformly spaced values on the theta scale. The software is a free download from www.ItemAnalysis.com.
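The two simplest equating procedures named above, mean/sigma and mean/mean, can be sketched in Python as follows. This is an illustration under stated assumptions, not eqboot's code: the function names are hypothetical, the inputs are the discrimination (a) and difficulty (b) estimates of the common items on Form X and Form Y, and the constants A and B place Form X on the Form Y scale via theta_Y = A * theta_X + B. The Haebara and Stocking-Lord procedures require numerical minimization of a criterion function and are not shown.

```python
from statistics import mean, pstdev


def mean_sigma(b_x, b_y):
    """Mean/sigma constants from common-item difficulty estimates."""
    A = pstdev(b_y) / pstdev(b_x)
    B = mean(b_y) - A * mean(b_x)
    return A, B


def mean_mean(a_x, b_x, a_y, b_y):
    """Mean/mean constants from common-item discriminations and difficulties."""
    A = mean(a_x) / mean(a_y)
    B = mean(b_y) - A * mean(b_x)
    return A, B


def to_y_scale(a_x, b_x, A, B):
    """Transform Form X item parameters onto the Form Y scale."""
    return [a / A for a in a_x], [A * b + B for b in b_x]


# Example: Form Y estimates built as an exact linear transformation
# of Form X (A = 2, B = 0.5), so both methods should recover it.
a_x, b_x = [1.0, 1.2, 0.8], [-1.0, 0.0, 1.0]
a_y, b_y = [0.5, 0.6, 0.4], [-1.5, 0.5, 2.5]
A_ms, B_ms = mean_sigma(b_x, b_y)
A_mm, B_mm = mean_mean(a_x, b_x, a_y, b_y)
```

With real data the two methods generally give slightly different constants, because estimation error affects the a and b parameters differently.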
IRTEQ
IRTEQ [22] is a freeware Windows GUI application that implements IRT scaling and equating, developed by Kyung (Chris) T. Han. It implements IRT scaling/equating methods that are widely used with the “Non-Equivalent Groups Anchor Test” design: Mean/Mean,^{[7]} Mean/Sigma,^{[8]} Robust Mean/Sigma,^{[9]} and TCC methods.^{[10]}^{[11]} For TCC methods, IRTEQ provides the user with the option to choose various score distributions for incorporation into the loss function. IRTEQ supports various popular unidimensional IRT models: logistic models for dichotomous responses (with 1, 2, or 3 parameters), and the Generalized Partial Credit Model (GPCM) (including the Partial Credit Model (PCM), which is a special case of the GPCM) and the Graded Response Model (GRM) for polytomous responses. IRTEQ can also equate scores on the scale of one test to the scale of another test using IRT true score equating.^{[12]}
ResidPlots2
ResidPlots2 [23] is a free program for IRT graphical residual analysis. It was developed by Tie Liang, Kyung (Chris) T. Han, and Ronald K. Hambleton at the University of Massachusetts.
WinGen
WinGen [24] is a free Windows-based program that generates IRT parameters and item responses. It was developed by Kyung (Chris) T. Han at the University of Massachusetts.^{[13]}
ST
ST [25] conducts item response theory (IRT) scale transformations for dichotomously scored tests.
POLYST
POLYST [26] conducts IRT scale transformations for dichotomously and polytomously scored tests.
STUIRT
STUIRT [27] conducts IRT scale transformations for mixed-format tests (tests that include some multiple-choice items and some polytomous items).
Decision consistency
Decision consistency methods are applicable to criterion-referenced tests such as licensure exams and academic mastery testing.
jMetrik
jMetrik [28] is free and open source software for conducting a comprehensive psychometric analysis. Detailed information is listed above. jMetrik includes Huynh's decision consistency estimates if cut scores are provided in the item analysis.
General statistical analysis software
Software designed for general statistical analysis can often be used for certain types of psychometric analysis. Moreover, code for more advanced types of psychometric analysis is often available.
R
R is a programming environment designed for statistical computing and production of graphics. It is freely available at [29].
SPSS
SPSS, originally called the Statistical Package for the Social Sciences, is a commercial general statistical analysis program where the data is presented in a spreadsheet layout and common analyses are menu driven.
S-PLUS
S-PLUS is a commercial analysis package based on the programming language S.
SAS
SAS is a commercially available package for statistical analysis and manipulation of data. It is also command-based.
References
 ^ Rost, J. (1990). Rasch models in latent classes: An integration of two approaches to item analysis. Applied Psychological Measurement, 14, 271-282.
 ^ von Davier, M., & Rost, J. (1995). Polytomous mixed Rasch models. In G. H. Fischer & I. W. Molenaar (Eds.), Rasch models, foundations, recent developments, and applications (pp. 371-382). New York: Springer.
 ^ Rasch dichotomous model vs. One-parameter Logistic Model. Rasch Measurement Transactions, 2005, 19:3 p. 1032
 ^ Haebara, T. (1980). Equating logistic ability scales by a weighted least squares method. Japanese Psychological Research, 22, 144-149.
 ^ Stocking, M.L., & Lord, F.M. (1983). Developing a common metric in item response theory. Applied Psychological Measurement, 7, 201-210.
 ^ Kim, S., & Kolen, M. J. (2007). Effects on scale linking of different definitions of criterion functions for the IRT characteristic curve methods. Journal of Educational and Behavioral Statistics, 32, 371-397.
 ^ Loyd & Hoover, 1980
 ^ Marco, 1977
 ^ Linn, Levine, Hastings, & Wardrop, 1981
 ^ Haebara, T. (1980). Equating logistic ability scales by a weighted least squares method. Japanese Psychological Research, 22, 144-149.
 ^ Stocking, M.L., & Lord, F.M. (1983). Developing a common metric in item response theory. Applied Psychological Measurement, 7, 201-210.
 ^ Lord, F.M. (1980). Applications of item response theory to practical testing problems. Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
 ^ Han, K. T. (2007). WinGen: Windows software that generates IRT parameters and item responses. Applied Psychological Measurement, 31, 457-459.
Wikimedia Foundation. 2010.