# Rating scale


In the social sciences, a rating scale is a set of categories designed to elicit information about a quantitative attribute. Common examples are the Likert scale and 1-10 rating scales, on which a person selects the number considered to reflect the perceived quality of a product.

For rating scales as systems of educational marks, see the articles about education in different countries (named "Education in ...", for example, Education in Ukraine).

Background

A rating scale is an instrument that requires the rater to assign the rated object to categories that have numerals assigned to them.

In psychometrics, rating scales are often referenced to a statement which expresses an attitude or perception toward something. The most common example of such a rating scale is the Likert scale, in which a person is asked to select a category label from a list indicating the extent of disagreement or agreement with a statement.

The basic feature of any rating scale is that it consists of a number of categories. These are usually assigned integers. An example of the use of a Likert scale is as follows.

Statement: I could not live without my computer.

Response options:

1. Strongly Disagree
2. Disagree
3. Agree
4. Strongly Agree

It is common to treat the numbers obtained from a rating scale directly as measurements by calculating averages or, more generally, by applying arithmetic operations to them. Doing so is not, however, justified. In terms of the levels of measurement proposed by S.S. Stevens, the data are ordinal categorisations. This means, for example, that strongly agreeing with the above statement implies a more favourable perception of computers than agreeing with it does. However, the numbers are not interval-level measurements in Stevens' schema, which means that equal differences between the numbers do not necessarily represent equal differences in the degree to which one values computers. For example, the difference between strong agreement and agreement is not necessarily the same as the difference between agreement and disagreement. Strictly, even demonstrating that categories are ordinal requires empirical evidence based on patterns of responses (Andrich, 1978).
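The point about ordinal data can be illustrated with a minimal Python sketch. The two response sets and the alternative numbering `stretched` are hypothetical and chosen only for illustration. Because only the order of the categories is meaningful, any order-preserving relabelling is as defensible as 1 to 4, yet it can change which group has the higher mean.

```python
from statistics import mean, median

# Hypothetical responses on the 4-point scale above
# (1 = Strongly Disagree ... 4 = Strongly Agree).
group_a = [1, 1, 4, 4, 4]
group_b = [2, 2, 3, 3, 3]

# A hypothetical relabelling that keeps the category order (1 < 5 < 6 < 7)
# but changes the spacing between categories.
stretched = {1: 1, 2: 5, 3: 6, 4: 7}


def recode(responses, mapping):
    """Replace each category number with its relabelled value."""
    return [mapping[r] for r in responses]


for label, a, b in [
    ("original labels", group_a, group_b),
    ("relabelled", recode(group_a, stretched), recode(group_b, stretched)),
]:
    print(label)
    print("  mean:   A =", mean(a), " B =", mean(b))
    print("  median: A =", median(a), " B =", median(b))
```

With these numbers, the mean places group A above group B under the original labels and below it after relabelling, whereas the median places A above B in both cases. Statistics that depend only on order, such as the median, are unaffected by the choice of numbering.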

More than one rating scale is required to measure an attitude or perception, because the polytomous Rasch model for ordered categories requires statistical comparisons between the categories (Andrich, 1978). In terms of classical test theory, more than one question is required to obtain an index of internal reliability such as Cronbach's alpha (Cronbach, 1951), which is a basic criterion for assessing the effectiveness of a rating scale and, more generally, a psychometric instrument.
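As a rough illustration of why several questions are needed, the following Python sketch computes Cronbach's alpha from its standard formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), where k is the number of items. The function name `cronbach_alpha` and the sample data are assumptions made for this example. Like classical test theory itself, the calculation treats the category numbers as scores.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(scores[0])
    if k < 2:
        # With a single item there is nothing to compare, so alpha is undefined.
        raise ValueError("Cronbach's alpha needs at least two items")
    if any(len(row) != k for row in scores):
        raise ValueError("every respondent must answer every item")

    # Sample variance of each item (column) across respondents.
    item_variances = [variance(item) for item in zip(*scores)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores do not vary; alpha is undefined")

    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical data: four respondents answering three items on a 1-4 scale.
responses = [
    [4, 3, 4],
    [2, 2, 1],
    [3, 3, 3],
    [1, 2, 1],
]
print(round(cronbach_alpha(responses), 3))
```

Calling the function with one item per respondent raises an error, which mirrors the point in the text: a single rating provides no basis for an internal reliability index.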

Rating scales used online

Rating scales are used widely online in an attempt to provide indications of consumer opinions of products. Examples of sites which employ rating scales are IMDb, Epinions.com, Internet Book List, Yahoo! Movies, Amazon.com, BoardGameGeek, TV.com and Ratings.net. The Criticker website uses a rating scale from 0 to 100 to obtain "personalised film recommendations".

In almost all cases, online rating scales allow only one rating per user per product, though there are exceptions such as Ratings.net, which allows users to rate products in relation to several qualities. Most online rating facilities also provide few or no qualitative descriptions of the rating categories, although again there are exceptions such as Yahoo! Movies, which labels each of the categories between F and A+, and BoardGameGeek, which provides explicit descriptions of each category from 1 to 10. Often, only the top and bottom categories are described, as in IMDb's online rating facility.

With each user rating a product only once, for example in a category from 1 to 10, there is no means for evaluating internal reliability using an index such as Cronbach's alpha. It is therefore impossible to evaluate the validity of the ratings as measures of viewer perceptions. Establishing validity would require establishing both reliability and accuracy (i.e. that the ratings represent what they are supposed to represent).

Another fundamental issue is that online ratings usually involve convenience sampling much like television polls, i.e., they represent only the conglomeration of those inclined to submit ratings.

Sampling is one factor which can lead to results which have a specific bias or are only relevant to a specific subgroup. To illustrate the importance of such factors, consider an example. Suppose that a film's marketing strategy and reputation are such that 90% of its audience are attracted to the particular kind of film; i.e. it does not appeal to a broad audience. Suppose also that the film is very popular among the audience that does see the film and, in addition, that those who feel most strongly about the film are inclined to rate the film online. This combination may lead to very high ratings of the film which do not generalize beyond the people who actually see the film (or possibly even beyond those who actually rate it).
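The effect of self-selection can be sketched numerically. In the Python example below, every figure is hypothetical: the audience segments, their sizes and the number of viewers in each segment who submit a rating are invented for illustration. The sketch compares the share of top-category responses among everyone who saw the film with the share among those who chose to rate it.

```python
# Hypothetical audience for one film:
# (segment, rating category each member would give,
#  number of viewers, number of those viewers who submit a rating online)
segments = [
    ("loved it", 10, 600, 300),
    ("lukewarm", 6, 300, 15),
    ("disliked", 2, 100, 30),
]


def share_in_category(counts, category):
    """Fraction of all counted responses that fall in the given category."""
    total = sum(counts.values())
    return counts.get(category, 0) / total


# Category counts among everyone who saw the film.
all_viewers = {rating: viewers for _, rating, viewers, _ in segments}
# Category counts among those who actually submitted a rating.
raters = {rating: submitted for _, rating, _, submitted in segments}

print("Share rating 10 among all viewers:", share_in_category(all_viewers, 10))
print("Share rating 10 among raters:     ", round(share_in_category(raters, 10), 3))
```

Under these assumptions, the top category is over-represented among submitted ratings relative to the audience as a whole, and the audience itself is already a self-selected subset of the general public.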

Qualitative description of categories is an important feature of a rating scale. For example, if only the points 1-10 are given without description, some people may select 10 rarely whereas others may select the category often. If, instead, "10" is described as "near flawless", the category is more likely to mean the same thing to different people. This applies to all categories, not just the extreme points. Even with category descriptions, some may be harsher raters than others. Rater harshness is also a consideration in marking essays in educational contexts [http://66.102.7.104/search?q=cache:o1l_qRDI9QwJ:www.cambridgeesol.org/rs_notes/rs_nts13.pdf+rater+harshness+references&hl=en&ct=clnk&cd=4].

These issues are compounded when aggregated statistics such as averages are used for lists and rankings of products. User ratings are at best ordinal categorizations. While it is not uncommon to calculate averages or means for such data, doing so cannot be justified, because an average is meaningful only if equal intervals between the numbers represent equal differences between levels of perceived quality. The key problems with aggregate data based on the kinds of rating scales commonly used online are as follows:
*Averages should not be calculated for data of the kind collected.
*It is usually impossible to evaluate the reliability or validity of user ratings.
*Products are not compared with respect to explicit, let alone common, criteria.
*Only users inclined to submit a rating for a product do so.
*Data are not usually published in a form that permits evaluation of the product ratings.

More developed methodologies include Choice Modelling or Maximum Difference methods, the latter being related to the Rasch model due to the connection between Thurstone's law of comparative judgement and the Rasch model.

References

* Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16, 297-334.
* Andrich, D. (1978). A rating formulation for ordered response categories. Psychometrika, 43, 561-573.

See also

*Semantic differential
*Voting system
*MaxDiff

* [http://www.rasch-analysis.com/ How to apply Rasch analysis]

Wikimedia Foundation. 2010.
