- Copy testing
Copy testing is a specialized field of marketing research that determines an ad’s effectiveness based on consumers’ responses to the ad. Although it is most often associated with the study of television commercials prior to airing, it covers all media, including print, TV, radio, and the Internet. The field is still widely known as copy testing, but pre-testing is considered the more accurate, modern name (Young, p.4) for the prediction of how effectively an ad will perform, based on the analysis of feedback gathered from the target audience. Each test will either qualify the ad as strong enough to meet company action standards for airing or identify opportunities to improve the performance of the ad through editing. (Young, p.213)
Pre-testing is also used to identify weak spots within an ad campaign, to edit 60-second ads down to 30 seconds or 30-second ads down to 15 seconds more effectively, to select images from the spot to use in an integrated campaign’s print ad, to pull out the key moments for use in ad tracking, and to identify branding moments.
Features of a Good Copy Testing System
In 1982, a consortium of 21 leading advertising agencies, including N. W. Ayer, D’Arcy, Grey, McCann-Erickson, Needham Harper & Steers, Ogilvy & Mather, J. Walter Thompson, and Young & Rubicam, released a public document laying out the PACT (Positioning Advertising Copy Testing) Principles, which describe what constitutes a good copy testing system. According to PACT, a good copy testing system is one that meets the following criteria:
- Provides measurements which are relevant to the objectives of the advertising.
- Requires agreement about how the results will be used in advance of each specific test.
- Provides multiple measurements – because single measurements are generally inadequate to assess the performance of an advertisement.
- Is based on a model of human response to communications – the reception of a stimulus, the comprehension of the stimulus and the response to the stimulus.
- Allows for consideration of whether the advertising stimulus should be exposed more than once.
- Recognizes that the more finished a piece of copy is, the more soundly it can be evaluated and requires, as a minimum, that alternative executions be tested in the same degree of finish.
- Provides controls to avoid the biasing effects of the exposure context.
- Takes into account basic considerations of sample definition.
- Demonstrates reliability and validity.
Four Types of Copy Testing Scores
There are four general themes woven into the last century of copy testing. To understand how the different types of measures relate to one another, see the heuristic advertising models listed in the links at the end of this article: the Ameritest TV Ad Model and the Copymetrics Attention, Emotion and Memory Model.
Report Card Measures
The first theme is the quest for a valid, single-number statistic to capture the overall performance of the advertising creative. This search has spawned the creation of various report card measures. These measures are used to filter commercial executions and help management make the go/no-go decision about which ads to air. (Young, p. 7) The predominant copy testing measure of the 1950s and 1960s, Day-After Recall (DAR), was interpreted to measure an ad’s ability to “break through” into the mind of the consumer and register a message from the brand in long-term memory. (Honomichl) Once this measure was adopted by Procter and Gamble, it became a research staple. (Honomichl)
In the 1970s and 1980s, after DAR was determined to be a poor predictor of sales, the research industry began to depend on the measure of persuasion as an accurate predictor of sales. This shift was led, in part, by researcher Horace Schwerin, who pointed out, “the obvious truth is that a claim can be well remembered but completely unimportant to the prospective buyer of the product – the solution the marketer offers is addressed to the wrong need.” (Honomichl) As with DAR, it was Procter and Gamble’s acceptance of the persuasion measure (also known as motivation) that made it an industry standard. Recall scores were still provided in copy testing reports, with the understanding that persuasion was the measure that mattered. (Honomichl)
The 1970s also saw a re-examination of the “breakthrough” measure. As a result, an important distinction was made between the attention-getting power of the creative execution and how well “branded” the ad was. Thus, the separate measures of attention and branding were born. (Young, p.12)
In the 1970s, 1980s, and 1990s, tests were conducted to try to validate a link between the recall score and actual sales. For example, Procter and Gamble reviewed 10 years’ worth of split-cable tests (100 in total) and found no significant relationship between recall scores and sales. (Young, pp. 3-30) In addition, Leonard Lodish, a marketing professor at the Wharton School, conducted an even more extensive review of test market results and also failed to find a relationship between recall and sales. (Lodish, pp. 125-139) Harold Ross of Mapes & Ross found that persuasion was a better predictor of sales than recall. (Ross, pp. 13-16)
The second theme is the development of diagnostic copy testing, the main purpose of which is optimization. Understanding why diagnostic measures such as attention, brand linkage, and motivation are high or low can help advertisers identify creative opportunities to improve executions. (Young, p.7)
Different approaches have been developed by research companies to determine the report card measures of attention, brand linkage, and motivation. For example, Unilever analyzed a database of commercials “triple-tested” using the three leading approaches to the measure of branding (Ameritest, ASI, and Millward Brown) and found that each of the three is measuring something uncorrelated with, and therefore different from, the other two. (Kastenholz, Kerr & Young)
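The kind of check described above can be sketched as a pairwise correlation between the scores that different systems assign to the same commercials. The following is a minimal illustration in Python, not the procedure used in the Unilever analysis: the system names and all scores are invented, and Pearson correlation is assumed as the measure of association.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length score lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def pairwise_correlations(scores):
    """Return {(name_a, name_b): r} for every pair of measures."""
    return {
        (a, b): pearson(scores[a], scores[b])
        for a, b in combinations(scores, 2)
    }


# Hypothetical branding scores for the same six commercials under three
# placeholder systems. The numbers are invented for illustration only.
branding_scores = {
    "system_a": [62, 48, 71, 55, 66, 50],
    "system_b": [40, 58, 45, 61, 39, 57],
    "system_c": [53, 51, 49, 60, 58, 47],
}

for pair, r in pairwise_correlations(branding_scores).items():
    print(pair, round(r, 2))
```

Coefficients near zero for every pair would suggest that the systems are capturing different things under the same label; coefficients near one would suggest they are interchangeable.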
The third theme is the development of non-verbal measures in response to the belief of many advertising professionals that much of a commercial’s effects – e.g. the emotional impact – may be difficult for respondents to put into words or scale on verbal rating statements. In fact, many believe the commercial’s effects may be operating below the level of consciousness. (Young, p.7) According to researcher Chuck Young, “There is something in the lovely sounds of our favorite music that we cannot verbalize – and it moves us in ways we cannot express.” (Young, p.22)
In the 1970s, researchers such as Herbert Krugman sought to capture these non-verbal responses biologically by tracking brain wave activity as respondents watched commercials. (Krugman) Others experimented with galvanic skin response, voice pitch analysis, and eye-tracking. (Young, p.22) These efforts were not widely adopted, in part because of the limitations of the technology and in part because of the poor cost-effectiveness of what was widely perceived as academic rather than actionable research.
In the 1990s, the Picture Sorts were created as a method of deconstructing a viewer’s dynamic response to the film on multiple levels. A Flow of Attention graph, as one example of a Picture Sort, measures how the eye pre-consciously filters the visual information in an ad and serves both as a gatekeeper for human consciousness and as an interactive search engine. More mainstream than the biological measures, Picture Sorts have been used extensively for online ad testing and, because they are not language-dependent, have been used around the world by major advertisers as diverse as IBM and Unilever. (Young, p.24)
More recently, research companies have started to use psychological tests, such as those based on the Stroop effect, to measure the emotional impact of copy. These techniques exploit the notion that viewers do not know why they react to a product, image, or ad in a certain way (or that they reacted at all) because such reactions occur outside of awareness, through changes in networks of thoughts, ideas, and images.
The fourth theme, which is a variation on the previous two, is the development of moment-by-moment measures to describe the internal dynamic structure of the viewer’s experience of the commercial, as a diagnostic counterpoint to the various gestalt measures of commercial performance or predicted impact. (Young, p.7)
In the early 1980s, the shift in analytical perspective from thinking of a commercial as the fundamental unit of measurement to be rated in its entirety, to thinking of it as a structured flow of experience, gave rise to experimentation with moment-by-moment systems. The most popular of these was the dial-a-meter response, which required respondents to turn a meter, in degrees, toward one end of a scale or another to reflect their opinion of what was on screen at that moment.
The dial-a-meter has several limitations. First, unless it is calibrated by normalizing the data to each individual’s reaction time, the aggregate sample data will be spread across many measurement intervals. Second, dial-a-meters contain an uncertainty range around which moment is actually being measured because of differences in respondent response times. Third, relatively little has been published to validate dial-a-meter diagnostics against traditional measures of overall ad performance such as recall and persuasion.
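The first limitation can be illustrated with a small sketch. It assumes, purely for illustration, that each respondent’s dial trace is a list of readings taken once per second and that each respondent’s reaction lag is already known as a whole number of seconds; how such lags would be estimated in practice is not covered here, and the function names are hypothetical.

```python
def align_trace(trace, lag):
    """Drop the first `lag` readings so each remaining reading lines up
    with the on-screen moment that prompted it."""
    return trace[lag:]


def average_aligned(traces, lags):
    """Average the traces after shifting each by its respondent's lag,
    over the span that all shifted traces have in common."""
    aligned = [align_trace(t, lag) for t, lag in zip(traces, lags)]
    length = min(len(a) for a in aligned)
    return [sum(a[i] for a in aligned) / len(aligned) for i in range(length)]


# Invented example: both respondents react to the same on-screen moment,
# one of them a second later than the other.
traces = [
    [50, 50, 80, 50, 50],  # respondent with a 1-second lag
    [50, 50, 50, 80, 50],  # respondent with a 2-second lag
]
lags = [1, 2]

print(average_aligned(traces, [0, 0]))  # uncalibrated: peak smeared over two seconds
print(average_aligned(traces, lags))    # calibrated: a single sharp peak
```

In the uncalibrated average the peak is flattened and spread across two intervals, which is the smearing the paragraph above describes.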
In the 1990s, the Ameritest Picture Sorts shifted the frame of measurement from clock time (the dial-a-meter approach) to the “subjective time” of experience, which is tied to the rate of information flow in the film, or the ad’s visual complexity. Instead of providing a rating at fixed points in clock time, respondents rate a Picture Sort image only when the mood, message, or image changes significantly. The resulting data are clear, easy to understand, and visually appealing. (Young, p. 23) Examples of an Ameritest Flow of Emotion Graph can be seen in The Advertising Research Handbook. (Young, p. 202)
In addition, the dial-a-meter’s single-scale limitations are overcome with a set of moment-by-moment measures in three dimensions: Flow of Attention, which measures the memorability of each moment; Flow of Emotion, which measures the positive or negative emotional response to each moment; and Flow of Meaning, which measures how well the brand’s strategic values are being communicated in each moment.
The Future: Seven Trends
Chuck Young, author of The Advertising Research Handbook, offers his views on the trends that will shape the way we do business in the future. (Young pp.27-30)
- There will be an emergence of global research standards for global brands. Increasingly, multi-nationals are focusing on the need to build global brands, and for their brands to speak with one voice around the world. This calls for global advertising campaigns that will be increasingly visual in style. Providing both a standard way to measure advertising performance from one region to another, and the tools to identify how different cultural factors affect advertising response, will become more important for managing ad spending in the global marketplace.
- There will be more advertising measurement, not less. Advertising is becoming more expensive and the range of executional options becoming so diverse that more control over the process is being demanded by major clients today. Procurement departments, in particular, under the banner of accountability, are challenging advertising agencies and research companies to provide more proof of value to justify ad budgets. This will drive growth in this important sector of advertising research.
- Most copy testing will move to the Internet. In an age of rapid-response marketing, the emphasis is on speed of decision-making. The Internet is the obvious choice for shortening the time involved in the research step of the creative development cycle. Many suppliers have already begun migrating their advertising research to the web (for both television and print testing). Even measuring attention can already be done online with AttentionTracking. Economic pressure will probably force the majority of testing online in the near future.
- The new value proposition will be filtering plus optimization. For the foreseeable future, the cost of advertising executions will continue to go up. To manage that cost, managers will be increasingly interested in airing only their strongest ideas so that they don’t spend a large portion of their advertising budgets on average ideas. Ad managers will be looking for every opportunity to make executions work harder, and research systems will stand out in this growing category if they can validate the power of their diagnostics, providing proof that they actually help make ads more effective.
- Ad research will move beyond semantics, putting a new emphasis on “holistic” or 360-degree measurement of integrated advertising campaigns. Both the forces of globalization and the evolution of rich, multi-sensory media environments will continue to challenge every kind of execution, from the print ad to the Internet ad.
- Mathematical models of advertising ROI will begin to incorporate measures of creative quality. Currently, researchers working with marketing-mix models to determine advertising ROI do not usually include measures of creative quality. As a result, current mix models are biased toward media weight or spend. In the future, sophisticated modelers will start to include a “quality” variable in these models, particularly as new forms of tracking research begin to provide relative performance rankings of competitive ads.
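The idea of adding a “quality” variable to a mix model can be sketched very roughly as follows. This is an assumption-laden toy, not a description of any actual marketing-mix model: it supposes that creative quality enters multiplicatively, as an index (1.0 = average) that scales media weight, and it fits a single straight line by ordinary least squares. All names and figures are invented.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of ys = intercept + slope * xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    return mean_y - slope * mean_x, slope


def quality_adjusted_weight(media_weight, quality_index):
    """Scale each campaign's media weight by its creative quality index."""
    return [w * q for w, q in zip(media_weight, quality_index)]


# Invented data for four campaigns (illustrative assumptions, not results).
media_weight = [100, 200, 300, 400]    # e.g. rating points bought
quality_index = [1.0, 0.5, 1.5, 1.0]   # hypothetical pre-test score
sales = [300, 300, 1000, 900]

# Model driven by media weight alone versus quality-adjusted weight.
print(fit_line(media_weight, sales))
print(fit_line(quality_adjusted_weight(media_weight, quality_index), sales))
```

With these made-up numbers, the quality-adjusted regressor reproduces the sales figures exactly while raw media weight does not, which is only meant to show how a model that ignores creative quality can misattribute sales effects to spend.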
- aesthetic emotion
- brand linkage
- brand stretch
- branding moment
- copy sort
- day-after recall (DAR)
- Flow of Attention
- Flow of Emotion
- Flow of Meaning
- Picture Sorts
- program engagement
- selling-edge analysis
- semantic information
- stopping power
- http://www.ameritest.net/products/tv.php Ameritest TV Ad Model
- http://www.ameritest.net/products/tv.php Example of Ameritest Flow of Attention Graph
- http://www.copymetrics.com Copymetrics copy test: a new approach to testing the effectiveness of ads using cognitive sciences, evaluating the effect on Attention, Emotion and Memory.
Understanding Copy Pretesting (1994). Published by Advertising Research Foundation, NY.
Honomichl, J. J., Honomichl on Marketing Research, Lincolnwood, IL: NTC Business Books, 1986.
Kastenholz, J., Kerr, G., & Young, C., Focus and Fit: Advertising and Branding Join Forces to Create a Star. Marketing Research, Spring 2004, 16-21.
Krugman, H., Memory Without Recall, Exposure Without Perception. Journal of Advertising Research, July/August, 1977.
Lodish, L. M., Abraham, M., Kalmenson, S., Livelsberger, J., Lubetkin, B., Richardson, B., & Stevens, M. E., How TV Advertising Works: A Meta-Analysis of 389 Real World Split Cable TV Advertising Experiments. Journal of Marketing Research, May 1995, 125-139.
Ross, H., Recall vs. Persuasion: An Answer. Journal of Marketing Research, 1982, 22(1), 13-16.
Young, Charles E., The Advertising Research Handbook, Ideas in Flight, Seattle, WA, April 2005.
Wikimedia Foundation. 2010.