

Rising Scores on Intelligence Tests

Test scores are certainly going up all over the world, but whether intelligence itself has risen remains controversial

Ulric Neisser

The ABCs of Intelligence

Because there are many different forms of mental ability, there are also many different kinds of tests and test items. Some are verbal and others are visual in format. Some tests consist only of abstract-reasoning problems, and others focus on such special competencies as arithmetic, spatial imagery, reading, vocabulary, memory or general knowledge. The broad-spectrum tests, which establish actual IQ scores, typically include a wide variety of items. Before considering these general instruments, however, we must take a brief look at the relations among different specialized tests and how those relations are traditionally interpreted.

The degree to which any two tests measure something in common can be indexed by their correlation r, which in principle ranges from -1 to +1. A positive r means that individuals who score high on one test also tend to score high on the other; a negative r, which rarely occurs in this context, means that high scores on one test go with low scores on the other. When the same group of individuals takes a number of different tests, one can compute an r for each pair of tests considered separately, and the result is a correlation matrix. For intelligence tests, the correlation matrix tends to consist of r's that are all positive, but well below 1.00.
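The computation described above can be sketched in a few lines of Python. The scores below are invented purely for illustration; the point is only the mechanics of forming a correlation matrix from several tests taken by the same group.

```python
import numpy as np

# Hypothetical raw scores for five individuals (rows) on three tests
# (columns): vocabulary, arithmetic, and matrix reasoning.
# All values are invented for illustration.
scores = np.array([
    [12,  9, 31],
    [15, 14, 33],
    [10,  8, 27],
    [18, 11, 36],
    [14, 10, 30],
])

# np.corrcoef treats rows as variables, so transpose: the result is the
# 3x3 matrix of pairwise r values, with 1.0 on the diagonal.
r_matrix = np.corrcoef(scores.T)
print(np.round(r_matrix, 2))
```

As the article notes, for mental tests the off-diagonal r's in such a matrix tend to come out positive but well below 1.00, which is the pattern this toy data reproduces.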

Early in this century, the British psychologist Charles Spearman made the first formal factor analyses of such correlation matrices. He concluded that a single common factor accounted for the positive correlations among tests—a notion still accepted in principle by many psychometricians. Spearman christened it g for "general factor." In any test battery, the test that best measures g is—by definition—the one that has the highest correlations with all the others. The fact that most of these g-loaded tests typically involve some form of abstract reasoning led Spearman and his successors to regard g as the real and perhaps genetically determined essence of intelligence.
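Spearman's idea can be illustrated with a simplified, principal-components-style sketch (real factor-analytic methods such as principal-axis factoring are more involved): the leading eigenvector of the correlation matrix gives each test's loading on a single common factor, and the test with the largest loading is, in this simplified sense, the battery's best measure of g. The correlation matrix below is hypothetical.

```python
import numpy as np

# A hypothetical correlation matrix for three tests (values invented).
# Test 1 correlates most highly with the others.
R = np.array([
    [1.00, 0.66, 0.55],
    [0.66, 1.00, 0.48],
    [0.55, 0.48, 1.00],
])

# Simplified one-factor extraction: the eigenvector for the largest
# eigenvalue, scaled by the square root of that eigenvalue, approximates
# each test's loading on the common factor.
eigvals, eigvecs = np.linalg.eigh(R)      # eigenvalues in ascending order
leading = np.abs(eigvecs[:, -1])           # sign of an eigenvector is arbitrary
loadings = leading * np.sqrt(eigvals[-1])
print(np.round(loadings, 2))
```

In this toy battery the first test ends up with the highest loading, mirroring Spearman's definitional point: the best measure of g is the test with the highest correlations with all the others.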

Although that view remains widely held, it is not a necessary conclusion. Other factor analyses of such data are possible and have been proposed. Today, some psychometricians regard g as little more than a statistical artifact, whereas others seem even more convinced than Spearman himself that it reflects a basic property of the brain. Whatever g may be, at least we know how to measure it. The accepted best measure is a (usually untimed) test of visual reasoning called Raven's Progressive Matrices, which was first published in 1938 by Spearman's student John C. Raven and is now available in several different levels of difficulty. As we shall see, Raven's test plays a central role in recent analyses of the worldwide rise in test scores.

In contrast to specialized instruments like the Raven, the tests most widely used in America include a wide variety of different items and subtests. The best known of these "IQ tests" are the Stanford-Binet and the various Wechsler scales. The Wechsler Intelligence Scale for Children (WISC), for example, has five "verbal" subtests (information, comprehension, arithmetic, vocabulary and explaining similarities) and five "performance" subtests in which a child must copy designs using patterned blocks, put several related pictures in their proper order and so on. A child's scores on these subtests are added up, and the tester converts the total to an IQ by noting where it falls in the established distribution of WISC scores for the appropriate age.

That distribution itself—the crucial reference for assigning IQ scores—is simply the empirical result that was obtained when the test was initially standardized. By convention, the mean of each age group in the standardization sample defines an IQ score of 100; by further convention, the standard deviation of the sample defines 15 IQ points. Given appropriate sampling and a normal distribution, this implies that about two-thirds of the population in any given age group will have IQs between 85 and 115.
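The two conventions just described amount to a simple linear transformation of raw scores, and the "two-thirds" figure follows from the normal distribution. In the sketch below, the standardization sample's mean and standard deviation are invented numbers used only to show the arithmetic.

```python
from statistics import NormalDist

# Hypothetical standardization statistics for one age group (invented):
sample_mean, sample_sd = 42.0, 8.0

def raw_to_iq(raw):
    """Map a raw score to IQ: the sample mean becomes 100,
    and each standard deviation is worth 15 IQ points."""
    return 100 + 15 * (raw - sample_mean) / sample_sd

print(raw_to_iq(42.0))  # the mean raw score -> IQ 100
print(raw_to_iq(50.0))  # one SD above the mean -> IQ 115

# Fraction of a normal population falling between IQ 85 and IQ 115,
# i.e. within one standard deviation of the mean:
frac = NormalDist(mu=100, sigma=15).cdf(115) - NormalDist(mu=100, sigma=15).cdf(85)
print(round(frac, 3))  # about 0.683 -- roughly two-thirds
```

This is why, given appropriate sampling and a normal distribution, about two-thirds of each age group lands between IQ 85 and 115.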

IQ defined in this way reflects relative standing in an age group, not absolute achievement. The mean-scoring eight-year-old attains a higher raw score on the WISC than the mean-scoring seven-year-old, but both have IQs of 100 because they are at the middle of their distributions. So in one sense (as measured by raw scores), a normal child becomes systematically more intelligent with age; in another sense, his or her intelligence remains relatively stable. Although raw scores rise systematically throughout the school years, IQs themselves rarely change much after age 5 or 6.
