- Students' Performance on National Assessments
- Students' Performance on International Assessments of Mathematics and Science

One of the central goals of educators and legislators is increasing overall student achievement, with a special focus on increasing learning by low performers. Concern also centers
on advancing U.S. performance in relation to that of other countries, especially in mathematics, science, and technical fields. The most commonly used tools for measuring changes
in achievement are standardized assessments. (The terms *achievement* and *performance* are used interchangeably in this section when discussing scores on these tests.)

This section is divided into two parts. The first examines trends in mathematics and science achievement among public and private school students in the United States, using two kinds of national data. Longitudinal data follow the same group of students over several years, allowing observers to track how individual students learn over time. In some cases, longitudinal test data may also be linked to teaching practices and other factors thought to influence achievement. New test data from the Early Childhood Longitudinal Study, Kindergarten Class of 1998–99 (ECLS-K), collected in 2007, allow study of performance changes among a kindergarten cohort through eighth grade and of changes over time in initial achievement gaps among groups of students.

Cross-sectional data, in contrast to longitudinal data, provide information on particular groups' performance measured at different points in time. The National Assessment of Educational Progress (NAEP) data presented in the first section, for example, examine performance of fourth and eighth graders who were sampled in various years between 1990 and 2007. These data indicate whether and how achievement is changing over time for comparable groups of students.
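The distinction between the two kinds of data can be illustrated with a minimal sketch. All scores, student labels, and years below are hypothetical values chosen for illustration; they are not ECLS-K or NAEP data, and the function names are assumptions of this example.

```python
from statistics import mean

# Hypothetical scores for illustration only; not ECLS-K or NAEP data.
# Longitudinal: the SAME students are tested at several grades.
longitudinal = {
    "student_a": {"K": 20, "grade_3": 95, "grade_8": 135},
    "student_b": {"K": 31, "grade_3": 104, "grade_8": 142},
}

# Cross-sectional: a DIFFERENT sample of the same grade is tested each year.
cross_sectional = {
    1990: [200, 210, 220],
    2007: [220, 230, 240],
}


def individual_gains(panel, start, end):
    """Per-student growth, which only longitudinal data can provide."""
    return {sid: scores[end] - scores[start] for sid, scores in panel.items()}


def cohort_means(samples):
    """Average score per assessment year for comparable groups."""
    return {year: mean(scores) for year, scores in samples.items()}


gains = individual_gains(longitudinal, "K", "grade_8")
means = cohort_means(cross_sectional)
print(gains)
print(means)
```

The longitudinal structure supports statements about how individual students grow. The cross-sectional structure supports only statements about how comparable groups differ from one assessment year to another.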

The second part of this section compares student achievement in the United States with that in other countries. The latest Trends in International Mathematics and Science Study (TIMSS:2007) allows comparisons of U.S. fourth and eighth graders with their counterparts in other countries. The Program for International Student Assessment (PISA:2006) provides test score data for 15-year-olds in the same subjects. These international assessments are both cross-sectional studies.

**Mathematics and Science Performance as Students Progress Through Elementary and Middle Grades**

Over 9 school years, ECLS-K followed a group of students who first entered
kindergarten in fall 1998. (The
mathematics and science education of students who are
homeschooled is not addressed in this chapter; see sidebar
"Homeschooling in the United States.") The study concluded
in spring 2007, when most students were in eighth grade.^{[2]}
The sample used in this analysis included roughly 8,000 students.
ECLS-K is unusual among major national and international
data collections not only in its focus on the earlier
years of schooling but also because it allows researchers to
examine students' performance in light of variables likely
to influence learning. Cognitive tests measured students'
mathematics knowledge in kindergarten and grades 1, 3, 5,
and 8 and tracked their science understanding in grades 3,
5, and 8. The study also collected demographic and family
information from a parent and surveyed teachers and schools
for information about school environments, teacher qualifications,
and classroom practices.

**Gains in Mathematics Test Scores and Gap Changes.** Students begin kindergarten with differing levels of mathematics
skills, and researchers have suggested several factors that may be related to these initial gaps. A body of research
has focused in particular on initial gaps between white and
black children. The early home environment, including how well parents prepare children for school (e.g., time spent
reading to them), plays a role (Magnuson, Rosenbaum, and
Waldfogel 2008; Jencks and Phillips 1998). Other reasons
posited include income and education differences among
parents (Magnuson, Rosenbaum, and Waldfogel 2008;
Campbell et al. 2008), school segregation (Vigdor and Ludwig
2008), access to effective and well-trained teachers
(Corcoran and Evans 2008), ability to listen and concentrate,
and children's fine motor skills, which need to reach a certain
level of development for young children to learn to write
and draw (Grissmer and Eiseman 2008).

Students' mathematics achievement was measured on a
single scale ranging from 0 to 174 throughout the study, allowing
the tracking of achievement growth and comparisons
between groups as children progressed through elementary
and middle grades. The 1998–99 kindergarten cohort started
school with an average mathematics score of 26 and gained
113 points by the spring of eighth grade, to 139 (table

For most characteristics, gaps widened during the early
years of school (when the overall score changes were greater)
and then stabilized or even narrowed slightly starting at
grade 3 or 5, when the rate of overall growth also declined.
Students' relative achievement when starting school had an
influence on growth and eventual grade 8 scores, shown by
the trajectories of those scoring in the lowest, middle two,
and highest quartiles in kindergarten (figure

In another example, white children scored 29 on the test given in the fall of their kindergarten year and Asian children scored 30, compared with 22 for both black and Hispanic children. The gaps between white and black students and between Hispanic and Asian students reached a certain point and then stabilized after grade 3.
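The initial gaps implied by these fall-kindergarten scores can be worked out directly. The sketch below uses only the four scores quoted above; the function name and the convention of subtracting the comparison group from the reference group are assumptions of this example.

```python
# Fall-kindergarten mathematics scores quoted in the paragraph above.
fall_k_scores = {"white": 29, "Asian": 30, "black": 22, "Hispanic": 22}


def score_gap(scores, group, reference):
    """Reference-group score minus comparison-group score, in scale points."""
    return scores[reference] - scores[group]


white_black_gap = score_gap(fall_k_scores, "black", "white")
white_hispanic_gap = score_gap(fall_k_scores, "Hispanic", "white")
asian_hispanic_gap = score_gap(fall_k_scores, "Hispanic", "Asian")
print(white_black_gap, white_hispanic_gap, asian_hispanic_gap)
```

These are gaps in average scale scores at school entry only; the sketch says nothing about how the gaps changed in later grades.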

Gaps based on a few characteristics narrowed a little in
later grades: English proficiency in kindergarten, primary
language spoken at home, and the white–Hispanic gap. See appendix table

**Proficiency in Different Skill Areas.** The ECLS-K test
data also indicate whether students were proficient in nine
mathematics skill areas. (The skills are arranged in a hierarchy
such that proficiency in a given area presumes proficiency
in the areas below it. See sidebar "Mathematics Skills
Areas Assessed" for definitions.) By eighth grade, nearly all
students were proficient in ordinality and sequence, addition
and subtraction, and multiplication and division (appendix table

Substantial differences among groups appeared in the three
highest skill areas—rate and measurement, fractions, and area
and volume—and differences grew as the difficulty level increased.
For example, 63% of students whose mothers had
a bachelor's degree were proficient in fractions, compared
with 16% of students whose mothers had not completed high
school (figure

Differences by initial math skills in kindergarten were
also considerable for the *highest* skill area in which students
had reached proficiency by eighth grade (table

Some early low achievers in kindergarten did reach proficiency in high skill areas, however: 24% achieved proficiency with rate and measurement, 7% with fractions, and 2% with area and volume, the highest skill area assessed. Thus, although most initial low-scoring students progressed relatively slowly, some managed to overcome obstacles they had at school entry.
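The idea of a skill hierarchy, in which proficiency in one area presumes proficiency in the areas below it, can be sketched as follows. This is an illustration of the concept rather than the study's scoring procedure. Only the six skill areas named in this section are listed, the relative order of the three lower areas is assumed, and the student record is hypothetical.

```python
# Six of the nine skill areas are named in this section, listed lowest to highest.
# The relative order of the three lower areas is an assumption for this sketch.
SKILL_AREAS = [
    "ordinality and sequence",
    "addition and subtraction",
    "multiplication and division",
    "rate and measurement",
    "fractions",
    "area and volume",
]


def highest_area_reached(proficient):
    """Return the highest area reached, stopping at the first area not mastered.

    Because proficiency in an area presumes proficiency in those below it,
    areas above the first gap are not credited.
    """
    highest = None
    for area in SKILL_AREAS:
        if not proficient.get(area, False):
            break
        highest = area
    return highest


# Hypothetical student record, not taken from ECLS-K data.
student = {
    "ordinality and sequence": True,
    "addition and subtraction": True,
    "multiplication and division": True,
    "rate and measurement": True,
    "fractions": False,
    "area and volume": False,
}
print(highest_area_reached(student))
```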

High mathematics scores in kindergarten were also
strong predictors of proficiency with higher-level mathematical
concepts in eighth grade. By grade 8, 37% of those who scored in the highest quartile in kindergarten had achieved
proficiency in all of the skill areas shown in table

**Gains in Science Test Scores and Gap Changes.** ECLS-K
science assessments were given in grades 3, 5, and 8 and,
as with mathematics, were measured on a single scale, in
this case from 0 to 111. The average science score in grade
3 was 51 points, increasing to 83 by grade 8. In general,
growth patterns were similar to those found with mathematics
over these higher grades: few changes in gap size, and
those changes that did occur were minimal (appendix table

**Trends in Mathematics and Science Performance in Grades 4 and 8 Through 2007**

NAEP includes two assessment programs. The *national* (or *main*) NAEP assesses national samples of 4th and 8th
grade students at regular intervals and 12th grade students
occasionally. These assessments are updated periodically
to reflect contemporary standards of what students should
know and be able to do in various subjects, including science
and mathematics. Student achievement measured by NAEP
is documented in an ongoing series of reports, *The Nation's Report Card*, that began in 1969. A second testing program,
the NAEP *Long-Term Trend* (LTT), is based on nationally
representative samples of 9-, 13-, and 17-year-olds.
The mathematics content framework for NAEP LTT has remained
the same since it was first given in 1973, permitting
analyses of trends over more than three decades.

This section briefly summarizes NAEP science trends, reported in detail in *Science and Engineering Indicators
2008* (NSB 2008), and then focuses on the new mathematics
score data for fourth and eighth graders in 2007 and on
trends in these scores from 1990 to 2007. New data are
available neither for 12th grade mathematics nor for science in
any grade.^{[3]} The NAEP LTT scores in mathematics are also
updated through 2008 for three age groups.

NAEP rates students' performance in two ways: average scale scores and the percentage reaching various achievement levels. Scale scores place students along a continuous scale based on their overall performance on the assessment. A single mathematics scale of 0 to 500 points covers both grades 4 and 8. See sidebar "Development and Content of NAEP Mathematics Assessments" for further information on the assessments' content and design. The NAEP website has a searchable database of released NAEP test items (http://nces.ed.gov/nationsreportcard/itmrls).
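The two reporting measures can be contrasted with a small sketch. The cut scores and student scores below are hypothetical placeholders on a 0 to 500 scale, not NAEP's actual achievement-level cut points or results, and the simple unweighted calculation is an assumption of this example; it ignores the sampling weights a survey assessment would apply.

```python
from statistics import mean

# Hypothetical cut scores and student scores on a 0-500 scale, illustration only.
CUT_SCORES = {"basic": 210, "proficient": 250, "advanced": 280}
scores = [195, 220, 236, 251, 263, 290]


def average_scale_score(values):
    """Measure 1: the mean position of students on the continuous scale."""
    return mean(values)


def percent_at_or_above(values, cut):
    """Measure 2: share of students at or above a cut score, as a percentage."""
    return 100 * sum(v >= cut for v in values) / len(values)


avg = average_scale_score(scores)
levels = {name: percent_at_or_above(scores, cut) for name, cut in CUT_SCORES.items()}
print(avg)
print(levels)
```

The average summarizes the whole distribution in one number, while the achievement-level percentages show how many students clear each threshold, so the two measures can move differently over time.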

**Science Performance.** No new NAEP science data are
available for any grade; a science assessment was conducted
in early 2009 and data will be available in early 2010,
too late for inclusion in this volume. As reported in *Science
and Engineering Indicators 2008* (NSB 2008), average
NAEP science scores increased for 4th graders, held steady
for 8th graders, and declined for 12th graders between 1996 and 2005 (NCES 2006a). Rising scores among lower-performing
and average fourth graders were the primary
drivers of the increase. The proportion of students reaching
the *proficient* level for their grade in science held steady at
grades 4 and 8 and declined slightly at grade 12. Proficiency
rates were lower among 12th graders than among students
in the lower grades.

**Mathematics Performance of Fourth and Eighth Graders.** The upward achievement trends that occurred
through 2005 on the NAEP fourth and eighth grade mathematics
tests continued with the 2007 tests. Between 1990
and 2007, the average mathematics score for fourth graders
rose from 213 to 240, and for eighth graders from 263 to 281
(appendix table

At both grade levels, students' scores increased in each of
the five content areas tested (number sense, properties, and
operations; measurement; geometry and spatial sense; data
analysis, statistics, and probability; and algebra and functions)
(Lee, Grigg, and Dion 2007). Performance also improved
across the achievement distribution in both grades,
with scores at five selected percentiles of the score distribution
(10th, 25th, 50th, 75th, and 90th) all increasing consistently
over these years (figure

Achievement trends for nearly all demographic groups
reflected the same upward movement (table

The scores of fourth graders in each racial/ethnic group
with 1990–2007 data available rose consistently over those
17 years. Black fourth graders had the largest score increase,
at 34 points (figure

NAEP 2009 results, released as this volume was going to press, show that the upward trend in fourth grade mathematics scores has halted, that mathematics scores of eighth graders have continued to improve, and that score gaps among racial/ethnic groups are unchanged (NCES 2009a).

**Gaps in Mathematics Performance.** In most years,
boys had marginally higher mathematics scores than girls,
and these gaps remained about equal over the 17-year period
(appendix table

Most gaps among racial/ethnic groups that existed in 1990 remained in 2007, but some have narrowed, especially in recent years. The average score gap between white and black fourth graders decreased from 32 to 26 scale points between 1990 and 2007. Among eighth graders, the gap increased from 1990 to 2000 but then decreased from 2000 to 2007. Similarly, the gaps between white and Hispanic students in both grades narrowed from 2000 to 2007.

For fourth graders, score gaps related to family income, as indicated by student eligibility for subsidized lunches, also shrank between 1996 (the first year available) and 2007, as well as between 2000 and 2007. For eighth graders, the gap between low-income and other students was about the same in 1996 and 2007, with some fluctuations in between, including a decrease from 2000 to 2007.

Achievement is also measured in a different way from
the scale scores discussed above: the percentages of students
scoring at or above the *basic* and *proficient* levels and reaching the *advanced* proficiency level set by the NAEP
governing board. Students also improved steadily from
1990 to 2007 on this measure (figure *advanced* level (Lee, Grigg, and Dion 2007).

**Long-Term Trends in Mathematics Performance.** The
NAEP Long-Term Trend assessment program has tested
students ages 9, 13, and 17 in mathematics for more than
three decades. LTT assessments differ from the main NAEP
assessment, whose frameworks and tests are revised over
time to follow changes in common curriculum at targeted
grade levels, in that the LTT assessment for each grade level
has tested the same knowledge and skills over time.^{[4]}

Since this testing program began, 9- and 13-year-olds
raised their scores, while 17-year-olds' scores were essentially
flat, with no difference between the first test score
(304) in 1973 and the most recent (306) in 2008 (appendix table

In each age group, black students gained more points than white students over the earlier part of the period, narrowing the gaps with whites. The gap between blacks and whites for 9-year-olds narrowed from 35 points in 1973 to 26 in 2008. For 13-year-olds, the gap decreased substantially, from 46 to 28 points. For both of the younger age groups, this narrowing occurred mainly through 1986; after that, both racial groups increased their scores at roughly similar rates. Among 17-year-olds, the 1973 gap between blacks and whites of 40 points decreased to 26 points in 2008, with the smallest gap appearing in 1990.
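The amount of narrowing in each age group follows from the gap figures quoted above. The sketch below simply subtracts the 2008 gap from the 1973 gap; the variable and function names are assumptions of this example.

```python
# White-black score gaps (in points) on the NAEP LTT mathematics assessment,
# as quoted in the paragraph above: age -> (gap in 1973, gap in 2008).
gaps = {9: (35, 26), 13: (46, 28), 17: (40, 26)}


def narrowing(gap_start, gap_end):
    """Points by which a gap shrank between the two assessment years."""
    return gap_start - gap_end


change = {age: narrowing(*pair) for age, pair in gaps.items()}
print(change)
```

By this measure the gap narrowed most among 13-year-olds. The calculation compares only the endpoints and does not capture when the narrowing occurred.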

Hispanic students at all three ages gained more points over time than did whites on the mathematics assessments, particularly 13- and 17-year-olds. The score gaps with their white peers thus appeared to decrease, but none of those changes was significant, in part due to relatively small Hispanic sample sizes in some years.

Parents' educational attainment, a measure of socioeconomic status, was collected from 13- and 17-year-olds. At all levels of parental education, 13-year-olds' achievement increased over the 35 years, while 17-year-olds' performance improved only among students whose parents had not finished high school.

Two recent assessments place U.S. student achievement in mathematics and science in an international context: the Trends in International Mathematics and Science Study and the Program for International Student Assessment. TIMSS and PISA differ in several fundamental ways; see sidebar "Differences Between TIMSS and PISA Assessments." Reports on TIMSS and PISA test results typically compare U.S. performance with that of all participating countries or with that of all members of the Group of Eight (G-8) or Organisation for Economic Co-operation and Development (OECD) (Gonzales et al. 2008; Miller et al. 2009; Gonzales et al. 2004; Baldi et al. 2007). The differences in the characteristics of countries that participate in these two studies, however, confound comparisons between the United States' relative standing on the two assessments.

This section compares U.S. performance to that of a subset of nations that either have advanced economies that compete globally in fields related to science, technology, engineering, and mathematics (STEM) or have developing economies with rapidly growing capabilities in these areas. Most of the selected countries were included because of their current capabilities in science and technology. A few Asian countries that are seeking to develop such capacity were also included to highlight student performance in these highly dynamic countries. (This geographic focus is maintained where possible in the international sections of other chapters.) Not all of the 28 selected nations participated in each assessment, so the number available for comparison with the United States differs by test. Scores for all participating nations are shown in appendix tables.

Results from the two assessments are contradictory: U.S. average scores on TIMSS tend to place the United States around the middle of the group of selected nations, and in mathematics, the United States improved over time. In contrast, U.S. scores on PISA were generally near the bottom of the group, and the U.S. standing relative to other nations declined in both mathematics and science. Some of these performance differences may be explained by the differences in the tests and which countries participate (see sidebars "Differences Between TIMSS and PISA Assessments" and "Sample Items From TIMSS and PISA Assessments").

**Mathematics Performance of U.S. Fourth and Eighth Graders on TIMSS**

The fourth grade TIMSS mathematics exam covers three
content areas: number, geometric shapes and measures,
and data display. The eighth grade assessment addresses
four content domains: number, algebra, geometry, and data
and chance.

**Performance Trends.** Over the 12 years since the first
TIMSS mathematics assessments in 1995, U.S. fourth and
eighth graders raised their scores and international ranking
(Gonzales et al. 2008). The fourth grade average of 529 in
2007 was 11 points higher than in 1995. For eighth graders,
the U.S. average of 508 in 2007 reflected a 16-point rise over
1995's score (figure

Not only did U.S. fourth graders' mathematics scores
increase, but the U.S. position relative to selected other nations
also shifted upward from 1995 to 2007. Of the selected
nations whose fourth graders participated in both the 1995 and 2007 TIMSS, four outscored the United States in 1995,
compared with three in 2007 (figure ^{[5]}
U.S. eighth graders also gained ground over time, outperforming no foreign peers in 1995 but two in 2007. Students from eight of the
selected nations outscored U.S. eighth graders in 1995, compared
with four in 2007.

**Performance on the 2007 TIMSS Mathematics Tests.** The fourth grade tests focused on three content domains:
number, geometric shapes and measures, and data display
(about half the assessment emphasized the number domain,
including introductory algebra). For eighth grade,
the four content domains were number, algebra, geometry,
and data and chance. The cognitive domains addressed in
TIMSS are the same for both grades—knowing, applying,
and reasoning.

U.S. fourth graders' average score on the 2007 TIMSS
mathematics assessment (529) was just below the combined
average for 14 selected nations (534)
(table

The U.S. eighth grade average mathematics score of 508
was also below the combined average (514) for 16 selected
nations and below 5 nations' individual averages (table

Although U.S. students as a whole did not lead the world
in TIMSS mathematics, two U.S. states that participated individually
(Massachusetts and Minnesota) provide examples of
high performance (see sidebar "Two States' Performance on
TIMSS: 2007"). Scores at the 90th percentile present another
way to examine high-achieving students (those who scored
higher than 90% of all test takers). In mathematics, the 90th
percentile score for U.S. fourth graders was 625, lower than
that of six other nations (table

**Science Performance of U.S. Fourth and Eighth Graders: TIMSS**

**Performance Trends.** In contrast to the mathematics
trends, which showed improvement in both grades, the average
scores of U.S. students on the TIMSS science assessment
have remained flat since 1995. Fourth graders have
lost ground internationally, whereas eighth graders slightly
improved their position relative to other nations (Gonzales
et al. 2008). At fourth grade, the United States outperformed
six of seven selected nations in 1995 but only two of them
in 2007. In addition, the single comparison nation that did
better than the United States in 1995 (Japan) was joined by
Singapore and Hong Kong in 2007.

The trend in U.S. standing of eighth graders was slightly upward: nations scoring higher than the United States on the science assessment dropped from eight in 1995 to six in 2007. In addition, the United States had not outperformed any of the 10 other nations in 1995 but outscored 2 of them in 2007 (Sweden and Norway).

**Performance on the 2007 TIMSS Science Tests.** The
fourth grade science tests focused on three content areas
(life, physical, and earth sciences) and on three main skills
(knowing, applying, and reasoning). At eighth grade, content
areas expanded to four: biology, chemistry, physics, and
earth sciences. The cognitive domains underlying test development
were the same for both grades: knowing, applying,
and reasoning. The fourth grade tests emphasize knowing
more than the eighth grade tests, while reasoning is a greater
focus in eighth grade.

On the 2007 TIMSS science test for fourth graders, four
of the comparison nations scored higher and six scored
lower than the United States, putting the United States just
above the middle of the group (table

The U.S. 90th percentile score for fourth graders was 643, ranking lower than in 2 other nations and higher than in 8, or above the midpoint for these 15 nations (Gonzales et al. 2008). The difference between Singapore (whose fourth graders led all countries) and the United States at the 90th percentile was 58 points. In eighth grade, U.S. students at the 90th percentile in science scored roughly in the middle of the group—lower than in six other nations and higher than in five. See sidebar "Linking NAEP and TIMSS Results."
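A 90th percentile score is the score that roughly 90% of test takers fall below. The sketch below shows one common way to compute it, using Python's standard `statistics.quantiles`. The list of scores is hypothetical, and the interpolation rule is that function's default rather than the method used by TIMSS. The final lines apply the 58-point difference quoted above.

```python
from statistics import quantiles


def percentile_90(scores):
    """Score below which roughly 90% of test takers fall."""
    # quantiles(..., n=10) returns nine decile cut points; the last is the 90th percentile.
    return quantiles(scores, n=10)[-1]


# Hypothetical scores for illustration only: 1, 2, ..., 100.
scores = list(range(1, 101))
p90 = percentile_90(scores)
share_below = sum(s < p90 for s in scores) / len(scores)
print(p90, share_below)

# Figures quoted above: U.S. fourth grade science 90th percentile and the
# reported difference from Singapore, whose fourth graders led all countries.
us_p90 = 643
singapore_p90 = us_p90 + 58  # implied by the two quoted figures
print(singapore_p90)
```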

**Mathematics Performance of U.S. 15-Year-Olds: PISA**

**Performance Trends.** In contrast to the TIMSS results,
U.S. 15-year-olds' performance consistently dropped on the
PISA tests of mathematical and scientific literacy in relation
to student performance in other nations. The U.S. mathematics
average of 474 in 2006 is 19 points lower than in
2000, when the first PISA exams were given, but changes in
the tests mean that the scores cannot be directly compared
(OECD 2001; Baldi et al. 2007). While the United States
scored below 7 nations in 2000, it scored below 15 nations
in 2006 (of 19 nations with data available for both years).

**Performance on the 2006 PISA Mathematics Test.**
PISA assesses 15-year-old students in all OECD nations
and a range of other nations every 3 years on literacy in
mathematics, science, and reading. The mathematics test
covers four content areas: space and shape, change and relationships,
quantity, and uncertainty. A main mathematics
skill tested is problem solving (explored in greatest depth in
2003, when math was PISA's main focus). Sjøberg (2007)
and Goldstein (2004) discuss PISA's content, including
challenges and critiques.

On the most recent PISA tests, the U.S. score was 474,
below 18 of the selected nations' scores (table

The U.S. score at the 90th percentile in mathematics was 593, lower than that in 18 other nations that participated in the PISA exam and higher than in another 3 nations (Thailand, Indonesia, and Brazil) (Baldi et al. 2007). None of the OECD member nations had a lower 90th percentile score than the United States.

**Science Performance of U.S. 15-Year-Olds: PISA**

**Performance Trends.** The U.S. rank among selected nations
declined on the PISA scientific literacy test, as on the
mathematics assessment. In 2000, the United States scored
below 6 other selected nations (out of 19 participating in both
years), but in 2006, that number doubled to 12 (figure ^{[6]}

**Performance on the 2006 PISA Science Test.** To measure
scientific literacy, PISA includes three skill areas:
identifying and understanding scientific issues, explaining
phenomena scientifically, and using scientific evidence. Students
were tested on their grasp of essential scientific concepts
and theories in four content areas: physical systems,
living systems, earth and space systems, and technology
systems. Test items probed whether students understood
how scientists obtain evidence (scientific means of inquiry)
and how scientists use data. The test scores range from 1
to 1,000, and the mean for the 2006 science test was set at
500. The score scale is divided into six distinct proficiency
levels that measure competence in science concepts and
reasoning; each proficiency level encompasses roughly 75
points (OECD 2007). To put score differences in context,
the average gain from one grade to the next was 38 points,
or roughly half a full proficiency level. (This one-grade gain
was measured using data from nations with sufficient numbers
of 15-year-olds in two consecutive grades.)
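The two yardsticks given above, roughly 75 points per proficiency level and 38 points per grade, allow a score difference to be restated in more intuitive units. The sketch below does that conversion. The 57-point difference is a hypothetical example, and the function names are assumptions of this illustration.

```python
POINTS_PER_LEVEL = 75  # approximate width of one PISA science proficiency level
POINTS_PER_GRADE = 38  # average gain from one grade to the next


def in_levels(score_diff):
    """Express a score difference as a fraction of a proficiency level."""
    return score_diff / POINTS_PER_LEVEL


def in_grades(score_diff):
    """Express a score difference in grade-to-grade gains."""
    return score_diff / POINTS_PER_GRADE


# One grade's gain is about half of a proficiency level (38 / 75).
one_grade_in_levels = in_levels(POINTS_PER_GRADE)
print(round(one_grade_in_levels, 2))

# Hypothetical 57-point difference between two groups, for illustration only.
example_diff = 57
print(in_grades(example_diff), round(in_levels(example_diff), 2))
```

Both conversions are rough, since the level width and the per-grade gain are themselves approximate averages.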

The science literacy performance of U.S. 15-year-olds in
2006 placed the United States below 15 of 24 other nations
and above 4, far below the midpoint (table

The U.S. 90th percentile score in scientific literacy was 628, below the corresponding score in 10 of the 24 nations with data, but above it in 9, putting U.S. top-scoring students just below the middle of the 90th percentile science score distribution for these selected nations. Thus, U.S. high achievers in science placed in a better position relative to other countries than did U.S. students on average.