2003 and 2006 Data Correction Status

Update: 03/09/2011

Data tables reporting race/ethnicity and corresponding technical tables have been replaced with corrected versions in the 2003 and 2006 editions of the report Characteristics of Doctoral Scientists and Engineers in the United States, NSF 06-320 and NSF 09-317.

Update: 12/07/2010

Data files for the 2008 Survey of Doctorate Recipients (SDR) have been reviewed and verified. The first publication of these data will be through an InfoBrief, scheduled for release in early January, that contains employment tables, with employment status broken out by race/ethnicity. Preliminary 2008 SDR data and corrected 2006 SESTAT data will be included in the report Women, Minorities, and Persons with Disabilities: 2011, scheduled for release at the end of January.

The 2008 SESTAT integrated workforce file, which combines data from the 2008 National Survey of College Graduates, the 2008 National Survey of Recent College Graduates, and the 2008 SDR, is in production. It will be released for use upon completion.

Update: 11/09/2010

Corrected Public Use Data Files for the 2003 and the 2006 Survey of Doctorate Recipients (SDR) and the integrated Scientists and Engineers Statistical Data System (SESTAT) are now available. Corrected data are also available in the SESTAT Data Tool. We are correcting the affected tables in the previously published Detailed Statistical Tables reports, as well as the restricted data files provided to licensees. These updated products will be released as soon as they are available.

Update: 11/03/2010

The race and ethnicity data error in the 2003 and 2006 Survey of Doctorate Recipients has been corrected, and the data have been reviewed and validated. The tables and figure below show changes in counts of S&E doctorates and reporting of race/ethnicity for both survey cycles. Revised data for the 2003 and 2006 SDR and the integrated SESTAT file will soon be available in the SESTAT Data Tool (https://sestat.nsf.gov/sestat/sestat.html) and in the public use files. We are working to correct the restricted data files provided to licensees and the affected tables in the previously published Detailed Statistical Tables reports. These updated products will be released as soon as they are available.

Changes in Counts of S&E Doctorates by Race/Ethnicity, 2003 and 2006 SDR

                                          2003                                2006
                                Number  % of total    % change      Number  % of total    % change
                                                      (number)                            (number)
All doctorates
  Previously reported          685,300      100.00                 711,800      100.00
  Corrected                    685,300      100.00                 711,800      100.00
  Change                             0        0.00        0.00        0.00        0.00        0.00
Hispanic
  Previously reported           17,020        2.48                  19,760        2.78
  Corrected (any race)          17,590        2.57                  19,680        2.76
  Change                           570        0.08        3.35         -80       -0.01       -0.40
Non-Hispanic
  Previously reported          668,280       97.52                 692,040       97.22
  Corrected (any race)         667,700       97.43                 692,110       97.23
  Change                          -580       -0.08       -0.09          70        0.01        0.01
American Indian/Alaska Native
  Previously reported            4,470        0.65                   4,700        0.66
  Corrected (single race)        1,120        0.16                   1,430        0.20
  Change                        -3,350       -0.49      -74.94      -3,270       -0.46      -69.57
Asian
  Previously reported          108,150       15.78                 114,220       16.05
  Corrected (single race)      104,780       15.29                 111,360       15.64
  Change                        -3,370       -0.49       -3.12      -2,860       -0.40       -2.50
Black
  Previously reported           18,960        2.77                  20,310        2.85
  Corrected (single race)       17,240        2.52                  19,160        2.69
  Change                        -1,720       -0.25       -9.07      -1,150       -0.16       -5.66
Native Hawaiian/Other Pacific Islander
  Previously reported              720        0.11                     870        0.12
  Corrected (single race)          580        0.08                     730        0.10
  Change                          -140       -0.02      -19.44        -140       -0.02      -16.09
White
  Previously reported          535,600       78.16                 551,620       77.50
  Corrected (single race)      536,420       78.28                 552,080       77.56
  Change                           820        0.12        0.15         460        0.06        0.08
2 or more races (non-Hispanic)
  Previously reported              380        0.06                     320        0.04
  Corrected                      7,570        1.10                   7,350        1.03
  Change                         7,190        1.05     1892.11       7,030        0.99     2196.88
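
The "% change (number)" figures in the table above appear to be the change in count divided by the previously reported count. The following Python sketch reproduces two of the 2003 values under that assumption. The function name percent_change is illustrative and is not part of any SRS product.

```python
def percent_change(previous: int, corrected: int) -> float:
    """Percent change in a count, relative to the previously reported value."""
    return (corrected - previous) / previous * 100


# Counts taken from the 2003 columns of the table above.
ai_an_2003 = percent_change(previous=4_470, corrected=1_120)
multi_race_2003 = percent_change(previous=380, corrected=7_570)

print(f"American Indian/Alaska Native, 2003: {ai_an_2003:.2f}%")
print(f"2 or more races, 2003: {multi_race_2003:.2f}%")
```

Rounded to two decimals, these give -74.94 and 1892.11, matching the table. The very large percentage for the multi-race group reflects how small the previously reported count was, not an unusually large absolute change.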

Reporting of Race/Ethnicity by S&E Doctorates in 2003 and 2006 SDR

                                                               2003                2006
                                                           Number   Percent    Number   Percent
All doctorates                                            685,300    100.00   711,790    100.00
Hispanic (any race)                                        17,590      2.57    19,680      2.77
Non-Hispanic                                              667,700     97.43   692,110     97.23
  Single race                                             660,130     96.33   684,760     96.20
  2 races                                                   7,080      1.03     6,870      0.96
  3 or more races                                             490      0.07       480      0.07
American Indian/Alaska Native, all reporting                5,170    100.00     5,440    100.00
  Single race                                               1,120     21.65     1,430     26.25
  2 or more races                                           4,050     78.35     4,010     73.75
Asian, all reporting                                      107,800    100.00   114,160    100.00
  Single race                                             104,780     97.19   111,360     97.55
  2 or more races                                           3,020      2.81     2,800      2.45
Black, all reporting                                       18,360    100.00    20,240    100.00
  Single race                                              17,240     93.92    19,160     94.68
  2 or more races                                           1,120      6.08     1,080      5.32
Native Hawaiian/Other Pacific Islander, all reporting       1,250    100.00     1,430    100.00
  Single race                                                 580     46.21       730     50.99
  2 or more races                                             670     53.79       700     49.01
White, all reporting                                      543,300    100.00   558,850    100.00
  Single race                                             536,420     98.73   552,080     98.79
  2 or more races                                           6,880      1.27     6,760      1.21
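
One way to read the table above is that each "all reporting" row is the sum of respondents who reported that race alone and those who reported it together with another race. The Python sketch below checks that reading against the 2003 counts. The dictionary layout and the function name are illustrative assumptions, not an SRS data format.

```python
# (single race, 2 or more races, all reporting) for 2003, from the table above.
REPORTED_2003 = {
    "American Indian/Alaska Native": (1_120, 4_050, 5_170),
    "Asian": (104_780, 3_020, 107_800),
    "Black": (17_240, 1_120, 18_360),
    "Native Hawaiian/Other Pacific Islander": (580, 670, 1_250),
    "White": (536_420, 6_880, 543_300),
}


def components_add_up(single: int, multi: int, total: int) -> bool:
    """True when the single-race and multi-race counts sum to the row total."""
    return single + multi == total


for race, (single, multi, total) in REPORTED_2003.items():
    status = "consistent" if components_add_up(single, multi, total) else "check"
    print(f"{race}: {single:,} + {multi:,} vs {total:,} ({status})")
```

Because the published counts are rounded, a sum computed this way can differ slightly from a published total, and percentages recomputed from the rounded counts may not match the published percentages to the last digit.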

Update: 10/07/2010

SRS has conducted extensive reviews of the revised raw data files for 2003 and 2006 and we are now satisfied that the revised race and ethnicity data for those years are correct. The data have been subjected to a range of quality control checks and have passed those reviews. Next we will tabulate and populate revised tables for the 2003 and 2006 SDR Detailed Statistical Tables reports. The revised tables will also be subjected to detailed quality control checks.

Update: 10/01/2010

We have calculated and are now reviewing the revised 2003 data for accuracy. Once the 2003 data are validated, we will implement a similar process to correct the 2006 data, and will then update the 2003 and 2006 SDR data files and the 2003 and 2006 files for the overall S&E workforce (SESTAT). We then plan to re-issue the 2003 and 2006 SDR and SESTAT data files and make corrections to the affected products.

Initial Notice: 9/9/2010

SRS has uncovered a data problem related to individuals reporting more than one race in the Survey of Doctorate Recipients (SDR). This problem affects many previously released publications and products, significantly for some and less so for others, and SRS wants to alert the community and its data users to the issue as soon as possible. We apologize profusely for any inconvenience this problem may cause you.

The 2000 Census incorporated a new question for race that allowed an individual to check more than one category. For example, an individual could identify himself or herself as American Indian and white, or as Asian, black, and white. OMB required all federal surveys to implement a similar question quickly, and SRS began collecting data on multi-race individuals in its surveys between 2000 and 2003.

How to report the multi-race data (there are hundreds of possible combinations) has been a challenge for all agencies (and researchers using the data). OMB has provided some guidelines on how to report the data, but has left agencies substantial latitude in how to do so. SRS developed procedures to report race/ethnicity data for multi-race individuals, and these procedures have evolved over time. Initially, when there were relatively few such individuals, they were reported in an "other" category (which usually included those whose race/ethnicity was unknown and Native Hawaiians/Other Pacific Islanders, another very small group that was introduced with the 2000 Census). More recently, as the number of respondents reporting more than one race has increased, SRS has been able to report separately on the number of those reporting more than one race in some instances, such as in the Summary Report for the Survey of Earned Doctorates (SED) beginning with the 2007 data.

The SDR is a longitudinal sample survey; the SDR sample is drawn from the SED. For the 2003 survey, to ensure a sufficient sample of minorities in the SDR, SRS and its contractor, NORC, established procedures to maximize inclusion of minority individuals in the SDR sample. One of these procedures involved detailed specifications for assigning individuals in the sample frame who had reported more than one race in a previous survey to a single race category, for the purposes of sampling only, to maximize the sample for small race/ethnicity categories. The simplest way to describe what was done is with an example: if I designated myself in the SED as multi-race, composed of American Indian and white, then for sampling purposes the specifications we and the contractor developed assigned me to the non-Hispanic American Indian stratum.

SRS expected the detailed data files received from the contractor for the 2003 and 2006 SDRs to include all the race detail for each individual, so that data for multi-race individuals would indicate each of the races a respondent had reported. However, for most multi-race individuals on the file, this did not happen. Instead, the contractor provided data files that classified most such individuals using the sampling classification, meaning that non-Hispanics who reported more than one race are generally shown in the files as having reported only one race. To continue the example, my record in the data file should have indicated that I chose American Indian and white, but instead it showed only American Indian. In publications using the data I would be included in the American Indian category rather than the multi-race category. As a result, publications using the race data from the 2003 and 2006 SDRs have overestimated the number of blacks, Asians, and American Indians, and have underestimated the number of multi-race individuals.
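
The difference between the sampling classification and the classification needed for reporting can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the priority order in SAMPLING_PRIORITY is hypothetical (the actual sampling specifications are not given in this notice), the function names are invented for the example, and the respondent is assumed to be non-Hispanic.

```python
from typing import FrozenSet

MULTI_RACE = "2 or more races"

# Hypothetical priority order, smaller groups first. The notice does not
# state the actual sampling specification; this order is an assumption.
SAMPLING_PRIORITY = (
    "American Indian/Alaska Native",
    "Native Hawaiian/Other Pacific Islander",
    "Black",
    "Asian",
    "White",
)


def sampling_stratum(races: FrozenSet[str]) -> str:
    """Single race used for sampling only: first reported race in priority order."""
    for race in SAMPLING_PRIORITY:
        if race in races:
            return race
    raise ValueError("no recognized race reported")


def reporting_category(races: FrozenSet[str]) -> str:
    """Category for published tables: multi-race respondents form their own group."""
    if not races:
        raise ValueError("no race reported")
    if len(races) > 1:
        return MULTI_RACE
    return next(iter(races))


respondent = frozenset({"American Indian/Alaska Native", "White"})
print(sampling_stratum(respondent))    # classification the delivered files carried
print(reporting_category(respondent))  # classification the full race detail supports
```

For the respondent in the example, the first call yields the American Indian/Alaska Native stratum and the second yields the multi-race category. The error described above amounts to the delivered files carrying only the first value, so the second could not be derived from them.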

The magnitude of the problem is greatest for American Indians/Alaska Natives (AI/AN) as the following illustrates.

2001 SDR: 2,100 AI/AN doctorate recipients (when there was no multi-race data)
2003 SDR: 4,500 AI/AN doctorate recipients. This number should have been 1,500.
2006 SDR: 4,600 AI/AN doctorate recipients. This number should have been 1,600.

This error has substantial ramifications for a number of SRS products. The SDR is a component of the Scientists and Engineers Statistical Data System (SESTAT); thus 2003 and 2006 SESTAT data files are likewise affected. SDR and SESTAT data on variables other than race and ethnicity are not affected.

SRS is in the process of identifying which products have been affected. At a minimum, the error affects the following to varying degrees:

  • Chapters 3 and 5 of Science and Engineering Indicators for 2008 and 2010
  • Women, Minorities, and Persons with Disabilities in Science and Engineering, 2007 and 2009
  • Three SRS InfoBriefs
  • Public use and restricted data files provided to SRS licensees for the 2003 and 2006 SDR and SESTAT
  • Detailed Statistical Tables for 2003 and 2006 for SDR
  • Data provided to CEOSE
  • Data provided for one table to the U.S. Statistical Abstract

SRS has just begun to address the problem, but has developed a plan for moving forward. The following is a brief summary of the steps being put in place.

  1. The contractor will prepare new data files (this will not be completed until late September) and SRS will conduct a stringent quality control review of the new data files.
  2. SRS staff has assembled a list of all products affected by the data file error and the specific text, tables, and figures affected in each product.
  3. The problem will be addressed for every affected product, but the strategies for doing so will vary based on the magnitude of the problem, available resources, and the availability of alternative sources of the correct information. The options for addressing the problem range from a reissue of an entire publication to posting a notice on the SRS website pointing to a product or a publication where the correct data may be found.
  4. SRS has developed and begun implementing a communication plan to inform a broad range of data users and interested parties (both internal and external to NSF) of the problem. The plan will include keeping users and interested parties informed of progress in rectifying the problem as well as informing them when corrected data are available. A notice went up on the SRS website on August 30 and discussions with NSF's Committee on Equal Opportunity in Science and Engineering have taken place. Numerous data users and government officials are being contacted.
  5. As part of the outreach to interested parties, SRS will also solicit ideas for other possible products related to American Indians and multi-race individuals that could be produced. For instance, SRS will try to expedite release of InfoBriefs for the 2008 SDR and the 2008 National Survey of Recent College Graduates, potentially with a focus on race and ethnicity.
  6. SRS has developed an inventory of American Indian groups that need to be notified and solicited for input on new or revised products, and these groups have already been contacted. SRS is continuing to seek out other groups to notify of the data issue.

SRS will keep the community informed of progress in identifying the impact of the problem, and will notify data users of the corrective steps it plans and its implementation schedule. SRS is extremely distressed about this error and will institute additional procedures to ensure that such a problem does not occur in the future.

Lynda T. Carlson, Ph.D.
Director, Division of Science Resources Statistics
National Science Foundation

