Using the American Community Survey as the Sampling Frame for the National Survey of College Graduates

Background on NSF's Workforce Surveys


The target population for the NSF/NCSES SESTAT system of surveys is noninstitutionalized individuals living in the United States who are 75 years of age or younger and are considered to be scientists and engineers. NCSES currently defines scientists and engineers as those who hold a bachelor's degree or higher in an S&E or S&E-related field OR who hold a bachelor's degree or higher in a non-S&E field but work in an S&E or S&E-related occupation.[2]
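As an illustration only, the definition above can be written as a simple eligibility check. The sketch below is not NCSES code: the record layout, the field names, and the group labels are assumptions made for the example, and each person is reduced to a single degree-field group and a single occupation group.

```python
from dataclasses import dataclass

# Groupings that qualify under the definition. The string labels are
# illustrative assumptions, not official SESTAT codes.
SE_GROUPS = {"S&E", "S&E-related"}


@dataclass
class Person:
    # Hypothetical record layout for the example.
    age: int
    lives_in_us: bool
    institutionalized: bool
    has_bachelors_or_higher: bool
    degree_field_group: str  # "S&E", "S&E-related", or "non-S&E"
    occupation_group: str    # "S&E", "S&E-related", "non-S&E", or "none"


def in_sestat_target_population(p: Person) -> bool:
    """Return True if the person meets the stated target population definition."""
    # Residency, institutional status, and the age ceiling of 75.
    if p.institutionalized or not p.lives_in_us or p.age > 75:
        return False
    # Every qualifying path requires at least a bachelor's degree.
    if not p.has_bachelors_or_higher:
        return False
    # Qualify through the degree field OR through the occupation.
    return p.degree_field_group in SE_GROUPS or p.occupation_group in SE_GROUPS
```

Under these assumptions, a person with a non-S&E bachelor's degree who works in an S&E occupation qualifies through the occupation path, while the same person in a non-S&E occupation does not.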

NSF uses a system of three surveys with separate frames to cover the population of scientists and engineers: the NSCG, the National Survey of Recent College Graduates (RCG), and the Survey of Doctorate Recipients (SDR). The three surveys use nearly identical data collection instruments and data processing procedures and are fielded at the same time with the same reference date.[3] They are designed to provide the advantages of a large sample size and greater coverage of the target population, with special emphasis on relatively rare populations (e.g., doctorates, recent graduates, and minorities).

Data from the three surveys for all cases that qualify as scientists and engineers according to the SESTAT target population definition are integrated into a comprehensive database, the SESTAT integrated file, covering all college-educated scientists and engineers in the United States. Because a case identified in one survey might also be eligible for another, creating the integrated database requires resolving eligibility for more than one survey.[4] The integrated file is used to produce national estimates of the number and characteristics of scientists and engineers in the United States.

The SESTAT surveys are conducted approximately every 2 to 3 years and provide cross-sectional time-series data; preliminary SESTAT longitudinal files have been prepared for the period covering 1993–99.[5] The NSCG provides the majority of cases in the SESTAT integrated database and represents the "stock" of scientists and engineers at the beginning of the decade. The SDR provides the stock of experienced U.S. doctorates, as well as the "flow" of new U.S. science, engineering, and health (SEH) doctorates. The RCG captures the flow of new U.S. SEH bachelor's and master's graduates.

The NSCG is a panel survey with a new panel selected at the beginning of each decade. Respondents to the NSCG who are identified as eligible for the SESTAT target population are eligible for the NSCG follow-up surveys for the rest of the decade. The RCG is a cross-sectional survey of new bachelor's- and master's-degree recipients in SEH fields; after entering the SESTAT system in the RCG (a new flow), a subsample is followed in the NSCG (as part of the stock). The target population for the SDR is all SEH doctorates awarded at U.S. institutions. While the overall sample size of the SDR is held steady, for each new round a sample of new SEH doctorates is added to the SDR sample from its frame, the Survey of Earned Doctorates (SED). Figure 1 shows a conceptual diagram of the stocks and flows that make up the SESTAT system of surveys.

[2] Beginning in 2003, the coverage of the SESTAT target population was expanded to include S&E-related degrees or occupations. A major component of this population expansion was in health fields and occupations. The Survey of Doctorate Recipients (SDR) had always included those with doctorates in health fields, and these cases had been included in the integrated database. Beginning in 2003, the National Survey of Recent College Graduates (RCG) population was expanded to include recent U.S. bachelor's- and master's-degree earners in health fields, and these individuals were included in the integrated database. Individuals with health degrees or occupations were also captured in the 2003 NSCG and included in the SESTAT database (as were individuals with other S&E-related degrees or occupations). A detailed description of S&E, S&E-related, and non-S&E fields and occupations can be found at http://sestat.nsf.gov/docs/ed03maj.html and http://sestat.nsf.gov/docs/occ03maj.html, respectively.

[3] Information on each of the surveys can be found at: http://www.nsf.gov/statistics/srvygrads/ (NSCG); http://www.nsf.gov/statistics/srvyrecentgrads/ (RCG); and http://www.nsf.gov/statistics/srvydoctoratework/ (SDR). Information on SESTAT can be found at http://sestat.nsf.gov/.

[4] To resolve this issue, SESTAT has developed a statistical integration process, employing a unique linkage rule. Each survey is weighted according to the frame developed for that survey. Additionally, a series of overlap variables is calculated to identify cases that are eligible for more than one survey. To remove these multiple selection opportunities, each case within the SESTAT target population is uniquely linked to one and only one component survey, and that individual is included in the SESTAT integrated file only when he or she is selected for that linked survey.
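The unique linkage rule in footnote 4 can be illustrated with a short sketch. It assumes each sampled case carries the survey it was selected in and the set of surveys it is eligible for (standing in for the overlap variables). The priority order used to pick the linked survey, the dictionary keys, and the function names are hypothetical; the source does not state how the linked survey is chosen.

```python
# Hypothetical priority order for choosing the single linked survey.
# The source says only that each case is linked to exactly one survey.
LINK_PRIORITY = ("SDR", "RCG", "NSCG")


def linked_survey(eligible_for):
    """Return the one survey a case is linked to, given its eligibility set."""
    for survey in LINK_PRIORITY:
        if survey in eligible_for:
            return survey
    raise ValueError("case is not eligible for any component survey")


def build_integrated_file(cases):
    """Keep a case only if it was selected in the survey it is linked to.

    Each case is a dict with hypothetical keys: "id", "selected_in",
    "eligible_for" (a set of survey names), and "weight" (the weight
    from the frame of the survey the case was selected in).
    """
    return [c for c in cases if c["selected_in"] == linked_survey(c["eligible_for"])]


cases = [
    # Eligible for two surveys but selected in the one it is not linked to.
    {"id": 1, "selected_in": "NSCG", "eligible_for": {"NSCG", "SDR"}, "weight": 120.0},
    # Eligible for the same two surveys and selected in the linked one.
    {"id": 2, "selected_in": "SDR", "eligible_for": {"NSCG", "SDR"}, "weight": 15.0},
    # Eligible for a single survey, so there is no overlap to resolve.
    {"id": 3, "selected_in": "NSCG", "eligible_for": {"NSCG"}, "weight": 98.5},
]
integrated = build_integrated_file(cases)
```

In this toy input, case 1 is excluded because it was selected in a survey other than its linked one, so each person can enter the integrated file through only one survey and keeps that survey's weight.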

[5] Little longitudinal analysis has been conducted with the SESTAT surveys. Longitudinal weights have only recently been created and have not yet been made available to non-SRS users. However, the user community is interested in conducting longitudinal analysis, much of which is best done with longitudinal weights.

Working Paper | NCSES 12-201 | August 2012