NCSES Using the American Community Survey as the Sampling Frame for the National Survey of College Graduates
National Science Foundation National Center for Science and Engineering Statistics

Adding a Field of Degree Question to the ACS


While the ACS offers opportunities to improve coverage of the SESTAT target population, it shares one limitation with the decennial Census: it does not collect any information on the field of an individual's college degree(s). The SESTAT target population has two components (an S&E or S&E-related degree OR an S&E or S&E-related occupation), but only the latter could be used as a sampling characteristic when the NSCG was drawn from the long form, which resulted in a sample with many non-SESTAT-eligible cases. Adding an item to the ACS on the field of a person's college degree would greatly increase the efficiency of the ACS as a sample frame for the NSCG. The first part of the SESTAT population definition could then be used as a sampling characteristic, and the ACS would contain data on both dimensions that determine whether an individual meets the SESTAT criteria for being classified as a scientist or engineer.

NCSES began discussing the issue of adding a field of degree (FOD) item to the ACS several years ago and has worked with other agencies, the Office of Management and Budget (OMB), the U.S. Census Bureau, and congressional staff on this process. A variety of question formats and the content of an FOD item were discussed, investigated, and tested. A new FOD item would immediately follow the educational attainment question on the ACS and would only be asked for those whose highest level of educational attainment was a bachelor's degree or higher. Such an item could ask about FOD for one or more degrees. The test for the SESTAT definition on educational background is conducted by looking at the entire educational history (all degrees held) to determine if one or more of the degrees are in S&E or S&E-related fields. In 2003, approximately 40% of NSCG respondents reported having more than one degree at the bachelor's level or higher. Ideally, for the purpose of NSCG sampling, the ACS would collect information about the FOD for all degrees.

After initial discussions with the U.S. Census Bureau, it became evident that there would only be room on the ACS questionnaire for a question on a single degree. It was therefore necessary to identify which single degree to collect, and NSF recommended that the item focus on the FOD for an individual's bachelor's degree. This was not the most obvious approach, given that the ACS educational attainment question asks for each individual's highest level of attainment, and an FOD question about a person's highest degree might have been easier to implement. However, asking about the highest degree would cause significant coverage problems for the NSCG. In the 2003 NSCG population, 17% of individuals whose first bachelor's degree was in an S&E or S&E-related field reported that their highest degree was in a non-S&E field. Only a small proportion (4%) of non-S&E bachelor's-degree holders reported that their highest degree was in an S&E or S&E-related field. Asking for the field of the bachelor's degree instead of the field of the highest degree was therefore likely to result in fewer NSCG (and SESTAT) coverage problems. Thus, for stratification purposes in sampling, asking about a person's bachelor's degree is the best choice if FOD can be collected for only one degree. Sampling efficiency for SESTAT-eligible cases would also be greater (i.e., fewer sampled cases would turn out not to be SESTAT eligible) when asking about the FOD of the bachelor's degree rather than of the highest degree. Finally, for analysis of degree holders in any field, not just in S&E or related fields, data about the same degree level for all sample cases would likely be useful.

NSF recommended that the content of the FOD item collect information specifically about all degree fields for the bachelor's degree and not just those in S&E or related fields. Gathering information about all degree fields would make the information much richer for analytical purposes for a wide variety of users and would improve its value for NSF purposes as well. Such data can be used to compare patterns for those in S&E and S&E-related fields with those in non-S&E fields. Respondent accuracy in reporting degrees in S&E and S&E-related fields is likely to be better if there are specific categories for non-S&E fields, with examples, rather than simply a list of S&E or S&E-related categories and then a residual category labeled as "other" or "non-S&E."

Beginning in 2006, NCSES worked with the U.S. Census Bureau as well as two groups of academic researchers to develop and test alternative formats of an FOD question. Based on the preliminary research, two alternative formats of the question were developed and tested in the 2007 ACS Methods Test, completed in fall 2007. The outcome of the test was a decision to use an open-ended question for FOD, with responses to be coded by the U.S. Census Bureau. Details on the two question formats are provided later in this paper.

The evaluation of the 2007 ACS Methods Test for FOD did not reveal major problems with the FOD items. OMB approved adding an open-ended question to the ACS in 2009.[23]

NSCG Sampling with an ACS FOD Item

The addition of an FOD question on the ACS would affect the cost and efficiency of any of the NSCG design options outlined earlier for using the ACS as a sample frame. For example, regardless of the option, because it would be possible with the FOD item to mimic more closely the SESTAT target population with respect to educational background, the oversampling needed to find sufficient SESTAT-eligible cases in non-S&E occupational groups could be reduced substantially.

The potential for cost reductions and efficiency improvements throughout the decade will depend on how often and how extensively the ACS frame is used for drawing samples for the NSCG, on the format used to collect the FOD data, and on the accuracy of the FOD data. The sample size needed to obtain efficiency similar to the postcensal surveys in the past was not determined precisely until decisions were made and testing provided information on the quality of the FOD information collected on the ACS. However, it was reasonable to presume that the sample size of a once-a-decade sample could be cut substantially. Alternatively, the sample size could be maintained (or reduced somewhat less) to yield a larger in-scope sample, allowing better coverage and the ability to report for rare populations or small domains, such as race/ethnicity or sex in S&E occupations.

Major improvements that could result from having an FOD question to use for sampling for the NSCG, regardless of which option(s) for sampling were chosen, included the following:

  1. More efficient screening could result. The ratio of NSCG cases sampled to SESTAT-eligible respondents could decline considerably from its 2003 level of 2.6 to 1.0 (2.6 sampled cases for every eligible respondent); the reduction would depend on the sampling strategy chosen, the question format, and the response accuracy.

  2. Improved efficiency of the sample provides the opportunity to maximize the return on a set sample size by better targeting more of the available sample for key groups (e.g., women, minorities, and persons with disabilities), which would help SESTAT to improve estimates for these populations.

  3. A smaller, more efficient sample could allow for time and resources to be spent on quality improvements for the survey.

  4. A smaller sample size and fewer respondents would mean a reduced overall respondent burden in terms of burden hours.
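As a rough illustration of the screening arithmetic in item 1, the sketch below converts a ratio of sampled cases per SESTAT-eligible respondent into the number of cases that must be sampled. Only the 2.6 figure comes from the text. The target of 10,000 eligible respondents and the improved ratio of 1.5 are hypothetical values chosen for the example, and the function name is an assumption.

```python
import math


def cases_to_sample(target_eligible: int, cases_per_eligible: float) -> int:
    """Cases that must be sampled to yield `target_eligible` eligible respondents.

    `cases_per_eligible` is the ratio of sampled cases to SESTAT-eligible
    respondents (2.6 in 2003, per the text).
    """
    if target_eligible < 0 or cases_per_eligible < 1.0:
        raise ValueError("target must be non-negative and the ratio at least 1.0")
    return math.ceil(target_eligible * cases_per_eligible)


# Hypothetical target; not a figure from the paper.
target = 10_000
baseline = cases_to_sample(target, 2.6)  # 2003 ratio from the text
improved = cases_to_sample(target, 1.5)  # assumed ratio with an FOD item
saved = baseline - improved              # fewer cases to contact and screen
```

Under these assumed inputs, the same yield of eligible respondents requires fewer sampled cases, which is the source of the cost and burden reductions listed above.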

Even with an FOD item, some level of screening of cases drawn from the ACS is necessary because the combination of the FOD question and occupation does not fully identify all SESTAT-eligible cases. For example, the FOD for the bachelor's degree does not allow for the identification of non-S&E bachelor's-degree holders with non-S&E occupations who have an S&E or S&E-related degree at the master's-degree level. Additionally, there may be some inaccuracy in the reporting of the degree field or occupation (either type 1 or type 2 errors) on the ACS that could be verified with the NSCG follow-up survey. Furthermore, there is value to NSF in periodically collecting some data on those who are not scientists and engineers for comparison purposes.
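The gap between what the frame can see and the full SESTAT definition can be sketched in a few lines. This is a minimal illustration, not NCSES or Census Bureau code: the `Person` record, its field names, and the three group labels are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Tuple

# Assumed labels for the broad field and occupation groups used in the text.
IN_SCOPE = {"S&E", "S&E-related"}


@dataclass(frozen=True)
class Person:
    """Hypothetical record; these are not actual ACS or NSCG variable names."""
    bachelors_field: str
    higher_degree_fields: Tuple[str, ...]
    occupation: str


def sestat_eligible(p: Person) -> bool:
    """Full definition: any degree held, or the occupation, is in scope."""
    all_fields = (p.bachelors_field,) + p.higher_degree_fields
    return p.occupation in IN_SCOPE or any(f in IN_SCOPE for f in all_fields)


def flagged_on_frame(p: Person) -> bool:
    """What the ACS frame can see: bachelor's FOD and occupation only."""
    return p.occupation in IN_SCOPE or p.bachelors_field in IN_SCOPE


# The case described above: a non-S&E bachelor's degree and a non-S&E
# occupation, but an S&E degree at the master's level.
example = Person("non-S&E", ("S&E",), "non-S&E")
missed_by_frame = sestat_eligible(example) and not flagged_on_frame(example)
```

A case like `example` is eligible but not flagged, which is why some apparently out-of-scope cases still need to be sampled and screened.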

The use of an ACS sampling frame provides ample opportunities for variation in the NSCG survey design. Having an FOD question would enhance all four options for utilizing the ACS as a frame for the NSCG. The impact on each of the four options is discussed below.

  1. Continue the current approach, refreshing the sample once a decade
    With an FOD item, the sample size needed to produce a yield of SESTAT-eligible cases equivalent to that stemming from the postcensal NSCG surveys in the past several decades could be much smaller. The ACS frame could be used once per decade, with a much smaller sample, but it would not have to be limited to the decennial Census time frame.

  2. Update the entire sample more than once a decade
    Redrawing the sample more frequently than once a decade would involve greater costs than Option 1, but the difference in costs would be much less with an FOD item than without it. Some screening would be necessary every time a new ACS sample was drawn because of measurement error (both FOD and occupation).

  3. Rotating sample approach
    A rotating sample approach maintains some of the longitudinal aspects of the historical design. Similar to Option 2, the addition of an FOD item would reduce the size of each new sample panel from ACS that would be needed to obtain the desired number of SESTAT-eligible cases.

  4. Selective updates
    Addition of the FOD item could facilitate this option for utilizing the ACS to the extent that the target group of interest is related to the field of a person's bachelor's degree. For example, during the rise and fall of the technology/IT firms, it would have been of great interest to compare the employment patterns of those in IT occupations with differing degree backgrounds, such as computer science, engineering, or non-S&E fields.

Following the recommendation that CNSTAT made after the four options were presented, NCSES chose Option 3.
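A rotating design of the kind described in Option 3 can be sketched as a simple schedule in which a new panel enters at every survey cycle and each panel stays in sample for a fixed number of cycles. The text does not state how many cycles a panel is retained, so `cycles_in_sample` and the cycle numbering below are assumptions for illustration only.

```python
from typing import Dict, List


def rotating_panels(cycles: List[int], cycles_in_sample: int) -> Dict[int, List[int]]:
    """Map each survey cycle to the entry cycles of the panels interviewed in it.

    A new panel is drawn from the ACS at every cycle and is kept for
    `cycles_in_sample` consecutive cycles before rotating out.
    """
    if cycles_in_sample < 1:
        raise ValueError("a panel must stay in sample for at least one cycle")
    schedule: Dict[int, List[int]] = {}
    for i, cycle in enumerate(cycles):
        first = max(0, i - cycles_in_sample + 1)
        schedule[cycle] = cycles[first:i + 1]
    return schedule


# Five hypothetical cycles, each panel retained for three cycles.
schedule = rotating_panels([1, 2, 3, 4, 5], cycles_in_sample=3)
```

In this sketch, cycle 4 interviews the panels that entered in cycles 2, 3, and 4, so part of the sample is always longitudinal while part is newly drawn.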

Technical Issues for Sampling from ACS

Two issues affect the use of the ACS for the NSCG sample design: the form of the FOD question, and how swapped and imputed data for educational attainment and FOD are treated in the ACS. Each is discussed below.

  1. Form of the FOD question
    The efficiency, attractiveness, and costs of the various designs for the NSCG depend on the form of the FOD question. The more detailed the FOD information available for sampling, the better samples can be allocated to domains of interest. The U.S. Census Bureau (Rothgeb and Beck 2007), Don Dillman (Washington State University; see Dillman, Mahon-Haft, et al. 2006; Dillman, Mahon-Haft, and Wright 2006), and Jon Krosnick (Stanford University; see Cobb, Krosnick, and Bannon 2006) conducted a series of experiments that led to the development of two versions of FOD items that were tested on the 2007 ACS Methods Test—one is a categorical question, and the other is an open-ended version. Each is shown below.

    Categorical version (FOD1)

    This question focuses on this person's BACHELOR'S DEGREE. In which of the following major fields did this person receive his/her BACHELOR'S DEGREE(S)? Mark (X) "Yes" or "No" box for each category.

        Yes No
    a. Biological, Agricultural, Physical, Earth, or Other Natural Sciences |__| |__|
    b. Health, Nursing, or Medical Fields |__| |__|
    c. Engineering, Computer Sciences, or Mathematical Sciences |__| |__|
    d. History, Arts, or Humanities |__| |__|
    e. Psychology, Economics, or Other Social Sciences |__| |__|
    f. Business or Management |__| |__|
    g. Education or Education Administration |__| |__|
    h. Some other major field – Specify |__| |__|

    Open-ended version (FOD2)

    This question focuses on this person's BACHELOR'S DEGREE. Please print below the specific major(s) of any BACHELOR'S DEGREES this person has received. (For example: chemical engineering, elementary teacher education, organizational psychology.)



    There were issues to be resolved regardless of the version of the question that was chosen. Some of these issues affect the version that was chosen; others affect the use of the data for sampling or analysis. Table 4 lists some of the issues of concern.

    A sampling design for the NSCG using the ACS with FOD could be crafted to produce little or no undercoverage in an initial sample drawn from the ACS. The form of the FOD question and the accuracy of the information provided affect the gains in efficiency. For example, how accurate will the reports on the FOD item be for those reporting for others in the household (proxy reports) compared to those reporting for themselves? If the FOD and occupation items can be used to accurately distinguish scientists and engineers from other college graduates, substantial gains in efficiency are possible. For NSCG sampling purposes, the most important concern was whether a degree is accurately reported as falling into an S&E, an S&E-related, or a non-S&E category.[24]

    The accuracy of the FOD reporting will be evaluated after the first NSCG is conducted using the ACS. The information from the detailed education history collected as part of the NSCG from the individual (where there are no proxy reporters) can be compared to the information reported on FOD (and educational attainment) in the ACS. Analysis of the reinterviews in the 2007 Methods Panel testing of the two FOD items showed that responses to both versions of the FOD item were reliable and valid. However, validity on the full ACS sample is an open question.

    Some number of cases apparently not meeting the criteria of being a scientist or engineer (a non-S&E bachelor's degree and a non-S&E occupation) would be drawn in the NSCG sample from the ACS frame both to provide a comparison group and to account for those in non-S&E occupations with a non-S&E bachelor's degree but an S&E or S&E-related degree at a higher level. It was advisable in drawing the first NSCG sample from the ACS to allocate part of the sample to test the efficiency of the FOD item for sampling purposes, either by drawing a larger number of apparently non-S&E cases than might be done otherwise or by drawing a portion of the sample using the long-form procedures without taking the FOD information into account.
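For sampling, a response to the categorical version (FOD1) has to be collapsed into one of the three broad groups. The sketch below shows one way this might be done. The mapping of categories a through h to groups is an assumption based on the category labels and on footnote 24, not a published NCSES crosswalk, and the priority rule for multiple "Yes" marks is likewise hypothetical.

```python
from typing import Dict

# Assumed mapping of FOD1 categories to broad groups (not an official crosswalk).
FOD1_GROUP: Dict[str, str] = {
    "a": "S&E",           # biological, agricultural, physical, earth, natural sciences
    "b": "S&E-related",   # health, nursing, or medical fields
    "c": "S&E",           # engineering, computer sciences, mathematical sciences
    "d": "non-S&E",       # history, arts, or humanities
    "e": "S&E",           # psychology, economics, or other social sciences
    "f": "non-S&E",       # business or management
    "g": "non-S&E",       # education; hides science/math teacher education (footnote 24)
    "h": "unclassified",  # write-in that would need to be coded
}

# Hypothetical rule: with several "Yes" marks, keep the most in-scope group.
PRIORITY = ["S&E", "S&E-related", "unclassified", "non-S&E"]


def classify_fod1(marks: Dict[str, bool]) -> str:
    """Collapse the Yes/No marks for categories a-h into one sampling group."""
    groups = {FOD1_GROUP[cat] for cat, yes in marks.items() if yes}
    for group in PRIORITY:
        if group in groups:
            return group
    return "missing"  # no category marked "Yes"
```

For example, `classify_fod1({"b": True, "f": True})` returns "S&E-related" under this rule, while a response with only category g marked is treated as non-S&E even though it may conceal an S&E-related teacher education degree.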

  2. Swapped and imputed data for education level and FOD

    The U.S. Census Bureau regularly uses a technique called swapping data to create public-use data sets (a decision based on the Bureau's overall disclosure policies). Swapping is done during the survey data processing. NCSES requested that the edited ACS file, before swapping, be used for weighting and creation of the NSCG sampling frame. Using swapped data would greatly reduce the stratification efficiency, especially when disproportionate stratified sampling is used to target precision levels for selected domains. The U.S. Census Bureau allowed use of unswapped data from the ACS for sampling for the 2010 NSCG.

    Another technical concern was the use of imputed data from the ACS. Imputed educational attainment level data (the U.S. Census Bureau calls them allocated data) should not be used for sampling. Imputed data create an unacceptable amount of undercoverage of those with a bachelor's degree (estimated at 3% to 7%; see Finamore, Hall, and Fecso 2006) as well as sampling inefficiency (when those with an imputed education level of a bachelor's degree turn out not to have a bachelor's degree). Records that have imputed educational attainment level data were put aside prior to sampling, and a small sample of these ACS cases could be subsequently sampled to measure bias.

    Adding an FOD question to the ACS could create an entirely new issue related to imputation. Given the relatively poor performance of the imputation methods for education level (the imputation performs much like the full-file, missing-at-random model), it is unclear how imputation should be done for missing FOD. For individuals with an S&E or S&E-related occupation, FOD imputation might perform well. For other occupations, it is not obvious that an acceptable imputation model can be developed. It may be that such cases will need to be treated as missing and reweighted. A program of research on imputation and nonresponse weighting for missing FOD is desirable.
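The two handling rules discussed above, setting aside records with imputed educational attainment and reweighting rather than imputing missing FOD, can be sketched as follows. The weighting-class adjustment shown is one common approach, used here only as an illustration. The paper calls for research on this point and does not prescribe a method. The record keys (`cell`, `weight`, `fod`, `attainment_imputed`) and the example values are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]


def set_aside_imputed(records: List[Record]) -> Tuple[List[Record], List[Record]]:
    """Split records into those usable for sampling and those set aside
    because educational attainment was imputed (allocated)."""
    usable = [r for r in records if not r["attainment_imputed"]]
    set_aside = [r for r in records if r["attainment_imputed"]]
    return usable, set_aside


def reweight_for_missing_fod(records: List[Record]) -> List[Record]:
    """Weighting-class adjustment: drop records with missing FOD and inflate
    the weights of the remaining records in the same cell to compensate.

    Cells in which every record is missing FOD are dropped entirely.
    """
    total: Dict[str, float] = defaultdict(float)
    observed: Dict[str, float] = defaultdict(float)
    for r in records:
        total[r["cell"]] += r["weight"]
        if r["fod"] is not None:
            observed[r["cell"]] += r["weight"]

    adjusted = []
    for r in records:
        if r["fod"] is None:
            continue
        factor = total[r["cell"]] / observed[r["cell"]]
        adjusted.append({**r, "weight": r["weight"] * factor})
    return adjusted


# Hypothetical frame records; cells here are broad occupation groups.
records = [
    {"cell": "S&E occupation", "weight": 10.0, "fod": "S&E", "attainment_imputed": False},
    {"cell": "S&E occupation", "weight": 10.0, "fod": None, "attainment_imputed": False},
    {"cell": "other occupation", "weight": 5.0, "fod": "non-S&E", "attainment_imputed": False},
    {"cell": "other occupation", "weight": 8.0, "fod": "S&E", "attainment_imputed": True},
]
usable, set_aside = set_aside_imputed(records)
adjusted = reweight_for_missing_fod(usable)
```

In this example the adjusted weights sum to the same total as the usable records, so the cell totals are preserved. Whether the cells make the missing-at-random assumption plausible is exactly the open research question raised above.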



[23] With a full year of data available from the ACS (2005), NCSES can begin to work with the U.S. Census Bureau to explore the use of the ACS without the FOD item for the NSCG (and for analysis). NCSES needed to use the ACS whether or not there was an FOD question.

[24] In the categorical version of the FOD question tested, only one set of S&E-related fields (health) can be captured accurately. In order to identify samples in other S&E-related fields, NSF had to sample some of the non-S&E FOD categories and some non-S&E occupations. For example, in order to find individuals with degrees in science or math teacher education (an S&E-related field), it was necessary to sample some individuals with bachelor's degrees in "Education or education administration" and some secondary teachers.

Working Paper | NCSES 12-201 | August 2012