Survey Overview
Key Survey Information
Survey Design
Data Collection and Processing Methods
Survey Quality Measures
Data Comparability and Changes
Definitions
Technical Table
Note(s)

Survey Overview

Purpose. The National Survey of College Graduates (NSCG), conducted by the National Center for Science and Engineering Statistics (NCSES) within the National Science Foundation (NSF), provides data on the characteristics of the nation's college graduates, with a focus on those in the science and engineering workforce. It samples individuals who are living in the United States during the survey reference week, have at least a bachelor's degree, and are younger than 76. By surveying college graduates in all academic disciplines, the NSCG provides data useful in understanding the relationship between college education and career opportunities, as well as the relationship between degree field and occupation.

The NSCG is designed to provide demographic, education, and career history information about college graduates and to complement another survey conducted by NCSES: the Survey of Doctorate Recipients (SDR, https://www.nsf.gov/statistics/srvydoctoratework/). These two surveys share a common reference date, and they use similar questionnaires and data processing guidelines.

Data collection authority. The information collected in the NSCG is solicited under the authority of the NSF Act of 1950, as amended, and the America COMPETES Reauthorization Act of 2010. In accordance with an interagency agreement, the U.S. Census Bureau collects the NSCG data under the authority of Title 13, Section 8, of the United States Code. The Office of Management and Budget control number is 3145-0141, with an expiration date of 29 February 2020.

Survey sponsor. NCSES.

Key Survey Information

Frequency. Biennial.

Initial survey year. 1993.

Reference period. The week of 1 February 2017.

Response unit. Individuals with a bachelor's degree or higher.

Sample or census. Sample.

Population size. Approximately 61.2 million individuals.

Sample size. Approximately 124,000 individuals.

Survey Design

Target population. The NSCG target population includes individuals who meet all of the following criteria:

Earned a bachelor's degree or higher prior to 1 January 2016
Are not institutionalized and reside in the United States (or Puerto Rico or another U.S. territory) as of 1 February 2017
Are younger than 76 years as of 1 February 2017

Sampling frame. Using a rotating panel design, the NSCG includes new sample cases from the 2015 American Community Survey (ACS) and returning sample cases from the 2015 NSCG.

The NSCG sampling frame for new sample cases included the following eligibility requirements:

Resided in the United States or Puerto Rico as of the ACS interview date
Noninstitutionalized as of the ACS interview date
Earned at least a bachelor's degree as of the ACS interview date
Under the age of 76 as of 1 February 2017
Had an accurate name and complete address on the ACS data file

Returning sample cases from the 2015 NSCG originated from three different frames (the 2009 ACS, 2011 ACS, and 2013 ACS) and had the following eligibility requirements:

A complete interview or temporarily ineligible during their initial survey cycle
Under the age of 76 as of 1 February 2017
Not a "hard" refusal during the 2015 NSCG survey cycle

Sample design. The NSCG sample design is cross-sectional with a panel element. As a cross-sectional study, the NSCG provides estimates of the size and characteristics of the college graduate population for a point in time. The panel element of the design consists of the follow-up surveys conducted every two or three years after the first survey year.

The NSCG uses a stratified sampling design to select its sample from the eligible sampling frame.

In the new sample, cases in the science and engineering (S&E) strata were selected using systematic sampling, and cases in the non-S&E strata were selected with probability proportional to size (PPS) sampling. Among the returning sample, all eligible cases were selected. The sampling strata were defined by the cross classification of the following four variables:

Young graduate oversample group eligibility indicator (2 levels)
Demographic group (9 levels)
Highest degree type (3 levels)
Detailed occupation group (25 levels)

As was the case in the 2015 NSCG, the 2017 NSCG includes an oversample of young graduates to improve the precision of estimates for this important population.

Data Collection and Processing Methods

Data collection. The data collection period lasted approximately 6 months (17 April 2017–29 October 2017). The NSCG used a trimodal data collection approach: self-administered online survey (Web), self-administered paper questionnaire (mail), and computer-assisted telephone interview (CATI). Individuals in the sample were started in one mode depending on their past preference and available contact information. After an initial survey invitation, the data collection protocol included sequential contacts by postal mail, e-mail, and telephone that ran throughout the data collection period. At any time during data collection, sample members could choose to complete the survey using any of the three modes. Nonrespondents to the initial survey invitation received follow-up with alternate survey modes.

Quality assurance procedures were in place at each data collection step (e.g., address updating, printing, package assembly and mailing, questionnaire receipt, data entry, coding, CATI, and post-data collection processing).

Mode. About 79% of the participants completed the survey by Web, 12% by mail, and 9% by CATI.

Response rates. Response rates were calculated on complete responses, that is, from instruments with responses to all critical items. Critical items are those containing information needed to report labor force participation (including employment status, job title, and job description), college education (including degree type, degree date, and field of study), and location of residency on the reference date. There were 83,672 complete questionnaires. The overall unweighted response rate was 70%; the weighted response rate was 71%.

Data editing. Complete case data were captured and edited under the three separate data collection modes for the 2017 NSCG. The Web survey captured most of the survey responses and had internal editing controls where appropriate. A computer-assisted data entry (CADE) system was used to process the mail paper forms. Complete responses from the three separate modes were merged for subsequent coding, editing, and cleaning necessary to create an analytical database.

Following established NCSES guidelines for coding NSCG survey data, including verbatim responses, staff were trained in conducting a standardized review and coding of occupation and education information, certifications, "other/specify" verbatim responses, state and country geographical information, and postsecondary institution information. For standardized coding of occupation (including auto-coding), the respondent's reported job title, duties and responsibilities, and other work-related information from the questionnaire were reviewed by specially trained coders who corrected known respondent self-reporting errors to obtain the best occupation codes. For standardized coding of field of study associated with any reported degree (including auto-coding), the respondent's reported department, degree level, and field of study information from the questionnaire were reviewed by specially trained coders who corrected known respondent self-reporting errors to obtain the best field of study codes.

Imputation. Logical imputation was primarily accomplished as part of editing. In the editing phase, the answer to a question with missing data was sometimes determined by the answer to another question. In some circumstances, editing procedures found inconsistent data that were blanked out and therefore subject to statistical imputation.

The item nonresponse rates reflect data missing after logical imputation or editing but before statistical imputation. For key employment items—such as employment status, sector of employment, and primary work activity—the item nonresponse rates ranged from 0.00% to 1.07%. Nonresponse to questions deemed sensitive was higher: nonresponse to salary and earned income was 4.54% and 6.30%, respectively, for the new sample members and 5.33% and 7.43%, respectively, for the returning members. Personal demographic data of the new sample members had variable item nonresponse rates, with sex at 0.00%, birth year at 0.02%, marital status at 0.24%, citizenship at 0.13%, ethnicity at 0.92%, and race at 2.33%. The nonresponse rates for returning sample members were 1.52% for marital status and 1.41% for citizenship.

Item nonresponse was typically addressed using statistical imputation methods. Most NSCG variables were subjected to hot deck imputation, with each variable having its own class and sort variables chosen by regression modeling to identify nearest neighbors for imputed information. For some variables, there was no set of class and sort variables that was reliably related to or suitable for predicting the missing value, such as day of birth. In these instances, random imputation was used, so that the distribution of imputed values was similar to the distribution of reported values without using class or sort variables.

Imputation was not performed on critical items or on verbatim-based variables. In addition, for some missing demographic information, the NSCG imported the corresponding data from the ACS, which had performed its own imputation.

Weighting. Because the NSCG is based on a complex sampling design and subject to nonresponse bias, sampling weights were created for each respondent to support unbiased population estimates. The final analysis weights account for several factors, including the following:

Adjustments to account for undercoverage of recent immigrants and undercoverage of recent degree-earners
Adjustment for incorrect names or incomplete address information on the sampling frame
Differential sampling rates
Adjustments to account for inability to locate sample cases and unit nonresponse
Adjustments to align the sample distribution with population controls
Trimming of extreme weights
Overlap procedures to convert weights that reflect the population of each individual frame (2009 ACS, 2011 ACS, 2013 ACS, and 2015 ACS) into a final sample weight that reflects the 2017 NSCG target population

The final sample weights enable data users to derive survey-based estimates of the NSCG target population. The variable name on the NSCG public use data files for the NSCG final sample weight is WTSURVY. More detailed information on weighting is contained in the 2017 NSCG Methodology Report.

Variance estimation. The successive difference replication method (SDRM) was used to develop replicate weights for variance estimation. The theoretical basis for the SDRM is described in Wolter (1984) and in Fay and Train (1995). As with any replication method, successive difference replication involves constructing a number of subsamples (replicates) from the full sample and computing the statistic of interest for each replicate. The mean square error of the replicate estimates around their corresponding full sample estimate provides an estimate of the sampling variance of the statistic of interest. The 2017 NSCG produced 320 sets of replicate weights. Please contact the NSCG Project Officer to obtain the NSCG replicate weights and the replicate weight user guide.

Disclosure protection. To protect against the disclosure of confidential information provided by NSCG respondents, the estimates presented in NSCG data tables are rounded to the nearest 1,000. Percentages were calculated based on unrounded estimates.

Data table cell values based on counts of respondents that fall below a predetermined threshold are deemed to be sensitive to potential disclosure, and the letter "D" indicates this type of suppression in a table cell.

Survey Quality Measures

Sampling error. NSCG estimates are subject to sampling errors. Estimates of sampling errors associated with this survey were calculated using replicate weights and are included in each table of estimates. Data table estimates with coefficients of variation (that is, the estimate divided by the standard error) that exceed a predetermined threshold are deemed unreliable and are suppressed. The letter "S" indicates this type of suppression in a table cell.

Coverage error. Coverage error occurs in sample estimates when the sampling frame does not accurately represent the target population and is a type of nonsampling error. Any missed housing units or missed individuals within sample households in the ACS would create undercoverage in the NSCG. Additional undercoverage errors may exist because of self-reporting errors in the NSCG sampling frame that led to incorrect classification of individuals as not having a bachelor's degree or higher when in fact they held such a degree.

Nonresponse error. The weighted response rate for the 2017 NSCG was 71%; the unweighted response rate was 70%. Analyses of NSCG nonresponse trends were used to develop nonresponse weighting adjustments to minimize the potential for nonresponse bias in the NSCG estimates. A hot deck imputation method was used to compensate for item nonresponse.

Measurement error. The NSCG is subject to reporting errors from differences in interpretation of questions and by modality (Web, mail, CATI). To reduce measurement errors, the NSCG questionnaire items were pretested in focus groups and cognitive interviews.

Data Comparability and Changes

Data comparability. Year-to-year comparisons of the nation's college-educated population can be made among the 1993, 2003, 2010, 2013, 2015, and 2017 survey cycles because many of the core questions remained the same. Because the 1995, 1997, 1999, 2006, and 2008 surveys do not provide full coverage of the nation's college-educated population, any comparison between these cycles and other cycles should be limited to those individuals educated or employed in S&E fields.

Small but notable differences exist across some survey cycles, however, such as the collection of occupation and education data based on more recent taxonomies. Also, because of the use of different reference months in some survey cycles, seasonal differences may occur when making comparisons across years. Thus, use caution when interpreting cross-cycle comparisons.

There is overlap in the cases included in the 2010, 2013, 2015, and 2017 NSCG. This sample overlap consists of cases that originated in the 2009, 2011, or 2013 ACS. The overlap among cases allows for the ability to conduct longitudinal analysis of this subset of the NSCG sample. To link cases on the NSCG public use data files across survey years 2010, 2013, or 2015, the REFID (reference identifier) unique identification variable can be used. To aid in this longitudinal analysis, single-frame weights are available for each survey year that allow for the evaluation of estimates from each frame independently. Please contact the NSCG Project Officer to obtain the single-frame weights.

To reduce the risk of disclosure of confidential information, the REFID variable is not included in the 2017 NSCG public use file. Hence, it is not possible to link 2017 NSCG data to prior survey years. In place of REFID, the OBSNUM (observation number) variable was added to the 2017 NSCG public use file to serve as a single-cycle case identifier.

Changes in survey coverage and population.

Prior to the 2010 survey cycle, the NSCG used the decennial census long form as its sample frame to select new sample cases. This long form-based sample was interviewed every two to three years throughout the decade. With the long form occurring only once every decade, it was not possible to refresh the NSCG sample during the decade. As a result, the long form-based NSCG sample suffered from increasing undercoverage of recent graduates and recent immigrants throughout the 1990 and 2000 decades. Furthermore, by only following the S&E workforce population in subsequent survey cycles, the NSCG was not able to provide complete information on people entering or exiting the S&E workforce. To reduce the shortfall in recent graduates, the NSCG supplemented its sample each cycle with recent bachelor's and master's degree earners in S&E fields from another NCSES survey, the National Survey of Recent College Graduates (NSRCG).
After the 2000 decennial census, the Census Bureau discontinued the long form and introduced the ACS. In response to this change, NCSES commissioned a Committee on National Statistics (CNSTAT) panel to examine proposed sample design options for the NSCG based on the ACS as opposed to the long form [1]. Using recommendations from the CNSTAT panel, NCSES introduced a new rotating panel sample design for the NSCG in the 2010 survey cycle. In this rotating panel design, the NSCG selects a new sample every survey cycle from the most recent ACS and follows the cases for four survey cycles. Every new panel receives a baseline survey interview and three biennial follow-up interviews. After the fourth cycle, the cases rotate out of the NSCG and are replaced by a newly selected panel of cases from the most recent ACS. Through this rotating panel design and the selection of a new sample every NSCG survey cycle, the NSCG is able to address the undercoverage of recent graduates and recent immigrants that had existed in the past. In addition, by providing coverage of the college graduates population every cycle, the NSCG can track the transition of college graduates in and out of the S&E workforce.
Beginning in the 2013 cycle, NCSES discontinued the NSRCG and expanded the sample of young college graduates in the NSCG. The 2010 NSRCG sample cases were carried forward through the 2015 NSCG survey cycle as the NSCG transitioned to a design that included a young college graduate oversample.
The 2017 NSCG marks the full implementation of the four-panel rotating panel design that began with the 2010 NSCG (see figure 1). The 2017 NSCG includes 124,000 sample cases drawn from the following:

Returning sample from the 2010 NSCG who were originally selected from the U.S. Census Bureau's 2009 ACS
Returning sample from the 2013 NSCG who were originally selected from the 2011 ACS
Returning sample from the 2015 NSCG who were originally selected from the 2013 ACS
New sample selected from the 2015 ACS

Approximately 76,000 cases were selected from the returning sample members, who originated in the 2009, 2011, or 2013 ACS. These returning sample members were included for one of the three biennial follow-up interviews that are part of the rotating panel design. For the baseline survey interview, about 48,000 new sample cases were selected from the 2015 ACS.

FIGURE 1. Rotating panel deign and sample sizes for the National Survey of College Graduates: 2010-19

Changes in questionnaire.

2017. The 2017 NSCG questionnaire includes two new questions about U.S. military service that are asked on the ACS: veteran status and period of service.
2015. The 2015 NSCG questionnaire added a section on professional certifications and licenses.
2013. The 2013 NSCG questionnaire added questions about attendance at community colleges, amounts borrowed to finance undergraduate and graduate degrees, and sources of financial support for undergraduate and graduate degrees. The 2013 questionnaire also differed from the 2010 questionnaire by splitting the first response category for the indicator of sample member location on the survey reference date into two categories. "United States, Puerto Rico, or another U.S. territory" became "United States or Puerto Rico" and "Another U.S. territory."
2010. The 2010 NSCG questionnaire added items on components of job satisfaction, importance of job benefits, year of retirement, whether employer is a new business, and degree of difficulty concentrating, remembering, or making decisions.

Changes in reporting procedures.

In past years, NSCG data were combined with data from the SDR and the NSRCG to form the Scientists and Engineers Statistical Data System (SESTAT). The last series of tables produced from SESTAT used 2013 NSCG data. Since then, NSCG data have been used in numerous tables for NCSES's two Congressionally mandated reports (Science and Engineering Indicators and Women, Minorities, and Persons with Disabilities in Science and Engineering).

Changes in microdata.

2017. The 2017 NSCG microdata does not include the unique personal identifier variable REFID, so it cannot be linked to other NSCG data files. Instead, it includes the variable OBSNUM as a cycle-specific case identifier.

Definitions

Full-time and part-time employment. Full-time (working 35 hours or more per week) and part-time (working less than 35 hours per week) employment status is for the principal job only and not for all jobs held in the labor force. For example, an individual could work part time in his or her principal job but full time in the labor force.

Occupation data. The occupational classification of the respondent was based on his or her principal job (including job title) held during the reference week—or on his or her last job held, if not employed in the reference week (survey questions A5 and A6 as well as A16 and A17). Also used in the occupational classification was a respondent-selected job code (survey questions A7 and A18). (See table A-1 for a list and classification of occupations reported in the NSCG.)

Race and ethnicity. Ethnicity is defined as Hispanic or Latino or not Hispanic or Latino. Values for those selecting a single race include American Indian or Alaska Native, Asian, black or African American, Native Hawaiian or Other Pacific Islander, and white. Those persons who report more than one race and who are not of Hispanic or Latino ethnicity also have a separate value.

Salary. Median annual salaries are reported for the principal job, rounded to the nearest $1,000, and computed for full-time employed scientists and engineers. For individuals employed by educational institutions, no accommodation was made to convert academic year salaries to calendar year salaries.

Sector of employment. Employment sector is a derived variable based on responses to questionnaire items A13, A14, and A15. In the data tables, the category 4-year educational institutions includes 4-year colleges or universities, medical schools (including university-affiliated hospitals or medical centers), and university-affiliated research institutes. Other educational institutions include 2-year colleges, community colleges, technical institutes, precollege institutions, and other educational institutions (which respondents reported verbatim in the survey questionnaire). Private, for-profit includes respondents who were self-employed in an incorporated business. Self-employed includes respondents who were self-employed or were a business owner in a non-incorporated business.

Underrepresented minority. This category comprises three racial or ethnic minority groups (blacks or African Americans, Hispanics or Latinos, and American Indians or Alaska Natives) whose representation in S&E education or employment is smaller than their representation in the U.S. population.

Technical Table

Table	Title	Excel	PDF
A-1	Crosswalk of occupations used in the National Survey of College Graduates	View Excel	View PDF

Note(s)

[1] National Research Council. (2008). Using the American Community Survey for the National Science Foundation's Science and Engineering Workforce Statistics Programs. Panel on Assessing the Benefits of the American Community Survey for the NSF Division of Science Resources Statistics, Committee on National Statistics, Division of Behavioral and Social Sciences and Education. Washington, DC: The National Academies Press.