# Appendix | Methodology

## Types of Data Sources

Much of the data cited in Indicators comes from surveys. Surveys strive to measure characteristics of target populations. To generalize survey results correctly to the population of interest, a survey’s target population must be rigorously defined, and the criteria determining membership in the population must be applied consistently in determining which units to include in the survey. After a survey’s target population has been defined, the next step is to establish a list of all members of that target population (i.e., a sampling frame). Sample members must be selected from this list using accepted statistical methods so that it will be possible to generalize from the sample to the population as a whole. Because complete lists are typically unavailable, surveys sometimes sample from lists that, to varying extents, omit members of the target population.
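The relationship between a target population and an imperfect sampling frame can be illustrated with a minimal sketch. It assumes, purely for illustration, that both the target population and the frame can be written out as lists of unit identifiers; in practice the full target population is rarely listed, which is why coverage usually has to be estimated rather than counted. The function name `frame_coverage` and the identifiers are hypothetical.

```python
def frame_coverage(target_population, sampling_frame):
    """Compare a sampling frame against the target population it is meant to list.

    Both arguments are iterables of unit identifiers (hypothetical data).
    """
    target = set(target_population)
    frame = set(sampling_frame)
    covered = target & frame
    return {
        # Share of the target population that can be selected at all.
        "coverage_rate": len(covered) / len(target),
        # Target-population members missing from the frame (undercoverage).
        "omitted": sorted(target - frame),
        # Frame entries that do not belong to the target population.
        "ineligible": sorted(frame - target),
    }


if __name__ == "__main__":
    # Hypothetical identifiers, not drawn from any real survey.
    target = ["u1", "u2", "u3", "u4"]
    frame = ["u1", "u2", "u3", "x9"]
    print(frame_coverage(target, frame))
```

Units in `omitted` have no chance of selection, so results from a sample drawn from this frame generalize only to the covered part of the target population.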

Some surveys are censuses (also known as universe surveys), in which the survey attempts to obtain data for all population units. The decennial census, in which the target population is all U.S. residents, is the most familiar census survey. Indicators uses data from the Survey of Earned Doctorates, an annual census of individuals who earn research doctorates from accredited U.S. institutions, for information about the numbers and characteristics of new U.S. doctorate holders.

Other surveys are sample surveys, in which data are obtained for only a portion of the population units. Samples can be drawn using either probability-based or nonprobability-based sampling procedures. A sample is a probability sample if each unit in the sampling frame has a known, nonzero probability of being selected for the sample. Probability samples are preferred because their use allows the computation of measures of precision and the subsequent statistical evaluation of inferences about the survey population. An example of a sample survey is the National Survey of College Graduates (NSCG), which gathers data on the nation’s college graduates, with particular focus on those educated or employed in an S&E field. In nonprobability sampling, units are selected with unknown probabilities, so the precision of the resulting estimates cannot be measured in the same way. Polls that elicit responses from self-selected individuals, such as opt-in Internet surveys or phone-in polls, are examples of nonprobability sample surveys. Except for some Asian surveys referenced in Chapter 7, sample surveys included in Indicators use probability sampling.
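The measures of precision that probability sampling makes possible can be sketched for the simplest design, a simple random sample drawn without replacement, in which every unit in the frame has the same known selection probability n/N. This is an illustrative sketch only: the surveys cited in Indicators use more complex designs, and the function names and the synthetic frame below are assumptions made for the example.

```python
import math
import random


def srs_mean_and_se(sample_values, population_size):
    """Estimate a population mean and its standard error from a simple
    random sample drawn without replacement."""
    n = len(sample_values)
    mean = sum(sample_values) / n
    # Unbiased sample variance (divides by n - 1).
    sample_var = sum((v - mean) ** 2 for v in sample_values) / (n - 1)
    # Finite population correction: sampling a large share of the frame
    # reduces the uncertainty of the estimate.
    fpc = 1 - n / population_size
    return mean, math.sqrt(fpc * sample_var / n)


def estimate_total(sample_values, inclusion_prob):
    """Estimate a population total by weighting each sampled value by the
    inverse of its known selection probability."""
    return sum(v / inclusion_prob for v in sample_values)


if __name__ == "__main__":
    rng = random.Random(2024)  # fixed seed so the sketch is repeatable
    # Synthetic frame of 10,000 values; not real survey data.
    frame = [rng.gauss(60_000, 15_000) for _ in range(10_000)]
    sample = rng.sample(frame, 400)  # equal-probability selection
    mean, se = srs_mean_and_se(sample, len(frame))
    total = estimate_total(sample, inclusion_prob=len(sample) / len(frame))
    print(f"estimated mean: {mean:.0f} +/- {1.96 * se:.0f} (approx. 95% interval)")
    print(f"estimated total: {total:.0f}")
```

Both calculations depend on knowing each unit's selection probability. With a self-selected sample, that probability is unknown, so neither the weights nor the standard error can be derived from the design.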

Surveys may be conducted of individuals or of organizations, such as businesses, universities, or government agencies. Surveys of individuals are referred to as demographic surveys. Surveys of organizations are often referred to as establishment surveys. An example of an establishment survey used in Indicators is the Higher Education Research and Development Survey.

Surveys may be longitudinal or cross-sectional. In a longitudinal survey, the same sample members are surveyed repeatedly over time. The primary purpose of longitudinal surveys is to investigate changes over time. The Survey of Doctorate Recipients is a sample survey of individuals who received research doctorates from U.S. institutions. The survey was originally designed to produce cross-sectional estimates, but the data have also been adapted by researchers to conduct longitudinal studies. Indicators uses results from this survey to analyze the careers of doctorate holders.

Cross-sectional surveys provide a snapshot at a given point in time. When conducted periodically, cross-sectional surveys produce repeated snapshots of a population, also enabling analysis of how the population changes over time. However, because the same individuals or organizations are not included in each survey cycle, cross-sectional surveys cannot, in general, track changes for specific individuals or organizations. National and international assessments of student achievement in K–12 education, such as those discussed in Chapter 1, are examples of repeated cross-sectional surveys. Most of the surveys cited in Indicators are conducted periodically, although the frequency with which they are conducted varies.
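The difference between what repeated cross-sectional data and longitudinal data can show may be easier to see in a small sketch. The values and unit identifiers are hypothetical, and the two helper functions are illustrative names rather than part of any survey's processing system.

```python
def mean(values):
    values = list(values)
    return sum(values) / len(values)


def net_change_cross_sectional(wave1_values, wave2_values):
    """Change in the population-level mean between two snapshots.

    Needs only the values observed in each cycle, not who reported them.
    """
    return mean(wave2_values) - mean(wave1_values)


def individual_changes_panel(wave1_by_id, wave2_by_id):
    """Change for each unit observed in both cycles.

    Requires the same units, identified by ID, in both cycles, which is
    what a longitudinal design provides.
    """
    common_ids = wave1_by_id.keys() & wave2_by_id.keys()
    return {uid: wave2_by_id[uid] - wave1_by_id[uid] for uid in sorted(common_ids)}


if __name__ == "__main__":
    # Hypothetical panel: two units whose values swap between cycles.
    wave1 = {"a": 10, "b": 20}
    wave2 = {"a": 20, "b": 10}
    print(net_change_cross_sectional(wave1.values(), wave2.values()))
    print(individual_changes_panel(wave1, wave2))
```

In this example the population mean is unchanged between cycles even though every unit changed, which is the kind of movement only data on the same units over time can reveal.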

Surveys can be self- or interviewer-administered, and they can be conducted using a variety of modes (e.g., postal mail, telephone, the Web, e-mail, or in person). Many surveys are conducted using more than one mode. The NSCG is an example of a multimode survey. It is conducted primarily via the Web; potential participants who do not respond to the questionnaire are contacted via telephone.

Some of the data in Indicators come from administrative records (data collected for the purpose of administering various programs). Examples of data drawn directly from administrative records in Indicators include patent data from the records of government patent offices; bibliometric data on publications in S&E journals, compiled from information collected and published by the journals themselves; and data on foreign S&E workers temporarily in the United States, drawn from the administrative records of immigration agencies.

Many of the establishment surveys that Indicators uses depend heavily, although indirectly, on administrative records. Universities and corporations that respond to surveys about their R&D activities often draw on administrative records developed for internal management or income tax reporting purposes.