This document reports results from a workshop on "Methodological Advances and the Human Capital Initiative" held 12 July 1996 at the National Science Foundation. The views and comments contained in this document are those of the workshop participants and are not necessarily those of the National Science Foundation. For further information or additional copies of this report, contact Cheryl L. Eavey, National Science Foundation, at (703) 306-1729, or CEAVEY@NSF.GOV.
The Human Capital Initiative offers researchers a unique opportunity to address important substantive issues, ones that often require new ways of conceptualizing or combining data, new or modified methods of analysis, or new formal models for relating the constructs of interest to observable data. The HCI thus provides researchers a vehicle for advancing the measurement, methodological, and statistical components of their disciplines. Such advances, made in the context of one or more disciplines, open up issues not previously amenable to empirical or theoretical analysis.
The Methodology, Measurement, and Statistics (MMS) Program of the National Science Foundation invites proposals that embed advances in methodology, data analysis, and/or formal modeling within the context of well-justified substantive research issues, as well as proposals that address such advances more generally. MMS recognizes that methodological developments relevant for human capital issues may require substantial background, both substantively and methodologically. MMS thus encourages collaborations across the social, behavioral, economic, and statistical sciences. Proposals for conferences and/or workshops on methodological topics appropriate for addressing HCI issues also are welcome.
In order to stimulate discussion of methodological needs in human capital research and to identify potential areas of research, the Methodology, Measurement, and Statistics Program convened a workshop on 12 July 1996 to address the topic "Methodological Advances and the Human Capital Initiative." Workshop discussions were informed by and built upon the agenda for HCI research as described in the various NSF brochures and announcements. The document, Investing in Human Resources: A Strategic Plan for the Human Capital Initiative, outlines a strategy for HCI research "designed to increase understanding of the nature and causes of existing problems and to evaluate the effectiveness of policies aimed at improving the human resources of America's citizens" (NSF, 1994). In addition to formulating research agendas for HCI's six substantive areas, the report also briefly suggests data and methodological needs. Data needs identified include the extension of longitudinal data sets, the collection of data from multiple sources, and embedded studies that merge alternative forms of empirical analysis. Methodological needs identified include expanded methodologies for dynamic modeling and models that link micro-level behavior of individuals with macro-level institutions and environments.
MMS workshop participants agreed with the focus on understanding causal relations and on the importance of the data and methodological needs identified in the strategic plan. Indeed, the centrality of longitudinal data for addressing many important human capital questions led to long discussions of design and analysis issues relevant for studies based on panel data. Ultimately, discussions converged on the following topics:
1) Design and Analysis of Longitudinal Data
2) Data Integration
3) Establishment Surveys
4) Data Collection Procedures in Survey Research
5) Survey Research and Measurement Issues
6) Methods for Linking Diverse Approaches to Understanding Behavior
Examples of specific methodological questions related to each of these topics are given below. These topics are intended to be illustrative of some important areas of methodological and/or statistical research relevant to the MMS Program and consistent with the goals and strategic plan of the Human Capital Initiative.
Most longitudinal studies approach the design issues that arise after initial sample selection on an ad hoc basis; as a result, there is little cumulative knowledge or systematic study to bring to bear on new investigations. How and when should attrition be modeled? Should data always be gathered at equally spaced time intervals? As data are gathered over time, the units of measurement change in a dynamic fashion. Designs for continued data collection inevitably require choices that have serious implications for analysis and inference. MMS welcomes proposals addressing questions of design in longitudinal studies.
Beyond questions of design, the analysis of longitudinal data invariably leads to specific methodological problems. Advances on the topics identified below would enhance the value of longitudinal data for addressing complex human capital questions.
Nonparametric methods. Large longitudinal data sets, some containing hundreds of thousands of person-years of observations, often are useful for addressing issues related to HCI. For example, major data sets on labor issues, such as the National Longitudinal Survey (NLS) and the Panel Study of Income Dynamics (PSID), can be used to study the number and duration of poverty spells for individuals with different levels of education. Motivated by the sensitivity of results to specific functional form assumptions, recent research has developed less restrictive procedures for use in large samples. These include classical smoothing procedures, neural networks, and Bayesian hierarchical models. MMS welcomes research that further refines and applies such nonparametric methods to human capital issues.
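As a purely illustrative sketch (the variables and data below are simulated, not drawn from the NLS or PSID), a classical kernel smoother of the kind mentioned above can estimate how expected poverty-spell duration varies with education without imposing a functional form:

    # Nadaraya-Watson kernel regression on simulated (hypothetical) data.
    import numpy as np

    rng = np.random.default_rng(0)
    education = rng.uniform(8, 20, size=5000)                 # years of schooling
    duration = np.exp(2.0 - 0.1 * education) + rng.gamma(2.0, 0.5, size=5000)

    def kernel_smooth(x0, x, y, bandwidth=1.0):
        """Estimate E[y | x = x0] without assuming a functional form."""
        weights = np.exp(-0.5 * ((x - x0) / bandwidth) ** 2)  # Gaussian kernel
        return np.sum(weights * y) / np.sum(weights)

    for g in np.linspace(8, 20, 13):
        fitted = kernel_smooth(g, education, duration)
        print(f"education {g:5.1f} years -> expected spell length {fitted:5.2f}")

Flexible fits of this kind provide a useful check on the functional form assumptions built into parametric models of spell duration.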
Discrete choice. Many individual decisions in the human capital accumulation process are discrete; examples include decisions to leave or return to school and decisions about childbearing. Recent advances have made it possible to study simple models of sequences of discrete choices over time, such as labor market participation. Methods for inference in structural models of dynamic decision making are a promising line of research for understanding the human capital investment process.
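As a hedged illustration of the basic building block involved (all data below are simulated), the following sketch estimates a static binary logit for labor market participation by maximum likelihood; dynamic discrete choice models extend this kind of specification by linking choices across periods and letting current decisions depend on expected future payoffs:

    # Static binary logit for labor market participation, simulated data.
    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(1)
    n = 2000
    X = np.column_stack([np.ones(n),               # intercept
                         rng.normal(12, 3, n),     # years of schooling
                         rng.integers(0, 2, n)])   # young child present
    true_beta = np.array([-1.0, 0.15, -0.8])
    p_true = 1.0 / (1.0 + np.exp(-X @ true_beta))
    participate = (rng.uniform(size=n) < p_true).astype(float)

    def neg_log_likelihood(beta):
        xb = X @ beta
        # -log L = sum log(1 + exp(xb)) - sum y * xb, written stably.
        return np.sum(np.logaddexp(0.0, xb)) - np.sum(participate * xb)

    result = minimize(neg_log_likelihood, np.zeros(3), method="BFGS")
    print("estimated coefficients:", result.x)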
Individual, cohort, and age effects. Several important policy questions address changes over time in individuals' responses to the opportunities available to them, such as propensities to invest in education. Separating secular changes from individual heterogeneity and from changes over the life cycle raises specific methodological questions. The ability to answer these policy questions depends, in large part, on advances in methods for disentangling these effects.
Small-area data estimation. As analysts strive to explain the decisions made by economic agents, there is increasing pressure to move from macro to micro and, ultimately, to individual-level scales of analysis. Moving to higher levels of resolution often requires estimating small-area attribute values (for example, of households or places of work) from larger units of analysis. We are just beginning to understand how to perform these estimations.
Estimating missing values. A particularly pressing problem is the estimation of missing values for individual-level data that generate samples with large numbers of zeros due to privacy concerns and/or missing measurements.
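One widely used strategy for this problem is multiple imputation. The sketch below is a simplified, hypothetical example (simulated data, invented variable names): suppressed income values are filled in repeatedly by regression prediction plus noise, and the repeated estimates are combined with Rubin's rules so that imputation uncertainty is reflected in the standard error:

    # Multiple imputation of suppressed incomes, combined via Rubin's rules.
    import numpy as np

    rng = np.random.default_rng(2)
    n = 1000
    education = rng.normal(13, 2, n)
    income = 5 + 2.5 * education + rng.normal(0, 4, n)
    suppressed = rng.uniform(size=n) < 0.3            # 30% of incomes withheld
    income_obs = np.where(suppressed, np.nan, income)
    observed = ~suppressed

    # Regression of income on education among observed cases.
    slope, intercept = np.polyfit(education[observed], income_obs[observed], 1)
    resid_sd = np.std(income_obs[observed] - (intercept + slope * education[observed]))

    M = 20                                            # number of imputations
    estimates, variances = [], []
    for _ in range(M):
        filled = income_obs.copy()
        # Prediction plus noise, not a deterministic fill, so that the
        # uncertainty of imputation is propagated to the final estimate.
        filled[suppressed] = (intercept + slope * education[suppressed]
                              + rng.normal(0, resid_sd, suppressed.sum()))
        estimates.append(filled.mean())
        variances.append(filled.var(ddof=1) / n)

    W = np.mean(variances)                            # within-imputation variance
    B = np.var(estimates, ddof=1)                     # between-imputation variance
    total_se = np.sqrt(W + (1 + 1 / M) * B)           # Rubin's combining rule
    print("combined mean income:", np.mean(estimates), "SE:", total_se)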
Boundary value estimation and/or transformations. The artificial truncation of a spatial process raises particular concerns about the value of the recorded measurements in the areal units at the boundary. This is similar to the problem of truncation in event history analysis. The corrections that have been proposed are theoretically arbitrary and computationally intensive.
Extraction and exploration techniques. As different kinds of agencies move to collect data in a geo-referenced framework, researchers will have to deal with the computational and storage burden this referencing entails. Even when researchers have no interest in maintaining the geo-referencing, they may require techniques to extract the data from the data set. Further, faced with the volume of data described above, they may need to develop new tools for data visualization as a means of exploring such data and developing preliminary research questions.
"Establishment surveys," in which the units studied are organizations, such as workplaces, schools, hospitals, or agencies of the government, are natural vehicles for addressing such questions. In such surveys, one or more individual informants provide data on behalf of the establishment. Establishment survey data are sometimes integrated with individual-level data on organizational members, with archival data on the places, or with industries that constitute an establishment's setting or competitive environment.
Establishment surveys have long been used as components of systems of national accounts, for the estimation of population totals such as employment or output levels. Scholars in the social sciences are now turning to establishment surveys for different purposes -- to develop, for example, knowledge about work organizations, school processes and effects, the development and diffusion of human resources practices, and innovation, as well as to study schools and work organizations as contexts for learning and skills acquisition.
The methodological literature on establishment surveys is much less extensive than that on surveys of individuals, and many methodological problems involved in such studies have been little-studied or are poorly understood. Other problems result from the changing purposes of establishment surveys and the changing nature of organizational phenomena. Problems requiring attention include, but are not limited to, the following:
Such effects of design and context might be considered as introducing their own components of variance, which are part of the uncertainty of the resulting information but which are not captured in estimates of sampling standard error. That is, if we consider several surveys that made different detailed design choices, those surveys would produce estimates that are much more variable than would be expected from sampling error alone.
Policy decisions typically are concerned with questions that are broader than the particularities of a single data collection design. For example, policy makers want to know how social class is related to reading achievement, not how social class measured in a specific way is related to a score on a particular set of achievement test items. The standard method of calculating uncertainty in information and policy analyses is based on sampling uncertainty; but, for the reasons outlined above, sampling standard errors alone underestimate the uncertainty in the data and in the summaries produced from them.
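The point can be made concrete with a small simulation (all numbers below are hypothetical): when each of several surveys carries its own design-induced shift, the spread of the resulting estimates exceeds what sampling standard errors alone would predict:

    # Ten simulated surveys of the same quantity, each with its own design shift.
    import numpy as np

    rng = np.random.default_rng(3)
    true_value = 0.40              # population proportion being estimated
    n_per_survey = 2500
    design_sd = 0.03               # hypothetical between-design component

    estimates = []
    for _ in range(10):
        design_shift = rng.normal(0, design_sd)       # effect of design choices
        sample = rng.binomial(1, true_value + design_shift, n_per_survey)
        estimates.append(sample.mean())

    sampling_se = np.sqrt(true_value * (1 - true_value) / n_per_survey)
    print("sampling SE alone:             ", round(float(sampling_se), 4))
    print("observed spread across surveys:", round(float(np.std(estimates, ddof=1)), 4))

In this simulation the observed spread across surveys is markedly larger than the nominal sampling standard error, which is the pattern described above.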
Studies that provide insight into more realistic estimates of the uncertainty of information produced by human capital research would be highly desirable. Such studies might include systematic investigations of variations in procedures and models, sampled over a reasonable distribution of variation in those procedures. Studies of actual replications that have already been conducted might serve as "natural experiments." Ideas for general methods that might be broadly applicable would be particularly interesting.
Questions include, for example, whether the current concept of a household adequately captures the diverse living arrangements of Americans, including the phenomena of blended families, children in joint custody, or even the homeless. How can researchers model appropriately the multiple ethnic identities of Americans and discover and test the salience of particular categories? How do concepts and classification schemes taken from existing data systems affect the research process of the secondary data user? What new concepts and classification systems need to be developed to meet the needs of the Human Capital Initiative? How can or should one develop common measurements to capture the experience, for example, at 'home' and 'work'? What measurement schemes are required for units of analysis at different levels of aggregation, for example, poor people versus poor neighborhoods?
This project uses the Gibbs sampler to develop and implement estimation strategies that will enable researchers to obtain robust estimates of parameters and appropriate intervals in applications of hierarchical models with dichotomous outcomes in small-sample social research settings. Guidelines for proper implementation and use of these strategies will be developed through analyses of a series of simulated data sets and through analyses of the data from two studies: a multi-site evaluation of a dropout prevention initiative, and an NSF-funded study of the effects of different mathematics.
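For readers unfamiliar with the approach, the following is a minimal, hedged sketch of a data-augmentation Gibbs sampler for a single-level probit model with a dichotomous outcome; it uses simulated data and omits the group-level random effects that the project's hierarchical models would add:

    # Data-augmentation Gibbs sampler for a probit model, simulated data.
    import numpy as np
    from scipy.stats import truncnorm

    rng = np.random.default_rng(4)
    n = 500
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    true_beta = np.array([-0.5, 1.0])
    y = (X @ true_beta + rng.normal(size=n) > 0).astype(int)

    B0_inv = np.eye(2) / 100.0                 # diffuse N(0, 100 I) prior on beta
    B = np.linalg.inv(X.T @ X + B0_inv)        # conditional posterior covariance
    beta = np.zeros(2)
    draws = []
    for it in range(2000):
        # 1) Draw latent propensities z_i from normals truncated above or
        #    below zero according to the observed dichotomous outcome.
        mu = X @ beta
        lower = np.where(y == 1, 0.0, -np.inf)
        upper = np.where(y == 1, np.inf, 0.0)
        z = truncnorm.rvs(lower - mu, upper - mu, loc=mu, scale=1.0, size=n)
        # 2) Draw beta from its conditional normal posterior given z.
        beta = rng.multivariate_normal(B @ (X.T @ z), B)
        if it >= 500:                          # discard burn-in draws
            draws.append(beta)
    print("posterior means:", np.mean(draws, axis=0))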
SBR-9631387: "Project to Revise the Historical Labor Statistics of the United States"
Susan B. Carter, University of California/Riverside
This project will revise Chapter D (Labor) of the United States Census Bureau's Historical Statistics of the United States. Historical Statistics is a massive, two-volume compendium of 54 chapters on topics touching all of the social, behavioral, humanistic, and natural sciences. This award assists in a major collaborative effort to produce an updated, revised, expanded, and electronically-accessible "millennial edition" of Historical Statistics. In addition to a completed revision of Chapter D, this project will develop a protocol for the revision of the remaining chapters of Historical Statistics.
SBR-9515136: "Improving Within-School and School-Community Systemic Linkages for At-Risk Students"
Kenneth K. Wong, University of Chicago
Larry Hedges, University of Chicago
This project investigates empirically the impact of recent federal reform initiatives legislated by Title I of the Elementary and Secondary Education Act on the narrowing of the achievement gap between educationally at-risk students and their more advantaged peers. The project makes use of the comprehensive, Congressionally mandated Prospects data files, which consist of standardized reading and math achievement scores for a nationally-representative sample of nearly 40,000 students, and detailed information regarding the students themselves, and their schools, classrooms, and families. The investigators' analyses will generate national estimates of the extent and intensity of these reform activities, and will produce empirically-based paradigms for the improvement of federal Title I programs and the schools that serve at-risk students.
SBR-9423018: "Causal Inference Applied to Income Effects"
Donald B. Rubin, Harvard University
Guido Imbens, Harvard University
The objective of this project is to measure validly the treatment effects of giving additional income to low- and middle-income families. The study uses the Massachusetts State Lottery as a natural experiment in which some families are randomly assigned additional income and some are not. Subjects in both the treatment and control groups will be surveyed by mail and by phone. The use of this natural experiment will allow the researchers to make valid inferences about the effects of additional income on these families using a rigorous definition of causality. The data from the surveys will be linked to earnings records from the Social Security Administration.
Carl Amrhein
Department of Geography
University of Toronto
Cheryl Eavey
Methodology, Measurement, and
Statistics Program
National Science Foundation
Stephen Fienberg
Department of Statistics
Carnegie Mellon University
John Geweke
Department of Economics
University of Minnesota
Larry Hedges
Department of Education
University of Chicago
Charles Manski
Department of Economics
University of Wisconsin
Peter Marsden
Department of Sociology
Harvard University
John Sprague
Department of Political Science
Washington University
Thomas Wallsten
Department of Psychology
University of North Carolina, Chapel Hill
- Research on methodological aspects of new or existing procedures for data collection; research to evaluate or compare existing databases and data collection procedures; and the collection of unique databases with cross-disciplinary implications, especially when paired with developments in measurement or methodology.
- The methodological infrastructure of social and behavioral research.
Up-to-date information on the program, including recent awards lists and announcements of special funding opportunities, is available on the MMS Home Page:
The Foundation provides awards for research and education in the sciences and engineering. The awardee is wholly responsible for the conduct of such research and preparation of the results for publication. The Foundation, therefore, does not assume responsibility for the research findings or their interpretation.
The Foundation welcomes proposals from all qualified scientists and engineers and strongly encourages women, minorities, and persons with disabilities to compete fully in any of the research and education related programs described here. In accordance with federal statutes, regulations, and NSF policies, no person on grounds of race, color, age, sex, national origin, or disability shall be excluded from participation in, be denied the benefits of, or be subject to discrimination under any program or activity receiving financial assistance from the National Science Foundation.
Facilitation Awards for Scientists and Engineers with Disabilities (FASED) provide funding for special assistance or equipment to enable persons with disabilities (investigators and other staff, including student research assistants) to work on NSF projects. See the program announcement or contact the program coordinator at (703) 306-1636.
The National Science Foundation has TDD (Telephonic Device for the Deaf) capability, which enables individuals with hearing impairment to communicate with the Foundation about NSF programs, employment, or general information. To access NSF TDD dial (703) 306-0090; for FIRS, 1-800-877-8339.