Award Abstract # 1926424
RIDIR: Collaborative Research: Bayesian analytical tools to improve survey estimates for subpopulations and small areas

NSF Org: SES
Division of Social and Economic Sciences
Recipient: PRESIDENT AND FELLOWS OF HARVARD COLLEGE
Initial Amendment Date: August 2, 2019
Latest Amendment Date: August 2, 2019
Award Number: 1926424
Award Instrument: Standard Grant
Program Manager: Joseph Whitmeyer
jwhitmey@nsf.gov
(703)292-7808
SES Division of Social and Economic Sciences
SBE Directorate for Social, Behavioral and Economic Sciences
Start Date: September 1, 2019
End Date: August 31, 2022 (Estimated)
Total Intended Award Amount: $310,434.00
Total Awarded Amount to Date: $310,434.00
Funds Obligated to Date: FY 2019 = $310,434.00
History of Investigator:
  • Stephen Ansolabehere (Principal Investigator)
    sda@gov.harvard.edu
Recipient Sponsored Research Office: Harvard University
1033 MASSACHUSETTS AVE STE 3
CAMBRIDGE
MA  US  02138-5366
(617)495-5501
Sponsor Congressional District: 05
Primary Place of Performance: Harvard University
1737 Cambridge Street
Cambridge
MA  US  02138-3016
Primary Place of Performance Congressional District: 05
Unique Entity Identifier (UEI): LN53LCFJFL45
Parent UEI:
NSF Program(s): Data Infrastructure
Primary Program Source: 01001920DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s):
Program Element Code(s): 829400
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.075

ABSTRACT

This project will build a set of tools for in-depth analysis of survey data, making use of and extending statistical methods for estimation in small subgroups. Classical survey methods focus on aggregate population-level estimates, but much more can be learned using small-area estimation. The goal is to build a user-accessible platform for modeling survey data that gives estimates for arbitrary subgroups of the population, along with visualization tools to display estimates of interest. The model will be fit in Stan, a state-of-the-art open-source platform for Bayesian inference, and implemented for the Cooperative Congressional Election Study (CCES). An example of the sort of analysis that could be performed using these methods is a study of how demographic gaps in voting vary by age, education, and state.

The statistical method of multilevel regression and poststratification (MRP) allows inferences for narrow slices of the population. In the terminology of survey methods, MRP is "model-based" in that it uses regression to do partial pooling (smoothing) for small areas and demographic slices, and it is "design-based" in adjusting for variables such as age, sex, ethnicity, and education that are predictive of inclusion in the sample. One reason for extracting inferences for population subgroups using a flexible tool rather than one-time analyses is that key variables can change over time. Multilevel modeling gives the flexibility to adjust for large numbers of predictors, which makes poststratification more effective. As a bonus, this modeling and adjustment enables extraction of estimates of average survey responses for small slices of the population, which can correspond to the very sorts of inferences that consumers particularly want, and which typically are unavailable from surveys without huge sample sizes.
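The two steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the project's Stan model: the partial-pooling step uses a simple precision-weighted shrinkage toward a grand mean in place of a fitted multilevel regression, and the `Cell` fields, variance parameters, and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Cell:
    """One poststratification cell (hypothetical fields for illustration)."""
    state: str
    age_group: str
    n_sample: int       # survey respondents observed in this cell
    mean_sample: float  # raw survey mean in this cell
    n_pop: int          # population count for this cell


def partial_pool(cell_mean: float, n: int, grand_mean: float,
                 sigma_y2: float, sigma_a2: float) -> float:
    """Shrink a raw cell mean toward the grand mean.

    sigma_y2 is the assumed within-cell variance and sigma_a2 the assumed
    between-cell variance. Cells with few respondents are pulled more
    strongly toward the grand mean; empty cells receive the grand mean.
    """
    if n == 0:
        return grand_mean
    w_data = n / sigma_y2
    w_prior = 1.0 / sigma_a2
    return (w_data * cell_mean + w_prior * grand_mean) / (w_data + w_prior)


def poststratify(cells: List[Cell], estimates: List[float],
                 keep: Callable[[Cell], bool] = lambda c: True) -> float:
    """Population-weighted average of cell estimates over a chosen slice."""
    selected = [(c, e) for c, e in zip(cells, estimates) if keep(c)]
    total_pop = sum(c.n_pop for c, _ in selected)
    return sum(e * c.n_pop for c, e in selected) / total_pop


# Usage with made-up numbers: estimate the mean response for one state.
cells = [
    Cell("MA", "18-29", 0, 0.0, 100),
    Cell("MA", "65+", 10, 0.8, 300),
    Cell("TX", "65+", 50, 0.4, 900),
]
pooled = [partial_pool(c.mean_sample, c.n_sample, 0.5, 0.25, 0.025)
          for c in cells]
ma_estimate = poststratify(cells, pooled, keep=lambda c: c.state == "MA")
```

Because the final step is only a weighted average over cells, the same fitted cell estimates can be reused for any slice of the population by changing the `keep` filter.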

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

The rise of large-sample surveys has opened the door to small area estimation using individual-level survey data. The social sciences have relied primarily on aggregate data models, such as ecological regression, ecological inference, and neighborhood models, to estimate the social and political behavior of different types of people at the municipal, county, district, and even state levels. Bayesian methods for small area estimation, such as multilevel regression and poststratification (MRP), combine individual-level survey data with aggregate data to improve on the accuracy of estimates that use aggregate data alone and on the precision of estimates that use survey data alone. This grant has developed the application of MRP to the Cooperative Election Study (CES), an NSF-sponsored survey and one of the largest surveys in the behavioral, political, and social sciences.

Specifically, this grant has sought to solve three technical challenges in developing MRP and similar tools and, in doing so, to develop Bayesian weights for one of the most widely used surveys in the social sciences, the CES. The first challenge is defining the target population to which the CES or any survey can be calibrated. There is no publicly available database that defines the population in terms of the five key demographic characteristics to which surveys are weighted, i.e., age, gender, race and ethnicity, education, and area. This challenge arises because the tables extracted from the Census do not allow a complete cross-tabulation of all five target variables. This project developed a public-use dataset to which any survey can be weighted. In addition, this project developed a synthetic weighting technique that generates the population targets, along with associated computer programs. These data and programs are public-use and can be deployed in conjunction with any survey. They are distributed through the Dataverse (https://dataverse.org/).
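The report does not describe the project's synthetic technique in detail. As an illustration of the underlying problem only, the sketch below uses iterative proportional fitting (raking), one standard way to build a joint table that agrees with known marginal totals when the full cross-tabulation is unavailable. It is not presented as the project's method; the function name and the two-variable example are assumptions.

```python
from typing import List


def ipf(seed: List[List[float]], row_targets: List[float],
        col_targets: List[float], iters: int = 100,
        tol: float = 1e-10) -> List[List[float]]:
    """Iterative proportional fitting on a two-way table.

    seed holds a starting guess for the joint counts (for example, a
    survey cross-tab or a table of ones). Rows and columns are rescaled
    in turn until the table matches both sets of marginal targets.
    The row and column targets must sum to the same total.
    """
    table = [row[:] for row in seed]  # copy so the seed is not modified
    for _ in range(iters):
        # Rescale each row to its target total.
        for i, target in enumerate(row_targets):
            s = sum(table[i])
            if s > 0:
                table[i] = [v * target / s for v in table[i]]
        # Rescale each column to its target total.
        for j, target in enumerate(col_targets):
            s = sum(row[j] for row in table)
            if s > 0:
                for row in table:
                    row[j] *= target / s
        # Columns now match exactly; stop once rows match as well.
        err = max(abs(sum(table[i]) - t) for i, t in enumerate(row_targets))
        if err < tol:
            break
    return table


# Usage with made-up marginals: education (rows) by age group (columns).
joint = ipf([[1.0, 1.0], [1.0, 1.0]], [60.0, 40.0], [30.0, 70.0])
```

With a uniform seed the result is the independence table; a seed taken from a survey cross-tab instead preserves the association structure observed in the survey while matching the population marginals.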

A second challenge is the lack of publicly available code for implementing MRP with commonly used surveys, such as the CES. Past applications of this method have developed code that is specific to a particular research project and is proprietary. This grant supported the development of publicly available computer programs, distributed through the CES project website (https://cces.gov.harvard.edu/explore) and also archived and distributed through the Dataverse. This code allows researchers to specify both the aggregate data structure to which the survey data are weighted and the aggregate and individual-level models.

Third, MRP-generated survey weights have not been available for public use. The CES, the American National Election Study, and the General Social Survey are distributed with conventional survey weights. Such weights are designed to make samples representative of a national population, but they are not valid for smaller areas, such as counties or states. This grant has supported the development of externally validated weights generated using MRP that allow for efficient estimation at the Congressional District (CD), state, and national levels. The weights were validated by comparing the resulting estimates to known quantities from actual voting behavior at the level of the CD and state. These weights are distributed through the CES website and are archived and distributed through the Dataverse.
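The kind of validation described above can be sketched as follows: compute a weighted survey estimate for each area, then summarize how far those estimates fall from known benchmarks. The report does not state which error metrics were used; the bias, mean absolute error, and root mean squared error shown here, along with the function names and example figures, are assumptions for illustration.

```python
import math
from typing import Dict, List


def weighted_mean(values: List[float], weights: List[float]) -> float:
    """Weighted survey estimate for one area."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def validation_errors(estimates: Dict[str, float],
                      benchmarks: Dict[str, float]) -> Dict[str, float]:
    """Compare area-level estimates with known benchmark values.

    Only areas present in both dictionaries are compared.
    """
    areas = sorted(set(estimates) & set(benchmarks))
    diffs = [estimates[a] - benchmarks[a] for a in areas]
    n = len(diffs)
    return {
        "bias": sum(diffs) / n,
        "mae": sum(abs(d) for d in diffs) / n,
        "rmse": math.sqrt(sum(d * d for d in diffs) / n),
    }


# Usage with made-up vote shares for two hypothetical districts.
survey = {"MA-05": 0.62, "TX-07": 0.48}
official = {"MA-05": 0.60, "TX-07": 0.50}
summary = validation_errors(survey, official)
```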

Together, these three technical developments allow researchers to use surveys and aggregate data to produce valid and efficient small area estimates of social and political behaviors. These data are already being used to examine who votes and where there are (or are not) political divisions among demographic groups in local areas, such as counties, CDs, and states. Research from this grant has appeared in the American Political Science Review and other prominent journals.

In addition to addressing the substantial technical challenges in Bayesian weighting methods, this grant has provided valuable career development to young researchers. It has supported training in Bayesian survey methods and weighting techniques for a dozen social scientists, who are using these methods in their PhD thesis work.


Last Modified: 12/28/2022
Modified by: Stephen Ansolabehere
