
NSF Org: | SES Division of Social and Economic Sciences |
Recipient: |
|
Initial Amendment Date: | August 2, 2019 |
Latest Amendment Date: | August 2, 2019 |
Award Number: | 1926424 |
Award Instrument: | Standard Grant |
Program Manager: | Joseph Whitmeyer, jwhitmey@nsf.gov, (703)292-7808, SES Division of Social and Economic Sciences, SBE Directorate for Social, Behavioral and Economic Sciences |
Start Date: | September 1, 2019 |
End Date: | August 31, 2022 (Estimated) |
Total Intended Award Amount: | $310,434.00 |
Total Awarded Amount to Date: | $310,434.00 |
Funds Obligated to Date: |
|
History of Investigator: |
|
Recipient Sponsored Research Office: | 1033 MASSACHUSETTS AVE STE 3 CAMBRIDGE MA US 02138-5366 (617)495-5501 |
Sponsor Congressional District: |
|
Primary Place of Performance: | 1737 Cambridge Street Cambridge MA US 02138-3016 |
Primary Place of Performance Congressional District: |
|
Unique Entity Identifier (UEI): |
|
Parent UEI: |
|
NSF Program(s): | Data Infrastructure |
Primary Program Source: |
|
Program Reference Code(s): | |
Program Element Code(s): |
|
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.075 |
ABSTRACT
In this project, a set of tools will be built for in-depth analysis of survey data, making use of and extending statistical methods for estimation for small subgroups. Classical methods for surveys are focused on aggregate population-level estimates, but we can learn much more using small-area estimation. The goal of this project is to build a user-accessible platform for modeling and visualizing survey data that would give estimates for arbitrary subgroups of the population, along with visualization tools to display estimates of interest. The model would be fit in Stan, a state-of-the-art open-source platform for Bayesian inference, and implemented for the Cooperative Congressional Election Study (CCES). An example of the sort of analysis that could be performed using these methods is a study of how demographic gaps in voting vary by age, education, and state.
The statistical method of multilevel regression and poststratification (MRP) allows inferences for narrow slices of the population. In the terminology of survey methods, MRP is "model-based" in that it uses regression to do partial pooling (smoothing) for small areas and demographic slices, and it is "design-based" in adjusting for variables such as age, sex, ethnicity, and education that are predictive of inclusion in the sample. One reason for extracting inferences for population subgroups using a flexible tool rather than one-time analyses is that key variables can change over time. Multilevel modeling gives the flexibility to adjust for large numbers of predictors, which makes poststratification more effective. As a bonus, this modeling and adjustment enables extraction of estimates of average survey responses for small slices of the population, which can correspond to the very sorts of inferences that consumers particularly want, and which typically are unavailable from surveys without huge sample sizes.
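As a rough illustration of the two ingredients described above, the following Python sketch shows a simple shrinkage formula standing in for partial pooling, and a population-weighted average standing in for poststratification. It is a minimal sketch under stated assumptions, not the project's Stan model: the cell estimates, population counts, function names, and the prior weight k are all hypothetical.

```python
# Hypothetical poststratification cells: (state, age group) -> (modeled estimate, population count).
# All numbers are invented for illustration; real counts would come from Census-based targets.
cells = {
    ("MA", "18-29"): (0.60, 1000),
    ("MA", "30+"): (0.50, 3000),
    ("TX", "18-29"): (0.40, 2000),
    ("TX", "30+"): (0.30, 4000),
}


def partial_pool(cell_mean, n, grand_mean, k=50.0):
    """Shrink a raw cell mean toward the grand mean.

    k is an assumed prior weight (roughly the ratio of within-cell to
    between-cell variance). Cells with few respondents are pulled strongly
    toward the grand mean; cells with many respondents are barely moved.
    """
    return (n * cell_mean + k * grand_mean) / (n + k)


def poststratify(cells, keep=lambda key: True):
    """Population-weighted average of cell estimates over the selected cells."""
    total = sum(count for key, (_, count) in cells.items() if keep(key))
    if total == 0:
        raise ValueError("no population in the selected cells")
    weighted = sum(est * count for key, (est, count) in cells.items() if keep(key))
    return weighted / total


# Estimate for the whole (hypothetical) population and for one state.
national = poststratify(cells)
ma_only = poststratify(cells, keep=lambda key: key[0] == "MA")
print(national, ma_only)
```

In a full MRP analysis the cell estimates would come from a fitted multilevel regression rather than from the one-line shrinkage rule shown here; the poststratification step, however, has this same weighted-average form for any subgroup selected by `keep`.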
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
The rise of large sample surveys has opened the door to small area estimation using individual-level survey data. The social sciences have relied primarily on aggregate data models, such as ecological regression, ecological inference, and neighborhood models, to estimate the social and political behavior of different types of people at the municipal, county, district, and even state levels. Bayesian methods for small area estimation, such as multilevel regression and poststratification (MRP), combine individual-level survey data with aggregate data to improve on the accuracy of estimates that use aggregate data alone and the precision of estimates that use survey data alone. This grant has developed the application of MRP to the Cooperative Election Study (CES), an NSF-sponsored survey and one of the largest surveys in the behavioral, political, and social sciences.
Specifically, this grant has sought to solve three technical challenges in developing MRP and similar tools and, in doing so, to develop Bayesian weights for one of the most widely used surveys in the social sciences, the CES. The first challenge is defining the target population to which the CES or any survey can be calibrated. There is no publicly available database that defines the population to which survey data can be weighted in terms of the five key demographic characteristics to which surveys are weighted, i.e., age, gender, race and ethnicity, education, and area. This challenge arises because the tables extracted from the Census do not allow a complete cross-tabulation of all five target variables to which surveys are calibrated. This project developed a public-use dataset to which any survey can be weighted. In addition, this project developed a synthetic weighting technique that generates the population targets, along with associated computer programs. These data and programs are public-use and can be deployed in conjunction with any survey. They are distributed through the Dataverse (https://dataverse.org/).
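To make the cross-tabulation problem concrete, the sketch below uses iterative proportional fitting (raking), one common way to build a joint table when only marginal totals are published. This is an illustrative stand-in and is not claimed to be the synthetic weighting technique developed under this grant; the function name, the two-variable setup, and all numbers are assumptions.

```python
def ipf(seed, row_targets, col_targets, iters=100, tol=1e-10):
    """Iterative proportional fitting for a two-way table.

    seed        -- starting table (list of lists), e.g. all ones or a survey cross-tab
    row_targets -- known marginal totals for the row variable
    col_targets -- known marginal totals for the column variable
    The two sets of targets are assumed to sum to the same population total.
    """
    table = [row[:] for row in seed]  # copy so the caller's seed is not modified
    for _ in range(iters):
        # Scale each row so that it sums to its target.
        for i, target in enumerate(row_targets):
            row_sum = sum(table[i])
            if row_sum > 0:
                table[i] = [value * target / row_sum for value in table[i]]
        # Scale each column so that it sums to its target.
        for j, target in enumerate(col_targets):
            col_sum = sum(row[j] for row in table)
            if col_sum > 0:
                for row in table:
                    row[j] *= target / col_sum
        # Columns now match exactly; stop once the rows match as well.
        if all(abs(sum(table[i]) - t) < tol for i, t in enumerate(row_targets)):
            break
    return table


# Hypothetical margins: two age groups (rows) by two education levels (columns).
joint = ipf(seed=[[1.0, 1.0], [1.0, 1.0]],
            row_targets=[60.0, 40.0],
            col_targets=[30.0, 70.0])
print(joint)
```

With a uniform seed the fitted table assumes the two variables are independent; a seed taken from a survey cross-tabulation would instead carry that survey's association structure into the synthetic population table.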
A second challenge is the lack of publicly available code for implementing MRP with commonly used surveys, such as the CES. Past applications of this method have developed code that is specific to a particular research project and is proprietary. This grant supported the development of publicly available computer programs, distributed through the CES project website (https://cces.gov.harvard.edu/explore) and also archived and distributed through the Dataverse. This code allows researchers to specify the aggregate data structure to which the survey data are weighted, as well as the aggregate and individual-level models.
Third, MRP-generated survey weights have not been available for public use. The CES, the American National Election Study, and the General Social Survey are distributed with conventional survey weights. Such weights are designed to make samples representative of a national population, but they are not valid for subnational areas, such as counties or states. This grant has supported the development of externally validated weights generated using MRP that allow for efficient estimation at the Congressional District (CD), state, and national levels. Validation of the weights was accomplished by comparing the resulting estimates to known quantities from actual voting behavior at the level of the CD and state. These weights are distributed through the CES website and are archived and distributed through the Dataverse.
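The validation step described above can be sketched as follows: compute weighted estimates for each area and compare them with known quantities for the same areas. The respondent records, district labels, benchmark values, and the choice of mean absolute error as the summary are hypothetical, used only to illustrate the comparison.

```python
from collections import defaultdict


def weighted_estimates(respondents):
    """Weighted mean of a 0/1 outcome per area from (area, outcome, weight) records."""
    numerator = defaultdict(float)
    denominator = defaultdict(float)
    for area, outcome, weight in respondents:
        numerator[area] += outcome * weight
        denominator[area] += weight
    return {area: numerator[area] / denominator[area]
            for area in denominator if denominator[area] > 0}


def mean_absolute_error(estimates, benchmarks):
    """Average absolute gap between estimates and benchmarks over shared areas."""
    shared = estimates.keys() & benchmarks.keys()
    if not shared:
        raise ValueError("no areas in common")
    return sum(abs(estimates[a] - benchmarks[a]) for a in shared) / len(shared)


# Hypothetical survey records: (district, voted for candidate A = 1, survey weight).
respondents = [
    ("MA-01", 1, 1.5), ("MA-01", 0, 0.5),
    ("MA-02", 1, 1.0), ("MA-02", 0, 1.0),
]
# Hypothetical official vote shares for the same districts.
benchmarks = {"MA-01": 0.70, "MA-02": 0.55}

estimates = weighted_estimates(respondents)
mae = mean_absolute_error(estimates, benchmarks)
print(estimates, mae)
```

A smaller error under one set of weights than under another, computed on the same respondents and benchmarks, is the kind of evidence such a comparison provides.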
Together, these three technical developments allow researchers to combine surveys and aggregate data to produce valid and efficient small area estimates of social and political behaviors. These data are already being used to examine who votes and where there are (or are not) political divisions among demographic groups in local areas, such as counties, CDs, and states. Research from this grant has appeared in the American Political Science Review and other prominent journals.
In addition to addressing the substantial technical challenges in Bayesian weighting methods, this grant has provided valuable career development to young researchers. It has supported training in Bayesian survey methods and weighting techniques for a dozen social scientists, who are using these methods in their PhD thesis work.
Last Modified: 12/28/2022
Modified by: Stephen Ansolabehere
Please report errors in award information by writing to: awardsearch@nsf.gov.