
News Release 97-023

Solution Found to Long-Standing Inconsistencies in Data Analysis


March 24, 1997

This material is available primarily for archival purposes. Telephone numbers or other contact information may be out of date; please see current contact information at media contacts.

Final Exam Question: 40% of students in a high school participated in a special college preparation program, and 40% of students from that high school went on to college. For 50 bonus points, what fraction of participants in the college prep program went on to college?

This is a trick question. Until now, the only way to be sure of the answer would be to violate confidentiality laws and track down the individual students.
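To see why the question cannot be answered from the two percentages alone, it helps to compute the range of answers that the group-level numbers allow. The sketch below is only an illustration of that arithmetic using the classical deterministic bounds; it is not King's method, and the function name `participation_bounds` is made up for this example.

```python
def participation_bounds(share_in_program, share_with_outcome):
    """Return the (lowest, highest) fraction of program participants who
    could have had the outcome, given only the two group-level shares.

    Illustrative helper (hypothetical name); this is the classical
    deterministic-bounds arithmetic, not King's statistical method.
    """
    x, t = share_in_program, share_with_outcome
    if not (0 < x <= 1 and 0 <= t <= 1):
        raise ValueError("shares must be fractions, with a nonzero program share")
    # Lowest case: non-participants account for as much of the outcome as possible.
    lowest = max(0.0, (t - (1 - x)) / x)
    # Highest case: participants account for as much of the outcome as possible.
    highest = min(1.0, t / x)
    return lowest, highest


# The exam question: 40% in the program, 40% went on to college.
print(participation_bounds(0.4, 0.4))  # (0.0, 1.0)
```

With the figures from the exam question, the bounds run from 0 to 1: the aggregate data are consistent with every participant going on to college and with none of them doing so.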

Now, a National Science Foundation (NSF)-supported political scientist has a solution to a long-standing, consequential problem in social science methodology: how to learn about the behavior of individuals when the only information available is on groups.

The solution may have been found in a statistical method developed by Gary King, professor of government at Harvard University. His new algorithm for computer software is reported in a book recently published by Princeton University Press, A Solution To The Ecological Inference Problem: Reconstructing Individual Behavior From Aggregate Data.

King's new method may have a significant impact on a range of research problems, such as epidemiological studies of radon and lung cancer, market research on consumer behavior and implementation of the Voting Rights Act. The American Political Science Association has selected King to receive its Gosnell Award "for the best methodological work in political science in 1995-96" for his research on this subject.

"I expect Gary King's solution will contribute to the production of more accurate, insightful data analysis in a variety of research studies, leading to more informed policy-making and better understanding of our economy and society," Frank Scioli, director of NSF's political science research program, says.

Inferring individual behavior from statistics recorded about groups, known as the "ecological inference problem," was originally posed over 75 years ago. It was the first statistical problem encountered in the new field of political science. Scholars soon recognized the same problem in numerous other areas, and since then researchers have pursued a solution.

"Ecological inference is required whenever surveys are unavailable, unreliable or too expensive," says King. "Surveys cannot address most historical questions unless they are conducted then and there. They are also unreliable for studying controversial issues, such as racial politics, since respondents do not always report their opinions and behaviors accurately."

The ecological inference problem was originally raised in 1919 by scholars seeking to know how women, who were about to gain the vote nationwide, would cast their ballots. Although women had voted in some state elections, and these data were available, the secret ballot and the ecological inference problem prevented analysts from distinguishing the votes of women from the remaining (male) votes in the same electoral precincts.

The United States and other governments produce enormous quantities of statistical data on aggregates such as towns, cities, congressional districts and census blocks. A solution to the ecological inference problem will give researchers and public policy makers the ability to better analyze data and learn about individual behavior.

King tested his method with data sets of groups for which the individual behaviors were known. He made more than 16,000 comparisons between his estimates and the known individual behaviors. NSF provided the support to gather the data and to develop methods for its analysis.

-NSF-

Applications of the Ecological Inference Solution

Several research areas may benefit from the ecological inference solution developed by Gary King, professor of government at Harvard University, with the support of the National Science Foundation.

For more information on his research, see http://gking.harvard.edu.

  • In marketing, researchers know the fraction of married people in each zip code area (from census data), and the number of refrigerators purchased, but need to know what fraction of married people purchase refrigerators.

  • In epidemiology, information is available at the county level on degrees of radon exposure, and the number of people who have lung cancer, but researchers need to know the fraction of individuals with high radon exposure who are diagnosed with lung cancer.

  • In historical research, it is known where working class people lived in Nazi Germany, and the areas that voted for the Nazi party, but scholars need to make ecological inferences to learn about the fraction of working class voters (and others) who voted for the Nazis.

  • In education, researchers who wish to assess the value of school choice programs have measures at the school level, such as the dropout rate or the percent who attend college, as well as the proportion of each private school's students who paid with a voucher. Because of privacy rules, researchers must make ecological inferences to learn about the fraction of voucher students who attend college, or the fraction of non-voucher students who drop out.

  • Ecological inferences are required in several areas of public policy, such as implementing the Voting Rights Act, where courts have required estimates of the degree to which minority groups vote differently than whites.

  • Elected officials need to make ecological inferences when they attempt to determine the policy preferences of different groups of their constituents.

Media Contacts
George Chartier, NSF, (703) 292-8070, email: gchartie@nsf.gov

Program Contacts
Frank P. Scioli, NSF, (703) 292-8762, email: fscioli@nsf.gov

The U.S. National Science Foundation propels the nation forward by advancing fundamental research in all fields of science and engineering. NSF supports research and people by providing facilities, instruments and funding to support their ingenuity and sustain the U.S. as a global leader in research and innovation. With a fiscal year 2023 budget of $9.5 billion, NSF funds reach all 50 states through grants to nearly 2,000 colleges, universities and institutions. Each year, NSF receives more than 40,000 competitive proposals and makes about 11,000 new awards. Those awards include support for cooperative research with industry, Arctic and Antarctic research and operations, and U.S. participation in international scientific efforts.
