News Release 15-084
Limiting 'false discoveries'
University of Pennsylvania research, published in Science, helps develop algorithm aimed at improving data mining
![model of a half-million biomarkers related to a type of brain cancer](/news/mmg/media/images/1111-allen-glio-network-lg_f1bdb434-8889-4153-9f7a-e58f047491ad_f.jpg)
This network model shows a half-million biomarkers related to the type of brain cancer known as glioblastoma. The lines represent "conditionally dependent" connections between biomarkers. New algorithms provide greater confidence when making data-driven hypotheses.
Credit: G. Allen/Rice University
![illustration showing relationships between pieces of information based on data mining](/news/mmg/media/images/450876_actual_4783406b-2f27-42ec-81ec-cabc6db853de_f.jpg)
Mining for correlations between millions of pieces of information can reveal vital relationships or predict future outcomes, such as risk factors for a disease or structures of new chemical compounds. New data mining tools, developed in part by Aaron Roth, an assistant professor at the University of Pennsylvania, are helping to prove the legitimacy of data-based hypotheses.
Credit: NSF
![Visualization of daily Wikipedia edits](/news/mmg/media/images/viegas-useractivityonwikipedia_f519c9b3-6c60-40b4-98d6-ee63a75284b9_f.jpg)
A visualization of daily Wikipedia edits, created by IBM. At multiple terabytes in size, the text and images of Wikipedia are an example of big data. Generating hypotheses based on massive datasets can be challenging. New algorithms, created in part by Aaron Roth of the University of Pennsylvania, will help provide researchers with greater confidence when drawing conclusions from data.
Credit: Fernanda B. Viegas / CC BY 2.0 Wikimedia Commons
![Aaron Roth](/news/mmg/media/images/roth-aaron_3dbc9ee9-ef94-46d5-8418-086f6547509e_f.jpg)
Aaron Roth, an assistant professor in the Department of Computer and Information Science in the University of Pennsylvania's School of Engineering and Applied Science, is developing new data mining algorithms aimed at limiting false discoveries and combating science's reproducibility problem.
Credit: University of Pennsylvania
![Aug. 7 Science cover](/news/mmg/media/images/science august 7 2015_f_92115ffd-18f7-4399-bae6-1ce45ec8c731.jpg)
The research by Roth and his collaborators is featured in the Aug. 7, 2015, issue of Science. The cover shows the ruins of the Serapeo market in Pozzuoli, Italy. The hard caprock of Italy's Campi Flegrei geothermal system is made up of a fiber-reinforced rock formed by the mixture of lime with regional volcanic ash known as pozzolana. The caprock shares many physical properties with Roman concrete, helping to explain the rock strains and strengths in the region.
Credit: Photo Copyright Roger Ressmeyer/CORBIS