
NSF Org: | OIA OIA-Office of Integrative Activities |
Recipient: |
|
Initial Amendment Date: | September 20, 2010 |
Latest Amendment Date: | September 20, 2010 |
Award Number: | 1028394 |
Award Instrument: | Standard Grant |
Program Manager: | Stephen Meacham smeacham@nsf.gov (703)292-7599 OIA OIA-Office of Integrative Activities O/D Office Of The Director |
Start Date: | October 1, 2010 |
End Date: | September 30, 2016 (Estimated) |
Total Intended Award Amount: | $1,999,503.00 |
Total Awarded Amount to Date: | $1,999,503.00 |
Funds Obligated to Date: |
|
History of Investigator: |
|
Recipient Sponsored Research Office: | 160 ALDRICH HALL IRVINE CA US 92697-0001 (949)824-7295 |
Sponsor Congressional District: |
|
Primary Place of Performance: | 160 ALDRICH HALL IRVINE CA US 92697-0001 |
Primary Place of Performance Congressional District: |
|
Unique Entity Identifier (UEI): |
|
Parent UEI: |
|
NSF Program(s): | CDI TYPE II |
Primary Program Source: |
|
Program Reference Code(s): |
|
Program Element Code(s): |
|
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.083 |
ABSTRACT
Problems in many science and engineering fields center on the study of large, complex systems consisting of interconnected elements with different attributes and function. Such systems are naturally represented as networks, with a global structure consisting of both the topology of connections among network elements and the distribution of attributes (or functions) inherent in the elements themselves. The power of the network representation lies in its ability to express many apparently different kinds of systems within a common formal framework, allowing for cross-application of computational and analytical techniques across fields. While the promise of such cross-applicability is great, advancement has been hindered by the difficulty of bridging the substantive gulf between different areas of research. The goal of this research is to leverage recent developments in three such areas, namely computer, social, and biological networks, to realize the potential of an integrated, interdisciplinary approach to the study of systems with complex network structure.
Our research focuses specifically on the interaction between network topology and attributes. Central problems we will address include: the modeling of interactions between network topology and element attributes or function; the characterization of unknown network structure from imperfect and incomplete data; and the development of associated algorithms which will scale efficiently to large systems. Computational efficiency is an important dimension of our work, as we are dealing with the measurement and analysis of massive network data sets. We will use the techniques we develop to address important problems in three application domains: computer networks and security (e.g., methods for detection of malicious behavior on the Internet); online social networks (e.g., the reproduction of social stratification in online environments); and biological networks (e.g., biological signatures of disease). The result will be a unified collection of methods, software tools, and data sets that will enable and accelerate development in these research areas.
The intellectual merit of this work lies in the joint analysis of network topology and function in attribute-rich networks across fields. The project will lead to the development of practical techniques and methodologies for data collection and analysis that can be applied to many substantively distinct problems. Dissemination of results will be achieved through research publications, publicly available software and data sets, and communication with relevant practitioner communities. The research will be integrated with curriculum development and student advising and will promote interdisciplinary training of students. The project will promote diversity, not only through the synthesis of the research team, but also through enhancing the understanding of phenomena such as segregation and attitude polarization in online environments.
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
Social, biological, and computer networks have many similarities that make it possible to develop computational and statistical methods that are applicable across all three domains. Our research has made a number of advances in this area. These include a new class of statistical network models that directly capture features known to be relevant to network function in social, biological, and computer networks, but not previously amenable to statistical modeling. We also developed and disseminated software tools implementing these new models, which are freely available to the public. Our work has also adapted network analytic techniques originally developed in our research on social networks to biological network analysis, with applications including target selection for novel enzymes with potential biotechnological significance.
One challenge in the study of large networks (such as those arising from online social networking platforms like Facebook) is working with limited data. Principled strategies for sampling network data are critical to addressing this problem, and our work on this project has already had a transformational impact in this area. At a time when most measurement of online social networks was performed heuristically, we introduced a principled method for evaluating the quality and adequacy of link-trace samples (the state-of-the-art approach to network sampling), making it possible to determine when a sampling process can be terminated without compromising data quality. We provided methods for leveraging multiple overlapping networks while sampling, allowing measurement of systems in which no one network is sufficiently well-connected for link-trace sampling to work. We also improved the efficiency of the sampling process by introducing the first principled basis for stratified sampling using link-trace methods, and elucidated the relative efficiency of competing methods for sampling on real-world networks. Unlike many other methods, ours can be used to produce probability samples of nodes in networks without a high-quality "seed" sample to start from; moreover, we have also introduced techniques that allow network structure itself to be estimated using these samples. Our work has helped to launch a large and growing body of work on design estimation using random walk methods that spans the social science, computer science, and engineering literatures, with applications to everything from the study of online social networks to fast algorithms for approximate calculation of relational features in distributed database and communication systems. We made our network sampling software (including crawlers and estimators), as well as the samples of popular online social networks we collected, publicly available; more than 5,000 researchers have already used the latter.
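To make the idea of estimation from random-walk (link-trace) samples concrete, here is a minimal sketch in Python. It is not the project's software. It illustrates one widely used approach: a simple random walk on a connected undirected graph visits nodes with probability proportional to their degree, so sample averages are re-weighted by 1/degree to correct the bias. The toy graph, the function names (build_graph, random_walk, reweighted_mean), and the choice of mean degree as the quantity to estimate are all assumptions made for illustration.

```python
import random

def build_graph(n, extra_edges, rng):
    """Toy graph (assumption): a ring of n nodes plus random chords.
    The ring guarantees the graph is connected."""
    adj = {v: set() for v in range(n)}
    for v in range(n):
        w = (v + 1) % n
        adj[v].add(w)
        adj[w].add(v)
    for _ in range(extra_edges):
        u, v = rng.sample(range(n), 2)
        adj[u].add(v)
        adj[v].add(u)
    return {v: sorted(nbrs) for v, nbrs in adj.items()}

def random_walk(adj, start, steps, rng):
    """Simple random walk: at each step move to a uniformly chosen neighbor."""
    node, visited = start, []
    for _ in range(steps):
        node = rng.choice(adj[node])
        visited.append(node)
    return visited

def reweighted_mean(samples, adj, value):
    """Estimate the mean of value(v) over all nodes from walk samples,
    weighting each sample by 1/degree to undo the degree bias."""
    num = sum(value(v) / len(adj[v]) for v in samples)
    den = sum(1.0 / len(adj[v]) for v in samples)
    return num / den

rng = random.Random(0)
adj = build_graph(200, 400, rng)
degree = lambda v: len(adj[v])

true_mean = sum(degree(v) for v in adj) / len(adj)
samples = random_walk(adj, start=0, steps=50000, rng=rng)
naive = sum(degree(v) for v in samples) / len(samples)   # biased toward high degree
corrected = reweighted_mean(samples, adj, degree)        # bias-corrected estimate

print(f"true mean degree:      {true_mean:.3f}")
print(f"naive walk average:    {naive:.3f}")
print(f"re-weighted estimate:  {corrected:.3f}")
```

The uncorrected walk average overstates the mean degree because high-degree nodes are visited more often; the re-weighted estimate should land much closer to the true value. For simplicity the sketch skips burn-in, convergence diagnostics, and stratification, which are the kinds of issues the paragraph above addresses.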
The project also made contributions in network construction, i.e., the problem of generating synthetic graphs that exhibit certain desired characteristics. Applications of network construction methods include generating synthetic graphs that resemble real-world networks, for the purpose of simulation or anonymization. Within the well-known dk-series framework for characterizing graphs, (i) we developed a novel, flexible, and efficient way to construct 2K graphs and (ii) we proved the hardness of 3K-construction, a problem that was previously open. We also extended the dk-series construction framework to incorporate node attributes in addition to network structure, and to handle directed graphs. We made our software publicly available (for example, we incorporated the 2K construction algorithms in the popular NetworkX Python library).
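As an illustration of 2K construction, the sketch below uses NetworkX's joint-degree generator, which builds a simple graph whose joint degree matrix (the number of edges between nodes of degree k and nodes of degree l) matches a given target. That nx.joint_degree_graph and nx.is_valid_joint_degree are the functions referred to above is an assumption; the small target dictionary and the helper joint_degree_counts are made up for the example. Note the convention that a diagonal entry [k][k] counts each edge twice.

```python
import networkx as nx

# Target joint degree dictionary (illustrative): joint_degrees[k][l] is the
# number of edges joining degree-k nodes with degree-l nodes.
# Diagonal entries count each edge twice.
joint_degrees = {
    1: {4: 1},
    2: {2: 2, 3: 2, 4: 2},
    3: {2: 2, 4: 1},
    4: {1: 1, 2: 2, 3: 1},
}

def joint_degree_counts(G):
    """Recompute the joint degree dictionary of G (hypothetical helper)."""
    counts = {}
    for u, v in G.edges():
        du, dv = G.degree[u], G.degree[v]
        counts.setdefault(du, {}).setdefault(dv, 0)
        counts[du][dv] += 1
        counts.setdefault(dv, {}).setdefault(du, 0)
        counts[dv][du] += 1   # for du == dv this yields the doubled diagonal
    return counts

# Check realizability first, then construct a graph with exactly this 2K profile.
realizable = nx.is_valid_joint_degree(joint_degrees)
G = nx.joint_degree_graph(joint_degrees, seed=0)

print("realizable:", realizable)
print("nodes:", G.number_of_nodes(), "edges:", G.number_of_edges())
print("matches target:", joint_degree_counts(G) == joint_degrees)
```

Different seeds may give different graphs with the same joint degree matrix, which is what makes this kind of generator useful for producing synthetic stand-ins for a measured network.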
Function in many social and biological networks is intimately related to dynamics - the way that networks evolve through time. An important outcome of our research has been the discovery that the mechanisms driving dynamics in some networks follow regular, cyclic patterns. Thus, the speed with which information diffuses through a communication network, or the robustness of such a network to disruption, can systematically vary over the course of the day, week, or year. Identifying and understanding these patterns can inform strategies for effective information dissemination, and may have implications for cybersecurity. We have also exploited these regularities in aggregated mobile communication data to develop new techniques for characterizing cellular traffic in urban areas. These new methods provide an extremely cost-effective way of tracking changes in social or economic activity within cities, and can also be used by cellular providers to provision their networks, or by smart cities to provide new services, such as ride-sharing recommendations.
Last Modified: 12/29/2016
Modified by: Athina Markopoulou