
NSF Org: |
CNS Division Of Computer and Network Systems |
Recipient: |
|
Initial Amendment Date: | March 3, 2015 |
Latest Amendment Date: | November 5, 2015 |
Award Number: | 1464104 |
Award Instrument: | Standard Grant |
Program Manager: |
Marilyn McClure
mmcclure@nsf.gov (703)292-5197 CNS Division Of Computer and Network Systems CSE Directorate for Computer and Information Science and Engineering |
Start Date: | May 1, 2015 |
End Date: | April 30, 2019 (Estimated) |
Total Intended Award Amount: | $150,294.00 |
Total Awarded Amount to Date: | $174,319.00 |
Funds Obligated to Date: |
FY 2016 = $24,025.00 |
History of Investigator: |
|
Recipient Sponsored Research Office: |
3640 COLONEL GLENN HWY DAYTON OH US 45435-0002 (937)775-2425 |
Sponsor Congressional District: |
|
Primary Place of Performance: |
3640 Colonel Glenn Highway Dayton OH US 45435-0001 |
Primary Place of Performance Congressional District: |
|
Unique Entity Identifier (UEI): |
|
Parent UEI: |
|
NSF Program(s): |
Special Projects - CNS, CSR-Computer Systems Research |
Primary Program Source: |
01001617DB NSF RESEARCH & RELATED ACTIVIT |
Program Reference Code(s): |
|
Program Element Code(s): |
|
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.070 |
ABSTRACT
This CSR-CRII project responds to the sudden rise of Web robot (a.k.a. Web crawler) traffic on Web systems around the world - from approximately 20% of all requests a decade ago to over 60% today. Present Web system optimizations assume that the traffic they service exhibits human-like patterns. Robot traffic does not follow these patterns, so it may silently degrade the performance, energy efficiency, and scalability of Web systems. The Web continues to evolve towards a social platform where individuals upload extemporaneous thoughts and observations that carry only instantaneous value to organizations, and the Internet of Things is expected to introduce millions of devices that collect data from the Web and submit requests to online services automatically. Both trends mean that robot traffic will only increase in volume and intensity. For this reason, it is essential that we understand the impact of Web robot traffic on modern Web systems and devise technologies capable of mitigating its effect on system performance, energy efficiency, and scalability.
This effort will synthesize our present understanding of robot traffic with machine learning tools, statistical analysis, and data science methods not previously considered in the context of Web traffic analysis and user behavioral modeling. It will improve our ability to understand the impact of robot traffic on Web systems by: (i) devising automatic methods to classify robots by their functionality and by the demands they impose; and (ii) developing novel robot traffic generators, tailored to a specific profile of robot types, that can test how a system reacts to robot traffic of varying intensity and functional type mixtures. The project will also explore a prototype robot-resilient caching system that could lead to immediate performance payoffs for existing Web systems. The project will result in preliminary analytical models, empirical results, and prototype analysis software leading to longer-term research endeavors. Recent data from Web systems that provide services across many Web domains are immediately available for the project.
The results of the project may transform the way Web systems, from single servers to large clouds, are designed and optimized, mitigating the performance, energy efficiency, and financial costs of servicing robots. Students will be strategically recruited to work on this project to broaden participation. Educational activities will teach students useful yet infrequently taught skills in traffic analysis and Web systems security, fostering stronger ties between the knowledge engineering and cybersecurity student and research communities.
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
This project has advanced our understanding of the shape, nature, and implications of web robot traffic on web systems. Specific contributions of this project include: a new approach based on fuzzy methods to identify benign and malicious web robot traffic; an extension of fuzzy rough set theory that automatically selects the ideal features for identifying web robots, customized to a particular web server; a web robot traffic generator able to produce streams of synthetic robot traffic for web systems performance and capacity planning; intelligent caching systems for web servers that mitigate the negative impact web robot traffic has on them; and the realization that request type analysis may be sufficient to identify hidden web robot traffic streams on a web server. The project applied our developments in web crawling and characterization to carry out a successful, large-scale crawl of the English-language Tor dark web that, to the best of our knowledge, is the most comprehensive ever performed. This crawl led to important insights into the structure, nature of content, and relationships behind the types of information stored on the dark web. Our work on automatic feature selection for web robot traffic was reported on by techXplore.
The project has supported 6 graduate students and 5 undergraduate students, including two female PhD students and one female undergraduate student. Two MS students graduated with funding from this project. They are now data scientists at Cisco Systems and LexisNexis Special Services, respectively. Another MS student transferred to our PhD program and has moved on to a new project. Yet another MS student, who is now a DoD SMART Fellow, transitioned to a new project before graduating. He is now a PhD student at Purdue. Both funded PhD students are minority women. One of these students was funded under this effort for the first years of her PhD program before transferring. Her publications under this project led to multiple machine learning internships, including at NEC Research Labs and Amazon. These are exceptional outcomes for PhD students at Wright State University. Both women PhD students are Anita Borg scholars of the Grace Hopper Celebration for Women in Computing and CRA-W grad cohort scholars. They are also regular participants in Ohio Celebration of Women in Computing events. One of the five undergraduate students supported by this project has moved on to graduate studies, and another is completing a BS honors thesis with the intention of continuing to graduate school. The remaining students landed excellent research engineering positions in the greater Dayton, OH area.
Last Modified: 04/22/2019
Modified by: Derek E Doran