Award Abstract # 1464104
CRII: CSR: Towards Understanding and Mitigating the Impact of Web Robot Traffic on Web Systems

NSF Org: CNS
Division Of Computer and Network Systems
Recipient: WRIGHT STATE UNIVERSITY
Initial Amendment Date: March 3, 2015
Latest Amendment Date: November 5, 2015
Award Number: 1464104
Award Instrument: Standard Grant
Program Manager: Marilyn McClure
mmcclure@nsf.gov
(703)292-5197
CNS Division Of Computer and Network Systems
CSE Directorate for Computer and Information Science and Engineering
Start Date: May 1, 2015
End Date: April 30, 2019 (Estimated)
Total Intended Award Amount: $150,294.00
Total Awarded Amount to Date: $174,319.00
Funds Obligated to Date: FY 2015 = $150,294.00
FY 2016 = $24,025.00
History of Investigator:
  • Derek Doran (Principal Investigator)
    derek.doran@wright.edu
Recipient Sponsored Research Office: Wright State University
3640 COLONEL GLENN HWY
DAYTON
OH  US  45435-0002
(937)775-2425
Sponsor Congressional District: 10
Primary Place of Performance: Wright State University
3640 Colonel Glenn Highway
Dayton
OH  US  45435-0001
Primary Place of Performance Congressional District: 10
Unique Entity Identifier (UEI): NPT2UNTNHJZ1
Parent UEI:
NSF Program(s): Special Projects - CNS,
CSR-Computer Systems Research
Primary Program Source: 01001516DB NSF RESEARCH & RELATED ACTIVIT
01001617DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 8228, 9178, 9251
Program Element Code(s): 171400, 735400
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

This CSR-CRII project responds to the sudden rise of Web robot (a.k.a. Web crawler) traffic on Web systems around the world, from approximately 20% of all requests a decade ago to over 60% today. The optimizations in present Web systems assume that the traffic they serve exhibits human-like patterns, which robot traffic does not. As a result, current robot activity on the Web may silently degrade the performance, energy efficiency, and scalability of Web systems. The Web continues to evolve towards a social platform where individuals upload extemporaneous thoughts and observations that carry only momentary value to organizations. The Internet of Things is also expected to introduce millions of devices that collect data from the Web and submit requests to online services automatically. Under both trends, robot traffic will only grow in volume and intensity. For this reason, it is essential that we understand the impact of Web robot traffic on modern Web systems and devise technologies capable of mitigating its impact on system performance, energy efficiency, and scalability.

This effort will synthesize our present understanding of robot traffic with machine learning tools, statistical analysis, and data science methods not previously considered in the context of Web traffic analysis and user behavioral modeling. It will improve our ability to understand the impact of robot traffic on Web systems in two ways. First, it will devise automatic methods to classify robots by their functionality and by the demands they impose. Second, it will develop novel robot traffic generators that can be tailored to a specific profile of robot types, to test how a system reacts to robot traffic of varying intensity and mixtures of functional types. The project will also explore a prototype robot-resilient caching system that could yield immediate performance payoffs for existing Web systems. The project will produce preliminary analytical models, empirical results, and prototype analysis software that will lead to longer-term research endeavors. Recent data from Web systems that provide services across many Web domains are immediately available for the project.

The results of the project may transform the way Web systems, from single servers to large clouds, are designed and optimized. Such changes would reduce the performance, energy-efficiency, and financial costs of servicing robots. Students who work on this project will be strategically recruited to broaden participation. Educational activities will train students in traffic analysis and Web systems security, which are useful yet infrequently taught topics. These activities will foster stronger ties between the knowledge engineering and cybersecurity student and research communities.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH


Derek Doran and Swapna Gokhale "An Integrated Method for Real-Time and Offline Web Robot Detection" Expert Systems, v.33, 2016, p.592, 10.1111/exsy.12184
Kyle Brown and Derek Doran "Contrasting Web Robot and Human Behaviors with Network Models" Journal of Communications, 2018
Kyle Brown and Derek Doran "Realistic Traffic Generation for Web Robots" IEEE International Conference on Machine Learning and Applications, 2017
Mahdieh Zabihimayvan and Derek Doran "Some (Non)-Universal Properties of Web Robot Traffic" IEEE Conference on Information Sciences and Systems, 2018
Mahdieh Zabihi, Reza Sadeghi and Derek Doran "A Soft Computing Approach for Benign and Malicious Web Robot Detection" Expert Systems with Applications, 2017
Nathan Rude and Derek Doran "Request Type Prediction for Web Robot and Internet of Things Traffic" IEEE International Conference on Machine Learning and Applications, 2015, p.988, 10.1109/ICMLA.2015.53
Ning Xie, Kyle Brown, Nathan Rude and Derek Doran "A Soft Computing Prefetcher to Mitigate Cache Degradation by Web Robots" International Symposium on Neural Networks, 2017

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

This project has advanced our understanding of the shape, nature, and implications of web robot traffic on web systems. Specific contributions of this project include:

- a new method, based on fuzzy techniques, for distinguishing benign from malicious web robot traffic;
- an extension of fuzzy rough set theory that automatically selects the best features for identifying web robots on a particular web server;
- a web robot traffic generator able to produce streams of synthetic robot traffic for web systems performance testing and capacity planning;
- intelligent caching systems for web servers that mitigate the negative impact web robot traffic has on them;
- the realization that request type analysis may be sufficient to identify hidden web robot traffic streams on a web server.

The project applied our developments in web crawling and characterization to carry out a large-scale crawl of the English-language Tor dark web. To the best of our knowledge, it is the most comprehensive such crawl ever performed. This crawl led to important insights into the structure of the dark web, the nature of its content, and the relationships among the types of information stored there. Our work on automatic feature selection for web robot traffic was reported on by techXplore.

The project has supported six graduate students and five undergraduate students, including two female PhD students and one female undergraduate student. Two MS students graduated with funding from this project. They are now data scientists at Cisco Systems and LexisNexis Special Services, respectively. Another MS student transferred to our PhD program and has moved on to a new project. Yet another MS student, now a DoD SMART Fellow, transitioned to a new project before graduating and is now a PhD student at Purdue. Both funded PhD students are minority women. One of them was funded under this effort for the first years of her PhD program before transferring. Her publications under this project led to multiple machine learning internships, including at NEC Research Labs and Amazon. These are exceptional outcomes for PhD students at Wright State University. Both women are Anita Borg scholars of the Grace Hopper Celebration of Women in Computing and CRA-W Grad Cohort scholars. They are also regular participants in Ohio Celebration of Women in Computing events. Of the five undergraduate students supported by this project, one has moved on to graduate studies. Another is completing a BS honors thesis with the intention of continuing to graduate school. The remaining students landed excellent research engineering positions in the greater Dayton, OH area.

 


Last Modified: 04/22/2019
Modified by: Derek E Doran

Please report errors in award information by writing to: awardsearch@nsf.gov.
