NSF Award Search: Award # 1703454

Award Abstract # 1703454

SaTC: CORE: Medium: Collaborative: Taming Web Content Through Automated Reduction in Browser Functionality

NSF Org:	CNS Division Of Computer and Network Systems
Recipient:	NORTHEASTERN UNIVERSITY
Initial Amendment Date:	July 5, 2017
Latest Amendment Date:	July 5, 2017
Award Number:	1703454
Award Instrument:	Standard Grant
Program Manager:	Daniela Oliveira doliveir@nsf.gov (703)292-0000 CNS Division Of Computer and Network Systems CSE Directorate for Computer and Information Science and Engineering
Start Date:	September 1, 2017
End Date:	February 28, 2022 (Estimated)
Total Intended Award Amount:	$387,098.00
Total Awarded Amount to Date:	$387,098.00
Funds Obligated to Date:	FY 2017 = $387,098.00
History of Investigator:	Engin Kirda (Principal Investigator) ek@ccs.neu.edu
Recipient Sponsored Research Office:	Northeastern University 360 HUNTINGTON AVE BOSTON MA US 02115-5005 (617)373-5600
Sponsor Congressional District:	07
Primary Place of Performance:	Northeastern University 360 Huntington Avenue MA US 02115-5000
Primary Place of Performance Congressional District:	07
Unique Entity Identifier (UEI):	HLTMVS2JZBS6
Parent UEI:
NSF Program(s):	Secure & Trustworthy Cyberspace
Primary Program Source:	01001718DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s):	025Z, 7434, 7924
Program Element Code(s):	806000
Award Agency Code:	4900
Fund Agency Code:	4900
Assistance Listing Number(s):	47.070

ABSTRACT

Web-based applications executed via web browsers are ubiquitous in everyday life. They underlie our banking, communications, shopping, social networking, tax payments, insurance transactions, and health care interactions. Unfortunately, malicious actors can take advantage of vulnerabilities in web browsers to exploit the user's computer. The consequences of a web browser attack can be severe: web content can execute arbitrary code on the victim's machine. This research project studies how web applications use the features provided by web browsers and how user systems can be protected by restricting unnecessary browser features.

This project addresses web browser security by reducing the browser feature footprint, thereby reducing the browser attack surface and mitigating many classes of attacks. The researchers are building a feature-instrumented browser that reports what functionality is used by a web application. Then, they leverage that information to automatically identify when web applications diverge from their expected behavior and attack the user's browser. To enable users to use the most up-to-date browsers, while protecting them from unnecessary and risky browser features, the research team is building a system to decouple features from the browser.

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

In this project, NEU, ASU, and NCSU investigated techniques for selectively reducing the functionality of modern browsers. Since their introduction, browsers have grown to support a large set of functionality that is accessible to the web pages they render. Unfortunately, this growth has attracted malicious actors, who exploit security vulnerabilities across all parts of the browser to launch attacks. Although modern browsers support many different features, not all of this functionality is required, or even used, by end users.

During this project, we published our work in highly regarded venues, including USENIX, RAID, CCS, IMC, IEEE Security and Privacy, WWW, ISC, and SAC. We also presented our work at the Mozilla Security Research Summit 2019 and the Brave Research Summit 2018, communicating our recent research results directly to browser vendors.

Specifically, we built a dynamic analysis framework hosted inside V8, the JavaScript (JS) engine of the Chrome browser, that logs native function or property accesses during any JS execution. We call this framework VisibleV8 (VV8). At fewer than 600 lines (only 67 of which modify V8’s existing behavior), our patches are lightweight and have been maintained from Chrome versions 63 through 72 without difficulty. VV8 consistently outperforms equivalent inline instrumentation, and it intercepts accesses that are impossible to instrument inline. This comprehensive coverage allowed us to isolate and identify 46 JavaScript namespace artifacts used by JS code in the wild to detect automated browsing platforms, and to discover that 29% of the Alexa top 50k sites load content that actively probes these artifacts. We released all of our code related to this project here: https://github.com/wspr-ncsu/visiblev8
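To make the idea of logging property and function accesses concrete, the following is a toy sketch of inline instrumentation, the approach VV8 is compared against. It is written in Python purely as an analogy and is not how VV8 works (VV8 patches the engine itself). The names LoggingProxy and FakeNavigator are hypothetical, and the webdriver property is used only as an illustrative example of something a script might probe.

```python
class LoggingProxy:
    """Wraps an object and records attribute reads and method calls (toy analogy)."""

    def __init__(self, target, name, log):
        self._target = target
        self._name = name
        self._log = log

    def __getattr__(self, attr):
        # __getattr__ runs only when normal lookup fails, so the private
        # fields set in __init__ are never logged.
        value = getattr(self._target, attr)
        if callable(value):
            def wrapper(*args, **kwargs):
                self._log.append(("call", f"{self._name}.{attr}"))
                return value(*args, **kwargs)
            return wrapper
        self._log.append(("get", f"{self._name}.{attr}"))
        return value


class FakeNavigator:
    """Hypothetical stand-in for a browser object; not a real browser API."""
    webdriver = True

    def get_user_agent(self):
        return "ExampleBrowser/1.0"


log = []
navigator = LoggingProxy(FakeNavigator(), "navigator", log)
navigator.webdriver          # recorded as a property read
navigator.get_user_agent()   # recorded as a function call
print(log)
```

A wrapper like this only sees accesses that go through the wrapped object, which is one reason in-engine logging can observe accesses that inline instrumentation cannot.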

We used VisibleV8 in a number of follow-up research projects, such as one published at IMC 2021 that studied JavaScript obfuscation techniques in the wild. That work relies on a simple but powerful observation: if dynamic analysis of a script’s behavior (specifically, how it interacts with browser APIs) reveals browser API feature usage that cannot be reconciled with static analysis of the script’s source code, then that behavior is obfuscated. To quantify and test this observation, we created a hybrid analysis platform that uses an instrumented Chromium to log all browser API accesses by the scripts executed when a user visits a page. We filtered the API access traces from our dynamic analysis through a static analysis tool that we developed, in order to quantify how much and what kind of functionality is hidden on the web. Applying this methodology across the Alexa top 100k domains, we discovered that 95.90% of the domains we successfully visited contain at least one script that invokes APIs that cannot be resolved from static analysis.
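The observation above can be illustrated with a deliberately simplified sketch. It assumes a dynamic trace given as a list of "Interface.member" strings and approximates static analysis by checking whether the member name appears as a token in the script source. Both the trace format and the token check are assumptions made for illustration; the static analysis tool built in the project is not described here and is certainly more involved.

```python
import re


def unresolved_features(source: str, dynamic_trace: list[str]) -> set[str]:
    """Return traced API features whose member name never appears in the source."""
    # Crude stand-in for static analysis: collect every identifier-like token.
    identifiers = set(re.findall(r"[A-Za-z_$][A-Za-z0-9_$]*", source))
    unresolved = set()
    for feature in dynamic_trace:
        member = feature.rsplit(".", 1)[-1]
        if member not in identifiers:
            unresolved.add(feature)
    return unresolved


# Hypothetical scripts that both end up reading the same property at run time.
plain_script = "var c = document.cookie;"
hidden_script = "var k = 'coo' + 'kie'; var c = document[k];"
trace = ["Document.cookie"]

print(unresolved_features(plain_script, trace))   # nothing unresolved
print(unresolved_features(hidden_script, trace))  # the access is hidden
```

In the second script the property name is assembled at run time, so the access shows up in the dynamic trace but cannot be matched against the source text, which is the mismatch the observation treats as obfuscation.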

We also used VisibleV8 to study the reproducibility of web measurements. In work presented at The Web Conference (WWW) 2021, we investigated how key measurements differ when using naive crawling tool defaults versus careful attempts to match “real” users across the Tranco top 25k web domains. We found that web privacy and security measurements are significantly affected by vantage point and browser configuration. We conclude that unless researchers ensure their web measurement tools match real-world user experience, the research community is likely to systematically miss important signals. For example, we found that browser configuration alone caused shifts in 19% of known ad and tracking domains encountered and altered the loading frequency of up to 10% of distinct JavaScript code units executed. Network vantage points had similar, though less dramatic, effects on the same web metrics. To ensure reproducibility, we carefully documented our methodology and published both our code and collected data.
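As a rough sketch of how two crawl configurations might be compared, the snippet below computes the fraction of observed domains that appear under only one of two configurations. The metric (symmetric difference over union) and the domain names are assumptions chosen for illustration; they are not the measurement definitions used in the published study.

```python
def shifted_fraction(baseline: set[str], variant: set[str]) -> float:
    """Fraction of all observed domains that were seen in only one configuration."""
    union = baseline | variant
    if not union:
        return 0.0
    return len(baseline ^ variant) / len(union)


# Hypothetical tracker domains seen with default settings vs. a tuned browser profile.
default_crawl = {"a.example", "b.example", "c.example", "d.example"}
tuned_crawl = {"a.example", "b.example", "c.example", "e.example"}

print(shifted_fraction(default_crawl, tuned_crawl))
```

Here two of the five distinct domains are seen under only one configuration, so a study relying on either crawl alone would miss part of the picture.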

In November 2021, we performed an empirical analysis of the evolution of browser features in order to evaluate browser fingerprintability. By analyzing 33 Google Chrome, 31 Mozilla Firefox, and 33 Opera major browser versions released from 2016 to 2020, we discovered that each of these browsers has a unique feature set that distinguishes it from the others. By comparing these features to the fingerprinting APIs presented in the literature, we concluded that all of these browser versions are uniquely fingerprintable. Our results show an alarming trend: browsers are becoming more fingerprintable over time because newer versions contain more fingerprintable APIs than older ones.
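The uniqueness claim can be sketched as a simple check: if no two browser versions expose exactly the same feature set, then the feature set alone identifies the version. The function name, version labels, and feature names below are hypothetical placeholders, not data from the study.

```python
from collections import defaultdict


def fingerprint_collisions(features_by_version: dict[str, set[str]]) -> list[list[str]]:
    """Group versions that expose an identical feature set; empty means all are unique."""
    groups = defaultdict(list)
    for version, features in features_by_version.items():
        groups[frozenset(features)].append(version)
    return [sorted(versions) for versions in groups.values() if len(versions) > 1]


# Hypothetical feature inventories for three browser versions.
inventory = {
    "browser-1": {"featureA", "featureB"},
    "browser-2": {"featureA", "featureB", "featureC"},
    "browser-3": {"featureA", "featureC"},
}

collisions = fingerprint_collisions(inventory)
print(collisions)  # no two versions share a feature set in this toy data
```

An empty result means every version in the inventory is distinguishable by its features; any non-empty group would list versions that cannot be told apart this way.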

Overall, the results of this research project have made a significant contribution to improving the security of the web. Furthermore, we expect that the results of this research, which were made open-source, will be used by future researchers to continue improving the security of the web for all.

Last Modified: 07/01/2022
Modified by: Engin Kirda

Please report errors in award information by writing to: awardsearch@nsf.gov.
