Award Abstract # 0721667
SDCI Data New: A Modular Software Framework for Evaluation, Testing, and Cross-Fertilization of Authorship Attribution Techniques

NSF Org: OAC
Office of Advanced Cyberinfrastructure (OAC)
Recipient: DUQUESNE UNIVERSITY OF THE HOLY SPIRIT
Initial Amendment Date: August 27, 2007
Latest Amendment Date: August 27, 2007
Award Number: 0721667
Award Instrument: Standard Grant
Program Manager: Kevin Thompson
kthompso@nsf.gov
(703)292-4220
OAC Office of Advanced Cyberinfrastructure (OAC)
CSE Directorate for Computer and Information Science and Engineering
Start Date: August 15, 2007
End Date: July 31, 2011 (Estimated)
Total Intended Award Amount: $212,000.00
Total Awarded Amount to Date: $212,000.00
Funds Obligated to Date: FY 2007 = $212,000.00
History of Investigator:
  • Patrick Juola (Principal Investigator)
    juola@mathcs.duq.edu
Recipient Sponsored Research Office: Duquesne University
600 FORBES AVENUE
PITTSBURGH
PA  US  15282
(412)396-1537
Sponsor Congressional District: 12
Primary Place of Performance: Duquesne University
600 FORBES AVENUE
PITTSBURGH
PA  US  15282
Primary Place of Performance Congressional District: 12
Unique Entity Identifier (UEI): NGYSJ2L1LZX3
Parent UEI: R8GSG4QZV989
NSF Program(s): SOFTWARE DEVELOPMENT FOR CI
Primary Program Source: app-0107 
Program Reference Code(s): 9216, HPCC
Program Element Code(s): 768300
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

A Modular Software Framework for Evaluation, Testing, and Cross-Fertilization of Authorship Attribution Techniques

PI: Patrick M. Juola, PhD, Duquesne University

Documents do not speak only of their contents; to a trained eye, they can also say much about their author. The field of "authorship attribution" in humanities scholarship has been attending to this for centuries, trying to determine how, and with what accuracy, the author of a document can be identified. Recent developments in corpus linguistics have shown it to be possible to make these determinations automatically by "non-traditional" methods, essentially statistical investigations of the words, phrases, layout, and other features of the document.

Unfortunately, the current state of the art is a confused collection of proposed methods, with little guidance about which methods work, why they work, and under what conditions they work best. We are addressing this by developing a software framework, based on a theoretical model proposed by Juola [23], whose modular design permits functional components to be swapped easily and combined with one another.

By applying a rigorous testing method to the resulting set of (novel) combinations, the project is establishing accuracy benchmarks for various techniques (under the various testing conditions), finding new combinations resulting in improved techniques, and creating "best practices."
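
As an illustration only, the idea of swappable components tested in cross-combination can be sketched in a few lines of Python. This is not the project's software or its API: the three-stage split (normalizer, feature extractor, distance measure), every function name, and the nearest-profile classification rule are assumptions made for the example.

```python
from collections import Counter
from itertools import product
import math

# Hypothetical stage 1: text normalizers (names are illustrative only).
def identity(text):
    return text

def lowercase(text):
    return text.lower()

# Hypothetical stage 2: feature ("event") extractors.
def word_events(text):
    return text.split()

def char_bigrams(text):
    return [text[i:i + 2] for i in range(len(text) - 1)]

def profile(events):
    """Turn a list of events into relative frequencies."""
    counts = Counter(events)
    total = sum(counts.values())
    return {event: n / total for event, n in counts.items()}

# Hypothetical stage 3: distances between two frequency profiles.
def manhattan(p, q):
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in set(p) | set(q))

def cosine_distance(p, q):
    dot = sum(v * q.get(k, 0.0) for k, v in p.items())
    norm = (math.sqrt(sum(v * v for v in p.values()))
            * math.sqrt(sum(v * v for v in q.values())))
    return 1.0 - dot / norm if norm else 1.0

def attribute(unknown, known, normalize, extract, distance):
    """Return the known author whose profile is closest to the unknown text."""
    target = profile(extract(normalize(unknown)))
    return min(
        known,
        key=lambda a: distance(target, profile(extract(normalize(known[a])))),
    )

def benchmark(known, test_set, normalizers, extractors, distances):
    """Score every combination of components on a labeled test set."""
    results = {}
    for n, e, d in product(normalizers, extractors, distances):
        correct = sum(
            attribute(text, known, n, e, d) == author
            for text, author in test_set
        )
        results[(n.__name__, e.__name__, d.__name__)] = correct / len(test_set)
    return results
```

Calling benchmark with two normalizers, two extractors, and two distances yields accuracy figures for all eight combinations. Adding a new component to any one list automatically tests it against every component in the other two, which is the kind of cross-combination the abstract describes.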

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

Juola, Patrick; John Noecker, Jr; Mike Ryan, & Sandy Speer "JGAAP4.0 -- A Revised Authorship Attribution Tool" Digital Humanities 2009, 2009
Juola, Patrick & Mike Ryan "Authorship Attribution, Similarity, and Noncommutative Divergence Measures" Chicago Colloquium on Digital Humanities and Computer Science (DHCS), 2009
Noecker, John & Patrick Juola "An Empirical Study of Linear Separability on Authorship Attribution Feature Spaces" Chicago Colloquium on Digital Humanities and Computer Science (DHCS), 2008
Noecker Jr., John, Mike Ryan, Patrick Juola, Amanda Sgroi, Stacey Levine, & Benjamin Wells "Close Only Counts in Horseshoes and... Authorship Attribution?" Digital Humanities 2009, 2009
Noecker Jr, John & Patrick Juola "Cosine Distance Nearest-Neighbor Classification for Authorship Attribution" Digital Humanities 2009, 2009
Patrick Juola "Authorship Attribution: What mixture of experts says we still don't know" Proceedings of American Association for Corpus Linguistics 2008 (Provo, UT USA), 2008
Patrick Juola "JGAAP: A System for Comparative Evaluation of Authorship Attribution" Chicago Colloquium on Digital Humanities and Computer Science (DHCS), 2008
Ryan, Michael & Patrick Juola "Authorship Attribution, The Large and Small Effect Sizes of Divergence as Classification" Digital Humanities 2009, 2009
