Award Abstract # 1633130
BIGDATA: IA: Acting on Actionable Intelligence: A Learning Analytics Methodology for Student Success Efficacy Studies

NSF Org: IIS
Division of Information & Intelligent Systems
Recipient: SAN DIEGO STATE UNIVERSITY FOUNDATION
Initial Amendment Date: September 15, 2016
Latest Amendment Date: October 20, 2021
Award Number: 1633130
Award Instrument: Standard Grant
Program Manager: Finbarr Sloane
IIS
 Division of Information & Intelligent Systems
CSE
 Directorate for Computer and Information Science and Engineering
Start Date: September 1, 2016
End Date: August 31, 2022 (Estimated)
Total Intended Award Amount: $1,096,196.00
Total Awarded Amount to Date: $1,096,196.00
Funds Obligated to Date: FY 2016 = $1,096,196.00
History of Investigator:
  • Richard Levine (Principal Investigator)
    rlevine@mail.sdsu.edu
  • Juanjuan Fan (Co-Principal Investigator)
  • Bernie Dodge (Co-Principal Investigator)
Recipient Sponsored Research Office: San Diego State University Foundation
5250 CAMPANILE DR
SAN DIEGO
CA  US  92182-1901
(619)594-5731
Sponsor Congressional District: 51
Primary Place of Performance: San Diego State University
5500 Campanile Drive
San Diego
CA  US  92182-7720
Primary Place of Performance Congressional District: 51
Unique Entity Identifier (UEI): H59JKGFZKHL7
Parent UEI: H59JKGFZKHL7
NSF Program(s): Project & Program Evaluation
Primary Program Source: 04001617DB NSF Education & Human Resource
Program Reference Code(s): 7433, 8083, 8244
Program Element Code(s): 726100
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

The research supported by this project will study how instructors, administrators, and education researchers can take advantage of the rich student and student-performance data collected by the university. The data will be used to develop a new statistical model that identifies students in need of help and the sort of help they need. The system builds on statistical models used in personalized medicine to determine the best medical interventions for an individual patient. The research will be carried out by an interdisciplinary team spanning statistics and data science, institutional research, instructional technology, and information technology, which will develop a learning analytics methodology to automate the tasks of data collection and processing, data visualization and summarization, data analysis, and scientific reporting in student success efficacy studies. As part of this development, the concept of individualized treatment effects is introduced as a method to assess the effectiveness of interventions and/or instructional regimes and to provide personalized feedback to students.

More specifically, the research goal of the project is to develop and test new statistical methods for analyzing large sets of student data. The data sets to be analyzed arise from administrative student data collected by San Diego State University. Additionally, the research will develop new methods for cleaning the student information system and learning management system data collected by the university, making the entire analysis procedure more efficient. The technical contribution is a new random forest of interaction trees machine learning method that enables the analysis of treatment effects for individuals and for subgroups (e.g., testing the success of a pedagogical or other intervention both for individual students and for specific subgroups of students). The results of the statistical analysis will be displayed as dashboards that report findings for the assessment of intervention strategies to improve student retention and performance.
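To make the individualized-treatment-effect idea concrete: the project's actual estimator is a random forest of interaction trees, but the same goal can be illustrated with a simpler two-model ("T-learner") sketch using off-the-shelf random forests. Everything below is hypothetical, with synthetic data whose true treatment effect is a constant +5 points:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic observational data: 1000 students, 3 covariates (e.g., prior GPA,
# units attempted, LMS activity), a binary "treatment" (visited tutoring
# center), and an outcome (course score). True treatment effect: +5 points.
n = 1000
X = rng.normal(size=(n, 3))
# Selection bias: students with higher X[:, 0] are more likely to seek help.
treat = (rng.random(n) < 1 / (1 + np.exp(-X[:, 0]))).astype(int)
y = 70 + 4 * X[:, 0] + 2 * X[:, 1] + 5 * treat + rng.normal(scale=2, size=n)

# Fit separate outcome models for treated and control students.
m1 = RandomForestRegressor(n_estimators=200, random_state=0)
m1.fit(X[treat == 1], y[treat == 1])
m0 = RandomForestRegressor(n_estimators=200, random_state=0)
m0.fit(X[treat == 0], y[treat == 0])

# Individualized treatment effect: predicted outcome with vs. without treatment.
ite = m1.predict(X) - m0.predict(X)
print(f"mean estimated ITE: {ite.mean():.2f}")
```

With the constant effect baked into the simulation, the average of the per-student ITE estimates should land near 5; in real student data the ITE varies across students, which is exactly what subgroup identification exploits.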

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH


(Showing: 1 - 10 of 19)
Autenriech, M. and Levine, R. A. and Fan, J. and Guarcello, M. "Stacked Ensemble Learning for Propensity Score Methods in Observational Studies" Journal of Educational Data Mining, v.13, 2021. https://doi.org/10.5281/zenodo.5048425
Beemer, Joshua and Spoon, Kelly and Fan, Juanjuan and Stronach, Jeanne and Frazee, James P. and Bohonak, Andrew J. and Levine, Richard A. "Assessing Instructional Modalities: Individualized Treatment Effects for Personalized Learning" Journal of Statistics Education, v.26, 2018. https://doi.org/10.1080/10691898.2018.1426400
Beemer, Joshua and Spoon, Kelly and He, Lingjun and Fan, Juanjuan and Levine, Richard A. "Ensemble Learning for Estimating Individualized Treatment Effects in Student Success Studies" International Journal of Artificial Intelligence in Education, v.28, 2018. https://doi.org/10.1007/s40593-017-0148-x
Calhoun, Peter and Levine, Richard A. and Fan, Juanjuan "Repeated measures random forests (RMRF): Identifying factors associated with nocturnal hypoglycemia" Biometrics, v.77, 2020. https://doi.org/10.1111/biom.13284
Calhoun, Peter and Su, Xiaogang and Nunn, Martha and Fan, Juanjuan "Constructing Multivariate Survival Trees: The MST Package for R" Journal of Statistical Software, v.83, 2018. https://doi.org/10.18637/jss.v083.i12
Guarcello, Maureen A. and Levine, Richard A. and Beemer, Joshua and Frazee, James P. and Laumakis, Mark A. and Schellenberg, Stephen A. "Balancing Student Success: Assessing Supplemental Instruction Through Coarsened Exact Matching" Technology, Knowledge and Learning, v.22, 2017. https://doi.org/10.1007/s10758-017-9317-0
He, L. and Levine, R. A. and Fan, J. and Beemer, J. and Stronach, J. "Random Forest as a Predictive Analytics Alternative to Regression in Institutional Research" Practical Assessment, Research & Evaluation, v.23, 2018.
He, Lingjun and Levine, Richard A. and Bohonak, Andrew J. and Fan, Juanjuan and Stronach, Jeanne "Predictive Analytics Machinery for STEM Student Success Studies" Applied Artificial Intelligence, v.32, 2018. https://doi.org/10.1080/08839514.2018.1483121
Hillis, Tristan and Guarcello, Maureen A. and Levine, Richard A. and Fan, Juanjuan "Causal inference in the presence of missing data using a random forest-based matching algorithm" Stat, v.10, 2021. https://doi.org/10.1002/sta4.326
Levine, Richard A. and Rivera, Patricia E. and He, Lingjun and Fan, Juanjuan and Bresciani Ludvick, Marilee J. "A learning analytics case study: On class sizes in undergraduate writing courses" Stat, v.12, 2023. https://doi.org/10.1002/sta4.527
Li, Luo and Levine, Richard A. and Fan, Juanjuan "Causal effect random forest of interaction trees for learning individualized treatment regimes with multiple treatments in observational studies" Stat, v.11, 2022. https://doi.org/10.1002/sta4.457

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

A primary project aim was to develop an analytics infrastructure for student success efficacy studies, within which actionable information is delivered to key stakeholders.  Student success efficacy studies are at the center of assessments of pedagogical innovations and intervention strategies. Given the wealth of data now available on students from student information system (SIS) and learning management system (LMS) databases, efficacy studies may also identify and characterize student subgroups that will benefit from student success programs, significantly increasing the odds of successfully progressing through a course or degree program. Such assessments are invaluable to instructors, advisors, and administrators as strategic plans, resource allocation strategies, and curricular maps are developed with an eye on student success and retention.

Predictive analytics methods: Student success efficacy studies fall under an observational or quasi-experimental study setting. Consider a study of the effectiveness of a tutoring center. Students choose whether to visit the center ("treatment") and how often ("treatment dose"). Without random treatment assignment, we cannot control for selection bias as easily as in a randomized controlled trial: differences in performance among students who never visit the center, visit only before each exam, or visit every week may be due to the characteristics of the students voluntarily electing each of these strategies.

The goal is to estimate an individualized treatment effect (ITE): the impact of treatment on each individual student. Students typically have access to many success programs: not just a tutoring center, but also, for example, Supplemental Instruction, instructor/TA office hours, and recitation sections. Our goal then is to identify an optimal treatment regime: which success programs a student should attend, and how often, to optimize success in a course or degree program. The problem is confounded by the massive, disparate, and statistically complex data collected on students, including demographics, academic performance, learning management system activity, participation in co-curricular and extra-curricular activities, course evaluations, survey data, and countless data stores at the college and department level.
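Once ITE-style predictions are available for each available program, designating an optimal treatment regime reduces to choosing, per student, the option with the best predicted outcome. A minimal sketch, with entirely hypothetical students, program names, and numbers:

```python
import numpy as np

# Hypothetical predicted success probabilities for 4 students under three
# support options, as an ITE-style model might produce (values illustrative).
options = ["tutoring", "supplemental_instruction", "office_hours"]
pred = np.array([
    [0.62, 0.71, 0.58],   # student 1
    [0.80, 0.66, 0.75],   # student 2
    [0.55, 0.57, 0.64],   # student 3
    [0.70, 0.70, 0.69],   # student 4 (tie: argmax takes the first best)
])

# The estimated optimal regime assigns each student the option with the
# highest predicted success probability.
regime = [options[i] for i in pred.argmax(axis=1)]
print(regime)
# → ['supplemental_instruction', 'tutoring', 'office_hours', 'tutoring']
```

In practice the predictions would come from the fitted causal model and the regime would also weigh treatment dose and program capacity, but the argmax step is the core of regime designation.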

In this project, we developed novel ensemble learning methods for analyzing observational study data to estimate optimal treatment regimes. These methods optimally combine model predictions (of student success) to characterize successful student subgroups, quantify the impact of treatment for individual students and student subgroups, identify student subgroups that will benefit from a treatment regime, quantify the impact of specific data inputs, and draw causal inferences in the presence of missing data.  We characterized and contrasted our methods with the current state-of-the-art, showing in extensive simulation studies that we outperformed competitors relative to ITE prediction accuracy, variable importance ranking, selection bias (both in predictions and variable selection), and optimal treatment regime designation.  We developed user-friendly software for implementing our new methods, including tutorials.
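One simple way an ensemble can combine model predictions, as described above, is to weight each model inversely to its validation error. The sketch below is a minimal, hypothetical illustration of that idea, not the project's actual stacking procedure; all numbers are made up:

```python
import numpy as np

# Hypothetical validation-set outcomes and predictions from two student
# success models (all numbers illustrative).
y_val = np.array([78.0, 85.0, 62.0, 90.0, 71.0])
pred_a = np.array([75.0, 88.0, 60.0, 87.0, 70.0])   # e.g., random forest
pred_b = np.array([80.0, 82.0, 70.0, 93.0, 65.0])   # e.g., linear model

# Weight each model inversely to its validation mean squared error,
# normalizing so the weights sum to 1.
mse = np.array([np.mean((y_val - p) ** 2) for p in (pred_a, pred_b)])
w = (1 / mse) / (1 / mse).sum()
ensemble = w[0] * pred_a + w[1] * pred_b

print(np.round(w, 3))   # the more accurate model receives the larger weight
```

Full stacked ensembles instead learn the combination weights with a second-stage model, but the inverse-error weighting above captures why combining models can beat any single one.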

Educational assessment and learning analytics applications: We applied our machine learning machinery to evaluate Supplemental Instruction programs, the Math/Stat Learning Center, and active problem-solving sections for large-enrollment STEM courses. We also studied placement exams for core STEM courses, developed a university admissions scoring model to rank students and predict show rates, and predicted enrollment in large-section STEM courses. We provided dashboards for advisors to study STEM program migration and to determine the courses, the grades in those courses, and the semester by which those courses should be taken to maximize success in a STEM program.

Our propagation plan centered on a Data Champions (DC) program: a collaborative environment for evaluating intervention strategies and discussing common goals in student success studies. The PIs and grant-funded students collaborated in developing and offering the DC program; the following resources were derived:

- Training materials: workshops on data resources and data security, survey design and the National Survey of Student Engagement (NSSE), measuring learning dispositions, and identifying student success challenges

- Program guides: expectations for data coaches working with DC teams; the funding structure required to offer a DC program

- Common data set: infrastructure for curating student information databases, including automated updates

- Prototype dashboards/storyboards to deliver actionable information to key stakeholders

- Templates for reporting results, with an emphasis on dashboards, visualizations, and training advisors/instructors in messaging to students

- Publication of methods and applications in machine learning, learning analytics, and higher education journals; presentations at data science conferences

- Funding for seven Master's theses, five doctoral dissertations, and one post-doc in data science; all collaborated on grant-related publications

- DC project presentations on the SDSU Library website; illustrative dashboards on the SDSU ASIR website


The DC program presents a model to train administrators and advisors on appropriate use of institutional data for evaluating student success initiatives and identifying student success challenges.  The projects provide decision-makers a means to objectively evaluate and refine strategic plans and programs, resource allocation strategies, and curricular maps.  The data-informed decision-making culture fostered on campus thus reduces costs, creates successful learning environments for students, and improves recruitment to and persistence in STEM programs.



Last Modified: 01/04/2023
Modified by: Richard A Levine

