
NSF Org: IIS Division of Information & Intelligent Systems
Initial Amendment Date: September 15, 2016
Latest Amendment Date: October 20, 2021
Award Number: 1633130
Award Instrument: Standard Grant
Program Manager: Finbarr Sloane, IIS Division of Information & Intelligent Systems, CSE Directorate for Computer and Information Science and Engineering
Start Date: September 1, 2016
End Date: August 31, 2022 (Estimated)
Total Intended Award Amount: $1,096,196.00
Total Awarded Amount to Date: $1,096,196.00
Recipient Sponsored Research Office: 5250 Campanile Dr, San Diego, CA, US 92182-1901, (619) 594-5731
Primary Place of Performance: 5500 Campanile Drive, San Diego, CA, US 92182-7720
NSF Program(s): Project & Program Evaluation
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070
ABSTRACT
The research supported by this project will study how instructors, administrators, and education researchers can take advantage of the rich student and student-performance data collected by the university. The data will be used to develop a new statistical model that identifies students in need of help and the sort of help they need. The system builds on statistical models used in personalized medicine to determine the best medical interventions for an individual patient. The research will be carried out by an interdisciplinary team from statistics and data science, institutional research, instructional technology, and information technology; the team will develop a learning analytics methodology to automate the tasks of data collection and processing, data visualization and summarization, data analysis, and scientific reporting in student success efficacy studies. As part of this development, the concept of individualized treatment effects is introduced as a method to assess the effectiveness of interventions and/or instructional regimes and to provide personalized feedback to students.
More specifically, the research goal of the project is to develop and test new statistical methods for analyzing large sets of student data. The data sets to be analyzed arise from administrative student data collected by San Diego State University. Additionally, the research will develop new methods of data cleaning for the student information system and learning management system data collected by the university, making the entire analysis procedure more efficient. The technical contribution is a new random forest of interaction trees, a machine learning method that enables the analysis of treatment effects for individuals and for subgroups (e.g., testing the success of a pedagogical or other intervention both for individual students and for specific subgroups of students). The results of the statistical analysis will be displayed as dashboards reporting findings on the effectiveness of intervention strategies in improving student retention and performance.
PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
A primary project aim was to develop an analytics infrastructure for student success efficacy studies, within which actionable information is delivered to key stakeholders. Student success efficacy studies are at the center of assessments of pedagogical innovations and intervention strategies. Given the wealth of data now available on students from student information system (SIS) and learning management system (LMS) databases, efficacy studies may also identify and characterize student subgroups that will benefit from student success programs, significantly increasing the odds of successfully progressing through a course or degree program. Such assessments are invaluable to instructors, advisors, and administrators as strategic plans, resource allocation strategies, and curricular maps are developed with an eye on student success and retention.
Predictive analytics methods: Student success efficacy studies fall under an observational or quasi-experimental study setting. Consider a study on the effectiveness of a tutoring center. Students choose whether to visit the center (“treatment”) and how often (“treatment dose”). Without random treatment assignment, we cannot control for selection bias as in a randomized controlled trial: differences in performance among students who never visit the center, visit only before each exam, or visit every week may be due to the characteristics of the students voluntarily electing each of these strategies.
The goal is to estimate an individualized treatment effect (ITE): the impact of treatment on each student. Students typically have access to many success programs: not just a tutoring center but also, for example, Supplemental Instruction, instructor/TA office hours, and recitation sections. Our goal then is to identify an optimal treatment regime: which success programs a student should attend, and how often, to optimize success in a course or degree program. The problem is complicated by the massive, disparate, and statistically complex data collected on students. The data include demographics, academic performance, learning management system activity, participation in co-curricular and extra-curricular activities, course evaluations, survey data, and countless data stores at the college and department level.
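To make the ITE idea concrete, here is a minimal sketch of the simplest style of estimator for this setting: fit one outcome model to treated students and one to untreated students, and take the difference in predictions. All data and names here are hypothetical, and a per-stratum mean stands in for the project's actual random forest of interaction trees.

```python
# Hedged sketch of ITE estimation on observational data via two outcome
# models (a "T-learner"). The project's published method uses random
# forests of interaction trees; a stratum mean is a stand-in model here.

from collections import defaultdict

def fit_outcome_model(records):
    """Mean outcome per covariate stratum (stand-in for a fitted forest)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for stratum, outcome in records:
        sums[stratum] += outcome
        counts[stratum] += 1
    return {s: sums[s] / counts[s] for s in sums}

def estimate_ite(data):
    """data: list of (stratum, treated_flag, outcome). ITE per stratum."""
    mu1 = fit_outcome_model([(s, y) for s, t, y in data if t == 1])
    mu0 = fit_outcome_model([(s, y) for s, t, y in data if t == 0])
    return {s: mu1[s] - mu0[s] for s in mu1 if s in mu0}

# Toy data: stratum "A" students gain 1.0 GPA point from tutoring,
# stratum "B" students gain nothing.
data = [
    ("A", 1, 3.0), ("A", 1, 3.2), ("A", 0, 2.0), ("A", 0, 2.2),
    ("B", 1, 2.5), ("B", 1, 2.7), ("B", 0, 2.5), ("B", 0, 2.7),
]
ite = estimate_ite(data)  # {"A": 1.0, "B": 0.0}
```

A heterogeneous ITE like this is exactly what flags subgroup "A" as benefiting from the program while subgroup "B" does not, which is the information an optimal treatment regime is built from.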
In this project, we developed novel ensemble learning methods for analyzing observational study data to estimate optimal treatment regimes. These methods optimally combine model predictions (of student success) to characterize successful student subgroups, quantify the impact of treatment for individual students and student subgroups, identify student subgroups that will benefit from a treatment regime, quantify the impact of specific data inputs, and draw causal inferences in the presence of missing data. We characterized and contrasted our methods with the current state of the art, showing in extensive simulation studies that our methods outperform competitors in ITE prediction accuracy, variable importance ranking, selection bias (in both prediction and variable selection), and optimal treatment regime designation. We developed user-friendly software, including tutorials, for implementing our new methods.
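The "optimally combine model predictions" step can be illustrated with one common and simple combination rule: weight each ensemble member by its inverse held-out error. This is a generic stand-in, not the weighting scheme published by the project; all numbers and names are illustrative.

```python
# Hedged sketch: combining ensemble member predictions by inverse
# validation-error weights. A generic stand-in for the project's
# model-combination scheme; data and names are illustrative.

def ensemble_weights(val_errors):
    """Inverse-error weights, normalized to sum to 1."""
    inv = [1.0 / e for e in val_errors]
    total = sum(inv)
    return [w / total for w in inv]

def ensemble_predict(predictions, weights):
    """Weighted average of member predictions for one student."""
    return sum(p * w for p, w in zip(predictions, weights))

# Three member models with held-out errors 0.2, 0.1, 0.4: the most
# accurate model (error 0.1) gets the largest weight.
weights = ensemble_weights([0.2, 0.1, 0.4])
pred = ensemble_predict([0.7, 0.9, 0.5], weights)
```

More accurate members dominate the combined prediction while weaker members still contribute, which is the basic trade-off any weighted ensemble of student-success models is managing.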
Educational assessment and learning analytics applications: We applied our machine learning machinery to evaluate Supplemental Instruction programs, the Math/Stat Learning Center, and active problem-solving sections for large-enrollment STEM courses. We also studied placement exams for core STEM courses, developed a university admissions scoring model to rank applicants and predict show rates, and predicted enrollment in large-section STEM courses. We provided dashboards for advisors to study STEM program migration and to determine the courses, the grades in those courses, and the semester by which the courses should be taken to maximize success in a STEM program.
Our propagation plan centered on a Data Champions (DC) program: a collaborative environment for evaluating intervention strategies and discussing common goals in student success studies. The PIs and grant-funded students collaborated in developing and offering the DC program; the following resources were derived:
- Training materials: workshops on data resources and data security, survey design and the National Survey of Student Engagement (NSSE), measuring learning dispositions, and identifying student success challenges
- Program guides: expectations for data coaches working with DC teams; required funding structure to offer a DC program
- Common data set: infrastructure for curating student information databases, including automated updates
- Prototype dashboards/storyboards to deliver actionable information to key stakeholders
- Templates for reporting results, with an emphasis on dashboards, visualizations, and training advisors/instructors in messaging to students
- Publication of methods and applications in machine learning, learning analytics, and higher education journals; presentations at data science conferences
- Funded seven Master's theses, five doctoral dissertations, and one postdoc in data science, all of whom collaborated on grant-related publications
- DC project presentations on the SDSU Library website; illustrative dashboards on the SDSU ASIR website
The DC program presents a model to train administrators and advisors on appropriate use of institutional data for evaluating student success initiatives and identifying student success challenges. The projects provide decision-makers a means to objectively evaluate and refine strategic plans and programs, resource allocation strategies, and curricular maps. The data-informed decision-making culture fostered on campus thus reduces costs, creates successful learning environments for students, and improves recruitment to and persistence in STEM programs.
Last Modified: 01/04/2023
Modified by: Richard A Levine