NSF Award Search: Award # 1815267

Award Abstract # 1815267

CDS&E: Machine Learning for Star Cluster Classification

NSF Org:	AST Division Of Astronomical Sciences
Recipient:	UNIVERSITY OF MASSACHUSETTS
Initial Amendment Date:	October 18, 2018
Latest Amendment Date:	October 18, 2018
Award Number:	1815267
Award Instrument:	Standard Grant
Program Manager:	Nigel Sharp nsharp@nsf.gov (703)292-4905 AST Division Of Astronomical Sciences MPS Directorate for Mathematical and Physical Sciences
Start Date:	November 1, 2018
End Date:	October 31, 2021 (Estimated)
Total Intended Award Amount:	$251,741.00
Total Awarded Amount to Date:	$251,741.00
Funds Obligated to Date:	FY 2019 = $251,741.00
History of Investigator:	Daniela Calzetti (Principal Investigator) calzetti@astro.umass.edu Subhransu Maji (Co-Principal Investigator)
Recipient Sponsored Research Office:	University of Massachusetts Amherst 101 COMMONWEALTH AVE AMHERST MA US 01003-9252 (413)545-0698
Sponsor Congressional District:	02
Primary Place of Performance:	University of Massachusetts Amherst 710 North Pleasant Street Amherst MA US 01003-9305
Primary Place of Performance Congressional District:	02
Unique Entity Identifier (UEI):	VGJHK59NMPK9
Parent UEI:	VGJHK59NMPK9
NSF Program(s):	EXTRAGALACTIC ASTRON & COSMOLO
Primary Program Source:	01001920DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s):	1206, 8084
Program Element Code(s):	121700
Award Agency Code:	4900
Fund Agency Code:	4900
Assistance Listing Number(s):	47.049

ABSTRACT

This is a pilot project to compare and test human visual inspection versus machine learning (ML) and computer vision (CV) methods for identifying young star clusters (YSC) in high resolution images of galaxies. It is the first step towards exploring and optimizing the ML methods so as to build a tool capable of automatic search, classification, and shape measurement of YSC. This study will provide a launch-pad for the full project. It seems likely that ML tools will be the only viable way to handle the vast 'Big Data' databases becoming common in astronomy. An integrated educational component includes summer research for undergraduate students, including a valuable introduction to 'Big Data' issues.

Initial tests will be performed on the two closest galaxies to our own Milky Way (M31 and M33), and then extended to M51 and NGC628, which are further away from us. These are well-studied galaxies for which high-fidelity catalogs already exist, which are available for comparison and calibration. The chosen test galaxies have very different cluster populations, and thus represent key testbeds to validate both the standard (human-based) approach and the future ML approach being developed. ML/CV algorithms to be explored and tested on these images include very deep convolutional neural networks, which will be adapted to provide collective classifications of star clusters. The human-based approach is currently the 'industry standard', and its validation will provide a more secure footing for future investigations of the physics of star formation in external galaxies.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

Pérez, Gustavo and Messa, Matteo and Calzetti, Daniela and Maji, Subhransu and Jung, Dooseok E. and Adamo, Angela and Sirressi, Mattia "StarcNet: Machine Learning for Star Cluster Identification" The Astrophysical Journal, v.907, 2021 https://doi.org/10.3847/1538-4357/abceba

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

The goal of this project was to produce a Machine Learning algorithm to reduce human annotation effort in identifying star clusters in nearby galaxies. The standard process involves human classification of sources in multi-color, high-resolution images of galaxies, after an initial selection of the sources performed with automatic tools. These tools identify potential sources, but also include many contaminants (stars, background galaxies, image artifacts). The contaminants can represent between 50% and 70% of the initial source catalog, depending on the parameters used for the automatic selection. Hence, the automatic catalogs must be visually inspected in order to separate bona fide clusters from contaminants. This part of securing star cluster catalogs is time-consuming, often requiring weeks to months to complete a galaxy, depending on the richness of its cluster population. In addition, different human classifiers can reach different conclusions on the nature of a source. This is because beyond a few Mpc, star clusters are no longer resolved into individual stars, and often appear barely more extended than a point source (e.g., a star). Detailed visualization of the candidate is thus required for obtaining a classification. Because of this problem, Citizen Science approaches have not been successful when applied to galaxies beyond our Local Group.

Machine Learning provides an excellent approach to secure cluster populations while avoiding the bottleneck of human classification, or at least lightening the effort of the human classifiers. In addition to accelerating the pace of classification, reducing a task that normally takes weeks to months to between several minutes and a few hours, Machine Learning offers the advantage of repeatability and homogeneity once the algorithms are trained. Finally, machines do not get tired, while humans do: informal tests have shown that classifications performed by humans late in the day are of lower quality than those performed earlier in the day.

The product of this project, StarcNet, is a custom multi-layer convolutional neural network (CNN) that ingests automatic catalogs of cluster candidates together with multi-band cut-outs of the galaxy images at the position of each candidate, and runs them through three pathways operating at different resolutions (see Figure). Each pathway of the CNN consists of seven convolutional layers; the outputs of the three pathways are then combined to produce a prediction for the candidate.
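As a rough illustration of what feeding one candidate to pathways "at different resolutions" could look like, the sketch below builds three views of a single-band cut-out. It assumes (this is not stated in the report) that each view is a progressively tighter central crop re-expanded to the original pixel grid. The function names and the factors 1, 2 and 4 are hypothetical, and the convolutional layers themselves are not modeled.

```python
# Illustrative sketch only: the crop factors and helper names below are
# assumptions made for this example, not details taken from StarcNet.

def center_crop(image, size):
    """Return the central size x size window of a 2-D list-of-lists image."""
    rows, cols = len(image), len(image[0])
    if size < 1 or size > rows or size > cols:
        raise ValueError("crop size must fit inside the image")
    top = (rows - size) // 2
    left = (cols - size) // 2
    return [row[left:left + size] for row in image[top:top + size]]


def upsample_nearest(image, factor):
    """Enlarge an image by repeating every pixel factor times along each axis."""
    upsampled = []
    for row in image:
        wide_row = [value for value in row for _ in range(factor)]
        upsampled.extend(list(wide_row) for _ in range(factor))
    return upsampled


def multi_scale_views(cutout, factors=(1, 2, 4)):
    """Build one view per factor, all with the same shape as the input cut-out."""
    size = len(cutout)
    if any(len(row) != size for row in cutout):
        raise ValueError("cutout must be square")
    views = []
    for factor in factors:
        if size % factor != 0:
            raise ValueError("cutout size must be divisible by every factor")
        crop = center_crop(cutout, size // factor)
        views.append(upsample_nearest(crop, factor))
    return views


# Toy 8 x 8 single-band cut-out whose pixel value encodes its position.
cutout = [[r * 8 + c for c in range(8)] for r in range(8)]
views = multi_scale_views(cutout)
```

In a real multi-band setting the same cropping would be applied to every band, and each view would be passed to its own stack of convolutional layers before the pathway outputs are combined.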

Tests run on existing (human-classified) star cluster catalogs demonstrate that StarcNet performs as well as human classifiers: when sources are classified into one of four classes, its overall accuracy is 68.6%, while person-to-person agreement among human classifiers is around 70%-75%. For a binary classification task (cluster/non-cluster) the accuracy is significantly higher, around 86% for StarcNet, comparable to the 87% of humans.
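The gap between the four-class and binary figures is expected: once the classes are collapsed, confusing one cluster type with another no longer counts as an error. The sketch below shows the two metrics on made-up labels. It assumes a hypothetical convention in which classes 1 to 3 are cluster types and class 4 is a non-cluster contaminant; the report does not spell out the class definitions, and the numbers are toy values, not StarcNet results.

```python
# Hypothetical label convention (an assumption for this example):
# classes 1-3 are cluster types, class 4 is a non-cluster contaminant.
NON_CLUSTER = 4


def accuracy(true_labels, predicted_labels):
    """Fraction of sources for which the prediction matches the reference label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("label lists must not be empty")
    matches = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return matches / len(true_labels)


def to_binary(labels):
    """Collapse four-class labels into cluster (True) / non-cluster (False)."""
    return [label != NON_CLUSTER for label in labels]


def four_class_and_binary_accuracy(true_labels, predicted_labels):
    """Return (four-class accuracy, binary cluster/non-cluster accuracy)."""
    four_class = accuracy(true_labels, predicted_labels)
    binary = accuracy(to_binary(true_labels), to_binary(predicted_labels))
    return four_class, binary


# Made-up labels for eight sources: reference (human) versus model.
human = [1, 2, 3, 4, 1, 2, 4, 4]
model = [1, 1, 3, 4, 2, 2, 1, 4]
four_class_acc, binary_acc = four_class_and_binary_accuracy(human, model)
# four_class_acc == 0.625, binary_acc == 0.875
```

Two of the three four-class errors in this toy example are confusions between cluster types, so they disappear in the binary score; only the contaminant labeled as a cluster remains an error.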

Comparisons with other published CNN algorithms show that StarcNet outperforms them, at least in this one task. Finally, tests run on the training sets show that the existing samples (about 15,000 classified sources) are barely sufficient for training, and that larger sets are more desirable than homogeneous sets.

The results from this project are published in:
Pérez, G., Messa, M., Calzetti, D., Maji, S., Jung, D., Adamo, A., & Sirressi, M., 'StarcNet: Machine Learning for Star Cluster Identification', 2021, The Astrophysical Journal, 907, 100.

The StarcNet software and training samples are released through GitHub and published in the Astrophysics Source Code Library, record ascl:2106.012.

Last Modified: 11/26/2021
Modified by: Daniela Calzetti

Please report errors in award information by writing to: awardsearch@nsf.gov.
