Active Learning For Planet Habitability Classification Under Extreme Class Imbalance

editor · Astrobiology · 21 hours ago · 4 Views


Lower-triangular Pearson correlation matrix of the final feature set used in this study. Each cell shows the Pearson correlation coefficient between a pair of planetary or stellar properties, with the color scale ranging from −1 (strong negative correlation) to +1 (strong positive correlation). The diagonal elements represent self-correlations. To improve readability, only moderate to strong correlations (|r| ≥ 0.5) are annotated. — astro-ph.EP
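As a rough illustration of how such a figure can be assembled, the sketch below computes Pearson coefficients for the lower triangle of a feature set and marks which cells would be annotated under the |r| ≥ 0.5 rule. The column names and values are invented for the example and are not taken from the study's data.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences.

    Assumes neither sequence is constant (zero variance would divide by zero).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def lower_triangle(columns, annotate_at=0.5):
    """Return (row, col, r, annotate) tuples for the lower triangle, diagonal included."""
    names = list(columns)
    cells = []
    for i, row_name in enumerate(names):
        for col_name in names[: i + 1]:
            r = pearson(columns[row_name], columns[col_name])
            cells.append((row_name, col_name, r, abs(r) >= annotate_at))
    return cells

# Hypothetical toy features, chosen only to give one strong and one null correlation.
columns = {
    "radius": [1.0, 2.0, 3.0, 4.0],
    "mass": [2.0, 4.0, 6.0, 8.0],
    "toy_signal": [1.0, -1.0, -1.0, 1.0],
}
rows = lower_triangle(columns)
for row_name, col_name, r, annotate in rows:
    label = f"{r:+.2f}" if annotate else "(not annotated)"
    print(f"{row_name:>10} vs {col_name:<10} {label}")
```

Only the lower triangle is computed because the matrix is symmetric; the upper half would repeat the same values.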

The increasing size and heterogeneity of exoplanet catalogs have made systematic habitability assessment challenging, particularly given the extreme scarcity of potentially habitable planets and the evolving nature of their labels.

In this study, we explore the use of pool-based active learning to improve the efficiency of habitability classification under realistic observational constraints. We construct a unified dataset from the Habitable World Catalog and the NASA Exoplanet Archive and formulate habitability assessment as a binary classification problem.

A supervised baseline based on gradient-boosted decision trees is established and optimized for recall in order to prioritize the identification of rare potentially habitable planets.
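The emphasis on recall can be motivated with a small worked example. The sketch below does not reproduce the authors' gradient-boosted model; it only compares accuracy and recall for two hypothetical sets of predictions on an invented catalog of 100 planets, 2 of which are labeled potentially habitable.

```python
def confusion_counts(y_true, y_pred):
    """Count true positives, false positives, false negatives, true negatives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

def accuracy(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)

def recall(y_true, y_pred):
    """Fraction of truly positive items that were flagged (0.0 if there are none)."""
    tp, _, fn, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0

# Invented labels: 2 potentially habitable planets among 100.
y_true = [1, 1] + [0] * 98

# Hypothetical predictions: never flag anything, or flag 10 planets including both positives.
all_negative = [0] * 100
flag_ten = [1, 1] + [1] * 8 + [0] * 90

for name, y_pred in [("all negative", all_negative), ("flag ten", flag_ten)]:
    print(f"{name:>12}: accuracy={accuracy(y_true, y_pred):.2f} "
          f"recall={recall(y_true, y_pred):.2f}")
```

In this toy setting the trivial "all negative" predictions score higher on accuracy while missing every rare positive, which is why a recall-oriented objective is a natural fit for extreme imbalance.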

This model is then embedded within an active learning framework, where uncertainty-based margin sampling is compared against random querying across multiple runs and labeling budgets. We find that active learning substantially reduces the number of labeled instances required to approach supervised performance, demonstrating clear gains in label efficiency.
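The query step that distinguishes the two strategies can be sketched as follows. This is a minimal illustration for a binary classifier, assuming the model exposes a positive-class probability for each unlabeled planet in the pool; the function names and the probabilities are hypothetical, and the surrounding retrain-and-evaluate loop is omitted.

```python
import random

def margin(prob_pos):
    """Gap between the two class probabilities of a binary classifier."""
    return abs(prob_pos - (1.0 - prob_pos))

def margin_query(pool_probs, batch_size):
    """Indices of the pool items with the smallest margin (most uncertain first)."""
    ranked = sorted(range(len(pool_probs)), key=lambda i: margin(pool_probs[i]))
    return ranked[:batch_size]

def random_query(pool_size, batch_size, seed=0):
    """Baseline: indices drawn uniformly at random without replacement."""
    return random.Random(seed).sample(range(pool_size), batch_size)

# Hypothetical positive-class probabilities for six unlabeled planets.
pool_probs = [0.02, 0.48, 0.97, 0.55, 0.10, 0.61]

picked = margin_query(pool_probs, batch_size=2)
baseline = random_query(len(pool_probs), batch_size=2)
print("margin sampling asks for labels of:", picked)
print("random querying asks for labels of:", baseline)
```

Margin sampling spends the labeling budget on the planets the current model is least able to separate, whereas random querying ignores the model entirely.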

To connect these results to a practical astronomical use case, we aggregate predictions from independently trained active-learning models into an ensemble and use the resulting mean probabilities and uncertainties to rank planets originally labeled as non-habitable. This procedure identifies a single robust candidate for further study, illustrating how active learning can support conservative, uncertainty-aware prioritization of follow-up targets rather than speculative reclassification.
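A sketch of this aggregation step is shown below. The abstract does not state the exact ranking criterion, so the ordering used here (highest mean probability first, ties broken by smaller spread) is an assumption, as are the planet names and probabilities.

```python
from statistics import mean, pstdev

def rank_candidates(model_probs, labels, names):
    """Rank planets labeled non-habitable (label 0) by ensemble mean probability.

    model_probs: one list of positive-class probabilities per trained model.
    Returns (name, mean probability, spread across models) tuples.
    """
    ranked = []
    for j, name in enumerate(names):
        if labels[j] != 0:
            continue  # only planets originally labeled non-habitable are ranked
        votes = [probs[j] for probs in model_probs]
        ranked.append((name, mean(votes), pstdev(votes)))
    # Assumed ordering: high mean first, then low disagreement between models.
    ranked.sort(key=lambda row: (-row[1], row[2]))
    return ranked

# Hypothetical planets and predictions from three independently trained models.
names = ["planet-a", "planet-b", "planet-c", "planet-d"]
labels = [0, 1, 0, 0]
model_probs = [
    [0.90, 0.95, 0.20, 0.50],
    [0.80, 0.90, 0.10, 0.10],
    [0.85, 0.99, 0.30, 0.90],
]

ranking = rank_candidates(model_probs, labels, names)
for name, avg, spread in ranking:
    print(f"{name}: mean={avg:.2f} spread={spread:.2f}")
```

Reporting the spread alongside the mean keeps the ranking conservative: a planet such as the invented `planet-d`, on which the models disagree strongly, is visibly less robust than one with a similar mean and little disagreement.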

Our results indicate that active learning provides a principled framework for guiding habitability studies in data regimes characterized by label imbalance, incomplete information, and limited observational resources.

R. I. El-Kholy, Z. M. Hayman

Comments: 19 pages, 9 figures, 2 tables
Subjects: Earth and Planetary Astrophysics (astro-ph.EP); Instrumentation and Methods for Astrophysics (astro-ph.IM); Machine Learning (cs.LG)
Cite as: arXiv:2602.23666 [astro-ph.EP] (or arXiv:2602.23666v1 [astro-ph.EP] for this version)
https://doi.org/10.48550/arXiv.2602.23666
Submission history
From: Reham El-Kholy PhD
[v1] Fri, 27 Feb 2026 04:18:11 UTC (398 KB)
https://arxiv.org/abs/2602.23666

