Enhancing Physical Reliability of Machine Learning Potentials in Molecular Systems
ABSTRACT
Estimating the energies of molecular systems is a central task for machine learning potentials. Traditional methods often fall short because the available datasets are limited and skewed. We introduce a molecular representation learning method that mitigates these data constraints. By combining physics-based parameter estimation with self-supervised learning techniques similar to masked language modeling, the method significantly improves prediction accuracy when training data are limited. We also propose diverse evaluation metrics that extend beyond the typical focus on energy or force accuracy. Our comprehensive experiments show the method's superior capability in identifying molecular structures and its potential for mapping unexplored chemical reaction pathways, marking a significant advance in machine learning potentials for molecular systems.