Extrapolative prediction of small-data molecular properties: quantum mechanics-assisted machine learning

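Materials science has benefited greatly from advances in machine learning (ML) and deep learning (DL). These techniques have transformed the prediction of molecular properties and reshaped traditional computational methods. As indispensable tools for data-driven materials science, ML/DL techniques continue to improve the accuracy and speed of property prediction.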
Fig. 1 Overview of extrapolative prediction of molecular property based on the range of molecular properties and the diversity of molecular structures.
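However, ML/DL techniques face a fundamental contradiction rooted in their inherent difficulty with extrapolation, that is, predicting beyond the available data. The primary goal of data-driven materials exploration is to identify high-performance molecules and materials that do not yet appear in databases. ML/DL models must therefore be able to infer unknown data from existing data alone.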

Fig. 2 Model description used for the benchmark.
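However, materials datasets usually consist of small sets of experimental results and are therefore inevitably biased. It is crucial to determine whether ML/DL models can overcome these biases and extrapolate molecular properties effectively.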

Fig. 3 Evaluation methods for assessing interpolation and extrapolative performance.

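Hajime Shimakawa et al., from the Department of Electrical Engineering & Information Systems, School of Engineering, University of Tokyo, Japan, presented a comprehensive benchmark for assessing extrapolative performance across 12 organic molecular properties. Their large-scale benchmark shows that conventional ML models degrade markedly outside the training distribution of property range and molecular structure, particularly for small-data properties.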
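To make the idea of a property-range extrapolation test concrete, here is a minimal sketch, not the authors' benchmark code. It assumes one simple way to build such a test: train on the lower part of the property range, test on the highest values, and report RMSE divided by the standard deviation of the whole dataset (the normalisation named in the Fig. 4 caption). The function names, the 20% hold-out fraction, the synthetic data, and the ordinary least-squares model are all illustrative assumptions.

```python
import numpy as np


def range_split(y, test_fraction=0.2):
    """Hold out the largest property values as an extrapolation test set.

    Hypothetical helper: the split rule is an assumption for illustration.
    """
    n_test = int(round(len(y) * test_fraction))
    if not 0 < n_test < len(y):
        raise ValueError("test_fraction must leave both sets non-empty")
    order = np.argsort(y)  # ascending property values
    return order[:-n_test], order[-n_test:]


def relative_rmse(y_true, y_pred, sigma_all):
    """RMSE divided by the standard deviation of the full dataset."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)) / sigma_all)


# Synthetic stand-in for a small experimental dataset (assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=200)

train_idx, test_idx = range_split(y)

# Ordinary least squares with an intercept, fitted on the training range only.
A_train = np.column_stack([X[train_idx], np.ones(len(train_idx))])
coef, *_ = np.linalg.lstsq(A_train, y[train_idx], rcond=None)

A_test = np.column_stack([X[test_idx], np.ones(len(test_idx))])
score = relative_rmse(y[test_idx], A_test @ coef, y.std())
print(f"relative RMSE on the held-out property range: {score:.3f}")
```

Because the synthetic target here is truly linear, the linear model extrapolates well; swapping in a model that cannot predict outside the training range would be expected to raise the score sharply on the same split.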

Fig. 4 Evaluation results of the interpolation test using all data points of each dataset and extrapolation tests of property range and molecular structure (cluster) at data size for interpolation Nin = 200 (50 for EBD) with RMSE relative to σall, where σall represents the standard deviation of each dataset as listed in Table 1.

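To address this challenge, they introduced a quantum-mechanical (QM) descriptor dataset called QMex, together with an interactive linear regression (ILR) that incorporates interaction terms between QM descriptors and categorical information on molecular structure. The QMex-based ILR achieved state-of-the-art extrapolative performance while preserving its interpretability.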
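The sketch below illustrates what a linear regression with descriptor-by-category interaction terms can look like. It is not the authors' ILR implementation. It assumes that the categorical structure information is a single integer label per molecule, that interactions are products of each descriptor with a one-hot category indicator, and that the model is fitted with a small ridge penalty. The names `interaction_features` and `fit_ridge`, the penalty value, and the synthetic data are all hypothetical.

```python
import numpy as np


def interaction_features(X, categories, n_categories):
    """Stack descriptors, descriptor-by-category products, and category indicators."""
    onehot = np.eye(n_categories)[categories]  # shape (n, k)
    # Column j * k + c holds descriptor j multiplied by the indicator of category c.
    inter = (X[:, :, None] * onehot[:, None, :]).reshape(len(X), -1)
    return np.hstack([X, inter, onehot])


def fit_ridge(Phi, y, alpha=1e-3):
    """Closed-form ridge regression; the penalty also resolves redundant columns."""
    A = Phi.T @ Phi + alpha * np.eye(Phi.shape[1])
    return np.linalg.solve(A, Phi.T @ y)


# Synthetic data: two structural categories with different slopes and offsets.
rng = np.random.default_rng(0)
n = 300
X = rng.normal(size=(n, 2))            # stand-in for QM descriptors
cats = rng.integers(0, 2, size=n)      # stand-in for structure categories
slopes = np.array([1.0, -3.0])
offsets = np.array([0.5, 2.0])
y = slopes[cats] * X[:, 0] + offsets[cats]

# Model with interaction terms.
Phi = interaction_features(X, cats, 2)
w = fit_ridge(Phi, y)
max_err = float(np.max(np.abs(Phi @ w - y)))

# Plain linear model without interactions, for comparison.
Phi_plain = np.hstack([X, np.ones((n, 1))])
w_plain = fit_ridge(Phi_plain, y)
plain_err = float(np.max(np.abs(Phi_plain @ w_plain - y)))

print(f"max error with interactions: {max_err:.2e}, without: {plain_err:.2e}")
```

Every coefficient still belongs to a named descriptor, category, or descriptor-category pair, so a model of this shape stays readable in the same way an ordinary linear model does, which is consistent with the interpretability claim above.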
Fig. 5 Ratio of models ranking within the top three for each data size Nin.
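Their benchmark results, the QMex dataset, and the proposed model are valuable for improving extrapolative predictions from small experimental datasets and for discovering new materials and molecules that surpass existing candidates. The article was recently published in npj Computational Materials 10: 11 (2024).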

Fig. 6 Model performance comparison for extrapolation tests.

Editorial Summary

Extrapolative prediction of small-data molecular property: Quantum mechanics-assisted machine learning

Materials science has greatly benefited from advancements in machine learning (ML) and deep learning (DL) techniques. These techniques have revolutionized the prediction of molecular properties, building on traditional computational approaches. ML/DL techniques continue to enhance the accuracy and speed of property prediction, serving as indispensable tools for data-driven materials science.

Fig. 7 Summary of ML/DL model selection for interpolation and extrapolation of molecular property prediction.

However, a fundamental contradiction persists in ML/DL techniques regarding their inherent extrapolation difficulty, i.e., the ability to predict beyond the available data. The primary objective of data-driven materials exploration is to identify high-performance molecules/materials that are not yet represented in databases. Hence, ML/DL models must possess the capability to extrapolate to unexplored data solely from the available data. Materials datasets, however, often consist of small sets of experimental results, which inevitably carry biases. It is crucial to determine whether ML/DL models can overcome these biases and effectively extrapolate molecular properties.
Fig. 8 Model performance comparison between QMex-LR and QMex-ILR.
Hajime Shimakawa et al. from the Department of Electrical Engineering & Information Systems, School of Engineering, University of Tokyo, presented a comprehensive benchmark for assessing extrapolative performance across 12 organic molecular properties. Their large-scale benchmark revealed that conventional ML models exhibit remarkable performance degradation beyond the training distribution of property range and molecular structures, particularly for small-data properties. To address this challenge, they introduced a quantum-mechanical (QM) descriptor dataset, called QMex, and an interactive linear regression (ILR), which incorporates interaction terms between QM descriptors and categorical information pertaining to molecular structures. The QMex-based ILR achieved state-of-the-art extrapolative performance while preserving its interpretability. Their benchmark results, QMex dataset, and proposed model serve as valuable assets for improving extrapolative predictions with small experimental datasets and for the discovery of novel materials/molecules that surpass existing candidates. This article was recently published in npj Computational Materials 10: 11 (2024).
Original abstract
Extrapolative prediction of small-data molecular property using quantum mechanics-assisted machine learning
Hajime Shimakawa, Akiko Kumada & Masahiro Sato
Abstract Data-driven materials science has realized a new paradigm by integrating materials domain knowledge and machine-learning (ML) techniques. However, ML-based research has often overlooked the inherent limitation in predicting unknown data: extrapolative performance, especially when dealing with small-scale experimental datasets. Here, we present a comprehensive benchmark for assessing extrapolative performance across 12 organic molecular properties. Our large-scale benchmark reveals that conventional ML models exhibit remarkable performance degradation beyond the training distribution of property range and molecular structures, particularly for small-data properties. To address this challenge, we introduce a quantum-mechanical (QM) descriptor dataset, called QMex, and an interactive linear regression (ILR), which incorporates interaction terms between QM descriptors and categorical information pertaining to molecular structures. The QMex-based ILR achieved state-of-the-art extrapolative performance while preserving its interpretability. Our benchmark results, QMex dataset, and proposed model serve as valuable assets for improving extrapolative predictions with small experimental datasets and for the discovery of novel materials/molecules that surpass existing candidates.

Original article by 計算搬磚工程師. If reposting, please credit 華算科技 as the source and cite: http://www.zzhhcy.com/index.php/2024/03/26/75f1454dda/
