Quality Analysis

Research on the application of machine learning related techniques in the classification of Astragali Radix characterized by flavonoids

  • 1. National Institutes for Food and Drug Control, Beijing 102629, China;
    2. Beijing Institute for Drug Control, Beijing 102206, China
First author: SHI Yan, Tel: (010)53852081; E-mail: shiyan@nifdc.org.cn
LI Ning, Tel: 13811671528; E-mail: 642781540@qq.com
* Tel: (010)53852020; E-mail: weifeng@nifdc.org.cn

Received date: 2023-07-19

Online published: 2024-06-20

Cite this article

SHI Yan, LI Ning, WEI Feng, MA Shuangcheng. Research on the application of machine learning related techniques in the classification of Astragali Radix characterized by flavonoids[J]. Chinese Journal of Pharmaceutical Analysis, 2024, 44(5): 866-873. DOI: 10.16155/j.0254-1793.2024.05.15

Abstract

Objective: To establish a three-class classification model for cultivated, semi-wild, and wild Astragali Radix characterized by flavonoid components, and to explore and evaluate the application of automated machine learning and data augmentation techniques in the field of drug analysis. Methods: First, correlation analysis and principal component analysis were performed on the flavonoid content data of Astragali Radix, and decision tree and logistic regression models were built to assess the importance of each flavonoid component. Then, the TVAE tabular data generation algorithm was used to generate 600 batches of virtual data from the real data. The automated learning framework AutoGluon, with num_bag_folds set to 5, was trained separately on the 64 batches of real data and the 600 batches of virtual data, yielding 2 groups totaling 30 models, which were evaluated by accuracy. Results: Analysis of the machine learning models indicated that three flavonoids, formononetin, calycosin glucoside, and ononin, are important for the quality control of Astragali Radix, especially for control of its source grade. The prediction accuracy of the 30 models in the 2 groups showed that NeuralNet-based models and tree-based machine learning algorithms gave the best classification performance for Astragali Radix represented by flavonoid content data. The virtual data generated by data augmentation were basically consistent with the real data in the accuracy trend of the resulting trained models. Conclusion: Machine learning related techniques have good application value in the classification of Astragali Radix characterized by flavonoids.
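
The role of num_bag_folds = 5 in the Methods can be illustrated with a minimal sketch of the out-of-fold evaluation that underlies k-fold bagging: the 64 batches are dealt into 5 folds, a model is trained on 4 folds and scored on the held-out fold, and accuracy is computed over all held-out predictions. This is not the AutoGluon implementation. The nearest-centroid classifier is a toy stand-in for the models in the study, and the feature values are randomly generated placeholders, not the measured flavonoid contents; all function names are hypothetical.

```python
import random
from statistics import mean

CLASSES = ["cultivated", "semi-wild", "wild"]


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and deal them into k nearly equal folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class equals the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def fit_centroids(X, y):
    """Toy stand-in model (assumption): one mean feature vector per class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids


def predict(centroids, x):
    """Assign x to the class whose centroid is nearest (squared Euclidean)."""
    return min(
        centroids,
        key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
    )


def out_of_fold_accuracy(X, y, k=5, seed=0):
    """Train on k-1 folds, predict the held-out fold, score all predictions."""
    oof = [None] * len(X)
    for held_out in k_fold_indices(len(X), k, seed):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        model = fit_centroids([X[i] for i in train], [y[i] for i in train])
        for i in held_out:
            oof[i] = predict(model, X[i])
    return accuracy(y, oof)


# Placeholder data (assumption): 64 rows, 4 made-up features, 3 classes.
rng = random.Random(42)
X, y = [], []
for i in range(64):
    label = CLASSES[i % 3]
    shift = CLASSES.index(label) * 5.0  # artificial class separation
    X.append([shift + rng.gauss(0, 1) for _ in range(4)])
    y.append(label)

print([len(f) for f in k_fold_indices(64, 5)])  # fold sizes: [13, 13, 13, 13, 12]
print(out_of_fold_accuracy(X, y, k=5))
```

Because every batch is predicted only by a model that never saw it during training, the resulting accuracy is a validation estimate rather than a training score, which matters when only 64 real batches are available.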
