Ingredient Analysis

Prediction of the cultivation age of American ginseng based on random forest

  • National Institutes for Food and Drug Control, Beijing 102629, China

Received date: 2021-07-14

Online published: 2024-06-24

Abstract

Objective: To authenticate the cultivation age of American ginseng (AG) with a random forest (RF) algorithm based on the physicochemical properties of AG.

Methods: Nine physicochemical properties measured in 106 batches of AG samples aged 2 to 4 years constituted the data set. These features, namely the contents of five saponins (Rg1, Re, Rb1, Rd, and F11), the contents of ethanol and aqueous extractives, and the length and weight of AG, were used as inputs to the machine learning models. The data were randomly divided into a training set and a validation set at a ratio of 4:1. RF was used to build the prediction model, and multivariate linear regression (MLR) served as the benchmark algorithm. Feature importance was ranked by impurity in RF and by regression coefficients in MLR, and the most important features were then used as new inputs to build modified models.

Results: In the preliminary comparison, RF performed better than MLR. Feature importance analysis showed that five features, namely length, weight, content of aqueous extractives, content of ethanol extractives, and Rb1, contributed most to the predictive models. The two modified models trained on these five features were more accurate than the original models. The modified RF model performed best, with an MSE of 0.017 and an R² of 0.950 on the validation set, which is acceptable for authenticating the growth year of AG.

Conclusion: The modified RF model built in this study is sufficiently accurate to serve as a valuable tool for predicting the cultivation age of AG.
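The Methods describe a standard workflow: a random 4:1 split, an RF regressor, impurity-based feature ranking, refitting on the top-ranked features, and validation MSE and R². The abstract does not give the authors' software or hyperparameters, so the following is only a minimal standard-library Python sketch of that workflow under stated assumptions: `ForestRegressor`, `build_tree`, and `mse_r2` are hypothetical names; the settings (30 trees, depth 6, about one third of the features tried per split) are illustrative choices, not the authors'; the MLR benchmark is omitted; and the data are synthetic, not the 106 AG batches.

```python
import random
from statistics import fmean


def node_impurity(ys):
    """Regression impurity of a node: mean squared deviation from the node mean."""
    m = fmean(ys)
    return sum((v - m) ** 2 for v in ys) / len(ys)


def build_tree(X, y, idx, depth, cfg, rng, importance):
    """Grow one regression tree; returns a leaf value or (feature, threshold, left, right)."""
    ys = [y[i] for i in idx]
    imp = node_impurity(ys)
    if depth >= cfg["max_depth"] or len(idx) < 2 * cfg["min_leaf"] or imp == 0.0:
        return fmean(ys)
    best = None
    for f in rng.sample(range(len(X[0])), cfg["n_try"]):  # random feature subset per split
        values = sorted({X[i][f] for i in idx})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [i for i in idx if X[i][f] <= t]
            right = [i for i in idx if X[i][f] > t]
            if len(left) < cfg["min_leaf"] or len(right) < cfg["min_leaf"]:
                continue
            child = (len(left) * node_impurity([y[i] for i in left])
                     + len(right) * node_impurity([y[i] for i in right])) / len(idx)
            if best is None or child < best[0]:
                best = (child, f, t, left, right)
    if best is None:
        return fmean(ys)
    child, f, t, left, right = best
    importance[f] += len(idx) * (imp - child)  # size-weighted impurity decrease
    return (f, t,
            build_tree(X, y, left, depth + 1, cfg, rng, importance),
            build_tree(X, y, right, depth + 1, cfg, rng, importance))


def predict_one(node, x):
    while isinstance(node, tuple):  # descend until a leaf value is reached
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node


class ForestRegressor:
    """Bagged regression trees with random feature subsets: a minimal random forest."""

    def __init__(self, n_trees=30, max_depth=6, min_leaf=2, n_try=None, seed=0):
        self.n_trees, self.seed = n_trees, seed
        self.cfg = {"max_depth": max_depth, "min_leaf": min_leaf, "n_try": n_try}

    def fit(self, X, y):
        rng = random.Random(self.seed)
        n, p = len(X), len(X[0])
        cfg = dict(self.cfg, n_try=self.cfg["n_try"] or max(1, p // 3))
        raw = [0.0] * p
        self.trees = []
        for _ in range(self.n_trees):
            boot = [rng.randrange(n) for _ in range(n)]  # bootstrap sample of rows
            self.trees.append(build_tree(X, y, boot, 0, cfg, rng, raw))
        total = sum(raw)
        self.importances_ = [v / total if total else 0.0 for v in raw]  # sums to 1
        return self

    def predict(self, X):
        return [fmean([predict_one(tree, x) for tree in self.trees]) for x in X]


def mse_r2(y_true, y_pred):
    """Validation metrics used in the abstract: mean squared error and R^2."""
    m = fmean(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - m) ** 2 for a in y_true)
    return ss_res / len(y_true), 1.0 - ss_res / ss_tot


# Hypothetical toy data (NOT the paper's): age in {2, 3, 4}; only features 0 and 1 carry signal.
data_rng = random.Random(42)
ages = [data_rng.choice([2, 3, 4]) for _ in range(106)]
X = [[a + data_rng.gauss(0, 0.1), 2 * a + data_rng.gauss(0, 0.2)]
     + [data_rng.random() for _ in range(7)] for a in ages]
order = list(range(len(X)))
data_rng.shuffle(order)
cut = len(order) * 4 // 5  # random 4:1 training/validation split
train, val = order[:cut], order[cut:]
y_train, y_val = [ages[i] for i in train], [ages[i] for i in val]

model = ForestRegressor().fit([X[i] for i in train], y_train)
mse_all, r2_all = mse_r2(y_val, model.predict([X[i] for i in val]))
ranking = sorted(range(9), key=lambda f: model.importances_[f], reverse=True)

# "Modified model" step: refit on the five highest-ranked features only.
top = ranking[:5]
reduced = ForestRegressor().fit([[X[i][f] for f in top] for i in train], y_train)
mse_top, r2_top = mse_r2(y_val, reduced.predict([[X[i][f] for f in top] for i in val]))
print("ranking:", ranking)
print("all nine: MSE=%.3f R2=%.3f | top five: MSE=%.3f R2=%.3f" % (mse_all, r2_all, mse_top, r2_top))
```

Here impurity importance is the size-weighted drop in node variance credited to each split feature, normalized to sum to 1. This is one common definition and may differ in detail from the software the authors used. Because only the first two toy features are informative, they should rank first. The printed numbers describe the toy data only and say nothing about the paper's reported validation MSE of 0.017 and R² of 0.950.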

Cite this article

HU Xiao-wen, YAN Hua, WEI Feng, MA Shuang-cheng. Prediction of the cultivation age of American ginseng based on random forest[J]. Chinese Journal of Pharmaceutical Analysis, 2022, 42(8): 1418-1423. DOI: 10.16155/j.0254-1793.2022.08.15
