Plant Gene and Trait 2024, Vol.15 http://genbreedpublisher.com/index.php/pgt © 2024 GenBreed Publisher, registered at the publishing platform that is operated by Sophia Publishing Group, founded in British Columbia of Canada. All Rights Reserved.
GenBreed Publisher is an international Open Access publisher specializing in plant protection, plant breeding, molecular genetics, proteomics and genetic diversity, registered at the publishing platform operated by Sophia Publishing Group (SPG), founded in British Columbia, Canada.

Publisher: GenBreed Publisher
Edited by: Editorial Team of Plant Gene and Trait
Email: edit@pgt.genbreedpublisher.com
Website: http://genbreedpublisher.com/index.php/pgt
Address: 11388 Stevenston Hwy, PO Box 96016, Richmond, V7A 5J5, British Columbia Canada

Plant Gene and Trait (ISSN 1925-2013) is an open access, peer reviewed journal published online by GenBreed Publisher. The journal publishes articles that address the fundamental nature of genes and genomes at any level, using either experimental or computational approaches, in plants as well as algae, including applications of novel techniques to plant biology and plant trait improvement. All papers chosen for publication should be innovative research in the fields of plant genes or traits, plant protection and plant breeding, particularly in the areas of functional genomics, genomic tools, genome technologies, transgenes, genome sequencing analysis, molecular genetics, proteomics, genetic diversity, heterosis, genetic characteristics, genetic modification, genotype-phenotype relationships, stress resistance characteristics, QTL analysis, biochemistry, physiology and morphology.

All the articles published in Plant Gene and Trait are Open Access, and are distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
GenBreed Publisher uses the CrossCheck service to identify academic plagiarism through iParadigms, the world's leading plagiarism prevention tool, and to protect the original authors' copyrights.
Plant Gene and Trait (online), 2024, Vol. 15, ISSN 1925-2013
http://genbreedpublisher.com/index.php/pgt
© 2024 GenBreed Publisher, an online publishing platform of Sophia Publishing Group. All Rights Reserved. Sophia Publishing Group (SPG), founded in British Columbia, Canada, is a multilingual publisher.

Latest Content

Predicting Wheat Response to Drought Using Machine Learning Algorithms
Weichang Wu
Plant Gene and Trait, 2024, Vol. 15, No. 1, pp.1-7

Bioinformatics Identification and Expression Profiles of SBP Family Genes in Cucumber (Cucumis sativus L.)
Dongju Gao, Qin Zhang, Taibai Xu, Peng Zhou, Wenjing Cheng, Weiwei Zhang
Plant Gene and Trait, 2024, Vol. 15, No. 1, pp.8-14

Genetic Mechanisms of Crop Disease Resistance: New Advances in GWAS
Cheng Jiang
Plant Gene and Trait, 2024, Vol. 15, No. 1, pp.15-22

Implementing Genomic Selection in Sugarcane Breeding Programs: Challenges and Opportunities
Kaiwen Liang
Plant Gene and Trait, 2024, Vol. 15, No. 1, pp.23-32

Marker-Assisted Selection in Cassava: From Theory to Practice
Wenzhong Huang, Zhongmei Hong
Plant Gene and Trait, 2024, Vol. 15, No. 1, pp.33-43

Glycosyltransferases and Xylan Biosynthesis in Poplar: Genetic Regulation and Implications for Wood Quality
Yongquan Lu
Plant Gene and Trait, 2024, Vol. 15, No. 1, pp.44-51
Plant Gene and Trait 2024, Vol.15, No.1, 1-7 http://genbreedpublisher.com/index.php/pgt

Research Article Open Access

Predicting Wheat Response to Drought Using Machine Learning Algorithms

Weichang Wu
Jiugu MolBreed SciTech Ltd., Zhuji, 311800, Zhejiang, China
Corresponding email: 3397575099@qq.com
Plant Gene and Trait, 2024, Vol.15, No.1 doi: 10.5376/pgt.2024.15.0001
Received: 10 Dec., 2023 Accepted: 25 Jan., 2024 Published: 15 Feb., 2024
Copyright © 2024 Wu. This is an open access article published under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Preferred citation for this article: Wu W.C., 2024, Predicting wheat response to drought using machine learning algorithms, Plant Gene and Trait, 15(1): 1-7 (doi: 10.5376/pgt.2024.15.0001)

Abstract
With the intensification of global climate change, drought poses a serious threat to agricultural output, which makes accurate forecasting methods essential. Machine learning algorithms such as support vector machines, neural networks and random forests have been widely used to model and forecast the drought response of wheat. By analyzing multidimensional data collected during plant growth, these algorithms can identify key growth indicators and drought-response factors, providing a powerful tool for improving the cultivation and management of drought-resistant wheat. This review summarizes research progress in using machine learning algorithms to predict the response of wheat to drought, highlights the potential of machine learning for this task, and suggests directions for future research to further improve the accuracy and applicability of drought-resistance prediction in wheat.
Keywords
Wheat; Drought response; Machine learning algorithms; Growth index; Drought disaster

1 Introduction
As one of the world's most important food crops, wheat plays an indispensable role in maintaining food security and sustaining human life, and high-yield, high-quality wheat production is essential to meet the needs of the world's growing population. However, wheat production faces a variety of environmental stresses, the most significant of which is drought. Drought not only directly impairs the growth and development of wheat but also causes sharp yield losses and can trigger food crises, threatening global food security (Zhang et al., 2023). In recent years, with the rapid development and wide adoption of machine learning, agricultural researchers have begun to apply these algorithms and models to many problems in wheat production, and machine learning models have shown great potential, especially for predicting the response of wheat to drought (Ding et al., 2020, IT Manager World, 23(6): 188-189). Despite this progress, practical applications still face challenges, such as limitations in data acquisition and bottlenecks in model accuracy. Further research is therefore needed on how to make better use of machine learning algorithms to predict the drought response of wheat.

The purpose of this study is to systematically examine the prediction of wheat response to drought and to analyze its application in agricultural research from the perspective of machine learning models. By summarizing existing research results, evaluating the advantages and limitations of machine learning models in predicting the effects of drought on wheat yield and quality, and looking ahead to future research, this review aims to provide theoretical support and guidance for improving wheat drought resistance and ensuring food security.
2 Response Mechanism of Wheat to Drought

2.1 Changes in physiological processes
Wheat undergoes a variety of physiological changes in arid environments to adapt to water-limited stress. In response to water stress, wheat adopts a series of water-regulation strategies. Plants reduce transpiration, and thus water loss, by regulating stomatal opening and closing (Zhang et al., 2019). At the same time, root morphology and structure change to enhance water uptake and use, including deeper penetration of roots into the soil and an increase in fine (capillary) roots (Figure 1).
Figure 1 Physiological changes in wheat and barley genotypes in response to drought stress (Adapted from Sallam et al., 2019)

The response of wheat to drought also involves the regulation of plant growth and development. In a water-limited environment, wheat plants grow more slowly, with reduced leaf growth, stem elongation and root development, which conserves water and adjusts the allocation of carbon resources. Plants may selectively maintain or scale back metabolic processes related to growth and development to adapt to drought stress. Metabolic pathways are also adjusted under drought: wheat may increase the activity of its antioxidant enzyme system to counter the oxidative damage caused by drought. Hormone levels change as well; for example, the accumulation of abscisic acid may affect plant growth and development and the stress response. Together, these physiological adjustments coordinate plant adaptation and help wheat cope with the challenges of arid environments.

2.2 Responses at the molecular level
The molecular responses and regulatory changes of wheat under drought stress are highly complex. To adapt to drought, plants regulate gene expression and signal transduction networks through a variety of molecular mechanisms, including the regulation of many key genes, especially those involved in the stress response and the antioxidant response (Rijal et al., 2021). At the molecular level, wheat regulates multiple pathways to adapt to drought stress; the abscisic acid (ABA) pathway is a key signaling mechanism in the stress response and regulates the expression of many stress-responsive genes.
By regulating gene expression, the ABA pathway activates stress-responsive genes such as the late embryogenesis abundant (LEA) protein gene family, whose proteins help maintain cell stability and improve drought tolerance in wheat. ABA signaling also triggers stomatal closure, which is critical for reducing water loss through transpiration; under drought, this allows wheat to limit water loss and improve leaf water-use efficiency. Wheat also increases the surface area of its root system by regulating root growth patterns and releases root exudates, thereby improving water uptake under drought stress. Drought can also cause oxidative stress, in which excess reactive oxygen species damage plant cells; wheat counters this by strengthening its antioxidant system through activation of the ABA pathway.
Drought response in wheat also involves the regulation of many stress-related genes. These genes may encode protective proteins, such as proline aminolyase and antioxidant enzymes, that alleviate oxidative damage caused by stress. In addition, the regulation of several transcription factors and signal transduction components is key: they can initiate or inhibit gene expression in specific pathways and regulate stress-response networks within cells.

2.3 Restrictions and limitations of traditional research methods
Traditional research methods have several limitations in analyzing the mechanisms of wheat response to drought. For the complex biological processes involved in drought stress, they struggle to provide comprehensive and efficient analysis. Traditional biological and physiological research is typically limited to specific biological processes or biomolecules, which can lead to an incomplete understanding of the overall mechanisms of the drought stress response (Mwadzingeni et al., 2016). Because drought stress involves complex interactions at the molecular, cellular and tissue levels, traditional biological approaches alone may not fully capture these interactions. Traditional experiments are also constrained by time and space. Drought develops gradually, so monitoring the physiological and molecular changes of wheat across different drought intensities can take a long time. In addition, experimental settings make it difficult for traditional methods to fully reproduce complex natural drought environments. Traditional methods may also lack sufficient detection sensitivity.
Some molecular changes or interactions require more sensitive instruments or techniques to capture accurately; traditional methods may not meet this need, so subtle but critical molecular changes can be overlooked or masked.

3 Machine Learning Models

3.1 Selection and application of typical machine learning models
In studies of wheat's response to drought, typical machine learning models are widely used to predict and explain the response mechanism, and selecting an appropriate model is crucial to understanding it. Machine learning models are also applied to other crops, such as maize. In genomics and epigenomics, machine learning models can help analyze maize genomic data and identify gene functions, regulatory networks and biological pathways, providing important clues for gene editing and breeding. Machine learning also plays a key role in predicting and controlling maize diseases: by analyzing disease data, models can quickly identify disease types and suggest prevention and control measures, helping farmers act in time. For maize growth and its ecological environment, machine learning models can predict yield and adaptability from factors such as climate, soil and growing conditions, providing an important reference for agricultural production. In maize crop management and precision agriculture, machine learning can optimize decisions based on real-time data, such as guidance on irrigation and fertilization, to improve crop growth and yield. Common machine learning models include decision trees, support vector machines, random forests, neural networks and regression models (Cai et al., 2021).
The decision tree model has attracted much attention because it is easy to understand and interpret; it generates decision rules by successively splitting the data set on selected features. Support vector machines (SVMs) perform classification and regression by constructing hyperplanes and are suitable for complex, nonlinear data sets. Random forest is an ensemble model built from many decision trees and can efficiently handle large numbers of features and samples. Neural networks mimic the connection patterns of neurons in the human brain and are suited to complex, large-scale data, but they require more data and computational resources. Regression models are often used to predict continuous variables such as wheat growth or yield. When applying these models, it is necessary to consider feature selection and data preprocessing, optimization of model parameters, overfitting and underfitting, and model interpretability. In addition, wheat drought response prediction usually requires integrating multiple machine learning models to improve the accuracy and robustness of predictions.
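To make the decision-tree and model-integration ideas above concrete, the following minimal sketch in plain Python (no machine learning library assumed) fits a one-split regression tree, or decision stump, and a one-variable least-squares regression, then averages their predictions as the simplest form of ensemble. The function names and the soil-moisture and yield values are hypothetical illustrations, not the method of any study cited here; practical work would normally use an established library and much more data.

```python
from statistics import mean


def fit_stump(xs, ys):
    """Depth-1 regression tree: pick the split threshold with the lowest squared error."""
    best = None
    # Candidate thresholds skip the smallest value so both branches are non-empty.
    for t in sorted(set(xs))[1:]:
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        left_mean, right_mean = mean(left), mean(right)
        sse = sum((y - left_mean) ** 2 for y in left) + sum((y - right_mean) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    # The learned decision rule: if x < threshold predict left_mean, else right_mean.
    return lambda x: left_mean if x < threshold else right_mean


def fit_linear(xs, ys):
    """Ordinary least squares with a single predictor."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def fit_average_ensemble(xs, ys):
    """Integrate several models by averaging their predictions."""
    models = [fit_stump(xs, ys), fit_linear(xs, ys)]
    return lambda x: mean(m(x) for m in models)


# Hypothetical example: soil moisture (%) versus grain yield (t/ha).
soil_moisture = [10, 14, 18, 22, 26, 30]
grain_yield = [2.1, 2.3, 2.9, 3.4, 3.6, 3.9]
ensemble = fit_average_ensemble(soil_moisture, grain_yield)
print(round(ensemble(12), 2), round(ensemble(28), 2))
```

Averaging is only one way to combine models; weighted averaging or stacking are common alternatives when one model is clearly stronger.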
3.2 Results and insights from model experiments
Model experiments have yielded many results and insights into the mechanisms of wheat drought response. Through these experiments, researchers can identify and understand the physiological and molecular changes in wheat under drought and its response to environmental stress, which provides methods and strategies for improving the drought resistance of wheat in agricultural production (Ahmed and Hussain, 2022). Model experiments have also revealed the effects of drought on wheat growth and yield. By simulating and predicting wheat growth and yield under different drought conditions, researchers can better assess the impact of drought on wheat yield and provide a scientific basis and recommendations for growing wheat under drought. Model experiments also provide tools for predicting and evaluating the drought response of wheat: by building machine learning models, researchers can predict wheat growth, yield changes and drought-stress responses under different drought scenarios, providing an important reference for future wheat variety improvement and agricultural management. In soybean research, model experiments have helped identify the factors that affect soybean yield and quality under different growing conditions. By analyzing climate, soil, plant trait and other data, models can accurately predict soybean growth and yield changes, helping farmers optimize land management and planting methods and improve soybean yield and quality. Model experiments have also provided insights into soybean diseases and pests.
The models can identify common soybean diseases and insect pests and predict their spread and impact, which facilitates early control measures to protect soybean crops. Model experiments also play an important role in soybean breeding: through genomic and epigenomic analysis, models can characterize the genetic traits and growth patterns of soybean varieties more comprehensively and provide more accurate data to support seed selection and breeding.

3.3 Application of machine learning models to wheat drought response
Machine learning models play an important role in the study of wheat drought response. In one example, a research team collected physiological data and environmental parameters (such as soil moisture, air temperature and humidity) from wheat at different growth stages and built a convolutional neural network model that predicts the growth status of wheat under different drought levels. The team processed and labeled the data set, then designed a deep convolutional neural network and optimized it through training and validation to improve prediction accuracy. With this network, the team could predict wheat growth under drought more accurately. This case illustrates the prospects of advanced machine learning in agriculture: using deep learning models such as convolutional neural networks to solve agricultural problems not only improves the efficiency of agricultural production but also promotes innovative applications of science and technology in agriculture. By processing large amounts of data, machine learning models can accurately identify and predict the growth and yield changes of wheat under drought conditions.
By analyzing multidimensional data such as environmental data, genetic information and growth indicators, machine learning models can identify the key factors affecting drought resistance in wheat, providing useful decision support for agricultural production (Ji and Li, 2019, Journal of Tonghua Normal University, 40(6): 73-77). Using the predictive power of these models, agricultural practitioners can better plan planting strategies, select wheat varieties adapted to drought, and develop effective management practices to minimize the impact of drought on wheat yields.
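One common, model-agnostic way for a trained model to point to key factors is permutation importance: shuffle one input feature at a time and measure how much the prediction error grows. The plain-Python sketch below assumes a hypothetical, already-fitted linear model and made-up records; it illustrates the idea only and is not necessarily the technique used in the studies cited above.

```python
import random
from statistics import mean


def mse(predict, rows, targets):
    """Mean squared error of a prediction function over a set of records."""
    return mean((predict(r) - t) ** 2 for r, t in zip(rows, targets))


def permutation_importance(predict, rows, targets, n_repeats=20, seed=0):
    """Mean increase in MSE when each feature column is shuffled on its own."""
    rng = random.Random(seed)  # fixed seed so results are reproducible
    baseline = mse(predict, rows, targets)
    importances = []
    for j in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            # Rebuild each record with only feature j replaced by its shuffled value.
            permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            increases.append(mse(predict, permuted, targets) - baseline)
        importances.append(mean(increases))
    return importances


# Hypothetical fitted model: yield depends on soil moisture (feature 0) and
# canopy temperature (feature 1) but ignores a sensor ID (feature 2).
def predict(row):
    soil_moisture, canopy_temp, _sensor_id = row
    return 1.5 + 10.0 * soil_moisture - 0.05 * canopy_temp


rows = [[0.12, 31, 7], [0.18, 29, 3], [0.22, 28, 9], [0.27, 27, 1], [0.31, 25, 5], [0.15, 30, 2]]
targets = [predict(r) for r in rows]
print([round(v, 4) for v in permutation_importance(predict, rows, targets)])
```

A feature the model ignores, like the sensor ID here, receives an importance of zero, while features the model relies on show a clear rise in error when shuffled.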
Machine learning models have also helped explore and explain the mechanisms of wheat drought response. By analyzing large amounts of data and recognizing patterns in them, these models help reveal the pathways and key factors of the drought response and provide new ideas and methods for further study of drought resistance mechanisms in wheat.

4 Challenges and Opportunities

4.1 Data quality and availability
The success of machine learning algorithms depends heavily on the quality and availability of data. In predicting wheat response to drought, data quality directly affects the accuracy and reliability of the algorithm, and high-quality data provide the diversity and representativeness the model requires; in agriculture, however, data quality often faces multiple challenges (Ambarwari et al., 2020). Data quality problems can arise from errors in data collection, including but not limited to missing values, outliers, and incorrect or inaccurate labels. Addressing these problems requires data cleaning, standardization and correction to ensure data integrity and accuracy. Data availability is also a challenge. Agricultural data often come from multiple sources with inconsistent formats and standards, so they need to be integrated and unified. In addition, some data may not be publicly available or shared, which makes acquisition difficult. Solving data quality and availability problems calls for a combination of data preprocessing, feature engineering and data integration methods. At the same time, the standardization of data collection and data sharing should be strengthened so that more high-quality data can be used to train and optimize machine learning models.
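The cleaning and standardization steps described above can be sketched in plain Python. This minimal example fills missing sensor readings with the median, flags outliers with the common 1.5 × IQR rule, treats the flagged values as missing and re-imputes them, and finally applies z-score standardization. Median imputation, the IQR rule and the readings themselves are illustrative assumptions; the review does not prescribe a specific cleaning procedure.

```python
from statistics import mean, median, pstdev, quantiles


def impute_median(values):
    """Replace missing readings (None) with the median of the observed readings."""
    fill = median(v for v in values if v is not None)
    return [fill if v is None else v for v in values]


def iqr_outlier_indices(values, k=1.5):
    """Return indices of values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]


def clean_column(raw):
    """Impute gaps, treat detected outliers as missing, and re-impute."""
    filled = impute_median(raw)
    outliers = set(iqr_outlier_indices(filled))
    masked = [None if i in outliers else v for i, v in enumerate(raw)]
    return impute_median(masked), sorted(outliers)


def standardize(values):
    """z-score scaling: zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


# Hypothetical volumetric soil-moisture readings with one gap and one sensor spike.
raw = [0.21, 0.23, None, 0.22, 0.95, 0.20]
cleaned, flagged = clean_column(raw)
print(flagged, cleaned)
print([round(z, 3) for z in standardize(cleaned)])
```

Scaling is applied after cleaning so that a single spike does not distort the mean and standard deviation used for every other reading.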
Handling data quality and availability issues effectively will help improve the accuracy and practicality of machine learning algorithms in predicting wheat drought response.

4.2 Model generalization ability
The generalization ability of a machine learning model, that is, its performance on previously unseen data, is a key criterion for evaluating it. For predicting wheat response to drought, generalization ability determines the model's applicability and reliability in real scenarios. If a model performs well on training data but poorly on new data, it is overfitting: it has adapted too closely to the particulars of the training data and therefore generalizes poorly. Conversely, if a model performs well on both training data and new data, it has strong generalization ability (Cao et al., 2021). In predicting wheat drought response, generalization ability is affected by data quality, model complexity and training methods. To assess and improve it, appropriate evaluation methods such as cross-validation and splitting the data set into training and test sets should be adopted. In addition, techniques such as feature selection and model regularization help reduce overfitting and improve generalization. When machine learning algorithms are applied to wheat drought response prediction, evaluating generalization ability is an important step in ensuring the model's reliability and practicality; a model that generalizes well can predict the drought response of wheat more accurately and provide more reliable guidance and decision support for agricultural production.
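As a minimal sketch of k-fold cross-validation combined with regularization, the plain-Python code below fits a one-predictor ridge regression and reports its mean squared error on held-out folds, which also shows how cross-validation can compare candidate penalty values. The interleaved fold assignment, the penalty parameter `lam` and the data are hypothetical choices for illustration; a real study would shuffle samples before splitting and usually rely on an established library.

```python
from statistics import mean


def fit_ridge(xs, ys, lam=0.0):
    """One-predictor ridge regression: lam shrinks the slope; the intercept is not penalised."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / (sxx + lam)
    return lambda x: my + slope * (x - mx)


def mse(model, xs, ys):
    return mean((model(x) - y) ** 2 for x, y in zip(xs, ys))


def k_fold_cv(xs, ys, k=5, lam=0.0):
    """Average validation MSE over k folds; fold f holds samples f, f+k, f+2k, ..."""
    n = len(xs)
    scores = []
    for f in range(k):
        val_idx = set(range(f, n, k))
        train_x = [x for i, x in enumerate(xs) if i not in val_idx]
        train_y = [y for i, y in enumerate(ys) if i not in val_idx]
        val_x = [xs[i] for i in sorted(val_idx)]
        val_y = [ys[i] for i in sorted(val_idx)]
        model = fit_ridge(train_x, train_y, lam)  # trained without the held-out fold
        scores.append(mse(model, val_x, val_y))
    return mean(scores)


# Hypothetical data: days without rain versus relative leaf water content (%).
days_dry = [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
leaf_water = [92, 90, 87, 86, 82, 80, 79, 75, 73, 71]
for lam in (0.0, 10.0, 1000.0):
    print(lam, round(k_fold_cv(days_dry, leaf_water, k=5, lam=lam), 2))
```

Because every score comes from samples the model never saw during fitting, the averaged error estimates performance on new data rather than on the training set.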
4.3 Comparison with traditional research methods
This section compares the advantages and limitations of machine learning algorithms and traditional research methods for studying wheat drought response. Traditional research methods rely on laboratory observation, physiological testing and statistical analysis. These methods support an in-depth understanding of physiological processes, but because they are limited in the size and complexity of data they can handle, they struggle to capture the combined impact of drought on wheat growth and yield. By contrast, machine learning algorithms, which learn from large data sets, perform well in large-scale data processing and pattern recognition, and their efficient data processing enables more accurate prediction and analysis of wheat responses under different drought conditions (Feng et al., 2019). However, machine learning models also have limitations, such as a strong dependence on data quality and label accuracy and relatively weak interpretability of their results. In practice, combining traditional and machine learning methods may be a better path. Traditional methods, which probe physiological processes in depth, and machine learning methods, which process large-scale data quickly and accurately, complement each other and can help build a better understanding of wheat's response to drought. Such an integrated approach can offset the limitations of any single method and provide more accurate and efficient strategies and guidance for agricultural production. As technology develops and methods are refined, combining machine learning with traditional research methods may become an important direction for wheat drought response research.

4.4 Suggestions for model improvement
For machine learning models that predict wheat response to drought, continuous improvement and optimization are important for increasing prediction accuracy and practicality. Feature selection is one of the keys. A thorough understanding of the physiological and molecular mechanisms of wheat drought response allows more representative features to be extracted, strengthening the model's ability to predict drought response. Appropriate feature selection can also reduce model complexity and improve generalization.
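One simple way to act on the feature-selection advice above is a filter method that ranks candidate features by the absolute Pearson correlation with the target and keeps the top k. The sketch below is plain Python; the feature names and values are hypothetical, and correlation filtering is only one of several strategies (wrapper and embedded methods are common alternatives, and correlation only captures linear relationships).

```python
from math import sqrt
from statistics import mean


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def select_top_k(features, target, k):
    """Rank features by |r| with the target; return the k strongest names and all scores."""
    scores = {name: abs(pearson(column, target)) for name, column in features.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Hypothetical plot-level measurements.
features = {
    "soil_moisture": [0.10, 0.15, 0.20, 0.25, 0.30],
    "canopy_temperature": [31, 30, 29, 27, 26],
    "random_noise": [5, 1, 4, 2, 3],
}
grain_yield = [3.0, 3.5, 4.1, 4.6, 5.2]
selected, scores = select_top_k(features, grain_yield, k=2)
print(selected)
```

Using the absolute value keeps strongly negative predictors, such as canopy temperature here, alongside positive ones.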
Model optimization also needs to consider algorithm parameters and model architecture. For wheat drought response prediction, different machine learning algorithms, such as support vector machines, decision trees and neural networks, can be explored and their parameters tuned to find a model better suited to the problem. Adjusting model hyperparameters and network structure, for example increasing the number of layers or changing the learning rate, can also improve performance (Sundararajan et al., 2021). Data quality and quantity are likewise critical to model improvement. Ensuring the accuracy and completeness of the data while collecting larger and more comprehensive samples helps the model capture the complex relationship between wheat and drought and improves prediction accuracy. Model improvement also requires continuous validation and evaluation: the stability and generalization ability of the model are checked with methods such as cross-validation and a held-out validation set to determine whether an improvement is effective.

5 Conclusion and Prospect
By reviewing a large number of previous studies, we reached several important conclusions. First, machine learning models show remarkable potential in analyzing wheat's response to drought and can accurately predict wheat growth and yield under drought conditions. Second, machine learning algorithms can draw on multi-source data, such as soil properties, meteorological data and remote sensing information, to provide a more comprehensive perspective for predicting wheat drought response. Most importantly, these models substantially improve forecasting accuracy and efficiency compared with traditional methods, providing more forward-looking and accurate decision support for wheat production. In the future, researchers can focus on several aspects.
First, machine learning algorithms should be further optimized to improve the accuracy and stability of models in predicting wheat drought response, and multi-model fusion methods that combine the strengths of different algorithms should be explored to build more powerful prediction models. In addition, improving data quality and usability will remain a focus, including in-depth analysis of data quality and the use of more laboratory and field validation data to ensure the robustness and adaptability of the models.
At the same time, strengthening research on the physiological and molecular links between drought and wheat growth and development will contribute to a deeper understanding of drought response mechanisms. This will help optimize agricultural production strategies and allow machine learning to be applied in actual production, developing data-driven management measures that promote drought resistance and improve wheat yield and quality. These research directions will enable machine learning algorithms to play a more significant role in predicting wheat's response to drought and provide more reliable solutions to the challenges that climate change poses to agriculture.

Acknowledgments
The author appreciates the feedback from two anonymous peer reviewers on the manuscript of this study.

Conflict of Interest Disclosure
The author affirms that this research was conducted without any commercial or financial relationships that could be construed as a potential conflict of interest.

References
Ahmed M.U., and Hussain I., 2022, Prediction of wheat production using machine learning algorithms in northern areas of Pakistan, Telecommunications Policy, 46(6): 102370.
Ambarwari A., Adrian Q.J., and Herdiyeni Y., 2020, Analysis of the effect of data scaling on the performance of the machine learning algorithm for plant identification, Jurnal RESTI (Rekayasa Sistem Dan Teknologi Informasi), 4(1): 117-122. https://doi.org/10.29207/resti.v4i1.1517
Cai H.H., Peng J., Liu W.Y., Luo D.F., Wang Y.Z., Bai J.D., and Bai Z.J., 2021, Inversion and mapping of soil pH value based on in-situ hyperspectral data in cotton field, Shuitu Baochi Tongbao (Bulletin of Soil and Water Conservation), 41(4): 189-195.
Cao J., Zhang Z., Luo Y., Zhang L., Zhang J., Li Z., and Tao F., 2021, Wheat yield predictions at a county and field scale with deep learning, machine learning, and Google Earth Engine, European Journal of Agronomy, 123: 126204. https://doi.org/10.1016/j.eja.2020.126204
Feng P., Wang B., Liu D.L., and Yu Q., 2019, Machine learning-based integration of remotely-sensed drought factors can improve the estimation of agricultural drought in South-Eastern Australia, Agricultural Systems, 173: 303-316.
Mwadzingeni L., Shimelis H., Dube E., Laing M.D., and Tsilo T.J., 2016, Breeding wheat for drought tolerance: progress and technologies, Journal of Integrative Agriculture, 15(5): 935-943. https://doi.org/10.1016/S2095-3119(15)61102-9
Rijal B., Baduwal P., Chaudhary M., Chapagain S., Khanal S., Khanal S., and Poudel P.B., 2021, Drought stress impacts on wheat and its resistance mechanisms, Malaysian Journal of Sustainable Agriculture, 5(2): 67-76.
Sallam A., Alqudah A.M., Dawood M.F.A., Baenziger P.S., and Börner A., 2019, Drought stress tolerance in wheat and barley: advances in physiology, breeding and genetics research, International Journal of Molecular Sciences, 20(13): 3137. https://doi.org/10.3390/ijms20133137 PMid:31252573 PMCid:PMC6651786
Sundararajan K., Garg L., Srinivasan K., Bashir A.K., Kaliappan J., Ganapathy G.P., Selvaraj S.K., and Meena T., 2021, A contemporary review on drought modeling using machine learning approaches, Computer Modeling in Engineering and Sciences, 128(2): 447-487. https://doi.org/10.32604/cmes.2021.015528
Zhang B.Y., Li X., and Zhang X.L., 2023, Influences of drought events on ecological resilience of Larix principis-rupprechtii and Pinus tabulaeformis, Hebei Nongye Daxue Xuebao (Journal of Agricultural University of Hebei), 46(4): 65-73.
Zhang J.B., Xue X.P., Li N., Li H.Y., Zhang L., and Song J.P., 2019, Effects of drought stress on physiological characteristics and dry matter production of winter wheat during water critical period, Shamo yu Lüzhou Qixiang (Desert and Oasis Meteorology), 13(3): 124-130.
Plant Gene and Trait 2024, Vol.15, No.1, 8-14 http://genbreedpublisher.com/index.php/pgt 8
Research Report Open Access
Bioinformatics Identification and Expression Profiles of SBP Family Genes in Cucumber (Cucumis sativus L.)
Dongju Gao*, Qin Zhang*, Taibai Xu, Peng Zhou, Wenjing Cheng, Weiwei Zhang
Department of Plant Science and Technology, Shanghai Vocational College of Agriculture and Forestry, Shanghai, 201699, China
* These authors contributed equally to this work
Co-corresponding authors' emails: chengwj@shafc.edu.cn; zhangww@shafc.edu.cn
Plant Gene and Trait, 2024, Vol.15, No.1 doi: 10.5376/pgt.2024.15.0002
Received: 12 Dec., 2023 Accepted: 20 Jan., 2024 Published: 19 Feb., 2024
Copyright © 2024 Gao et al. This article was first published in Chinese in Molecular Plant Breeding and is translated and published here in English with authorization under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Preferred citation for this article: Gao D.J., Zhang Q., Xu T.B., Zhou P., Cheng W.J., and Zhang W.W., 2024, Bioinformatics identification and expression profiles of SBP family genes in cucumber (Cucumis sativus L.), Plant Gene and Trait, 15(1): 8-14 (doi: 10.5376/pgt.2024.15.0002)
Abstract SQUAMOSA promoter binding protein (SBP), a plant-specific transcription factor, plays an important role in plant growth and development. In this study, 15 SBP genes were identified from the cucumber genome by bioinformatics methods, and their physicochemical properties, gene structures, phylogeny, and expression in different tissues were analyzed. The results showed that the 15 genes were distributed on 4 chromosomes and divided into 6 groups. Genes in the same group had similar structures and conserved motifs.
Expression analysis showed that CsSBP9, CsSBP12, CsSBP10, CsSBP3, CsSBP8, and CsSBP7 are expressed in all tissues, whereas the other genes are expressed in specific tissues, suggesting that SBP genes play important roles in cucumber growth and development at different stages. This study lays a foundation for further functional identification of cucumber SBP genes.
Keywords Cucumber; SBP; Gene structure; Gene expression
1 Introduction
Transcription factors play a crucial role in plant growth and development. More than 60 transcription factor families have been reported in plants, among which SBP (SQUAMOSA Promoter Binding Protein) is plant-specific. SBP1 and SBP2 from snapdragon (Antirrhinum majus) were the first SBP genes identified; they were named SBP because their products bind to the promoter of the floral meristem identity gene SQUAMOSA (Klein et al., 1996). The SBP protein contains a conserved SBP domain of about 79 amino acids, which generally consists of two zinc finger structures (Zn1 and Zn2) and a conserved nuclear localization signal (NLS) (Cardon et al., 1999). The SBP gene family has been identified in many species, including 16 members in Arabidopsis, 18 in rice, 20 in tea, and 32 in bamboo (Preston and Hileman, 2013; Pan et al., 2017; Wang et al., 2018). Functional studies have shown that SBP genes play important roles in plant growth and development and in hormone and stress signal transduction. Arabidopsis SPL9 and SPL15 are involved in the transition from vegetative to reproductive growth (Schwarz et al., 2008); rice OsSPL10 regulates the initiation of trichome development (Lan et al., 2019); Arabidopsis SPL8 participates in flower and root development by responding to gibberellin signaling and also affects seed production (Unte et al., 2003; Zhang et al., 2007); and overexpression of the grape gene VpSBP16 in Arabidopsis enhanced the tolerance of transgenic plants to salt and drought stress (Hou et al., 2018).
Cucumber (Cucumis sativus L.) is one of the most important vegetables in the world, and its diverse floral types make it a valuable model plant for studying sex differentiation. Although SBP genes play important roles in plant growth and development, identification of the SBP gene family in cucumber has not been reported. In this study, 15 SBP genes were identified from the cucumber genome, and their chromosomal locations, gene structures, conserved motifs, and evolutionary relationships were analyzed. The expression profiles of the SBP genes in different cucumber organs were also analyzed to provide a reference for further research on the functions of cucumber SBP genes.
2 Results and Analysis
2.1 Identification of SBP genes in cucumber
Using Arabidopsis thaliana SBP protein sequences as queries, the cucumber genome database was searched with a bidirectional BLAST approach, and the resulting candidate sequences were checked for the presence of the SBP domain. In total, 15 SBP genes were identified in cucumber and named CsSBP1-CsSBP15 according to their chromosomal locations (Table 1). CsSBP12 had the longest ORF (open reading frame), at 3 096 bp, and CsSBP15 the shortest, at 426 bp. The predicted molecular weights ranged from 15.86 kDa (CsSBP15) to 114.64 kDa (CsSBP12), and the pI values ranged from 5.79 (CsSBP9) to 9.23 (CsSBP11).
Table 1 Information of SBP gene family in cucumber
Gene name   Gene ID       ORF (bp)   pI     MW (Da)
CsSBP1      Csa1G001450   1140       6.54   42116.83
CsSBP2      Csa1G015680   1653       7.85   60182.15
CsSBP3      Csa1G039890   1647       8.17   60369.26
CsSBP4      Csa1G051590   945        8.82   34844.13
CsSBP5      Csa1G074980   489        6.09   18146.83
CsSBP6      Csa3G117960   1023       7.06   38272.58
CsSBP7      Csa3G151350   1746       6      64916.31
CsSBP8      Csa3G567830   1035       6.73   38621.64
CsSBP9      Csa3G664550   3042       5.79   111376.7
CsSBP10     Csa3G809420   1149       8.82   41157.74
CsSBP11     Csa4G631590   609        9.23   22405.77
CsSBP12     Csa4G664590   3096       8.74   114642.8
CsSBP13     Csa6G094760   987        8.82   36105.68
CsSBP14     Csa6G109120   894        8.81   32717.16
CsSBP15     Csa6G517960   426        6.29   15856.36
Note: pI, isoelectric point; MW, molecular weight.
The 15 SBP genes were distributed on cucumber chromosomes 1, 3, 4, and 6. Chromosomes 1 and 3 carried the most genes (five each), followed by chromosome 6 (three genes) and chromosome 4 (two genes) (Figure 1).
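The bidirectional BLAST screen described above can be illustrated as a filter over BLAST+ tabular output. This is a minimal sketch under stated assumptions, not the authors' pipeline (they used TBtools and NCBI BLASTP): it assumes the default 12-column `-outfmt 6` layout, the e-value cutoff of 1e-5 given in the Methods, and hypothetical sequence identifiers. A cucumber protein is kept only if an Arabidopsis SBP query hits it in the forward search and its own best reverse hit is again an Arabidopsis SBP protein; confirming the SBP domain (for example with SMART) remains a separate step.

```python
# Default 12 columns of BLAST+ tabular output (-outfmt 6).
FIELDS = ("qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore")


def parse_outfmt6(lines, max_evalue=1e-5):
    """Yield (query, subject, bitscore) for hits at or below the e-value cutoff."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        rec = dict(zip(FIELDS, line.rstrip("\n").split("\t")))
        if float(rec["evalue"]) <= max_evalue:
            yield rec["qseqid"], rec["sseqid"], float(rec["bitscore"])


def best_hits(hits):
    """Map each query to its highest-scoring subject (ties keep the first seen)."""
    best = {}
    for query, subject, bits in hits:
        if query not in best or bits > best[query][1]:
            best[query] = (subject, bits)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_candidates(forward_lines, reverse_lines, reference_ids,
                             max_evalue=1e-5):
    """Return candidate IDs that pass both the forward and the reverse search.

    forward_lines: Arabidopsis SBP queries searched against cucumber proteins.
    reverse_lines: cucumber candidates searched against Arabidopsis proteins.
    reference_ids: IDs of the Arabidopsis SBP proteins used as queries.
    """
    forward_subjects = {subject for query, subject, _ in
                        parse_outfmt6(forward_lines, max_evalue)
                        if query in reference_ids}
    reverse_best = best_hits(parse_outfmt6(reverse_lines, max_evalue))
    return sorted(c for c in forward_subjects
                  if reverse_best.get(c) in reference_ids)


# Toy example with hypothetical IDs (not data from this study).
forward = [
    "AtSBP_A\tcand1\t55.0\t120\t50\t2\t1\t120\t5\t124\t1e-40\t150.0",
    "AtSBP_A\tcand2\t30.0\t80\t50\t5\t1\t80\t10\t90\t1e-3\t40.0",
    "AtSBP_B\tcand3\t48.0\t110\t55\t3\t1\t110\t2\t111\t1e-20\t90.0",
]
reverse = [
    "cand1\tAtSBP_A\t55.0\t120\t50\t2\t5\t124\t1\t120\t1e-40\t150.0",
    "cand3\tAtOther\t70.0\t200\t60\t1\t1\t200\t1\t200\t1e-80\t300.0",
    "cand3\tAtSBP_B\t48.0\t110\t55\t3\t2\t111\t1\t110\t1e-20\t90.0",
]
print(bidirectional_candidates(forward, reverse, {"AtSBP_A", "AtSBP_B"}))
# ['cand1']: cand2 fails the e-value cutoff; cand3's best reverse hit is not an SBP
```

Requiring the reverse best hit to fall back into the reference set is what discards candidates such as cand3, which resemble an SBP protein but are more similar to some other Arabidopsis protein.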
Figure 1 Gene locations of SBP genes in cucumber
Image caption: The scale bar on the left represents chromosome length (bp)
2.2 Phylogenetic analysis of SBP proteins
To further clarify the evolutionary relationships among members of the cucumber SBP gene family, a phylogenetic tree was constructed using SBP family members from Arabidopsis and cucumber (Figure 2). The tree could be divided into six major groups (Class I, Class II, Class III, Class IV, Class V, and Class VI). Class II contained no cucumber SBP genes, whereas the other five groups included members from both species. Class I was the largest, with six CsSBP members. Genes on the same branch of the tree are closely related. Cucumber and Arabidopsis share orthologous gene pairs, such as CsSBP7 and AT5G18830, and CsSBP14 and AT1G02085. There were also many paralogous gene pairs within cucumber, such as CsSBP2 and CsSBP3, CsSBP1 and CsSBP4, and CsSBP5 and CsSBP11, indicating that the cucumber SBP family contains a large number of homologous genes.
Figure 2 Phylogenetic analysis of SBPs in Arabidopsis and cucumber
2.3 Analysis of gene structure and conserved motifs of SBP proteins
To clarify the relationships among the structure, function, and phylogenetic history of the cucumber SBP gene family, we analyzed the conserved motifs and gene structures of its members (Figure 3). Protein clustering divided the 15 SBP genes into six groups (I, II, III, IV, V, and VI) containing four, two, two, two, four, and one member, respectively. Conserved motif analysis identified three motifs common to the family, Motif-1, Motif-2, and Motif-3, which were present in all proteins except CsSBP7. Motif-1 was located at zinc finger Zn2, Motif-2 at zinc finger Zn1, and Motif-3 at the nuclear localization signal. In addition, members of the same group had similar motif compositions.
For example, members of Class I contained only Motif-1, Motif-2, and Motif-3; Motif-9 appeared only in Class II; and members of Class IV contained Motif-6 and Motif-10 in addition to Motif-1, Motif-2, and Motif-3. Gene structure analysis gave consistent results: SBP genes in the same group had roughly the same numbers of exons and introns. For example, most members of Class I contained two exons, members of Class II contained ten exons, and members of Class III, Class IV, and Class V contained three exons.
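Exon and intron counts like those compared above can be tallied directly from a genome annotation. The study used TBtools for this step; the sketch below is only an illustration, assuming a GFF3 file in which each exon line names its transcript in the `Parent` attribute and that exons of one transcript do not overlap, so introns are counted as exons minus one. The IDs in the example are hypothetical.

```python
from collections import defaultdict


def parse_attributes(field):
    """Parse GFF3 column 9 ('key=value;key=value') into a dict."""
    attrs = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key] = value
    return attrs


def exon_intron_counts(gff_lines):
    """Return {transcript_id: (n_exons, n_introns)} from GFF3 exon features.

    Assumes exons of one transcript do not overlap, so introns = exons - 1.
    """
    exons = defaultdict(set)
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and directives such as ##gff-version
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "exon":
            continue  # keep only exon features
        span = (int(cols[3]), int(cols[4]))
        # An exon may belong to several transcripts (comma-separated Parent).
        for parent in parse_attributes(cols[8]).get("Parent", "").split(","):
            if parent:
                exons[parent].add(span)
    return {tid: (len(spans), len(spans) - 1) for tid, spans in exons.items()}


# Toy annotation with hypothetical IDs: one transcript with three exons.
gff = [
    "##gff-version 3",
    "chr1\tsrc\tgene\t1\t1000\t.\t+\t.\tID=geneA",
    "chr1\tsrc\tmRNA\t1\t1000\t.\t+\t.\tID=geneA.1;Parent=geneA",
    "chr1\tsrc\texon\t1\t200\t.\t+\t.\tID=geneA.1.e1;Parent=geneA.1",
    "chr1\tsrc\texon\t400\t600\t.\t+\t.\tID=geneA.1.e2;Parent=geneA.1",
    "chr1\tsrc\texon\t800\t1000\t.\t+\t.\tID=geneA.1.e3;Parent=geneA.1",
]
print(exon_intron_counts(gff))  # {'geneA.1': (3, 2)}
```

Storing exon spans in a set means a duplicated exon line does not inflate the count; comparing these per-transcript tuples across the groups of a phylogenetic tree is the kind of check summarized in the paragraph above.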
Figure 3 Phylogenetic tree, motif analysis and gene structure of cucumber SBP genes
Image caption: (A) Phylogenetic analysis of 15 cucumber SBP proteins; (B) Conserved motif analysis; (C) Gene structure analysis; (D) Conserved motif sequences
2.4 Expression analysis of SBP genes in different tissues
Based on cucumber transcriptome data, the tissue-specific expression of the 15 SBP genes was analyzed in 23 tissues (Figure 4). Six genes (CsSBP9, CsSBP12, CsSBP10, CsSBP3, CsSBP8, and CsSBP7) were expressed in all tissues; CsSBP9 and CsSBP12 from Class II, in particular, had very high expression levels in all 23 tissues. Other genes showed relatively high expression in specific tissues. For example, CsSBP14 was mainly expressed in reproductive organs, with higher levels in the ovary and fruit skin, and CsSBP15 had its highest expression levels in leaves and petioles. These results suggest that SBP genes highly expressed in particular tissues may play important roles in the development of those organs.
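The distinction drawn above between genes expressed in all tissues and genes with tissue-biased expression can be made explicit with a simple rule. The sketch below is an illustration only: the paper does not state an expression cutoff, so the threshold of 1.0 (for example in FPKM) is an assumption, and the gene names and values are hypothetical rather than the SRP071224 data. It also shows the log2(x + 1) transform that is commonly applied before drawing a heatmap; whether such a transform was used for Figure 4 is not stated.

```python
import math


def classify_expression(matrix, tissues, threshold=1.0):
    """Split genes into ubiquitously expressed and tissue-biased sets.

    matrix: {gene: [expression value per tissue]}, ordered like `tissues`.
    threshold: minimum value counted as 'expressed' (an assumed cutoff).
    Returns (genes expressed in every tissue,
             {other gene: tissue where it is highest}).
    """
    ubiquitous, peak_tissue = [], {}
    for gene, values in matrix.items():
        if len(values) != len(tissues):
            raise ValueError(f"{gene}: expected {len(tissues)} values, got {len(values)}")
        if all(v >= threshold for v in values):
            ubiquitous.append(gene)
        else:
            peak = max(range(len(values)), key=values.__getitem__)
            peak_tissue[gene] = tissues[peak]
    return ubiquitous, peak_tissue


def log2_scale(matrix):
    """Apply log2(x + 1), a common transform before plotting a heatmap."""
    return {gene: [math.log2(v + 1) for v in values]
            for gene, values in matrix.items()}


# Hypothetical values for three tissues (not data from this study).
tissues = ["root", "leaf", "ovary"]
matrix = {"geneA": [12.0, 30.5, 8.2], "geneB": [0.0, 0.4, 15.0]}
print(classify_expression(matrix, tissues))  # (['geneA'], {'geneB': 'ovary'})
print(log2_scale({"geneC": [0.0, 1.0, 3.0]}))  # {'geneC': [0.0, 1.0, 2.0]}
```

The result depends on the chosen threshold, so a real analysis would report the cutoff alongside the list of ubiquitously expressed genes.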
Figure 4 Expression analysis of cucumber SBP genes in different tissues
Image caption: 1: Roots (4-week-old seedlings); 2: Hypocotyls (4-week-old seedlings); 3: Cotyledons (4-week-old seedlings); 4: True leaves (4-week-old seedlings); 5: Root; 6: Stem; 7: Spire; 8: Petiole (spire); 9: Old leaves; 10: Petiole (old leaf); 11: Tendril; 12: Female flowers; 13: Male flower buds; 14: Male flowers; 15: Unfertilized ovary; 16: Pericarp (unfertilized ovary); 17: Pulp (unfertilized ovary); 18: Pericarp (one week after pollination); 19: Pulp (one week after pollination); 20: Pericarp (two weeks after pollination); 21: Pulp (two weeks after pollination); 22: Pericarp (three weeks after pollination); 23: Pulp (three weeks after pollination)
3 Discussion
In this study, we identified and analyzed cucumber SBP gene family members at the whole-genome level using bioinformatics tools, covering their chromosomal positions, phylogenetic relationships, conserved motifs, gene structures, and expression patterns. The 15 cucumber SBP genes identified here are similar in number to the 16 SBP members of Arabidopsis, even though the cucumber genome (367 Mb) is approximately three times larger than the Arabidopsis genome (125 Mb) (Huang et al., 2009). Whole-genome duplication (WGD) is common in angiosperms and produces duplicated genes that may acquire new functions. Arabidopsis has experienced three rounds of genome multiplication: a whole-genome triplication (γ) in the common ancestor of the core eudicots, followed by two more recent WGDs (α and β) that played important roles in the rapid expansion of gene families (Cannon et al., 2004). Cucumber, however, lacks the two most recent WGD events (Huang et al., 2009), which may explain why the number of SBP genes identified in this study is not higher than that in Arabidopsis.
Phylogenetic analysis revealed that cucumber and Arabidopsis SBP members could be classified into six classes (Figure 2), with Class II lacking cucumber members, suggesting that the Arabidopsis genes in Class II may have evolved independently. Genes of the same class typically have similar gene structures and motif compositions. Transcription factor domains and motifs are often related to protein interactions, transcriptional activity, and DNA binding (Liu et al., 1999). Eight cucumber members contained the conserved motif Motif-5 (ALSLLS), which corresponds to the target site of microRNA156 (miR156) in Arabidopsis (Rhoades et al., 2002). Expression analysis showed that CsSBP9 was expressed in all tissues, and its Arabidopsis homolog SPL14 (AT1G20980) is expressed in cotyledons, leaves, roots, and floral organs. CsSBP14 was mainly expressed in reproductive organs, particularly in fruit flesh and skin, and its Arabidopsis homolog SPL8 is also mainly expressed in inflorescences and siliques. This suggests that these genes have similar biological functions in different species (Stone et al., 2005) and may have undergone convergent evolution (Qian and Zhang, 2014). Our bioinformatics analysis of the cucumber SBP gene family provides a theoretical basis for future studies on the functions of SBP transcription factors.
4 Materials and Methods
4.1 Identification of SBP genes in cucumber
Cucumber SBP family genes were identified using a bidirectional BLAST approach. First, the Arabidopsis SBP protein sequences were aligned to the cucumber genome using TBtools (e-value 1e-5) to search for members of the cucumber SBP family (Chen et al., 2020). The cucumber genome file was obtained from the cucumber genome database (ftp://cucurbitgenomics.org/pub/cucurbit/genome/cucumber/Chinese_long/v2/), and the Arabidopsis SBP protein sequences were downloaded from TAIR (https://www.arabidopsis.org/index.jsp). The candidate cucumber SBP sequences obtained in the first step were then confirmed by BLASTP (e-value 1e-5) at NCBI (https://www.ncbi.nlm.nih.gov/). SBP domains were analyzed with SMART (http://smart.embl.de/) to confirm that the selected proteins were cucumber SBP proteins. The isoelectric points and molecular weights of the cucumber SBP proteins were calculated using the ProtParam platform (https://web.expasy.org/compute_pi/).
4.2 Chromosomal localization and phylogenetic analysis
TBtools was used to map the locations of cucumber SBP genes on chromosomes. Phylogenetic analysis was then performed using the SBP protein sequences of cucumber and Arabidopsis thaliana: MEGA X was used to construct a neighbor-joining (NJ) tree with 1 000 bootstrap replicates, and the tree was annotated for display using Evolview V3 (https://www.evolgenius.info//evolview/#login).
4.3 Analysis of gene structure and conserved protein motifs
TBtools was used to display the gene structures of cucumber SBP genes, and MEME 5.0.5 (http://meme-suite.org/tools/meme) was used to identify conserved motifs in cucumber SBP proteins.
4.4 Expression profiling analysis
To investigate the expression profiles of cucumber SBP genes in different organs, transcriptome data from various cucumber tissues were obtained from NCBI (accession number: SRP071224) and analyzed following the methods described by Wei et al. (2016).
TBtools was used to generate a heatmap of the expression profiles of cucumber SBP genes.
Acknowledgments
This study was supported by the Shanghai Science and Technology Innovation Action Plan in the Agriculture Field (20392001300), the Shanghai Natural Science Foundation (20ZR1439600), the Young Talents Project of Shanghai Agricultural and Forestry Vocational College (A2-0273-20-01-16), and the Internal Project of Shanghai Agricultural and Forestry Vocational College (KY2-0000-20-01).
Authors' Contributions
GDJ, ZQ, and ZWW designed and conducted the experiments. GDJ, ZQ, and XTB performed the data analysis and wrote the draft of the manuscript. ZP and CWJ participated in the experimental design and data analysis. ZWW conceived and supervised the project and guided the experimental design, data analysis, and manuscript writing and revision. All authors read and approved the final manuscript.
Conflict of Interest Disclosure
The authors affirm that this research was conducted without any commercial or financial relationships that could be construed as a potential conflict of interest.
References
Cannon S.B., Mitra A., Baumgarten A., Young N.D., and May G., 2004, The roles of segmental and tandem gene duplication in the evolution of large gene families in Arabidopsis thaliana, BMC Plant Biol., 4(1): 10.
Cardon G., Höhmann S., Klein J., Nettesheim K., Saedler H., and Huijser P., 1999, Molecular characterisation of the Arabidopsis SBP-box genes, Gene, 237(1): 91-104. https://doi.org/10.1016/S0378-1119(99)00308-X PMid:10524240
Chen C., Chen H., Zhang Y., Thomas H.R., Frank M.H., He Y., and Xia R., 2020, TBtools: an integrative toolkit developed for interactive analyses of big biological data, Molecular Plant, 13(8): 1194-1202. https://doi.org/10.1016/j.molp.2020.06.009
Hou H., Jia H., Yan Q., and Wang X., 2018, Overexpression of a SBP-Box gene (VpSBP16) from Chinese wild Vitis species in Arabidopsis improves salinity and drought stress tolerance, International Journal of Molecular Sciences, 19(4): 940. https://doi.org/10.3390/ijms19040940 PMid:29565279 PMCid:PMC5979544
Huang S.W., Li R.Q., and Vossen V.D.E.A., 2009, The genome of the cucumber, Cucumis sativus L., Nature Genetics, 41(12): 1275-1281.
Klein J., Saedler H., and Huijser P., 1996, A new family of DNA binding proteins includes putative transcriptional regulators of the Antirrhinum majus floral meristem identity gene SQUAMOSA, Molecular and General Genetics, 250(1): 7-16.
Lan T., Zheng Y., Su Z., Yu S., Song H., Zheng X., Lin G., and Wu W., 2019, OsSPL10, a SBP-Box gene, plays a dual role in salt tolerance and trichome formation in rice (Oryza sativa L.), G3: Genes, Genomes, Genetics, 9(12): 4107-4114. https://doi.org/10.1534/g3.119.400700 PMid:31611344 PMCid:PMC6893181
Liu L., White M.J., and MacRae T.H., 1999, Transcription factors and their genes in higher plants, European Journal of Biochemistry, 262(2): 247-257. https://doi.org/10.1046/j.1432-1327.1999.00349.x
Pan F., Wang Y., Liu H., Wu M., Chu W., Chen D., and Xiang Y., 2017, Genome-wide identification and expression analysis of SBP-like transcription factor genes in Moso bamboo (Phyllostachys edulis), BMC Genomics, 18(1): 486. https://doi.org/10.1186/s12864-017-3882-4
Preston J.C., and Hileman L.C., 2013, Functional evolution in the plant SQUAMOSA-PROMOTER BINDING PROTEIN-LIKE (SPL) gene family, Frontiers in Plant Science, 4: 80. https://doi.org/10.3389/fpls.2013.00080
Qian W., and Zhang J., 2014, Genomic evidence for adaptation by gene duplication, Genome Res., 24(8): 1356-1362. https://doi.org/10.1101/gr.172098.114 PMid:24904045 PMCid:PMC4120088
Rhoades M.W., Reinhart B.J., Lim L.P., Burge C.B., Bartel B., and Bartel D.P., 2002, Prediction of plant microRNA targets, Cell, 110(4): 513-520. https://doi.org/10.1016/S0092-8674(02)00863-2 PMid:12202040
Schwarz S., Grande A.V., Bujdoso N., Saedler H., and Huijser P., 2008, The microRNA regulated SBP-box genes SPL9 and SPL15 control shoot maturation in Arabidopsis, Plant Mol. Biol., 67(1-2): 183-195. https://doi.org/10.1007/s11103-008-9310-z PMid:18278578 PMCid:PMC2295252
Stone J.M., Liang X., Nekl E.R., and Stiers J.J., 2005, Arabidopsis AtSPL14, a plant-specific SBP-domain transcription factor, participates in plant development and sensitivity to fumonisin B1, The Plant Journal, 41(5): 744-754. https://doi.org/10.1111/j.1365-313X.2005.02334.x
Unte U.S., Sorensen A.M., Pesaresi P., Gandikota M., Leister D., Saedler H., and Huijser P., 2003, SPL8, an SBP-Box gene that affects pollen sac development in Arabidopsis, Plant Cell, 15(4): 1009-1019. https://doi.org/10.1105/tpc.010678 PMid:12671094 PMCid:PMC152345
Wang P., Chen D., Zheng Y., Jin S., Yang J., and Ye N., 2018, Identification and expression analyses of SBP-Box genes reveal their involvement in abiotic stress and hormone response in tea plant (Camellia sinensis), International Journal of Molecular Sciences, 19(11): 3404. https://doi.org/10.3390/ijms19113404 PMid:30380795 PMCid:PMC6274802
Wei G., Tian P., Zhang F., Qin H., Miao H., Chen Q., Hu Z., Cao L., Wang M., Gu X., Huang S., Chen M., and Wang G., 2016, Integrative analyses of nontargeted volatile profiling and transcriptome data provide molecular insight into VOC diversity in cucumber plants (Cucumis sativus), Plant Physiol., 172(1): 603-618. https://doi.org/10.1104/pp.16.01051
Zhang Y., Schwarz S., Saedler H., and Huijser P., 2007, SPL8, a local regulator in a subset of gibberellin-mediated developmental processes in Arabidopsis, Plant Molecular Biology, 63(3): 429-439. https://doi.org/10.1007/s11103-006-9099-6 PMid:17093870