After truncation and extension, sequences with fixed length were created

After truncation and extension, sequences with fixed length were created. To alleviate the noise in the negative dataset, 300 experiments utilizing 300 sub-datasets were performed. We achieved overall sensitivity of 81.8%, precision of 64.1% and area under the receiver operating characteristic curve (AUC) of 0.728.

== Conclusions ==

We have presented a reliable method for the identification of linear B-cell epitopes using antigen primary sequence information. Moreover, a web server, EPMLR, has been developed for linear B-cell epitope prediction: http://www.bioinfo.tsinghua.edu.cn/epitope/EPMLR/.

== Electronic supplementary material ==

The online version of this article (doi: 10.1186/s12859-014-0414-y) contains supplementary material, which is accessible to authorized users.

Keywords: B-cell, Linear epitope, Prediction, Multiple linear regression

== Background ==

The humoral immune response is based on the remarkable ability of antibodies to recognize and bind to antigens of intruding organisms, such as bacteria and viruses [1]. Antibodies bind specifically either to a contiguous amino acid sequence of a protein, known as a linear B-cell epitope, or to a folded structure formed by discontinuous amino acids, known as a conformational B-cell epitope [2, 3]. Prediction of B-cell epitopes is critical for immunological applications. Specifically, predicted peptides can be synthesized and used to replace intact antigen molecules as reagents for detecting anti-protein antibodies in immunoassays [4], as immunogens for raising anti-peptide antibodies that cross-react with the protein of interest [5], or in the development of synthetic peptide vaccines [6]. Although the majority of B-cell epitopes are conformational [7], most B-cell epitope prediction methods concentrate on the easier linear epitopes [8]. The earliest linear B-cell epitope prediction models were based on propensity profiling. Blythe and Flower [9] demonstrated that propensity profiling methods cannot be used to reliably predict epitopes.
Even the best propensity profiling method yielded a success rate only marginally better than random, as assessed with a receiver operating characteristic (ROC) plot. Later, machine learning methods were explored to improve prediction performance [10-22]. However, many of these methods were developed on very small datasets (~872 epitopes and non-epitopes) whose negative datasets consisted of randomly selected peptides instead of experimentally verified non-epitopes [23]. In this work, a novel linear B-cell epitope prediction model based on antigen primary sequence information was developed using multiple linear regression (MLR). A large dataset called BEOD, derived from the BEOracle dataset [19], was used to train and test our model. It is worth noting that all epitopes and non-epitopes in the BEOD dataset were experimentally verified. Nevertheless, experimental non-epitope data may still contain true epitopes because of flawed interpretation of the results or simple experimental errors [24]. Models built on different subsets of such a noisy negative dataset may produce very different results. To alleviate the noise problem caused by the negative dataset and to report a reliable prediction result for our model, we performed 300 experiments utilizing 300 sub-datasets: each negative sub-dataset was randomly selected from the BEOD negative dataset, while each positive sub-dataset was the unchanged BEOD positive dataset. 10-fold cross-validation was used to evaluate the performance of our model. Our model produced an average sensitivity (Sn) of 81.8%, precision (P) of 64.1% and area under the receiver operating characteristic curve (AUC) of 0.728 over the 300 experiments. A web server, EPMLR, implementing linear B-cell epitope prediction is available at: http://www.bioinfo.tsinghua.edu.cn/epitope/EPMLR/.
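The repeated sub-sampling protocol described above can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names, the `n_negatives` parameter (the text does not state how large each negative sub-dataset was) and the fixed random seeds are all assumptions made for the example.

```python
import random


def make_sub_datasets(positives, negatives, n_negatives, n_experiments=300, seed=0):
    """Yield one (positives, sampled_negatives) pair per experiment.

    n_negatives is an assumed parameter; the size of each negative
    sub-dataset is not given in the text.
    """
    rng = random.Random(seed)
    for _ in range(n_experiments):
        # The positive set is reused unchanged; negatives are re-drawn each time.
        yield positives, rng.sample(negatives, n_negatives)


def k_fold_splits(items, k=10, seed=0):
    """Shuffle a list and yield (train, test) lists for each of k folds."""
    rng = random.Random(seed)
    shuffled = items[:]
    rng.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, folds[i]


def sensitivity_and_precision(y_true, y_pred):
    """Return (Sn, P) for binary labels where 1 = epitope, 0 = non-epitope."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sn = tp / (tp + fn) if tp + fn else 0.0
    pr = tp / (tp + fp) if tp + fp else 0.0
    return sn, pr
```

In such a setup, a model would be trained and scored on every fold of every sub-dataset, and the reported figures would be averages over all experiments. Averaging is what dampens the effect of any single noisy draw of negatives.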
== Results ==

== Sliding window size selection ==

To evaluate the effect of the sliding window size n on prediction performance, we conducted modelling trials on the BEOD dataset using window sizes from 5 to 19, representing the range in which peptides can be synthesized relatively easily for immune experiments. As shown in Figure 1, the F-measure of the 10-fold cross-validation test reached its greatest value when the window size n was 15. Moreover, at this window size, the F-measures obtained by the 10-fold cross-validation test and the self-consistency test are very close to each other, which further supports the reliability of the performance obtained with a sliding window size of 15. It is generally accepted the closer the F-measures obtained by the cross-validation.
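As a rough illustration of the window-size scan, the sketch below extracts fixed-length windows from a sequence and picks the window size with the highest F-measure. It assumes the standard definition of the F-measure as the harmonic mean of precision and sensitivity, which the text does not spell out, and all function names are hypothetical.

```python
def sliding_windows(sequence, n):
    """Return every contiguous window of length n in the sequence."""
    # Sequences shorter than n yield an empty list.
    return [sequence[i:i + n] for i in range(len(sequence) - n + 1)]


def f_measure(precision, sensitivity):
    """Harmonic mean of precision and sensitivity (assumed definition)."""
    if precision + sensitivity == 0:
        return 0.0
    return 2 * precision * sensitivity / (precision + sensitivity)


def best_window_size(f_by_window):
    """Return the window size whose cross-validated F-measure is largest.

    f_by_window maps a window size to an F-measure that the caller has
    already computed for that size.
    """
    return max(f_by_window, key=f_by_window.get)
```

A caller would compute one cross-validated F-measure per window size in the scanned range and pass the resulting mapping to `best_window_size`.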
