CardioBoost is a disease-specific machine learning classifier that predicts the pathogenicity of rare (gnomAD allele frequency <= 0.1%) missense variants in genes associated with cardiomyopathies and arrhythmias. It outperforms existing genome-wide prediction tools.
The methods and evaluation are described in full in the following publication:
Zhang, X., Walsh, R., Whiffin, N. et al. Disease-specific variant pathogenicity prediction significantly improves variant interpretation in inherited cardiac conditions. Genet Med (2020). https://doi.org/10.1038/s41436-020-00972-3
The source code and data to reproduce our model development and validation analyses can be found on GitHub. The web app was built using Shiny.
Although we have shown the benefits of the proposed model for gene-disease classification and its superiority over existing genome-wide machine learning tools, we emphasize that CardioBoost is not intended to be used as a standalone clinical decision tool, nor as a replacement for the full ACMG guidelines (Richards et al. 2015) for clinical variant interpretation. For example, a variant that CardioBoost predicts to be pathogenic with a score above 90% should not be directly interpreted as Pathogenic without integrating other lines of evidence. Instead, in the context of inherited cardiac conditions, the CardioBoost classification is intended to be used as PP3 evidence within the ACMG guidelines, as a more reliable and accurate computational tool than genome-wide ones for supporting variant interpretation in cardiomyopathies and arrhythmias.
CardioBoost has been found to have higher accuracy than existing tools for the classification of known variants in genes associated with inherited cardiac conditions (ICCs). However, for some genes the training and test data remain sparse, so the performance estimates for those genes have wide confidence intervals. The tool is not intended as a substitute for validated clinical interpretation approaches in any circumstance, and particular care should be taken when considering classifications of variants in genes where gold-standard data are sparse.
In particular, the cardiomyopathy-associated genes with sparse training and test data are ACTC1, DES, GLA, LAMP2, MYL2, MYL3, PRKAG2 and PTPN11. Likewise, the genes associated with inherited arrhythmia syndromes that have sparse training and test data are CALM1, CALM2 and CALM3. Confidence in the estimated prediction performance for these genes is limited by the number of interpreted variants available for them.
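As an illustration of how a downstream script might act on this caveat, the following minimal sketch flags predictions in the sparse-data genes listed above. The gene lists come from the text. The function name, the condition labels and the idea of attaching a caution flag are assumptions made for this example and are not part of CardioBoost itself.

```python
# Genes named in the text as having sparse training and test data.
SPARSE_DATA_GENES = {
    # Hypothetical condition labels, chosen for this sketch only.
    "cardiomyopathy": frozenset(
        {"ACTC1", "DES", "GLA", "LAMP2", "MYL2", "MYL3", "PRKAG2", "PTPN11"}
    ),
    "arrhythmia": frozenset({"CALM1", "CALM2", "CALM3"}),
}


def has_sparse_data(gene_symbol: str, condition: str) -> bool:
    """Return True if the gene is listed as sparse-data for the condition.

    `condition` must be one of the keys of SPARSE_DATA_GENES.
    """
    if condition not in SPARSE_DATA_GENES:
        raise ValueError(f"unknown condition: {condition!r}")
    # Gene symbols are compared case-insensitively after trimming whitespace.
    return gene_symbol.strip().upper() in SPARSE_DATA_GENES[condition]
```

A caller could use `has_sparse_data("CALM2", "arrhythmia")` to decide whether to display an extra warning next to a prediction.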
The following tables list the genes related to each condition: cardiomyopathies in the first table and inherited arrhythmia syndromes in the second. Only genes with known pathogenic variants in our curated data sets are included.
Gene Symbol | Ensembl Gene ID | Ensembl Transcript ID | Ensembl Protein ID |
---|---|---|---|
ACTC1 | ENSG00000159251 | ENST00000290378 | ENSP00000290378 |
DES | ENSG00000175084 | ENST00000373960 | ENSP00000363071 |
GLA | ENSG00000102393 | ENST00000218516 | ENSP00000218516 |
LAMP2 | ENSG00000005893 | ENST00000200639 | ENSP00000200639 |
LMNA | ENSG00000160789 | ENST00000368300 | ENSP00000357283 |
MYBPC3 | ENSG00000134571 | ENST00000545968 | ENSP00000442795 |
MYH7 | ENSG00000092054 | ENST00000355349 | ENSP00000347507 |
MYL2 | ENSG00000111245 | ENST00000228841 | ENSP00000228841 |
MYL3 | ENSG00000160808 | ENST00000395869 | ENSP00000379210 |
PLN | ENSG00000198523 | ENST00000357525 | ENSP00000350132 |
PRKAG2 | ENSG00000106617 | ENST00000287878 | ENSP00000287878 |
PTPN11 | ENSG00000179295 | ENST00000351677 | ENSP00000340944 |
SCN5A | ENSG00000183873 | ENST00000333535 | ENSP00000328968 |
TNNI3 | ENSG00000129991 | ENST00000344887 | ENSP00000341838 |
TNNT2 | ENSG00000118194 | ENST00000367318 | ENSP00000356287 |
TPM1 | ENSG00000140416 | ENST00000403994 | ENSP00000385107 |
Gene Symbol | Ensembl Gene ID | Ensembl Transcript ID | Ensembl Protein ID |
---|---|---|---|
CACNA1C | ENSG00000151067 | ENST00000399655 | ENSP00000382563 |
CALM1 | ENSG00000198668 | ENST00000356978 | ENSP00000349467 |
CALM2 | ENSG00000143933 | ENST00000272298 | ENSP00000272298 |
CALM3 | ENSG00000160014 | ENST00000291295 | ENSP00000291295 |
KCNH2 | ENSG00000055118 | ENST00000262186 | ENSP00000262186 |
KCNQ1 | ENSG00000053918 | ENST00000155840 | ENSP00000155840 |
SCN5A | ENSG00000183873 | ENST00000333535 | ENSP00000328968 |
Variant classification is based on the probability of pathogenicity predicted by CardioBoost. Following the ACMG guidelines, we use Pr >= 0.9 as the high-certainty threshold for classifying variants. A variant with a classification probability below 90% is considered indeterminate, with a low level of classification confidence. In short, a variant's class is determined by its predicted probability of pathogenicity.
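The thresholding rule can be sketched as follows. This is a minimal illustration, not the app's actual code. It assumes that the "classification probability" is the larger of Pr(pathogenic) and 1 - Pr(pathogenic), so a variant with Pr(pathogenic) <= 0.1 is treated as a high-confidence benign call. That symmetric reading, the function name and the class labels are assumptions made for this example.

```python
def classify_variant(pr_pathogenic: float, threshold: float = 0.9) -> str:
    """Map a predicted pathogenicity probability to a class label.

    Assumption: the rule is symmetric, so a probability of being benign
    (1 - pr_pathogenic) at or above the threshold gives a benign call.
    The labels are illustrative only.
    """
    if not 0.0 <= pr_pathogenic <= 1.0:
        raise ValueError("pr_pathogenic must lie in [0, 1]")
    if pr_pathogenic >= threshold:
        return "pathogenic"
    if 1.0 - pr_pathogenic >= threshold:
        return "benign"
    # Neither class reaches the high-certainty threshold.
    return "indeterminate"
```

With the default threshold, `classify_variant(0.95)` gives a pathogenic call, `classify_variant(0.05)` a benign call, and `classify_variant(0.5)` an indeterminate one. Even a high-confidence call is meant to serve only as supporting computational evidence, not as a final classification.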
There are three main reasons why CardioBoost may not return a prediction:
The data provided here are available under the ODC Open Database License (ODbL): you are free to share and modify the data as long as you attribute any public use of the database, or works produced from the database; keep the resulting data sets open; and offer your shared or adapted version of the data set under the same ODbL license.
The app is released under a GNU Lesser General Public License v2.1.