Using high-resolution variant frequencies to empower clinical genome interpretation

This web page contains a suite of tools to support the use of allele frequency information for the assessment of rare genetic variants in Mendelian disease.

Distinguishing disease-causing variants from benign bystanders is perhaps the principal challenge in contemporary clinical genetics. Rarity of an allele is widely recognized as a necessary (though not sufficient) criterion for variant pathogenicity, but the key question “how common is too common?” remains poorly answered for many diseases. Recent large reference datasets, such as that from the Exome Aggregation Consortium (ExAC), provide new opportunities for robust and rigorous variant assessment.

The methods and mathematical derivations behind the calculators on these pages are described fully in our manuscript available here. The source code for the manuscript is available on GitHub, as is the source code for these calculators.

We provide four calculators:

  • calculate AF - works step by step through a framework of variant assessment. For a disease of interest the user inputs parameters that describe the genetic architecture of the condition, and the calculator computes the maximum expected allele frequency of a disease-causing variant in the general population (maximum credible population AF). In a second step, the calculator determines the maximum tolerated allele count in a specific reference population (such as ExAC), based on the size of the population and at a user-specified confidence level.

  • calculate AC - performs the second part of the above workflow, allowing the user to input a maximum credible population AF directly without redefining the genetic architecture in detail; this is intended as a time-saving measure for returning users.

  • explore architecture - starts by computing a maximum credible population AF for a given genetic architecture, as above. However, it also allows you to fix the maximum population AF in order to find a genetic architecture that is compatible with the observed data. For example, under your initial assumptions about a condition you may find that a reported variant is too common, but that it would be compatible with disease under a model of substantially reduced penetrance.

  • inverse AF - begins with an observed allele count, and computes an associated threshold filter allele frequency for a variant. If the filter allele frequency of a variant is above the maximum credible population AF for a condition of interest, then that variant should be filtered (i.e. not considered a candidate causative variant). This corresponds to the “filter_AF” annotation in the ExAC dataset. ExAC reports this value at 95% confidence; here the user can choose from a range of confidence levels.


    Please report any issues using the issue tracker.

    If the app fails to load as expected, please check that you are not accessing via a proxy server.

    alleleFrequencyApp - a Shiny App for allele frequency calculations Copyright © 2016 James Ware



Maximum credible population AF:

Maximum tolerated reference AC:

Notes

This calculator works step by step through a framework of variant assessment. For a disease of interest the user inputs parameters that describe the genetic architecture of the condition, and the calculator computes the maximum expected allele frequency of a disease-causing variant in the general population (maximum credible population AF). In a second step, the calculator determines the maximum tolerated allele count in a specific reference population (such as ExAC), based on the size of the population and at a user-specified confidence level.

Define genetic architecture to calculate the maximum credible population AF:

Inheritance - select the mode of inheritance

Prevalence - the prevalence of the condition, expressed per person, e.g. 1/1000 people (rather than per chromosome, e.g. 1/2000 chromosomes).

Genetic and allelic heterogeneity - genetic heterogeneity is the maximum proportion of disease attributable to variation in a single gene, and allelic heterogeneity is the maximum proportion of variation within a gene that is attributable to a single allele. For recessive conditions it is important to define these terms separately, as they have distinct effects on the architecture. For dominant conditions they can be combined if convenient: e.g. if a condition is caused by 5 genes, each harbouring 10 pathogenic alleles of equal prevalence, the user can set genetic heterogeneity = 0.2 and allelic heterogeneity = 0.1, or it may be more convenient or intuitive to leave genetic heterogeneity at 1 and set allelic heterogeneity = 0.02, directly indicating that no single variant causes more than 2% of cases.

Penetrance - select a value in the range 0-1 to represent penetrance.
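As a rough illustration of how these four inputs combine, the sketch below computes a maximum credible population AF. It is a minimal sketch under stated assumptions, not the app's own code: it assumes Hardy-Weinberg equilibrium, that dominant cases are heterozygous carriers (so the AF is half the carrier frequency), and that recessive cases carry two pathogenic alleles in the same gene. The function name `max_credible_af` and its arguments are hypothetical; the manuscript linked from the homepage gives the derivations the app actually uses.

```python
import math

def max_credible_af(prevalence: float, genetic_het: float, allelic_het: float,
                    penetrance: float, inheritance: str = "dominant") -> float:
    """Hypothetical helper: maximum credible population AF of one pathogenic allele.

    prevalence  -- proportion of people affected (e.g. 1/500)
    genetic_het -- maximum proportion of cases attributable to one gene
    allelic_het -- maximum proportion of that gene's cases due to one allele
    penetrance  -- probability that a carrier (dominant) or a biallelic
                   individual (recessive) is affected
    """
    for name, value in (("prevalence", prevalence), ("genetic_het", genetic_het),
                        ("allelic_het", allelic_het), ("penetrance", penetrance)):
        if not 0 < value <= 1:
            raise ValueError(f"{name} must be in (0, 1], got {value}")
    if inheritance == "dominant":
        # Carriers = cases explained by this allele / penetrance; each
        # heterozygous carrier contributes one allele out of two.
        return prevalence * genetic_het * allelic_het / penetrance / 2
    if inheritance == "recessive":
        # Under HWE, cases due to the gene = penetrance * q_gene**2, so the
        # combined pathogenic AF in the gene is sqrt(prev * genetic_het / pen);
        # a single allele accounts for at most allelic_het of that.
        return allelic_het * math.sqrt(prevalence * genetic_het / penetrance)
    raise ValueError("inheritance must be 'dominant' or 'recessive'")

# Illustrative dominant example: prevalence 1/500, 50% penetrance,
# no single allele explaining more than 2% of cases.
print(max_credible_af(1 / 500, 1.0, 0.02, 0.5))  # approximately 4e-05
```

In this sketch the 5-gene, 10-allele dominant example from the heterogeneity note gives the same result whether entered as 0.2 and 0.1 or as 1 and 0.02, because only the product of the two heterogeneity terms enters the dominant formula; the recessive formula treats them differently.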

Define reference sample to calculate the corresponding maximum tolerated sample AC:

For a given true population AF, the calculator provides an upper limit to the likely sample AC, depending on the size of the population and the desired confidence.

Confidence - select in the range 0.9 - 0.999. This value represents the probability of observing a sample AC ≤ the reported maximum AC. Increasing the confidence level increases the maximum AC that would be considered compatible with disease-causation. Defaults to 0.95.

Reference population size - we recommend using the number of alleles successfully sequenced at the site (often denoted AN in the VCF file) rather than the full population size to calculate an accurate maximum AC. The stringency of the approach depends on the reference population size: the smaller the population, the wider the effective confidence interval will be. Defaults to 121,412, representing a variant successfully genotyped in the entire ExAC population.
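The reference-sample step amounts to taking a one-sided upper quantile of the allele count distribution. The sketch below assumes, as a modelling choice, that the sample AC follows a Poisson distribution with mean AN × AF (a standard approximation to binomial sampling of a rare allele); the function name `max_tolerated_ac` is hypothetical and this is not the app's own source code.

```python
import math

def max_tolerated_ac(max_af: float, an: int, confidence: float = 0.95) -> int:
    """Hypothetical helper: smallest k with P(sample AC <= k) >= confidence.

    Assumption: sample AC ~ Poisson(an * max_af), where `an` is the number
    of alleles successfully sequenced at the site.
    """
    if not 0 < confidence < 1:
        raise ValueError("confidence must be between 0 and 1")
    lam = an * max_af  # expected allele count in the reference sample
    if lam <= 0:
        return 0
    log_lam = math.log(lam)
    k, cumulative = 0, 0.0
    while True:
        # Poisson probability of exactly k alleles, computed in log space
        # so that exp(-lam) does not underflow for large expected counts.
        cumulative += math.exp(-lam + k * log_lam - math.lgamma(k + 1))
        if cumulative >= confidence:
            return k
        k += 1

# Illustrative values: maximum credible AF of 4e-5 and the default AN of 121,412.
print(max_tolerated_ac(4e-5, 121412, 0.95))  # 9
```

Here the expected count is about 4.86 alleles, and under the Poisson assumption an AC of 9 or fewer is observed with at least 95% probability. Raising the confidence, or increasing AN, raises the tolerated AC.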

The homepage contains a link to the references for the mathematical derivations of these computations.




Maximum tolerated reference AC:

Notes

For a given maximum credible population AF, this calculator determines the maximum tolerated allele count in a specific reference population (such as ExAC), based on the size of the population and at a user-specified confidence level.

Maximum population AF - this can be calculated from the genetic architecture on the calculate AF tab, or input directly here (intended to save time for returning users).

Reference population size - we recommend using the number of alleles successfully sequenced at the site (often denoted AN) rather than the full population size to calculate an accurate maximum AC. The stringency of the approach depends on the reference population size: the smaller the population, the wider the effective confidence interval will be. Defaults to 121,412, representing a variant successfully genotyped in the entire ExAC population.

Confidence - select in the range 0.9 - 0.999. This value represents the probability of observing a sample AC ≤ the reported maximum AC. Increasing the confidence level increases the maximum AC that would be considered compatible with disease-causation. Defaults to 0.95.

The homepage contains a link to the references for the mathematical derivations of these computations.







Notes

Here we start by computing a maximum credible population AF for a given genetic architecture, as described under the calculate AF tab. This calculator also allows you to fix the maximum population AF in order to find a genetic architecture that is compatible with the observed data. For example, under your initial assumptions about a condition you may find that a reported variant is too common, but that it would be compatible with disease under a model of substantially reduced penetrance.

The calculator takes any three of the four parameters below and returns the fourth. It is currently implemented for dominant conditions only.

Prevalence - the prevalence of the condition, expressed per person, e.g. 1/1000 people (rather than per chromosome, e.g. 1/2000 chromosomes).

Heterogeneity - combines genetic and allelic heterogeneity into a single value: the maximum proportion of cases attributable to any single allele (in any gene).

Penetrance - select a value in the range 0-1 to represent penetrance.

Maximum credible population AF - typically calculated using the inverse AF tab, so that it corresponds to an allele count actually observed in the reference sample.
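For dominant conditions the four quantities above can be linked by a single relation, so fixing any three determines the fourth. The sketch below assumes the relation maximum credible AF = prevalence × heterogeneity / (2 × penetrance), which follows from heterozygous carriers under Hardy-Weinberg proportions; the function `solve_dominant_architecture` is hypothetical and not taken from the app's source.

```python
from typing import Optional

def solve_dominant_architecture(prevalence: Optional[float] = None,
                                heterogeneity: Optional[float] = None,
                                penetrance: Optional[float] = None,
                                max_af: Optional[float] = None) -> dict:
    """Hypothetical helper: fill in whichever one of the four parameters is None.

    Assumed relation (dominant inheritance, heterozygous carriers):
        max_af = prevalence * heterogeneity / (2 * penetrance)
    """
    params = {"prevalence": prevalence, "heterogeneity": heterogeneity,
              "penetrance": penetrance, "max_af": max_af}
    missing = [name for name, value in params.items() if value is None]
    if len(missing) != 1:
        raise ValueError("supply exactly three of the four parameters")
    p, h, pen, af = prevalence, heterogeneity, penetrance, max_af
    if missing[0] == "max_af":
        params["max_af"] = p * h / (2 * pen)
    elif missing[0] == "penetrance":
        params["penetrance"] = p * h / (2 * af)
    elif missing[0] == "heterogeneity":
        params["heterogeneity"] = 2 * af * pen / p
    else:
        params["prevalence"] = 2 * af * pen / h
    return params

# Invented example: which penetrance would make an AF of 1e-4 credible?
result = solve_dominant_architecture(prevalence=1 / 500, heterogeneity=0.02, max_af=1e-4)
print(result["penetrance"])  # approximately 0.2
```

With these invented inputs, the variant would only be compatible with disease if penetrance were about 20% or lower, since lower penetrance raises the maximum credible AF. A solved penetrance or heterogeneity above 1 would signal that no valid architecture fits the data.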

The homepage contains a link to the references for the mathematical derivations of these computations.




Filter allele frequency:

Notes

This effectively reverses the calculate AC function: starting with an observed allele count, it computes an associated threshold filter allele frequency for a variant. Technically, this is the highest disease-specific maximum credible population AF for which the observed AC is not compatible with pathogenicity. More practically, if the filter allele frequency of a variant is above the maximum credible population AF for a condition of interest, then that variant should be filtered (i.e. not considered a candidate causative variant).

The filter allele frequency corresponds to the “filter_AF” annotation in the ExAC dataset. The value in ExAC was computed at 95% confidence; here the user can choose from a range of confidence levels.

Observed population AC - the allele count observed in the reference sample, e.g. in ExAC.

Reference population size - we recommend using the number of alleles successfully sequenced at the site (often denoted AN) rather than the full population size. Defaults to 121,412, representing a variant successfully genotyped in the entire ExAC population.

Confidence - select in the range 0.9 - 0.999. This value represents the probability of observing a sample AC ≤ the reported maximum AC. Increasing the confidence level increases the maximum AC that would be considered compatible with disease-causation. Defaults to 0.95.
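The inverse calculation can be sketched as a root-finding problem. Assuming the sample AC follows a Poisson distribution with mean AN × AF, the filter AF is the largest AF whose upper confidence bound on the AC is still below the observed AC; that boundary is where P(sample AC ≤ observed AC − 1) equals the chosen confidence. The bisection solver is our illustrative choice, the function names are hypothetical, and we have not checked this sketch against ExAC's own filter_AF values.

```python
import math

def poisson_cdf(k: int, lam: float) -> float:
    """P(X <= k) for X ~ Poisson(lam), summing terms in log space."""
    if k < 0:
        return 0.0
    if lam <= 0:
        return 1.0
    log_lam = math.log(lam)
    total = sum(math.exp(-lam + i * log_lam - math.lgamma(i + 1)) for i in range(k + 1))
    return min(total, 1.0)

def filter_af(ac: int, an: int, confidence: float = 0.95) -> float:
    """Hypothetical helper: largest AF whose upper bound on the sample AC is below `ac`."""
    if not 0 < confidence < 1:
        raise ValueError("confidence must be between 0 and 1")
    if ac <= 0:
        return 0.0  # an unobserved allele is never filtered on frequency
    # P(X <= ac - 1) falls as the expected count rises, so first bracket the root...
    lo, hi = 0.0, float(ac)
    while poisson_cdf(ac - 1, hi) >= confidence:
        hi *= 2
    # ...then bisect on the expected count lam = an * AF.
    for _ in range(200):
        mid = (lo + hi) / 2
        if poisson_cdf(ac - 1, mid) >= confidence:
            lo = mid
        else:
            hi = mid
    return lo / an

print(filter_af(1, 121412, 0.95))
```

For a singleton (AC = 1) the condition reduces to exp(−AN × AF) = confidence, so with AN = 121,412 and 95% confidence the sketch returns −ln(0.95)/121,412, roughly 4.2 × 10^-7.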

The homepage contains a link to the references for the mathematical derivations of these computations.






Estimated penetrance:

Notes

This calculator estimates the penetrance of a disease-associated variant based on the prevalence of the disease, the prevalence of the variant amongst individuals with the disease, and the frequency of the variant in the population, according to the method described in Minikel et al., 2016, and further discussed in this blog.
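At its core this kind of estimate is Bayes' rule: penetrance P(D | V) = P(V | D) × P(D) / P(V). A minimal point-estimate sketch follows. It assumes the frequency among cases and the population frequency are expressed on the same scale (for example both as carrier frequencies, or both as allele frequencies), and it returns a point estimate only, with no confidence interval. The function name `estimate_penetrance` is hypothetical.

```python
def estimate_penetrance(prevalence: float, freq_in_cases: float,
                        freq_in_population: float) -> float:
    """Hypothetical helper: point estimate of P(disease | variant) by Bayes' rule.

    prevalence         -- P(D), proportion of people affected
    freq_in_cases      -- P(V | D), variant frequency among affected individuals
    freq_in_population -- P(V), variant frequency in the general population
    Both frequencies must be measured on the same scale.
    """
    if freq_in_population <= 0:
        raise ValueError("population frequency must be positive")
    # A result above 1 means the three inputs are mutually inconsistent.
    return freq_in_cases * prevalence / freq_in_population

# Illustrative values: prevalence 1/500, variant in 1% of cases, 4e-5 in the population.
print(estimate_penetrance(1 / 500, 0.01, 4e-5))  # approximately 0.5
```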





This web app provides a suite of tools to support the use of allele frequency information for the assessment of rare genetic variants in Mendelian disease.

The methods and mathematical derivations behind the calculators on these pages are described fully in our manuscript available here. The source code for the manuscript, along with all data and code necessary to reproduce the analyses, is available on GitHub.

These pages were built using Shiny, and the source code is freely available here.

The app is released under a GNU Lesser General Public License v2.1.

Please report any issues using the issue tracker.


