Distribution of Number of Rare Variants Appearing in Cases but Not Controls in Genome-wide Studies
Whole-genome sequencing and whole-exome sequencing are emerging techniques for exploring associations between rare variants and complex diseases. The mean and variance of the number of variants expected to appear in one randomly selected group but not in a second group randomly selected from the same population are not known. Expressions for these quantities are derived here. Numerical values are calculated assuming that the frequency of a rare variant has a beta distribution, using parameters estimated for four populations. Extensions are given to the number of variants that appear in r (r > 1) members of a randomly selected group with none in the comparison group. These calculations suggest that a genome-wide study of rare variants would generate an extremely large number of false positives. An exome-wide search would generate a smaller but still overwhelming number of false positives. A search restricted to variants in a specified gene would not generate excessive numbers of false positives. The expectations under the beta model fit a SNP database well when the underlying beta distribution was restricted to variant frequencies greater than 0.001.
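To illustrate the kind of expectation the abstract describes, the following is a minimal sketch and not the paper's own derivation. It assumes that each variant's frequency p is drawn independently from a Beta(a, b) distribution, that chromosomes are sampled independently, and that a variant is "in cases but not controls" when it is seen at least once among the case chromosomes and never among the control chromosomes. Under those assumptions, E[(1 - p)^k] = B(a, b + k) / B(a, b). All function names, parameter names, and input values are hypothetical.

```python
from math import lgamma, exp


def log_beta(a: float, b: float) -> float:
    """Natural log of the beta function B(a, b), computed via log-gamma."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def prob_absent(a: float, b: float, k: int) -> float:
    """E[(1 - p)^k] for p ~ Beta(a, b): the chance a variant is missing
    from k independently sampled chromosomes (assumed sampling model)."""
    return exp(log_beta(a, b + k) - log_beta(a, b))


def prob_cases_only(a: float, b: float, case_chrom: int, ctrl_chrom: int) -> float:
    """P(seen at least once in cases AND never in controls).

    Equals E[(1-p)^ctrl] - E[(1-p)^(case+ctrl)] under the assumptions above.
    """
    return prob_absent(a, b, ctrl_chrom) - prob_absent(a, b, case_chrom + ctrl_chrom)


def expected_cases_only(n_sites: int, a: float, b: float,
                        case_chrom: int, ctrl_chrom: int) -> float:
    """Expected count of such variants across n_sites independent variant sites."""
    return n_sites * prob_cases_only(a, b, case_chrom, ctrl_chrom)


if __name__ == "__main__":
    # Illustrative inputs only; these are not the parameters estimated in the paper.
    print(expected_cases_only(n_sites=1_000_000, a=0.1, b=10.0,
                              case_chrom=200, ctrl_chrom=200))
```

Because the expected count scales linearly with the number of variant sites considered, a sketch like this makes it easy to see why restricting a search from the genome to the exome to a single gene reduces the number of case-only variants expected by chance.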