An LD (linkage disequilibrium) profile is a look-up table containing the expected correlation between SNPs given the genetic distance between them. The use of an LD profile can increase the accuracy of results by taking into account the expected correlation between SNPs. This function aids the user in creating their own LD profile.

create_LDprofile(dist, x, bin_size, max_dist = NULL, beta_params = FALSE)

Arguments

dist

A numeric vector, or a list of numeric vectors, containing the genetic distance for each SNP.

x

A matrix of SNP values, or a list of matrices. Columns represent chromosomes; rows are SNP locations. Hence, the number of rows should equal the length of the dist vector. SNPs should all be biallelic.

bin_size

The size of each bin, in the same units as dist.

max_dist

Optional. The maximum genetic distance to be considered. If this is not supplied, it will default to the maximum distance in the dist vector.

beta_params

Optional. Beta parameters are calculated if this is set to TRUE. Default is FALSE.

Value

A data frame containing an LD profile that can be used by other statistics in this package.
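To make the shape of this table concrete, the following is a minimal base R sketch of how such a profile could be assembled: compute r² for every pair of SNPs, bin the pairs by genetic distance, and summarise each bin. It is an illustration on simulated data, not the package's implementation; the variable names and the simulated inputs are assumptions, and the column names simply mirror those shown in the Examples output.

```r
# Illustrative sketch only: simulated data, not the package's own code.
set.seed(42)
n_snps   <- 30
n_chrom  <- 20
bin_size <- 0.001

# Genetic distance for each SNP, and a SNP matrix (rows = SNPs, columns = chromosomes)
dist <- sort(runif(n_snps, 0, 0.005))
x    <- matrix(rbinom(n_snps * n_chrom, 1, 0.5), nrow = n_snps)

# Squared correlation between every pair of SNPs (cor() works on columns, so transpose)
r2 <- cor(t(x))^2

# Genetic distance between every pair of SNPs
d <- abs(outer(dist, dist, "-"))

# Keep each unordered pair once
keep  <- upper.tri(r2)
pairs <- data.frame(d = d[keep], rsq = r2[keep])

# Label each pair with the lower bound of its bin
pairs$bin <- floor(pairs$d / bin_size) * bin_size

# Summarise r-squared within each bin
profile <- data.frame(
  bin = sort(unique(pairs$bin)),
  rsq = as.vector(tapply(pairs$rsq, pairs$bin, mean)),
  sd  = as.vector(tapply(pairs$rsq, pairs$bin, sd)),
  n   = as.vector(tapply(pairs$rsq, pairs$bin, length))
)
print(profile)
```

Each row of `profile` gives the expected r² (`rsq`), its standard deviation (`sd`), and the number of SNP pairs (`n`) for one distance bin.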

Details

The input for dist and x can be lists. This allows multiple datasets to be used in the creation of the LD profile. For example, using all 22 autosomes from the human genome would involve 22 different distance vectors and SNP matrices. Both lists should be the same length and should correspond exactly to each other (i.e. the distances in each element of dist should go with the SNPs in the same element of x).
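As a hedged illustration of the list form, the sketch below builds two small simulated datasets, checks that the two lists line up element by element, and then passes them to create_LDprofile. The helper `check_paired_inputs` and the simulated data are hypothetical, and the package name `zalpha` used in the guarded call is an assumption; substitute the name of the package you have installed.

```r
# Illustrative sketch: two simulated "chromosomes" passed as paired lists.
set.seed(1)
make_dataset <- function(n_snps, n_chrom = 10) {
  list(
    dist = sort(runif(n_snps, 0, 0.005)),                           # genetic distances
    x    = matrix(rbinom(n_snps * n_chrom, 1, 0.5), nrow = n_snps)  # rows = SNPs
  )
}
chr1 <- make_dataset(25)
chr2 <- make_dataset(40)

dist_list <- list(chr1$dist, chr2$dist)
x_list    <- list(chr1$x, chr2$x)

# Hypothetical helper: element i of dist must describe the rows of element i of x
check_paired_inputs <- function(dist_list, x_list) {
  length(dist_list) == length(x_list) &&
    all(mapply(function(d, m) length(d) == nrow(m), dist_list, x_list))
}
stopifnot(check_paired_inputs(dist_list, x_list))

# Assumption: the package providing create_LDprofile is called "zalpha"
if (requireNamespace("zalpha", quietly = TRUE)) {
  print(zalpha::create_LDprofile(dist_list, x_list, bin_size = 0.001))
}
```

Checking the pairing up front is worthwhile because a mismatch between a distance vector and its SNP matrix would otherwise silently attach the wrong distances to the SNPs.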

In the output, each bin is labelled by its lower bound. The first bin contains pairs where the genetic distance is greater than or equal to 0 and less than bin_size. The final bin contains pairs where the genetic distance is greater than or equal to max_dist-bin_size and less than max_dist. If max_dist is not a multiple of bin_size, it is rounded up to the next multiple, so the final bin is the bin that max_dist falls into. For example, if max_dist is given as 4.5 and bin_size is 1, max_dist is adjusted to 5 and the final bin is 4. max_dist should be big enough to cover the genetic distances between pairs of SNPs within the window size given when the \(Z_{\alpha}\) statistics are run. Any pairs with genetic distances bigger than max_dist will be assigned the values in the maximum bin of the LD profile.
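The binning rule described above can be sketched in a few lines of base R. The function `assign_bins` is a hypothetical helper written for illustration, not part of the package: it rounds max_dist up to a multiple of bin_size, labels each distance with the lower bound of its bin, and caps anything beyond the final bin.

```r
# Hypothetical helper illustrating the binning rule; not the package's own code.
assign_bins <- function(d, bin_size, max_dist) {
  # Round max_dist up to the next multiple of bin_size
  max_adj   <- ceiling(max_dist / bin_size) * bin_size
  # Lower bound of the final bin
  final_bin <- max_adj - bin_size
  # Lower bound of the bin each distance falls into
  bins <- floor(d / bin_size) * bin_size
  # Distances beyond the profile are given the final bin
  pmin(bins, final_bin)
}

bins <- assign_bins(c(0, 0.99, 1, 4.2, 4.9, 7.3), bin_size = 1, max_dist = 4.5)
print(bins)
# 0 0 1 4 4 4
```

With max_dist = 4.5 and bin_size = 1, the distance 4.9 still lands in bin 4 because max_dist is treated as 5, and 7.3 is capped to the same final bin.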

By default, Beta parameters are not calculated. To fit a Beta distribution to the expected correlations, needed for the Zalpha_BetaCDF and Zbeta_BetaCDF statistics, beta_params should be set to TRUE and the package 'fitdistrplus' must be installed.
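To give a feel for what fitting a Beta distribution to the r² values in a bin involves, here is a minimal sketch using the method of moments in base R. This is a swapped-in technique chosen because it needs no extra packages; it is not the fitting procedure the function itself uses, so its estimates will generally differ from the Beta_a and Beta_b values the function reports. The helper `fit_beta_moments` and the simulated r² values are assumptions for illustration.

```r
# Method-of-moments Beta fit (illustrative; not the package's fitting procedure).
fit_beta_moments <- function(r2) {
  m <- mean(r2)
  v <- var(r2)
  # For a Beta(a, b) distribution: mean = a / (a + b), var = m(1 - m) / (a + b + 1)
  k <- m * (1 - m) / v - 1   # estimate of a + b
  c(Beta_a = m * k, Beta_b = (1 - m) * k)
}

# Simulated r-squared values for a single bin, drawn from a known Beta distribution
set.seed(1)
r2  <- rbeta(5000, shape1 = 2, shape2 = 5)
est <- fit_beta_moments(r2)
print(est)
```

The estimates should land close to the shape parameters used in the simulation (2 and 5), which is a quick sanity check that the moment equations are applied correctly.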

Ideally, an LD profile would be generated using data from a null population with no selection, for example by using a simulation if the other population parameters are known. However, these parameters are often unknown or complex, so generating an LD profile using the same data as is being analysed is acceptable, as long as the bins are large enough.

References

Jacobs, G.S., T.J. Sluckin, and T. Kivisild, Refining the Use of Linkage Disequilibrium as a Robust Signature of Selective Sweeps. Genetics, 2016. 203(4): p. 1807

Examples

## load the snps example dataset
data(snps)
## Create an LD profile using this data
create_LDprofile(snps$cM_distances, as.matrix(snps[,3:12]), 0.001)
#>     bin        rsq        sd Beta_a Beta_b  n
#> 1 0.000 0.10226664 0.1318628     NA     NA 59
#> 2 0.001 0.14412346 0.1745857     NA     NA 51
#> 3 0.002 0.09538328 0.1122312     NA     NA 41
#> 4 0.003 0.11193736 0.1303776     NA     NA 28
#> 5 0.004 0.19393939 0.1963924     NA     NA 11
## To get the Beta distribution parameter estimates, the fitdistrplus package is required
if (requireNamespace("fitdistrplus", quietly = TRUE)) {
  create_LDprofile(snps$cM_distances, as.matrix(snps[,3:12]), 0.001, beta_params = TRUE)
}
#> $start.arg
#> $start.arg$shape1
#> [1] 0.5319322
#>
#> $start.arg$shape2
#> [1] 4.347826
#>
#>
#> $fix.arg
#> NULL
#>
#> $start.arg
#> $start.arg$shape1
#> [1] 0.5237006
#>
#> $start.arg$shape2
#> [1] 2.942187
#>
#>
#> $fix.arg
#> NULL
#>
#> $start.arg
#> $start.arg$shape1
#> [1] 0.7421819
#>
#> $start.arg$shape2
#> [1] 6.309295
#>
#>
#> $fix.arg
#> NULL
#>
#> $start.arg
#> $start.arg$shape1
#> [1] 0.7818732
#>
#> $start.arg$shape2
#> [1] 5.433496
#>
#>
#> $fix.arg
#> NULL
#>
#> $start.arg
#> $start.arg$shape1
#> [1] 1.098982
#>
#> $start.arg$shape2
#> [1] 3.856677
#>
#>
#> $fix.arg
#> NULL
#>
#>     bin        rsq        sd    Beta_a   Beta_b  n
#> 1 0.000 0.10226664 0.1318628 0.8232427 6.435184 59
#> 2 0.001 0.14412346 0.1745857 0.7323277 3.960080 51
#> 3 0.002 0.09538328 0.1122312 0.9962786 8.322913 41
#> 4 0.003 0.11193736 0.1303776 1.0432924 7.113517 28
#> 5 0.004 0.19393939 0.1963924 1.4733393 4.979831 11