Both MinimumDistance and PennCNV use the log R ratios (LRRs) and B allele frequencies (BAFs) from the Illumina 610 Quad array probes to infer de novo deletions. The LRR is a standardized estimate of the probe intensity that quantifies the total number of allele copies at the locus of interest. The BAF is a standardized estimate of the proportion of the total probe intensity contributed by the B allele, and reflects the genotype at the probe of interest. The BAF is standardized so that homozygous genotypes in copy-neutral states (two allele copies) have BAFs of approximately zero or one (for AA and BB genotypes, respectively), and heterozygous AB genotypes yield BAFs of roughly 0.5. As a quality control step, we excluded triads in which any sample (father, mother or child) had whole-genome amplified DNA.
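To make these two quantities concrete, the following Python sketch computes an LRR as the base-2 logarithm of observed over expected total intensity, and the idealized BAF implied by a genotype. It is a simplification under stated assumptions: the function names are hypothetical, the expected intensity is assumed to be given, and on real arrays the BAF is estimated from normalized intensities relative to genotype clusters rather than counted from a known genotype.

```python
import math


def log_r_ratio(observed_r, expected_r):
    """LRR as log2(observed / expected) total probe intensity.

    Assumption: expected_r (the intensity expected for two allele copies)
    is supplied by the caller.
    """
    if observed_r <= 0 or expected_r <= 0:
        raise ValueError("intensities must be positive")
    return math.log2(observed_r / expected_r)


def expected_baf(genotype):
    """Idealized BAF: the fraction of B alleles in a genotype such as 'AB'."""
    if not genotype or any(allele not in "AB" for allele in genotype):
        raise ValueError("genotype must be a non-empty string of A and B")
    return genotype.count("B") / len(genotype)


if __name__ == "__main__":
    for genotype in ("AA", "AB", "BB"):
        print(genotype, expected_baf(genotype))
    # Idealized hemizygous deletion: half of the expected total intensity.
    print(log_r_ratio(0.5, 1.0))
```

In this idealized setting a hemizygous deletion halves the total intensity and yields an LRR of -1; values observed on real arrays are noisier and typically less extreme.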
The PennCNV algorithm for detecting de novo DNA copy number aberrations is based on a hidden Markov model (HMM) that jointly models the unknown copy number states in all three triad members (Wang et al. 2008). The state transition probabilities are based on observed LRRs and BAFs in the samples, and the population BAF. Maximum likelihood methods are employed to identify the most likely copy number states in the father, mother and offspring, and these are encoded as a three-digit numerical code. A normal DNA copy number (two allele copies) is designated as a 3, a hemizygous deletion (one allele copy) as a 2, and a homozygous deletion (zero allele copies) as a 1. Thus, de novo deletions in offspring of parents with normal copy number are encoded as triad state '332' (loss of one allele copy in the child) or '331' (loss of both alleles). PennCNV addresses genomic waves by incorporating the population GC content at each marker into the HMM.
The joint PennCNV HMM considers all possible copy number states, including inherited deletions (e.g. '322'). MinimumDistance, in contrast, was developed specifically for detecting de novo copy number changes, because the computational demands of the joint PennCNV HMM are substantial, and false positive identifications of de novo deletions remain a concern even when the recommended PennCNV quality control procedures, including genomic wave correction, are employed. The MinimumDistance approach is based on the "minimum distance" statistic, which captures differences in copy number estimates between the offspring and each of the parents at each locus, making it robust by design to genomic waves and other probe-specific artifacts (Scharpf et al. 2012). In particular, when the samples of the triad members are hybridized on the same plate (the highly recommended and commonly employed approach), MinimumDistance effectively reduces technical and experimental sources of noise that can generate false positives.
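The triad state code and the per-marker minimum distance described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the PennCNV or MinimumDistance implementation: the function names are hypothetical, only the three states named in the text (1, 2, 3) are handled, and the sign convention (offspring minus parent) is an assumption. At each marker the statistic keeps whichever of the two offspring-parent LRR differences is smaller in absolute value.

```python
# Digit-to-copy-number mapping for the three states described in the text.
STATE_TO_COPIES = {"1": 0, "2": 1, "3": 2}


def decode_triad_state(code):
    """Return (father, mother, offspring) copy numbers for a code like '332'."""
    if len(code) != 3 or any(digit not in STATE_TO_COPIES for digit in code):
        raise ValueError(f"unsupported triad state: {code!r}")
    father, mother, offspring = (STATE_TO_COPIES[digit] for digit in code)
    return father, mother, offspring


def is_de_novo_deletion(code):
    """True when both parents have two copies and the offspring has fewer."""
    father, mother, offspring = decode_triad_state(code)
    return father == 2 and mother == 2 and offspring < 2


def minimum_distance(lrr_father, lrr_mother, lrr_offspring):
    """Per-marker signed difference with the smaller absolute value.

    Assumed sign convention: offspring LRR minus parental LRR, so a de novo
    deletion in the offspring gives a negative value.
    """
    distances = []
    for f, m, o in zip(lrr_father, lrr_mother, lrr_offspring):
        d_father = o - f
        d_mother = o - m
        distances.append(d_father if abs(d_father) <= abs(d_mother) else d_mother)
    return distances


if __name__ == "__main__":
    print(is_de_novo_deletion("332"), is_de_novo_deletion("322"))
    # Marker 1: artifact shared by all three samples.
    # Marker 2: deletion inherited from the mother.
    # Marker 3: deletion seen only in the offspring.
    father = [0.25, 0.0, 0.25]
    mother = [0.25, -0.5, 0.0]
    child = [0.25, -0.5, -0.5]
    print(minimum_distance(father, mother, child))
```

In the toy data, the shared artifact and the inherited deletion both give a minimum distance of zero, and only the offspring-specific drop remains nonzero, which is the sense in which the statistic is insensitive to waves and probe effects common to the triad.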
Following genome-wide segmentation of these minimum distances by circular binary segmentation (an extremely fast procedure), final inference regarding de novo copy number events is based on a posterior calling step applied to the inferred candidate regions. This procedure is about an order of magnitude faster than the joint PennCNV HMM. MinimumDistance uses the same code for the triad copy number states, where '332' and '331' represent de novo loss of alleles in the proband. All analyses using the MinimumDistance algorithm were carried out in the statistical environment R (http://cran.r-project.org/) using the packages DNAcopy, GenomicRanges, GWASTools, IRanges and MinimumDistance, all available as free software via Bioconductor (http://www.bioconductor.org/). The MinimumDistance calls were written to a BED file suitable for use in the UCSC Genome Browser and submitted to the FaceBase Hub in January 2013. This file contained de novo deletions from subjects of European ancestry drawn from both the oral cleft study and the dental caries study, the latter serving as a control group (see Younkin et al. 2014). No personal information on individual subjects was included in this BED file, and both deletions and amplifications were included (coded to display amplifications in red and deletions in blue).
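As an illustration of the color coding described above, the following Python sketch formats copy number calls as BED lines with an itemRgb column, which the UCSC Genome Browser uses when the track line sets itemRgb="On". It is a minimal sketch rather than the code used to produce the submitted file: the function name, the track name, the item names and the example coordinates are all hypothetical, and the input coordinates are assumed to be already 0-based and half-open as BED requires. No subject identifiers are written.

```python
# RGB values for the color coding in the text: amplifications red, deletions blue.
COLORS = {"amplification": "255,0,0", "deletion": "0,0,255"}


def bed_lines(events, track_name="de_novo_cnv"):
    """Format (chrom, start, end, kind) tuples as BED9 lines with itemRgb.

    Assumption: start and end are already 0-based, half-open coordinates.
    """
    lines = [f'track name="{track_name}" itemRgb="On"']
    for index, (chrom, start, end, kind) in enumerate(events, start=1):
        if kind not in COLORS:
            raise ValueError(f"unknown event type: {kind!r}")
        if start < 0 or end <= start:
            raise ValueError("require 0 <= start < end")
        fields = [
            chrom,
            str(start),
            str(end),
            f"{kind}_{index}",  # anonymous item name, no subject information
            "0",                # score (unused here)
            ".",                # strand not applicable
            str(start),         # thickStart
            str(end),           # thickEnd
            COLORS[kind],       # itemRgb
        ]
        lines.append("\t".join(fields))
    return lines


if __name__ == "__main__":
    # Hypothetical example regions, for illustration only.
    example = [("chr1", 1000, 5000, "deletion"), ("chr2", 200, 900, "amplification")]
    print("\n".join(bed_lines(example)))
```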
Scharpf RB, Beaty TH, Schwender H, Younkin SG, Scott AF, Ruczinski I: Fast detection of de novo copy number variants from SNP arrays for case-parent trios. BMC Bioinformatics 2012, 13:330.
Wang K, Chen Z, Tadesse MG, Glessner J, Grant SFA, Hakonarson H, Bucan M, Li M: Modeling genetic inheritance of copy number variations. Nucleic Acids Res 2008, 36(21):e138.
Younkin SG, Scharpf RB, Schwender H, Parker MM, Scott AF, Marazita ML, Beaty TH, Ruczinski I: A genome-wide study of de novo deletions identifies a candidate locus for non-syndromic isolated cleft lip/palate risk. BMC Genetics 2014, 15:24.