Reliability of genomic prediction for German Holsteins using imputed genotypes from low-density chips.
With the availability of single nucleotide polymorphism (SNP) marker chips, such as the Illumina BovineSNP50 BeadChip (50K), genomic evaluation has been routinely implemented in dairy cattle breeding. However, for an average dairy producer, the total cost of the 50K chip is still too high to genotype and genomically evaluate all cows. To study the accuracy of cheaper low-density chips, genotypes were simulated for 2 low-density chips, the Illumina Bovine3K BeadChip (3K) and the BovineLD BeadChip (6K), according to their original marker maps. The simulated missing genotypes of the 50K chip were imputed using the programs Beagle and Findhap. Three genotype data sets were used to study imputation accuracy: the EuroGenomics data set, with 14,405 reference bulls (data set I); the smaller EuroGenomics data set, with 11,670 older reference bulls (data set II); and the data set of all genotyped German Holsteins, with 31,597 reference animals (data set III). Imputed genotypes were compared with the original genotypes to calculate the allele error rate for validation animals in the 3 data sets. To evaluate the loss in accuracy of genomic prediction when using imputed genotypes, a genomic evaluation was conducted for EuroGenomics data set II only. Furthermore, combined genome-enhanced breeding values calculated from the original and imputed genotypes were compared. The allele error rate for EuroGenomics data set II was highest for Findhap on the 3K chip (3.3%) and lowest for Beagle on the 6K chip (0.6%). Across the data sets, Beagle was about 2 times as accurate as Findhap. Compared with the real 50K genotypes, the reduction in reliability of genomic prediction when using imputed genotypes, averaged over the 12 evaluated traits, was highest for Findhap on the 3K chip (5.3%) and lowest for Beagle on the 6K chip (1%).
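The masking-and-comparison design described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: genotypes are coded as counts (0, 1, 2) of one allele, a low-density chip is represented only by the set of SNP names on its map, and the allele error rate is taken as wrongly imputed alleles divided by all alleles at masked SNPs, which is one common definition but not necessarily the exact one the authors used. The names `mask_to_panel` and `allele_error_rate` are hypothetical, and the "imputed" values merely stand in for Beagle or Findhap output.

```python
from typing import Dict, Optional, Set

# Genotype = count of one allele (0, 1 or 2); None marks a masked/missing SNP.
Genotype = Optional[int]


def mask_to_panel(genotypes: Dict[str, int], ld_snps: Set[str]) -> Dict[str, Genotype]:
    """Simulate a low-density chip: keep SNPs on its map, mask all others."""
    return {snp: (g if snp in ld_snps else None) for snp, g in genotypes.items()}


def allele_error_rate(true: Dict[str, int], imputed: Dict[str, int],
                      masked: Dict[str, Genotype]) -> float:
    """Share of alleles imputed wrongly, counted only at masked SNPs.

    With 0/1/2 coding, |true - imputed| is the number of wrong alleles at a
    SNP (0, 1 or 2), and every SNP carries 2 alleles.
    """
    targets = [snp for snp, g in masked.items() if g is None]
    if not targets:
        return 0.0
    wrong = sum(abs(true[snp] - imputed[snp]) for snp in targets)
    return wrong / (2 * len(targets))


if __name__ == "__main__":
    original = {"snp1": 0, "snp2": 1, "snp3": 2, "snp4": 1, "snp5": 0}
    masked = mask_to_panel(original, ld_snps={"snp1", "snp3"})
    # Hypothetical output of an imputation program for the masked animal.
    imputed = {"snp1": 0, "snp2": 1, "snp3": 2, "snp4": 2, "snp5": 0}
    print(allele_error_rate(original, imputed, masked))  # 1 wrong allele of 6
```

Only masked SNPs enter the denominator, because SNPs present on the low-density map are observed rather than imputed.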
Differences in genome-enhanced breeding values between the original and imputed genotypes were largest for Findhap on the 3K chip and smallest for Beagle on the 6K chip. The 6K low-density chip gave markedly higher imputation accuracy and more accurate genomic prediction than the 3K chip. Given the relatively small reduction in accuracy of genomic prediction, we would recommend the BovineLD 6K chip for large-scale genotyping, provided its cost is acceptable to breeders.
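Comparing genome-enhanced breeding values (GEBV) from original and imputed genotypes reduces to a few summary statistics on paired values. The sketch below, built around a hypothetical `compare_gebv` helper, reports the correlation, the mean difference (a bias indicator), and the mean and maximum absolute differences. The abstract does not say which statistics were used, so this selection is an assumption, and the example values are invented.

```python
import math
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equally long lists (NaN if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def compare_gebv(original: List[float], imputed: List[float]) -> Dict[str, float]:
    """Summarize agreement between GEBV from original and imputed genotypes."""
    if len(original) != len(imputed) or len(original) < 2:
        raise ValueError("need two equally long lists with at least 2 animals")
    diffs = [i - o for o, i in zip(original, imputed)]
    n = len(diffs)
    return {
        "correlation": pearson(original, imputed),
        "mean_difference": sum(diffs) / n,  # > 0: imputed GEBV higher on average
        "mean_abs_difference": sum(abs(d) for d in diffs) / n,
        "max_abs_difference": max(abs(d) for d in diffs),
    }


if __name__ == "__main__":
    gebv_50k = [102.0, 95.5, 110.2, 99.1]  # hypothetical values
    gebv_imputed = [101.4, 96.0, 109.0, 99.3]
    for name, value in compare_gebv(gebv_50k, gebv_imputed).items():
        print(f"{name}: {value:.3f}")
```

A correlation close to 1 can coexist with a nonzero mean difference, which is why both a correlation and a difference statistic are reported.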
Segelke D, Chen J, Liu Z, Reinhardt F, Thaller G, Reents R ...
-
Assets of imputation to ultra-high density for productive and functional traits.
The aim of this study was to evaluate genotyping panels of different densities for genotype imputation and genomic prediction. Genotypes from the customized GoldenGate Bovine3K BeadChip [LD3K; low-density (LD) 3,000-marker (3K); Illumina Inc., San Diego, CA] and BovineLD BeadChip [LD6K; 6,000-marker (6K); Illumina Inc.] panels were imputed to the BovineSNP50v2 BeadChip [50K; 50,000-marker; Illumina Inc.]. In addition, LD3K, LD6K, and 50K genotypes were imputed to a BovineHD BeadChip [HD; high-density 800,000-marker (800K) panel], and their predictive ability was subsequently evaluated and compared. Prediction accuracy was compared using random boosting and genomic BLUP. Four traits under selection in the Spanish Holstein population were used: milk yield, fat percentage (FP), somatic cell count, and days open (DO). Training sets at 50K density for imputation and prediction included 1,632 genotypes. Testing sets for imputation from LD to 50K contained 834 genotypes, and testing sets for genomic evaluation included 383 bulls. The reference population genotyped at HD included 192 bulls. Imputation using BEAGLE software (http://faculty.washington.edu/browning/beagle/beagle.html) was effective for reconstructing dense 50K and HD genotypes, even with a small reference population, with 98.3% of SNP correctly imputed. Random boosting outperformed genomic BLUP in prediction reliability, mean squared error, and selection effectiveness of top animals for FP. For the other traits, however, no clear differences existed between methods. No differences were found between imputed LD and 50K genotypes, whereas evaluations using genotypes imputed to HD were, on average across data sets, methods, and traits, 4% more accurate than 50K predictions and had a 2% smaller mean squared error of prediction.
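Genomic BLUP usually models breeding values as g ~ N(0, G σ²_g), where G is a genomic relationship matrix built from SNP genotypes. The abstract does not say how G was constructed; the sketch below assumes VanRaden's first method, G = ZZ' / (2 Σ_j p_j (1 - p_j)), where Z is the 0/1/2 genotype matrix centered by 2p_j and the allele frequencies p_j are estimated from the same animals. It uses plain Python lists and the hypothetical name `vanraden_g`; a production evaluation would use a linear algebra library and then solve the mixed model equations with this G.

```python
from typing import List

Matrix = List[List[float]]


def vanraden_g(genotypes: List[List[int]]) -> Matrix:
    """Genomic relationship matrix, VanRaden (2008) method 1.

    genotypes: one row per animal, one column per SNP, coded 0/1/2.
    Allele frequencies are estimated from these same animals (assumption).
    """
    n_animals, n_snps = len(genotypes), len(genotypes[0])
    p = [sum(row[j] for row in genotypes) / (2 * n_animals) for j in range(n_snps)]
    # Center each genotype by its expected value 2p.
    z = [[row[j] - 2 * p[j] for j in range(n_snps)] for row in genotypes]
    scale = 2 * sum(pj * (1 - pj) for pj in p)
    if scale == 0:
        raise ValueError("all SNPs are monomorphic")
    return [[sum(a * b for a, b in zip(z[i], z[k])) / scale
             for k in range(n_animals)] for i in range(n_animals)]


if __name__ == "__main__":
    m = [[0, 1, 2, 1], [2, 1, 0, 1], [1, 1, 1, 2]]  # 3 hypothetical animals x 4 SNPs
    for row in vanraden_g(m):
        print([round(x, 3) for x in row])
```

In GBLUP, G (often with a small constant added to its diagonal so that it is invertible) takes the place of the pedigree relationship matrix in the mixed model equations.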
Similar bias in regression coefficients was found across data sets, but regressions were 0.32 units closer to unity for DO when genotypes were imputed to HD density. Imputation to HD genotypes might produce higher stability in the genomic proofs of young candidates. Regarding selection effectiveness of top animals, 2% more top bulls were correctly classified with imputed LD6K genotypes than with LD3K genotypes. When the original 50K genotypes were used, correct classification of top bulls increased by 1%, and when those genotypes were imputed to HD, 3% more top bulls were detected. Selection effectiveness could be slightly enhanced for certain traits, such as FP, somatic cell count, or DO, when genotypes are imputed to HD. Genetic evaluation units may consider a trait-dependent strategy, in terms of method and genotype density, for genome-enhanced evaluations.
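Two of the validation statistics mentioned above, selection effectiveness of top animals and the regression coefficient of observed on predicted values, can be sketched as follows. Assumptions: selection effectiveness is taken as the share of the true top-k animals that also appear in the predicted top-k, and the regression is an ordinary least-squares slope of observed values (for example, later progeny-based proofs) on genomic predictions, where a slope of 1 indicates neither inflation nor deflation. Both function names are hypothetical, and the study's exact definitions may differ.

```python
from typing import List


def top_k_overlap(true_values: List[float], predicted: List[float], k: int) -> float:
    """Share of the true top-k animals that are also ranked top-k by prediction."""
    if len(true_values) != len(predicted) or not 0 < k <= len(true_values):
        raise ValueError("invalid k or mismatched lengths")
    idx = range(len(true_values))
    true_top = set(sorted(idx, key=lambda i: true_values[i], reverse=True)[:k])
    pred_top = set(sorted(idx, key=lambda i: predicted[i], reverse=True)[:k])
    return len(true_top & pred_top) / k


def regression_slope(observed: List[float], predicted: List[float]) -> float:
    """OLS slope b in observed = a + b * predicted (b = 1: no inflation/deflation)."""
    n = len(observed)
    mo, mp = sum(observed) / n, sum(predicted) / n
    sxy = sum((p - mp) * (o - mo) for p, o in zip(predicted, observed))
    sxx = sum((p - mp) ** 2 for p in predicted)
    return sxy / sxx


if __name__ == "__main__":
    observed = [1.2, 0.4, -0.3, 0.9, -1.1, 0.1]   # hypothetical later proofs
    predicted = [1.0, 0.2, -0.1, 1.1, -0.8, 0.3]  # hypothetical genomic predictions
    print(top_k_overlap(observed, predicted, k=2))
    print(round(regression_slope(observed, predicted), 3))
```

A slope below 1 means the predictions are more dispersed than the observed values, so "closer to unity" corresponds to less biased genomic proofs.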
Jiménez-Montero JA, Gianola D, Weigel K, Alenda R, González-Recio O ...
-
Strategies for single nucleotide polymorphism (SNP) genotyping to enhance genotype imputation in Gyr (Bos indicus) dairy cattle: Comparison of commercially available SNP chips.
Genotype imputation is widely used as a cost-effective strategy in genomic evaluation of cattle. Key determinants of imputation accuracy, such as linkage disequilibrium patterns, marker density, and ascertainment bias, differ between Bos indicus and Bos taurus breeds. Consequently, the effectiveness of genotype imputation needs to be investigated in indicine breeds. The objective of this study was therefore to investigate strategies and factors affecting the accuracy of genotype imputation in Gyr (Bos indicus) dairy cattle. Four imputation scenarios were studied using 471 sires and 1,644 dams genotyped on the Illumina BovineHD (HD-777K; San Diego, CA) and BovineSNP50 (50K) chips, respectively. The scenarios differed in which reference high-density single nucleotide polymorphism (SNP) panel (HDP) was adopted [HD-777K, 50K, or GeneSeek GGP-75Ki (Lincoln, NE)]. Depending on the scenario, validation animals had their genotypes masked to one of the lower-density panels: Illumina (3K, 7K, and 50K) or GeneSeek (SGGP-20Ki and GGP-75Ki). We randomly selected 171 sires as reference and 300 as validation for all scenarios. Additionally, all sires were used as reference and the 1,644 dams were imputed for validation. Genotypes of 98 individuals with 4 or more offspring were completely masked and imputed. The imputation programs FImpute and Beagle v3.3 and v4 were used. Imputation accuracy was measured using the correlation and the allelic correct rate. FImpute gave the highest accuracies, whereas Beagle 3.3 gave the least accurate imputations. With 50K as HDP and 3K (7K) as the low-density panel, accuracies evaluated as correlation (allelic correct rate) ranged from 0.910 (0.942) to 0.961 (0.974). With GGP-75Ki as HDP, accuracies were moderate for 3K, 7K, and 50K but high for SGGP-20Ki. Using HD-777K as HDP resulted in accuracies of 0.888 (3K), 0.941 (7K), 0.980 (SGGP-20Ki), 0.982 (50K), and 0.993 (GGP-75Ki).
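The two accuracy measures used here, correlation and allelic correct rate, can be computed per animal over the masked SNPs, as in the sketch below. It assumes integer genotypes coded 0/1/2 and defines the allelic correct rate as 1 minus the share of wrongly imputed alleles. Correlations are also often computed per SNP across animals rather than per animal, so the per-animal choice, the function name `imputation_accuracy`, and the example data are assumptions.

```python
import math
from typing import List, Tuple


def imputation_accuracy(true: List[int], imputed: List[int]) -> Tuple[float, float]:
    """(correlation, allelic correct rate) for one animal over its masked SNPs.

    Genotypes are coded 0/1/2. The correlation is NaN when either vector is
    constant, because it is undefined there.
    """
    if len(true) != len(imputed) or not true:
        raise ValueError("need two non-empty lists of equal length")
    n = len(true)
    mt, mi = sum(true) / n, sum(imputed) / n
    cov = sum((t - mt) * (i - mi) for t, i in zip(true, imputed))
    vt = sum((t - mt) ** 2 for t in true)
    vi = sum((i - mi) ** 2 for i in imputed)
    corr = cov / math.sqrt(vt * vi) if vt > 0 and vi > 0 else float("nan")
    # Each SNP has 2 alleles; |t - i| counts the wrongly imputed ones.
    correct_rate = 1 - sum(abs(t - i) for t, i in zip(true, imputed)) / (2 * n)
    return corr, correct_rate


if __name__ == "__main__":
    true_geno = [0, 1, 2, 1, 0, 2, 1, 1]  # hypothetical masked SNPs
    imputed_geno = [0, 1, 2, 2, 0, 2, 1, 0]
    corr, acr = imputation_accuracy(true_geno, imputed_geno)
    print(f"correlation={corr:.3f} allelic correct rate={acr:.3f}")
```

The correlation is usually lower than the allelic correct rate for the same data because it penalizes errors relative to the genotype variance, which matches the pattern of paired values reported above.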
Ungenotyped individuals were imputed with an average accuracy of 0.970. The average of the top 5 kinship coefficients between reference and imputed individuals was a strong predictor of imputation accuracy. FImpute was faster and used less memory than Beagle v4. Beagle v4 outperformed Beagle v3.3 in both accuracy and computation speed. A genotyping strategy that uses the HD-777K SNP chip as the reference panel and SGGP-20Ki as the lower-density SNP panel should be adopted, because its accuracy was high and similar to that of the 50K. However, the effect on genomic evaluation of using HD-777K genotypes imputed from SGGP-20Ki has yet to be studied.
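The predictor reported above, the average of the top 5 kinship coefficients between an imputed animal and the reference animals, is simple to compute once kinship coefficients are available. The sketch assumes those coefficients have already been derived (from pedigree or genotypes) and are supplied as one list per animal; `mean_top_k_kinship`, `rank_by_kinship`, and the example values are hypothetical.

```python
from typing import Dict, List, Tuple


def mean_top_k_kinship(kinships_to_reference: List[float], k: int = 5) -> float:
    """Average of the k largest kinship coefficients between one animal and the reference set."""
    if not kinships_to_reference or k < 1:
        raise ValueError("need at least one kinship value and k >= 1")
    top = sorted(kinships_to_reference, reverse=True)[:k]
    return sum(top) / len(top)


def rank_by_kinship(animals: Dict[str, List[float]], k: int = 5) -> List[Tuple[str, float]]:
    """Order animals from least to most related to the reference set.

    Animals near the top of the list would be expected to be imputed least accurately.
    """
    scores = {name: mean_top_k_kinship(values, k) for name, values in animals.items()}
    return sorted(scores.items(), key=lambda item: item[1])


if __name__ == "__main__":
    example = {
        "cow_A": [0.25, 0.125, 0.5, 0.0625, 0.25, 0.0],  # hypothetical coefficients
        "cow_B": [0.03, 0.02, 0.05, 0.01, 0.0, 0.04],
    }
    for name, score in rank_by_kinship(example):
        print(name, round(score, 4))
```

If fewer than k reference relatives exist, the sketch averages whatever is available, which is one of several reasonable conventions.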
Boison SA, Santos DJ, Utsunomiya AH, Carvalheiro R, Neves HH, O'Brien AM, Garcia JF, Sölkner J, da Silva MV ...
-
Use of the Illumina Bovine3K BeadChip in dairy genomic evaluation.
Genomic evaluations using genotypes from the Illumina Bovine3K BeadChip (3K) became available in September 2010 and were made official in December 2010. Most 3K-genotyped animals have been Holstein females. Approximately 5% of male 3K genotypes and between 3.7 and 13.9% of female genotypes, depending on registry status, had sire conflicts. The chemistry used for the 3K differs from that of the Illumina BovineSNP50 BeadChip (50K) and causes greater variability in genotype accuracy. Approximately 2% of genotypes were rejected because of this inaccuracy. A single nucleotide polymorphism (SNP) was deemed unusable for genomic evaluation on the basis of its percentage of missing genotypes, its percentage of parent-progeny conflicts, and its departure from Hardy-Weinberg equilibrium. These edits left 2,683 of the 2,900 3K SNP for use in genomic evaluations. The mean minor allele frequencies (MAF) for Holstein, Jersey, and Brown Swiss were 0.32, 0.28, and 0.29, respectively. Eighty-one SNP had both a large number of missing genotypes and a large number of parent-progeny conflicts, suggesting a correlation between call rate and accuracy. To calculate a genomic predicted transmitting ability (GPTA), the genotype of an animal tested on the 3K is imputed to the 45,187 SNP included in the current genomic evaluation based on the 50K. Imputation accuracy increases as the number of genotyped parents increases from none to 1 to both. On average, 96.3% of imputed genotypes matched the corresponding actual 50K genotypes. The correlation between GPTA calculated from a 3K genotype imputed to 50K and GPTA from the animal's actual 50K genotype averaged 0.959 across traits for Holsteins and was slightly higher for Jerseys, at 0.963. The average difference across traits between GPTA from the 50K- and 3K-based genotypes was close to 0. The evaluation system has been modified to accommodate the characteristics of the 3K.
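The per-SNP edits described above (percentage missing, parent-progeny conflicts, and Hardy-Weinberg equilibrium) and the MAF summary can be sketched as follows. This is a minimal illustration, not the evaluation system's code: genotypes are assumed coded 0/1/2 with None for a missing call, a parent-progeny conflict is taken to mean opposing homozygotes (0 in one animal, 2 in the other), and departure from Hardy-Weinberg equilibrium is measured with a 1-degree-of-freedom chi-square statistic. The abstract gives no cutoffs, so none are applied here; all function names and example calls are hypothetical.

```python
from typing import List, Optional, Tuple

Genotype = Optional[int]  # 0/1/2 copies of one allele; None = missing call


def call_rate(calls: List[Genotype]) -> float:
    """Share of animals with a non-missing call at this SNP."""
    return sum(g is not None for g in calls) / len(calls)


def _observed(calls: List[Genotype]) -> List[int]:
    observed = [g for g in calls if g is not None]
    if not observed:
        raise ValueError("no called genotypes")
    return observed


def minor_allele_frequency(calls: List[Genotype]) -> float:
    observed = _observed(calls)
    p = sum(observed) / (2 * len(observed))
    return min(p, 1 - p)


def hwe_chi_square(calls: List[Genotype]) -> float:
    """1-df chi-square of observed vs Hardy-Weinberg expected genotype counts."""
    observed = _observed(calls)
    n = len(observed)
    counts = [observed.count(k) for k in (0, 1, 2)]
    p = (counts[1] + 2 * counts[2]) / (2 * n)  # frequency of the counted allele
    expected = [n * (1 - p) ** 2, 2 * n * p * (1 - p), n * p ** 2]
    return sum((o - e) ** 2 / e for o, e in zip(counts, expected) if e > 0)


def parent_progeny_conflicts(pairs: List[Tuple[Genotype, Genotype]]) -> int:
    """Count parent-progeny pairs with opposing homozygotes (0 vs 2).

    Such pairs cannot share an allele, so (mutation aside) they indicate a
    genotyping or parentage error.
    """
    return sum(1 for parent, child in pairs if {parent, child} == {0, 2})


if __name__ == "__main__":
    snp_calls = [0, 1, 1, 2, None, 1, 0, 2, 1, 1]  # hypothetical calls at one SNP
    print(call_rate(snp_calls), minor_allele_frequency(snp_calls),
          round(hwe_chi_square(snp_calls), 3))
    print(parent_progeny_conflicts([(0, 2), (1, 2), (2, 0), (None, 0)]))
```

Averaging `minor_allele_frequency` over all retained SNPs gives a breed-level mean MAF of the kind reported above.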
The low cost of the 3K has greatly increased genotyping of females. Before the 3K became available (August 2010), females accounted for 38.7% of genotyped animals. In the past year, the share of total genotypes from females across all chip types rose to 59.0%.
Wiggans GR, Cooper TA, VanRaden PM, Olson KM, Tooker ME ...
-
Strategies and utility of imputed SNP genotypes for genomic analysis in dairy cattle.
We investigated strategies and factors affecting the accuracy of imputing genotypes from lower-density SNP panels (Illumina 3K and 7K, Affymetrix 15K and 25K, and evenly spaced subsets) up to one medium-density (Illumina 50K) and one high-density (Illumina 800K) SNP panel. We also evaluated the utility of imputed genotypes for the accuracy of genomic selection, using Australian Holstein-Friesian cattle data from 2727 and 845 animals genotyped with the 50K and 800K SNP chips, respectively. Animals were divided into reference and test sets (genotyped with higher- and lower-density SNP panels, respectively) to evaluate imputation accuracy. To evaluate the accuracy of genomic selection, direct genetic values (DGV) were compared after dividing the data into training and validation sets under a range of imputation scenarios.
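The evenly spaced subsets used above as additional low-density panels can be mimicked by thinning a sorted SNP map. This sketch assumes spacing by marker order in the genome-wide map (sorted by numeric chromosome code and base-pair position) rather than by physical distance, which is a simplifying assumption; the map tuples and the name `evenly_spaced_subset` are hypothetical.

```python
from typing import List, Tuple

Snp = Tuple[str, int, int]  # (name, chromosome number, base-pair position)


def evenly_spaced_subset(snps: List[Snp], n_keep: int) -> List[Snp]:
    """Thin a SNP map to n_keep markers spaced evenly by marker order.

    The map is sorted by (chromosome, position) and SNPs are taken at
    evenly spaced ranks, always keeping the first and last marker.
    """
    if n_keep < 1:
        raise ValueError("n_keep must be at least 1")
    ordered = sorted(snps, key=lambda s: (s[1], s[2]))
    n = len(ordered)
    if n_keep >= n:
        return ordered
    if n_keep == 1:
        return ordered[:1]
    step = (n - 1) / (n_keep - 1)  # > 1, so rounded ranks never repeat
    return [ordered[round(i * step)] for i in range(n_keep)]


if __name__ == "__main__":
    # Hypothetical map: 2 chromosomes, 6 SNPs each, 50 kb apart.
    snp_map = [(f"chr{c}_snp{i}", c, 50_000 * i) for c in (1, 2) for i in range(6)]
    for name, chrom, pos in evenly_spaced_subset(snp_map, 4):
        print(name, chrom, pos)
```

Spacing by physical distance instead would pick, on each chromosome, the SNP nearest to each of a set of equally spaced base-pair targets; the rank-based rule is simpler and equivalent when markers are uniformly dense.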
Of the three imputation methods compared, IMPUTE2 outperformed Beagle and fastPHASE in almost all scenarios. Higher SNP density in the test animals, larger reference sets, and higher relatedness between test and reference animals increased imputation accuracy. 50K-specific genotypes were imputed from 15K (2.85%) and 25K (2.75%) genotypes with moderate allelic error rates. Using IMPUTE2, SNP genotypes up to 800K were imputed from 50K genotypes with a low allelic error rate (0.79% genome-wide), and from 3K (4.78%) and 7K (2.00%) genotypes with moderate error rates. The error rate of imputing up to 800K from 3K or 7K was further reduced when an additional middle tier of 50K genotypes was incorporated in a 3-tiered framework. Accuracies of DGV for five production traits using imputed 50K genotypes were close to those obtained with the actual 50K genotypes and higher than those obtained with 3K or 7K genotypes. The loss in DGV accuracy was small when most of the training animals also had imputed (50K) genotypes. Additional gains in DGV accuracy were small when SNP density increased from 50K to imputed 800K.
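The 3-tiered framework above can be illustrated with a toy imputer. The sketch below does not implement IMPUTE2, Beagle, or fastPHASE: as a stand-in, it copies missing genotypes from the single reference animal that best matches the target at its observed SNPs, a crude nearest-neighbor rule chosen only to show the data flow between tiers. All genotypes share the densest map, with None marking SNPs not on an animal's chip, and each tier pairs a reference set with the SNP indices it may fill (for example, 50K first, then 800K). Function names and example data are hypothetical.

```python
from typing import List, Optional, Sequence, Set, Tuple

# One animal's genotypes on the densest map: 0/1/2, or None if not on its chip.
Genotypes = List[Optional[int]]


def impute_nearest(target: Genotypes, reference: Sequence[Genotypes],
                   fill: Set[int]) -> Genotypes:
    """Toy stand-in for a real imputation program.

    Picks the reference animal with the fewest allele mismatches at the
    target's observed SNPs (references must be genotyped there) and copies
    its genotypes into the target's missing SNPs whose index is in `fill`.
    """
    observed = [j for j, g in enumerate(target) if g is not None]

    def mismatches(ref: Genotypes) -> int:
        return sum(abs(ref[j] - target[j]) for j in observed)

    best = min(reference, key=mismatches)
    return [best[j] if g is None and j in fill else g
            for j, g in enumerate(target)]


def tiered_impute(target: Genotypes,
                  tiers: List[Tuple[Sequence[Genotypes], Set[int]]]) -> Genotypes:
    """Impute tier by tier, e.g. LD -> 50K, then 50K -> HD."""
    current = list(target)
    for reference, fill in tiers:
        current = impute_nearest(current, reference, fill)
    return current


if __name__ == "__main__":
    mid_snps = {0, 1, 2}       # SNPs on a hypothetical 50K-like chip
    hd_snps = set(range(6))    # SNPs on a hypothetical HD-like chip
    ref_mid = [[2, 2, 2, None, None, None], [0, 0, 0, None, None, None]]
    ref_hd = [[2, 0, 0, 1, 1, 1], [2, 2, 2, 2, 2, 2]]
    ld_animal = [2, None, None, None, None, None]  # only SNP 0 genotyped

    print("2-tier:", tiered_impute(ld_animal, [(ref_hd, hd_snps)]))
    print("3-tier:", tiered_impute(ld_animal, [(ref_mid, mid_snps), (ref_hd, hd_snps)]))
```

In this contrived example, the direct LD-to-HD step must choose a reference animal from a single SNP and returns `[2, 0, 0, 1, 1, 1]`, whereas the middle tier first fills the 50K-like SNPs, giving the HD step more information to match on, and returns `[2, 2, 2, 2, 2, 2]`. This mirrors, in miniature, the intuition behind the lower error rates reported for the 3-tiered framework.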
Population-based genotype imputation can be used to predict and combine genotypes from different low-, medium-, and high-density SNP chips with a high level of accuracy. Imputing genotypes from low-density SNP panels to at least 50K SNP density increases the accuracy of genomic selection.
Khatkar MS, Moser G, Hayes BJ, Raadsma HW ...
BMC Genomics