-
Assets of imputation to ultra-high density for productive and functional traits.
The aim of this study was to evaluate different-density genotyping panels for genotype imputation and genomic prediction. Genotypes from customized Golden Gate Bovine3K BeadChip [LD3K; low-density (LD) 3,000-marker (3K); Illumina Inc., San Diego, CA] and BovineLD BeadChip [LD6K; 6,000-marker (6K); Illumina Inc.] panels were imputed to the BovineSNP50v2 BeadChip [50K; 50,000-marker; Illumina Inc.]. In addition, LD3K, LD6K, and 50K genotypes were imputed to a BovineHD BeadChip [HD; high-density 800,000-marker (800K) panel], and predictive ability was subsequently evaluated and compared. Comparisons of prediction accuracy were carried out using Random boosting and genomic BLUP. Four traits under selection in the Spanish Holstein population were used: milk yield, fat percentage (FP), somatic cell count, and days open (DO). Training sets at 50K density for imputation and prediction included 1,632 genotypes. Testing sets for imputation from LD to 50K contained 834 genotypes and testing sets for genomic evaluation included 383 bulls. The reference population genotyped at HD included 192 bulls. Imputation using BEAGLE software (http://faculty.washington.edu/browning/beagle/beagle.html) was effective for reconstruction of dense 50K and HD genotypes, even when a small reference population was used, with 98.3% of SNP correctly imputed. Random boosting outperformed genomic BLUP in terms of prediction reliability, mean squared error, and selection effectiveness of top animals in the case of FP. For other traits, however, no clear differences existed between methods. No differences were found between imputed LD and 50K genotypes, whereas evaluation of genotypes imputed to HD was, on average across data set, method, and trait, 4% more accurate than 50K prediction, and showed smaller (2%) mean squared error of predictions.
Similar bias in regression coefficients was found across data sets but regressions were 0.32 units closer to unity for DO when genotypes were imputed to HD density. Imputation to HD genotypes might produce higher stability in the genomic proofs of young candidates. Regarding selection effectiveness of top animals, more (2%) top bulls were classified correctly with imputed LD6K genotypes than with LD3K. When the original 50K genotypes were used, correct classification of top bulls increased by 1%, and when those genotypes were imputed to HD, 3% more top bulls were detected. Selection effectiveness could be slightly enhanced for certain traits such as FP, somatic cell count, or DO when genotypes are imputed to HD. Genetic evaluation units may consider a trait-dependent strategy in terms of method and genotype density for use in the genome-enhanced evaluations.
Jiménez-Montero JA, Gianola D, Weigel K, Alenda R, González-Recio O
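The "selection effectiveness of top animals" reported in this abstract can be illustrated with a small sketch. The abstract does not state how the top group was defined, so the function name `top_overlap`, the `top_fraction` parameter, and the toy values below are assumptions made only for illustration: the sketch counts how many of the truly top-ranked animals are also top-ranked on their predictions.

```python
def top_overlap(observed, predicted, top_fraction=0.2):
    """Share of the truly top-ranked animals that also rank top on predictions.

    Assumption: "top" means the best `top_fraction` of animals by value;
    the abstract does not give the threshold actually used.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and equal in length")
    n_top = max(1, round(len(observed) * top_fraction))

    def top_ids(values):
        # Indices of the n_top largest values.
        order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
        return set(order[:n_top])

    return len(top_ids(observed) & top_ids(predicted)) / n_top


# Toy values (hypothetical): later proofs and genomic predictions for 10 bulls.
observed = [3.1, 0.4, 2.2, -1.0, 1.5, 0.9, 2.8, -0.3, 0.1, 1.9]
predicted = [2.5, 0.6, 1.1, -0.5, 1.8, 0.2, 2.9, 0.0, -0.2, 1.0]
print(top_overlap(observed, predicted, top_fraction=0.2))
print(top_overlap(observed, predicted, top_fraction=0.3))
```

Comparing this share across genotype densities (imputed LD, 50K, imputed HD) is one way to read the percentage differences quoted in the abstract.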
-
Genomic imputation and evaluation using high-density Holstein genotypes.
Genomic evaluations for 161,341 Holsteins were computed by using 311,725 of 777,962 markers on the Illumina BovineHD Genotyping BeadChip (HD). Initial edits with 1,741 HD genotypes from 5 breeds revealed that 636,967 markers were usable but that half were redundant. Holstein genotypes were from 1,510 animals with HD markers, 82,358 animals with 45,187 (50K) markers, 1,797 animals with 8,031 (8K) markers, 20,177 animals with 6,836 (6K) markers, 52,270 animals with 2,683 (3K) markers, and 3,229 nongenotyped dams (0K) with >90% of haplotypes imputable because they had 4 or more genotyped progeny. The Holstein HD genotypes were from 1,142 US, Canadian, British, and Italian sires, 196 other sires, 138 cows in a US Department of Agriculture research herd (Beltsville, MD), and 34 other females. Percentages of correctly imputed genotypes were tested by applying the programs findhap and FImpute to a simulated chromosome for an earlier population that had only 1,112 animals with HD genotypes and none with 8K genotypes. For each chip, 1% of the genotypes were missing and 0.02% were incorrect initially. After imputation of missing markers with findhap, percentages of genotypes correct were 99.9% from HD, 99.0% from 50K, 94.6% from 6K, 90.5% from 3K, and 93.5% from 0K. With FImpute, 99.96% were correct from HD, 99.3% from 50K, 94.7% from 6K, 91.1% from 3K, and 95.1% from 0K genotypes. Accuracy for the 3K and 6K genotypes further improved by approximately 2 percentage points if imputed first to 50K and then to HD instead of imputing all genotypes directly to HD. Evaluations were tested by using imputed actual genotypes and August 2008 phenotypes to predict deregressed evaluations of US bulls proven after August 2008. For 28 traits tested, the estimated genomic reliability averaged 61.1% when using 311,725 markers vs. 60.7% when using 45,187 markers vs. 29.6% from the traditional parent average. 
Squared correlations with future data were slightly greater for 16 traits and slightly less for 12 with HD than with 50K evaluations. The observed 0.4 percentage point average increase in reliability was less favorable than the 0.9 expected from simulation but was similar to actual gains from other HD studies. The largest HD and 50K marker effects were often located at very similar positions. The single-breed evaluation tested here and previous single-breed or multibreed evaluations have not produced large gains. Increasing the number of HD genotypes used for imputation above 1,074 did not improve the reliability of Holstein genomic evaluations.
VanRaden PM, Null DJ, Sargolzaei M, Wiggans GR, Tooker ME, Cole JB, Sonstegard TS, Connor EE, Winters M, van Kaam JB, Valentini A, Van Doormaal BJ, Faust MA, Doak GA
-
Reliability of genomic prediction for German Holsteins using imputed genotypes from low-density chips.
With the availability of single nucleotide polymorphism (SNP) marker chips, such as the Illumina BovineSNP50 BeadChip (50K), genomic evaluation has been routinely implemented in dairy cattle breeding. However, for an average dairy producer, total costs associated with the 50K chip are still too high to have all the cows genotyped and genomically evaluated. To study the accuracy of cheaper low-density chips, genotypes were simulated for 2 low-density chips, the Illumina Bovine3K BeadChip (3K) and BovineLD BeadChip (6K), according to their original marker maps. Simulated missing genotypes of the 50K chip were imputed using the programs Beagle and Findhap. Three genotype data sets were used to study imputation accuracy: the EuroGenomics data set, with 14,405 reference bulls (data set I); the smaller EuroGenomics data set, with 11,670 older reference bulls (data set II); and the data set of all genotyped German Holsteins, with 31,597 reference animals (data set III). Imputed genotypes were compared with their original ones to calculate allele error rate for validation animals in the 3 data sets. To evaluate the loss in accuracy of genomic prediction when using imputed genotypes, a genomic evaluation was conducted only for EuroGenomics data set II. Furthermore, combined genome-enhanced breeding values calculated from the original and imputed genotypes were compared. Allele error rate for EuroGenomics data set II was highest for the Findhap program on the 3K chip (3.3%) and lowest for the Beagle program on the 6K chip (0.6%). Across the data sets, Beagle was shown to be about 2 times as accurate as Findhap. Compared with the real 50K genotypes, the reduction in reliability of the genomic prediction when using the imputed genotypes was highest for Findhap on the 3K chip (5.3%) and lowest for Beagle on the 6K chip (1%) when averaged over the 12 evaluated traits. 
Differences in genome-enhanced breeding values of the original and imputed genotypes were largest for Findhap on the 3K chip, whereas Beagle on the 6K chip had the smallest difference. The low-density chip, 6K, gave markedly higher imputation accuracy and more accurate genomic prediction than the 3K chip. On the basis of the relatively small reduction in accuracy of genomic prediction, we would recommend the BovineLD 6K chip for large-scale genotyping as long as its costs are acceptable to breeders.
Segelke D, Chen J, Liu Z, Reinhardt F, Thaller G, Reents R
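The allele error rate used in this abstract to compare imputed with original genotypes can be sketched as follows. The abstract does not give the formula, so this assumes a common definition: genotypes are coded as allele counts (0, 1, 2) and the rate is the number of mismatched alleles divided by the total number of alleles compared. The function name and the toy genotypes are hypothetical.

```python
def allele_error_rate(true_genotypes, imputed_genotypes):
    """Fraction of alleles that differ between true and imputed genotypes.

    Assumption: each genotype is an allele count 0, 1, or 2, so a 0-vs-2
    mismatch counts as two wrong alleles and 0-vs-1 as one.
    Markers whose true genotype is None (no call) are skipped.
    """
    wrong = 0
    total = 0
    for true_row, imputed_row in zip(true_genotypes, imputed_genotypes):
        for true_call, imputed_call in zip(true_row, imputed_row):
            if true_call is None:
                continue
            wrong += abs(true_call - imputed_call)
            total += 2  # two alleles per diploid genotype
    if total == 0:
        raise ValueError("no genotypes to compare")
    return wrong / total


# Toy data (hypothetical): 2 validation animals, 4 markers each.
true_calls = [[0, 1, 2, 1], [2, 2, 0, 1]]
imputed_calls = [[0, 1, 1, 1], [2, 0, 0, 1]]
print(allele_error_rate(true_calls, imputed_calls))  # 3 wrong alleles out of 16
```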
-
Strategies for single nucleotide polymorphism (SNP) genotyping to enhance genotype imputation in Gyr (Bos indicus) dairy cattle: Comparison of commercially available SNP chips.
Genotype imputation is widely used as a cost-effective strategy in genomic evaluation of cattle. Key determinants of imputation accuracies, such as linkage disequilibrium patterns, marker densities, and ascertainment bias, differ between Bos indicus and Bos taurus breeds. Consequently, there is a need to investigate effectiveness of genotype imputation in indicine breeds. Thus, the objective of the study was to investigate strategies and factors affecting the accuracy of genotype imputation in Gyr (Bos indicus) dairy cattle. Four imputation scenarios were studied using 471 sires and 1,644 dams genotyped on Illumina BovineHD (HD-777K; San Diego, CA) and BovineSNP50 (50K) chips, respectively. Scenarios were based on which reference high-density single nucleotide polymorphism (SNP) panel (HDP) should be adopted [HD-777K, 50K, and GeneSeek GGP-75Ki (Lincoln, NE)]. Depending on the scenario, validation animals had their genotypes masked for one of the lower-density panels: Illumina (3K, 7K, and 50K) and GeneSeek (SGGP-20Ki and GGP-75Ki). We randomly selected 171 sires as reference and 300 as validation for all the scenarios. Additionally, all sires were used as reference and the 1,644 dams were imputed for validation. Genotypes of 98 individuals with 4 or more offspring were completely masked and imputed. Imputation algorithms FImpute and Beagle v3.3 and v4 were used. Imputation accuracies were measured using the correlation and allelic correct rate. FImpute resulted in highest accuracies, whereas Beagle 3.3 gave the least-accurate imputations. Accuracies evaluated as correlation (allelic correct rate) ranged from 0.910 (0.942) to 0.961 (0.974) using 50K as HDP and with 3K (7K) as low-density panels. With GGP-75Ki as HDP, accuracies were moderate for 3K, 7K, and 50K, but high for SGGP-20Ki. The use of HD-777K as HDP resulted in accuracies of 0.888 (3K), 0.941 (7K), 0.980 (SGGP-20Ki), 0.982 (50K), and 0.993 (GGP-75Ki).
Ungenotyped individuals were imputed with an average accuracy of 0.970. The average of the top 5 kinship coefficients between reference and imputed individuals was a strong predictor of imputation accuracy. FImpute was faster and used less memory than Beagle v4. Beagle v4 outperformed Beagle v3.3 in accuracy and speed of computation. A genotyping strategy that uses the HD-777K SNP chip as a reference panel and SGGP-20Ki as the lower-density SNP panel should be adopted as accuracy was high and similar to that of the 50K. However, the effect of using imputed HD-777K genotypes from the SGGP-20Ki on genomic evaluation is yet to be studied.
Boison SA, Santos DJ, Utsunomiya AH, Carvalheiro R, Neves HH, O'Brien AM, Garcia JF, Sölkner J, da Silva MV
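The predictor mentioned in this abstract, the average of the top 5 kinship coefficients between an imputed animal and the reference animals, can be sketched as below. How the kinship coefficients themselves were computed is not stated in the abstract, so the function takes them as given; the function name and the toy coefficients are hypothetical.

```python
def mean_top_kinship(kinship_to_reference, k=5):
    """Average of the k largest kinship coefficients with reference animals.

    `kinship_to_reference` holds one animal's kinship coefficient with each
    reference animal (assumed to be precomputed elsewhere).
    """
    if not kinship_to_reference:
        raise ValueError("at least one kinship coefficient is required")
    top = sorted(kinship_to_reference, reverse=True)[:k]
    return sum(top) / len(top)


# Toy coefficients (hypothetical) for one validation animal vs. 7 reference sires.
kinships = [0.25, 0.02, 0.125, 0.5, 0.0, 0.0625, 0.25]
print(mean_top_kinship(kinships))  # mean of 0.5, 0.25, 0.25, 0.125, 0.0625
```

Under the abstract's finding, animals with a higher value of this statistic would be expected to have more accurately imputed genotypes.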
-
Comparison of methods for the implementation of genome-assisted evaluation of Spanish dairy cattle.
The aim of this study was to evaluate methods for genomic evaluation of the Spanish Holstein population as an initial step toward the implementation of routine genomic evaluations. This study provides a description of the population structure of progeny-tested bulls in Spain at the genomic level and compares different genomic evaluation methods with regard to accuracy and bias. Two Bayesian linear regression models, Bayes-A and Bayesian-LASSO (B-LASSO), as well as a machine learning algorithm, Random-Boosting (R-Boost), and BLUP using a realized genomic relationship matrix (G-BLUP), were compared. Five traits that are currently under selection in the Spanish Holstein population were used: milk yield, fat yield, protein yield, fat percentage, and udder depth. In total, genotypes from 1859 progeny-tested bulls were used. The training sets were composed of bulls born before 2005, including 1601 bulls for production and 1574 bulls for type, whereas the testing sets contained 258 and 235 bulls born in 2005 or later for production and type, respectively. Deregressed proofs (DRP) from the January 2009 Interbull (Uppsala, Sweden) evaluation were used as the dependent variables for bulls in the training sets, whereas DRP from the December 2011 Interbull evaluation were used to compare genomic predictions with progeny test results for bulls in the testing set. Genomic predictions were more accurate than traditional pedigree indices for predicting future progeny test results of young bulls. The gain in accuracy due to inclusion of genomic data varied by trait and ranged from 0.04 to 0.42 Pearson correlation units. Results averaged across traits showed that B-LASSO had the highest accuracy with an advantage of 0.01, 0.03 and 0.03 points in Pearson correlation compared with R-Boost, Bayes-A, and G-BLUP, respectively.
The B-LASSO predictions also showed the least bias (0.02, 0.03 and 0.10 SD units less than Bayes-A, R-Boost and G-BLUP, respectively) as measured by mean difference between genomic predictions and progeny test results. The R-Boost algorithm provided genomic predictions with regression coefficients closer to unity, which is an alternative measure of bias, for 4 out of 5 traits and also resulted in mean squared error estimates that were 2%, 10%, and 12% smaller than B-LASSO, Bayes-A, and G-BLUP, respectively. The observed prediction accuracy obtained with these methods was within the range of values expected for a population of similar size, suggesting that the prediction method and reference population described herein are appropriate for implementation of routine genome-assisted evaluations in Spanish dairy cattle. R-Boost is a competitive marker regression methodology in terms of predictive ability that can accommodate large data sets.
Jiménez-Montero JA, González-Recio O, Alenda R
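The comparison criteria named in this abstract (Pearson correlation, mean difference between predictions and progeny test results, regression coefficient, and mean squared error) can be computed as in the sketch below. This is a minimal illustration under stated assumptions, not the authors' code: the regression is taken as observed results on predictions, the mean difference is left in raw trait units rather than SD units, and the function name and toy values are hypothetical.

```python
from statistics import mean


def validation_metrics(observed, predicted):
    """Accuracy and bias measures for predictions against later observations.

    Assumptions: slope is the regression of observed on predicted (a value
    of 1 means no over- or under-dispersion); mean_difference is
    predicted minus observed, in raw units.
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length series with at least 2 values")
    mean_obs, mean_pred = mean(observed), mean(predicted)
    s_xy = sum((p - mean_pred) * (o - mean_obs) for o, p in zip(observed, predicted))
    s_xx = sum((p - mean_pred) ** 2 for p in predicted)
    s_yy = sum((o - mean_obs) ** 2 for o in observed)
    return {
        "correlation": s_xy / (s_xx * s_yy) ** 0.5,
        "slope": s_xy / s_xx,
        "mean_difference": mean_pred - mean_obs,
        "mse": mean((o - p) ** 2 for o, p in zip(observed, predicted)),
    }


# Toy values (hypothetical): predictions are perfectly ranked but twice as
# spread out as the observed results, so the slope falls below 1.
observed_drp = [1.0, 2.0, 3.0, 4.0]
genomic_pred = [2.0, 4.0, 6.0, 8.0]
metrics = validation_metrics(observed_drp, genomic_pred)
print(metrics)
```

The toy case shows why the abstract reports several criteria: a method can rank animals perfectly (correlation of 1) and still be biased in scale and level.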