Design of a low-density SNP chip for the main Australian sheep breeds and its effect on imputation and genomic prediction accuracy.
Genotyping sheep for genome-wide SNPs at lower density and imputing to a higher density would enable cost-effective implementation of genomic selection, provided imputation was accurate enough. Here, we describe the design of a low-density (12k) SNP chip and evaluate the accuracy of imputation from the 12k SNP genotypes to 50k SNP genotypes in the major Australian sheep breeds. In addition, the impact of imperfect imputation on genomic predictions was evaluated by comparing the accuracy of genomic predictions for 15 novel meat traits, including carcass, meat quality and omega fatty acid traits in sheep, from 12k SNP genotypes, imputed 50k SNP genotypes and real 50k SNP genotypes. The 12k chip design included 12 223 SNPs with a high minor allele frequency that were selected with intermarker spacing of 50-475 kb. SNPs for parentage and horned or polled tests were also represented. Chromosome ends were enriched with SNPs to reduce edge effects on imputation. The imputation performance of the 12k SNP chip was evaluated using 50k SNP genotypes of 4642 animals from six breeds in three different scenarios: (1) within breed, (2) single breed from multibreed reference and (3) multibreed from a single-breed reference. The highest imputation accuracies were found with scenario 2, whereas scenario 3 was the worst, as expected. Using scenario 2, the average imputation accuracy in Border Leicester, Polled Dorset, Merino, White Suffolk and crosses was 0.95, 0.95, 0.92, 0.91 and 0.93 respectively. Imputation scenario 2 was used to impute 50k genotypes for 10 396 animals with novel meat trait phenotypes to compare genomic prediction accuracy using genomic best linear unbiased prediction (GBLUP) with real and imputed 50k genotypes. The weighted mean imputation accuracy achieved was 0.92. The average accuracy of genomic estimated breeding values (GEBVs) based on only 12k data was 0.08 across traits and breeds, but accuracies varied widely. The mean GBLUP accuracy with imputed 50k data more than doubled, to 0.21. Accuracies of genomic prediction were very similar for imputed and real 50k genotypes. There was no apparent impact on accuracy of GEBVs as a result of using imputed rather than real 50k genotypes, provided imputation accuracy was >90%.
Bolormaa S, Gore K, van der Werf JH, Hayes BJ, Daetwyler HD
... -
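The abstract above reports imputation accuracies (for example 0.92) without stating how they were measured. As a minimal sketch, assuming genotypes are coded as 0, 1 or 2 copies of an allele, two commonly used measures are the proportion of matching genotype calls and the Pearson correlation between real and imputed genotypes. The function names and the toy genotypes below are hypothetical and are not taken from the study.

```python
from math import sqrt


def concordance(true_genotypes, imputed_genotypes):
    """Proportion of genotype calls that match exactly."""
    if len(true_genotypes) != len(imputed_genotypes) or not true_genotypes:
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(t == i for t, i in zip(true_genotypes, imputed_genotypes))
    return matches / len(true_genotypes)


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


# Toy data (hypothetical): one animal's real and imputed genotypes at 10 SNPs,
# with a single imputation error at the eighth SNP.
true_genotypes = [0, 1, 2, 1, 0, 2, 1, 1, 0, 2]
imputed_genotypes = [0, 1, 2, 1, 0, 2, 1, 2, 0, 2]

print(concordance(true_genotypes, imputed_genotypes))
print(round(pearson(true_genotypes, imputed_genotypes), 3))
```

The two measures can differ noticeably for SNPs with a low minor allele frequency, where concordance stays high even if the rare allele is imputed poorly, so it matters which one a reported threshold such as >90% refers to.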
Accuracy of genomic predictions for feed efficiency traits of beef cattle using 50K and imputed HD genotypes.
The accuracy of genomic predictions can be used to assess the utility of dense marker genotypes for genetic improvement of beef efficiency traits. This study was designed to test the impact of genomic distance between training and validation populations, training population size, statistical methods, and density of genetic markers on prediction accuracy for feed efficiency traits in multibreed and crossbred beef cattle. Data on a total of 6,794 beef cattle, collated from various projects and research herds across Canada, were used. Illumina BovineSNP50 (50K) and imputed Axiom Genome-Wide BOS 1 Array (HD) genotypes were available for all animals. The traits studied were dry matter intake (DMI), average daily gain (ADG), and residual feed intake (RFI). Four validation groups of 150 animals each, including Angus (AN), Charolais (CH), Angus-Hereford crosses (ANHH), and a Charolais-based composite (TX), were created by considering the genomic distance between pairs of individuals in the validation groups. Each validation group had 7 corresponding training groups of increasing size (1,000, 1,999, 2,999, 3,999, 4,999, 5,998, and 6,644 animals), which also represent increasing average genomic distance between pairs of individuals in the training and validation groups. Prediction of genomic estimated breeding values (GEBV) was performed using genomic best linear unbiased prediction (GBLUP) and Bayesian method C (BayesC). The accuracy of genomic predictions was defined as the Pearson correlation between adjusted phenotype and GEBV, unless otherwise stated. Using 50K genotypes, the highest average accuracy achieved in purebreds (AN, CH) was 0.41 for DMI, 0.34 for ADG, and 0.35 for RFI, whereas in crossbreds (ANHH, TX) it was 0.38 for DMI, 0.21 for ADG, and 0.25 for RFI. Similarly, when imputed HD genotypes were applied in purebreds (AN, CH), the highest average accuracy was 0.14 for DMI, 0.15 for ADG, and 0.14 for RFI, whereas in crossbreds (ANHH, TX) it was 0.38 for DMI, 0.22 for ADG, and 0.24 for RFI. The accuracies of GBLUP predictions were greatly reduced with increasing average genomic distance compared to those from BayesC predictions. The results indicate that 50K genotypes, used with BayesC, are more effective for predicting GEBV in purebred cattle. Imputed HD genotypes were useful when dealing with composites and crossbreds. Formulation of a fairly large training set for genomic predictions in beef cattle should consider the genomic distance between the training and target populations.
Lu D, Akanno EC, Crowley JJ, Schenkel F, Li H, De Pauw M, Moore SS, Wang Z, Li C, Stothard P, Plastow G, Miller SP, Basarab JA
... -
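The design described above, in which training groups grow in size and, at the same time, in average genomic distance from the validation group, can be illustrated with a small sketch. It assumes that each candidate training animal already has a single number summarizing its genomic distance to the validation group. The abstract does not say how that distance was computed or how the groups were assembled, so the function names, the ranking rule and the toy values are all hypothetical.

```python
def nested_training_groups(distance_to_validation, sizes):
    """Build nested training groups by adding the closest animals first.

    distance_to_validation: dict mapping animal id -> genomic distance to the
        validation group (hypothetical summary value, smaller means closer).
    sizes: increasing group sizes to produce.
    """
    ranked = sorted(distance_to_validation, key=distance_to_validation.get)
    groups = []
    for n in sizes:
        if n > len(ranked):
            raise ValueError(f"requested size {n} exceeds {len(ranked)} animals")
        groups.append(ranked[:n])  # each group contains the previous one
    return groups


def mean_distance(group, distance_to_validation):
    """Average distance of a training group to the validation group."""
    return sum(distance_to_validation[a] for a in group) / len(group)


# Toy data (hypothetical): six candidate training animals.
distances = {"A": 0.30, "B": 0.10, "C": 0.25, "D": 0.05, "E": 0.40, "F": 0.15}
groups = nested_training_groups(distances, sizes=[2, 4, 6])

for group in groups:
    print(len(group), group, round(mean_distance(group, distances), 4))
```

Because the closest animals enter first, every larger group has a larger mean distance than the one before it, which is why training set size and genomic distance are confounded in a design of this kind.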
Improving accuracy of genomic predictions within and between dairy cattle breeds with imputed high-density single nucleotide polymorphism panels.
Achieving accurate genomic estimated breeding values for dairy cattle requires a very large reference population of genotyped and phenotyped individuals. Assembling such reference populations has been achieved for breeds such as Holstein, but is challenging for breeds with fewer individuals. An alternative is to use a multi-breed reference population, such that smaller breeds gain some advantage in accuracy of genomic estimated breeding values (GEBV) from information from larger breeds. However, this requires that marker-quantitative trait loci associations persist across breeds. Here, we assessed the gain in accuracy of GEBV in Jersey cattle as a result of using a combined Holstein and Jersey reference population, with either 39,745 or 624,213 single nucleotide polymorphism (SNP) markers. The surrogate used for accuracy was the correlation of GEBV with daughter trait deviations in a validation population. Two methods were used to predict breeding values, either a genomic BLUP (GBLUP_mod), or a new method, BayesR, which used a mixture of normal distributions as the prior for SNP effects, including one distribution that set SNP effects to zero. The GBLUP_mod method scaled both the genomic relationship matrix and the additive relationship matrix to a base at the time the breeds diverged, and regressed the genomic relationship matrix to account for sampling errors in estimating relationship coefficients due to a finite number of markers, before combining the 2 matrices. Although these modifications did result in less biased breeding values for Jerseys compared with an unmodified genomic relationship matrix, BayesR gave the highest accuracies of GEBV for the 3 traits investigated (milk yield, fat yield, and protein yield), with an average increase in accuracy compared with GBLUP_mod across the 3 traits of 0.05 for both Jerseys and Holsteins. The advantage of using 624,213 SNP rather than 39,745 SNP was limited for both Jerseys and Holsteins (0.01 for Holsteins and 0.03 for Jerseys, averaged across traits). Even this limited and nonsignificant advantage was only observed when BayesR was used. An alternative panel, which extracted the SNP in the transcribed part of the bovine genome from the 624,213 SNP panel (to give 58,532 SNP), performed better, with an increase in accuracy of 0.03 for Jerseys across traits. This panel captures much of the increased genomic content of the 624,213 SNP panel, with the advantage of a greatly reduced number of SNP effects to estimate. Taken together, using this panel, a combined breed reference and using BayesR rather than GBLUP_mod increased the accuracy of GEBV in Jerseys from 0.43 to 0.52, averaged across the 3 traits.
Erbe M, Hayes BJ, Matukumalli LK, Goswami S, Bowman PJ, Reich CM, Mason BA, Goddard ME
... -
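The BayesR prior described above, a mixture of normal distributions in which one component sets SNP effects to zero, can be made concrete by drawing SNP effects from such a mixture. This is only a sketch of the prior, not of the full BayesR sampler. The abstract does not give the number of components, the mixing proportions or the component variances, so the values used below are assumptions chosen for illustration, and the function name is hypothetical.

```python
import random


def sample_snp_effects(n_snps, mixing_props, rel_variances, genetic_variance, seed=0):
    """Draw SNP effects from a mixture of zero-mean normal distributions.

    mixing_props: prior probability of each mixture component.
    rel_variances: component variances as fractions of genetic_variance;
        a value of 0 gives the component that sets the effect to exactly zero.
    """
    if len(mixing_props) != len(rel_variances):
        raise ValueError("need one variance per mixture component")
    rng = random.Random(seed)
    components = range(len(mixing_props))
    effects = []
    for _ in range(n_snps):
        k = rng.choices(components, weights=mixing_props)[0]
        variance = rel_variances[k] * genetic_variance
        effects.append(0.0 if variance == 0 else rng.gauss(0.0, variance ** 0.5))
    return effects


# Assumed illustrative settings (not taken from the abstract): four components,
# most SNPs in the zero-effect class.
effects = sample_snp_effects(
    n_snps=10_000,
    mixing_props=[0.90, 0.07, 0.02, 0.01],
    rel_variances=[0.0, 0.0001, 0.001, 0.01],
    genetic_variance=1.0,
)
zero_fraction = sum(e == 0.0 for e in effects) / len(effects)
print(round(zero_fraction, 2))
```

A prior of this shape lets most markers contribute nothing while a few carry larger effects, in contrast to GBLUP, which is equivalent to assuming that every SNP effect comes from a single normal distribution.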