-
Predicting the effect of reference population on the accuracy of within, across, and multibreed genomic prediction.
Genomic prediction is widely used to select candidates for breeding. Size and composition of the reference population are important factors influencing prediction accuracy. In Holstein dairy cattle, large reference populations are used, but this is difficult to achieve in numerically small breeds and for traits that are not routinely recorded. The prediction accuracy is usually estimated using cross-validation, requiring the full data set. It would be useful to have a method to predict the benefit of multibreed reference populations that does not require the availability of the full data set. Our objective was to study the effect of the size and breed composition of the reference population on the accuracy of genomic prediction using genomic BLUP and Bayes R. We also examined the effect of trait heritability and validation breed on prediction accuracy. Using these empirical results, we investigated the use of a formula to predict the effect of the size and composition of the reference population on the accuracy of genomic prediction. Phenotypes were simulated in a data set containing real genotypes of imputed sequence variants for 22,752 dairy bulls and cows, including Holstein, Jersey, Red Holstein, and Australian Red cattle. Different reference populations were constructed, varying in size and composition, to study within-breed, multibreed, and across-breed prediction. Phenotypes were simulated varying in heritability, number of chromosomes, and number of quantitative trait loci. Genomic prediction was carried out using genomic BLUP and Bayes R. We used either the genomic relationship matrix (GRM) to estimate the number of independent chromosomal segments and subsequently to predict accuracy, or the accuracies obtained from single-breed reference populations to predict the accuracies of larger or multibreed reference populations. 
Using the GRM overestimated the accuracy; this overestimation was likely due to close relationships among some of the reference animals. Consequently, the GRM could not be used to predict the accuracy of genomic prediction reliably. However, a method based on the prediction accuracies obtained by cross-validation with a small, single-breed reference population gave a reasonably close estimate of the accuracy for a multibreed reference population and slightly overestimated the accuracy for a larger reference population of the same breed. This method could be useful for making decisions regarding the size and composition of the reference population.
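As a rough illustration of the two prediction routes described in this abstract, the sketch below assumes the widely used deterministic form r = sqrt(N h2 / (N h2 + Me)), where N is the reference population size, h2 the heritability, and Me the number of independent chromosomal segments. The abstract does not state which formula or Me estimator the authors used, so the function names, the estimator based on the variance of off-diagonal GRM elements, and the back-solving step are illustrative assumptions rather than the paper's method.

```python
import numpy as np


def effective_segments(grm):
    """Approximate Me as 1 / variance of the off-diagonal GRM elements.

    This estimator is an assumption made for illustration; close
    relationships among reference animals inflate the variance and
    therefore lower Me, which raises the predicted accuracy.
    """
    grm = np.asarray(grm, dtype=float)
    off_diag = grm[np.triu_indices_from(grm, k=1)]
    return 1.0 / np.var(off_diag)


def predicted_accuracy(n_ref, h2, me):
    """Deterministic accuracy r = sqrt(N h2 / (N h2 + Me))."""
    return float(np.sqrt(n_ref * h2 / (n_ref * h2 + me)))


def me_from_accuracy(n_ref, h2, accuracy):
    """Back-solve Me from an accuracy observed with a small reference set."""
    r2 = accuracy ** 2
    return n_ref * h2 * (1.0 - r2) / r2


# Hypothetical numbers: accuracy 0.40 observed with 2,000 reference animals
# and h2 = 0.3, extrapolated to 8,000 animals of the same breed.
me_hat = me_from_accuracy(2000, 0.3, 0.40)
extrapolated = predicted_accuracy(8000, 0.3, me_hat)
```

Back-solving Me from a cross-validated accuracy avoids relying on the GRM directly, which is the route the abstract reports as overestimating accuracy.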
van den Berg I, Meuwissen THE, MacLeod IM, Goddard ME
... -
《-》
-
Optimizing genomic prediction for Australian Red dairy cattle.
The reliability of genomic prediction is influenced by several factors, including the size of the reference population, which makes genomic prediction challenging for breeds with a relatively small population size, such as Australian Red dairy cattle. Including other breeds in the reference population may help to increase the size of the reference population, but the reliability of genomic prediction is also influenced by the relatedness between the reference and validation population. Our objective was to optimize the reference population for genomic prediction of Australian Red dairy cattle. A reference population comprising up to 3,248 Holstein bulls, 48,386 Holstein cows, 807 Jersey bulls, 8,734 Jersey cows, and 3,041 Australian Red cows and a validation population with between 208 and 224 Australian Red bulls were used, with records for milk, fat, and protein yield, somatic cell count, fertility, and survival. Three different analyses were implemented: single-trait genomic best linear unbiased predictor (GBLUP), multi-trait GBLUP, and single-trait Bayes R, using 2 different medium-density SNP panels: the standard 50K chip and a custom array of variants that were expected to be enriched for causative mutations. Various reference populations were constructed containing the Australian Red cows and all Holstein and Jersey bulls and cows, all Holstein and Jersey bulls, all Holstein bulls and cows, all Holstein bulls, and a subset of the Holstein individuals varying the relatedness between Holsteins and Australian Reds and the number of Holsteins. Varying the relatedness between reference and validation populations only led to small changes in reliability. Whereas adding a limited number of closely related Holsteins increased reliabilities compared with within-breed prediction, increasing the number of Holsteins decreased the reliability.
The multi-trait GBLUP, which considered the same trait in different breeds as correlated traits, yielded higher reliabilities than the single-trait GBLUP. Bayes R yielded lower reliabilities than multi-trait GBLUP and outperformed single-trait GBLUP for larger reference populations. Our results show that increasing the size of a multi-breed reference population may result in a reference population dominated by one breed and reduce the reliability to predict in other breeds.
van den Berg I, MacLeod IM, Reich CM, Breen EJ, Pryce JE
... -
《-》
-
Improved precision of QTL mapping using a nonlinear Bayesian method in a multi-breed population leads to greater accuracy of across-breed genomic predictions.
Genomic selection is increasingly widely practised, particularly in dairy cattle. However, the accuracy of current predictions using GBLUP (genomic best linear unbiased prediction) decays rapidly across generations, and also as selection candidates become less related to the reference population. This is likely caused by the effects of causative mutations being dispersed across many SNPs (single nucleotide polymorphisms) that span large genomic intervals. In this paper, we hypothesise that the use of a nonlinear method (BayesR), combined with a multi-breed (Holstein/Jersey) reference population will map causative mutations with more precision than GBLUP and this, in turn, will increase the accuracy of genomic predictions for selection candidates that are less related to the reference animals.
BayesR improved the across-breed prediction accuracy for Australian Red dairy cattle for five milk yield and composition traits by an average of 7% over the GBLUP approach (Australian Red animals were not included in the reference population). Using the multi-breed reference population with BayesR improved accuracy of prediction in Australian Red cattle by 2 to 5% compared to using BayesR with a single-breed reference population. Inclusion of 8478 Holstein and 3917 Jersey cows in the reference population improved accuracy of predictions for these breeds by 4 and 5%, respectively. However, predictions for Holstein and Jersey cattle were similar using within-breed and multi-breed reference populations. We propose that the improvement in across-breed prediction achieved by BayesR with the multi-breed reference population is due to more precise mapping of quantitative trait loci (QTL), which was demonstrated for several regions. New candidate genes with functional links to milk synthesis were identified using differential gene expression in the mammary gland.
QTL detection and genomic prediction are usually considered independently but persistence of genomic prediction accuracies across breeds requires accurate estimation of QTL effects. We show that accuracy of across-breed genomic predictions was higher with BayesR than with GBLUP and that BayesR mapped QTL more precisely. Further improvements of across-breed accuracy of genomic predictions and QTL mapping could be achieved by increasing the size of the reference population, including more breeds, and possibly by exploiting pleiotropic effects to improve mapping efficiency for QTL with small effects.
Kemper KE, Reich CM, Bowman PJ, Vander Jagt CJ, Chamberlain AJ, Mason BA, Hayes BJ, Goddard ME
... -
《-》
-
Estimation of genomic breeding values for residual feed intake in a multibreed cattle population.
Residual feed intake (RFI) is a measure of the efficiency of animals in feed utilization. The accuracies of GEBV for RFI could be improved by increasing the size of the reference population. Combining RFI records of different breeds is a way to do that. The aims of this study were to 1) develop a method for calculating GEBV in a multibreed population and 2) improve the accuracies of GEBV by using SNP associated with RFI. An alternative method for calculating accuracies of GEBV using genomic BLUP (GBLUP) equations is also described and compared to cross-validation tests. The dataset included RFI records and 606,096 SNP genotypes for 5,614 Bos taurus animals including 842 Holstein heifers and 2,009 Australian and 2,763 Canadian beef cattle. A range of models were tested for combining genotype and phenotype information from different breeds and the best model included an overall effect of each SNP, an effect of each SNP specific to a breed, and a small residual polygenic effect defined by the pedigree. In this model, the Holsteins and some Angus cattle were combined into 1 "breed class" because they were the only cattle measured for RFI at an early age (6-9 mo of age) and were fed a similar diet. The average empirical accuracy (0.31), estimated by calculating the correlation between GEBV and actual phenotypes divided by the square root of estimated heritability in 5-fold cross-validation tests, was close to that expected using the GBLUP equations (0.34). The average empirical and expected accuracies were 0.30 and 0.31, respectively, when the GEBV were estimated for each breed separately. Therefore, the across-breed reference population increased the accuracy of GEBV slightly, although the gain was greater for breeds with smaller numbers of individuals in the reference population (0.08 in Murray Grey and 0.11 in Hereford for empirical accuracy).
In a second approach, SNP that were significantly (P < 0.001) associated with RFI in the beef cattle genomewide association studies were used to create an auxiliary genomic relationship matrix for estimating GEBV in Holstein heifers. The empirical (and expected) accuracy of GEBV within Holsteins increased from 0.33 (0.35) to 0.39 (0.36) and improved even more to 0.43 (0.50) when using a multibreed reference population. Therefore, a multibreed reference population is a useful resource to find SNP with a greater than average association with RFI in 1 breed and use them to estimate GEBV in another breed.
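The empirical accuracy described in this abstract (correlation between GEBV and phenotypes, divided by the square root of the estimated heritability, averaged over cross-validation folds) can be sketched as follows. The function names and the toy data are hypothetical, and the sketch assumes that each fold supplies GEBV for validation animals that were excluded from the reference set when those GEBV were estimated.

```python
import numpy as np


def empirical_accuracy(gebv, phenotype, h2):
    """Correlation between GEBV and phenotype, scaled by sqrt(h2)."""
    r = np.corrcoef(gebv, phenotype)[0, 1]
    return float(r / np.sqrt(h2))


def mean_cv_accuracy(folds, h2):
    """Average empirical accuracy over (gebv, phenotype) validation folds."""
    return float(np.mean([empirical_accuracy(g, y, h2) for g, y in folds]))


# Toy validation fold (hypothetical values, not data from the study).
toy_gebv = [1.0, 2.0, 3.0, 4.0]
toy_pheno = [1.0, 3.0, 2.0, 4.0]
toy_accuracy = empirical_accuracy(toy_gebv, toy_pheno, h2=1.0)
```

Dividing by the square root of heritability rescales the correlation with phenotypes to an estimate of the correlation with true breeding values, so an underestimated heritability inflates the reported accuracy.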
Khansefid M, Pryce JE, Bolormaa S, Miller SP, Wang Z, Li C, Goddard ME
... -
《-》
-
Value of sharing cow reference population between countries on reliability of genomic prediction for milk yield traits.
Increasing the reliability of genomic prediction (GP) of economic traits in the pasture-based dairy production systems of New Zealand (NZ) and Australia (AU) is important to both countries. This study assessed whether sharing cow phenotype and genotype data between NZ and AU improves the reliability of GP for NZ bulls. Data from approximately 32,000 NZ genotyped cows and their contemporaries were included in the May 2018 routine genetic evaluation of Australian dairy cattle in an attempt to provide consistent phenotypes for both countries. After the genetic evaluation, deregressed proofs of cows were calculated for milk yield traits. The April 2018 multiple across-country evaluation of Interbull was also used to calculate deregressed proofs for bulls on the NZ scale. Approximately 1,178 Jersey (Jer) and 6,422 Holstein (Hol) bulls had genotype and phenotype data. In addition to NZ cows, phenotype data of close to 60,000 genotyped AU cows from the same genetic evaluation run as NZ cows were used. All AU and NZ females were genotyped using low-density SNP chips (<10K SNP) and were imputed first to 50K and then to ∼600K (referred to as high density; HD). We used up to 98,000 animals in the reference populations, both by expanding the NZ reference set (cow, bull, single breed to multi-breed set) and by adding AU cows. Reliabilities of GP were calculated for 508 Jer and 1,251 Hol bulls whose sires were not included in the reference set (RS) to ensure that real differences were not masked by close relationships. The GP was tested using the 50K or HD SNP chip with genomic BLUP in bivariate (considering country as a trait) or single-trait models. The RS that gave the highest reliability for each breed were also tested using a hybrid GP method that combines expectation maximization with Bayes R.
The addition of the AU cows to an NZ RS that included either NZ cows only, or cows and bulls, improved the reliability of GP for both NZ Hol and Jer validation bulls for all traits. Using single-breed reference populations also increased reliability when NZ crossbred cows were added to reference populations that included only purebred NZ bulls and cows and AU cows. The full multi-breed RS (all NZ cows and bulls and AU cows) provided similar reliabilities in NZ Hol bulls, when compared with the single-breed reference with crossbred NZ cows. For Jer validation bulls, the RS that included Jer cows and bulls and crossbred cows from NZ and Jer cows from AU was marginally better than the all-breed, all-country RS. In terms of reliability, the advantage of the HD SNP chip was small, but the HD chip captured more of the genomic variance than the 50K chip, particularly for Hol. The expectation maximization Bayes R GP method was slightly (up to 3 percentage points) better than genomic BLUP. We conclude that GP of milk production traits in NZ bulls improves by up to 7 percentage points in reliability by expanding the NZ reference population to include AU cows.
Haile-Mariam M, MacLeod IM, Bolormaa S, Schrooten C, O'Connor E, de Jong G, Daetwyler HD, Pryce JE
... -
《-》