-
Multibreed genomic prediction using multitrait genomic residual maximum likelihood and multitask Bayesian variable selection.
Genomic prediction is applicable to individuals of different breeds. Empirical results to date, however, show limited benefits in using information on multiple breeds in the context of genomic prediction. We investigated a multitask Bayesian model, presented previously by others, implemented in a Bayesian stochastic search variable selection (BSSVS) model. This model allowed for evidence of quantitative trait loci (QTL) to be accumulated across breeds or for both QTL that segregate across breeds and breed-specific QTL. In both cases, single nucleotide polymorphism effects were estimated with information from a single breed. Other models considered were a single-trait and multitrait genomic residual maximum likelihood (GREML) model, with breeds considered as different traits, and a single-trait BSSVS model. All single-trait models were applied to each of the 2 breeds separately and to the pooled data of both breeds. The data used included a training data set of 6,278 Holstein and 722 Jersey bulls, as well as 374 Jersey validation bulls. All animals had genotypes for 474,773 single nucleotide polymorphisms after editing and phenotypes for milk, fat, and protein yields. Using the same training data, BSSVS consistently outperformed GREML. The multitask BSSVS, however, did not outperform single-trait BSSVS, which used pooled Holstein and Jersey data for training. Thus, the rigorous assumption that the traits are the same in both breeds yielded a slightly better prediction than a model that had to estimate the correlation between the breeds from the data. Adding the Holstein data significantly increased the accuracy of the single-trait GREML and BSSVS in predicting the Jerseys for milk and protein, in line with estimated correlations between the breeds of 0.66 and 0.47 for milk and protein yields, whereas only the BSSVS model significantly improved the accuracy for fat yield with an estimated correlation between breeds of only 0.05. 
The relatively high genetic correlations for milk and protein yields, and the superiority of the pooling strategy, are likely the result of the observed admixture between both breeds in our data. The Bayesian model was able to detect several QTL in Holsteins, which likely enabled it to outperform GREML. The inability of the multitask Bayesian models to outperform a simple pooling strategy may be explained by the fact that the pooling strategy assumes equal effects in both breeds; furthermore, this assumption may be valid for moderate- to large-sized QTL, which are important for multibreed genomic prediction.
Calus MPL
,Goddard ME
,Wientjes YCJ
,Bowman PJ
,Hayes BJ
... -
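The GREML models in the abstract above depend on a genomic relationship matrix built from SNP genotypes. The abstract does not say which construction was used, so the following is only a minimal sketch that assumes VanRaden's first method and genotypes coded 0/1/2. The function name `vanraden_grm` and the toy genotypes are hypothetical.

```python
import numpy as np

def vanraden_grm(genotypes):
    """Genomic relationship matrix, G = ZZ' / (2 * sum(p * (1 - p))).

    genotypes: (n_animals, n_snps) array of allele counts coded 0, 1, 2.
    Allele frequencies p are estimated from the supplied animals; this is
    an assumption, and base-population frequencies are another common choice.
    """
    M = np.asarray(genotypes, dtype=float)
    p = M.mean(axis=0) / 2.0              # allele frequency per SNP
    Z = M - 2.0 * p                       # center each SNP column
    scale = 2.0 * np.sum(p * (1.0 - p))   # expected heterozygosity summed over SNPs
    return Z @ Z.T / scale

# Toy data (hypothetical): 4 animals, 4 SNPs.
toy_genotypes = np.array([
    [0, 1, 2, 1],
    [2, 1, 0, 1],
    [1, 2, 1, 0],
    [1, 0, 1, 2],
])
G = vanraden_grm(toy_genotypes)
```

In a pooled analysis, animals of both breeds would simply be stacked into one genotype matrix before calling the function.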
《-》
-
Multibreed genomic evaluations using purebred Holsteins, Jerseys, and Brown Swiss.
Multibreed models are currently used in traditional US Department of Agriculture (USDA) dairy cattle genetic evaluations of yield and health traits, but within-breed models are used in genomic evaluations. Multibreed genomic models were developed and tested using the 19,686 genotyped bulls and cows included in the official August 2009 USDA genomic evaluation. The data were divided into training and validation sets. The training data set comprised bulls that were daughter proven and cows that had records as of November 2004, totaling 5,331 Holstein, 1,361 Jersey, and 506 Brown Swiss. The validation data set had 2,508 Holstein, 413 Jersey, and 185 Brown Swiss bulls that were unproven (no daughter information) in November 2004 and proven by August 2009. A common set of 43,385 single nucleotide polymorphisms (SNP) was used for all breeds. Three methods of multibreed evaluation were investigated. Method 1 estimated SNP effects separately within breed and then applied those breed-specific SNP estimates to the other breeds. Method 2 estimated a common set of SNP effects from combined genotypes and phenotypes of all breeds. Method 3 solved for correlated SNP effects within each breed estimated jointly using a multitrait model where breeds were treated as different traits. Across-breed genomic predicted transmitting ability (GPTA) and within-breed GPTA were compared using regressions to predict the deregressed validation data. Method 1 worked poorly, and coefficients of determination (R²) were much lower using training data from a different breed to estimate SNP effects. Correlations between direct genomic values computed using training data from different breeds were less than 30% and sometimes negative. Across-breed GPTA from method 2 had higher R² values than parent average alone but typically produced lower R² values than the within-breed GPTA. 
The across-breed R² exceeded the within-breed R² for a few traits in the Brown Swiss breed, probably because information from the other breeds compensated for the small numbers of Brown Swiss training animals. Correlations between within-breed GPTA and across-breed GPTA ranged from 0.91 to 0.93. The multibreed GPTA from method 3 were significantly better than the current within-breed GPTA, and adjusted R² for protein yield (the only trait tested for method 3) were highest of all methods for all breeds. However, method 3 increased the adjusted R² by only 0.01 for Holsteins, ≤0.01 for Jerseys, and 0.01 for Brown Swiss compared with within-breed predictions.
Olson KM
,VanRaden PM
,Tooker ME
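Methods 1 and 2 in the abstract above differ only in which animals contribute to the SNP-effect estimates. The sketch below illustrates that difference with a plain ridge-regression (SNP-BLUP) solver on noise-free toy data. It is not the USDA evaluation model: the function names, the simulated genotypes, and the shrinkage value `lam` are all assumptions made for illustration, and the multitrait method 3 is not shown.

```python
import numpy as np

def center(genotypes):
    """Subtract each SNP's mean allele count (computed from the rows given)."""
    M = np.asarray(genotypes, dtype=float)
    return M - M.mean(axis=0)

def snp_blup(Z, y, lam):
    """Ridge estimates of SNP effects: solve (Z'Z + lam * I) b = Z'(y - mean(y)).

    In practice lam would be the ratio of residual to per-SNP variance;
    here it is just a tuning value supplied by the caller.
    """
    y = np.asarray(y, dtype=float)
    m = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + lam * np.eye(m), Z.T @ (y - y.mean()))

# Hypothetical toy data: two breeds sharing the same true SNP effects.
rng = np.random.default_rng(0)
true_effects = np.array([0.5, -0.3, 0.0, 0.2, 0.1])
geno_a = rng.integers(0, 3, size=(60, 5))   # larger breed
geno_b = rng.integers(0, 3, size=(20, 5))   # smaller breed
pheno_a = geno_a @ true_effects             # noise-free for a deterministic check
pheno_b = geno_b @ true_effects

# Method 1 style: estimate effects in breed A, apply them to breed B genotypes.
effects_a = snp_blup(center(geno_a), pheno_a, lam=1e-8)
dgv_b_from_a = center(geno_b) @ effects_a

# Method 2 style: one common set of effects from the pooled data.
pooled_geno = np.vstack([geno_a, geno_b])
pooled_pheno = np.concatenate([pheno_a, pheno_b])
effects_pooled = snp_blup(center(pooled_geno), pooled_pheno, lam=1e-8)
```

Because the toy breeds share identical effects and have no noise, both approaches recover the simulated values. With real data, differences in linkage disequilibrium and allele frequencies between breeds are what make the across-breed application unreliable.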
《-》
-
Predicting the effect of reference population on the accuracy of within, across, and multibreed genomic prediction.
Genomic prediction is widely used to select candidates for breeding. Size and composition of the reference population are important factors influencing prediction accuracy. In Holstein dairy cattle, large reference populations are used, but this is difficult to achieve in numerically small breeds and for traits that are not routinely recorded. The prediction accuracy is usually estimated using cross-validation, requiring the full data set. It would be useful to have a method to predict the benefit of multibreed reference populations that does not require the availability of the full data set. Our objective was to study the effect of the size and breed composition of the reference population on the accuracy of genomic prediction using genomic BLUP and Bayes R. We also examined the effect of trait heritability and validation breed on prediction accuracy. Using these empirical results, we investigated the use of a formula to predict the effect of the size and composition of the reference population on the accuracy of genomic prediction. Phenotypes were simulated in a data set containing real genotypes of imputed sequence variants for 22,752 dairy bulls and cows, including Holstein, Jersey, Red Holstein, and Australian Red cattle. Different reference populations were constructed, varying in size and composition, to study within-breed, multibreed, and across-breed prediction. Phenotypes were simulated varying in heritability, number of chromosomes, and number of quantitative trait loci. Genomic prediction was carried out using genomic BLUP and Bayes R. We used either the genomic relationship matrix (GRM) to estimate the number of independent chromosomal segments and subsequently to predict accuracy, or the accuracies obtained from single-breed reference populations to predict the accuracies of larger or multibreed reference populations. 
Using the GRM overestimated the accuracy; this overestimation was likely due to close relationships among some of the reference animals. Consequently, the GRM could not be used to predict the accuracy of genomic prediction reliably. However, a method using the prediction accuracies obtained by cross-validation with a small, single-breed reference population slightly overestimated the accuracy for a larger reference population of the same breed, but gave a reasonably close estimate of the accuracy for a multibreed reference population. This method could be useful for making decisions regarding the size and composition of the reference population.
van den Berg I
,Meuwissen THE
,MacLeod IM
,Goddard ME
... -
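The abstract above does not state which formula was used. A widely used deterministic expression of this kind predicts accuracy from the reference size N, the heritability h², and the number of independent chromosomal segments Me, as r = sqrt(N·h² / (N·h² + Me)). The sketch below assumes that form. It also shows the idea of backing Me out of an accuracy observed in a small reference population and reusing it for a larger one. All function names and numbers are hypothetical.

```python
import math

def predicted_accuracy(n_ref, h2, m_e):
    """Expected accuracy r = sqrt(N * h2 / (N * h2 + Me)) (assumed formula)."""
    return math.sqrt(n_ref * h2 / (n_ref * h2 + m_e))

def implied_m_e(n_ref, h2, accuracy):
    """Invert the formula: Me = N * h2 * (1 - r^2) / r^2."""
    r2 = accuracy ** 2
    return n_ref * h2 * (1.0 - r2) / r2

# Hypothetical example: an accuracy of 0.40 was observed by cross-validation
# with 1,000 reference animals for a trait with heritability 0.30.
m_e_small = implied_m_e(n_ref=1000, h2=0.30, accuracy=0.40)

# Reuse the implied Me to project the accuracy for 5,000 reference animals.
projected = predicted_accuracy(n_ref=5000, h2=0.30, m_e=m_e_small)
```

Treating Me as unchanged when the reference population grows or becomes multibreed is itself an assumption. A multibreed reference would normally be expected to have a larger Me than a single-breed one.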
《-》
-
Optimizing genomic prediction for Australian Red dairy cattle.
The reliability of genomic prediction is influenced by several factors, including the size of the reference population, which makes genomic prediction for breeds with a relatively small population size challenging, such as Australian Red dairy cattle. Including other breeds in the reference population may help to increase the size of the reference population, but the reliability of genomic prediction is also influenced by the relatedness between the reference and validation population. Our objective was to optimize the reference population for genomic prediction of Australian Red dairy cattle. A reference population comprising up to 3,248 Holstein bulls, 48,386 Holstein cows, 807 Jersey bulls, 8,734 Jersey cows, and 3,041 Australian Red cows and a validation population with between 208 and 224 Australian Red bulls were used, with records for milk, fat, and protein yield, somatic cell count, fertility, and survival. Three different analyses were implemented: single-trait genomic best linear unbiased predictor (GBLUP), multi-trait GBLUP, and single-trait Bayes R, using 2 different medium-density SNP panels: the standard 50K chip and a custom array of variants that were expected to be enriched for causative mutations. Various reference populations were constructed containing the Australian Red cows and all Holstein and Jersey bulls and cows, all Holstein and Jersey bulls, all Holstein bulls and cows, all Holstein bulls, and a subset of the Holstein individuals varying the relatedness between Holsteins and Australian Reds and the number of Holsteins. Varying the relatedness between reference and validation populations only led to small changes in reliability. Whereas adding a limited number of closely related Holsteins increased reliabilities compared with within-breed prediction, increasing the number of Holsteins decreased the reliability. 
The multi-trait GBLUP, which considered the same trait in different breeds as correlated traits, yielded higher reliabilities than the single-trait GBLUP. Bayes R yielded lower reliabilities than multi-trait GBLUP and outperformed single-trait GBLUP for larger reference populations. Our results show that increasing the size of a multi-breed reference population may result in a reference population dominated by one breed and reduce the reliability to predict in other breeds.
van den Berg I
,MacLeod IM
,Reich CM
,Breen EJ
,Pryce JE
... -
《-》
-
Including overseas performance information in genomic evaluations of Australian dairy cattle.
In dairy cattle, the rate of genetic gain from genomic selection depends on reliability of direct genomic values (DGV). One option to increase reliabilities could be to increase the size of the reference set used for prediction, by using genotyped bulls with daughter information in countries other than the evaluating country. The increase in reliabilities of DGV from using this information will depend on the extent of genotype by environment interaction between the evaluating country and countries contributing information, and whether this is correctly accounted for in the prediction method. As the genotype by environment interaction between Australia and Europe or North America is greater than between Europe and North America for most dairy traits, ways of including information from other countries in Australian genomic evaluations were examined. Thus, alternative approaches for including information from other countries and their effect on the reliability and bias of DGV of selection candidates were assessed. We also investigated the effect of including overseas (OS) information on reliabilities of DGV for selection candidates that had weaker relationships to the current Australian reference set. The DGV were predicted either using daughter trait deviations (DTD) for the bulls with daughters in Australia, or using this information as well as OS information by including deregressed proofs (DRP) from Interbull for bulls with only OS daughters in either single trait or bivariate models. In the bivariate models, DTD and DRP were considered as different traits. Analyses were performed for Holstein and Jersey bulls for milk yield traits, fertility, cell count, survival, and some type traits. For Holsteins, the data used included up to 3,580 bulls with DTD and up to 5,720 bulls with only DRP. For Jersey, about 900 bulls with DTD and 1,820 bulls with DRP were used. Bulls born after 2003 and genotyped cows that were not dams of genotyped bulls were used for validation. 
The results showed that the combined use of DRP on bulls with OS daughters only and DTD for Australian bulls in either the single trait or bivariate model increased the coefficient of determination [R²(DGV,DTD)] in the validation set, averaged across 6 main traits, by 3% in Holstein and by 5% in Jersey validation bulls relative to the use of DTD only. Gains in reliability and unbiasedness of DGV were similar for the single trait and bivariate models for production traits, whereas the bivariate model performed slightly better for somatic cell count in Holstein. The increase in R²(DGV,DTD) as a result of using bulls with OS daughters was relatively higher for those bulls and cows in the validation sets that were less related to the current reference set. For example, in Holstein, the average increase in R² for milk yield traits when DTD and DRP were used in a single trait model was 23% in the least-related cow group, but only 3% in the most-related cow group. In general, for both breeds the use of DTD from domestic sources and DRP from Interbull in a single trait or bivariate model can increase reliability of DGV for selection candidates.
Haile-Mariam M
,Pryce JE
,Schrooten C
,Hayes BJ
... -
《-》