-
Multibreed genomic evaluations using purebred Holsteins, Jerseys, and Brown Swiss.
Multibreed models are currently used in traditional US Department of Agriculture (USDA) dairy cattle genetic evaluations of yield and health traits, but within-breed models are used in genomic evaluations. Multibreed genomic models were developed and tested using the 19,686 genotyped bulls and cows included in the official August 2009 USDA genomic evaluation. The data were divided into training and validation sets. The training data set comprised bulls that were daughter proven and cows that had records as of November 2004, totaling 5,331 Holstein, 1,361 Jersey, and 506 Brown Swiss. The validation data set had 2,508 Holstein, 413 Jersey, and 185 Brown Swiss bulls that were unproven (no daughter information) in November 2004 and proven by August 2009. A common set of 43,385 single nucleotide polymorphisms (SNP) was used for all breeds. Three methods of multibreed evaluation were investigated. Method 1 estimated SNP effects separately within breed and then applied those breed-specific SNP estimates to the other breeds. Method 2 estimated a common set of SNP effects from combined genotypes and phenotypes of all breeds. Method 3 solved for correlated SNP effects within each breed estimated jointly using a multitrait model where breeds were treated as different traits. Across-breed genomic predicted transmitting ability (GPTA) and within-breed GPTA were compared using regressions to predict the deregressed validation data. Method 1 worked poorly, and coefficients of determination (R²) were much lower using training data from a different breed to estimate SNP effects. Correlations between direct genomic values computed using training data from different breeds were less than 30% and sometimes negative. Across-breed GPTA from method 2 had higher R² values than parent average alone but typically produced lower R² values than the within-breed GPTA. The across-breed R² exceeded the within-breed R² for a few traits in the Brown Swiss breed, probably because information from the other breeds compensated for the small numbers of Brown Swiss training animals. Correlations between within-breed GPTA and across-breed GPTA ranged from 0.91 to 0.93. The multibreed GPTA from method 3 were significantly better than the current within-breed GPTA, and adjusted R² for protein yield (the only trait tested for method 3) were highest of all methods for all breeds. However, method 3 increased the adjusted R² by only 0.01 for Holsteins, ≤0.01 for Jerseys, and 0.01 for Brown Swiss compared with within-breed predictions.
Olson KM
,VanRaden PM
,Tooker ME
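The contrast between within-breed SNP effects (as in method 1) and a common set of SNP effects from pooled data (method 2) in the abstract above can be illustrated with a minimal sketch. It uses simple ridge regression (SNP-BLUP) on simulated toy data; the function name `snp_blup`, the shrinkage value, the breed sizes, and the assumption that true SNP effects are identical in both breeds are all illustrative choices, not the model or data used in the study.

```python
import numpy as np

def snp_blup(genotypes, phenotypes, shrinkage):
    """Ridge-regression (SNP-BLUP) estimate of SNP effects.

    genotypes: (animals x SNP) allele counts; phenotypes: (animals,).
    shrinkage is a hypothetical ridge parameter, not a value from the study.
    """
    Z = genotypes - genotypes.mean(axis=0)   # center allele counts
    y = phenotypes - phenotypes.mean()       # center phenotypes
    lhs = Z.T @ Z + shrinkage * np.eye(Z.shape[1])
    return np.linalg.solve(lhs, Z.T @ y)

rng = np.random.default_rng(0)
n_snp = 50
# Assumption: the same true SNP effects apply in both simulated breeds.
true_effects = rng.normal(0.0, 1.0, n_snp)

def simulate_breed(n_animals, allele_freq):
    geno = rng.binomial(2, allele_freq, size=(n_animals, n_snp)).astype(float)
    pheno = geno @ true_effects + rng.normal(0.0, 1.0, n_animals)
    return geno, pheno

geno_large, pheno_large = simulate_breed(400, 0.5)  # toy "large" breed
geno_small, pheno_small = simulate_breed(40, 0.3)   # toy "small" breed

# Within-breed estimate: SNP effects from the small breed only.
within_small = snp_blup(geno_small, pheno_small, shrinkage=10.0)

# Pooled estimate: one common set of SNP effects from both breeds.
pooled = snp_blup(
    np.vstack([geno_large, geno_small]),
    np.concatenate([pheno_large, pheno_small]),
    shrinkage=10.0,
)

corr_within = np.corrcoef(within_small, true_effects)[0, 1]
corr_pooled = np.corrcoef(pooled, true_effects)[0, 1]
print(f"within-breed: {corr_within:.2f}, pooled: {corr_pooled:.2f}")
```

Because the simulation forces SNP effects to be shared across breeds, it favors pooling by construction; the abstract reports that with real data the pooled approach typically gave lower R² than within-breed predictions.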
《-》
-
Multibreed genomic evaluation for production traits of dairy cattle in the United States using single-step genomic best linear unbiased predictor.
Official multibreed genomic evaluations for dairy cattle in the United States are based on multibreed BLUP evaluation followed by single-breed estimation of SNP effects. Single-step genomic BLUP (ssGBLUP) allows the direct computation of genomic (G)EBV in a multibreed context. This work aimed to develop ssGBLUP multibreed genomic predictions for US dairy cattle using the algorithm for proven and young (APY) to compute the inverse of the genomic relationship matrix. Only purebred Ayrshire (AY), Brown Swiss (BS), Guernsey (GU), Holstein (HO), and Jersey (JE) animals were considered. A 3-trait model with milk (MY), fat (FY), and protein (PY) yields was applied using about 45 million phenotypes recorded from January 2000 to June 2020. The whole data set included about 29.5 million animals, of which almost 4 million were genotyped. All the effects in the model were breed specific, and breed was also considered as fixed unknown parent groups. Evaluations were done for (1) each single breed separately (single); (2) HO and JE together (HO_JE); (3) AY, BS, and GU together (AY_BS_GU); (4) all the 5 breeds together (5_BREEDS). Initially, 15k core animals were used in APY for AY_BS_GU and 5_BREEDS, but larger core sets with more animals from the least represented breeds were also tested. The HO_JE evaluation had a fixed set of 30k core animals, with an equal representation of the 2 breeds, whereas HO and JE single-breed analysis involved 15k core animals. Validation for cows was based on correlations between adjusted phenotypes and (G)EBV, whereas validation for bulls was based on the regression of daughter yield deviations on (G)EBV. Because breed was correctly considered in the model, BLUP results for single and multibreed analyses were the same. Under ssGBLUP, predictability and reliability for AY, BS, and GU were on average 7% and 2% lower in 5_BREEDS compared with single-breed evaluations, respectively. However, validation parameters for these 3 breeds became better than in the single-breed evaluations when 45k animals were included in the core set for 5_BREEDS. Evaluations for Holsteins were more stable across scenarios because Holsteins had the greatest number of genotyped animals and amount of data. Combining AY, BS, and GU into one evaluation resulted in predictions similar to the ones from single breed, especially when using about 30k core animals in APY. The results showed that single-step large-scale multibreed evaluations are computationally feasible, but fine tuning is needed to avoid a reduction in reliability when numerically dominant breeds are combined. Having evaluations for AY, BS, and GU separated from HO and JE may reduce inflation of GEBV for the first 3 breeds.
Cesarani A
,Lourenco D
,Tsuruta S
,Legarra A
,Nicolazzi EL
,VanRaden PM
,Misztal I
... -
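The abstract above relies on the algorithm for proven and young (APY) to invert the genomic relationship matrix. The following is a minimal dense sketch of the commonly described APY form, in which only the core block is inverted directly and each noncore animal is conditioned on the core animals. The function name `apy_inverse` and the toy matrix are assumptions for illustration; production software works with sparse structures at a very different scale.

```python
import numpy as np

def apy_inverse(G, core_idx):
    """Approximate inverse of G using the APY partition into core/noncore.

    Only the core block is inverted directly; noncore animals contribute
    through a diagonal matrix of conditional variances.
    """
    n = G.shape[0]
    core = np.asarray(core_idx)
    noncore = np.setdiff1d(np.arange(n), core)

    Gcc_inv = np.linalg.inv(G[np.ix_(core, core)])
    Gcn = G[np.ix_(core, noncore)]
    P = Gcc_inv @ Gcn                                  # core x noncore

    # m_i = g_ii - g_ic Gcc^{-1} g_ci for each noncore animal i
    m = np.diag(G)[noncore] - np.einsum("ij,ij->j", Gcn, P)
    m_inv = 1.0 / m

    G_inv = np.zeros((n, n))
    G_inv[np.ix_(core, core)] = Gcc_inv + (P * m_inv) @ P.T
    G_inv[np.ix_(core, noncore)] = -P * m_inv
    G_inv[np.ix_(noncore, core)] = (-P * m_inv).T
    G_inv[noncore, noncore] = m_inv                    # diagonal entries only
    return G_inv

# Toy matrix built so that noncore animals are conditionally independent
# given the core animals; in that special case APY is exact.
rng = np.random.default_rng(1)
A = rng.normal(size=(4, 10))
Gcc = A @ A.T / 10 + 0.1 * np.eye(4)
Gcn = 0.1 * rng.normal(size=(4, 3))
Gnn = Gcn.T @ np.linalg.inv(Gcc) @ Gcn + np.diag([0.5, 0.7, 0.9])
G_toy = np.block([[Gcc, Gcn], [Gcn.T, Gnn]])

G_toy_inv = apy_inverse(G_toy, core_idx=[0, 1, 2, 3])
```

The cost of the direct inversion depends on the number of core animals rather than on all genotyped animals, which is why the choice and size of the core set (15k, 30k, or 45k in the abstract) matters.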
《-》
-
Differences among methods to validate genomic evaluations for dairy cattle.
Two methods of testing predictions from genomic evaluations were investigated. Data used were from the August 2006 and April 2010 official USDA genetic evaluations of dairy cattle. The training data set consisted of both cows and bulls that were proven (had own or daughter information) as of August 2006 and included 8,022, 1,959, and 1,056 Holsteins, Jerseys, and Brown Swiss, respectively. The validation data set consisted of bulls that were unproven as of August 2006 and were proven by April 2010 with 2,653, 411, and 132 Holsteins, Jerseys, and Brown Swiss for the production traits. Method 1 used the training animal's predicted transmitting ability (PTA) from August of 2006. Method 2 used the training animal's April 2010 PTA to estimate single nucleotide polymorphism effects. Both methods were tested using several regressions with the same validation animals. In both cases, the validation animals were tested using the deregressed April 2010 PTA. All traits that had genomic evaluations from the official USDA April 2010 genetic evaluations were tested. Results included bias, differences from expected regressions (calculated using selection intensities), and the coefficients of determination. The genomic information increased the predictive ability for most of the traits in all of the breeds. The 2 methods of testing resulted in some differences that would affect interpretation of results. The coefficient of determination was higher for all traits using method 2. This was the expected result as the data were not independent because evaluations of the validation bulls contributed to their sires' evaluations. The regression coefficients from method 2 were often higher than the regression coefficients from method 1. Many traits had regression coefficients that were higher than 2 standard deviations from the expected regressions when using method 2. This was partially due to the lack of independence of the training and validation data sets. 
Most traits did have some level of bias in the prediction equations, regardless of breed. The use of method 1 made it possible to evaluate the increased accuracy in proven first-crop bull evaluations by using genomic information. Proven first-crop bulls had an increase in accuracy from the addition of genomic information. It is advised to use method 1 for validation of genomic evaluations.
Olson KM
,VanRaden PM
,Tooker ME
,Cooper TA
... -
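The validation statistics named in the abstract above (bias, regression coefficient, and coefficient of determination) can be sketched as a simple regression of later deregressed evaluations on earlier predictions. This is a minimal illustration only: the function name `validation_regression`, the reading of the intercept as bias, and the toy numbers are assumptions, and the study's weighting and expected-regression calculations are not reproduced.

```python
import numpy as np

def validation_regression(observed, predicted):
    """Regress later (deregressed) evaluations on earlier predictions.

    Returns the intercept (read here as a bias indicator, an assumption),
    the slope, and the coefficient of determination.
    """
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    X = np.column_stack([np.ones_like(predicted), predicted])
    coef, *_ = np.linalg.lstsq(X, observed, rcond=None)
    fitted = X @ coef
    ss_res = np.sum((observed - fitted) ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    return {
        "intercept": coef[0],
        "slope": coef[1],
        "r2": 1.0 - ss_res / ss_tot,
    }

# Hypothetical toy values, not data from the study.
early_prediction = np.array([-1.0, 0.0, 1.0, 2.0, 3.0])
later_deregressed = 2.0 + 0.8 * early_prediction
stats = validation_regression(later_deregressed, early_prediction)
```

A slope below the expected value would suggest that the earlier predictions were too spread out relative to the later evaluations, and a nonzero intercept would suggest a systematic shift.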
《-》
-
Multibreed genomic prediction using multitrait genomic residual maximum likelihood and multitask Bayesian variable selection.
Genomic prediction is applicable to individuals of different breeds. Empirical results to date, however, show limited benefits in using information on multiple breeds in the context of genomic prediction. We investigated a multitask Bayesian model, presented previously by others, implemented in a Bayesian stochastic search variable selection (BSSVS) model. This model allowed for evidence of quantitative trait loci (QTL) to be accumulated across breeds or for both QTL that segregate across breeds and breed-specific QTL. In both cases, single nucleotide polymorphism effects were estimated with information from a single breed. Other models considered were a single-trait and multitrait genomic residual maximum likelihood (GREML) model, with breeds considered as different traits, and a single-trait BSSVS model. All single-trait models were applied to each of the 2 breeds separately and to the pooled data of both breeds. The data used included a training data set of 6,278 Holstein and 722 Jersey bulls, as well as 374 Jersey validation bulls. All animals had genotypes for 474,773 single nucleotide polymorphisms after editing and phenotypes for milk, fat, and protein yields. Using the same training data, BSSVS consistently outperformed GREML. The multitask BSSVS, however, did not outperform single-trait BSSVS, which used pooled Holstein and Jersey data for training. Thus, the rigorous assumption that the traits are the same in both breeds yielded a slightly better prediction than a model that had to estimate the correlation between the breeds from the data. Adding the Holstein data significantly increased the accuracy of the single-trait GREML and BSSVS in predicting the Jerseys for milk and protein, in line with estimated correlations between the breeds of 0.66 and 0.47 for milk and protein yields, whereas only the BSSVS model significantly improved the accuracy for fat yield with an estimated correlation between breeds of only 0.05. 
The relatively high genetic correlations for milk and protein yields, and the superiority of the pooling strategy, are likely the result of the observed admixture between both breeds in our data. The Bayesian model was able to detect several QTL in Holsteins, which likely enabled it to outperform GREML. The inability of the multitask Bayesian models to outperform a simple pooling strategy may be explained by the fact that the pooling strategy assumes equal effects in both breeds; furthermore, this assumption may be valid for moderate- to large-sized QTL, which are important for multibreed genomic prediction.
Calus MPL
,Goddard ME
,Wientjes YCJ
,Bowman PJ
,Hayes BJ
... -
《-》
-
Improving accuracy of genomic predictions within and between dairy cattle breeds with imputed high-density single nucleotide polymorphism panels.
Achieving accurate genomic estimated breeding values for dairy cattle requires a very large reference population of genotyped and phenotyped individuals. Assembling such reference populations has been achieved for breeds such as Holstein, but is challenging for breeds with fewer individuals. An alternative is to use a multi-breed reference population, such that smaller breeds gain some advantage in accuracy of genomic estimated breeding values (GEBV) from information from larger breeds. However, this requires that marker-quantitative trait loci associations persist across breeds. Here, we assessed the gain in accuracy of GEBV in Jersey cattle as a result of using a combined Holstein and Jersey reference population, with either 39,745 or 624,213 single nucleotide polymorphism (SNP) markers. The surrogate used for accuracy was the correlation of GEBV with daughter trait deviations in a validation population. Two methods were used to predict breeding values, either a genomic BLUP (GBLUP_mod), or a new method, BayesR, which used a mixture of normal distributions as the prior for SNP effects, including one distribution that set SNP effects to zero. The GBLUP_mod method scaled both the genomic relationship matrix and the additive relationship matrix to a base at the time the breeds diverged, and regressed the genomic relationship matrix to account for sampling errors in estimating relationship coefficients due to a finite number of markers, before combining the 2 matrices. Although these modifications did result in less biased breeding values for Jerseys compared with an unmodified genomic relationship matrix, BayesR gave the highest accuracies of GEBV for the 3 traits investigated (milk yield, fat yield, and protein yield), with an average increase in accuracy compared with GBLUP_mod across the 3 traits of 0.05 for both Jerseys and Holsteins. 
The advantage was limited for either Jerseys or Holsteins in using 624,213 SNP rather than 39,745 SNP (0.01 for Holsteins and 0.03 for Jerseys, averaged across traits). Even this limited and nonsignificant advantage was only observed when BayesR was used. An alternative panel, which extracted the SNP in the transcribed part of the bovine genome from the 624,213 SNP panel (to give 58,532 SNP), performed better, with an increase in accuracy of 0.03 for Jerseys across traits. This panel captures much of the increased genomic content of the 624,213 SNP panel, with the advantage of a greatly reduced number of SNP effects to estimate. Taken together, using this panel, a combined breed reference and using BayesR rather than GBLUP_mod increased the accuracy of GEBV in Jerseys from 0.43 to 0.52, averaged across the 3 traits.
Erbe M
,Hayes BJ
,Matukumalli LK
,Goswami S
,Bowman PJ
,Reich CM
,Mason BA
,Goddard ME
... -
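The BayesR prior described in the abstract above, a mixture of normal distributions that includes one component setting SNP effects to zero, can be written generically as follows. The symbols (\beta_j for the effect of SNP j, \pi_k for mixture proportions, \sigma_k^2 for component variances, K for the number of components) are notation assumed here for illustration; the abstract does not state the number of components or their variances.

```latex
\beta_j \mid \boldsymbol{\pi} \;\sim\; \sum_{k=1}^{K} \pi_k \, N\!\left(0, \sigma_k^2\right),
\qquad \sigma_1^2 = 0, \qquad \sum_{k=1}^{K} \pi_k = 1
```

With \sigma_1^2 = 0, the first component is a point mass at zero, so a SNP assigned to it has no effect on the trait.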
《-》