-
Modeling heterogeneous (co)variances from adjacent-SNP groups improves genomic prediction for milk protein composition traits.
Accurate genomic prediction requires a large reference population, which is problematic for traits that are expensive to measure. Traits related to milk protein composition are not routinely recorded due to costly procedures and are considered to be controlled by a few quantitative trait loci of large effect. The amount of variation explained may vary between regions, leading to heterogeneous (co)variance patterns across the genome. Genomic prediction models that can efficiently take such heterogeneity of (co)variances into account can result in improved prediction reliability. In this study, we developed and implemented novel univariate and bivariate Bayesian prediction models, based on estimates of heterogeneous (co)variances for genome segments (BayesAS). Available data consisted of milk protein composition traits measured on cows and de-regressed proofs of total protein yield derived for bulls. Single-nucleotide polymorphisms (SNPs), from 50K SNP arrays, were grouped into non-overlapping genome segments. A segment was defined as one SNP, or a group of 50, 100, or 200 adjacent SNPs, or one chromosome, or the whole genome. Traditional univariate and bivariate genomic best linear unbiased prediction (GBLUP) models were also run for comparison. Reliabilities were calculated through a resampling strategy and using a deterministic formula.
BayesAS models improved prediction reliability for most of the traits compared to GBLUP models and this gain depended on segment size and genetic architecture of the traits. The gain in prediction reliability was especially marked for the protein composition traits β-CN, κ-CN and β-LG, for which prediction reliabilities were improved by 49 percentage points on average using the MT-BayesAS model with a 100-SNP segment size compared to the bivariate GBLUP. Prediction reliabilities were highest with the BayesAS model that uses a 100-SNP segment size. The bivariate versions of our BayesAS models resulted in extra gains of up to 6% in prediction reliability compared to the univariate versions.
Substantial improvement in prediction reliability was possible for most of the traits related to milk protein composition using our novel BayesAS models. Grouping adjacent SNPs into segments provided enhanced information to estimate parameters and allowing the segments to have different (co)variances helped disentangle heterogeneous (co)variances across the genome.
Gebreyesus G
,Lund MS
,Buitenhuis B
,Bovenhuis H
,Poulsen NA
,Janss LG
... -
《-》
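The segment definition in the BayesAS abstract above (one SNP, or blocks of 50, 100, or 200 adjacent SNPs) can be illustrated with a minimal sketch. The function name `segment_snps` is hypothetical, and the rule that a segment never crosses a chromosome boundary is an assumption made here for illustration; the abstract does not state how boundaries were handled.

```python
from itertools import groupby


def segment_snps(chromosomes, segment_size):
    """Group SNP indices into non-overlapping segments of adjacent SNPs.

    chromosomes: chromosome label of each SNP, listed in map order.
    segment_size: number of adjacent SNPs per segment (1, 50, 100, ...).
    Assumption: segments restart at each chromosome, so the last segment
    of a chromosome may hold fewer than segment_size SNPs.
    """
    if segment_size < 1:
        raise ValueError("segment_size must be at least 1")
    segments = []
    offset = 0  # index of the first SNP on the current chromosome
    for _, run in groupby(chromosomes):
        n_on_chromosome = len(list(run))
        for start in range(0, n_on_chromosome, segment_size):
            stop = min(start + segment_size, n_on_chromosome)
            segments.append(list(range(offset + start, offset + stop)))
        offset += n_on_chromosome
    return segments


# Toy map: five SNPs on chromosome 1, three on chromosome 2.
toy_map = [1, 1, 1, 1, 1, 2, 2, 2]
print(segment_snps(toy_map, 2))
```

Each segment would then receive its own (co)variance parameter in a model of the BayesAS type; setting `segment_size` to 1 gives the single-SNP case.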
-
Genomic Prediction Using Multi-trait Weighted GBLUP Accounting for Heterogeneous Variances and Covariances Across the Genome.
The implicit assumption of common (co)variance for all loci in multi-trait Genomic Best Linear Unbiased Prediction (GBLUP) results in a genomic relationship matrix (G) that is common to all traits. When this assumption is violated, Bayesian whole genome regression methods may be superior to GBLUP by accounting for unequal (co)variance for all loci or genome regions. This study aimed to develop a strategy to improve the accuracy of GBLUP for multi-trait genomic prediction, using (co)variance estimates of SNP effects from Bayesian whole genome regression methods. Five generations (G1-G5, test populations) of genotype data were simulated based on data of 2,200 Danish Holstein cows (G0, reference population). Two correlated traits with heritabilities of 0.1 or 0.4, and a genetic correlation of 0.45 were generated. First, SNP effects and breeding values were estimated using the BayesAS method, assuming (co)variance was the same for SNPs within a genome region, and different between regions. Region size was set as one SNP, 100 SNPs, a whole chromosome or the whole genome. Second, posterior (co)variances of SNP effects were used to weight SNPs in construction of G matrices. In general, a region size of 100 SNPs led to the highest prediction accuracies using BayesAS, and weighted GBLUP (wGBLUP) outperformed GBLUP at this region size. Our results suggest that when genetic architectures of traits favor Bayesian methods, the accuracy of multi-trait GBLUP can be as high as that of the Bayesian method if SNPs are weighted by the Bayesian posterior (co)variances.
Karaman E
,Lund MS
,Anche MT
,Janss L
,Su G
... -
《G3-Genes Genomes Genetics》
-
Comparison of genomic predictions using genomic relationship matrices built with different weighting factors to account for locus-specific variances.
Various models have been used for genomic prediction. Bayesian variable selection models often predict more accurate genomic breeding values than genomic BLUP (GBLUP), but GBLUP is generally preferred for routine genomic evaluations because of low computational demand. The objective of this study was to achieve the benefits of both models using results from Bayesian models and genome-wide association studies as weights on single nucleotide polymorphism (SNP) markers when constructing the genomic matrix (G-matrix) for genomic prediction. The data comprised 5,221 progeny-tested bulls from the Nordic Holstein population. The animals were genotyped using the Illumina Bovine SNP50 BeadChip (Illumina Inc., San Diego, CA). Weighting factors in this investigation were the posterior SNP variance, the square of the posterior SNP effect, and the corresponding minus base-10 logarithm of the marker association P-value [-log10(P)] of a t-test obtained from the analysis using a Bayesian mixture model with 4 normal distributions, the square of the estimated SNP effect, and the corresponding -log10(P) of a t-test obtained from the analysis using a classical genome-wide association study model (linear regression model). The weights were derived from the analysis based on data sets that were 0, 1, 3, or 5 yr before performing genomic prediction. In building a G-matrix, the weights were assigned either to each marker (single-marker weighting) or to each group of approximately 5 to 150 markers (group-marker weighting). The analysis was carried out for milk yield, fat yield, protein yield, fertility, and mastitis. Deregressed proofs (DRP) were used as response variables to predict genomic estimated breeding values (GEBV). Averaging over the 5 traits, the Bayesian model led to 2.0% higher reliability of GEBV than the GBLUP model with an original unweighted G-matrix. 
The superiority of using a GBLUP with weighted G-matrix over GBLUP with an original unweighted G-matrix was the largest when using a weighting factor of posterior variance, resulting in 1.7 percentage points higher reliability. The second best weighting factors were -log10 (P-value) of a t-test corresponding to the square of the posterior SNP effect from the Bayesian model and -log10 (P-value) of a t-test corresponding to the square of the estimated SNP effect from the linear regression model, followed by the square of estimated SNP effect and the square of the posterior SNP effect. In addition, group-marker weighting performed better than single-marker weighting in terms of reducing bias of GEBV, and also slightly increased prediction reliability. The differences between weighting factors and scenarios were larger in prediction bias than in prediction accuracy. Finally, weights derived from a data set having a lag up to 3 yr did not reduce reliability of GEBV. The results indicate that posterior SNP variance estimated from a Bayesian mixture model is a good alternative weighting factor, and common weights on group markers with a size of 30 markers is a good strategy when using markers of the 50,000-marker (50K) chip. In a population with gradually increasing reference data, the weights can be updated once every 3 yr.
Su G
,Christensen OF
,Janss L
,Lund MS
... -
《-》
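The single-marker and group-marker weighting described in the abstract above can be sketched as follows. This is a single-trait illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, weights are rescaled to a mean of 1, and the matrix is scaled by the sum of 2p(1-p) in the style commonly used for genomic relationship matrices. The abstract does not specify these details.

```python
def group_average_weights(weights, group_size):
    """Give every SNP in a block of adjacent SNPs the block's mean weight."""
    averaged = []
    for start in range(0, len(weights), group_size):
        block = weights[start:start + group_size]
        mean = sum(block) / len(block)
        averaged.extend([mean] * len(block))
    return averaged


def weighted_g_matrix(genotypes, weights):
    """Build G = Z D Z' / sum(2p(1-p)) with D a diagonal matrix of SNP weights.

    genotypes: one row per animal, allele counts coded 0/1/2.
    weights: one non-negative weight per SNP (for example a posterior
    SNP variance); rescaled here to mean 1, which is an assumption.
    """
    n_animals, n_snps = len(genotypes), len(weights)
    freqs = [sum(row[j] for row in genotypes) / (2.0 * n_animals)
             for j in range(n_snps)]
    mean_weight = sum(weights) / n_snps
    d = [w / mean_weight for w in weights]
    # Centre allele counts by twice the allele frequency.
    z = [[row[j] - 2.0 * freqs[j] for j in range(n_snps)] for row in genotypes]
    scale = sum(2.0 * p * (1.0 - p) for p in freqs)
    return [[sum(z[a][k] * d[k] * z[b][k] for k in range(n_snps)) / scale
             for b in range(n_animals)]
            for a in range(n_animals)]


toy_genotypes = [[0, 1, 2, 1], [1, 1, 0, 2], [2, 0, 1, 0]]
snp_weights = [0.2, 0.6, 3.0, 1.0]  # made-up values for illustration
g_single = weighted_g_matrix(toy_genotypes, snp_weights)
g_group = weighted_g_matrix(toy_genotypes, group_average_weights(snp_weights, 2))
```

With equal weights the function reduces to the unweighted G-matrix, so the weighted and unweighted GBLUP models differ only in `d`.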
-
Genomic prediction of breeding values using previously estimated SNP variances.
Genomic prediction requires estimation of variances of effects of single nucleotide polymorphisms (SNPs), which is computationally demanding, and uses these variances for prediction. We have developed models with separate estimation of SNP variances, which can be applied infrequently, and genomic prediction, which can be applied routinely.
SNP variances were estimated with Bayes Stochastic Search Variable Selection (BSSVS) and BayesC. Genome-enhanced breeding values (GEBV) were estimated with RR-BLUP (ridge regression best linear unbiased prediction), using either variances obtained from BSSVS (BLUP-SSVS) or BayesC (BLUP-C), or assuming equal variances for each SNP. Datasets used to estimate SNP variances comprised (1) all animals, (2) 50% random animals (RAN50), (3) 50% best animals (TOP50), or (4) 50% worst animals (BOT50). Traits analysed were protein yield, udder depth, somatic cell score, interval between first and last insemination, direct longevity, and longevity including information from predictors.
BLUP-SSVS and BLUP-C yielded similar GEBV as the equivalent Bayesian models that simultaneously estimated SNP variances. Reliabilities of these GEBV were consistently higher than from RR-BLUP, although only significantly for direct longevity. Across scenarios that used data subsets to estimate GEBV, observed reliabilities were generally higher for TOP50 than for RAN50, and much higher than for BOT50. Reliabilities of TOP50 were higher because the training data contained more ancestors of selection candidates. Using estimated SNP variances based on random or non-random subsets of the data, while using all data to estimate GEBV, did not affect reliabilities of the BLUP models. A convergence criterion of 10^-8 instead of 10^-10 for BLUP models yielded similar GEBV, while the required number of iterations decreased by 71 to 90%. Including a separate polygenic effect consistently improved reliabilities of the GEBV, but also substantially increased the required number of iterations to reach convergence with RR-BLUP. SNP variances converged faster for BayesC than for BSSVS.
Combining Bayesian variable selection models to re-estimate SNP variances and BLUP models that use those SNP variances, yields GEBV that are similar to those from full Bayesian models. Moreover, these combined models yield predictions with higher reliability and less bias than the commonly used RR-BLUP model.
Calus MP
,Schrooten C
,Veerkamp RF
《-》
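The two-step idea in the abstract above (estimate SNP variances once, then reuse them in a BLUP model that is solved iteratively until a convergence criterion is met) can be sketched with a Gauss-Seidel solver. This is a simplified illustration, not the authors' software: it omits the mean, fixed effects, and the polygenic effect, the function name is hypothetical, and the convergence criterion (relative squared change in the solutions) is an assumption, since the abstract does not define it.

```python
def snp_blup_gauss_seidel(z, y, snp_var, resid_var, tol=1e-8, max_iter=10000):
    """Solve (Z'Z + diag(resid_var / snp_var)) a = Z'y by Gauss-Seidel.

    z: genotype covariates, one row per animal and one column per SNP.
    y: phenotypes (or deregressed proofs), assumed already centred.
    snp_var: previously estimated variance of each SNP effect.
    Returns the SNP effects and the number of iterations used.
    """
    n, m = len(z), len(z[0])
    effects = [0.0] * m
    resid = list(y)  # current residuals y - Z a
    zz = [sum(z[i][j] ** 2 for i in range(n)) for j in range(m)]
    for iteration in range(1, max_iter + 1):
        change, norm = 0.0, 0.0
        for j in range(m):
            shrink = resid_var / snp_var[j]  # SNP-specific ridge penalty
            rhs = sum(z[i][j] * resid[i] for i in range(n)) + zz[j] * effects[j]
            new = rhs / (zz[j] + shrink)
            delta = new - effects[j]
            for i in range(n):
                resid[i] -= z[i][j] * delta
            effects[j] = new
            change += delta ** 2
            norm += new ** 2
        if change <= tol * norm:  # assumed convergence criterion
            break
    return effects, iteration


# Toy data: equal variances correspond to RR-BLUP, unequal variances to
# a BLUP model that reuses SNP variances from an earlier Bayesian run.
z_toy = [[1.0, 0.0], [0.0, 1.0]]
y_toy = [2.0, 4.0]
equal, n_iter = snp_blup_gauss_seidel(z_toy, y_toy, [1.0, 1.0], 1.0)
unequal, _ = snp_blup_gauss_seidel(z_toy, y_toy, [1.0, 1e6], 1.0)
```

A SNP with a large estimated variance is shrunk less than one with a small variance, which is how the pre-estimated variances change the predictions relative to RR-BLUP. Loosening `tol` trades a small loss of precision for fewer iterations.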
-
Value of sharing cow reference population between countries on reliability of genomic prediction for milk yield traits.
Increasing the reliability of genomic prediction (GP) of economic traits in the pasture-based dairy production systems of New Zealand (NZ) and Australia (AU) is important to both countries. This study assessed if sharing cow phenotype and genotype data of NZ and AU improves the reliability of GP for NZ bulls. Data from approximately 32,000 NZ genotyped cows and their contemporaries were included in the May 2018 routine genetic evaluation of the Australian Dairy cattle in an attempt to provide consistent phenotypes for both countries. After the genetic evaluation, deregressed proofs of cows were calculated for milk yield traits. The April 2018 multiple across-country evaluation of Interbull was also used to calculate deregressed proofs for bulls on the NZ scale. Approximately 1,178 Jersey (Jer) and 6,422 Holstein (Hol) bulls had genotype and phenotype data. In addition to NZ cows, phenotype data of close to 60,000 genotyped Australian (AU) cows from the same genetic evaluation run as NZ cows were used. All AU and NZ females were genotyped using low-density SNP chips (<10K SNP) and were imputed first to 50K and then to ∼600K (referred to as high density; HD). We used up to 98,000 animals in the reference populations, both by expanding the NZ reference set (cow, bull, single breed to multi-breed set) and by adding AU cows. Reliabilities of GP were calculated for 508 Jer and 1,251 Hol bulls whose sires are not included in the reference set (RS) to ensure that real differences are not masked by close relationships. The GP was tested using 50K or high-density SNP chip using genomic BLUP in bivariate (considering country as a trait) or single trait models. The RS that gave the highest reliability for each breed were also tested using a hybrid GP method that combines expectation maximization with Bayes R. 
The addition of the AU cows to an NZ RS that included either NZ cows only, or cows and bulls, improved the reliability of GP for both NZ Hol and Jer validation bulls for all traits. Using single breed reference populations also increased reliability when NZ crossbred cows were added to reference populations that included only purebred NZ bulls and cows and AU cows. The full multi-breed RS (all NZ cows and bulls and AU cows) provided similar reliabilities in NZ Hol bulls, when compared with the single breed reference with crossbred NZ cows. For Jer validation bulls, the RS that included Jer cows and bulls and crossbred cows from NZ and Jer cows from AU was marginally better than the all-breed, all-country RS. In terms of reliability, the advantage of the HD SNP chip was small, but it captured more of the genomic variance than the 50K chip, particularly for Hol. The expectation maximization Bayes R GP method was slightly (up to 3 percentage points) better than genomic BLUP. We conclude that GP of milk production traits in NZ bulls improves by up to 7 percentage points in reliability by expanding the NZ reference population to include AU cows.
Haile-Mariam M
,MacLeod IM
,Bolormaa S
,Schrooten C
,O'Connor E
,de Jong G
,Daetwyler HD
,Pryce JE
... -
《-》