Impact of QTL minor allele frequency on genomic evaluation using real genotype data and simulated phenotypes in Japanese Black cattle.
Genetic variance that is not captured by single nucleotide polymorphisms (SNPs) is due to imperfect linkage disequilibrium (LD) between SNPs and quantitative trait loci (QTLs), and the extent of LD between SNPs and QTLs depends on the difference in minor allele frequency (MAF) between them. To evaluate the impact of MAF of QTLs on genomic evaluation, we performed a simulation study using real cattle genotype data.
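The dependence of LD on allele frequency can be made concrete with the standard population-genetics upper bound on r² between two biallelic loci. The sketch below is not taken from the abstract; the function name `max_r2` and the example frequencies are illustrative assumptions.

```python
def max_r2(maf_snp, maf_qtl):
    """Upper bound on r^2 between two biallelic loci, given their MAFs."""
    for f in (maf_snp, maf_qtl):
        if not 0.0 < f <= 0.5:
            raise ValueError("MAF must lie in (0, 0.5]")
    lo, hi = sorted((maf_snp, maf_qtl))
    # With lo <= hi <= 0.5, |D| is at most lo * (1 - hi); dividing D^2 by
    # lo * (1 - lo) * hi * (1 - hi) gives the ratio below.
    return (lo * (1.0 - hi)) / (hi * (1.0 - lo))


# Illustrative values: a common SNP can tag a rare QTL only weakly.
bound_mismatched = max_r2(0.40, 0.05)   # about 0.079
bound_matched = max_r2(0.30, 0.30)      # reaches 1 (up to rounding) when the MAFs are equal
```

Under this bound, perfect LD is attainable only when the SNP and the QTL have the same MAF, which is one way to read why low-MAF QTLs are harder to capture with a panel dominated by common SNPs.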
In total, 1368 Japanese Black cattle and 592,034 SNPs (Illumina BovineHD BeadChip) were used. We simulated phenotypes using real genotypes under different scenarios, varying the MAF categories, QTL heritability, number of QTLs, and distribution of QTL effect. After generating true breeding values and phenotypes, QTL heritability was estimated and the prediction accuracy of genomic estimated breeding value (GEBV) was assessed under different SNP densities, prediction models, and population size by a reference-test validation design.
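A minimal sketch of this kind of simulation is shown here, assuming normally distributed QTL effects and a single low-MAF category. The toy genotype matrix, the MAF cut-off of 0.1, and all function names are hypothetical stand-ins, not the authors' pipeline.

```python
import random
import statistics


def minor_allele_frequency(genotypes, j):
    """MAF of SNP column j, with genotypes coded as allele counts 0/1/2."""
    p = sum(row[j] for row in genotypes) / (2 * len(genotypes))
    return min(p, 1.0 - p)


def simulate_phenotypes(genotypes, qtl_indices, h2, rng):
    """Return (true breeding values, phenotypes) for a target heritability h2."""
    # Assumption: QTL effects follow a standard normal distribution.
    effects = {j: rng.gauss(0.0, 1.0) for j in qtl_indices}
    tbv = [sum(effects[j] * row[j] for j in qtl_indices) for row in genotypes]
    var_g = statistics.pvariance(tbv)
    # Residual variance chosen so that var_g / (var_g + var_e) equals h2.
    sd_e = (var_g * (1.0 - h2) / h2) ** 0.5
    phenotypes = [g + rng.gauss(0.0, sd_e) for g in tbv]
    return tbv, phenotypes


# Toy stand-in for real genotypes (hypothetical sizes and frequencies).
rng = random.Random(1)
n_animals, n_snps = 500, 200
freqs = [rng.uniform(0.01, 0.5) for _ in range(n_snps)]
genotypes = [[(rng.random() < p) + (rng.random() < p) for p in freqs]
             for _ in range(n_animals)]

# Draw QTLs from a low-MAF category (0 < MAF < 0.1, an illustrative cut-off).
low_maf = [j for j in range(n_snps)
           if 0.0 < minor_allele_frequency(genotypes, j) < 0.1]
qtl_indices = rng.sample(low_maf, min(10, len(low_maf)))
tbv, phenotypes = simulate_phenotypes(genotypes, qtl_indices, h2=0.5, rng=rng)
```

Changing the MAF window, the number of QTLs, `h2`, or the effect distribution would correspond to the different scenarios described above.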
The extent of LD between SNPs and QTLs in this population was higher for QTLs with high MAF than for those with low MAF. The effect of QTL MAF on genomic evaluation depended on the genetic architecture, the evaluation strategy, and the population size. Regarding genetic architecture, genomic evaluation was affected by the MAF of QTLs in combination with the QTL heritability and the distribution of QTL effects. The number of QTLs did not affect genomic evaluation when it exceeded 50. Regarding evaluation strategy, SNP density and the prediction model both affected heritability estimation and genomic prediction, and their effects depended on the MAF of QTLs. In addition, accurate QTL heritability estimates and GEBVs were obtained using denser SNP information and a prediction model that accounted for SNPs with both low and high MAFs. Regarding population size, a large sample size is needed to increase the accuracy of GEBV.
The MAF of QTLs had an impact on heritability estimation and prediction accuracy. Most genetic variance can be captured using denser SNPs and a prediction model that accounts for MAF, but a large sample size is needed to increase the accuracy of GEBV under all QTL MAF categories.
Uemoto Y, Sasaki S, Kojima T, Sugimoto Y, Watanabe T, ...
《BMC GENETICS》
Genomic prediction ability for beef fatty acid profile in Nelore cattle using different pseudo-phenotypes.
The aim of the present study was to compare the predictive ability of the SNP-BLUP model using different pseudo-phenotypes, namely the phenotype adjusted for fixed effects (Yc), the estimated breeding value (EBV), and the genomic estimated breeding value (GEBV), using simulated and real data for the beef fatty acid (FA) profile of Nelore cattle finished in feedlot. A pedigree with phenotypes and genotypes of 10,000 animals was simulated, with 50% multiple sires in the pedigree. Two traits were simulated, one with high heritability (0.58) and one with low heritability (0.13). Ten replicates were performed for each trait and results were averaged across replicates. A historical population was created from generation zero to 2020, with a constant size of 2000 animals from generation zero to 1000 to produce different levels of linkage disequilibrium (LD). From generation 1001 to 2020, the number of animals was gradually reduced (from 2000 to 600), producing a "bottleneck effect" and, consequently, genetic drift and LD. A total of 335,000 markers (with MAF greater than or equal to 0.02) and 1000 QTL were randomly selected from the last generation of the historical population to generate genotypic data for the test population. The phenotypes were computed as the sum of the QTL effects and an error term sampled from a normal distribution with zero mean and variance equal to 0.88. For simulated data, 4000 animals from generations 7, 8, and 9 (with genotypes and phenotypes) were used as the training population, and 1000 animals from the last generation (10) were used as the validation population. For real data, a total of 937 Nelore bulls with phenotypes for fatty acid profile (sums of saturated, monounsaturated, omega-3, omega-6, and polyunsaturated fatty acids, and the ratio of polyunsaturated to saturated fatty acids) were genotyped using the Illumina BovineHD BeadChip (Illumina, San Diego, CA) with 777,962 SNP.
To compare the accuracy and bias of the direct genomic value (DGV) across pseudo-phenotypes, two statistics were computed: the correlation of the pseudo-phenotypes with the true breeding value (TBV) for simulated data or with the DGV for real data, and the linear regression coefficient of the pseudo-phenotypes on the TBV for simulated data or on the DGV for real data. For simulated data, the correlations between DGV and TBV were higher for the high heritability trait than for the low heritability trait. For simulated and real data, the prediction ability was higher for GEBV than for Yc and EBV. For simulated data, the regression coefficient estimates (b) were on average lower than 1 for both the high and low heritability traits, indicating inflation. The results were more biased for Yc and EBV than for GEBV. For real data, the GEBV displayed less biased results than Yc and EBV for SFA, MUFA, n-3, n-6, and PUFA/SFA. Although EBV gave the least biased results for PUFA, the b estimates obtained for the different pseudo-phenotypes (Yc, EBV and GEBV) were very close. Genomic information can assist in improving beef fatty acid profile in Zebu cattle, since the use of genomic information yielded genomic values for fatty acid profile with accuracies ranging from low to moderate. Considering both simulated and real data, the ssGBLUP model is an appropriate alternative for obtaining more reliable and less biased GEBVs as pseudo-phenotypes when pedigree is missing due to a high proportion of multiple sires, and GEBV is more adequate than EBV and Yc for predicting direct genomic value for beef fatty acid profile.
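The two validation statistics used here, a correlation for accuracy and a regression coefficient for bias, can be sketched in a few lines. The helper names and the toy values below are illustrative assumptions; which variable is regressed on which should follow the study design.

```python
def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def regression_slope(y, x):
    """Slope b of the simple linear regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Hypothetical toy values: predictions that are perfectly ranked but twice
# as spread out as the target they are compared against.
target = [0.2, -1.0, 0.5, 1.3, -0.4]
predicted = [2.0 * t for t in target]

accuracy = pearson(predicted, target)        # close to 1
b = regression_slope(target, predicted)      # close to 0.5, i.e. below 1
```

In this toy case the ranking is perfect, yet b is below 1 because the predictions are over-dispersed, which is the sense in which a coefficient below 1 is read as inflation.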
Chiaia HLJ, Peripolli E, de Oliveira Silva RM, Feitosa FLB, de Lemos MVA, Berton MP, Olivieri BF, Espigolan R, Tonussi RL, Gordo DGM, de Albuquerque LG, de Oliveira HN, Ferrinho AM, Mueller LF, Kluska S, Tonhati H, Pereira ASC, Aguilar I, Baldi F, ...
Accuracy of prediction of simulated polygenic phenotypes and their underlying quantitative trait loci genotypes using real or imputed whole-genome markers in cattle.
More accurate genomic predictions are expected when the effects of QTL (quantitative trait loci) are predicted from markers in close physical proximity to the QTL. The objective of this study was to quantify to what extent whole-genome methods using 50 K or imputed 770 K SNPs (single nucleotide polymorphisms) could predict single or multiple QTL genotypes based on SNPs in close proximity to those QTL.
Phenotypes with a heritability of 1 were simulated for 2677 Hereford animals genotyped with the BovineSNP50 BeadChip. Genotypes for the high-density 770 K SNP panel were imputed using Beagle software. Various Bayesian regression methods were used to predict single QTL or a trait influenced by 42 such QTL. We quantified to what extent these predictions were based on SNPs in close proximity to the QTL by comparing whole-genome predictions to local predictions based on estimates of the effects of variable numbers of SNPs i.e. ±1, ±2, ±5, ±10, ±50 or ±100 that flanked the QTL.
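Selecting the ±k SNPs that flank a QTL can be sketched as below, assuming SNP positions are sorted within one chromosome and that the QTL itself is not part of the marker panel. The function name and the positions are hypothetical.

```python
from bisect import bisect_left


def flanking_snps(snp_positions, qtl_position, k):
    """Indices of up to k SNPs on each side of a QTL position.

    snp_positions must be sorted in ascending order (one chromosome).
    A SNP located exactly at qtl_position is counted on the right side.
    """
    i = bisect_left(snp_positions, qtl_position)
    left = range(max(0, i - k), i)
    right = range(i, min(len(snp_positions), i + k))
    return list(left) + list(right)


# Hypothetical base-pair positions on one chromosome.
positions = [10, 20, 30, 40, 50, 60]
window_1 = flanking_snps(positions, 35, 1)   # [2, 3]
window_2 = flanking_snps(positions, 35, 2)   # [1, 2, 3, 4]
edge = flanking_snps(positions, 5, 2)        # [0, 1], no SNPs to the left
```

A local prediction would then sum the estimated effects of only the SNPs in the returned window, whereas the whole-genome prediction sums over all SNPs.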
Prediction accuracies based on local SNPs using whole-genome training for single QTL with the 50 K SNP panel and BayesC0 ranged from 0.49 (±1 SNP) to 0.75 (±100 SNPs). The minimum number of local SNPs for an accurate prediction is ±10 SNPs. Prediction accuracies that were based on local SNPs only were higher than those based on whole-genome SNPs for both 50 K and 770 K SNP panels. For the 770 K SNP panel, prediction accuracies were higher than 0.70 and varied little i.e. between 0.73 (±1 SNP) and 0.77 (±5 SNPs). For the summed 42 QTL, prediction accuracies were generally higher than for single QTL regardless of the number of local SNPs. For QTL with low minor allele frequency (MAF) compared to QTL with high MAF, prediction accuracies increased as the number of SNPs around the QTL increased.
These results suggest that with both 50 K and imputed 770 K SNP genotypes the level of linkage disequilibrium is sufficient to predict single and multiple QTL. However, prediction accuracies are eroded through spuriously estimated effects of SNPs that are distant from the QTL. Prediction accuracies were higher with the 770 K than with the 50 K SNP panel.
Hassani S, Saatchi M, Fernando RL, Garrick DJ, ...
Impact of QTL properties on the accuracy of multi-breed genomic prediction.
Although simulation studies show that combining multiple breeds in one reference population increases accuracy of genomic prediction, this is not always confirmed in empirical studies. This discrepancy might be due to the assumptions on quantitative trait loci (QTL) properties applied in simulation studies, including number of QTL, spectrum of QTL allele frequencies across breeds, and distribution of allele substitution effects. We investigated the effects of QTL properties and of including a random across- and within-breed animal effect in a genomic best linear unbiased prediction (GBLUP) model on accuracy of multi-breed genomic prediction using genotypes of Holstein-Friesian and Jersey cows.
Genotypes of three classes of variants obtained from whole-genome sequence data, with moderately low, very low or extremely low average minor allele frequencies (MAF), were imputed in 3000 Holstein-Friesian and 3000 Jersey cows that had real high-density genotypes. Phenotypes of traits controlled by QTL with different properties were simulated by sampling 100 or 1000 QTL from one class of variants and their allele substitution effects either randomly from a gamma distribution, or computed such that each QTL explained the same variance, i.e. rare alleles had a large effect. Genomic breeding values for 1000 selection candidates per breed were estimated using GBLUP models including a random across- and a within-breed animal effect.
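The two ways of assigning allele substitution effects can be sketched as follows. The sketch relies on the standard result that a QTL with allele frequency p and substitution effect a explains variance 2p(1-p)a² under Hardy-Weinberg equilibrium; the gamma shape of 0.4, the random signs, and the function names are assumptions, not values taken from the abstract.

```python
import math
import random


def gamma_effects(n_qtl, rng, shape=0.4, scale=1.0):
    """Effects drawn from a gamma distribution with a random sign.

    The shape and scale values are illustrative assumptions.
    """
    return [rng.choice((-1.0, 1.0)) * rng.gammavariate(shape, scale)
            for _ in range(n_qtl)]


def equal_variance_effects(freqs, total_var=1.0):
    """Effects scaled so that every QTL explains the same variance.

    Solves 2 * p * (1 - p) * a**2 = total_var / n_qtl for a.
    """
    per_qtl = total_var / len(freqs)
    return [math.sqrt(per_qtl / (2.0 * p * (1.0 - p))) for p in freqs]


# Hypothetical allele frequencies, from rare to common.
qtl_freqs = [0.01, 0.10, 0.50]
effects_eq = equal_variance_effects(qtl_freqs)
explained = [2.0 * p * (1.0 - p) * a ** 2 for p, a in zip(qtl_freqs, effects_eq)]

effects_gamma = gamma_effects(len(qtl_freqs), random.Random(7))
```

Under the equal-variance rule the rarest QTL receives the largest effect, which matches the "rare alleles had a large effect" scenario. With gamma sampling, effect size is unrelated to frequency.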
For all three classes of QTL allele frequency spectra, accuracies of genomic prediction were not affected by the addition of 2000 individuals of the other breed to a reference population of the same breed as the selection candidates. Accuracies of both single- and multi-breed genomic prediction decreased as MAF of QTL decreased, especially when rare alleles had a large effect. Accuracies of genomic prediction were similar for the models with and without a random within-breed animal effect, probably because of insufficient power to separate across- and within-breed animal effects.
Accuracy of both single- and multi-breed genomic prediction depends on the properties of the QTL that underlie the trait. As QTL MAF decreased, accuracy decreased, especially when rare alleles had a large effect. This demonstrates that QTL properties are key parameters that determine the accuracy of genomic prediction.
Wientjes YC, Calus MP, Goddard ME, Hayes BJ, ...
Using selection index theory to estimate consistency of multi-locus linkage disequilibrium across populations.
The potential of combining multiple populations in genomic prediction depends on the consistency of linkage disequilibrium (LD) between SNPs and QTL across populations. We investigated consistency of multi-locus LD across populations using selection index theory and investigated the relationship between consistency of multi-locus LD and accuracy of genomic prediction across different simulated scenarios. In the selection index, QTL genotypes were considered as breeding goal traits and SNP genotypes as index traits, based on LD among SNPs and between SNPs and QTL. The consistency of multi-locus LD across populations was computed as the accuracy of predicting QTL genotypes in selection candidates using a selection index derived in the reference population. Different scenarios of within and across population genomic prediction were evaluated, using all SNPs or only the four neighboring SNPs of a simulated QTL. Phenotypes were simulated using different numbers of QTL underlying the trait. The relationship between the calculated consistency of multi-locus LD and accuracy of genomic prediction using a GBLUP type of model was investigated.
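The selection-index idea can be sketched empirically: index weights are obtained in the reference population as b = P⁻¹g, where P holds the (co)variances among SNP genotypes and g the covariances between SNP and QTL genotypes, and the accuracy is the correlation between the index and the QTL genotype in the candidates. This is a simplified illustration with hypothetical toy genotypes and helper names, not the authors' implementation.

```python
def cov(x, y):
    """Population covariance of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / n


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.

    Fails with ZeroDivisionError if a is singular (e.g. collinear SNPs).
    """
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def index_weights(snps_ref, qtl_ref):
    """Selection-index weights b = P^-1 g from reference genotypes.

    snps_ref is a list of SNP columns, each a list over animals.
    """
    p = [[cov(si, sj) for sj in snps_ref] for si in snps_ref]
    g = [cov(si, qtl_ref) for si in snps_ref]
    return solve(p, g)


def index_accuracy(weights, snps_cand, qtl_cand):
    """Correlation between the index and the QTL genotype in candidates."""
    index = [sum(w * col[i] for w, col in zip(weights, snps_cand))
             for i in range(len(qtl_cand))]
    return cov(index, qtl_cand) / (cov(index, index) * cov(qtl_cand, qtl_cand)) ** 0.5


# Toy reference population: SNP 1 is in complete LD with the QTL.
qtl_ref = [0, 1, 2, 1, 0, 2, 1, 0]
snps_ref = [qtl_ref[:], [0, 0, 1, 2, 1, 0, 2, 1]]
weights = index_weights(snps_ref, qtl_ref)

# Candidates with the same LD phase, and candidates with reversed phase.
qtl_cand = [2, 0, 1, 1, 0]
same_phase = [qtl_cand[:], [1, 1, 0, 2, 0]]
reversed_phase = [[2 - q for q in qtl_cand], [1, 1, 0, 2, 0]]
acc_within = index_accuracy(weights, same_phase, qtl_cand)
acc_across = index_accuracy(weights, reversed_phase, qtl_cand)
```

In this extreme toy case the index predicts the QTL genotype almost perfectly when the LD phase is the same as in the reference, and the correlation turns strongly negative when the phase is reversed, illustrating how inconsistent LD erodes across-population accuracy.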
The accuracy of predicting QTL genotypes, i.e. the measure describing consistency of multi-locus LD, was much lower for across population scenarios compared to within population scenarios, and was lower when QTL had a low MAF compared to QTL randomly selected from the SNPs. Consistency of multi-locus LD was highly correlated with the realized accuracy of genomic prediction across different scenarios and the correlation was higher when QTL were weighted according to their effects in the selection index instead of weighting QTL equally. By only considering neighboring SNPs of QTL, accuracy of predicting QTL genotypes within population decreased, but it substantially increased the accuracy across populations.
Consistency of multi-locus LD across populations is a characteristic of the properties of the QTL in the investigated populations and can provide more insight into the underlying reasons for a low empirical accuracy of across population genomic prediction. When genomic prediction models consider only the neighboring SNPs of QTL, multi-locus LD is more consistent across populations because only short-range LD is considered, and the accuracy of predicting QTL genotypes of individuals from another population is increased.
Wientjes YC, Veerkamp RF, Calus MP
《BMC GENETICS》