Accuracy of Genomic Prediction in Synthetic Populations Depending on the Number of Parents, Relatedness, and Ancestral Linkage Disequilibrium.

Source: PubMed

Authors:

Schopp P, Müller D, Technow F, Melchinger AE

Abstract:

Synthetics play an important role in quantitative genetic research and plant breeding, but few studies have investigated the application of genomic prediction (GP) to these populations. Synthetics are generated by intermating a small number of parents ([Formula: see text]) and thereby possess unique genetic properties, which make them especially suited for systematic investigations of factors contributing to the accuracy of GP. We generated synthetics in silico from [Formula: see text]2 to 32 maize (Zea mays L.) lines taken from an ancestral population with either short- or long-range linkage disequilibrium (LD). In eight scenarios differing in relatedness of the training and prediction sets and in the types of data used to calculate the relationship matrix (QTL, SNPs, tag markers, and pedigree), we investigated the prediction accuracy (PA) of genomic best linear unbiased prediction (GBLUP) and analyzed contributions from pedigree relationships captured by SNP markers, as well as from cosegregation and ancestral LD between QTL and SNPs. The effects of training set size [Formula: see text] and marker density were also studied. Sampling few parents ([Formula: see text]) generates substantial sample LD that carries over into synthetics through cosegregation of alleles at linked loci. For fixed [Formula: see text], [Formula: see text] influences PA most strongly. If the training and prediction sets are related, using [Formula: see text] parents yields high PA regardless of ancestral LD because SNPs capture pedigree relationships and Mendelian sampling through cosegregation. As [Formula: see text] increases, ancestral LD contributes more information, while other factors contribute less due to lower frequencies of closely related individuals. For unrelated prediction sets, only ancestral LD contributes information, and accuracies were poor and highly variable for [Formula: see text] due to large sample LD. For large [Formula: see text], achieving moderate accuracy requires large [Formula: see text], long-range ancestral LD, and high marker density. Our approach for analyzing PA in synthetics provides new insights into the prospects of GP for many types of source populations encountered in plant breeding.
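
To make the GBLUP step mentioned in the abstract concrete, here is a minimal Python sketch under stated assumptions; it is not the authors' simulation. It builds a SNP-based relationship matrix with VanRaden's first method (one common choice; the abstract does not say which estimator was used), predicts genetic values for a held-out set, and reports PA as the correlation between predicted and true genetic values. The toy data use independent loci in unrelated individuals, so they contain neither cosegregation nor ancestral LD, and the QTL are a subset of the SNPs. All function names, parameter values, and the use of the plain training mean as the fixed effect are illustrative assumptions.

```python
import numpy as np

def vanraden_grm(M):
    """Genomic relationship matrix from an (n x m) matrix of 0/1/2 genotypes."""
    p = M.mean(axis=0) / 2.0              # allele frequency per SNP
    Z = M - 2.0 * p                        # center each SNP column
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))

def gblup_predict(G, y_train, train_idx, pred_idx, h2):
    """Predict values for pred_idx from training phenotypes (h2 assumed known)."""
    lam = (1.0 - h2) / h2                  # ratio of residual to genetic variance
    G_tt = G[np.ix_(train_idx, train_idx)]
    G_pt = G[np.ix_(pred_idx, train_idx)]
    mu = y_train.mean()                    # simplification: plain mean, not GLS
    alpha = np.linalg.solve(G_tt + lam * np.eye(len(train_idx)), y_train - mu)
    return mu + G_pt @ alpha

def toy_prediction_accuracy(n=300, m=200, n_qtl=20, n_train=200, h2=0.5, seed=1):
    """Simulate a toy trait and return (G, PA); all settings are hypothetical."""
    rng = np.random.default_rng(seed)
    p = rng.uniform(0.1, 0.9, size=m)
    M = rng.binomial(2, p, size=(n, m)).astype(float)   # independent loci
    qtl = rng.choice(m, size=n_qtl, replace=False)      # QTL are among the SNPs
    g = M[:, qtl] @ rng.normal(size=n_qtl)              # true genetic values
    e = rng.normal(scale=np.sqrt(g.var() * (1.0 - h2) / h2), size=n)
    y = g + e
    G = vanraden_grm(M)
    train_idx = np.arange(n_train)
    pred_idx = np.arange(n_train, n)
    g_hat = gblup_predict(G, y[train_idx], train_idx, pred_idx, h2)
    pa = np.corrcoef(g_hat, g[pred_idx])[0, 1]          # prediction accuracy
    return G, pa
```

Calling `toy_prediction_accuracy()` returns the relationship matrix and the PA for one simulated trait. Replacing `M` in `vanraden_grm` with QTL genotypes only, or swapping `G` for a pedigree-based matrix, would mimic the idea of comparing relationship matrices built from different data types, although the scenarios in the paper are more elaborate than this sketch.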

Keywords:

GBLUP, GenPred, Shared data resource, genetic relationships, genomic prediction, genomic selection, linkage disequilibrium, synthetic populations

DOI:

10.1534/genetics.116.193243

Cited by:

28

Similar articles (781)

References (48)

Citing articles (28)

Persistency of Prediction Accuracy and Genetic Gain in Synthetic Populations Under Recurrent Genomic Selection.

Recurrent selection (RS) has been used in plant breeding to successively improve synthetic and other multiparental populations. Synthetics are generated from a limited number of parents [Formula: see text] but little is known about how [Formula: see text] affects genomic selection (GS) in RS, especially the persistency of prediction accuracy ([Formula: see text]) and genetic gain. Synthetics were simulated by intermating [Formula: see text]= 2-32 parent lines from an ancestral population with short- or long-range linkage disequilibrium ([Formula: see text]) and subjected to multiple cycles of GS. We determined [Formula: see text] and genetic gain across 30 cycles for different training set sizes, marker densities, and generations of recombination before model training. Contributions to [Formula: see text] and genetic gain from pedigree relationships, as well as from cosegregation and [Formula: see text] between QTL and markers, were analyzed via four scenarios differing in (i) the relatedness between the training set and selection candidates and (ii) whether selection was based on markers or pedigree records. Persistency of [Formula: see text] was high for small [Formula: see text] where predominantly cosegregation contributed to [Formula: see text], but also for large [Formula: see text] where [Formula: see text] replaced cosegregation as the dominant information source. Together with increasing genetic variance, this compensation resulted in relatively constant long- and short-term genetic gain for increasing [Formula: see text] > 4, given long-range LD in the ancestral population. Although our scenarios suggest that information from pedigree relationships contributed to [Formula: see text] for only very few generations in GS, we expect a longer contribution than in pedigree BLUP, because capturing Mendelian sampling by markers reduces selective pressure on pedigree relationships. Larger training set size ([Formula: see text]) and higher marker density improved persistency of [Formula: see text] and hence genetic gain, but additional recombinations could not increase genetic gain.

Müller D, Schopp P, Melchinger AE. G3-Genes Genomes Genetics

Cited by: 18

Genomic Prediction Within and Across Biparental Families: Means and Variances of Prediction Accuracy and Usefulness of Deterministic Equations.

A major application of genomic prediction (GP) in plant breeding is the identification of superior inbred lines within families derived from biparental crosses. When models for various traits were trained within related or unrelated biparental families (BPFs), experimental studies found substantial variation in prediction accuracy (PA), but little is known about the underlying factors. We used SNP marker genotypes of inbred lines from either elite germplasm or landraces of maize (Zea mays L.) as parents to generate 300 BPFs of doubled-haploid lines. We analyzed PA within each BPF for 50 simulated polygenic traits, using genomic best linear unbiased prediction (GBLUP) models trained with individuals from either full-sib (FSF), half-sib (HSF), or unrelated families (URF) for various sizes ([Formula: see text]) of the training set and different heritabilities ([Formula: see text]). In addition, we modified two deterministic equations for forecasting PA to account for inbreeding and genetic variance unexplained by the training set. Averaged across traits, PA was high within FSF (0.41-0.97) with large variation only for [Formula: see text] and [Formula: see text] [Formula: see text]. For HSF and URF, PA was on average ∼40-60% lower and varied substantially among different combinations of BPFs used for model training and prediction as well as different traits. As exemplified by HSF results, PA of across-family GP can be very low if causal variants not segregating in the training set account for a sizeable proportion of the genetic variance among predicted individuals. Deterministic equations accurately forecast the PA expected over many traits, yet cannot capture trait-specific deviations. We conclude that model training within BPFs generally yields stable PA, whereas a high level of uncertainty is encountered in across-family GP. Our study shows the minimum extent of variation in PA that must be reckoned with in practice and offers a starting point for the design of training sets composed of multiple BPFs.

Schopp P, Müller D, Wientjes YCJ, Melchinger AE ... G3-Genes Genomes Genetics

Cited by: 20

Contributions of linkage disequilibrium and co-segregation information to the accuracy of genomic prediction.

Traditional genomic prediction models using multiple regression on single nucleotide polymorphism (SNP) genotypes exploit associations between genotypes of quantitative trait loci (QTL) and SNPs, which can be created by historical linkage disequilibrium (LD), recent co-segregation (CS) and pedigree relationships. Results from field data analyses show that prediction accuracy is usually much higher for individuals that are close relatives of the training population than for distantly related individuals. A possible reason is that historical LD between QTL and SNPs is weak and, for close relatives, prediction accuracy of SNP models is mainly contributed by pedigree relationships and CS. Information from pedigree relationships decreases fast over generations and only contributes to within-family prediction. Information from CS is affected by family structures and effective population size, and can have a substantial contribution to prediction accuracy when modeled explicitly. In this study, a method to explicitly model CS was developed by following the transmission of putative QTL alleles using allele origins at SNPs. Bayesian hierarchical models that combine information from LD and CS (LD-CS model) were developed for genomic prediction in pedigree populations. Contributions of LD and CS information to prediction accuracy across families and generations without retraining were investigated in simulated half-sib datasets and deep pedigrees with different recent effective population sizes, respectively. Results from half-sib datasets showed that when historical LD between QTL and SNPs was low, accuracy of the LD model decreased when the training data size was increased by adding independent sire families, but accuracies from the CS and LD-CS models increased and plateaued rapidly. Results from deep pedigree datasets showed that the LD model had high accuracy across generations only when historical LD between QTL and SNPs was high. Modeling CS explicitly resulted in higher accuracy than the LD model across generations when the mating design generated many close relatives. Our results suggest that modeling CS explicitly improves accuracy of genomic prediction when historical LD between QTL and SNPs is low. Modeling both LD and CS explicitly is expected to improve accuracy when recent effective population size is small, or when the training data include many independent families.

Sun X, Fernando R, Dekkers J

Cited by: 10

Using selection index theory to estimate consistency of multi-locus linkage disequilibrium across populations.

The potential of combining multiple populations in genomic prediction depends on the consistency of linkage disequilibrium (LD) between SNPs and QTL across populations. We investigated consistency of multi-locus LD across populations using selection index theory and investigated the relationship between consistency of multi-locus LD and accuracy of genomic prediction across different simulated scenarios. In the selection index, QTL genotypes were considered as breeding goal traits and SNP genotypes as index traits, based on LD among SNPs and between SNPs and QTL. The consistency of multi-locus LD across populations was computed as the accuracy of predicting QTL genotypes in selection candidates using a selection index derived in the reference population. Different scenarios of within and across population genomic prediction were evaluated, using all SNPs or only the four neighboring SNPs of a simulated QTL. Phenotypes were simulated using different numbers of QTL underlying the trait. The relationship between the calculated consistency of multi-locus LD and accuracy of genomic prediction using a GBLUP type of model was investigated. The accuracy of predicting QTL genotypes, i.e. the measure describing consistency of multi-locus LD, was much lower for across population scenarios compared to within population scenarios, and was lower when QTL had a low MAF compared to QTL randomly selected from the SNPs. Consistency of multi-locus LD was highly correlated with the realized accuracy of genomic prediction across different scenarios and the correlation was higher when QTL were weighted according to their effects in the selection index instead of weighting QTL equally. By only considering neighboring SNPs of QTL, accuracy of predicting QTL genotypes within population decreased, but it substantially increased the accuracy across populations. Consistency of multi-locus LD across populations is a characteristic of the properties of the QTL in the investigated populations and can provide more insight into the underlying reasons for a low empirical accuracy of across population genomic prediction. When genomic prediction models focus only on neighboring SNPs of QTL, multi-locus LD is more consistent across populations since only short-range LD is considered, and accuracy of predicting QTL genotypes of individuals from another population is increased.

Wientjes YC, Veerkamp RF, Calus MP. BMC Genetics

Cited by: 11
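
The selection-index measure described in the Wientjes et al. abstract above can be sketched in a few lines of Python. This is an illustrative reading of that abstract for a single QTL, not the authors' code: index weights are derived in the reference population as b = P⁻¹g, where P is the covariance matrix among SNP genotypes and g the covariance between SNP and QTL genotypes, and consistency is taken as the correlation between the index and the true QTL genotype in the candidates. The function names and the single-QTL restriction are assumptions; the paper also considers multiple QTL and effect-based weighting.

```python
import numpy as np

def selection_index_weights(X_ref, q_ref):
    """Index weights b = P^{-1} g estimated in the reference population.

    X_ref: (n x m) SNP genotypes (index traits)
    q_ref: (n,) genotypes at one QTL (breeding goal trait)
    """
    Xc = X_ref - X_ref.mean(axis=0)
    qc = q_ref - q_ref.mean()
    P = Xc.T @ Xc / (len(q_ref) - 1)      # covariance (LD) among SNPs
    g = Xc.T @ qc / (len(q_ref) - 1)      # covariance between SNPs and QTL
    return np.linalg.solve(P, g)          # assumes P is non-singular

def ld_consistency(X_ref, q_ref, X_cand, q_cand):
    """Accuracy of predicting candidate QTL genotypes with the reference index."""
    b = selection_index_weights(X_ref, q_ref)
    return np.corrcoef(X_cand @ b, q_cand)[0, 1]
```

As a sanity check on the sketch, if the QTL is itself one of the SNPs, the weights reduce to a unit vector on that SNP and `ld_consistency` returns a value of essentially 1 in any candidate population. In practice P can be singular when SNPs are in complete LD, in which case a pseudo-inverse or a small ridge term would be needed.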
