-
Factors affecting accuracy of estimated effective number of chromosome segments for numerically small breeds.
For numerically small breeds, obtaining a sufficiently large breed-specific reference population for genomic prediction is challenging or simply not possible, but this may be overcome by adding individuals from another breed. To prioritize among available breeds, the effective number of chromosome segments (Me) can be used as an indicator of relatedness between individuals from different breeds. The Me is also an important parameter in determining the accuracy of genomic prediction. The Me can be estimated both within a population and between two populations or breeds, as the reciprocal of the variance of genomic relationships. However, the number of individuals needed to accurately estimate within- or between-population Me is currently unknown. It is also unknown whether a discrepancy in the number of genotyped individuals in two breeds affects the estimates of between-population Me. In this study, we conducted a simulation that mimics current domestic cattle populations in order to investigate how estimated Me is affected by the number of genotyped individuals, single-nucleotide polymorphism (SNP) density and pedigree availability. Our results show that a small sample of 10 genotyped individuals may result in substantial over- or underestimation of Me. While estimates of within-population Me were hardly affected by SNP density, between-population Me values were highly dependent on the number of available SNPs, with higher SNP densities being able to detect more independent chromosome segments. When subtracting pedigree from genomic relationships before computing Me, estimates of within-population Me were three to four times higher than estimates with genotypes only; however, between-population Me estimates remained the same. For accurate estimation of within- and between-population Me, at least 50 individuals should be genotyped per population. Estimates of within-population Me were highly affected by whether pedigree was used or not. 
For within-population Me, even the smallest SNP density (~11k) resulted in accurate representation of family relationships in the population; however, for between-population Me, many more markers are needed to capture all independent segments.
Marjanovic J
,Calus MPL
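The abstract above describes Me as the reciprocal of the variance of genomic relationships, within a population or between two populations. The following is a minimal Python sketch of that idea, not the authors' pipeline. Several choices are assumptions made only for illustration: the genomic relationship matrix uses a VanRaden-style scaling with allele frequencies pooled over all genotyped animals, the variance is taken around the sample mean of the relationships, and the function names (`vanraden_g`, `within_me`, `between_me`) are hypothetical.

```python
from statistics import pvariance


def vanraden_g(genotypes):
    """Genomic relationship matrix from 0/1/2 allele counts.

    Assumption: VanRaden-style scaling with allele frequencies estimated
    from all rows of `genotypes` (pooled over populations). At least one
    SNP must be polymorphic, otherwise the scaling factor is zero.
    """
    n_ind, n_snp = len(genotypes), len(genotypes[0])
    freqs = [sum(row[j] for row in genotypes) / (2 * n_ind) for j in range(n_snp)]
    scale = sum(2 * p * (1 - p) for p in freqs)
    # Centre each allele count by twice the allele frequency.
    z = [[row[j] - 2 * freqs[j] for j in range(n_snp)] for row in genotypes]
    return [[sum(a * b for a, b in zip(zi, zk)) / scale for zk in z] for zi in z]


def within_me(g):
    """Within-population Me: 1 / variance of off-diagonal relationships."""
    off_diagonal = [g[i][k] for i in range(len(g)) for k in range(i)]
    return 1.0 / pvariance(off_diagonal)


def between_me(g, idx_a, idx_b):
    """Between-population Me: 1 / variance of relationships between
    animals in population A (rows idx_a) and population B (rows idx_b)."""
    cross = [g[i][k] for i in idx_a for k in idx_b]
    return 1.0 / pvariance(cross)


# Toy relationship matrix for three animals (made-up values).
toy_g = [
    [1.0, 0.1, -0.1],
    [0.1, 1.0, 0.0],
    [-0.1, 0.0, 1.0],
]
print(within_me(toy_g))
print(between_me(toy_g, idx_a=[0], idx_b=[1, 2]))
```

With only a handful of animals the variance is estimated from very few relationships, which is one way to see why the abstract reports unstable Me estimates for samples of 10 individuals. Subtracting pedigree relationships before taking the variance, as the abstract mentions, would amount to passing a matrix of genomic minus pedigree relationships to the same functions.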
《-》
-
Relatedness between numerically small Dutch Red dairy cattle populations and possibilities for multibreed genomic prediction.
Red dairy breeds are a valuable cultural and historical asset, and often a source of unique genetic diversity. However, they have difficulties competing with other, more productive, dairy breeds. Improving competitiveness of Red dairy breeds, by accelerating their genetic improvement using genomic selection, may be a promising strategy to secure their long-term future. For many Red dairy breeds, establishing a sufficiently large breed-specific reference population for genomic prediction is often not possible, but this may be overcome by adding individuals from another breed. Relatedness between breeds strongly determines the benefit of adding another breed to the reference population. To prioritize among available breeds, the effective number of chromosome segments (Me) can be used as an indicator of relatedness between individuals from different breeds. The Me is also an important parameter in determining the accuracy of genomic prediction. The Me can be estimated both within a population and between 2 populations or breeds, as the reciprocal of the variance of genomic relationships. We investigated relatedness between 6 Dutch Red cattle breeds, Groningen White Headed (GWH), Dutch Friesian (DF), Meuse-Rhine-Yssel (MRY), Dutch Belted (DB), Deep Red (DR), and Improved Red (IR), focusing primarily on the Me, to predict which of those breeds may benefit from including reference animals of the other breeds. All of these breeds, except MRY, are at high risk of extinction. Our results indicated high variability of Me, especially of between-breed Me, which ranged from ∼3,500 to ∼17,400, indicating different levels of relatedness between the breeds. Two clusters are especially important, one formed by MRY, DR, and IR, and the other comprising DF and DB. Although relatedness between breeds within each of these 2 clusters is high, across-breed genomic prediction is still limited by the current number of genotyped individuals, which for many breeds is low. 
However, adding MRY individuals would increase the reference population of DR substantially. We estimated that between 11 and 133 individuals from other breeds are needed to achieve accuracy of genomic prediction equivalent to using one additional individual from the same breed. Given the variation in size of the breeds in this study, the benefit of a multibreed reference population is expected to be lower for larger breeds than for the smaller ones.
Marjanovic J
,Hulsegge B
,Calus MPL
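The abstract above reports how many other-breed animals match one additional same-breed animal in the reference population, but it does not give the formula used. The sketch below shows one way such a number can arise. It assumes the widely used deterministic approximation r = sqrt(N h2 / (N h2 + Me)) for the accuracy of genomic prediction, equal heritability in both breeds, and a genetic correlation of 1 between breeds. These are illustrative assumptions, not the authors' stated method, and the function names and the within-breed Me value are hypothetical.

```python
from math import sqrt


def expected_accuracy(n_ref, h2, me):
    """Deterministic accuracy approximation: sqrt(N*h2 / (N*h2 + Me)).

    n_ref: number of reference animals, h2: heritability,
    me: effective number of chromosome segments.
    """
    return sqrt(n_ref * h2 / (n_ref * h2 + me))


def other_breed_equivalents(me_between, me_within):
    """Other-breed animals needed to match one same-breed animal.

    Under the approximation above, accuracy depends on N*h2/Me, so one
    same-breed animal contributes h2/me_within and one other-breed animal
    contributes h2/me_between. The ratio of the two Me values follows.
    """
    return me_between / me_within


# 3,500 is the lower end of between-breed Me reported in the abstract;
# the within-breed Me of 250 is a made-up value for illustration only.
print(other_breed_equivalents(me_between=3500, me_within=250))
print(expected_accuracy(n_ref=1000, h2=0.5, me=500))
```

Under these assumptions a larger between-breed Me (lower relatedness) directly raises the number of other-breed animals needed, which is consistent in direction with the wide range reported in the abstract.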
《-》
-
Efficiency of multi-breed genomic selection for dairy cattle breeds with different sizes of reference population.
Single-breed genomic selection (GS) based on medium single nucleotide polymorphism (SNP) density (~50,000; 50K) is now routinely implemented in several large cattle breeds. However, building large enough reference populations remains a challenge for many medium or small breeds. The high-density BovineHD BeadChip (HD chip; Illumina Inc., San Diego, CA) containing 777,609 SNP developed in 2010 is characterized by short-distance linkage disequilibrium expected to be maintained across breeds. Therefore, combining reference populations can be envisioned. A population of 1,869 influential ancestors from 3 dairy breeds (Holstein, Montbéliarde, and Normande) was genotyped with the HD chip. Using this sample, 50K genotypes were imputed within breed to high-density genotypes, leading to a large HD reference population. This population was used to develop a multi-breed genomic evaluation. The goal of this paper was to investigate the gain of multi-breed genomic evaluation for a small breed. The advantage of using a large breed (Normande in the present study) to mimic a small breed is the large potential validation population to compare alternative genomic selection approaches more reliably. In the Normande breed, 3 training sets were defined with 1,597, 404, and 198 bulls, and a unique validation set included the 394 youngest bulls. For each training set, estimated breeding values (EBV) were computed using pedigree-based BLUP, single-breed BayesC, or multi-breed BayesC for which the reference population was formed by any of the Normande training data sets and 4,989 Holstein and 1,788 Montbéliarde bulls. Phenotypes were standardized by within-breed genetic standard deviation, the proportion of polygenic variance was set to 30%, and the estimated number of SNP with a nonzero effect was about 7,000. The 2 genomic selection (GS) approaches were performed using either the 50K or HD genotypes. 
The correlations between EBV and observed daughter yield deviations (DYD) were computed for 6 traits using the different prediction approaches. Compared with pedigree-based BLUP, the average gain in accuracy with GS in small populations was 0.057 for the single-breed and 0.086 for the multi-breed approach. This gain was up to 0.193 and 0.209, respectively, with the large reference population. Improvement of EBV prediction due to the multi-breed evaluation was higher for animals not closely related to the reference population. In the case of a breed with a small reference population size, the increase in correlation due to multi-breed GS was 0.141 for bulls without their sire in the reference population compared with 0.016 for bulls with their sire in the reference population. These results demonstrate that multi-breed GS can help increase genomic evaluation accuracy in small breeds.
Hozé C
,Fritz S
,Phocas F
,Boichard D
,Ducrocq V
,Croiseau P
... -
《-》
-
Accuracy of genotype imputation in sheep breeds.
Although genomic selection offers the prospect of improving the rate of genetic gain in meat, wool and dairy sheep breeding programs, the key constraint is likely to be the cost of genotyping. Potentially, this constraint can be overcome by genotyping selection candidates for a low density (low cost) panel of SNPs with sparse genotype coverage, imputing a much higher density of SNP genotypes using a densely genotyped reference population. These imputed genotypes would then be used with a prediction equation to produce genomic estimated breeding values. In the future, it may also be desirable to impute very dense marker genotypes or even whole genome re-sequence data from moderate density SNP panels. Such a strategy could lead to an accurate prediction of genomic estimated breeding values across breeds, for example. We used genotypes from 48 640 (50K) SNPs genotyped in four sheep breeds to investigate both the accuracy of imputation of the 50K SNPs from low density SNP panels, as well as prospects for imputing very dense or whole genome re-sequence data from the 50K SNPs (by leaving out a small number of the 50K SNPs at random). Accuracy of imputation was low if the sparse panel had less than 5000 (5K) markers. Across breeds, it was clear that the accuracy of imputing from sparse marker panels to 50K was higher if the genetic diversity within a breed was lower, such that relationships among animals in that breed were higher. The accuracy of imputation from sparse genotypes to 50K genotypes was higher when the imputation was performed within breed rather than when pooling all the data, despite the fact that the pooled reference set was much larger. For Border Leicesters, Poll Dorsets and White Suffolks, 5K sparse genotypes were sufficient to impute 50K with 80% accuracy. For Merinos, the accuracy of imputing 50K from 5K was lower at 71%, despite a large number of animals with full genotypes (2215) being used as a reference. 
For all breeds, the relationship of individuals to the reference explained up to 64% of the variation in accuracy of imputation, demonstrating that accuracy of imputation can be increased if sires and other ancestors of the individuals to be imputed are included in the reference population. The accuracy of imputation could also be increased if pedigree information was available and was used in tracking inheritance of large chromosome segments within families. In our study, we only considered methods of imputation based on population-wide linkage disequilibrium (largely because the pedigree for some of the populations was incomplete). Finally, in the scenarios designed to mimic imputation of high density or whole genome re-sequence data from the 50K panel, the accuracy of imputation was much higher (86-96%). This is promising, suggesting that in silico genome re-sequencing is possible in sheep if a suitable pool of key ancestors is sequenced for each breed.
Hayes BJ
,Bowman PJ
,Daetwyler HD
,Kijas JW
,van der Werf JH
... -
《-》
-
Genomic predictions in purebreds with a multibreed genomic relationship matrix.
Combining breeds in a multibreed evaluation can have a negative impact on prediction accuracy, especially if single nucleotide polymorphism (SNP) effects differ among breeds. The aim of this study was to evaluate the use of a multibreed genomic relationship matrix (G), where SNP effects are considered to be unique to each breed, that is, nonshared. This multibreed G was created by treating SNP of different breeds as if they were on nonoverlapping positions on the chromosome, although, in reality, they were not. This simple setup may avoid spurious identity by state (IBS) relationships between breeds and automatically considers breed-specific allele frequencies. This scenario was contrasted to a regular multibreed evaluation where all SNPs were shared, that is, at the same position, and to single-breed evaluations. Different SNP densities (9k and 45k) and different effective population sizes (Ne) were tested. Five breeds mimicking recent beef cattle populations that diverged from the same historical population were simulated using different selection criteria. It was assumed that quantitative trait locus (QTL) effects were the same over all breeds. For the recent population, generations 1-9 had approximately half of the animals genotyped, whereas all animals in generation 10 were genotyped. Generation 10 animals were set for validation; therefore, each breed had a validation group. Analyses were performed using single-step genomic best linear unbiased prediction. Prediction accuracy was calculated as the correlation between true breeding values and genomic estimated breeding values (GEBV). Accuracies of GEBV were lower for the larger Ne and low SNP density. All three evaluation scenarios using 45k resulted in similar accuracies, suggesting that the marker density is high enough to account for relationships and linkage disequilibrium with QTL. A shared multibreed evaluation using 9k resulted in a decrease of accuracy of 0.08 for a smaller Ne and 0.12 for a larger Ne. 
This loss was mostly avoided when markers were treated as nonshared within the same G matrix. A G matrix with nonshared SNP enables multibreed evaluations without considerably changing accuracy, especially with limited information per breed.
Steyn Y
,Lourenco DAL
,Misztal I
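The abstract above describes a multibreed G in which SNP of different breeds are treated as if they sat at nonoverlapping positions. The sketch below is one possible reading of that setup, not the authors' implementation. It assumes that each animal has nonzero centred genotypes only in the SNP columns belonging to its own breed, that allele frequencies are estimated within breed, and that each breed block is scaled by its own sum of 2p(1-p). The scaling choice and the function name `nonshared_g` are assumptions made for illustration.

```python
def nonshared_g(genotypes, breeds):
    """Multibreed G with breed-specific (nonshared) SNP columns.

    genotypes: list of rows of 0/1/2 allele counts (same SNP panel for all).
    breeds:    list of breed labels, one per row of `genotypes`.

    Because an animal contributes nothing to the SNP columns of other
    breeds, every between-breed relationship in the result is zero.
    Each breed needs at least one polymorphic SNP, otherwise its
    scaling factor is zero.
    """
    n_ind, n_snp = len(genotypes), len(genotypes[0])
    g = [[0.0] * n_ind for _ in range(n_ind)]
    for label in sorted(set(breeds)):
        members = [i for i, b in enumerate(breeds) if b == label]
        # Breed-specific allele frequencies (assumption: estimated from
        # the genotyped animals of that breed only).
        freqs = [
            sum(genotypes[i][j] for i in members) / (2 * len(members))
            for j in range(n_snp)
        ]
        scale = sum(2 * p * (1 - p) for p in freqs)
        z = {i: [genotypes[i][j] - 2 * freqs[j] for j in range(n_snp)] for i in members}
        for i in members:
            for k in members:
                g[i][k] = sum(a * b for a, b in zip(z[i], z[k])) / scale
    return g


# Toy example: two animals in each of two hypothetical breeds, two SNPs.
toy_genotypes = [[0, 2], [2, 0], [0, 0], [2, 2]]
toy_breeds = ["A", "A", "B", "B"]
for row in nonshared_g(toy_genotypes, toy_breeds):
    print(row)
```

In this reading the nonshared G is block-diagonal across breeds, which matches the abstract's point that the setup avoids IBS relationships between breeds and uses breed-specific allele frequencies. In a single-step evaluation the breeds would still be connected through the pedigree and the shared model, which this sketch does not cover.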
《-》