-
Development of an artificial intelligence-based assessment model for prediction of embryo viability using static images captured by optical light microscopy during IVF.
Can an artificial intelligence (AI)-based model predict human embryo viability using images captured by optical light microscopy?
We have combined computer vision image processing methods and deep learning techniques to create the non-invasive Life Whisperer AI model for robust prediction of embryo viability, as measured by clinical pregnancy outcome, using single static images of Day 5 blastocysts obtained from standard optical light microscope systems.
Embryo selection following IVF is a critical factor in determining the success of ensuing pregnancy. Traditional morphokinetic grading by trained embryologists can be subjective and variable, and other complementary techniques, such as time-lapse imaging, require costly equipment and have not reliably demonstrated predictive ability for the endpoint of clinical pregnancy. AI methods are being investigated as a promising means for improving embryo selection and predicting implantation and pregnancy outcomes.
These studies involved analysis of retrospectively collected data including standard optical light microscope images and clinical outcomes of 8886 embryos from 11 different IVF clinics, across three different countries, between 2011 and 2018.
The AI-based model was trained using static two-dimensional optical light microscope images with known clinical pregnancy outcome as measured by fetal heartbeat to provide a confidence score for prediction of pregnancy. Predictive accuracy was determined by evaluating sensitivity, specificity and overall weighted accuracy, and was visualized using histograms of the distributions of predictions. Comparison to embryologists' predictive accuracy was performed using a binary classification approach and a 5-band ranking comparison.
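The accuracy measures named above can be illustrated with a minimal sketch. It assumes that a confidence score is turned into a viable/non-viable call at a fixed threshold, and that "overall weighted accuracy" means accuracy over all embryos (sensitivity and specificity weighted by class prevalence); neither assumption is confirmed by the abstract, and the function names and the 0.5 threshold are hypothetical.

```python
def scores_to_labels(scores, threshold=0.5):
    """Convert confidence scores to binary calls (1 = viable). Threshold is an assumption."""
    return [1 if s >= threshold else 0 for s in scores]


def binary_metrics(y_true, y_pred):
    """Return (sensitivity, specificity, overall accuracy) for binary labels, 1 = viable."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # viable, called viable
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-viable, called non-viable
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # Accuracy over all embryos equals the prevalence-weighted mean
    # of sensitivity and specificity.
    accuracy = (tp + tn) / len(pairs)
    return sensitivity, specificity, accuracy
```

With toy data, `binary_metrics([1, 1, 0, 0], scores_to_labels([0.9, 0.2, 0.1, 0.7]))` counts one correct call in each class.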
The Life Whisperer AI model showed a sensitivity of 70.1% for viable embryos while maintaining a specificity of 60.5% for non-viable embryos across three independent blind test sets from different clinics. The weighted overall accuracy in each blind test set was >63%, with a combined accuracy of 64.3% across both viable and non-viable embryos, demonstrating model robustness and generalizability beyond the result expected from chance. Distributions of predictions showed clear separation of correctly and incorrectly classified embryos. Binary comparison of viable/non-viable embryo classification demonstrated an improvement of 24.7% over embryologists' accuracy (P = 0.047, n = 2, Student's t test), and 5-band ranking comparison demonstrated an improvement of 42.0% over embryologists (P = 0.028, n = 2, Student's t test).
The AI model developed here is limited to analysis of Day 5 embryos; therefore, further evaluation or modification of the model is needed to incorporate information from different time points. The endpoint described is clinical pregnancy as measured by fetal heartbeat, and this does not indicate the probability of live birth. The current investigation was performed with retrospectively collected data, and hence it will be of importance to collect data prospectively to assess real-world use of the AI model.
These studies demonstrated an improved predictive ability for evaluation of embryo viability when compared with embryologists' traditional morphokinetic grading methods. The superior accuracy of the Life Whisperer AI model could lead to improved pregnancy success rates in IVF when used in a clinical setting. It could also potentially assist in standardization of embryo selection methods across multiple clinical environments, while eliminating the need for complex time-lapse imaging equipment. Finally, the cloud-based software application used to apply the Life Whisperer AI model in clinical practice makes it broadly applicable and globally scalable to IVF clinics worldwide.
Life Whisperer Diagnostics, Pty Ltd is a wholly owned subsidiary of the parent company, Presagen Pty Ltd. Funding for the study was provided by Presagen with grant funding received from the South Australian Government: Research, Commercialisation and Startup Fund (RCSF). 'In kind' support and embryology expertise to guide algorithm development were provided by Ovation Fertility. J.M.M.H., D.P. and M.P. are co-owners of Life Whisperer and Presagen. Presagen has filed a provisional patent for the technology described in this manuscript (52985P pending). A.P.M. owns stock in Life Whisperer, and S.M.D., A.J., T.N. and A.P.M. are employees of Life Whisperer.
VerMilyea M, Hall JMM, Diakiw SM, Johnston A, Nguyen T, Perugini D, Miller A, Picou A, Murphy AP, Perugini M
... -
《-》
-
Development of an artificial intelligence model for predicting the likelihood of human embryo euploidy based on blastocyst images from multiple imaging systems during IVF.
Can an artificial intelligence (AI) model predict human embryo ploidy status using static images captured by optical light microscopy?
Results demonstrated predictive accuracy for embryo euploidy and showed a significant correlation between AI score and euploidy rate, based on assessment of images of blastocysts at Day 5 after IVF.
Euploid embryos displaying the normal human chromosomal complement of 46 chromosomes are preferentially selected for transfer over aneuploid embryos (abnormal complement), as they are associated with improved clinical outcomes. Currently, evaluation of embryo genetic status is most commonly performed by preimplantation genetic testing for aneuploidy (PGT-A), which involves embryo biopsy and genetic testing. The potential for embryo damage during biopsy, and the non-uniform nature of aneuploid cells in mosaic embryos, has prompted investigation of additional, non-invasive, whole embryo methods for evaluation of embryo genetic status.
A total of 15 192 blastocyst-stage embryo images with associated clinical outcomes were provided by 10 different IVF clinics in the USA, India, Spain and Malaysia. The majority of data were retrospective, with two additional prospectively collected blind datasets provided by IVF clinics using the genetics AI model in clinical practice. Of these images, a total of 5050 images of embryos on Day 5 of in vitro culture were used for the development of the AI model. These Day 5 images were provided for 2438 consecutively treated women who had undergone IVF procedures in the USA between 2011 and 2020. The remaining images were used for evaluation of performance in different settings, or otherwise excluded for not matching the inclusion criteria.
The genetics AI model was trained using static 2-dimensional optical light microscope images of Day 5 blastocysts with linked genetic metadata obtained from PGT-A. The endpoint was ploidy status (euploid or aneuploid) based on PGT-A results. Predictive accuracy was determined by evaluating sensitivity (correct prediction of euploid), specificity (correct prediction of aneuploid) and overall accuracy. The Matthews correlation coefficient, receiver-operating characteristic curves and precision-recall curves (including AUC values) were also determined. Performance was also evaluated using correlation analyses and simulated cohort studies to evaluate ranking ability for euploid enrichment.
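For reference, the Matthews correlation coefficient can be computed directly from confusion-matrix counts. The sketch below uses the standard textbook formula with euploid as the positive class; the function name is hypothetical and the code is not taken from the study.

```python
import math


def matthews_corrcoef_from_counts(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    tp/fn: euploid embryos predicted euploid/aneuploid.
    tn/fp: aneuploid embryos predicted aneuploid/euploid.
    Returns 0.0 when any marginal total is zero (a common convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom
```

The coefficient ranges from -1 (total disagreement) through 0 (no better than chance) to +1 (perfect prediction), and unlike raw accuracy it stays informative when euploid and aneuploid embryos are unevenly represented.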
Overall accuracy for the prediction of euploidy on a blind test dataset was 65.3%, with a sensitivity of 74.6%. When the blind test dataset was cleansed of poor quality and mislabeled images, overall accuracy increased to 77.4%. This performance may be relevant to clinical situations where confounding factors, such as variability in PGT-A testing, have been accounted for. There was a significant positive correlation between AI score and the proportion of euploid embryos, with very high scoring embryos (9.0-10.0) twice as likely to be euploid as the lowest-scoring embryos (0.0-2.4). When using the genetics AI model to rank embryos in a cohort, the probability of the top-ranked embryo being euploid was 82.4%, which was 26.4% more effective than using random ranking, and ∼13-19% more effective than using the Gardner score. The probability increased to 97.0% when considering the likelihood of one of the top two ranked embryos being euploid, and the probability of both top two ranked embryos being euploid was 66.4%. Additional analyses showed that the AI model generalized well to different patient demographics and could also be used for the evaluation of Day 6 embryos and for images taken using multiple time-lapse systems. Results suggested that the AI model could potentially be used to differentiate mosaic embryos based on the level of mosaicism.
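The three cohort-ranking probabilities reported above (top-ranked embryo euploid, at least one of the top two euploid, both of the top two euploid) can be illustrated with a minimal sketch. The abstract does not describe how the simulated cohorts were constructed, so the data layout (a list of cohorts, each a list of `(ai_score, is_euploid)` pairs) and the function name are assumptions.

```python
def top_rank_euploid_rates(cohorts):
    """Rank each cohort by AI score (highest first) and return three fractions:
    (top-ranked is euploid, at least one of top two is euploid, both top two are euploid).

    cohorts: non-empty list of non-empty lists of (ai_score, is_euploid) pairs.
    A cohort with a single embryo can never count towards "both top two".
    """
    top1 = any_top2 = both_top2 = 0
    for cohort in cohorts:
        ranked = sorted(cohort, key=lambda embryo: embryo[0], reverse=True)
        top_two = [bool(is_euploid) for _, is_euploid in ranked[:2]]
        top1 += top_two[0]
        any_top2 += any(top_two)
        both_top2 += len(top_two) == 2 and all(top_two)
    n = len(cohorts)
    return top1 / n, any_top2 / n, both_top2 / n
```

Under random ranking the first of these fractions would be expected to approach the euploid prevalence in the cohorts, which is the natural baseline for a comparison of this kind.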
While the current investigation was performed using both retrospectively and prospectively collected data, it will be important to continue to evaluate real-world use of the genetics AI model. The endpoint described was euploidy based on the clinical outcome of PGT-A results only, so predictive accuracy for genetic status in utero or at birth was not evaluated. Rebiopsy studies of embryos using a range of PGT-A methods indicated a degree of variability in PGT-A results, which must be considered when interpreting the performance of the AI model.
These findings collectively support the use of this genetics AI model for the evaluation of embryo ploidy status in a clinical setting. Results can be used to aid in prioritizing and enriching for embryos that are likely to be euploid for multiple clinical purposes, including selection for transfer in the absence of alternative genetic testing methods, selection for cryopreservation for future use or selection for further confirmatory PGT-A testing, as required.
Life Whisperer Diagnostics is a wholly owned subsidiary of the parent company, Presagen Holdings Pty Ltd. Funding for the study was provided by Presagen with grant funding received from the South Australian Government: Research, Commercialisation, and Startup Fund (RCSF). 'In kind' support and embryology expertise to guide algorithm development were provided by Ovation Fertility. 'In kind' support in terms of computational resources was provided through the Amazon Web Services (AWS) Activate Program. J.M.M.H., D.P. and M.P. are co-owners of Life Whisperer and Presagen. S.M.D., M.A.D. and T.V.N. are employees or former employees of Life Whisperer. S.M.D., J.M.M.H., M.A.D., T.V.N., D.P. and M.P. are listed as inventors of patents relating to this work, and also have stock options in the parent company Presagen. M.V. sits on the advisory board for the global distributor of the technology described in this study and also received support for attending meetings.
N/A.
Diakiw SM, Hall JMM, VerMilyea MD, Amin J, Aizpurua J, Giardini L, Briones YG, Lim AYX, Dakka MA, Nguyen TV, Perugini D, Perugini M
... -
《-》
-
Embryologist agreement when assessing blastocyst implantation probability: is data-driven prediction the solution to embryo assessment subjectivity?
What is the accuracy and agreement of embryologists when assessing the implantation probability of blastocysts using time-lapse imaging (TLI), and can it be improved with a data-driven algorithm?
The overall interobserver agreement of a large panel of embryologists was moderate and prediction accuracy was modest, while the purpose-built artificial intelligence model generally resulted in higher performance metrics.
Previous studies have demonstrated significant interobserver variability amongst embryologists when assessing embryo quality. However, data concerning embryologists' ability to predict implantation probability using TLI is still lacking. Emerging technologies based on data-driven tools have shown great promise for improving embryo selection and predicting clinical outcomes.
TLI video files of 136 embryos with known implantation data were retrospectively collected from two clinical sites between 2018 and 2019 for the performance assessment of 36 embryologists and comparison with a deep neural network (DNN).
We recruited 39 embryologists from 13 different countries. All participants were blinded to clinical outcomes. A total of 136 TLI videos of embryos that reached the blastocyst stage were used for this experiment. Each embryo's likelihood of successfully implanting was assessed by 36 embryologists, providing implantation probability grades (IPGs) from 1 to 5, where 1 indicates a very low likelihood of implantation and 5 indicates a very high likelihood. Subsequently, three embryologists with over 5 years of experience provided Gardner scores. All 136 blastocysts were categorized into three quality groups based on their Gardner scores. Embryologist predictions were then converted into predictions of implantation (IPG ≥ 3) and no implantation (IPG ≤ 2). Embryologists' performance and agreement were assessed using Fleiss kappa coefficient. A 10-fold cross-validation DNN was developed to provide IPGs for TLI video files. The model's performance was compared to that of the embryologists.
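The binarization rule and agreement statistic described above can be sketched as follows. This assumes Fleiss' kappa is computed on the binarized predictions (implantation vs no implantation) with the same number of raters per embryo; the abstract does not state which representation was used, and the function names are hypothetical.

```python
def binarize_ipg(grade):
    """Map an implantation probability grade (1-5) to 1 (implantation, IPG >= 3) or 0."""
    return 1 if grade >= 3 else 0


def counts_from_grades(grades):
    """grades: one list of IPGs per embryo (one grade per rater).
    Returns per-embryo counts [n_no_implantation, n_implantation]."""
    counts = []
    for row in grades:
        positive = sum(binarize_ipg(g) for g in row)
        counts.append([len(row) - positive, positive])
    return counts


def fleiss_kappa(counts):
    """Fleiss' kappa for a table of per-subject category counts.
    Every row must sum to the same number of raters (at least 2)."""
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    n_categories = len(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every subject needs the same number of ratings")
    # Proportion of all ratings that fall into each category.
    p_cat = [sum(row[j] for row in counts) / (n_subjects * n_raters)
             for j in range(n_categories)]
    # Observed agreement per subject, then averaged.
    p_subj = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
              for row in counts]
    p_bar = sum(p_subj) / n_subjects
    p_exp = sum(p * p for p in p_cat)  # agreement expected by chance
    if p_exp == 1:
        raise ValueError("kappa is undefined when all ratings share one category")
    return (p_bar - p_exp) / (1 - p_exp)
```

For example, `fleiss_kappa(counts_from_grades(grades))` takes a table with one row of grades per embryo and one column per embryologist.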
Logistic regression was employed for the following confounding variables: country of residence, academic level, embryo scoring system, log years of experience and experience using TLI. None were found to have a statistically significant impact on embryologist performance at α = 0.05. The average implantation prediction accuracy for the embryologists was 51.9% for all embryos (N = 136). The average accuracy of the embryologists when assessing top quality and poor quality embryos (according to the Gardner score categorizations) was 57.5% and 57.4%, respectively, and 44.6% for fair quality embryos. Overall interobserver agreement was moderate (κ = 0.56, N = 136). The best agreement was achieved in the poor + top quality group (κ = 0.65, N = 77), while the agreement in the fair quality group was lower (κ = 0.25, N = 59). The DNN showed an overall accuracy rate of 62.5%, with accuracies of 62.2%, 61% and 65.6% for the poor, fair and top quality groups, respectively. The AUC for the DNN was higher than that of the embryologists overall (0.70 DNN vs 0.61 embryologists) as well as in all of the Gardner groups (DNN vs embryologists-Poor: 0.69 vs 0.62; Fair: 0.67 vs 0.53; Top: 0.77 vs 0.54).
Blastocyst assessment was performed using video files acquired from time-lapse incubators, where each video contained data from a single focal plane. Clinical data regarding the underlying cause of infertility and endometrial thickness before the transfer were not available, yet these factors may explain implantation failure and lower accuracy of IPGs. Implantation was defined as the presence of a gestational sac, whereas the detection of fetal heartbeat is a more robust marker of embryo viability. The raw data were anonymized to the extent that it was not possible to quantify the number of unique patients and cycles included in the study, potentially masking the effect of bias from a limited patient pool. Furthermore, the lack of demographic data makes it difficult to draw conclusions on how representative the dataset was of the wider population. Finally, embryologists were required to assess the implantation potential, not embryo quality. Although this is not the traditional approach to embryo evaluation, morphology/morphokinetics as a means of assessing embryo quality is believed to be strongly correlated with viability and, for some methods, implantation potential.
Embryo selection is a key element in IVF success and continues to be a challenge. Improving the predictive ability could assist in optimizing implantation success rates and other clinical outcomes and could minimize the financial and emotional burden on the patient. This study demonstrates moderate agreement rates between embryologists, likely due to the subjective nature of embryo assessment. In particular, we found that average embryologist accuracy and agreement were significantly lower for fair quality embryos when compared with that for top and poor quality embryos. Using data-driven algorithms as an assistive tool may help IVF professionals increase success rates and promote much needed standardization in the IVF clinic. Our results indicate a need for further research regarding technological advancement in this field.
Embryonics Ltd is an Israel-based company. Funding for the study was partially provided by the Israeli Innovation Authority, grant #74556.
N/A.
Fordham DE, Rosentraub D, Polsky AL, Aviram T, Wolf Y, Perl O, Devir A, Rosentraub S, Silver DH, Gold Zamir Y, Bronstein AM, Lara Lara M, Ben Nagi J, Alvarez A, Munné S
... -
《-》
-
A hybrid artificial intelligence model leverages multi-centric clinical data to improve fetal heart rate pregnancy prediction across time-lapse systems.
Can artificial intelligence (AI) algorithms developed to assist embryologists in evaluating embryo morphokinetics be enriched with multi-centric clinical data to better predict clinical pregnancy outcome?
Training algorithms on multi-centric clinical data significantly increased AUC compared to algorithms that only analyzed the time-lapse system (TLS) videos.
Several AI-based algorithms have been developed to predict pregnancy, most of them based only on analysis of the time-lapse recording of embryo development. It remains unclear, however, whether considering numerous clinical features can improve the predictive performances of time-lapse based embryo evaluation.
A dataset of 9986 embryos (95.60% known clinical pregnancy outcome, 32.47% frozen transfers) from 5226 patients from 14 European fertility centers (in two countries) recorded with three different TLS was used to train and validate the algorithms. A total of 31 clinical factors were collected. A separate test set (447 videos) was used to compare performances between embryologists and the algorithm.
Clinical pregnancy (defined as a pregnancy leading to a fetal heartbeat) outcome was first predicted using a 3D convolutional neural network that analyzed videos of the embryonic development up to 2 or 3 days of development (33% of the database) or up to 5 or 6 days of development (67% of the database). The output video score was then fed as input alongside clinical features to a gradient boosting algorithm that generated a second score corresponding to the hybrid model. AUC was computed across the 7 folds of the validation dataset for both models. These predictions were compared to those of 13 senior embryologists made on the test dataset.
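The per-fold AUC comparison described above can be illustrated with a minimal, dependency-free sketch. It computes AUC as the probability that a randomly chosen pregnancy-positive embryo scores higher than a randomly chosen negative one (ties count as one half), then averages over folds. The function names and the fold layout (a list of `(labels, scores)` pairs) are assumptions, not the study's code.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positive and negative scores.
    labels: 1 for clinical pregnancy, 0 otherwise. Quadratic in the number of samples."""
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


def mean_auc_over_folds(folds):
    """folds: list of (labels, scores) pairs, one per validation fold."""
    aucs = [roc_auc(labels, scores) for labels, scores in folds]
    return sum(aucs) / len(aucs)
```

Calling `mean_auc_over_folds` once with the video scores and once with the hybrid scores on the same folds would give the two averages being compared; the per-fold values are what a paired test such as the Wilcoxon test would operate on.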
The average AUC of the hybrid model across all 7 folds was significantly higher than that of the video model (0.727 versus 0.684, respectively, P = 0.015; Wilcoxon test). A SHapley Additive exPlanations (SHAP) analysis of the hybrid model showed that the six most important features for predicting pregnancy were morphokinetics of the embryo (video score), oocyte age, total gonadotrophin dose intake, number of embryos generated, number of oocytes retrieved, and endometrium thickness. The hybrid model was shown to be superior to embryologists with respect to different metrics, including the balanced accuracy (P ≤ 0.003; Wilcoxon test). The likelihood of pregnancy was linearly linked to the hybrid score, with increasing odds ratio (maximum P-value = 0.001), demonstrating the ranking capacity of the model. Training individual hybrid models did not improve predictive performance. A clinic hold-out experiment was conducted and resulted in AUCs ranging between 0.63 and 0.73. Performance of the hybrid model did not vary between TLS or between subgroups of embryos transferred at different days of embryonic development. The hybrid model did fare better for patients older than 35 years (P < 0.001; Mann-Whitney test), and for fresh transfers (P < 0.001; Mann-Whitney test).
Participant centers were located in two countries, thus limiting the generalization of our conclusion to wider subpopulations of patients. Not all clinical features were available for all embryos, thus limiting the performances of the hybrid model in some instances.
Our study suggests that considering clinical data improves pregnancy predictive performance and that there is no need to retrain algorithms at the clinic level unless the clinics follow strikingly different practices. This study characterizes a versatile AI algorithm with similar performance on different time-lapse microscopes and on embryos transferred at different development stages. It can also be applied to patients of different ages and treatment protocols, although with varying performance, presumably because the task of predicting fetal heartbeat becomes more or less hard depending on the clinical context. This AI model can be made widely available and can help embryologists in a wide range of clinical scenarios to standardize their practices.
Funding for the study was provided by ImVitro with grant funding received in part from BPIFrance (Bourse French Tech Emergence (DOS0106572/00), Paris Innovation Amorçage (DOS0132841/00), and Aide au Développement DeepTech (DOS0152872/00)). A.B.-C. is a co-owner of, and holds stocks in, ImVitro SAS. A.B.-C. and F.D.M. hold a patent for 'Devices and processes for machine learning prediction of in vitro fertilization' (EP20305914.2). A.D., N.D., M.M.F., and F.D.M. are or have been employees of ImVitro and have been granted stock options. X.P.-V. has been paid as a consultant to ImVitro and has been granted stock options of ImVitro. L.C.-D. and C.G.-S. have undertaken paid consultancy for ImVitro SAS. The remaining authors have no conflicts to declare.
N/A.
Duval A, Nogueira D, Dissler N, Maskani Filali M, Delestro Matos F, Chansel-Debordeaux L, Ferrer-Buitrago M, Ferrer E, Antequera V, Ruiz-Jorro M, Papaxanthos A, Ouchchane H, Keppi B, Prima PY, Regnier-Vigouroux G, Trebesses L, Geoffroy-Siraudin C, Zaragoza S, Scalici E, Sanguinet P, Cassagnard N, Ozanon C, De La Fuente A, Gómez E, Gervoise Boyer M, Boyer P, Ricciarelli E, Pollet-Villard X, Boussommier-Calleja A
... -
《-》
-
Embryo selection through artificial intelligence versus embryologists: a systematic review.
Salih M, Austin C, Warty RR, Tiktin C, Rolnik DL, Momeni M, Rezatofighi H, Reddy S, Smith V, Vollenhoven B, Horta F
... -
《-》