Embryologist agreement when assessing blastocyst implantation probability: is data-driven prediction the solution to embryo assessment subjectivity?
What is the accuracy and agreement of embryologists when assessing the implantation probability of blastocysts using time-lapse imaging (TLI), and can it be improved with a data-driven algorithm?
The overall interobserver agreement of a large panel of embryologists was moderate and prediction accuracy was modest, while the purpose-built artificial intelligence model generally resulted in higher performance metrics.
Previous studies have demonstrated significant interobserver variability amongst embryologists when assessing embryo quality. However, data concerning embryologists' ability to predict implantation probability using TLI is still lacking. Emerging technologies based on data-driven tools have shown great promise for improving embryo selection and predicting clinical outcomes.
TLI video files of 136 embryos with known implantation data were retrospectively collected from two clinical sites between 2018 and 2019 for the performance assessment of 36 embryologists and comparison with a deep neural network (DNN).
We recruited 39 embryologists from 13 different countries. All participants were blinded to clinical outcomes. A total of 136 TLI videos of embryos that reached the blastocyst stage were used for this experiment. Each embryo's likelihood of successfully implanting was assessed by 36 embryologists, providing implantation probability grades (IPGs) from 1 to 5, where 1 indicates a very low likelihood of implantation and 5 indicates a very high likelihood. Subsequently, three embryologists with over 5 years of experience provided Gardner scores. All 136 blastocysts were categorized into three quality groups based on their Gardner scores. Embryologist predictions were then converted into predictions of implantation (IPG ≥ 3) and no implantation (IPG ≤ 2). Embryologists' performance and agreement were assessed using Fleiss kappa coefficient. A 10-fold cross-validation DNN was developed to provide IPGs for TLI video files. The model's performance was compared to that of the embryologists.
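The binarization rule and the Fleiss kappa calculation described above can be illustrated with a minimal sketch. The function names and the toy grades below are hypothetical and are not taken from the study; the abstract does not say which software was used, so this is only one plain-Python way to compute the statistic, assuming every embryo is graded by the same number of embryologists.

```python
from collections import Counter


def binarize_ipg(grade):
    """Map an implantation probability grade (1-5) to 1 (IPG >= 3) or 0 (IPG <= 2)."""
    if not 1 <= grade <= 5:
        raise ValueError("IPG must be between 1 and 5")
    return 1 if grade >= 3 else 0


def fleiss_kappa(ratings):
    """Fleiss kappa for a list of subjects, each a list of category labels.

    Assumes every subject was rated by the same number of raters (at least 2).
    """
    n_subjects = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(row) != n_raters for row in ratings):
        raise ValueError("each subject needs the same number (>= 2) of ratings")

    counts = [Counter(row) for row in ratings]
    categories = sorted({label for row in ratings for label in row})

    # Observed agreement: mean over subjects of the share of agreeing rater pairs.
    p_i = [
        (sum(c[k] ** 2 for k in categories) - n_raters) / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_bar = sum(p_i) / n_subjects

    # Chance agreement: sum of squared overall category proportions.
    p_j = [sum(c[k] for c in counts) / (n_subjects * n_raters) for k in categories]
    p_e = sum(p ** 2 for p in p_j)

    if p_e == 1.0:
        # Every rating falls in one category; kappa is undefined here.
        return float("nan")
    return (p_bar - p_e) / (1 - p_e)


# Hypothetical toy data: 3 embryos, each graded by 4 embryologists.
toy_grades = [[5, 4, 4, 3], [1, 2, 1, 2], [3, 2, 2, 1]]
toy_binary = [[binarize_ipg(g) for g in row] for row in toy_grades]
toy_kappa = fleiss_kappa(toy_binary)
```

The toy data exist only to exercise the functions; they say nothing about the agreement values reported in the study.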
Logistic regression was employed for the following confounding variables: country of residence, academic level, embryo scoring system, log years of experience and experience using TLI. None were found to have a statistically significant impact on embryologist performance at α = 0.05. The average implantation prediction accuracy for the embryologists was 51.9% for all embryos (N = 136). The average accuracy of the embryologists when assessing top quality and poor quality embryos (according to the Gardner score categorizations) was 57.5% and 57.4%, respectively, and 44.6% for fair quality embryos. Overall interobserver agreement was moderate (κ = 0.56, N = 136). The best agreement was achieved in the poor + top quality group (κ = 0.65, N = 77), while the agreement in the fair quality group was lower (κ = 0.25, N = 59). The DNN showed an overall accuracy rate of 62.5%, with accuracies of 62.2%, 61% and 65.6% for the poor, fair and top quality groups, respectively. The AUC for the DNN was higher than that of the embryologists overall (0.70 DNN vs 0.61 embryologists) as well as in all of the Gardner groups (DNN vs embryologists-Poor: 0.69 vs 0.62; Fair: 0.67 vs 0.53; Top: 0.77 vs 0.54).
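The abstract reports AUC values but does not state how they were computed. As an illustration only, the sketch below uses the rank-based (Mann-Whitney) definition of AUC, which applies to ordinal scores such as IPGs as well as to continuous model outputs: it is the probability that a randomly chosen implanted embryo receives a higher score than a randomly chosen non-implanted one, with ties counted as one half. The function name and the example values are hypothetical.

```python
def auc_from_scores(scores, outcomes):
    """Rank-based AUC: P(score of a positive > score of a negative), ties count 0.5.

    scores   -- numeric scores (for example mean IPG per embryo or a model output)
    outcomes -- 1 for implantation, 0 for no implantation
    """
    positives = [s for s, y in zip(scores, outcomes) if y == 1]
    negatives = [s for s, y in zip(scores, outcomes) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative outcome")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy example: five embryos with grades and known outcomes.
toy_scores = [5, 4, 3, 2, 1]
toy_outcomes = [1, 1, 0, 1, 0]
toy_auc = auc_from_scores(toy_scores, toy_outcomes)
```

An AUC of 0.5 corresponds to scores that carry no ranking information, which is why the values near 0.53 to 0.54 reported for embryologists in some groups indicate close to chance-level discrimination.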
Blastocyst assessment was performed using video files acquired from time-lapse incubators, where each video contained data from a single focal plane. Clinical data regarding the underlying cause of infertility and endometrial thickness before the transfer were not available, yet may explain implantation failure and lower accuracy of IPGs. Implantation was defined as the presence of a gestational sac, whereas the detection of fetal heartbeat is a more robust marker of embryo viability. The raw data were anonymized to the extent that it was not possible to quantify the number of unique patients and cycles included in the study, potentially masking the effect of bias from a limited patient pool. Furthermore, the lack of demographic data makes it difficult to draw conclusions on how representative the dataset was of the wider population. Finally, embryologists were required to assess the implantation potential, not embryo quality. Although this is not the traditional approach to embryo evaluation, morphology/morphokinetics as a means of assessing embryo quality is believed to be strongly correlated with viability and, for some methods, implantation potential.
Embryo selection is a key element in IVF success and continues to be a challenge. Improving the predictive ability could assist in optimizing implantation success rates and other clinical outcomes and could minimize the financial and emotional burden on the patient. This study demonstrates moderate agreement rates between embryologists, likely due to the subjective nature of embryo assessment. In particular, we found that average embryologist accuracy and agreement were significantly lower for fair quality embryos when compared with that for top and poor quality embryos. Using data-driven algorithms as an assistive tool may help IVF professionals increase success rates and promote much needed standardization in the IVF clinic. Our results indicate a need for further research regarding technological advancement in this field.
Embryonics Ltd is an Israel-based company. Funding for the study was partially provided by the Israeli Innovation Authority, grant #74556.
N/A.
Fordham DE, Rosentraub D, Polsky AL, Aviram T, Wolf Y, Perl O, Devir A, Rosentraub S, Silver DH, Gold Zamir Y, Bronstein AM, Lara Lara M, Ben Nagi J, Alvarez A, Munné S

Discard or not discard, that is the question: an international survey across 117 embryologists on the clinical management of borderline quality blastocysts.
Chiappetta V, Innocenti F, Coticchio G, Ahlström A, Albricci L, Badajoz V, Hebles M, Gallardo M, Benini F, Canosa S, Kumpošt J, Milton K, Montanino Oliva D, Maggiulli R, Rienzi L, Cimadomo D

Development of an artificial intelligence-based assessment model for prediction of embryo viability using static images captured by optical light microscopy during IVF.
Can an artificial intelligence (AI)-based model predict human embryo viability using images captured by optical light microscopy?
We have combined computer vision image processing methods and deep learning techniques to create the non-invasive Life Whisperer AI model for robust prediction of embryo viability, as measured by clinical pregnancy outcome, using single static images of Day 5 blastocysts obtained from standard optical light microscope systems.
Embryo selection following IVF is a critical factor in determining the success of ensuing pregnancy. Traditional morphokinetic grading by trained embryologists can be subjective and variable, and other complementary techniques, such as time-lapse imaging, require costly equipment and have not reliably demonstrated predictive ability for the endpoint of clinical pregnancy. AI methods are being investigated as a promising means for improving embryo selection and predicting implantation and pregnancy outcomes.
These studies involved analysis of retrospectively collected data including standard optical light microscope images and clinical outcomes of 8886 embryos from 11 different IVF clinics, across three different countries, between 2011 and 2018.
The AI-based model was trained using static two-dimensional optical light microscope images with known clinical pregnancy outcome as measured by fetal heartbeat to provide a confidence score for prediction of pregnancy. Predictive accuracy was determined by evaluating sensitivity, specificity and overall weighted accuracy, and was visualized using histograms of the distributions of predictions. Comparison to embryologists' predictive accuracy was performed using a binary classification approach and a 5-band ranking comparison.
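The metrics named above can be sketched as follows. The abstract does not define "overall weighted accuracy"; this sketch assumes it means accuracy over all embryos, which equals sensitivity and specificity weighted by the share of viable and non-viable embryos. That reading, the function name and the toy labels are assumptions made for illustration.

```python
def binary_metrics(y_true, y_pred):
    """Sensitivity, specificity and overall accuracy for 0/1 labels.

    y_true -- 1 for viable (fetal heartbeat), 0 for non-viable
    y_pred -- 1 if the model (or embryologist) predicts viable, else 0
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one viable and one non-viable embryo")

    sensitivity = tp / (tp + fn)   # share of viable embryos predicted viable
    specificity = tn / (tn + fp)   # share of non-viable embryos predicted non-viable
    accuracy = (tp + tn) / len(pairs)
    return {"sensitivity": sensitivity, "specificity": specificity, "accuracy": accuracy}


# Hypothetical toy labels, not data from the study.
toy_true = [1, 1, 1, 0, 0]
toy_pred = [1, 1, 0, 0, 1]
toy_metrics = binary_metrics(toy_true, toy_pred)
```

Under this reading, overall accuracy always lies between sensitivity and specificity, closer to whichever class is more common in the test set.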
The Life Whisperer AI model showed a sensitivity of 70.1% for viable embryos while maintaining a specificity of 60.5% for non-viable embryos across three independent blind test sets from different clinics. The weighted overall accuracy in each blind test set was >63%, with a combined accuracy of 64.3% across both viable and non-viable embryos, demonstrating model robustness and generalizability beyond the result expected from chance. Distributions of predictions showed clear separation of correctly and incorrectly classified embryos. Binary comparison of viable/non-viable embryo classification demonstrated an improvement of 24.7% over embryologists' accuracy (P = 0.047, n = 2, Student's t test), and 5-band ranking comparison demonstrated an improvement of 42.0% over embryologists (P = 0.028, n = 2, Student's t test).
The AI model developed here is limited to analysis of Day 5 embryos; therefore, further evaluation or modification of the model is needed to incorporate information from different time points. The endpoint described is clinical pregnancy as measured by fetal heartbeat, and this does not indicate the probability of live birth. The current investigation was performed with retrospectively collected data, and hence it will be of importance to collect data prospectively to assess real-world use of the AI model.
These studies demonstrated an improved predictive ability for evaluation of embryo viability when compared with embryologists' traditional morphokinetic grading methods. The superior accuracy of the Life Whisperer AI model could lead to improved pregnancy success rates in IVF when used in a clinical setting. It could also potentially assist in standardization of embryo selection methods across multiple clinical environments, while eliminating the need for complex time-lapse imaging equipment. Finally, the cloud-based software application used to apply the Life Whisperer AI model in clinical practice makes it broadly applicable and globally scalable to IVF clinics worldwide.
Life Whisperer Diagnostics, Pty Ltd is a wholly owned subsidiary of the parent company, Presagen Pty Ltd. Funding for the study was provided by Presagen with grant funding received from the South Australian Government: Research, Commercialisation and Startup Fund (RCSF). 'In kind' support and embryology expertise to guide algorithm development were provided by Ovation Fertility. J.M.M.H., D.P. and M.P. are co-owners of Life Whisperer and Presagen. Presagen has filed a provisional patent for the technology described in this manuscript (52985P pending). A.P.M. owns stock in Life Whisperer, and S.M.D., A.J., T.N. and A.P.M. are employees of Life Whisperer.
VerMilyea M, Hall JMM, Diakiw SM, Johnston A, Nguyen T, Perugini D, Miller A, Picou A, Murphy AP, Perugini M

Should we freeze it? Agreement on fate of borderline blastocysts is poor and does not improve with a modified blastocyst grading system.
What is the inter-observer agreement among embryologists for decision to freeze blastocysts of borderline morphology and can it be improved with a modified grading system?
The inter-observer agreement among embryologists deciding whether to freeze blastocysts of marginal morphology was low and was not improved by a modified grading system.
Previous research indicates that inter-observer agreement on the decision of which embryo to transfer from a cohort of blastocysts is good, but the impact of grading variability on the decision to freeze borderline blastocysts has not been investigated. Agreement on inner cell mass (ICM) and trophectoderm (TE) grades, which contribute to the overall grade that influences the decision to freeze, is only fair.
This was a prospective study involving 18 embryologists working at four different IVF clinics within a single organisation between January 2019 and July 2019.
All embryologists currently practicing blastocyst grading at a multi-site organisation were invited to participate. The survey comprised blastocyst images in three planes and asked for (i) the likelihood of freezing and (ii) whether the blastocyst would be frozen based on visual assessment. Blastocysts varied by quality and were categorised as either top (n = 20), borderline (n = 60) or non-viable/degenerate quality (n = 20). A total of 1800 freeze decisions were assessed. To assess the impact of grading criteria on inter-observer agreement for decision to freeze, the survey was taken once when the embryologists used the Gardner criteria and again 6 months after transitioning to modified Gardner criteria with four grades for ICM and TE. The fourth grade was introduced with the aim of promoting higher levels of agreement for the clinical usability decision when the blastocyst was of marginal quality.
The inter-observer agreement for decision to freeze was near perfect (kappa 1.0) for top and non-viable/degenerate quality blastocysts, and this was not affected by the blastocyst grading criteria used (top quality; P = 0.330 and non-viable/degenerate quality; P = 0.18). In contrast, the cohort of borderline blastocysts received a mixed freeze rate (average 52.7%) during the first survey, indicative of blastocysts that showed uncertain viability and prompting significant disagreement for decision to freeze among the embryologists (kappa 0.304). After transitioning to the modified Gardner criteria with an additional grading tier, the average freeze rate increased (64.8%; P < 0.0001); however, the inter-observer agreement for decision to freeze was unchanged (kappa 0.301). Therefore, significant disagreement for decision to freeze among embryologists is an ongoing issue not resolved by the two grading criteria assessed here.
Blastocyst assessment was performed from time-lapse images in three planes, rather than with a microscope in the laboratory. The inter-observer agreement for decision to freeze may be lower for embryologists working in different clinics with different grading protocols.
The decision to freeze a blastocyst with borderline morphology is a common clinical issue that has the potential to arise for any patient during blastocyst culture. Disagreement for decision to freeze these blastocysts, and therefore clinical usability in frozen embryo transfer cycles, affects consistency in patient care due to a potential impact on cumulative live birth rates, as well as financial, emotional and time costs associated with the frozen embryo transfer cycles. We demonstrate significant disagreement for decision to freeze borderline blastocysts among embryologists using the same grading scheme within a large multisite organisation, a phenomenon which was not improved with a modified grading system. Decision-making around borderline embryos is an area requiring further research, especially as studies continue to demonstrate the reduced but modest live birth rates for low quality blastocysts (Grade C). These results provide support for emerging technology for embryo assessment, such as artificial intelligence.
None declared.
Not applicable.
Hammond ER, Foong AKM, Rosli N, Morbeck DE

Development of an artificial intelligence model for predicting the likelihood of human embryo euploidy based on blastocyst images from multiple imaging systems during IVF.
Can an artificial intelligence (AI) model predict human embryo ploidy status using static images captured by optical light microscopy?
Results demonstrated predictive accuracy for embryo euploidy and showed a significant correlation between AI score and euploidy rate, based on assessment of images of blastocysts at Day 5 after IVF.
Euploid embryos displaying the normal human chromosomal complement of 46 chromosomes are preferentially selected for transfer over aneuploid embryos (abnormal complement), as they are associated with improved clinical outcomes. Currently, evaluation of embryo genetic status is most commonly performed by preimplantation genetic testing for aneuploidy (PGT-A), which involves embryo biopsy and genetic testing. The potential for embryo damage during biopsy, and the non-uniform nature of aneuploid cells in mosaic embryos, has prompted investigation of additional, non-invasive, whole embryo methods for evaluation of embryo genetic status.
A total of 15 192 blastocyst-stage embryo images with associated clinical outcomes were provided by 10 different IVF clinics in the USA, India, Spain and Malaysia. The majority of data were retrospective, with two additional prospectively collected blind datasets provided by IVF clinics using the genetics AI model in clinical practice. Of these images, a total of 5050 images of embryos on Day 5 of in vitro culture were used for the development of the AI model. These Day 5 images were provided for 2438 consecutively treated women who had undergone IVF procedures in the USA between 2011 and 2020. The remaining images were used for evaluation of performance in different settings, or otherwise excluded for not matching the inclusion criteria.
The genetics AI model was trained using static 2-dimensional optical light microscope images of Day 5 blastocysts with linked genetic metadata obtained from PGT-A. The endpoint was ploidy status (euploid or aneuploid) based on PGT-A results. Predictive accuracy was determined by evaluating sensitivity (correct prediction of euploid), specificity (correct prediction of aneuploid) and overall accuracy. The Matthews correlation coefficient, receiver-operating characteristic curves and precision-recall curves (including AUC values) were also determined. Performance was also evaluated using correlation analyses and simulated cohort studies to evaluate ranking ability for euploid enrichment.
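For readers unfamiliar with the Matthews correlation coefficient (MCC) mentioned above, the following minimal sketch computes it from the four cells of a binary confusion matrix using the standard formula. The function name and the example counts are hypothetical; returning 0.0 when the denominator is zero is a common convention and an assumption here, not something stated in the abstract.

```python
import math


def matthews_corrcoef_from_counts(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).

    Here a 'positive' would be a euploid embryo and a 'negative' an aneuploid one.
    Returns 0.0 when any marginal total is zero (a common convention).
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical counts, not data from the study.
toy_mcc = matthews_corrcoef_from_counts(tp=2, tn=1, fp=1, fn=1)
```

MCC ranges from -1 to 1, with 0 indicating predictions no better than chance, and it uses all four cells of the confusion matrix, which makes it less sensitive to class imbalance than raw accuracy.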
Overall accuracy for the prediction of euploidy on a blind test dataset was 65.3%, with a sensitivity of 74.6%. When the blind test dataset was cleansed of poor quality and mislabeled images, overall accuracy increased to 77.4%. This performance may be relevant to clinical situations where confounding factors, such as variability in PGT-A testing, have been accounted for. There was a significant positive correlation between AI score and the proportion of euploid embryos, with very high scoring embryos (9.0-10.0) twice as likely to be euploid as the lowest-scoring embryos (0.0-2.4). When using the genetics AI model to rank embryos in a cohort, the probability of the top-ranked embryo being euploid was 82.4%, which was 26.4% more effective than using random ranking, and ∼13-19% more effective than using the Gardner score. The probability increased to 97.0% when considering the likelihood of one of the top two ranked embryos being euploid, and the probability of both top two ranked embryos being euploid was 66.4%. Additional analyses showed that the AI model generalized well to different patient demographics and could also be used for the evaluation of Day 6 embryos and for images taken using multiple time-lapse systems. Results suggested that the AI model could potentially be used to differentiate mosaic embryos based on the level of mosaicism.
While the current investigation was performed using both retrospectively and prospectively collected data, it will be important to continue to evaluate real-world use of the genetics AI model. The endpoint described was euploidy based on the clinical outcome of PGT-A results only, so predictive accuracy for genetic status in utero or at birth was not evaluated. Rebiopsy studies of embryos using a range of PGT-A methods indicated a degree of variability in PGT-A results, which must be considered when interpreting the performance of the AI model.
These findings collectively support the use of this genetics AI model for the evaluation of embryo ploidy status in a clinical setting. Results can be used to aid in prioritizing and enriching for embryos that are likely to be euploid for multiple clinical purposes, including selection for transfer in the absence of alternative genetic testing methods, selection for cryopreservation for future use or selection for further confirmatory PGT-A testing, as required.
Life Whisperer Diagnostics is a wholly owned subsidiary of the parent company, Presagen Holdings Pty Ltd. Funding for the study was provided by Presagen with grant funding received from the South Australian Government: Research, Commercialisation, and Startup Fund (RCSF). 'In kind' support and embryology expertise to guide algorithm development were provided by Ovation Fertility. 'In kind' support in terms of computational resources provided through the Amazon Web Services (AWS) Activate Program. J.M.M.H., D.P. and M.P. are co-owners of Life Whisperer and Presagen. S.M.D., M.A.D. and T.V.N. are employees or former employees of Life Whisperer. S.M.D, J.M.M.H, M.A.D, T.V.N., D.P. and M.P. are listed as inventors of patents relating to this work, and also have stock options in the parent company Presagen. M.V. sits on the advisory board for the global distributor of the technology described in this study and also received support for attending meetings.
N/A.
Diakiw SM, Hall JMM, VerMilyea MD, Amin J, Aizpurua J, Giardini L, Briones YG, Lim AYX, Dakka MA, Nguyen TV, Perugini D, Perugini M