Research Article | Articles in Press, 100294

Exploring healthy retinal aging with deep learning

      Open Access | Published: February 28, 2023 | DOI: https://doi.org/10.1016/j.xops.2023.100294

      Abstract

      Purpose

      To study the individual course of retinal changes caused by healthy aging using deep learning.

      Design

      Retrospective analysis of a large dataset of retinal optical coherence tomography (OCT) images.

      Participants

      Eighty-five thousand seven hundred and nine adults aged 40 to 75 years whose OCT images were acquired as part of the UK Biobank population study.

      Methods

      We created a counterfactual generative adversarial network (GAN), a type of neural network, that learns from cross-sectional, retrospective data. It then synthesizes high-resolution counterfactual OCT images and longitudinal time series. These counterfactuals allow visualization and analysis of hypothetical scenarios in which certain characteristics of the imaged subject, such as age or sex, are altered while other attributes, crucially the subject’s identity and image acquisition settings, remain fixed.

      Main Outcome Measures

      Using our counterfactual GAN, we investigated subject-specific changes to the retinal layer structure as a function of age and sex. In particular, we measured changes to the retinal nerve fiber layer (RNFL), combined ganglion cell layer plus inner plexiform layer (GCIPL), inner nuclear layer to inner boundary of retinal pigment epithelium (INL-RPE) and retinal pigment epithelium (RPE).

      Results

      Our counterfactual GAN is able to smoothly visualize the individual course of retinal aging. Across all counterfactual images, the RNFL, GCIPL, INL-RPE and RPE changed by −0.1μm ± 0.1μm, −0.5μm ± 0.2μm, −0.2μm ± 0.1μm and 0.1μm ± 0.1μm, respectively, per decade of age. These results agree well with previous studies based on the same cohort from the UK Biobank population study. Beyond population-wide average measures, our counterfactual GAN allows us to explore whether the retinal layers of a given eye will increase in thickness, decrease in thickness or stagnate as a subject ages.

      Conclusions

      This study demonstrates how counterfactual GANs can aid research into retinal aging by generating high-resolution, high-fidelity OCT images and longitudinal time series. Ultimately, we envision that they will enable clinical experts to derive and explore hypotheses for potential imaging biomarkers for healthy and pathological aging that can be refined and tested in prospective clinical trials.


      List of acronyms:

      OCT (optical coherence tomography), GAN (generative adversarial network), ROCAUC (area under the receiver operating characteristic curve), RNFL (retinal nerve fiber layer), GCIPL (combined ganglion cell layer plus inner plexiform layer), INL-RPE (inner nuclear layer to inner boundary of retinal pigment epithelium), RPE (retinal pigment epithelium)

      1. Introduction

      Many retinal diseases, such as age-related macular degeneration and diabetic retinopathy, develop gradually over time (Stitt et al., “The progress in understanding and treatment of diabetic retinopathy”; Mitchell et al., “Age-related macular degeneration”). Clinicians can track their progression using optical coherence tomography (OCT), which provides high-resolution images of the retina (Adhi and Duker, “Optical coherence tomography–current and future applications”). However, the retina also undergoes age-related physiological changes (Gao and Hollyfield, “Aging of the human retina. Differential loss of neurons and retinal pigment epithelial cells”). A good understanding of how healthy aging manifests itself in the retina is therefore a crucial prerequisite for distinguishing between normal and pathological changes and for effectively diagnosing, prognosing and treating ocular diseases.
      The retina has been extensively studied by retrospectively or prospectively collecting large numbers of OCT images from representative populations (Leung et al., “Retinal nerve fiber layer imaging with spectral-domain optical coherence tomography: a variability and diagnostic performance study”; Sung et al., “Effects of age on optical coherence tomography measurements of healthy retinal nerve fiber layer, macula, and optic nerve head”; Mwanza et al., “Profile and predictors of normal ganglion cell–inner plexiform layer thickness measured with frequency-domain optical coherence tomography”; Ooto et al., “Effects of age, sex, and axial length on the three-dimensional profile of normal macular layer structures”; Koh et al., “Determinants of ganglion cell–inner plexiform layer thickness measured by high-definition optical coherence tomography”; Gupta et al., “Determinants of macular thickness using spectral domain optical coherence tomography in healthy eyes: the Singapore Chinese Eye study”; Myers et al., “Retinal thickness measured by spectral-domain optical coherence tomography in eyes without retinal abnormalities: the Beaver Dam Eye Study”; Ko et al., “Associations with retinal pigment epithelium thickness measures in a large cohort: results from the UK Biobank”; Chua et al., “Associations with photoreceptor thickness measures in the UK Biobank”; Khawaja et al., “Comparison of associations with different macular inner retinal thickness parameters in a large cohort: the UK Biobank”). The pooled images are analyzed by measuring the shape and thickness of individual retinal layers. By identifying population-wide correlations between the structure of the eye and demographic, lifestyle and medical information, researchers can find and validate imaging biomarkers. Supported by the emergence of large population studies and automated tools for medical image processing (Schmidt-Erfurth et al., “Artificial intelligence in retina”), these approaches have established links between age and changes to the nerve fiber layer (Leung et al.; Sung et al.; Ooto et al.; Khawaja et al.), the ganglion cell complex (Mwanza et al.; Ooto et al.; Koh et al.; Khawaja et al.), the photoreceptor layers (Ooto et al.; Chua et al.) and the retinal pigment epithelium (Ko et al.).
      However, these population-based studies have several shortcomings. Pooled datasets usually include only a single scan of each eye. Even when time series data are available, subjects are rarely monitored for longer than a few years. Furthermore, imaging conditions change between visits: the retina may appear different due to varying levels of pupil dilation, changes in OCT scanner hardware and software, and different orientations of the eye. Consequently, population-based studies are limited in their ability to evaluate the development of the eye on a subject-specific level and to resolve subtle retinal changes that occur over decades.
      In this study, we use deep learning to investigate the individual course of retinal changes caused by healthy aging. Our counterfactual generative adversarial network (GAN), a type of neural network, learns from cross-sectional retrospective data. It then synthesizes high-resolution counterfactual OCT images and longitudinal time series. These counterfactuals reflect hypothetical scenarios in which certain characteristics of the imaged subject, such as age or sex, are altered while other attributes, crucially the subject’s identity and image acquisition settings, remain fixed. Such counterfactual images allow the investigation of what-if questions that are impossible to answer in population-based studies, for example “how will this person’s eye look in 20 years?” or “how would this eye look if the subject had been born as the opposite sex?” We extensively benchmark the visual fidelity and realism of the generated counterfactual images before demonstrating the utility of our method by quantifying the subject-specific retinal layer structure as a function of age and sex.

      2. Methods

      An overview of our method and study workflow is presented in Figure 1. After introducing the OCT image dataset in Section 2.1, we describe the counterfactual GAN in Section 2.2. Sections 2.3 and 2.4 present our experiments to assess the realism of the artificial OCT images and their fidelity with respect to age, sex and identity, respectively. Finally, Section 2.5 describes how we extract and analyze the retinal layer structure from the counterfactual images.
      Figure 1. Workflow diagram explaining how the counterfactual GAN is trained (top part), used to generate counterfactual OCT images (middle part), and benchmarked and utilized (bottom part).

      2.1 Participants and OCT image dataset

      We used the OCT image dataset acquired as part of the UK Biobank population study. UK Biobank has collected extensive demographic, lifestyle, health and medical imaging information from more than 500000 members of the general public in the United Kingdom (Sudlow et al., “UK Biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age”). As part of the study, 175844 retinal OCT scans of 85709 participants were acquired using a Topcon 3D OCT-1000 Mark II device (Topcon Corporation, Tokyo, Japan) (Keane et al., “Optical coherence tomography in the UK biobank study–rapid automated analysis of retinal thickness for large population-based studies”; Patel et al., “Spectral-domain optical coherence tomography imaging in 67 321 adults: associations with macular thickness in the UK Biobank Study”).
      The UK Biobank population study was reviewed and approved by the North West Multi-centre Research Ethics Committee in accordance with the tenets of the Declaration of Helsinki; therefore, no additional ethical approval was required for our study.
      During image preprocessing, we filtered out scans of poor image quality using an intensity-histogram-based score (Stein et al., “A new quality assessment parameter for optical coherence tomography”; see Figure 2).
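      The exact quality index of Stein et al. is not reproduced here. Purely as an illustration of what a histogram-based score looks at, the sketch below computes a simplified, hypothetical score: the normalized spread between the 1st and 99th intensity percentiles, multiplied by the fraction of pixels that are clearly brighter than the background. The function names, the percentile choices, the 8-bit intensity assumption and the `min_score` threshold are all assumptions, not the values used in this study.

```python
import numpy as np


def histogram_quality_score(bscan: np.ndarray,
                            noise_pct: float = 1.0,
                            signal_pct: float = 99.0) -> float:
    """Hypothetical, simplified intensity-histogram quality score.

    NOT the exact Stein et al. quality index. It multiplies the normalized
    contrast between bright (tissue) and dark (background) percentiles by
    the fraction of pixels clearly brighter than the background.
    Assumes 8-bit intensities (0..255).
    """
    flat = bscan.astype(np.float64).ravel()
    low, high = np.percentile(flat, [noise_pct, signal_pct])
    if high <= low:
        return 0.0  # flat histogram: no usable tissue contrast
    threshold = low + 0.5 * (high - low)
    bright_fraction = float(np.mean(flat > threshold))
    contrast = (high - low) / 255.0
    return contrast * bright_fraction


def keep_scan(bscan: np.ndarray, min_score: float = 0.02) -> bool:
    # min_score is a made-up placeholder threshold for illustration only.
    return histogram_quality_score(bscan) >= min_score
```

      A blank or washed-out scan scores 0 and is rejected, while a scan with a distinct bright retinal band scores well above the placeholder threshold.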
      We also excluded all subjects who reported being affected by age-related macular degeneration, diabetic retinopathy, glaucoma, cataracts, previous eye trauma or other serious eye disease. Next, 11 retinal layer surfaces were segmented in the three-dimensional OCT scans using the Iowa Reference Algorithms (Retinal Image Analysis Lab, Iowa Institute for Biomedical Imaging, Iowa City, IA, USA) (Li et al., “Optimal surface segmentation in volumetric images-a graph-theoretic approach”; Garvin et al., “Automated 3-D intraretinal layer segmentation of macular spectral-domain optical coherence tomography images”; Abràmoff et al., “Retinal imaging and image analysis”).
      The obtained layer segmentations were used to flatten and register all images. During flattening, the images were sheared so that the outer boundary of the retinal pigment epithelium was oriented horizontally. The center of the fovea was defined as the position with the minimal distance between the inner limiting membrane and the outer plexiform layer, and we extracted the transverse two-dimensional slice passing through this position. Finally, all images were resampled to 224 × 224 pixels with a pixel size of 23.4 μm × 7.0 μm, half the median resolution of the original scans.
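      The flattening and fovea-localization steps can be sketched as follows. This is a minimal NumPy illustration, not the authors’ code. It assumes that surfaces are stored as row indices per column (depth increasing downward), that “shearing” means fitting a straight line to the outer RPE boundary and shifting each column by its deviation from that line, and that the ILM and OPL surfaces of the whole volume are available as `(n_bscans, width)` arrays. The function names are hypothetical.

```python
import numpy as np


def shear_flatten(bscan: np.ndarray, rpe_outer: np.ndarray) -> np.ndarray:
    """Remove the linear tilt of the outer RPE boundary from one B-scan.

    bscan: (depth, width) intensities; rpe_outer: (width,) boundary row per
    column. A line is fitted to the boundary and each column is shifted by
    its deviation from the line's mean row (a vertical shear).
    """
    depth, width = bscan.shape
    x = np.arange(width)
    slope, _ = np.polyfit(x, rpe_outer, 1)
    tilt = slope * (x - x.mean())  # rows below (+) or above (-) the mean row
    out = np.zeros_like(bscan)
    for col in range(width):
        shift = -int(np.round(tilt[col]))         # move the column against its tilt
        shift = int(np.clip(shift, -depth, depth))
        if shift >= 0:
            out[shift:, col] = bscan[:depth - shift, col]
        else:
            out[:depth + shift, col] = bscan[-shift:, col]
    return out


def locate_fovea(ilm: np.ndarray, opl: np.ndarray) -> tuple[int, int]:
    """Return (bscan_index, column) where the ILM-OPL distance is smallest.

    ilm, opl: (n_bscans, width) surface rows of the inner limiting membrane
    and the outer plexiform layer for the whole volume.
    """
    distance = opl - ilm
    b, col = np.unravel_index(np.argmin(distance), distance.shape)
    return int(b), int(col)
```

      At the stated pixel size, a 224 × 224 image covers 224 × 23.4 μm ≈ 5.24 mm laterally and 224 × 7.0 μm ≈ 1.57 mm in depth.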
      Figure 2. Data flowchart presenting the data inclusion and exclusion criteria as well as the final split into the three independent datasets used in the study.
      Preprocessing yielded 117246 highly standardized images of 65831 subjects. The dataset was then split into three subsets (see Figure 2). Right-eye scans of 46444 subjects, one per subject, were used to train the counterfactual GAN. A further 20000 images of 10000 eye pairs were used to train a set of referee neural networks that evaluated the generated images. Finally, 2112 images of 528 subjects, for whom initial and follow-up scans of both eyes were available, were used for final testing. Because left-eye images were used only for training the referee networks and for final testing, but not for GAN training, 48688 images of our dataset remained unused.

      2.2 GAN to synthesize counterfactual OCT images

      The task of generating counterfactual OCT images was formulated as image translation using a GAN (Goodfellow I, Pouget-Abadie J, Mirza M, et al. Generative adversarial nets. Advances in Neural Information Processing Systems. 2014;27).

      Our counterfactual GAN consists of two neural networks, a generator and a discriminator (see Figure 1). The generator receives a real OCT image and a counterfactual query, which consists of the target age and sex encoded as a vector, and is tasked with creating a corresponding counterfactual image. These generated images are provided to the discriminator together with a set of real images from the training dataset. For each image, the discriminator has to determine whether it is real or artificially generated, and it also has to estimate the age and sex of the subject. Finally, the counterfactual images are passed through the generator once more with the goal of changing their appearance back to the original state.
      Based on these training objectives, the generator and discriminator are trained simultaneously in a zero-sum game.
      The discriminator learns to identify artificially generated images and how patient attributes manifest themselves in OCT images. The generator learns to fool the discriminator by creating realistic OCT images that match the counterfactual query. At the same time, the generator is incentivized to preserve the identity of each eye by generating images that can be converted back to their original appearance.
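      The interplay of these objectives can be summarized in a minimal NumPy sketch of StarGAN-style losses. It is not the study’s implementation. It assumes a non-saturating binary cross-entropy adversarial loss (the original StarGAN uses a Wasserstein loss with gradient penalty), a squared-error term for the discriminator’s age estimate with ages rescaled to a small range such as 0 to 1, a binary cross-entropy term for sex, and an L1 cycle-reconstruction term. The weights `lambda_cls=1` and `lambda_rec=10` are the StarGAN defaults and not necessarily those used here; all function names are hypothetical.

```python
import numpy as np


def bce_with_logits(logits, targets) -> float:
    """Numerically stable mean binary cross-entropy on raw logits."""
    z = np.asarray(logits, dtype=np.float64)
    t = np.asarray(targets, dtype=np.float64)
    return float(np.mean(np.maximum(z, 0) - z * t + np.log1p(np.exp(-np.abs(z)))))


def discriminator_loss(real_logit, fake_logit, real_age_pred, real_sex_logit,
                       real_age, real_male, lambda_cls=1.0) -> float:
    # Tell real images (target 1) from generated ones (target 0) ...
    adv = (bce_with_logits(real_logit, np.ones_like(real_logit))
           + bce_with_logits(fake_logit, np.zeros_like(fake_logit)))
    # ... and learn how age and sex appear in *real* images.
    age = float(np.mean((np.asarray(real_age_pred) - np.asarray(real_age)) ** 2))
    sex = bce_with_logits(real_sex_logit, real_male)
    return adv + lambda_cls * (age + sex)


def generator_loss(fake_logit, fake_age_pred, fake_sex_logit, target_age,
                   target_male, x_real, x_cycled,
                   lambda_cls=1.0, lambda_rec=10.0) -> float:
    # Fool the discriminator into calling the counterfactual "real".
    adv = bce_with_logits(fake_logit, np.ones_like(fake_logit))
    # The counterfactual must show the queried age and sex.
    age = float(np.mean((np.asarray(fake_age_pred) - np.asarray(target_age)) ** 2))
    sex = bce_with_logits(fake_sex_logit, target_male)
    # Translating back to the original attributes must recover the input (identity).
    rec = float(np.mean(np.abs(np.asarray(x_real) - np.asarray(x_cycled))))
    return adv + lambda_cls * (age + sex) + lambda_rec * rec
```

      The cycle term is what ties the counterfactual to the input eye: any detail the generator discards cannot be restored in the reverse pass and is penalized.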
      During inference, the trained generator receives a real OCT image and a counterfactual query and creates the corresponding counterfactual image (see Figure 1). The neural network framework was adapted from StarGAN by Choi et al. (Choi Y, Choi M, Kim M, Ha JW, Kim S, Choo J. StarGAN: Unified generative adversarial networks for multi-domain image-to-image translation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018:8789–8797). Our modifications to this framework, as well as the full network architecture, training procedure and hyperparameters, are described in the supplemental material.
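      How a counterfactual query might be fed to the generator can be sketched as follows. StarGAN conditions its generator by replicating each entry of the target label vector as a constant image channel and concatenating these channels with the input image; the sketch does the same for an age and sex query. Scaling age linearly over the 40 to 75 year range of the training data and one-hot encoding sex are assumptions, as are the function names; the study’s actual encoding is described in its supplemental material.

```python
import numpy as np

AGE_MIN, AGE_MAX = 40.0, 75.0  # age range of the training subjects


def encode_query(age: float, male: bool) -> np.ndarray:
    """Hypothetical query vector: [scaled age, is_female, is_male]."""
    age_scaled = (age - AGE_MIN) / (AGE_MAX - AGE_MIN)
    return np.array([age_scaled, float(not male), float(male)], dtype=np.float32)


def generator_input(image: np.ndarray, query: np.ndarray) -> np.ndarray:
    """Stack a (H, W) image with one constant channel per query entry.

    Returns an array of shape (1 + len(query), H, W), channels first.
    """
    h, w = image.shape
    tiled = np.broadcast_to(query[:, None, None], (query.size, h, w))
    return np.concatenate([image[None].astype(np.float32), tiled], axis=0)
```

      A counterfactual time series then amounts to calling the generator repeatedly on the same image while sweeping only the age entry, for example `generator_input(scan, encode_query(a, male=False))` for `a` from 40 to 75.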

      2.3 Visual Turing test to assess image realism

      To quantitatively assess the realism of the counterfactual images, we conducted a visual Turing test that measured the ability of five expert ophthalmologists to distinguish between real and artificially generated OCT images. To ensure a fair comparison, all real images were downsampled and flattened according to the image preprocessing steps described in Section 2.1. The participants were first given the option to review up to 100 real images. They were then shown 50 real and 50 artificially generated OCT images in random order and asked to classify each image as real or fake. We report the average accuracy across all ophthalmologists.
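      Scoring such a test is simple bookkeeping. The sketch below, with hypothetical function names, computes each rater’s accuracy, the mean and standard deviation across raters, and an exact one-sided binomial p-value against guessing at 50%. Whether the study used the sample or the population standard deviation is not stated; the sketch assumes the sample version.

```python
import numpy as np
from math import comb


def rater_accuracy(says_real, is_real) -> float:
    """Fraction of images a rater labelled correctly as real or generated."""
    says_real = np.asarray(says_real, dtype=bool)
    is_real = np.asarray(is_real, dtype=bool)
    return float(np.mean(says_real == is_real))


def summarize_raters(accuracies):
    """Mean and sample standard deviation of per-rater accuracies."""
    acc = np.asarray(accuracies, dtype=float)
    return float(acc.mean()), float(acc.std(ddof=1))


def p_value_vs_chance(n_correct: int, n_total: int) -> float:
    """Exact one-sided binomial test: P(at least n_correct hits | p = 0.5)."""
    hits = sum(comb(n_total, k) for k in range(n_correct, n_total + 1))
    return hits / 2 ** n_total
```

      With 100 images per rater, 60 correct answers already give p ≈ 0.03, so a mean accuracy above chance says little about how often individual images were mistaken.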

      2.4 Neural-network-based quantification of counterfactual age, sex and identity

      It is crucial that the counterfactual GAN faithfully models the effects of subject age and sex while preserving the subject’s identity. To measure this capability, we trained three referee neural networks based on the well-established ResNet50 architecture (He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016:770–778).

      The age prediction network estimates the subject’s age from an input OCT image. The sex classification network predicts whether a given OCT image belongs to a male or female subject. The identity matching network learns to assign a similarity score to an image pair consisting of a right and a left eye; a high similarity score indicates that the two eyes belong to the same subject. The full network configuration and training procedure are described in the supplemental material.
      Before ultimately using the three referee networks to evaluate the counterfactual images, we benchmarked their performance on an independent subset of real OCT images. The age regression network estimated the subject’s age with a mean absolute error of 4.1 years. The sex classification network determined the subject’s sex with an accuracy of 79.5% and an area under the receiver operating characteristic curve (ROCAUC) of 0.90. The identity matching network was tasked with matching 2000 right and left eyes belonging to 1000 different subjects and achieved a sensitivity of 95.8% and a specificity of 97.7%. During evaluation, the three referee networks were shown 10000 counterfactual images. The counterfactual queries were evenly split between male and female sex and distributed across a uniform age distribution spanning from 40 to 75 years, the age range of the subjects in the training dataset. We report whether the determined age, sex and identity matched the corresponding counterfactual queries.
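      The evaluation queries described above could be drawn as in the following sketch. It assumes that ages are sampled uniformly at random between 40 and 75 years and that sex is assigned as an exact 50/50 split in random order; a deterministic age grid would be an equally valid reading of the text. The function name and seed are hypothetical.

```python
import numpy as np


def sample_counterfactual_queries(n: int, age_min: float = 40.0,
                                  age_max: float = 75.0, seed: int = 0):
    """Return (ages, is_male) for n counterfactual queries.

    Ages are uniform on [age_min, age_max); exactly half of the queries
    (rounded down) are male, shuffled so that sex is independent of age.
    """
    rng = np.random.default_rng(seed)
    ages = rng.uniform(age_min, age_max, size=n)
    is_male = np.zeros(n, dtype=bool)
    is_male[: n // 2] = True
    rng.shuffle(is_male)
    return ages, is_male
```

      For example, `sample_counterfactual_queries(10000)` yields 5000 male and 5000 female queries, each of which would then be paired with a real input image.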

      2.5 Extraction and analysis of the retinal layer structure

      Finally, we analyzed the retinal structure in the OCT images. To this end, we trained a ResNet50 neural network to segment 11 retinal surfaces, following the approach of Shah et al. (Shah et al., “Multiple surface segmentation using convolution neural nets: application to retinal layer segmentation in OCT images”). We used real OCT images and the previously obtained layer segmentations as training data (see Section 2.1). The network accurately localized the retinal surfaces in an independent subset of 1000 real OCT images: across all 11 surfaces, the mean absolute difference between the predicted and ground truth segmentations was 4.2μm ± 7.5μm. As we do not have ground truth annotations for the counterfactual OCT images, we cannot quantitatively assess the segmentation network’s performance on them. However, we visually confirmed that the network is able to delineate the layers in artificially generated images before processing all counterfactual images. Additionally, we segmented and analyzed the real OCT images that were used to train the counterfactual GAN, mirroring conventional population-based studies (Ko et al.; Chua et al.; Khawaja et al.).
      More details can be found in the supplemental material.
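      Once surfaces are localized, layer thicknesses and segmentation errors follow from simple arithmetic on surface positions. The sketch below assumes that surfaces are stored as row positions per column (depth increasing downward) and converts rows to micrometres with the 7.0 μm axial pixel size of the preprocessed images. Which surface indices bound each layer (for example, the ILM and the RNFL/GCL boundary for the RNFL) depends on the segmentation output and is left as an assumption; the network of Shah et al. itself is not reproduced, and the function names are hypothetical.

```python
import numpy as np

AXIAL_UM_PER_PIXEL = 7.0  # axial pixel size of the preprocessed images


def layer_thickness_um(upper, lower, axial_um: float = AXIAL_UM_PER_PIXEL):
    """Per-column thickness (μm) of the layer between two surfaces.

    upper, lower: row positions (pixels) of the inner and outer bounding
    surfaces; depth increases downward, so lower >= upper.
    """
    return (np.asarray(lower, float) - np.asarray(upper, float)) * axial_um


def surface_error_um(pred, truth, axial_um: float = AXIAL_UM_PER_PIXEL):
    """Mean and standard deviation of the absolute surface position error (μm).

    pred, truth: arrays of shape (n_images, n_surfaces, width).
    """
    err = np.abs(np.asarray(pred, float) - np.asarray(truth, float)) * axial_um
    return float(err.mean()), float(err.std())
```

      At this pixel size, a one-pixel boundary error corresponds to 7.0 μm, which puts the reported segmentation error of 4.2μm ± 7.5μm below one pixel on average.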
      In this study, we focused on the retinal nerve fiber layer (RNFL), combined ganglion cell layer plus inner plexiform layer (GCIPL), inner nuclear layer to inner boundary of retinal pigment epithelium (INL-RPE), which contains the photoreceptor layers, and retinal pigment epithelium (RPE). We chose these retinal layers because their age-related changes have previously been studied using the UK Biobank database (Ko et al.; Chua et al.; Khawaja et al.).
      We report the average thickness as well as the effects of age and sex for each of the four layers. We further calculated these measures in each of five retinal subfields: the outer temporal, inner temporal, central, inner nasal and outer nasal subfields. As the analysis was conducted on two-dimensional images, we cannot report results for the superior and inferior subfields. To better compare our findings with those of other studies, we corrected our one-dimensional measurements by assuming that the thickness profile is the same across the entire two-dimensional subfield.
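      One plausible reading of this correction is an area-weighted average: if thickness within a subfield depends only on the distance r from the fovea, averaging over the two-dimensional subfield weights each column of the one-dimensional profile by r. The sketch below implements that reading with ETDRS-style boundaries (central disk of 1 mm diameter, inner ring of 3 mm, outer ring of 6 mm). The weighting, the placement of the fovea at the image center by default, the convention that nasal is positive x, and the function name are assumptions rather than the authors’ procedure.

```python
import numpy as np

LATERAL_UM_PER_PIXEL = 23.4  # lateral pixel size of the preprocessed images

# ETDRS-style boundaries (mm from the fovea) along a horizontal line.
# Assumption: nasal is positive x; flip the signs for the other orientation.
SUBFIELDS = {
    "outer_temporal": (-3.0, -1.5),
    "inner_temporal": (-1.5, -0.5),
    "central": (-0.5, 0.5),
    "inner_nasal": (0.5, 1.5),
    "outer_nasal": (1.5, 3.0),
}


def subfield_means(thickness_um, fovea_col=None,
                   lateral_um: float = LATERAL_UM_PER_PIXEL):
    """Area-weighted mean thickness per subfield from a 1-D thickness profile.

    Each column at distance r from the fovea is weighted by r, which is what
    averaging over a 2-D ring gives if thickness depends only on r.
    """
    t = np.asarray(thickness_um, float)
    if fovea_col is None:
        fovea_col = (t.size - 1) / 2.0
    x_mm = (np.arange(t.size) - fovea_col) * lateral_um / 1000.0
    r = np.abs(x_mm)
    means = {}
    for name, (lo, hi) in SUBFIELDS.items():
        mask = (x_mm >= lo) & (x_mm < hi)
        w = r[mask]
        if w.sum() == 0:
            means[name] = float("nan")  # subfield not covered by the scan
        else:
            means[name] = float(np.sum(w * t[mask]) / w.sum())
    return means
```

      If the fovea is centered, a 224-pixel-wide image reaches only about ±2.6 mm (112 × 23.4 μm), so under these assumptions the outer subfields are averaged over the covered portion only.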

      3. Results

      3.1 Counterfactual OCT images to visualize the impact of healthy retinal aging and subject sex

      Our counterfactual GAN can smoothly visualize the individual course of retinal aging. Based on a single input image, it provides a plausible hypothesis of how a specific eye will look several decades into the future or how it appeared in the past (see Figure 3). By comparison, population-based approaches are limited by the availability, frequency and time span of longitudinal data. In the UK Biobank dataset, follow-up OCT scans were acquired for fewer than 5% of the subjects and are dated only two to four years after the initial scan. Retinal layer orientation, image brightness and contrast are preserved in the counterfactual images, whereas they fluctuate in the follow-up scans of the UK Biobank dataset. This makes it possible to focus on subtle retinal changes that are difficult to appreciate in conventionally acquired time series. When visually inspecting the generated counterfactual time series, we found that increased age was associated with changes to several retinal layers, including the RNFL, photoreceptor layers and RPE. This agrees with previously reported findings (Leung et al.; Sung et al.; Ooto et al.; Gupta et al.; Myers et al.; Ko et al.; Chua et al.; Khawaja et al.) and is further analyzed in Section 3.3.
      Figure 3. The counterfactual GAN smoothly visualizes the process of healthy retinal aging on a subject-specific level. In each of the three representative examples, the first row presents the counterfactual time series as a function of age. The third row shows the two available real images from the UK Biobank dataset. The second and fourth rows depict the pixel-wise difference between each time series image and the base image. Red and blue denote image regions in which the counterfactual is brighter and darker, respectively.
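      Difference maps of this kind can be produced with a few lines of NumPy. The sketch below assumes a generator exposed as a plain Python callable `generator(image, age, male)`, which is a hypothetical interface, and uses a white-to-red/blue color map scaled symmetrically to the largest absolute difference, which is also an assumption. It does not reproduce the figure’s exact rendering.

```python
import numpy as np


def counterfactual_series(generator, image, ages, male):
    """Generate one counterfactual per target age and its difference to the input.

    generator: hypothetical callable (image, age, male) -> image of same shape.
    Returns (frames, diffs), each of shape (len(ages), H, W).
    """
    frames = np.stack([generator(image, age, male) for age in ages])
    diffs = frames - image[None]  # > 0: counterfactual brighter than the base image
    return frames, diffs


def diff_to_rgb(diff, scale=None):
    """Map a difference image to RGB: white = no change, red = brighter, blue = darker."""
    if scale is None:
        scale = float(np.abs(diff).max()) or 1.0
    d = np.clip(diff / scale, -1.0, 1.0)
    a = np.abs(d)
    rgb = np.empty(d.shape + (3,))
    rgb[..., 0] = np.where(d < 0, 1.0 - a, 1.0)  # red stays full unless darker
    rgb[..., 1] = 1.0 - a                        # green fades with any change
    rgb[..., 2] = np.where(d > 0, 1.0 - a, 1.0)  # blue stays full unless brighter
    return rgb
```

      Because every frame is derived from the same input image, orientation and acquisition settings cancel out in `diffs`, leaving only the modeled age effect.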
      The counterfactual GAN can also simulate how an individual’s eye would appear if the subject had been born as the opposite sex (see Figure 4). By its nature, this counterfactual scenario cannot be studied in population-based studies. When changing female eyes to a male appearance, we consistently observed that the foveal pit becomes slightly deeper and steeper, and in many cases the overall macular thickness increases. Conversely, when counterfactually converting male eyes to female, the foveal pit becomes shallower and the retina thinner.
      Figure 4. Six representative examples in which the counterfactual GAN alters a real retinal OCT scan (left image) to appear as if the subject had been born as the opposite sex (middle image). The pixel-wise difference between the two images is also shown (right image). In red areas the counterfactual is brighter than the base image, and in blue areas it is darker.

      3.2 Benchmarking of counterfactual OCT images

      The realism of the counterfactual OCT images was quantified in a visual Turing test. The mean accuracy across the five ophthalmologists was 76.6% ± 18.4% (see Figure 5). While this is substantially better than random choice (i.e. 50% accuracy), the ophthalmologists were not able to correctly distinguish whether an image was real or artificially generated in many cases. Two ophthalmologists achieved a substantially higher accuracy of 98% and 94%, respectively, by looking at the choroid and vitreous in the background of the images. They also relied on spotting pathological features as well as shadowing artifacts caused by blood vessels, as these would mostly occur in real OCT images. All ophthalmologists agreed that the counterfactual GAN produces samples with a highly realistic looking retinal layer structure, which we focus on in the remaining study.
      Figure 5. Benchmarking of the counterfactual OCT images. The visual Turing test assessed the images’ realism (leftmost graph). Referee neural networks determined the subject’s age and sex from the counterfactual images, and we measured whether their predictions agreed with the counterfactual query (middle two graphs). A third referee neural network matched counterfactual right eyes with real left eyes. We assessed whether the correct pairing is among the network’s top K guesses, as this indicates that the subject’s identity was preserved in the counterfactual images (rightmost graph).
      Next, we used the referee networks to predict age, sex and identity from the counterfactual images and measured whether these attributes matched the corresponding counterfactual queries. The age estimated by the age prediction network agreed with the counterfactual query with a mean absolute error of 4.2 years ± 0.4 years and was strongly correlated with it (Pearson’s R of 0.86 ± 0.02; see Figure 5). The sex classification network correctly predicted the counterfactual sex with an accuracy of 79.7% ± 5.8% and a ROCAUC of 0.92 ± 0.03 (see Figure 5). Finally, we tested whether identity was preserved in the counterfactual OCT images. We counterfactually increased the age in 528 OCT scans to match the subject’s age at the time of the follow-up scan. The identity matching network then compared the artificially aged right eyes with the real left eyes. In 65.8% ± 9.1% of cases, the referee network correctly matched a right eye to its corresponding left eye out of 1000 candidate eyes (top-1 accuracy; see Figure 5), and in 91.5% ± 3.3% of cases the correct eye was among its top ten guesses (top-10 accuracy). Even accounting for the residual error of the referee networks (see Section 2.4), these results quantitatively confirm that the counterfactual GAN faithfully simulates the effects of age and sex on the retina while preserving the identity of the subject.
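      The agreement measures reported above (mean absolute error, Pearson’s R, ROCAUC and top-K identity accuracy) can be computed as in this NumPy sketch. ROCAUC uses the rank-based Mann-Whitney formulation. For top-K, the sketch assumes a similarity matrix in which the true left eye for query i is candidate i, and counts ties in the query’s favor. The function names and the matrix layout are assumptions.

```python
import numpy as np


def mean_absolute_error(pred, target) -> float:
    return float(np.mean(np.abs(np.asarray(pred, float) - np.asarray(target, float))))


def pearson_r(x, y) -> float:
    return float(np.corrcoef(np.asarray(x, float), np.asarray(y, float))[0, 1])


def roc_auc(scores, labels) -> float:
    """Area under the ROC curve via the Mann-Whitney U statistic.

    scores: predicted probability of "male"; labels: True for male.
    Tied scores receive their average rank. Needs both classes present.
    """
    scores = np.asarray(scores, float)
    labels = np.asarray(labels, bool)
    _, inverse, counts = np.unique(scores, return_inverse=True, return_counts=True)
    last_rank = np.cumsum(counts)               # 1-based rank of each tie group's last member
    avg_rank = last_rank - (counts - 1) / 2.0   # mean rank within each tie group
    ranks = avg_rank[inverse.ravel()]
    n_pos = int(labels.sum())
    n_neg = labels.size - n_pos
    u = ranks[labels].sum() - n_pos * (n_pos + 1) / 2.0
    return float(u / (n_pos * n_neg))


def top_k_accuracy(similarity, k: int) -> float:
    """Fraction of queries whose true partner is among the k best candidates.

    similarity[i, j]: score between counterfactual right eye i and candidate
    left eye j; the true partner of query i is assumed to be candidate i.
    """
    sim = np.asarray(similarity, float)
    n = sim.shape[0]
    true_score = sim[np.arange(n), np.arange(n)]
    rank = (sim > true_score[:, None]).sum(axis=1)  # candidates scoring strictly higher
    return float(np.mean(rank < k))
```

      With 1000 candidates, random guessing would give a top-1 accuracy of about 0.1% and a top-10 accuracy of about 1%, which is the baseline against which the reported values can be read.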

      3.3 Retinal layer structure in counterfactual images

      We segmented and analyzed the RNFL, GCIPL, INL-RPE and RPE in the counterfactual images. Table 1 presents their mean thickness, the change per decade of aging and the effect of subject sex. The thickness of the RNFL, GCIPL and INL-RPE decreases as the counterfactual age increases, while the RPE thickens slightly with age. The RNFL and RPE are thicker in male than in female subjects. We also obtained the same set of measurements directly from the real OCT images that were used to train the counterfactual GAN, an approach similar to a conventional population-based study. The absolute thickness of the four retinal structures is very similar in the two approaches. The counterfactual GAN accurately modeled the impact of age and sex on the RNFL and RPE, while slightly underestimating the effects on the larger GCIPL and INL-RPE structures.
      Table 1. Average thickness, change per decade of aging and effect of male sex for four different retinal layer structures and five retinal subfields. In each case, we compare the findings obtained by analyzing the counterfactual images (“Counterfactual GAN”) and the real OCT images (“Population-based”). The average thickness is reported as mean ± standard deviation, while the changes caused by aging and sex are listed as average differences with 95% confidence intervals.
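Table 1 summarizes each layer by a change per decade of aging and an effect of male sex with 95% confidence intervals. This section does not state the statistical model behind those numbers; a common choice, and the assumption made in this sketch, is ordinary least squares with age in decades and a male indicator as covariates, applied alike to thicknesses measured in counterfactual or real images. The function name and the normal-approximation interval (1.96 standard errors) are illustrative choices.

```python
import numpy as np


def age_sex_effects(thickness_um, age_years, is_male):
    """Fit thickness = b0 + b_decade * (age / 10) + b_male * male + noise by OLS.

    Returns (estimate, lower, upper) tuples for the change per decade of age
    and for the effect of male sex, using a normal-approximation 95% CI.
    """
    y = np.asarray(thickness_um, dtype=float)
    decades = np.asarray(age_years, dtype=float) / 10.0
    male = np.asarray(is_male, dtype=float)
    # Design matrix: intercept, age in decades, male indicator.
    X = np.column_stack([np.ones_like(y), decades, male])
    beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    dof = y.size - X.shape[1]
    sigma2 = residuals @ residuals / dof
    # Classical OLS covariance of the coefficient estimates.
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    z = 1.96  # normal approximation, reasonable for large cohorts

    def _ci(i):
        return (float(beta[i]), float(beta[i] - z * se[i]), float(beta[i] + z * se[i]))

    return {"per_decade": _ci(1), "male": _ci(2)}
```

Because a counterfactual time series contains many images of the same eye, treating them as independent observations, as this naive OLS interval does, would likely understate the uncertainty; a per-eye or mixed-effects analysis would be more appropriate in that setting.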

      4. Discussion

      In this study, we have created a counterfactual GAN to investigate the individual course of retinal changes caused by healthy aging. Learning from a dataset with only one OCT image per subject, our machine learning algorithm is able to generate synthetic longitudinal time series. It can visualize how the retina may develop with age on a subject-specific level. This allows the study of subtle structural changes that occur over the course of decades and cannot be resolved in conventional population-based studies. The counterfactual GAN can also simulate how a given eye would look if the person had been born as the opposite sex, a scenario that cannot be studied in reality. In extensive benchmarking experiments, we confirmed that our tool creates realistic OCT images and faithfully models the influence of subject age, sex and identity on the retina.
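A synthetic longitudinal series of the kind described above can be assembled by repeatedly querying the generator with the same input image and sex while only the age changes, then measuring the layer of interest in each output. The sketch below assumes a generator callable `generate(image, age, sex)` and a thickness function `measure(image)`, both hypothetical stand-ins for the counterfactual GAN and the layer segmentation. It restricts queries to the 40 to 75 year range seen during training and labels each eye's trajectory from a least-squares slope, with a hypothetical tolerance of 0.1 μm per decade separating increase, decrease and stagnation.

```python
from typing import Any, Callable, Dict, Sequence

import numpy as np

# Ages covered by the training data (UK Biobank subjects aged 40 to 75).
TRAINING_AGE_RANGE = (40.0, 75.0)


def counterfactual_series(
    image: Any,
    sex: Any,
    ages: Sequence[float],
    generate: Callable[[Any, float, Any], Any],
    measure: Callable[[Any], float],
) -> Dict[float, float]:
    """Return {age: thickness} for counterfactual images of one eye.

    `generate` and `measure` are hypothetical stand-ins for the counterfactual
    GAN and a layer segmentation step; image and sex stay fixed throughout.
    """
    lo, hi = TRAINING_AGE_RANGE
    series: Dict[float, float] = {}
    for age in ages:
        # The generator has not seen ages outside the training range.
        if not lo <= age <= hi:
            raise ValueError(f"age {age} is outside the training range {lo}-{hi}")
        series[age] = float(measure(generate(image, age, sex)))
    return series


def classify_trend(series: Dict[float, float], tol_um_per_decade: float = 0.1) -> str:
    """Label a trajectory as 'increase', 'decrease' or 'stagnate'.

    Uses the least-squares slope of thickness against age, converted to
    micrometres per decade; the default tolerance is a hypothetical choice.
    """
    ages = np.array(sorted(series), dtype=float)
    values = np.array([series[a] for a in sorted(series)], dtype=float)
    slope_per_decade = 10.0 * np.polyfit(ages, values, deg=1)[0]
    if slope_per_decade > tol_um_per_decade:
        return "increase"
    if slope_per_decade < -tol_um_per_decade:
        return "decrease"
    return "stagnate"
```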
      Our results agree well with previous studies based on the same cohort from the UK Biobank population study (Ko et al. 2017; Chua et al. 2019; Khawaja et al. 2020).
      Khawaja et al. (2020) reported very similar absolute thicknesses and impact of aging for the RNFL and GCIPL.
      They found that both structures are thinner in male than in female subjects, whereas we found that sex has only a small impact on the GCIPL and the opposite effect on the RNFL, which is thicker in male subjects. The association between age and photoreceptor thickness was investigated by Chua et al. (2019).
      Our measured INL-RPE thickness was slightly larger than their reported value, but the observed relationship between thickness and age was the same in their study and ours. Ko et al. (2017) measured the retinal pigment epithelium-Bruch’s membrane complex. In both their study and ours, retinal pigment epithelium thickness was barely affected by subject sex. They found that the complex thins with increasing age in subjects aged 45 years or older, which we did not observe. These differences are potentially caused by our layer segmentation algorithm not fully outlining the fine Bruch’s membrane in the lower-resolution images. This hypothesis is consistent with the slightly thinner RPE structure we detect compared to their reported thicknesses. In all three comparisons, minor discrepancies may also result from our use of two-dimensional rather than three-dimensional images and from differences in the retinal layer segmentation algorithms.
      Previous work has explored the use of GANs for a range of tasks in the field of ophthalmology, such as image denoising, super-resolution and domain transfer (Costa et al. 2017; Halupka et al. 2018; Zhao et al. 2018; Huang et al. 2019).
      In these applications, GANs alter medical images to reflect an improved or functionally different image acquisition process; ideally, such transformations do not change any information related to the patient. In contrast, our study addresses the setting in which images are altered to reflect changes in the imaged subjects themselves while the acquisition settings are kept fixed. To our knowledge, only one other study has explored counterfactual image generation for biomarker discovery in ophthalmology: Narayanaswamy et al. (2020) proposed counterfactual synthesis of color fundus photographs to discover indicators of diabetic macular edema.

      They found that disease state is linked to the presence of exudates, a known biomarker for diabetic macular edema, as well as to a darkening of the foveal region, which is not currently used for clinical predictions. However, they did not quantitatively assess the quality of the counterfactual images, nor did they extract imaging biomarkers from them. Nonetheless, their study showcases an exciting application for our tool: modeling the effect of ocular disease on the eye.
      At the moment, our counterfactual GAN assumes that the eye’s appearance in OCT images is governed independently by the subject’s age, sex and identity. While we aimed to exclude all patients affected by serious eye disease, some eyes with early-stage disease may remain in the training dataset. The GAN may inadvertently learn to correlate these disease features with age or sex and alter them when generating counterfactual images. To avoid such artifacts, future work could develop a more sophisticated causal model and integrate it with a GAN (Pearl 2009; Pawlowski et al. 2020).

      Such a model could explicitly capture the relationship between the eye’s appearance and subject genotype, lifestyle or retinal disease. However, this requires the corresponding labels to be available in the dataset used to train the GAN. Furthermore, the algorithm cannot learn the relationship for groups of subjects that it has not seen in the dataset. For example, our GAN has been trained on subjects between the ages of 40 and 75 years. It is not able to model how the eye develops in children, young adults or individuals older than 75 years. Finally, the counterfactual GAN currently only generates two-dimensional OCT images. While generating volumetric images with GANs is challenging (Hong et al. 2021), future work should aim to increase the images’ dimensionality as well as their field of view and resolution.
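A structural causal model of the kind proposed above produces counterfactuals with Pearl's three steps: abduction (infer the subject-specific exogenous term from the observation), action (intervene on the attribute of interest) and prediction (recompute the outcome with the same exogenous term). The toy Python sketch below applies these steps to a single layer thickness in a linear additive-noise model. It is not the counterfactual GAN; the class name, the coefficients and the explicit disease variable are hypothetical, and for simplicity disease status is held fixed under the intervention rather than modeled as a consequence of age.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LinearThicknessSCM:
    """Toy additive-noise SCM for one retinal layer thickness (micrometres).

    thickness = intercept + per_decade * age / 10 + male_effect * male
                + disease_effect * disease + u
    The exogenous term u is eye-specific and plays the role of identity.
    All coefficients are hypothetical illustration values.
    """

    intercept: float
    per_decade: float
    male_effect: float
    disease_effect: float

    def mechanism(self, age: float, male: int, disease: int) -> float:
        return (self.intercept + self.per_decade * age / 10.0
                + self.male_effect * male + self.disease_effect * disease)

    def counterfactual(self, observed: float, age: float, male: int, disease: int,
                       new_age: Optional[float] = None,
                       new_male: Optional[int] = None) -> float:
        # 1. Abduction: recover the eye-specific term from the observation.
        u = observed - self.mechanism(age, male, disease)
        # 2. Action: intervene on age and/or sex; disease status stays fixed.
        age_cf = age if new_age is None else new_age
        male_cf = male if new_male is None else new_male
        # 3. Prediction: pass the same u through the intervened model.
        return self.mechanism(age_cf, male_cf, disease) + u
```

For example, with `LinearThicknessSCM(30.0, -0.5, 0.2, -2.0)`, an eye observed at 27.0 μm at age 50 with early-stage disease keeps its eye-specific offset of 1.5 μm, and its counterfactual thickness at age 70 is 26.0 μm: only the age term changes, while the explicitly modeled disease term is left untouched.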
      In conclusion, this study has demonstrated how counterfactual GANs can aid research into retinal aging by synthesizing high-resolution, high-fidelity OCT images and longitudinal time series. Ultimately, we envision that they will enable clinical experts to derive and explore hypotheses for potential imaging biomarkers for healthy and pathological aging that can be refined and tested in prospective clinical trials.


      References

        Stitt AW, Curtis TM, Chen M, et al. The progress in understanding and treatment of diabetic retinopathy. Progress in Retinal and Eye Research. 2016;51:156–186.
        Mitchell P, Liew G, Gopinath B, Wong TY. Age-related macular degeneration. The Lancet. 2018;392:1147–1159.
        Adhi M, Duker JS. Optical coherence tomography–current and future applications. Current Opinion in Ophthalmology. 2013;24:213.
        Gao H, Hollyfield J. Aging of the human retina. Differential loss of neurons and retinal pigment epithelial cells. Investigative Ophthalmology & Visual Science. 1992;33:1–17.
        Leung CKs, Cheung CYl, Weinreb RN, et al. Retinal nerve fiber layer imaging with spectral-domain optical coherence tomography: a variability and diagnostic performance study. Ophthalmology. 2009;116:1257–1263.
        Sung KR, Wollstein G, Bilonick RA, et al. Effects of age on optical coherence tomography measurements of healthy retinal nerve fiber layer, macula, and optic nerve head. Ophthalmology. 2009;116:1119–1124.
        Mwanza JC, Durbin MK, Budenz DL, et al. Profile and predictors of normal ganglion cell–inner plexiform layer thickness measured with frequency-domain optical coherence tomography. Investigative Ophthalmology & Visual Science. 2011;52:7872–7879.
        Ooto S, Hangai M, Tomidokoro A, et al. Effects of age, sex, and axial length on the three-dimensional profile of normal macular layer structures. Investigative Ophthalmology & Visual Science. 2011;52:8769–8779.
        Koh VT, Tham YC, Cheung CY, et al. Determinants of ganglion cell–inner plexiform layer thickness measured by high-definition optical coherence tomography. Investigative Ophthalmology & Visual Science. 2012;53:5853–5859.
        Gupta P, Sidhartha E, Tham YC, et al. Determinants of macular thickness using spectral domain optical coherence tomography in healthy eyes: the Singapore Chinese Eye study. Investigative Ophthalmology & Visual Science. 2013;54:7968–7976.
        Myers CE, Klein BE, Meuer SM, et al. Retinal thickness measured by spectral-domain optical coherence tomography in eyes without retinal abnormalities: the Beaver Dam Eye Study. American Journal of Ophthalmology. 2015;159:445–456.
        Ko F, Foster PJ, Strouthidis NG, et al. Associations with retinal pigment epithelium thickness measures in a large cohort: results from the UK Biobank. Ophthalmology. 2017;124:105–117.
        Chua SY, Dhillon B, Aslam T, et al. Associations with photoreceptor thickness measures in the UK Biobank. Scientific Reports. 2019;9:1–14.
        Khawaja AP, Chua S, Hysi PG, et al. Comparison of associations with different macular inner retinal thickness parameters in a large cohort: the UK Biobank. Ophthalmology. 2020;127:62–71.
        Schmidt-Erfurth U, Sadeghipour A, Gerendas BS, Waldstein SM, Bogunovic H. Artificial intelligence in retina. Progress in Retinal and Eye Research. 2018;67:1–29.
        Sudlow C, Gallacher J, Allen N, et al. UK Biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Medicine. 2015;12:e1001779.
        Keane PA, Grossi CM, Foster PJ, et al. Optical coherence tomography in the UK Biobank study–rapid automated analysis of retinal thickness for large population-based studies. PLoS One. 2016;11:e0164095.
        Patel PJ, Foster PJ, Grossi CM, et al. Spectral-domain optical coherence tomography imaging in 67 321 adults: associations with macular thickness in the UK Biobank Study. Ophthalmology. 2016;123:829–840.
        Stein D, Ishikawa H, Hariprasad R, et al. A new quality assessment parameter for optical coherence tomography. British Journal of Ophthalmology. 2006;90:186–190.
        Li K, Wu X, Chen DZ, Sonka M. Optimal surface segmentation in volumetric images–a graph-theoretic approach. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2005;28:119–134.
        Garvin MK, Abràmoff MD, Wu X, Russell SR, Burns TL, Sonka M. Automated 3-D intraretinal layer segmentation of macular spectral-domain optical coherence tomography images. IEEE Transactions on Medical Imaging. 2009;28:1436–1447.
        Abràmoff MD, Garvin MK, Sonka M. Retinal imaging and image analysis. IEEE Reviews in Biomedical Engineering. 2010;3:169–208.
        Goodfellow I, Pouget-Abadie J, Mirza M, et al. Generative adversarial nets. Advances in Neural Information Processing Systems. 2014;27.
        Choi Y, Choi M, Kim M, Ha JW, Kim S, Choo J. StarGAN: unified generative adversarial networks for multi-domain image-to-image translation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018:8789–8797.
        He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016:770–778.
        Shah A, Zhou L, Abràmoff MD, Wu X. Multiple surface segmentation using convolution neural nets: application to retinal layer segmentation in OCT images. Biomedical Optics Express. 2018;9:4509–4526.
        Costa P, Galdran A, Meyer MI, et al. End-to-end adversarial retinal image synthesis. IEEE Transactions on Medical Imaging. 2017;37:781–791.
        Halupka KJ, Antony BJ, Lee MH, et al. Retinal optical coherence tomography image enhancement via deep learning. Biomedical Optics Express. 2018;9:6205–6221.
        Zhao H, Li H, Maurer-Stroh S, Cheng L. Synthesizing retinal and neuronal images with generative adversarial nets. Medical Image Analysis. 2018;49:14–26.
        Huang Y, Lu Z, Shao Z, et al. Simultaneous denoising and super-resolution of optical coherence tomography images based on generative adversarial network. Optics Express. 2019;27:12289–12307.
        Narayanaswamy A, Venugopalan S, Webster DR, et al. Scientific discovery by generating counterfactuals using image translation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer; 2020:273–283.
        Pearl J. Causality. Cambridge University Press; 2009.
        Pawlowski N, Castro D, Glocker B. Deep structural causal models for tractable counterfactual inference. Advances in Neural Information Processing Systems. 2020;33.
        Hong S, Marinescu R, Dalca AV, et al. 3D-StyleGAN: a style-based generative adversarial network for generative modeling of three-dimensional medical images. In: Deep Generative Models, and Data Augmentation, Labelling, and Imperfections. Springer; 2021:24–34.