Using human gene expression profiles to predict longevity
Posted May 20 2009 9:28am
Individuals of the same species age at different rates, and these differences should be reflected in their gene expression profiles. However, most microarray studies of aging are designed only to capture the gene changes that occur with age in a “typical” individual and (with rare exceptions) ignore individual variability – all animals of a given age are lumped together into a group, and different age groups are compared.
To study how gene changes are related to individual longevity, we need another type of data in addition to gene expression profiles: the survival time of individual animals after their gene expression is measured. With this information, we could determine which transcriptional responses are associated with a longer lifespan, and in principle even develop a personalized medicine approach to aging: we could train a machine learning algorithm to peek at the expression levels of a handful of crucial genes and predict your physiological age – and the number of healthy years you have left.
Previous microarray studies of aging animals didn’t include survival times because the animals were sacrificed at the time of sample collection (in order to get enough RNA), and studies of aging humans haven’t included survival times because we live too long.
Recently, some human survival data – together with matching gene expression data from lymphoblastoid cell lines – have become available from a long-term study that began in the early 1980s. In the first aging study to take advantage of this resource, Kerber et al. mine the data to identify gene changes associated with longevity:
Gene Expression Profiles Associated with Aging and Mortality in Humans
We investigated the hypothesis that gene expression profiles in cultured cell lines from adults, aged 57-97 years, contain information about the biological age and potential longevity of the donors. We studied 104 unrelated grandparents from 31 Utah CEU (Centre d’Etude du Polymorphisme Humain – Utah) families, for whom lymphoblastoid cell lines were established in the 1980s. Combining publicly available gene expression data from these cell lines, and survival data from the Utah Population Database, we tested the relationship between expression of 2,151 always-expressed genes, age, and survival of the donors. Approximately 16% of 2,151 expression levels were associated with donor age: 10% decreased in expression with age, and 6% increased with age. CDC42 and CORO1A exhibited strong associations both with age at draw and survival after draw, (multiple comparisons-adjusted Monte Carlo p-value < 0.05). In general, gene expressions that increased with age were associated with increased mortality. Gene expressions that decreased with age were generally associated with reduced mortality. A multivariate estimate of biological age modeled from expression data was dominated by CDC42 expression, and was a significant predictor of survival after blood draw. A multivariate model of survival as a function of gene expression was dominated by CORO1A expression. This model accounted for approximately 23% of the variation in survival among the CEU grandparents. Some expression levels were negligibly associated with age in this cross-sectional dataset, but strongly associated with inter-individual differences in survival. These observations may lead to new insights regarding the genetic contribution to exceptional longevity.
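The abstract does not say how its multiple-comparisons-adjusted Monte Carlo p-values were computed. One standard way to get such values is a single-step max-T permutation test, sketched here in Python under that assumption: shuffle donor ages, record the strongest gene-age correlation across all genes in each shuffle, and compare each gene's observed correlation with that distribution of maxima. The functions `pearson_r` and `maxt_adjusted_pvalues` and the synthetic data are hypothetical; this is not the authors' code or data.

```python
import random
import statistics

def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def maxt_adjusted_pvalues(age, genes, n_perm=500, seed=0):
    """Single-step max-T permutation p-values for |corr(age, gene)|, one per gene.

    Each permutation shuffles age across donors (each donor's expression profile
    stays intact, so gene-gene correlations are preserved) and records the largest
    |r| over all genes. A gene's adjusted p-value is the fraction of permutations
    whose maximum reaches that gene's observed |r|.
    """
    rng = random.Random(seed)
    observed = [abs(pearson_r(age, g)) for g in genes]
    exceed = [0] * len(genes)
    perm = list(age)
    for _ in range(n_perm):
        rng.shuffle(perm)
        max_r = max(abs(pearson_r(perm, g)) for g in genes)
        for j, obs in enumerate(observed):
            if max_r >= obs:
                exceed[j] += 1
    # The +1 terms count the observed (unshuffled) labelling as one permutation.
    return [(e + 1) / (n_perm + 1) for e in exceed]

# Hypothetical synthetic data: 50 donors, gene 0 tracks age, genes 1-19 are noise.
rng = random.Random(3)
age = [rng.uniform(57, 97) for _ in range(50)]
genes = [[0.1 * a + rng.gauss(0, 1) for a in age]]
genes += [[rng.gauss(0, 1) for _ in age] for _ in range(19)]

pvals = maxt_adjusted_pvalues(age, genes)
print([round(p, 3) for p in pvals[:5]])
```

Because the maximum is taken over all genes in each shuffle, a gene has to beat the strongest chance correlation anywhere in the array, which is what keeps the family-wise error rate in check across many simultaneous tests.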
The novel aspect of this study was the integration of gene expression and survival data to identify genes associated with longevity; the authors also identified genes associated with chronological age using both univariate and multivariate models.
A brief summary of some of their major findings:
A six-gene model accounts for 23% of the variation in survival time
The authors trained a penalized regression model to predict survival time on the basis of the expression levels of roughly 2,000 genes. After training, only six genes had non-zero model coefficients: CORO1A, FXR2, CBX5, PIK3CA, AKAP2, and CUL3. The model was dominated by the expression levels of CORO1A (which is negatively associated with mortality) and FXR2 (which is positively associated with mortality). CORO1A has been implicated in mitochondrial apoptosis, and FXR2 is involved in Fragile X syndrome; the exact role of these two genes in aging has yet to be determined.
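The post does not say which penalty the authors used. To illustrate how a penalized fit can leave only a handful of genes with non-zero coefficients, here is a minimal Python sketch of an L1 (lasso) penalty fitted by coordinate descent. It assumes a continuous, uncensored outcome and standardized expression values; the authors modeled survival, so this is a simplification, and `lasso`, `soft_threshold`, `standardize_columns`, and the synthetic data are all hypothetical.

```python
import random

def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; anything inside [-gamma, gamma] becomes exactly 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0

def standardize_columns(X):
    """Return a copy of X whose columns have mean 0 and mean square 1."""
    n, p = len(X), len(X[0])
    out = [row[:] for row in X]
    for j in range(p):
        mean = sum(row[j] for row in X) / n
        sd = (sum((row[j] - mean) ** 2 for row in X) / n) ** 0.5
        for i in range(n):
            out[i][j] = (X[i][j] - mean) / sd
    return out

def lasso(X, y, lam, n_iter=100):
    """Minimize (1/2n)*||y - Xb||^2 + lam*||b||_1 by cyclic coordinate descent.

    Assumes X has standardized columns and y is centered, so no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = list(y)  # y - X b, with b = 0 at the start
    for _ in range(n_iter):
        for j in range(p):
            # Inner product of gene j with the residual, adding back gene j's own contribution.
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + b[j]
            new_bj = soft_threshold(rho, lam)
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                b[j] = new_bj
    return b

# Hypothetical synthetic data: 100 "donors", 50 "genes", outcome driven by genes 0 and 1 only.
rng = random.Random(0)
n_donors, n_genes = 100, 50
X = standardize_columns([[rng.gauss(0, 1) for _ in range(n_genes)] for _ in range(n_donors)])
y = [2.0 * row[0] - 1.5 * row[1] + rng.gauss(0, 0.5) for row in X]
y_mean = sum(y) / n_donors
y = [v - y_mean for v in y]

coefs = lasso(X, y, lam=0.2)
selected = [j for j, c in enumerate(coefs) if c != 0.0]
print("genes with non-zero coefficients:", selected)
```

The soft-thresholding step is what produces exact zeros: any gene whose inner product with the current residual stays below `lam` in magnitude is dropped from the model, so raising `lam` shrinks the selected set.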
Genes associated with age are not necessarily associated with survival (and vice versa)
The authors used linear regression to identify individual gene changes associated with chronological age, and a proportional hazards model to identify changes associated with survival. Of the top 10 genes identified by each test, only one (CORO1A) appears on both lists; i.e., genes that are strongly associated with chronological age are not necessarily strongly associated with survival. This is an important point: it means that in order to identify gene expression biomarkers of physiological age and longevity, we need more microarray studies that report survival data.
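To see how two screens can return different top-ranked genes, this Python sketch runs a per-gene least-squares screen against age and a second screen against survival time, then intersects the two top-k lists. It uses synthetic data in which the age-related and survival-related genes are deliberately distinct. The survival screen is an uncensored linear-regression stand-in, not the proportional hazards model the authors used, and all names and data are hypothetical.

```python
import random

def slope_t_stat(x, y):
    """Fit y = a + b*x by least squares and return the t-statistic of the slope b."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    se_b = (rss / (n - 2) / sxx) ** 0.5
    return b / se_b

def top_k(scores, k):
    """Return the set of indices with the k largest |score| values."""
    ranked = sorted(range(len(scores)), key=lambda j: abs(scores[j]), reverse=True)
    return set(ranked[:k])

# Hypothetical synthetic data: 300 donors, 30 genes.
# Genes 0-9 drift with age, genes 10-14 drive survival, the rest are pure noise.
rng = random.Random(1)
n_donors, n_genes = 300, 30
ages = [rng.uniform(57, 97) for _ in range(n_donors)]
expr = [[rng.gauss(0, 1) + (0.08 * (age - 77) if j < 10 else 0.0) for j in range(n_genes)]
        for age in ages]
# Uncensored stand-in for years survived after the blood draw.
survival = [20 + sum(row[10:15]) + rng.gauss(0, 0.5) for row in expr]

# Screen 1: expression regressed on age. Screen 2: survival regressed on expression.
age_t = [slope_t_stat(ages, [row[j] for row in expr]) for j in range(n_genes)]
surv_t = [slope_t_stat([row[j] for row in expr], survival) for j in range(n_genes)]

age_top, surv_top = top_k(age_t, 5), top_k(surv_t, 5)
print("top age genes:", sorted(age_top))
print("top survival genes:", sorted(surv_top))
print("on both lists:", sorted(age_top & surv_top))
```

In the real dataset one gene, CORO1A, did land on both lists; the sketch only shows that nothing forces the two rankings to agree.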
Looking at expression data alone, it is difficult to tell which of the many age-related gene changes are good and which are bad, i.e., whether a given gene change causes a problem associated with aging or is part of some beneficial damage-control response (an issue we previously discussed in the context of gender differences in brain aging). With survival data, we can now ask a specific question of each gene: is its age-related response associated with increased or with reduced mortality? For nine of the ten genes most strongly related to survival in this study, relative overexpression was associated with reduced mortality. This strongly suggests that those genes (including CORO1A) are doing something good, i.e., that they are involved in some sort of defense or repair mechanism.
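In a proportional hazards model the hazard is modeled as h(t | x) = h0(t) exp(beta x), so the sign of a gene's coefficient answers the question above: beta < 0 (hazard ratio exp(beta) < 1) means higher expression goes with reduced mortality, and beta > 0 with increased mortality. Here is a minimal univariate Cox fit in Python by Newton-Raphson on the partial likelihood. It is applied to synthetic, partly censored data generated with an assumed beta of -0.7; `cox_univariate` and the data are hypothetical and are not the authors' implementation.

```python
import math
import random

def cox_univariate(times, events, x, max_iter=50, tol=1e-10):
    """Fit h(t | x) = h0(t) * exp(beta * x) by Newton-Raphson on the Cox partial likelihood.

    Ties in event times are handled with the Breslow approximation (every subject with
    time >= t_i is in the risk set of event i). Returns (beta, standard_error), with the
    standard error taken from the observed information computed in the final iteration.
    """
    n = len(times)
    beta, info = 0.0, 0.0
    for _ in range(max_iter):
        score, info = 0.0, 0.0
        for i in range(n):
            if not events[i]:
                continue  # censored subjects contribute only through risk sets
            s0 = s1 = s2 = 0.0
            for j in range(n):
                if times[j] >= times[i]:
                    w = math.exp(beta * x[j])
                    s0 += w
                    s1 += w * x[j]
                    s2 += w * x[j] * x[j]
            mean = s1 / s0
            score += x[i] - mean            # gradient of the log partial likelihood
            info += s2 / s0 - mean * mean   # negative second derivative
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta, 1.0 / math.sqrt(info)

# Hypothetical synthetic data: 200 donors, standardized expression x,
# exponential event times with rate exp(true_beta * x), random censoring.
rng = random.Random(2)
n = 200
x = [rng.gauss(0, 1) for _ in range(n)]
true_beta = -0.7  # assumed: higher expression -> lower hazard
event_times = [rng.expovariate(math.exp(true_beta * xi)) for xi in x]
censor_times = [rng.expovariate(0.3) for _ in range(n)]
times = [min(t, c) for t, c in zip(event_times, censor_times)]
events = [t <= c for t, c in zip(event_times, censor_times)]

beta, se = cox_univariate(times, events, x)
hazard_ratio = math.exp(beta)
direction = "reduced" if beta < 0 else "increased"
print(f"beta = {beta:.2f} (SE {se:.2f}), HR per unit = {hazard_ratio:.2f}: {direction} mortality")
```

Censored donors (still alive at last follow-up) enter only through the risk sets, which is why a survival model rather than plain regression on survival time is needed when follow-up is incomplete.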
The expression dataset used by the authors of this study is publicly available through GEO: GSE1485, GSE2552.