Survival analysis

Survival analysis is a branch of statistics which deals with death in biological organisms and failure in mechanical systems. This topic is called reliability theory or reliability analysis in engineering, and duration analysis or duration modeling in economics or sociology. More generally, survival analysis involves the modeling of time to event data; in this context, death or failure is considered an "event" in the survival analysis literature – traditionally only a single event occurs, after which the organism or mechanism is dead or broken.

More recently, many concepts in survival analysis have been explained by counting process theory, which adds flexibility in that it allows the modeling of multiple (or recurrent) events. This type of modeling fits many situations in which the event is significant but does not end the lifespan of the subject: for example, people can go to jail multiple times, alcoholics can start and stop drinking multiple times, and people can get married and divorced multiple times.

Survival analysis attempts to answer questions such as: what is the fraction of a population which will survive past a certain time? Of those that survive, at what rate will they die or fail? Can multiple causes of death or failure be taken into account? How do particular circumstances or characteristics increase or decrease the odds of survival?

To answer such questions, it is necessary to define "lifetime". In the case of biological survival, death is unambiguous, but for mechanical reliability, failure may not be well-defined, for there may well be mechanical systems in which failure is partial, a matter of degree, or not otherwise localized in time. Even in biological problems, some events (for example, heart attack or other organ failure) may have the same ambiguity. The theory outlined below assumes well-defined events at specific times; other cases may be better treated by models which explicitly account for ambiguous events.

The theory of survival presented here also assumes that death or failure happens just once for each subject. Recurring event or repeated event models relax that assumption. The study of recurring events is relevant in systems reliability, and in many areas of social sciences and medical research.

This article is phrased primarily in terms of biological survival, but this is just for convenience. An equivalent formulation in terms of mechanical failure can be made by replacing every occurrence of death with failure.

1 General formulation
2 Censoring
3 Fitting parameters to data
4 Non-parametric estimation
5 Distributions used in survival analysis
6 See also
7 References
8 External links

General formulation

Survival function

The object of primary interest is the survival function, conventionally denoted S, which is defined as

$S(t) = \Pr(T > t)$

where t is some time, T is a random variable denoting the time of death, and "Pr" stands for probability. That is, the survival function is the probability that the time of death is later than some specified time t. The survival function is also called the survivor function or survivorship function in problems of biological survival, and the reliability function in mechanical survival problems. In the latter case, the reliability function is denoted R(t).

Usually one assumes S(0) = 1, although it could be less than 1 if there is the possibility of immediate death or failure.

The survival function must be non-increasing: S(u) ≤ S(t) if u ≥ t. This property follows directly from F(t) = 1 - S(t) being the integral of a non-negative function. This reflects the notion that survival to a later age is only possible if all younger ages are attained. Given this property, the lifetime distribution function and event density (F and f below) are well-defined.

The survival function is usually assumed to approach zero as age increases without bound, i.e., S(t) → 0 as t → ∞, although the limit could be greater than zero if eternal life is possible. For instance, we could apply survival analysis to a mixture of stable and unstable carbon isotopes; unstable isotopes would decay sooner or later, but the stable isotopes would last indefinitely.
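The isotope example can be made concrete with a minimal sketch. It assumes, purely for illustration, that a fraction of the atoms is stable and that the unstable atoms have exponentially distributed lifetimes; the function name `mixture_survival` and the numeric values are hypothetical and not part of the original text.

```python
import math

def mixture_survival(t, stable_fraction, decay_rate):
    """S(t) for a mix of stable atoms (which never decay) and unstable atoms
    whose lifetimes are assumed exponential with the given decay rate."""
    if t < 0:
        raise ValueError("t must be non-negative")
    return stable_fraction + (1.0 - stable_fraction) * math.exp(-decay_rate * t)

# Hypothetical values chosen only for illustration.
times = [0.0, 1.0, 5.0, 20.0, 100.0]
values = [mixture_survival(t, stable_fraction=0.3, decay_rate=0.5) for t in times]

for t, s in zip(times, values):
    print(f"S({t:g}) = {s:.6f}")
```

Under these assumptions S(0) = 1, the values never increase, and S(t) levels off at the stable fraction rather than at zero.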

Lifetime distribution function and event density

Related quantities are defined in terms of the survival function.

The lifetime distribution function, conventionally denoted F, is defined as the complement of the survival function,

$F(t) = \Pr(T \le t) = 1 - S(t)$

and the derivative of F, which is the density function of the lifetime distribution, is conventionally denoted f,

$f(t) = F'(t) = \frac{d}{dt} F(t).$

The function f is sometimes called the event density; it is the rate of death or failure events per unit time.

The survival function is often defined in terms of distribution and density functions

$S(t) = \Pr(T > t) = \int_t^{\infty} f(u)\,du = 1-F(t).$

Similarly, a survival event density function can be defined as

$s(t) = S'(t) = \frac{d}{dt} S(t) = \frac{d}{dt} \int_t^{\infty} f(u)\,du = \frac{d}{dt} [1-F(t)] = -f(t).$
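The relations f(t) = F'(t) and S'(t) = -f(t) can be checked numerically. The sketch below assumes a Weibull lifetime with hypothetical shape and scale parameters and uses a central finite difference as a stand-in for the derivative; it is an illustration, not a general-purpose routine.

```python
import math

SHAPE, SCALE = 1.5, 2.0  # hypothetical Weibull parameters, for illustration only

def survival(t):
    """S(t) for the assumed Weibull lifetime."""
    return math.exp(-((t / SCALE) ** SHAPE))

def distribution(t):
    """F(t) = 1 - S(t)."""
    return 1.0 - survival(t)

def density(t):
    """f(t), the closed-form Weibull density."""
    return (SHAPE / SCALE) * (t / SCALE) ** (SHAPE - 1) * survival(t)

def central_difference(func, t, h=1e-5):
    """Approximate the derivative of func at t."""
    return (func(t + h) - func(t - h)) / (2 * h)

t = 1.3
dF = central_difference(distribution, t)
dS = central_difference(survival, t)
print(density(t), dF, -dS)  # the three numbers should agree closely
```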

Hazard function and cumulative hazard function

The hazard function, conventionally denoted $\lambda$, is defined as the event rate at time t conditional on survival until time t or later (that is, T ≥ t),

$\lambda(t)\,dt = \Pr(t \leq T < t+dt\,|\,T \geq t) = \frac{f(t)\,dt}{S(t)} = -\frac{S'(t)\,dt}{S(t)}.$

Force of mortality is a synonym of hazard function which is used particularly in demography and actuarial science, where it is denoted by $\mu$. The term hazard rate is another synonym.

The hazard function must be non-negative, λ(t) ≥ 0, and its integral over $[0, \infty)$ must be infinite (so that S(t) tends to zero), but it is not otherwise constrained; it may be increasing or decreasing, non-monotonic, or discontinuous. An example is the bathtub curve hazard function, which is large for small values of t, decreasing to some minimum, and thereafter increasing again; this can model the property of some mechanical systems to either fail soon after operation, or much later, as the system ages.

The hazard function can alternatively be represented in terms of the cumulative hazard function, conventionally denoted $\Lambda$:

$\,\Lambda(t) = -\log S(t)$

so transposing signs and exponentiating

$\,S(t) = \exp(-\Lambda(t))$

or differentiating (with the chain rule)

$\frac{d}{dt} \Lambda(t) = -\frac{S'(t)}{S(t)} = \lambda(t).$

The name "cumulative hazard function" is derived from the fact that

$\Lambda(t) = \int_0^{t} \lambda(u)\,du$

which is the "accumulation" of the hazard over time.

From the definition of $\Lambda(t)$, we see that it increases without bound as t tends to infinity (assuming that S(t) tends to zero). This implies that $\lambda(t)$ must not decrease too quickly, since, by definition, the cumulative hazard has to diverge. For example, $\exp(-t)$ is not the hazard function of any survival distribution, because its integral over $[0, \infty)$ converges to 1.
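A short numerical sketch may help here. It assumes a Weibull hazard with hypothetical parameters, integrates it with a simple midpoint rule, and compares the result with -log S(t). It then integrates exp(-u) as a candidate hazard to show that the implied survival function would level off near exp(-1) instead of reaching zero. The helper `integrate` is an illustrative stand-in, not a library function.

```python
import math

def integrate(func, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    width = (b - a) / steps
    return width * sum(func(a + (i + 0.5) * width) for i in range(steps))

SHAPE, SCALE = 1.5, 2.0  # hypothetical Weibull parameters

def hazard(t):
    """Weibull hazard: lambda(t) = (k/s) * (t/s)**(k-1)."""
    return (SHAPE / SCALE) * (t / SCALE) ** (SHAPE - 1)

def survival(t):
    """Weibull survival function S(t)."""
    return math.exp(-((t / SCALE) ** SHAPE))

t = 3.0
cumulative = integrate(hazard, 0.0, t)        # Lambda(t) by accumulation
print(cumulative, -math.log(survival(t)))     # should agree closely

# exp(-u) as a candidate hazard: its integral stays bounded near 1,
# so exp(-Lambda) would not tend to zero.
bounded = integrate(lambda u: math.exp(-u), 0.0, 50.0)
print(bounded, math.exp(-bounded))
```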

Quantities derived from the survival distribution

Future lifetime at a given time $t_0$ is the time remaining until death, given survival to age $t_0$. Thus, it is $T - t_0$ in the present notation. The expected future lifetime is the expected value of future lifetime. The probability of death at or before age $t_0 + t$, given survival until age $t_0$, is just

$P(T \le t_0 + t | T > t_0) = \frac{P(t_0 < T \le t_0 + t)}{P(T > t_0)} = \frac{F(t_0 + t) - F(t_0)}{S(t_0)}.$

Therefore the probability density of future lifetime is

$\frac{d}{dt}\frac{F(t_0 + t) - F(t_0)}{S(t_0)} = \frac{f(t_0 + t)}{S(t_0)}$

and the expected future lifetime is

$\frac{1}{S(t_0)} \int_0^{\infty} t\,f(t+t_0)\,dt = \frac{1}{S(t_0)} \int_{t_0}^{\infty} S(t)\,dt,$

where the second expression is obtained using integration by parts.

For $t_0 = 0$, that is, at birth, this reduces to the expected lifetime.

In reliability problems, the expected lifetime is called the mean time to failure, and the expected future lifetime is called the mean residual lifetime.
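The two expressions for the expected future lifetime can be compared numerically. The sketch below assumes an exponential lifetime with a hypothetical rate, for which the mean residual lifetime is known to equal 1/rate at every age; the integrals to infinity are truncated at a finite horizon, which is an approximation.

```python
import math

RATE = 0.5  # hypothetical exponential rate, for illustration only

def survival(t):
    return math.exp(-RATE * t)

def density(t):
    return RATE * math.exp(-RATE * t)

def integrate(func, a, b, steps=200_000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    width = (b - a) / steps
    return width * sum(func(a + (i + 0.5) * width) for i in range(steps))

def mean_residual_from_density(t0, horizon=80.0):
    """(1/S(t0)) * integral of t * f(t + t0) dt, truncated at the horizon."""
    return integrate(lambda t: t * density(t + t0), 0.0, horizon) / survival(t0)

def mean_residual_from_survival(t0, horizon=80.0):
    """(1/S(t0)) * integral of S(t) dt from t0, truncated at t0 + horizon."""
    return integrate(survival, t0, t0 + horizon) / survival(t0)

t0 = 2.0
from_density = mean_residual_from_density(t0)
from_survival = mean_residual_from_survival(t0)
print(from_density, from_survival, 1.0 / RATE)
```

For the assumed exponential model both values should be close to 1/RATE whatever t0 is chosen.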

As the probability of an individual surviving until age t or later is S(t), by definition, the expected number of survivors at age t out of an initial population of n newborns is n × S(t), assuming the same survival function for all individuals. Thus the expected proportion of survivors is S(t). If the survival of different individuals is independent, the number of survivors at age t has a binomial distribution with parameters n and S(t), and the variance of the proportion of survivors is S(t) × (1-S(t))/n.
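As a small check of the binomial statement, the sketch below computes n × S(t) and S(t) × (1 - S(t))/n directly and compares them with moments obtained by summing over the binomial probability mass function. The values of n and S(t) are hypothetical.

```python
import math

def survivor_summary(n, s):
    """Expected number of survivors and variance of the surviving proportion."""
    return n * s, s * (1.0 - s) / n

def binomial_moments(n, s):
    """Mean and variance of the survivor count, summed from the binomial pmf."""
    pmf = [math.comb(n, k) * s**k * (1.0 - s) ** (n - k) for k in range(n + 1)]
    mean = sum(k * p for k, p in enumerate(pmf))
    variance = sum((k - mean) ** 2 * p for k, p in enumerate(pmf))
    return mean, variance

n, s = 50, 0.8  # hypothetical cohort size and value of S(t)
expected_count, proportion_variance = survivor_summary(n, s)
mean_count, count_variance = binomial_moments(n, s)
print(expected_count, mean_count)
print(proportion_variance, count_variance / n**2)
```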

The age at which a specified proportion of survivors remain can be found by solving the equation S(t) = q for t, where q is the quantile in question. Typically one is interested in the median lifetime, for which q = 1/2, or other quantiles such as q = 0.90 or q = 0.99.
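Solving S(t) = q rarely needs more than a bracketing search. The following is a minimal sketch using bisection; it assumes that S is continuous and non-increasing with S(0) = 1 and S(t) → 0, so that a solution exists. The function name `lifetime_quantile` and the exponential example are illustrative assumptions.

```python
import math

def lifetime_quantile(survival, q, upper=1.0, tol=1e-10):
    """Find t with survival(t) = q by bisection.

    Assumes survival is continuous, non-increasing, and tends to zero;
    otherwise the bracket expansion below may never terminate.
    """
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    while survival(upper) > q:      # grow the bracket until it contains the root
        upper *= 2.0
    lower = 0.0
    while upper - lower > tol:
        mid = (lower + upper) / 2.0
        if survival(mid) > q:
            lower = mid             # still too many survivors: move right
        else:
            upper = mid
    return (lower + upper) / 2.0

RATE = 0.5  # hypothetical exponential rate
median = lifetime_quantile(lambda t: math.exp(-RATE * t), 0.5)
print(median, math.log(2.0) / RATE)  # the two values should agree closely
```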

One can also make more complex inferences from the survival distribution. In mechanical reliability problems, one can bring cost (or, more generally, utility) into consideration, and thus solve problems concerning repair or replacement. This leads to the study of renewal theory and reliability theory of aging and longevity.

Censoring

Censoring is a form of missing data problem which is common in survival analysis. Ideally, both the birth and death dates of a subject are known, in which case the lifetime is known.

If it is known only that the date of death is after some date, this is called right censoring. Right censoring will occur for those subjects whose birth date is known but who are still alive when they are lost to follow-up or when the study ends.

If a subject's lifetime is known to be less than a certain duration, the lifetime is said to be left-censored. Left-truncated data is common in actuarial work for life insurance and pensions (Richards, 2010).

It may also happen that subjects with a lifetime less than some threshold may not be observed at all: this is called truncation. Note that truncation is different from left censoring, since for a left censored datum, we know the subject exists, but for a truncated datum, we may be completely unaware of the subject. Truncation is also common. In a so-called delayed entry study, subjects are not observed at all until they have reached a certain age. For example, people may not be observed until they have reached the age to enter school. Any deceased subjects in the pre-school age group would be unknown.

We generally encounter right-censored data. Left-censored data can occur when a person's survival time becomes incomplete on the left side of the follow-up period for the person. As an example, we may follow up a patient for any infectious disorder from the time of his or her being tested positive for the infection. We may never know the exact time of exposure to the infectious agent.^[1]

Fitting parameters to data

Survival models can be usefully viewed as ordinary regression models in which the response variable is time. However, computing the likelihood function (needed for fitting parameters or making other kinds of inferences) is complicated by the censoring. The likelihood function for a survival model, in the presence of censored data, is formulated as follows. By definition the likelihood function is the conditional probability of the data given the parameters of the model. It is customary to assume that the data are independent given the parameters. Then the likelihood function is the product of the likelihood of each datum. It is convenient to partition the data into four categories: uncensored, left censored, right censored, and interval censored. These are denoted "unc.", "l.c.", "r.c.", and "i.c." in the equation below.

$L(\theta) = \prod_{T_i\in unc.} \Pr(T = T_i|\theta) \prod_{i\in l.c.} \Pr(T < T_i|\theta) \prod_{i\in r.c.} \Pr(T > T_i|\theta) \prod_{i\in i.c.} \Pr(T_{i,l} < T < T_{i,r}|\theta).$

For an uncensored datum, with $T_i$ equal to the age at death, we have

$\Pr(T = T_i|\theta) = f(T_i|\theta).$

For a left censored datum, such that the age at death is known to be less than $T_i$, we have

$\Pr(T < T_i|\theta) = F(T_i|\theta) = 1 - S(T_i|\theta).$

For a right censored datum, such that the age at death is known to be greater than $T_i$, we have

$\Pr(T > T_i|\theta) = 1 - F(T_i|\theta) = S(T_i|\theta).$

For an interval censored datum, such that the age at death is known to be less than $T_{i,r}$ and greater than $T_{i,l}$, we have

$\Pr(T_{i,l} < T < T_{i,r}|\theta) = S(T_{i,l}|\theta) - S(T_{i,r}|\theta).$

An important application where interval censored data arise is current status data, where the actual occurrence time $T_i$ of an event is known only to the extent that the event is known not to have occurred before one observation time and to have occurred before the next.
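The four likelihood contributions can be assembled into a log-likelihood as sketched below. The sketch assumes an exponential model with a single rate parameter standing in for θ; the function names and the data values are hypothetical. With only uncensored and right censored data, the exponential rate that maximizes the likelihood is the number of deaths divided by the total observed time, which gives a simple sanity check.

```python
import math

def survival(t, rate):
    """S(t | rate) for the assumed exponential model."""
    return math.exp(-rate * t)

def density(t, rate):
    """f(t | rate) for the assumed exponential model."""
    return rate * math.exp(-rate * t)

def log_likelihood(rate, uncensored, left, right, interval):
    """Sum of log contributions from the four categories of data."""
    total = 0.0
    for t in uncensored:                 # exact age at death: f(T_i)
        total += math.log(density(t, rate))
    for t in left:                       # death before T_i: 1 - S(T_i)
        total += math.log(1.0 - survival(t, rate))
    for t in right:                      # death after T_i: S(T_i)
        total += math.log(survival(t, rate))
    for low, high in interval:           # death in (low, high): S(low) - S(high)
        total += math.log(survival(low, rate) - survival(high, rate))
    return total

# Hypothetical data, for illustration only.
uncensored = [2.0, 3.5, 1.2]
right = [4.0, 5.0]
closed_form = len(uncensored) / (sum(uncensored) + sum(right))
best = log_likelihood(closed_form, uncensored, [], right, [])
print(closed_form, best)

# All four categories at once, evaluated at an arbitrary rate.
full = log_likelihood(0.2, uncensored, [1.0], right, [(2.0, 6.0)])
print(full)
```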

Non-parametric estimation

The Nelson–Aalen estimator can be used to provide a non-parametric estimate of the cumulative hazard rate function.
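A minimal sketch of the estimator follows. It uses the usual form in which the cumulative hazard estimate at time t is the sum, over distinct death times up to t, of the number of deaths divided by the number of subjects still at risk. The function name `nelson_aalen`, the convention that a subject censored at a given time is still counted as at risk at that time, and the data are illustrative assumptions.

```python
def nelson_aalen(durations, observed):
    """Return [(death_time, cumulative_hazard_estimate)] for right censored data.

    durations: follow-up time of each subject
    observed:  True if the death was observed, False if right censored
    """
    data = sorted(zip(durations, observed))
    at_risk = len(data)
    cumulative = 0.0
    estimate = []
    i = 0
    while i < len(data):
        time = data[i][0]
        deaths = 0
        leaving = 0
        # Group all subjects that share this follow-up time.
        while i < len(data) and data[i][0] == time:
            deaths += 1 if data[i][1] else 0
            leaving += 1
            i += 1
        if deaths:
            cumulative += deaths / at_risk
            estimate.append((time, cumulative))
        at_risk -= leaving   # deaths and censored subjects both leave the risk set
    return estimate

# Hypothetical follow-up data: five subjects, two of them right censored.
durations = [2, 3, 3, 5, 7]
observed = [True, True, False, True, False]
result = nelson_aalen(durations, observed)
for time, value in result:
    print(time, round(value, 4))
```

With these data the increments are 1/5, 1/4 and 1/2, since five, four and two subjects are at risk at the three death times.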

Distributions used in survival analysis

Exponential distribution
Weibull distribution
Exponential-logarithmic distribution

References

[1] Singh R, Mukhopadhyay K. Survival analysis in clinical trials: Basics and must know areas. Perspect Clin Res [serial online] 2011 [cited 2011 Nov 1];2:145-8. Available from: http://www.picronline.org/text.asp?2011/2/4/145/86872

David Collett. Modelling Survival Data in Medical Research, Second Edition. Boca Raton: Chapman & Hall/CRC. 2003. ISBN 978-1584883258
Regina Elandt-Johnson and Norman Johnson. Survival Models and Data Analysis. New York: John Wiley & Sons. 1980/1999.
J. D. Kalbfleisch and Ross L. Prentice. The statistical analysis of failure time data. New York: John Wiley & Sons. 1980 (1st ed.), 2002 (2nd ed.) ISBN 9780471363576
Jerald F. Lawless. Statistical Models and Methods for Lifetime Data, 2nd edition. John Wiley and Sons, Hoboken. 2003.
Terry Therneau. "A Package for Survival Analysis in S". http://www.mayo.edu/hsr/people/therneau/survival.ps, at: http://mayoresearch.mayo.edu/mayo/research/biostat/therneau.cfm
"Engineering Statistics Handbook", NIST/SEMATECH.
Survival Analysis - Commercial Usage http://www.discover-right.com/images/survival_analysis_-_understanding_and_implementation.pdf
Rausand, M. and Hoyland, A. System Reliability Theory: Models, Statistical Methods, and Applications, John Wiley & Sons, Hoboken, 2004.
Richards, S. J. A handbook of parametric survival models for actuarial use. Scandinavian Actuarial Journal.


