Survival Analysis
Survival analysis is a branch of statistics for analyzing the expected duration of time until one or more events happen, such as death in biological organisms and failure in mechanical systems. (Wikipedia)
Historically, it was developed and used by actuaries and medical researchers to estimate population lifetimes.
- Assumption: censoring is non-informative, i.e. whether or not a subject is censored is unrelated to its probability of experiencing the event
Video: Survival Function, Hazard, & Hazard Ratio
- Survival function: \( S(t) = P(T > t) \), the probability of surviving beyond time \( t \)
- Hazard: \( HAZ(t) = \lim_{dt \to 0} {P(T < t + dt \mid T > t) \over dt} \), roughly the probability of dying in the next instant given alive now, per unit of time
- Hazard ratio (HR): the hazard given exposure vs. the hazard without exposure, \( HR = {HAZ_{x=1} \over HAZ_{x=0}} \)
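A minimal Python sketch of these definitions, assuming constant (exponential) hazards with made-up values for an exposed and an unexposed group. It approximates the hazard from \( S(t) \) using a small finite \( dt \) and then takes the ratio of the two hazards. The helper names and numbers are hypothetical.

```python
import math

def survival_exponential(t, haz):
    """S(t) = P(T > t) for a constant hazard (exponential model)."""
    return math.exp(-haz * t)

def hazard_from_survival(surv, t, dt=1e-6):
    """Approximate HAZ(t) = P(T < t + dt | T > t) / dt for a small dt.

    P(T < t + dt | T > t) = (S(t) - S(t + dt)) / S(t)
    """
    return (surv(t) - surv(t + dt)) / (surv(t) * dt)

# Hypothetical constant hazards (events per year)
HAZ_EXPOSED = 0.30    # x = 1
HAZ_UNEXPOSED = 0.10  # x = 0

def s_exposed(t):
    return survival_exponential(t, HAZ_EXPOSED)

def s_unexposed(t):
    return survival_exponential(t, HAZ_UNEXPOSED)

t = 2.0
h1 = hazard_from_survival(s_exposed, t)
h0 = hazard_from_survival(s_unexposed, t)
hr = h1 / h0  # HR = HAZ(x=1) / HAZ(x=0)

print(f"S(2) exposed = {s_exposed(t):.3f}, unexposed = {s_unexposed(t):.3f}")
print(f"HAZ exposed ~ {h1:.3f}, unexposed ~ {h0:.3f}, HR ~ {hr:.2f}")
```

Because both hazards here are constant, the HR comes out the same at every \( t \), which is the proportional-hazards situation.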
| model | Kaplan-Meier | Exponential | Cox-PH |
|---|---|---|---|
| model type | non-parametric | parametric | semi-parametric |
| pro | simple; can estimate S(t) | can estimate S(t) and HR | hazard can fluctuate with time; can estimate HR |
| con | no functional form; cannot estimate HR | assumes a constant HAZ, which is not always realistic (the Weibull model allows the hazard to increase or decrease with time) | cannot directly estimate S(t) (the baseline hazard is left unspecified) |
Video: Part 4 - Kaplan-Meier model
- Process to compute the survival curve:
- censored subjects are included (in the number at risk) in every probability calculation up to the time they drop out
- a tick mark on the curve indicates a censored observation
- example graph from water pipe break paper
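The process can be sketched with the product-limit formula \( \hat{S}(t) = \prod_{t_i \le t} \left(1 - {d_i \over n_i}\right) \), where \( d_i \) is the number of events at time \( t_i \) and \( n_i \) is the number still at risk just before \( t_i \). This is a from-scratch sketch with hypothetical data, not the lifelines implementation; it assumes the usual convention that a subject censored at \( t_i \) is still counted as at risk at \( t_i \).

```python
def kaplan_meier(durations, events):
    """Kaplan-Meier estimate of S(t).

    durations: time of event or censoring for each subject
    events:    1 if the event was observed, 0 if the subject was censored
    Returns a list of (event_time, S(event_time)) steps.
    """
    data = sorted(zip(durations, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # group every subject sharing this time (events and censorings)
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            # censored subjects still count in n_at_risk up to their drop-out time
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed
    return curve

durations = [2, 3, 3, 5, 6, 8, 9]  # hypothetical follow-up times
events = [1, 0, 1, 1, 0, 1, 0]     # 0 = censored (drawn as a tick on the plot)
for t, s in kaplan_meier(durations, events):
    print(f"t={t}: S(t)={s:.3f}")
```

The curve only steps down at event times; censored subjects lower \( n_i \) for later steps without causing a drop themselves.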
Video: Exponential vs Weibull vs Cox Proportional Hazards
- Overall:
- Survival function: \( S(t) = P(T > t) = e^{-HAZ \cdot t} \) when the hazard is constant; in general \( S(t) = e^{-\int_0^t HAZ(u)\,du} \)
- \( HAZ = e^{b_0 + b_1 x_1 + b_2 x_2 + \dots + b_k x_k} \)
- \( \ln(HAZ) = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_k x_k \)
- \( b_0 \) is \( \ln(HAZ) \) for the reference group (all \( x_i = 0 \)), i.e. the log baseline hazard
- Exponential
- \( b_0 \) is constant -> constant hazard
- Weibull
- the intercept becomes a function of time: \( \ln(\alpha) + (\alpha - 1)\ln(t) + b_0 \) replaces \( b_0 \)
- \(\alpha = 1 \) - constant hazard
- \( \alpha > 1 \) - hazard increases with time
- \( \alpha < 1 \) - hazard decreases with time
- Cox Proportional Hazards
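- \( b_0 \) is an unspecified function of time, \( b_0(t) \) (the log baseline hazard)
- the algorithm can estimate \( b_1, b_2, \dots \) without having to specify the function \( b_0(t) \)
- good for analyzing hazard ratios, e.g. how effective treatment A is vs. B, or exposure vs. non-exposure: \( b_0(t) \) cancels, so the HR for a one-unit increase in \( x_1 \) is \( e^{b_1} \)
- not designed for predicting survival: it does not directly give \( S(t) \), although the baseline can be estimated afterwards (e.g. with the Breslow estimator)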
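A small Python sketch comparing the three hazard forms for one binary covariate \( x_1 \), assuming the Weibull hazard is written \( HAZ(t) = \alpha t^{\alpha - 1} e^{b_0 + b_1 x_1} \), so that \( \ln(HAZ) = \ln(\alpha) + (\alpha - 1)\ln(t) + b_0 + b_1 x_1 \). The coefficients and the Cox baseline function are made up and nothing is fitted to data; the last loop shows numerically that \( b_0(t) \) cancels in the Cox hazard ratio, leaving \( e^{b_1} \).

```python
import math

def haz_exponential(t, x1, b0, b1):
    # ln(HAZ) = b0 + b1*x1; t is ignored on purpose -> constant hazard
    return math.exp(b0 + b1 * x1)

def haz_weibull(t, x1, b0, b1, alpha):
    # ln(HAZ) = ln(alpha) + (alpha - 1)*ln(t) + b0 + b1*x1  (requires t > 0)
    return math.exp(math.log(alpha) + (alpha - 1) * math.log(t) + b0 + b1 * x1)

def haz_cox(t, x1, b0_of_t, b1):
    # ln(HAZ) = b0(t) + b1*x1, where b0(t) is any function of time
    return math.exp(b0_of_t(t) + b1 * x1)

b0, b1 = -2.0, 0.7  # hypothetical coefficients; x1 = 1 exposed, 0 unexposed

# Exponential: the same hazard at every time point
print([round(haz_exponential(t, 0, b0, b1), 4) for t in (1, 2, 4)])

# Weibull: alpha controls whether the hazard rises or falls over time
for alpha in (0.5, 1.0, 2.0):
    print(alpha, [round(haz_weibull(t, 0, b0, b1, alpha), 4) for t in (1, 2, 4)])

def hypothetical_baseline(t):
    # an arbitrary, wiggly log baseline hazard; its form never affects the HR
    return -2.0 + math.sin(t)

# Cox: the ratio exposed / unexposed equals exp(b1) at every t
for t in (1, 2, 4):
    hr = haz_cox(t, 1, hypothetical_baseline, b1) / haz_cox(t, 0, hypothetical_baseline, b1)
    print(t, round(hr, 4), round(math.exp(b1), 4))
```

Setting \( \alpha = 1 \) makes the Weibull hazard identical to the exponential one, which is why the exponential model is its special case.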
Lifelines: Survival analysis in Python
- Overall:
Coursera: AI for Medical Prognosis
- prognosis vs diagnosis:
- prognosis = predicting the likely or expected development of a disease
- examples in medical practice:
- CHA2DS2-VASc score for atrial fibrillation
- MELD score for end-stage liver disease:
- \( \ln \) terms (natural logs of the lab values)
- contains an intercept: the expected risk score if all other terms are 0
- ASCVD (Atherosclerotic Cardiovascular Disease) Risk Calculator
- interaction terms: capture dependence between variables
- e.g. blood pressure has less effect on risk when the patient is older
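A sketch of the two risk-score ideas in Python. `meld_score` uses the commonly cited MELD coefficients (3.78, 11.2, 9.57 and intercept 6.43) but omits the input bounds and rounding that clinical calculators apply. `risk_with_interaction` and its coefficients are purely hypothetical, chosen so that the age-by-blood-pressure term shrinks the effect of blood pressure in older patients.

```python
import math

def meld_score(bilirubin_mg_dl, inr, creatinine_mg_dl):
    """MELD-style score: ln terms plus a constant intercept (6.43).

    Sketch only: clinical versions also bound the inputs and round the result.
    """
    return (3.78 * math.log(bilirubin_mg_dl)
            + 11.2 * math.log(inr)
            + 9.57 * math.log(creatinine_mg_dl)
            + 6.43)

# With every input = 1.0, all ln terms are 0 and the score equals the intercept
print(meld_score(1.0, 1.0, 1.0))

def risk_with_interaction(age, sbp, b_age=0.05, b_sbp=0.02, b_inter=-0.0002):
    """Hypothetical linear risk score with an age x systolic-BP interaction term."""
    return b_age * age + b_sbp * sbp + b_inter * age * sbp

# Effect of +10 mmHg systolic BP at age 40 vs. age 80:
# delta = 10 * (b_sbp + b_inter * age), so it shrinks as age grows
for age in (40, 80):
    delta = risk_with_interaction(age, 150) - risk_with_interaction(age, 140)
    print(age, round(delta, 3))
```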
Evaluating risk scores
Concordant Pairs:
- Concordant = the patient with the worse outcome has the higher risk score
- if the outcomes are tied => not permissible, exclude
- if the outcomes differ => permissible pair, include
- rule:
- +1 for permissible pair that is Concordant
- +0.5 for permissible pair with risk tie (outcome different, but same risk score)
C-index
\( C\text{-index} = {Count_{concordant} + 0.5 \cdot Count_{ties} \over Count_{permissible}} \)
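A minimal sketch of the C-index for binary outcomes (1 = event / worse outcome, 0 = no event), following the concordance rules listed above. The data are made up, and the function assumes at least one permissible pair exists.

```python
from itertools import combinations

def c_index(outcomes, scores):
    """C-index: (concordant + 0.5 * risk ties) / permissible pairs."""
    concordant = ties = permissible = 0
    for (y_a, s_a), (y_b, s_b) in combinations(zip(outcomes, scores), 2):
        if y_a == y_b:
            continue  # outcome tie -> not permissible
        permissible += 1
        if s_a == s_b:
            ties += 1  # different outcomes, same risk score -> +0.5
        elif (y_a > y_b) == (s_a > s_b):
            concordant += 1  # worse outcome has the higher risk score -> +1
    return (concordant + 0.5 * ties) / permissible

y = [1, 1, 0, 0]             # hypothetical outcomes
risk = [0.9, 0.4, 0.4, 0.2]  # hypothetical risk scores
print(c_index(y, risk))
```

Here 4 pairs are permissible, 3 are concordant and 1 is a risk tie, giving (3 + 0.5) / 4.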
Applying the C-index to censored data: Harrell's C-index
- patients A & B both uncensored => always permissible, even if A & B have the same time-to-event
- patients A & B both censored => not permissible
- patient A censored, B uncensored:
- if A's censoring time < B's event time => not permissible (we can't tell who had the event first)
- if A's censoring time >= B's event time => permissible
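A minimal sketch of Harrell's C-index following the permissibility rules above, with hypothetical data. One assumption not covered by the notes: when two uncensored patients share the same event time, the pair counts as permissible but only earns credit (0.5) if their risk scores are also tied.

```python
from itertools import combinations

def harrell_c_index(times, events, scores):
    """Harrell's C-index for right-censored data.

    times:  event time if events[k] == 1, otherwise censoring time
    events: 1 = event observed, 0 = censored
    scores: predicted risk (higher = event expected sooner)
    """
    concordant = ties = permissible = 0
    for i, j in combinations(range(len(times)), 2):
        if events[i] == 0 and events[j] == 0:
            continue  # both censored: not permissible
        if events[i] == 1 and events[j] == 1:
            permissible += 1  # both uncensored: always permissible
            if scores[i] == scores[j]:
                ties += 1
            elif times[i] != times[j]:
                # concordant if the earlier event has the higher risk score
                earlier, later = (i, j) if times[i] < times[j] else (j, i)
                if scores[earlier] > scores[later]:
                    concordant += 1
            # assumption: same event time + different scores earns no credit
            continue
        # exactly one censored: c = censored patient, u = uncensored patient
        c, u = (i, j) if events[i] == 0 else (j, i)
        if times[c] < times[u]:
            continue  # censored before u's event: order unknown, not permissible
        permissible += 1
        if scores[u] == scores[c]:
            ties += 1
        elif scores[u] > scores[c]:
            concordant += 1
    return (concordant + 0.5 * ties) / permissible

times = [2, 4, 5, 6, 8]             # hypothetical follow-up times
events = [1, 0, 1, 1, 0]            # 0 = censored
scores = [0.9, 0.7, 0.3, 0.6, 0.2]  # hypothetical risk scores
print(harrell_c_index(times, events, scores))
```

In this example 7 pairs are permissible (the censored patient at time 4 cannot be compared with the events at times 5 and 6), and 6 of them are concordant.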