Survival Distribution Container
survDistr.Rd
survDistr is a specialized R6 container designed for storing and managing prediction outputs from survival models in single-event settings. This includes models such as Cox proportional hazards, random survival forests, and other classical or machine learning-based survival estimators.
The main prediction data type can be a survival or a hazard matrix, where rows represent observations and columns represent time points.
Details
The input matrix (survival probabilities \(S(t)\) or hazard \(h(t)\)) is stored internally and accessed via the $data() method.
The interpolation type needed for the public methods is stored in the $interp_meth field.
During construction, the function assert_prob_matrix() is used to validate the input data matrix according to the given data_type.
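The kind of validation described above can be sketched in base R. This is only an illustration: check_matrix() is a hypothetical helper, not the package's assert_prob_matrix(), and the check that survival curves do not increase over time is an assumption that this page does not state.

```r
# Hypothetical helper, NOT the package's assert_prob_matrix()
check_matrix <- function(x, data_type = c("surv", "haz")) {
  data_type <- match.arg(data_type)
  stopifnot(is.matrix(x), is.numeric(x), !anyNA(x))
  if (data_type == "surv") {
    # survival probabilities must lie between 0 and 1
    stopifnot(all(x >= 0 & x <= 1))
    # assumption: each row (one observation) is non-increasing over time
    if (ncol(x) > 1) {
      stopifnot(all(x[, -1, drop = FALSE] <= x[, -ncol(x), drop = FALSE]))
    }
  } else {
    # hazard values must be non-negative
    stopifnot(all(x >= 0))
  }
  invisible(x)
}

mat <- matrix(c(1, 0.6, 0.4, 0.8, 0.8, 0.7), nrow = 2, byrow = TRUE)
check_matrix(mat, data_type = "surv")  # passes silently
```

A value outside the allowed range, such as a survival probability of 1.2, makes the sketch stop with an error.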
Public fields
times
(numeric)
Numeric vector of time points corresponding to columns of data.
data_type
(character(1))
Either "surv" for survival or "haz" for hazard matrices.
interp_meth
(character(1))
Interpolation method; one of "const_surv", "linear_surv", or "const_haz".
Methods
Method new()
Creates a new instance of this R6 class.
Usage
survDistr$new(x, times = NULL, data_type = "surv", interp_meth = "const_surv")
Arguments
x
(matrix)
A numeric matrix of either survival probabilities (values between 0 and 1) or hazard values (non-negative values). Column names must correspond to time points if times is NULL.
times
(numeric)
Numeric vector of time points for matrix x; its length must match the number of columns.
data_type
(character(1))
The type of input data. Either a survival matrix ("surv", default) or a hazard matrix ("haz").
interp_meth
(character(1))
Interpolation method to use when requesting the quantity of interest at time points other than those stored in the object (accessible via the times field). Currently supported interpolation methods are "const_surv" (default), "linear_surv" and "const_haz". See Details.
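When times is NULL, the time points have to come from the column names of x. How the constructor parses them is not shown on this page; the base-R sketch below only illustrates one plausible way to recover and sanity-check them. The requirement that the time points be strictly increasing is an assumption.

```r
x <- matrix(c(1, 0.6, 0.4, 0.8, 0.8, 0.7), nrow = 2, byrow = TRUE)
colnames(x) <- c("12", "34", "42")

# recover numeric time points from the column names
times <- as.numeric(colnames(x))

# basic sanity checks (strictly increasing order is an assumption here)
stopifnot(
  !anyNA(times),
  length(times) == ncol(x),
  !is.unsorted(times, strictly = TRUE)
)
times
```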
Method print()
Displays summary information about a survDistr object, including the number of observations and time points.
Method data()
Returns the stored data matrix.
Method survival()
Computes survival probabilities \(S(t)\) at the specified time points.
Uses mat_interp().
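As a rough illustration of what interpolating a survival matrix at new time points involves, the base-R sketch below covers the "const_surv" and "linear_surv" cases. The function interp_surv() is hypothetical and is not the package's mat_interp(). Anchoring each curve at \(S(0) = 1\), extending the slope of the last segment beyond the final time point, and flooring the result at 0 are assumptions chosen to agree with the outputs shown in the Examples.

```r
# Hypothetical sketch, NOT the package's mat_interp()
interp_surv <- function(surv, times, new_times,
                        method = c("const_surv", "linear_surv")) {
  method <- match.arg(method)
  # anchor every curve at S(0) = 1 (assumes all stored times are > 0)
  tt <- c(0, times)
  ss <- cbind(1, surv)
  n <- length(tt)
  if (method == "const_surv") {
    # step function: take the value at the last anchor time <= t
    idx <- findInterval(new_times, tt)
    out <- ss[, idx, drop = FALSE]
  } else {
    out <- matrix(NA_real_, nrow = nrow(ss), ncol = length(new_times))
    beyond <- new_times > tt[n]
    for (i in seq_len(nrow(ss))) {
      s <- ss[i, ]
      # linear interpolation inside the observed range
      y <- approx(tt, s, xout = new_times, rule = 1)$y
      # assumption: beyond the last time point, extend the last slope
      slope <- (s[n] - s[n - 1]) / (tt[n] - tt[n - 1])
      y[beyond] <- pmax(0, s[n] + slope * (new_times[beyond] - tt[n]))
      out[i, ] <- y
    }
  }
  colnames(out) <- new_times
  out
}

mat <- matrix(c(1, 0.6, 0.4, 0.8, 0.8, 0.7), nrow = 2, byrow = TRUE)
times <- c(12, 34, 42)
const_res <- interp_surv(mat, times, c(10, 30, 42, 50), "const_surv")
lin_res <- interp_surv(mat, times, c(10, 30, 42, 50), "linear_surv")
```

With the matrix from the Examples section, const_res and lin_res reproduce the values printed there for x$survival().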
Method cdf()
Computes the cumulative distribution function \(F(t) = 1 - S(t)\) at the specified time points.
\(F(t)\) is the probability that the event has occurred up until time \(t\).
Uses mat_interp().
Method cumhazard()
Computes the cumulative hazard at the specified time points as: \(H(t) = -\log(S(t))\).
Arguments
times
(numeric)
New time points at which to interpolate. Values do not need to be sorted or unique, just non-negative. If NULL, the object's stored time points are used.
add_times
(logical(1))
If TRUE (default), column names are set to the relevant time points.
eps
(numeric(1))
Very small number used to substitute zero values in order to prevent errors in, e.g., log(0) and/or division-by-zero calculations. Default value is 1e-06.
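The role of eps in \(H(t) = -\log(S(t))\) can be seen in a small base-R sketch. The helper cumhaz_from_surv() is hypothetical, and replacing zeros through pmax() is an assumption about how the substitution might be done, not a description of the package's code.

```r
# Hypothetical helper: cumulative hazard from survival probabilities
cumhaz_from_surv <- function(surv, eps = 1e-6) {
  # values below eps (in particular exact zeros) are raised to eps,
  # so that log() never sees 0
  -log(pmax(surv, eps))
}

surv <- matrix(c(1, 0.6, 0.4, 0.8, 0.8, 0.7), nrow = 2, byrow = TRUE,
               dimnames = list(NULL, c("12", "34", "42")))
H <- cumhaz_from_surv(surv)
H

# without eps, S(t) = 0 would give H(t) = Inf; with eps it stays finite
cumhaz_from_surv(0)  # equals -log(1e-6), about 13.8
```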
Method hazard()
Computes the hazard at the specified time points as: \(h(t) = H(t) - H(t-1)\).
Arguments
times
(numeric)
New time points at which to interpolate. Values do not need to be sorted or unique, just non-negative. If NULL, the object's stored time points are used.
eps
(numeric(1))
Very small number used to substitute zero values in order to prevent errors in, e.g., log(0) and/or division-by-zero calculations. Default value is 1e-06.
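The increment formula \(h(t) = H(t) - H(t-1)\) can be sketched in base R by reading \(t-1\) as the preceding stored time point. The helper hazard_from_cumhaz() is hypothetical, and treating the cumulative hazard before the first time point as 0 is an assumption.

```r
# Hypothetical helper: hazard as increments of the cumulative hazard
hazard_from_cumhaz <- function(cumhaz) {
  # assumption: H = 0 before the first stored time point
  prev <- cbind(0, cumhaz[, -ncol(cumhaz), drop = FALSE])
  cumhaz - prev
}

surv <- matrix(c(1, 0.6, 0.4, 0.8, 0.8, 0.7), nrow = 2, byrow = TRUE,
               dimnames = list(NULL, c("12", "34", "42")))
H <- -log(surv)
h <- hazard_from_cumhaz(H)
h
```

Under this reading, the hazards in each row sum back to the cumulative hazard at the last time point, and a flat stretch of the survival curve (second row, between 12 and 34) yields a hazard of 0.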
Method pdf()
Computes the probability density function \(f(t)\) at the specified time points. \(f(t)\) is the probability of the event occurring at the specific time \(t\). For constant survival interpolation, \(f(t) = F(t) - F(t-1)\), where \(F(t)\) is the cumulative distribution.
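For the constant survival case, the relation \(f(t) = F(t) - F(t-1)\) can be sketched in base R, again reading \(t-1\) as the preceding stored time point. Taking \(F = 0\) before the first time point is an assumption, and the variable names are illustrative only.

```r
surv <- matrix(c(1, 0.6, 0.4, 0.8, 0.8, 0.7), nrow = 2, byrow = TRUE,
               dimnames = list(NULL, c("12", "34", "42")))

# cumulative distribution function F(t) = 1 - S(t)
cdf <- 1 - surv

# density as increments of F (assumption: F = 0 before the first time point)
dens <- cdf - cbind(0, cdf[, -ncol(cdf), drop = FALSE])
dens
```

Each row of dens sums to the value of \(F\) at the last stored time point, which is a quick consistency check on the increments.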
Examples
# generate survival matrix
mat = matrix(data = c(1,0.6,0.4,0.8,0.8,0.7), nrow = 2,
ncol = 3, byrow = TRUE)
times = c(12, 34, 42)
x = survDistr$new(mat, times)
x
#> A [2 x 3] survival matrix
#> Number of observations: 2
#> Number of time points: 3
#> Interpolation method: Piece-wise Constant Survival
# stored survival matrix
x$data()
#> 12 34 42
#> [1,] 1.0 0.6 0.4
#> [2,] 0.8 0.8 0.7
# interpolation method
x$interp_meth
#> [1] "const_surv"
# time points
x$times
#> [1] 12 34 42
# S(t) at given time points (constant interpolation)
x$survival(times = c(10, 30, 42, 50))
#> 10 30 42 50
#> [1,] 1 1.0 0.4 0.4
#> [2,] 1 0.8 0.7 0.7
# same but with linear interpolation
x$interp_meth = "linear_surv"
x$survival(times = c(10, 30, 42, 50))
#> 10 30 42 50
#> [1,] 1.0000000 0.6727273 0.4 0.2
#> [2,] 0.8333333 0.8000000 0.7 0.6
# time points can be unordered and duplicated
x$survival(times = c(10, 30, 10, 50))
#> 10 30 10 50
#> [1,] 1.0000000 0.6727273 1.0000000 0.2
#> [2,] 0.8333333 0.8000000 0.8333333 0.6
# Cumulative hazard
x$cumhazard()
#> 12 34 42
#> [1,] 0.0000000 0.5108256 0.9162907
#> [2,] 0.2231436 0.2231436 0.3566749