Survival Distribution Container

survDistr is a specialized R6 container for storing and managing prediction outputs from survival models in single-event settings. This includes models such as Cox proportional hazards, random survival forests, and other classical or machine learning-based survival estimators.

The main prediction data type can be a survival or a hazard matrix, where rows represent observations and columns represent time points.

Details

The input matrix (survival probabilities $S(t)$ or hazards $h(t)$) is stored internally and can be accessed with the $data() method. The interpolation type used by the public methods is stored in the $interp_meth field.

During construction, the function assert_prob_matrix() is used to validate the input data matrix according to the given data_type.
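The kind of validation described above can be sketched in base R. This is only an illustration built from the constraints stated in this page (values between 0 and 1 for survival, non-negative values for hazards, one time point per column); check_matrix() is a hypothetical helper and is not claimed to mirror what assert_prob_matrix() actually checks.

```r
# Hypothetical helper (not part of the package): basic sanity checks
# for a survival or hazard matrix, based only on the documented constraints.
check_matrix <- function(x, times, data_type = c("surv", "haz")) {
  data_type <- match.arg(data_type)
  # Must be a numeric matrix without missing values
  stopifnot(is.matrix(x), is.numeric(x), !anyNA(x))
  # One time point per column
  stopifnot(is.numeric(times), length(times) == ncol(x))
  if (data_type == "surv") {
    # Survival probabilities lie in [0, 1]
    stopifnot(all(x >= 0 & x <= 1))
  } else {
    # Hazards are non-negative
    stopifnot(all(x >= 0))
  }
  invisible(TRUE)
}

mat <- matrix(c(1, 0.6, 0.4,
                0.8, 0.8, 0.7), nrow = 2, byrow = TRUE)
ok <- check_matrix(mat, times = c(12, 34, 42))
```

A matrix with values above 1 would fail the "surv" check but pass the "haz" check, which is why the data type has to be known at validation time.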

Public fields

times: (numeric)
Numeric vector of time points corresponding to columns of data.
data_type: (character(1))
Either "surv" for survival or "haz" for hazard matrices.
interp_meth: (character(1))
Interpolation method; one of "const_surv", "linear_surv", or "const_haz".

Methods

Method `new()`

Creates a new instance of this R6 class.

Usage

survDistr$new(x, times = NULL, data_type = "surv", interp_meth = "const_surv")

Arguments

x: (matrix)
A numeric matrix of either survival probabilities (values between 0 and 1) or hazard values (non-negative values). Column names must correspond to time points if times is NULL.
times: (numeric)
Numeric vector of time points for matrix x, must match the number of columns.
data_type: (character(1))
The type of input data. Either a survival matrix ("surv", default) or a hazard matrix ("haz").
interp_meth: (character(1))
Interpolation method to use when requesting the quantity of interest at time points other than the ones stored in the object (accessible via the times field). Currently supported interpolation methods are "const_surv" (default), "linear_surv" and "const_haz". See details.
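To make the difference between "const_surv" and "linear_surv" concrete, here is a minimal base R sketch for a single survival curve. interp_surv() is a hypothetical helper, not the package's mat_interp(). It assumes strictly positive stored time points, anchors the curve at S(0) = 1, and extends the slope of the last interval beyond the final time point; that last rule is an assumption inferred from the output shown in the Examples section.

```r
# Hypothetical helper (not part of the package): interpolate one survival
# curve (a single matrix row) at new, non-negative time points.
interp_surv <- function(surv, times, new_times,
                        method = c("const_surv", "linear_surv")) {
  method <- match.arg(method)
  # Anchor the curve at S(0) = 1 so requests before the first time point are defined
  x <- c(0, times)
  y <- c(1, surv)
  if (method == "const_surv") {
    # Step function: use the value at the latest time point <= the requested time
    return(y[findInterval(new_times, x)])
  }
  # Linear interpolation inside the observed range
  out <- stats::approx(x, y, xout = new_times, method = "linear", rule = 2)$y
  # Past the last time point, continue with the slope of the final interval
  # (assumed rule), never dropping below zero
  n <- length(x)
  slope <- (y[n] - y[n - 1]) / (x[n] - x[n - 1])
  beyond <- new_times > x[n]
  out[beyond] <- pmax(0, y[n] + slope * (new_times[beyond] - x[n]))
  out
}

s <- c(1, 0.6, 0.4)     # first observation from the Examples section
tt <- c(12, 34, 42)
const_vals  <- interp_surv(s, tt, c(10, 30, 42, 50), "const_surv")
linear_vals <- interp_surv(s, tt, c(10, 30, 42, 50), "linear_surv")
```

Under these assumptions, const_vals is 1, 1, 0.4, 0.4 and linear_vals is approximately 1, 0.6727, 0.4, 0.2, in line with the first row of the survival() output in the Examples section.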

Method `print()`

Displays summary information about a survDistr object, including the number of observations and time points.

Usage

survDistr$print()

Method `data()`

Return the stored data matrix.

Usage

survDistr$data(add_times = TRUE)

Arguments

add_times: (logical(1))
If TRUE (default), column names are set to the relevant time points.

Returns

(matrix)

Method `survival()`

Computes survival probabilities $S(t)$ at the specified time points. Uses mat_interp().

Usage

survDistr$survival(times = NULL, add_times = TRUE)

Arguments

times: (numeric)
New time points at which to interpolate. Values do not need to be sorted or unique, just non-negative. If NULL, the object's stored time points are used.
add_times: (logical(1))
If TRUE (default), column names are set to the relevant time points.

Returns

a matrix of survival probabilities.

Method `cdf()`

Computes the cumulative distribution function $F(t) = 1 - S(t)$ at the specified time points. $F(t)$ is the probability that the event has occurred up until time $t$. Uses mat_interp().

Usage

survDistr$cdf(times = NULL, add_times = TRUE)

Arguments

times: (numeric)
New time points at which to interpolate. Values do not need to be sorted or unique, just non-negative. If NULL, the object's stored time points are used.
add_times: (logical(1))
If TRUE (default), column names are set to the relevant time points.

Returns

a cdf matrix.

Method `cumhazard()`

Computes the cumulative hazard at the specified time points as $H(t) = -\log(S(t))$.

Usage

survDistr$cumhazard(times = NULL, add_times = TRUE, eps = 1e-06)

Arguments

times: (numeric)
New time points at which to interpolate. Values do not need to be sorted or unique, just non-negative. If NULL, the object's stored time points are used.
add_times: (logical(1))
If TRUE (default), column names are set to the relevant time points.
eps: (numeric(1))
Small positive number substituted for zero values to avoid errors such as log(0) or division by zero. Default value is 1e-06.

Returns

a matrix of cumulative hazards.
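The role of eps can be illustrated with a short base R sketch of $H(t) = -\log(S(t))$. The flooring rule used here (pmax) is an assumption about how zero values might be replaced, not the package's confirmed behaviour.

```r
# Survival matrix from the Examples section (rows: observations, columns: time points)
surv <- matrix(c(1, 0.6, 0.4,
                 0.8, 0.8, 0.7), nrow = 2, byrow = TRUE)
eps <- 1e-6

# Floor the probabilities at eps so that log() never sees an exact zero
# (assumed rule; it also lifts values between 0 and eps)
surv_safe <- pmax(surv, eps)

# Cumulative hazard H(t) = -log(S(t))
H <- -log(surv_safe)

# A curve that reaches S(t) = 0 yields a large but finite cumulative hazard
H_zero <- -log(pmax(c(1, 0.5, 0), eps))
```

Without the floor, the last entry of H_zero would be Inf, which would propagate into any quantity derived from the cumulative hazard.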

Method `hazard()`

Computes the hazard at the specified time points as: $h(t) = H(t) - H(t-1)$.

Usage

survDistr$hazard(times = NULL, eps = 1e-06)

Arguments

times: (numeric)
New time points at which to interpolate. Values do not need to be sorted or unique, just non-negative. If NULL, the object's stored time points are used.
eps: (numeric(1))
Small positive number substituted for zero values to avoid errors such as log(0) or division by zero. Default value is 1e-06.

Returns

a hazard matrix.
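As a rough illustration of the difference formula above, the hazard at the stored time points can be sketched in base R as successive differences of the cumulative hazard. Reading $t-1$ as the previous stored time point and taking the cumulative hazard before the first time point to be 0 are both assumptions of this sketch.

```r
# Survival matrix from the Examples section
surv <- matrix(c(1, 0.6, 0.4,
                 0.8, 0.8, 0.7), nrow = 2, byrow = TRUE)

# Cumulative hazard H(t) = -log(S(t)); no zeros here, so no eps is needed
H <- -log(surv)

# Prepend H = 0 (assumed starting value) and difference along each row
h <- t(apply(cbind(0, H), 1, diff))

# Summing the hazards along a row recovers the cumulative hazard
H_back <- t(apply(h, 1, cumsum))
```

The round trip through cumsum() is a quick consistency check: H_back should equal H up to floating-point error.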

Method `pdf()`

Computes the probability density function $f(t)$ at the specified time points. $f(t)$ is the probability of the event occurring at the specific time $t$. For constant survival interpolation, $f(t) = F(t) - F(t-1)$, where $F(t)$ is the cumulative distribution.

Usage

survDistr$pdf(times = NULL)

Arguments

times: (numeric)
New time points at which to interpolate. Values do not need to be sorted or unique, just non-negative. If NULL, the object's stored time points are used.

Returns

a pdf matrix.
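For the constant survival interpolation case, the relation $f(t) = F(t) - F(t-1)$ can be sketched in base R as follows. The sketch assumes that $t-1$ refers to the previous stored time point and that $F = 0$ before the first one; it is an illustration of the formula, not the package's code.

```r
# Survival matrix from the Examples section
surv <- matrix(c(1, 0.6, 0.4,
                 0.8, 0.8, 0.7), nrow = 2, byrow = TRUE)

# Cumulative distribution function F(t) = 1 - S(t)
cdf <- 1 - surv

# Prepend F = 0 (assumed starting value) and difference along each row
pdf_mat <- t(apply(cbind(0, cdf), 1, diff))

# The probability mass assigned so far adds up to F at the last time point
total_mass <- rowSums(pdf_mat)
```

Each row of pdf_mat holds the probability mass placed on each stored time point, so the remaining mass, 1 - total_mass, corresponds to survival beyond the last time point.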

Method `clone()`

The objects of this class are cloneable with this method.

Usage

survDistr$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

Examples

# generate survival matrix
mat = matrix(data = c(1,0.6,0.4,0.8,0.8,0.7), nrow = 2,
             ncol = 3, byrow = TRUE)
times = c(12, 34, 42)
x = survDistr$new(mat, times)
x
#> A [2 x 3] survival matrix
#> Number of observations: 2
#> Number of time points: 3
#> Interpolation method: Piece-wise Constant Survival 

# stored survival matrix
x$data()
#>       12  34  42
#> [1,] 1.0 0.6 0.4
#> [2,] 0.8 0.8 0.7

# interpolation method
x$interp_meth
#> [1] "const_surv"

# time points
x$times
#> [1] 12 34 42

# S(t) at given time points (constant interpolation)
x$survival(times = c(10, 30, 42, 50))
#>      10  30  42  50
#> [1,]  1 1.0 0.4 0.4
#> [2,]  1 0.8 0.7 0.7
# same but with linear interpolation
x$interp_meth = "linear_surv"
x$survival(times = c(10, 30, 42, 50))
#>             10        30  42  50
#> [1,] 1.0000000 0.6727273 0.4 0.2
#> [2,] 0.8333333 0.8000000 0.7 0.6
# time points can be unordered and duplicated
x$survival(times = c(10, 30, 10, 50))
#>             10        30        10  50
#> [1,] 1.0000000 0.6727273 1.0000000 0.2
#> [2,] 0.8333333 0.8000000 0.8333333 0.6

# Cumulative hazard
x$cumhazard()
#>             12        34        42
#> [1,] 0.0000000 0.5108256 0.9162907
#> [2,] 0.2231436 0.2231436 0.3566749
