Title: | Stochastic Gradient Descent log-Likelihood Estimation in Cox Proportional Hazards Model |
---|---|
Description: | Estimate coefficients of Cox proportional hazards model using stochastic gradient descent algorithm for batch data. |
Authors: | Marcin Kosinski [aut, cre], Przemyslaw Biecek [ctb] |
Maintainer: | Marcin Kosinski <[email protected]> |
License: | GPL-2 |
Version: | 0.2.1 |
Built: | 2024-11-03 03:03:54 UTC |
Source: | https://github.com/marcinkosinski/coxphsgd |
coxphSGD
Estimates the coefficients of a Cox proportional hazards model using a stochastic gradient descent algorithm.
coxphSGD(formula, data, learn.rates = function(x) { 1/x }, beta.zero = 0, epsilon = 1e-05, max.iter = 500, verbose = FALSE)
Argument | Description |
---|---|
formula | a formula object, with the response on the left of a ~ operator and the terms on the right. The response must be a survival object as returned by the Surv function. |
data | a list of batch data.frames in which to interpret the variables named in the formula. |
learn.rates | a function specifying the learning rate used in each step of the algorithm, given the step number. By default function(x) { 1/x } is used, so the learning rate in step j is 1/j. |
beta.zero | a numeric vector of starting values for the coefficients, of length equal to the number of variables after applying the formula (a vector of length 1 is replicated). |
epsilon | a numeric tolerance used in the stopping condition of the estimation algorithm. |
max.iter | a numeric value specifying the maximal number of iterations. |
verbose | whether to print (via cat) the number of each iteration. |
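The learn.rates argument maps the step number to a step size. As a minimal sketch, the default schedule from the usage line can be compared with the schedule used in the Examples section for the first few steps; rates_at is a hypothetical helper, not part of the package.

```r
# Default learning-rate schedule from the usage line: step j -> 1/j
default_rate <- function(x) { 1/x }

# Schedule used in the Examples section: step j -> 1/(100 * sqrt(j))
example_rate <- function(x) { 1/(100*sqrt(x)) }

# Hypothetical helper: evaluate a schedule on steps 1..n
rates_at <- function(rate_fun, n) vapply(seq_len(n), rate_fun, numeric(1))

rates_at(default_rate, 4)  # 1, 0.5, 0.333..., 0.25
rates_at(example_rate, 4)  # much smaller steps that decay more slowly
```

Both schedules decrease with the step number; the Examples schedule starts 100 times smaller and decays like 1/sqrt(j) rather than 1/j.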
The data argument should be a list of data.frames in which every batch data.frame has the same structure and the same column names for the explanatory and survival (time, censoring status) variables. See Examples.
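A list of batches with a common structure can be built with base R's split(), as the Examples do. The sketch below also checks the shared-column requirement; same_structure is a hypothetical helper, and the toy data are made up for illustration.

```r
# Hypothetical helper: TRUE if every batch has the column names of the first
same_structure <- function(batches) {
  ref <- names(batches[[1]])
  all(vapply(batches, function(b) identical(names(b), ref), logical(1)))
}

toy <- data.frame(time   = c(5, 3, 8, 2, 7, 4),
                  status = c(1, 0, 1, 1, 0, 1),
                  x.1    = c(0, 1, 1, 0, 1, 0))

set.seed(1)
batch_id  <- sample(1:2, size = nrow(toy), replace = TRUE)
toy_split <- split(toy, batch_id)   # list of data.frames, as in the Examples
same_structure(toy_split)           # TRUE

# A batch with a renamed covariate violates the requirement
bad <- list(toy[1:3, ], setNames(toy[4:6, ], c("time", "status", "z")))
same_structure(bad)                 # FALSE
```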
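The estimation process stops as soon as one of the following conditions is fulfilled (j denotes the step number):
- $\|\beta_{j+1} - \beta_j\| < \varepsilon$ for some j, where $\varepsilon$ is the epsilon parameter;
- $j >$ max.iter.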
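To make the algorithm and its stopping rule concrete, here is a minimal sketch of stochastic gradient ascent on the Cox partial log-likelihood, using one batch per step and the two stopping conditions above. It is an illustration under stated assumptions, not the package's implementation: cox_grad and sgd_cox_sketch are hypothetical names, risk sets are Breslow-style (subjects with time greater than or equal to the event time), batches are visited cyclically, and batches without events are skipped.

```r
# Gradient of the Cox partial log-likelihood for one batch (assumption:
# the risk set of an event at time t_i is every subject with time >= t_i).
cox_grad <- function(beta, X, time, status) {
  w <- exp(as.vector(X %*% beta))          # relative risks exp(x'beta)
  grad <- numeric(length(beta))
  for (i in which(status == 1)) {
    at_risk <- time >= time[i]
    # risk-weighted mean of the covariates over the risk set of subject i
    x_bar <- colSums(X[at_risk, , drop = FALSE] * w[at_risk]) / sum(w[at_risk])
    grad <- grad + (X[i, ] - x_bar)
  }
  grad
}

# Hypothetical SGD loop: step j uses one batch (cycling through the list) and
# stops when ||beta_{j+1} - beta_j|| < epsilon or after max_iter steps.
sgd_cox_sketch <- function(batches, covariates, learn_rate = function(x) 1/x,
                           beta0 = rep(0, length(covariates)),
                           epsilon = 1e-5, max_iter = 500) {
  beta <- beta0
  path <- list(beta)
  for (j in seq_len(max_iter)) {
    b <- batches[[(j - 1) %% length(batches) + 1]]
    if (!any(b$status == 1)) next          # no events: zero gradient, skip
    X <- as.matrix(b[, covariates, drop = FALSE])
    step <- learn_rate(j) * cox_grad(beta, X, b$time, b$status)
    beta <- beta + step
    path[[length(path) + 1]] <- beta
    if (sqrt(sum(step^2)) < epsilon) break
  }
  path
}
```

Applied to the batches from the Examples, e.g. sgd_cox_sketch(dCox_split, c("x.1", "x.2"), learn_rate = function(x) 1/(100*sqrt(x)), max_iter = 900), it returns the coefficient path as a list; its results are not guaranteed to match those of coxphSGD.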
Marcin Kosinski, [email protected]
library(survival)
library(coxphSGD)  # provides coxphSGD() and dataCox()
set.seed(456)
x <- matrix(sample(0:1, size = 20000, replace = TRUE), ncol = 2)
head(x)
# simulate 10^4 observations with true coefficients beta = c(2, 2)
dCox <- dataCox(10^4, lambda = 3, rho = 2, x, beta = c(2,2), cens.rate = 5)
# assign each observation to one of 90 batches
batch_id <- sample(1:90, size = 10^4, replace = TRUE)
dCox_split <- split(dCox, batch_id)
results <- coxphSGD(formula = Surv(time, status) ~ x.1+x.2,
                    data = dCox_split,
                    epsilon = 1e-5,
                    learn.rates = function(x){1/(100*sqrt(x))},
                    beta.zero = c(0,0),
                    max.iter = 10*90)
# stack the coefficient estimates, one row per iteration
coeff_by_iteration <- as.data.frame(do.call(rbind, results$coefficients))
head(coeff_by_iteration)
dataCox
Generates random survival data from a Weibull distribution (with parameters lambda and rho) for the given input data x and model coefficients beta. Censoring times come from an exponential distribution with parameter cens.rate.
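Bender, Augustin and Blettner (2005) generate Cox-Weibull survival times by inverse-transform sampling: with cumulative baseline hazard $H_0(t) = \lambda t^{\rho}$, the survival function is $S(t \mid x) = \exp(-\lambda t^{\rho} e^{x^\top\beta})$, so for $U \sim \mathrm{Uniform}(0, 1)$ the time $T = \left(-\log U / (\lambda e^{x^\top\beta})\right)^{1/\rho}$ has that distribution. The sketch below implements this step; sim_weibull_times is a hypothetical helper, and the assumption that dataCox parameterises lambda and rho in exactly this way is not confirmed here.

```r
# Hypothetical helper (not the package's dataCox): inverse-transform sampling
# of Cox-Weibull survival times, T = (-log(U) / (lambda * exp(x'beta)))^(1/rho)
sim_weibull_times <- function(x, beta, lambda, rho, u = runif(NROW(x))) {
  lin_pred <- as.vector(as.matrix(x) %*% beta)   # linear predictor x'beta
  (-log(u) / (lambda * exp(lin_pred)))^(1 / rho)
}

set.seed(456)
x <- matrix(sample(0:1, size = 20, replace = TRUE), ncol = 2)
sim_weibull_times(x, beta = c(2, 2), lambda = 3, rho = 2)
```

The optional u argument makes the transformation easy to check by hand: with x'beta = 0, lambda = 1, rho = 2 and u = exp(-4), the generated time is 2.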
dataCox(n, lambda, rho, x, beta, cens.rate)
Argument | Description |
---|---|
n | Number of observations to generate. |
lambda | The lambda parameter of the Weibull distribution. |
rho | The rho parameter of the Weibull distribution. |
x | A data.frame with the input data for which the survival times are generated. |
beta | True model coefficients. |
cens.rate | Parameter of the exponential distribution from which the censoring times are drawn. |
For each observation, a true survival time and a censoring time are generated. If the censoring time is less than the survival time, the censoring time is returned and the status of the observation is set to 0, meaning that the observation is censored. If the survival time is less than the censoring time, the true survival time is returned and the status of the observation is set to 1, meaning that the event has been observed.
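The rule above amounts to taking the smaller of the two times and recording which one came first. A minimal sketch with made-up times follows; in a simulation the censoring times would be exponential draws such as rexp(n, rate = cens.rate), which is an assumption about how cens.rate is used. Ties (equal times) are counted as events here, a choice the description does not specify.

```r
# Illustrative true survival and censoring times (made-up values)
true_time <- c(2.0, 0.5, 1.2)
cens_time <- c(1.0, 0.9, 3.0)   # in a simulation: rexp(3, rate = cens.rate)

# Observed time is whichever comes first; status 1 = event observed
time   <- pmin(true_time, cens_time)
status <- as.numeric(true_time <= cens_time)

data.frame(time, status)   # times 1.0, 0.5, 1.2 with status 0, 1, 1
```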
A data.frame containing the columns:
id | an integer observation identifier. |
time | observed times (the survival time or the censoring time, whichever is smaller). |
status | observation status (event occurred (1) or not (0)). |
x | the input data used to generate the survival times. |
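The formula in the coxphSGD Examples refers to covariates named x.1 and x.2. That naming is what base R's data.frame() produces when an unnamed-column matrix is passed as an argument called x, as the sketch below shows; that dataCox assembles its result this way is an assumption consistent with those Examples, and the values are made up.

```r
# Two binary covariates without column names, as in the Examples
x <- matrix(c(0, 1, 1, 0), ncol = 2)

# Passing a matrix as a named argument splits it into columns x.1, x.2
d <- data.frame(id = 1:2, time = c(0.4, 1.3), status = c(1, 0), x = x)
names(d)   # "id" "time" "status" "x.1" "x.2"
```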
Ralf Bender, Thomas Augustin, Maria Blettner (2005). Generating survival times to simulate Cox proportional hazards models. http://onlinelibrary.wiley.com/doi/10.1002/sim.2059/abstract
## Not run:
x <- matrix(sample(0:1, size = 20000, replace = TRUE), ncol = 2)
dataCox(10^4, lambda = 3, rho = 2, x, beta = c(1,3), cens.rate = 5) -> dCox
## End(Not run)