Algorithm 2: Procedure to obtain the optimal coefficients to be used in Algorithm 1

Obtain predictions

Compute weighted Brier Loss function for a single marker or a linear weighted combination of markers

Compute weighted Cross Entropy Loss function for a single marker or a linear weighted combination of markers

Obtain the lambda hyperparameter for the LASSO using cross-validation

Internal stackBagg helper functions

Compute the risk of misclassifying an individual (1-AUC), using as a marker a single prediction or a weighted linear combination of several predictions

Predictions based on a library of Machine Learning procedures

Library of Machine Learning procedures

Predictions based on those Machine Learning procedures in the library that allow for weights to be specified as an argument of the R function. No bagging occurs. This group of algorithms is denoted as Native Weights

Library of Machine Learning procedures that allows for weights

A grid of values for hyperparameters used in the Real Data Application: InfCareHIV Register. This grid of values is an argument in the tuning parameter function tune_parameter_ml.R

ipcw_ensbagg(folds, MLprocedures, fmla, tuneparams, tao, B = NULL, A,
  data, xnam, xnam.factor, xnam.cont, xnam.cont.gam, ens.library)

ipcw_genbagg(fmla, tuneparams, MLprocedures, traindata, testdata, B, A,
  xnam, xnam.factor, xnam.cont, xnam.cont.gam, ens.library)

ipcw_brier(par, Z, y, wts)

ipcw_crossentropy(par, Z, y, wts)

tune_lasso(folds, fmla, tao, data, xnam)

optimun_auc_coef(coef_init, lambda, data, Z, tao)

risk_auc(par, lambda, Z, data, tao)

MLprocedures(traindata, testdata, fmla, xnam, xnam.factor, xnam.cont,
  xnam.cont.gam, tuneparams, ens.library, i)

ML_list

MLprocedures_natively(traindata, testdata, fmla, xnam, xnam.factor,
  xnam.cont, xnam.cont.gam, tuneparams)

ML_list_natively

grid_parametersDataHIV(xnam, data, tao)
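
As a rough illustration of the two loss functions in the usage above, the following sketch computes a weighted Brier loss and a weighted cross-entropy loss for a linear combination of markers. The names brier_sketch and crossentropy_sketch, the averaging over observations, and the clipping constant eps are assumptions made for this example only; the internal ipcw_brier and ipcw_crossentropy may normalize the loss or constrain par differently.

```r
# Hypothetical sketch, NOT the stackBagg implementation.
# par: weights, Z: matrix of predictions (one column per marker),
# y: binary response, wts: IPC weights.
brier_sketch <- function(par, Z, y, wts) {
  pred <- as.vector(Z %*% par)             # weighted linear combination of markers
  mean(wts * (y - pred)^2)                 # weighted squared error (assumed normalization)
}

crossentropy_sketch <- function(par, Z, y, wts, eps = 1e-8) {
  pred <- as.vector(Z %*% par)
  pred <- pmin(pmax(pred, eps), 1 - eps)   # keep log() finite (assumed safeguard)
  -mean(wts * (y * log(pred) + (1 - y) * log(1 - pred)))
}

# Toy inputs, made up for illustration
Z   <- cbind(m1 = c(0.2, 0.8, 0.6), m2 = c(0.4, 0.6, 0.2))
y   <- c(0, 1, 0)
wts <- c(1, 1.5, 1)
par <- c(0.5, 0.5)

brier_value <- brier_sketch(par, Z, y, wts)
ce_value    <- crossentropy_sketch(par, Z, y, wts)
```

Both functions take the same four arguments, so either one could be passed to a general-purpose optimizer over par.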

Arguments

folds	Number of folds
MLprocedures	MLprocedures
fmla	formula object, e.g. "E ~ x1+x2"
tuneparams	a list of tune parameters for each machine learning procedure
tao	time point of interest
B	number of bootstrap samples
data	a training data set
xnam	all covariates in the model
xnam.factor	categorical variables included in the model
xnam.cont	continuous variables included in the model
xnam.cont.gam	continuous variables to be included in the smoothing operator gam::s(,df)
ens.library	algorithms in the library
traindata	a training data set
testdata	a test data set
par	a vector of weights. Its length must be equal to the number of predictions included in Z
Z	a matrix that contains the predictions. Each column represents a single marker.
y	vector of response variable (binary).
wts	IPC weights
coef_init	starting values for the coefficients
lambda	penalization term. It is a positive scalar.
i	sample selected by bootstrap
par	a vector of coefficients/weights. Its length must be equal to the number of predictions included in Z
data	a data frame that contains at least: ttilde = time to event, delta = event type, wts = IPC weights
traindata	training data set
testdata	validation/test data set
xnam	a vector with the names of the covariates considered in the modeling
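
To make the roles of par, Z and the IPC weights more concrete, here is a minimal sketch of a weighted 1-AUC risk for a linear combination of markers. It is an assumption-laden illustration: the function name one_minus_auc_sketch is hypothetical, the binary response y is passed directly instead of being derived from ttilde, delta and tao, each case/control pair is weighted by the product of the two weights, and the penalization term lambda is left out because the way it enters the internal risk_auc is not described here.

```r
# Hypothetical sketch, NOT the stackBagg implementation.
# par: coefficients, Z: matrix of predictions (one column per marker),
# y: binary response (1 = case, 0 = control), wts: IPC weights.
one_minus_auc_sketch <- function(par, Z, y, wts) {
  marker   <- as.vector(Z %*% par)   # combined marker
  cases    <- which(y == 1)
  controls <- which(y == 0)
  # 1 if the case has the higher marker value, 0.5 for ties, 0 otherwise
  conc <- outer(marker[cases], marker[controls],
                function(a, b) (a > b) + 0.5 * (a == b))
  w <- outer(wts[cases], wts[controls])   # pair weights (assumed: product of weights)
  1 - sum(w * conc) / sum(w)
}

# Toy inputs, made up for illustration
Z   <- cbind(m1 = c(0.2, 0.8, 0.6, 0.9), m2 = c(0.4, 0.6, 0.2, 0.7))
y   <- c(1, 1, 0, 0)
wts <- c(1, 2, 1, 1)

risk_value <- one_minus_auc_sketch(c(0.5, 0.5), Z, y, wts)
```

A risk close to 0 means the combined marker ranks cases above controls almost always; a risk near 0.5 means it does no better than chance.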

Format

An object of class list of length 8.

Value

a list with the predictions of each machine learning algorithm (id, predictions), the average AUC across folds for each of them, the optimal coefficients, an indicator of whether the optimization procedure has converged, and the value of the penalization term chosen

a matrix with the predictions on the test data set of each machine learning algorithm considered in MLprocedures

lambda to be used in the glmnet function

a vector with the optimal AUC value and the optimal coefficient

1-AUC

a matrix of predictions where each column is the prediction of each algorithm based on the testdata

a list of Machine Learning functions

a matrix of predictions where each column is the prediction of each algorithm based on the testdata

a list of Machine Learning functions

a list with a grid of values for each hyperparameter:
gam_param	a vector containing the degrees of freedom 3 and 4
lasso_param	a grid of values for the shrinkage term lambda
randomforest_param	a two-column matrix: the first column denotes the num_trees parameter and the second column denotes the mtry parameter
knn_param	a grid of positive integer values
svm_param	a three-column matrix: the first column denotes the cost parameter, the second column the gamma and the third column the kernel. kernel=1 denotes "radial" and kernel=2 denotes "linear"
nn_param	a grid of positive integer values for the neurons
bart_param	a three-column matrix: the first column denotes the num_tree parameter, the second column the k parameter and the third column the q parameter
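
The structure of such a grid can be sketched in base R as follows. Only the element names and column layout follow the description above; every numeric value is a made-up placeholder, and grid_sketch is a hypothetical object, not the output of grid_parametersDataHIV.

```r
# Hypothetical values; only the names and shapes follow the documented structure.
grid_sketch <- list(
  gam_param          = c(3, 4),                             # degrees of freedom
  lasso_param        = seq(0.001, 0.1, length.out = 10),    # shrinkage term lambda
  randomforest_param = as.matrix(expand.grid(num_trees = c(50, 250),
                                             mtry = c(1, 2))),
  knn_param          = seq(5, 25, by = 5),                  # positive integers
  svm_param          = as.matrix(expand.grid(cost = c(0.1, 1),
                                             gamma = c(0.01, 0.1),
                                             kernel = c(1, 2))),  # 1 = "radial", 2 = "linear"
  nn_param           = 1:5,                                 # number of neurons
  bart_param         = as.matrix(expand.grid(num_tree = c(50, 200),
                                             k = c(1, 2),
                                             q = c(0.9, 0.99)))
)

str(grid_sketch, max.level = 1)
```

Using expand.grid gives one row per hyperparameter combination, which matches the matrix layout described for the random forest, SVM and BART entries.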

Details

These are internal functions and are not intended to be called directly by users.
