Example.Rmd
library(BACps)
The simulated data can be generated using the function sim_data where the first argument is the sample size and the second one is the exposure effect (conditional) on the outcome. we generate a data of sample size equal to 1000 and an exposure effect of 0.3.
sim_data <- BACps::sim_data(500,.3)
is a list where the first item is the outcome, the second one is the exposure and the last one is the set of covariates. The binary exposure is:
x = sim_data[[2]]
The binary outcome is:
y = sim_data[[1]]
and the five covariates are:
U = as.data.frame(sim_data[[3]])
Let’s run our procedure BACps with dependence parameter (w) equal to 50. This parameter governs the relationship between the prognostic score and propensity score. If the dependence parameter is equal to 1, there is no information flowing between both models. So, w=1 corresponds to an uninformative prior for the model indicator of the propensity score.
fit_bacps = BACps::bacps(w=50,x=x,y=y,U=as.data.frame(U), niter_1st=1000, niter_2nd=1500, burnin_1st=500, burnin_2nd=500)
#> [1] "acceptation ratio of parameter alpha:"
#> [1] 0.196
#> [1] "First Stage Done!"
#> user system elapsed
#> 2.78 0.03 2.81
#>
#> 0.01066667
#> [1] "2nd Stage Done!"
#> user system elapsed
#> 34.1 0.0 34.1
The average causal effect (conditional) is:
fit_bacps[[5]]
#> [1] 0.1234435
The posterior probability on the model indicator variable is:
fit_bacps[[4]]
#> post.alphaX
#> 6 6 1 0 1 0 0 0.026973027
#> 10 10 1 0 0 1 0 0.000999001
#> 14 14 1 0 1 1 0 0.972027972
The informative prior distribution on the model indicator of the propensity score used in the second stage (which is a posterior probability in the first stage):
fit_bacps[[1]]
#> <NA> <NA> <NA> <NA> <NA> post.alphaX|Y
#> [1,] 0 0 0 0 1 0.014919545
#> [2,] 0 0 0 1 0 0.094261414
#> [3,] 0 0 0 1 1 0.014919545
#> [4,] 0 0 1 0 0 0.094261414
#> [5,] 0 0 1 0 1 0.014919545
#> [6,] 0 0 1 1 0 0.094261414
#> [7,] 0 0 1 1 1 0.014919545
#> [8,] 0 1 0 0 0 0.024362801
#> [9,] 0 1 0 0 1 0.003238916
#> [10,] 0 1 0 1 0 0.024362801
#> [11,] 0 1 0 1 1 0.003238916
#> [12,] 0 1 1 0 0 0.024362801
#> [13,] 0 1 1 0 1 0.003238916
#> [14,] 0 1 1 1 0 0.024362801
#> [15,] 0 1 1 1 1 0.003238916
#> [16,] 1 0 0 0 0 0.094261414
#> [17,] 1 0 0 0 1 0.014919545
#> [18,] 1 0 0 1 0 0.094261414
#> [19,] 1 0 0 1 1 0.014919545
#> [20,] 1 0 1 0 0 0.094261414
#> [21,] 1 0 1 0 1 0.014919545
#> [22,] 1 0 1 1 0 0.094261414
#> [23,] 1 0 1 1 1 0.014919545
#> [24,] 1 1 0 0 0 0.024362801
#> [25,] 1 1 0 0 1 0.003238916
#> [26,] 1 1 0 1 0 0.024362801
#> [27,] 1 1 0 1 1 0.003238916
#> [28,] 1 1 1 0 0 0.024362801
#> [29,] 1 1 1 0 1 0.003238916
#> [30,] 1 1 1 1 0 0.024362801
#> [31,] 1 1 1 1 1 0.003238916
The posterior distribution of the model indicators of the prognostic score model which is used to compute the posterior distribution of the model indicator of the propensity score model:
fit_bacps[[2]]
#> 00000 00001 00010 00011 00100 00101 00110 00111
#> 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
#> 01000 01001 01010 01011 01100 01101 01110 01111
#> 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
#> 10000 10001 10010 10011 10100 10101 10110 10111
#> 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.4610778 0.1596806
#> 11000 11001 11010 11011 11100 11101 11110 11111
#> 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.3033932 0.0758483
As a comparator, let’s run a penalized logistic regression (LASSO) forcing the exposure variable to be always in the model.
library(glmnet)
#> Loading required package: Matrix
#> Loaded glmnet 4.1-1
data_glm = as.data.frame(cbind(x,U))
# Choose Constrained Coefficients so we force the exposure to always be in the model
lb <- rep(0,length(colnames( data_glm)))
ub <- rep(Inf,length(colnames( data_glm )))
ub[1]=0 # so there is no penalization on the exposure variable
cv.lasso <- glmnet::cv.glmnet(x=as.matrix(data_glm),y=y,lower.limits = lb, upper.limits = ub, family = "binomial")
# Fit the final model on the training data
model <- glmnet::glmnet(x=as.matrix(data_glm),y=y, alpha = 1, family = "binomial",lambda = cv.lasso$lambda.min)
# Display regression coefficients
coef(model)
#> 7 x 1 sparse Matrix of class "dgCMatrix"
#> s0
#> (Intercept) 0.09715998
#> x 0.02853295
#> X1 1.12922403
#> X2 0.10739836
#> X3 1.04613231
#> X4 1.09778409
#> X5 0.04880489