library(BACps)

Example: Simulated Data

The simulated data can be generated using the function sim_data where the first argument is the sample size and the second one is the exposure effect (conditional) on the outcome. we generate a data of sample size equal to 1000 and an exposure effect of 0.3.

sim_data <- BACps::sim_data(500,.3)

is a list where the first item is the outcome, the second one is the exposure and the last one is the set of covariates. The binary exposure is:

x = sim_data[[2]]

The binary outcome is:

y = sim_data[[1]]

and the five covariates are:

U = as.data.frame(sim_data[[3]])

Let’s run our procedure BACps with dependence parameter (w) equal to 50. This parameter governs the relationship between the prognostic score and propensity score. If the dependence parameter is equal to 1, there is no information flowing between both models. So, w=1 corresponds to an uninformative prior for the model indicator of the propensity score.

fit_bacps = BACps::bacps(w=50,x=x,y=y,U=as.data.frame(U), niter_1st=1000, niter_2nd=1500, burnin_1st=500, burnin_2nd=500)
#> [1] "acceptation ratio of parameter alpha:"
#> [1] 0.196
#> [1] "First Stage Done!"
#>    user  system elapsed 
#>    2.78    0.03    2.81 
#>            
#> 0.01066667 
#> [1] "2nd Stage Done!"
#>    user  system elapsed 
#>    34.1     0.0    34.1

The average causal effect (conditional) is:

fit_bacps[[5]]
#> [1] 0.1234435

The posterior probability on the model indicator variable is:

fit_bacps[[4]]
#>                 post.alphaX
#> 6   6 1 0 1 0 0 0.026973027
#> 10 10 1 0 0 1 0 0.000999001
#> 14 14 1 0 1 1 0 0.972027972

The informative prior distribution on the model indicator of the propensity score used in the second stage (which is a posterior probability in the first stage):

fit_bacps[[1]]
#>       <NA> <NA> <NA> <NA> <NA> post.alphaX|Y
#>  [1,]    0    0    0    0    1   0.014919545
#>  [2,]    0    0    0    1    0   0.094261414
#>  [3,]    0    0    0    1    1   0.014919545
#>  [4,]    0    0    1    0    0   0.094261414
#>  [5,]    0    0    1    0    1   0.014919545
#>  [6,]    0    0    1    1    0   0.094261414
#>  [7,]    0    0    1    1    1   0.014919545
#>  [8,]    0    1    0    0    0   0.024362801
#>  [9,]    0    1    0    0    1   0.003238916
#> [10,]    0    1    0    1    0   0.024362801
#> [11,]    0    1    0    1    1   0.003238916
#> [12,]    0    1    1    0    0   0.024362801
#> [13,]    0    1    1    0    1   0.003238916
#> [14,]    0    1    1    1    0   0.024362801
#> [15,]    0    1    1    1    1   0.003238916
#> [16,]    1    0    0    0    0   0.094261414
#> [17,]    1    0    0    0    1   0.014919545
#> [18,]    1    0    0    1    0   0.094261414
#> [19,]    1    0    0    1    1   0.014919545
#> [20,]    1    0    1    0    0   0.094261414
#> [21,]    1    0    1    0    1   0.014919545
#> [22,]    1    0    1    1    0   0.094261414
#> [23,]    1    0    1    1    1   0.014919545
#> [24,]    1    1    0    0    0   0.024362801
#> [25,]    1    1    0    0    1   0.003238916
#> [26,]    1    1    0    1    0   0.024362801
#> [27,]    1    1    0    1    1   0.003238916
#> [28,]    1    1    1    0    0   0.024362801
#> [29,]    1    1    1    0    1   0.003238916
#> [30,]    1    1    1    1    0   0.024362801
#> [31,]    1    1    1    1    1   0.003238916

The posterior distribution of the model indicators of the prognostic score model which is used to compute the posterior distribution of the model indicator of the propensity score model:

fit_bacps[[2]]
#>     00000     00001     00010     00011     00100     00101     00110     00111 
#> 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 
#>     01000     01001     01010     01011     01100     01101     01110     01111 
#> 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 
#>     10000     10001     10010     10011     10100     10101     10110     10111 
#> 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.4610778 0.1596806 
#>     11000     11001     11010     11011     11100     11101     11110     11111 
#> 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.3033932 0.0758483

As a comparator, let’s run a penalized logistic regression (LASSO) forcing the exposure variable to be always in the model.

library(glmnet)
#> Loading required package: Matrix
#> Loaded glmnet 4.1-1
data_glm = as.data.frame(cbind(x,U))
# Choose Constrained Coefficients so we force the exposure to always be in the model
lb <- rep(0,length(colnames( data_glm)))
ub <- rep(Inf,length(colnames( data_glm )))
ub[1]=0 # so there is no penalization on the exposure variable
cv.lasso <- glmnet::cv.glmnet(x=as.matrix(data_glm),y=y,lower.limits = lb, upper.limits = ub, family = "binomial")
# Fit the final model on the training data
model <- glmnet::glmnet(x=as.matrix(data_glm),y=y, alpha = 1, family = "binomial",lambda = cv.lasso$lambda.min)
# Display regression coefficients
coef(model)
#> 7 x 1 sparse Matrix of class "dgCMatrix"
#>                     s0
#> (Intercept) 0.09715998
#> x           0.02853295
#> X1          1.12922403
#> X2          0.10739836
#> X3          1.04613231
#> X4          1.09778409
#> X5          0.04880489