phantSEM

Introduction

This vignette describes the use of the functions SA_step1 SA_step2 and SA_step3. The goal of these functions are to conduct a sensitivity analysis with phantom variables. Phantom variables are variables that you did not observe, but that you can specify in a structural equation model by specifying the variance and covariances of the phantom variable with other variables. Creating a single phantom variable is fairly straightforward in the R package lavaan. There are scenarios where a researcher may wish to see how the results from an analysis might differ if they had observed additional variables. For instance, a researcher may want to assess whether a certain regression coefficient would be statistically significant if they had controlled for another variable in their analysis. Because the researcher does not know how this phantom variable covaries with the variables they did observe, it would be useful to try different values for the covariances in a sensitivity analysis. The phantSEM package is built for researchers to test different values for the covariances between their phantom variables and observed variables.

In this vignette, a full example is illustrated using the functions of phantSEM.

Cross-sectional Mediation

The example that we will use to illustrate the use of phantSEM is one in which we have fit a cross-sectional mediation model and would like to see if our estimates of the mediated effect would hold if we had collected an additional wave of data. To start, we will load the packages we need.


library(phantSEM)
library(lavaan)
library(tidyr)
library(tidyverse)

The example we use is based on a memory study described by MacKinnon et al., (2018). Participants in the study were given a list of words and were randomly assigned to either make mental images of the words, or repeat the words. The individuals were then asked to recall as many words from the list as they could. Therefore, the predictor X was the condition (1=imagery, 0=repetition), the mediator M was the extent to which they used mental imagery and the outcome Y was the number of words they recalled.

We first define a 3 x 3 cross-sectional covariance matrix

memory_CrossSectional <- matrix(c(
  0.2509804, 0.9511983, 0.4257081,
  0.9511983, 8.8661765, 2.6609477,
  0.4257081, 2.6609477, 10.8592048
), nrow = 3, byrow = T)
colnames(memory_CrossSectional) <- c("X", "M2", "Y2")

Next, we fit a single-mediator model to the covariance matrix. The model is defined using lavaan syntax.

Observed_Model <- "
M2 ~ X
Y2 ~ M2+X
"

fit_obs <- sem(model = Observed_Model, sample.cov = memory_CrossSectional, sample.nobs = 138)

We now define the model with phantom variables for M and Y using lavaan syntax. It is very important to add labels to the parameters that will be of interest for the sensitivity analysis. For this example, the a-path, b-path and c’-path may be of interest.

Phantom_Model <- "
M2 ~ M1 + Y1 + a*X
Y2 ~ M1 + Y1 + b*M2 + cp*X
"

we now have the necessary parts for the first step of the sensitivity analysis.

SA_step1

The purpose of this function is to determine which variables are the phantom variables and provide the covariance parameters involving the phantom variables so that the user can enter them into the next step.

Step1 <- SA_step1(
  lavoutput = fit_obs,
  mod_obs = Observed_Model,
  mod_phant = Phantom_Model
)
#> Here are the phantom covariance matrix parameters (copy and paste and add values/names for step2):
#> 
#> 
#> phantom_assignment <- list("CovM1M2"= ,
#> "CovM1X"= ,
#> "CovM1Y2"= ,
#> "CovY1M1"= ,
#> "CovY1M2"= ,
#> "CovY1X"= ,
#> "CovY1Y2"= ,
#> "VarM1"= ,
#> "VarY1"= )
#> Here are the observed covariance matrix parameters:
#> CovXM2,CovXY2,CovY2M2
#> Choose which values you want to use for your phantom covariances.

This function prints the names of the phantom covariance parameters which will be used in the next function. Those parameters are: “CovM1M2”,“CovM1X”,“CovM1Y2”,“CovY1M1”,“CovY1M2”,“CovY1X”,“CovY1Y2”,“VarM1”,“VarY1”. The function also provides the names of the observed covariance parameters for the user’s reference.

SA_step2

The second step of the sensitivity analysis is to create covariance matrices that the phantom model will be fit to. To do this, the user must provide information about how the phantom variables covary with the observed variables. For this example, the user may wish to test different values for “CovM1M2”, “CovY1Y2”, “CovY1M2” and “CovM1Y2” and fix the other phantom covariances to single values. The easiest way to create the arguments for SA_step2 is to copy and paste the list of parameter names from the output of SA_step1 and add values.

There are several ways to specify the values for the phantom parameters. The first is to assign a single value, such as “CovM1X” = 0, or a list of values, such as “CovM1M2= seq(0,.6,.1). The other is to set it equal to another parameter, such as an observed parameter or another phantom parameter. In this case, if we wanted to make CovY1Y2 equal to CovM1M2, we would simply put”CovY1Y2”=“CovM1M2” and the values for CovM1M2 would be used for CovY1Y2 as well.



phantom_assignment <- list(
  "CovM1X" = 0,
  "CovY1M1" = "CovY2M2",
  "CovY1X" = 0,
  "VarM1" = 1,
  "VarY1" = 1,
  "CovM1M2" = seq(0, .6, .1),
  "CovY1Y2" = "CovM1M2",
  "CovY1M2" = seq(-.6, .6, .1),
  "CovM1Y2" = "CovY1M2"
)



Step2 <- SA_step2(
  phantom_assignment = phantom_assignment,
  step1 = Step1
)

SA_step2 creates a covariance matrix for each combination of the parameter test values provided. The function returns a list of five objects that are then passed to SA_step3, they are the lavaan syntax for the phantom variable model (“mod_phant”), a vector of variances (“var_phant”), a data frame of the combinations of phantom covariances (“combos”), a list of correlation matrices (“correlation_matrix_list”) and a list of covariance matrices (“covariance_matrix_list”).

names(Step2)
#> [1] "mod_phant"               "var_phant"              
#> [3] "combos"                  "correlation_matrix_list"
#> [5] "covariance_matrix_list"

The third function is SA_step3, which is where the sensitivity analysis is run. This function can take a great amount of time depending on how many different phantom parameter combinations the user has entered. For this function, there are only two arguments. The first is for the output of SA_step2 and the second is the sample size from the observed data.

Step3 <- SA_step3(
  step2 = Step2,
  n = 138
)

The function returns a list of four objects. The first object is a list of the lavaan output from each model, the second object is the list of covariance matrices used for each model, the third object is a list of correlation matrices, and the fourth object is the data frame of phantom covariance combinations.

At this point, the sensitivity analysis is complete, but the results are not in the most user-friendly form. The package contains a function to help to organize the results so that they can be examined or plotted. This function is called ghost_par_ests and it is used to pull out parameter estimatesfor a particular parameter (which must be named) from each phantom covariance combination. It has three arguments: step3 is the results from SA_step3, parameter_label is the label used in the lavaan code to identify the parameter of interest, and remove_NA is a logical object indicating whether the results returned by the function should include rows in which the model did not produce estimates.

Let’s see what the b-estimates would be for our example. The output from this function is a data frame that has the values of the phantom covariances that were varied as well as the lavaan output for the chosen parameter.

b_ests <- ghost_par_ests(
  step3 = Step3,
  parameter_label = "b",
  remove_NA = FALSE
)

a_ests <- ghost_par_ests(
  step3 = Step3,
  parameter_label = "a",
  remove_NA = FALSE
)

b_ests$aest <- a_ests$est
b_ests$abest <- b_ests$est * b_ests$aest
b_ests <- cbind(Step2[["combos"]], b_ests)
b_ests$`CovM1M2,CovY1Y2` <- round(b_ests$`CovM1M2,CovY1Y2`, 2)
b_ests$`CovY1M2,CovM1Y2` <- round(b_ests$`CovY1M2,CovM1Y2`, 2)

head(b_ests)
#>           CovM1M2,CovY1Y2 CovY1M2,CovM1Y2 rep  lhs   op  rhs label      est
#> reptemp               0.0            -0.6   1   Y2    ~   M2     b 1.146209
#> reptemp.1             0.1            -0.6   2 <NA> <NA> <NA>  <NA>       NA
#> reptemp.2             0.2            -0.6   3 <NA> <NA> <NA>  <NA>       NA
#> reptemp.3             0.3            -0.6   4 <NA> <NA> <NA>  <NA>       NA
#> reptemp.4             0.4            -0.6   5 <NA> <NA> <NA>  <NA>       NA
#> reptemp.5             0.5            -0.6   6 <NA> <NA> <NA>  <NA>       NA
#>                  se        z pvalue  ci.lower ci.upper     aest    abest
#> reptemp   0.1187104 9.655506      0 0.9135409 1.378877 3.789931 4.344053
#> reptemp.1        NA       NA     NA        NA       NA       NA       NA
#> reptemp.2        NA       NA     NA        NA       NA       NA       NA
#> reptemp.3        NA       NA     NA        NA       NA       NA       NA
#> reptemp.4        NA       NA     NA        NA       NA       NA       NA
#> reptemp.5        NA       NA     NA        NA       NA       NA       NA

We can then plot the results using ggplot2 (plotting function to be developed later).

library(ggplot2)
ggplot(b_ests, aes(x = `CovM1M2,CovY1Y2`, y = est)) +
  geom_point(size = 2) +
  scale_shape_manual() + # scale_fill_manual(values = c("black", "lightgray")) +
  scale_color_manual(values = c("grey", "black")) +
  facet_grid(~ b_ests$`CovY1M2,CovM1Y2`) +
  theme_bw() +
  scale_fill_grey() +
  theme(
    axis.title.y = element_text(size = 18),
    axis.text.x = element_text(size = 12),
    axis.title.x = element_text(size = 20),
    axis.text = element_text(size = 15),
    strip.text.y = element_text(size = 15),
    strip.text.x = element_text(size = 20),
    legend.text = element_text(size = 15),
    legend.title = element_text(size = 18),
    legend.position = "bottom",
    panel.spacing.x = unit(2, "mm")
  )

Another option is to use the lookup table that is part of the package. The function SA_lookup can be used to do this. Note that for this version, the SA_lookup function only provides standardized estimates as it uses the correlation matrix only.

cov2cor(memory_CrossSectional)
#>              X        M2        Y2
#> [1,] 1.0000000 0.6376509 0.2578654
#> [2,] 0.6376509 1.0000000 0.2711872
#> [3,] 0.2578654 0.2711872 1.0000000
lookup_results <- SA_lookup(CorXM = .64, CorXY = .26, CorMY = .27)
lookup_results_plot <- lookup_results %>% filter(round(corr, 1) == .3 & (stabr > -.1 & stabr < .7))

ggplot(lookup_results_plot, aes(x = stabr, y = est_ab)) +
  geom_point(size = 2) +
  scale_shape_manual() + # scale_fill_manual(values = c("black", "lightgray")) +
  scale_color_manual(values = c("grey", "black")) +
  facet_grid(~ lookup_results_plot$clr) +
  theme_bw() +
  scale_fill_grey() +
  theme(
    axis.title.y = element_text(size = 18),
    axis.text.x = element_text(size = 12),
    axis.title.x = element_text(size = 20),
    axis.text = element_text(size = 15),
    strip.text.y = element_text(size = 15),
    strip.text.x = element_text(size = 20),
    legend.text = element_text(size = 15),
    legend.title = element_text(size = 18),
    legend.position = "bottom",
    panel.spacing.x = unit(2, "mm")
  )

Comparing this result to the earlier result, the pattern of the results is similar, but the scale of the y-axis is different. This is because these results are standardized.