Improvements to lavaanPlot
Alex Lishinski
2024-01-26
I’ve been working on some improvements to the lavaanPlot package, in order to take advantage of updates to the DiagrammeR package.
I’ll spare you all the details, but DiagrammeR has introduced a way of building graph plots from dataframes that define the nodes and edges, which makes customizing plots more extensible. I am trying to bring the advantages of this flexibility to the lavaanPlot package to enable more customization. The old way that the package was set up is solid, but it’s difficult to add new features to it, and the goal of the new approach is to unlock the full customization options that the Graphviz software and the DOT language have to offer.
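To give a sense of what node and edge dataframes look like, here is a minimal sketch that uses DiagrammeR directly. The variable names and attribute values are made up for illustration, and this is not necessarily how lavaanPlot2 builds its graphs internally.

```r
library(DiagrammeR)

# Node data frame: one row per variable in a small, hypothetical path model.
# Single attribute values (shape, fontname) are recycled across all rows.
nodes <- create_node_df(
  n = 4,
  label = c("cyl", "disp", "hp", "mpg"),
  shape = "box",
  fontname = "Helvetica"
)

# Edge data frame: `from` and `to` refer to node ids (the row order above).
edges <- create_edge_df(
  from = c(1, 2, 3),
  to = c(4, 4, 4),
  color = "grey"
)

# Combine the two data frames into a graph object.
graph <- create_graph(nodes_df = nodes, edges_df = edges)

# Inspect the DOT code that DiagrammeR generates, then draw the graph.
cat(generate_dot(graph))
render_graph(graph)
```

Because every node and edge is a row, any Graphviz attribute can in principle be set per row by adding or editing a column.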
I’ve kept as much of the old approach as I could, but this version of the package has some new user interface elements. I haven’t finished everything I set out to accomplish, but I’m writing this vignette to introduce the new version so that people can give it a try and find issues for me to fix. I’m releasing this as version 0.7.0. The plan is to fix issues and flesh out the functionality over a couple more iterations, leading to a fully matured 1.0.0 release, at which point I hope to fully deprecate the old code.
Here are some examples with the new code; the new function is called lavaanPlot2.
We start with a basic model using mtcars, which contains only observed variable relationships and no latent variable relationships.
library(lavaan)
## This is lavaan 0.6-17
## lavaan is FREE software! Please report any bugs.
library(lavaanPlot)
model <- 'mpg ~ cyl + disp + hp
qsec ~ disp + hp + wt'
fit <- sem(model, data = mtcars)
summary(fit)
## lavaan 0.6.17 ended normally after 32 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 9
##
## Number of observations 32
##
## Model Test User Model:
##
## Test statistic 18.266
## Degrees of freedom 2
## P-value (Chi-square) 0.000
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Regressions:
## Estimate Std.Err z-value P(>|z|)
## mpg ~
## cyl -0.987 0.738 -1.337 0.181
## disp -0.021 0.010 -2.178 0.029
## hp -0.017 0.014 -1.218 0.223
## qsec ~
## disp -0.008 0.004 -2.122 0.034
## hp -0.023 0.004 -5.229 0.000
## wt 1.695 0.398 4.256 0.000
##
## Covariances:
## Estimate Std.Err z-value P(>|z|)
## .mpg ~~
## .qsec 0.447 0.511 0.874 0.382
##
## Variances:
## Estimate Std.Err z-value P(>|z|)
## .mpg 8.194 2.049 4.000 0.000
## .qsec 0.996 0.249 4.000 0.000
A second model, a three-factor confirmatory factor analysis fit to the HolzingerSwineford1939 data from lavaan, supplies latent variable relationships for the examples further down:
HS.model <- ' visual =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9
'
fit2 <- cfa(HS.model, data=HolzingerSwineford1939)
summary(fit2)
## lavaan 0.6.17 ended normally after 35 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 21
##
## Number of observations 301
##
## Model Test User Model:
##
## Test statistic 85.306
## Degrees of freedom 24
## P-value (Chi-square) 0.000
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|)
## visual =~
## x1 1.000
## x2 0.554 0.100 5.554 0.000
## x3 0.729 0.109 6.685 0.000
## textual =~
## x4 1.000
## x5 1.113 0.065 17.014 0.000
## x6 0.926 0.055 16.703 0.000
## speed =~
## x7 1.000
## x8 1.180 0.165 7.152 0.000
## x9 1.082 0.151 7.155 0.000
##
## Covariances:
## Estimate Std.Err z-value P(>|z|)
## visual ~~
## textual 0.408 0.074 5.552 0.000
## speed 0.262 0.056 4.660 0.000
## textual ~~
## speed 0.173 0.049 3.518 0.000
##
## Variances:
## Estimate Std.Err z-value P(>|z|)
## .x1 0.549 0.114 4.833 0.000
## .x2 1.134 0.102 11.146 0.000
## .x3 0.844 0.091 9.317 0.000
## .x4 0.371 0.048 7.779 0.000
## .x5 0.446 0.058 7.642 0.000
## .x6 0.356 0.043 8.277 0.000
## .x7 0.799 0.081 9.823 0.000
## .x8 0.488 0.074 6.573 0.000
## .x9 0.566 0.071 8.003 0.000
## visual 0.809 0.145 5.564 0.000
## textual 0.979 0.112 8.737 0.000
## speed 0.384 0.086 4.451 0.000
labels2 <- c(visual = "Visual Ability", textual = "Textual Ability", speed = "Speed Ability")
You can still plot the model using no additional options:
lavaanPlot2(fit)
You can still add labels to the plot with the labels argument, although it now takes a named character vector instead of a list:
labels <- c(mpg = "Miles Per Gallon", cyl = "Cylinders", disp = "Displacement", hp = "Horsepower", qsec = "Speed", wt = "Weight")
lavaanPlot2(fit, labels = labels)
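If you already have labels written as a list for the older lavaanPlot function, one way to convert them is base R’s unlist(). This is just a small sketch; the old_labels and new_labels names are placeholders.

```r
# Labels as they might have been written for the older lavaanPlot() interface.
old_labels <- list(
  mpg = "Miles Per Gallon",
  cyl = "Cylinders",
  disp = "Displacement"
)

# unlist() flattens the list into a named character vector,
# keeping the variable names as the vector's names.
new_labels <- unlist(old_labels)

is.character(new_labels)
names(new_labels)
```

The resulting new_labels vector has the same shape as the labels vector defined above.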
Graph options, node options, and edge options are supplied via named lists, as previously:
lavaanPlot2(fit, labels = labels, graph_options = list(label = "my first graph", rankdir = "LR"), node_options = list( fontname = "Helvetica"), edge_options = list(color = "grey"))
One change to the interface is how you indicate which model paths to include in the plot, using the include argument. By default, the plot includes just the regression and latent variable relationships; include = "covs" adds the model covariances, and include = "all" adds the error variances as well.
lavaanPlot2(fit, include = "covs", labels = labels, graph_options = list(label = "including covariances"), node_options = list(fontname = "Helvetica"), edge_options = list(color = "grey"))
lavaanPlot2(fit, include = "all", labels = labels, graph_options = list(label = "including error variances"), node_options = list( fontname = "Helvetica"), edge_options = list(color = "grey"))
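To see which kinds of paths a model contains, it can help to look at lavaan’s parameter table. The sketch below sorts its rows into the same broad categories by operator. The kind column and its category names are my own labels for illustration, not something lavaanPlot2 exposes, and the table also lists the fixed variances and covariances of exogenous variables, which says nothing about which of those the plot actually draws.

```r
library(lavaan)

model <- 'mpg ~ cyl + disp + hp
          qsec ~ disp + hp + wt'
fit <- sem(model, data = mtcars)

# One row per model parameter, with lhs, op, and rhs columns.
pe <- parameterEstimates(fit)

# Classify each row by its operator:
#   ~   regression, =~  latent variable loading,
#   ~~  covariance (lhs differs from rhs) or variance (lhs equals rhs).
pe$kind <- ifelse(pe$op == "~", "regression",
           ifelse(pe$op == "=~", "latent",
           ifelse(pe$op == "~~" & pe$lhs != pe$rhs, "covariance",
           ifelse(pe$op == "~~", "variance", "other"))))

table(pe$kind)
pe[pe$kind == "regression", c("lhs", "op", "rhs", "est")]
```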
Coefficient labels can still be included on the edges, and selectively for the different parts of the plot, using the coef_labels argument:
lavaanPlot2(fit, include = "covs", coef_labels = TRUE, labels = labels, graph_options = list(label = "including coefficient labels"), node_options = list(fontname = "Helvetica"), edge_options = list(color = "grey"))
And significance stars can be added to these coefficient labels using the stars argument, just as with lavaanPlot. It takes the types of paths to mark ("regress", "latent", or "covs"):
lavaanPlot2(fit, include = "covs", labels = labels, graph_options = list(label = "my first graph with significance stars"), node_options = list( fontname = "Helvetica"), edge_options = list(color = "grey"), stars = c("regress"), coef_labels = TRUE)
lavaanPlot2(fit2, include = "covs", labels = labels2, graph_options = list(label = "my first graph with significance stars"), node_options = list(fontname = "Helvetica"), edge_options = list(color = "grey"), stars = c("latent"), coef_labels = TRUE)
lavaanPlot2(fit2, include = "covs", labels = labels2, graph_options = list(label = "my first graph, which is being used to illustrate how to use the new code in the lavaanPlot package"), node_options = list( fontname = "Helvetica"), edge_options = list(color = "grey"), stars = c("covs"), coef_labels = TRUE)
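For anyone wondering what the stars stand for, here is a rough sketch of how such labels could be derived from lavaan’s p-values for the regression paths. The cutoffs (.05, .01, .001) are the conventional ones and are an assumption on my part here, as are the stars and edge_label names; this is not a copy of the package’s own code.

```r
library(lavaan)

model <- 'mpg ~ cyl + disp + hp
          qsec ~ disp + hp + wt'
fit <- sem(model, data = mtcars)

# Keep only the regression rows of the parameter table.
pe <- parameterEstimates(fit)
regs <- pe[pe$op == "~", ]

# Assumed cutoffs: *** for p < .001, ** for p < .01, * for p < .05.
stars <- cut(regs$pvalue,
             breaks = c(-Inf, 0.001, 0.01, 0.05, Inf),
             labels = c("***", "**", "*", ""),
             right = FALSE)

# Rounded estimate followed by its stars, as it might appear on an edge.
regs$edge_label <- paste0(round(regs$est, 2), stars)
regs[, c("lhs", "op", "rhs", "pvalue", "edge_label")]
```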
The next stage of development is to allow for subset formatting, where different formatting is applied to sets of nodes or edges. The most obvious cases are different formatting for latent vs observed nodes, and for the regression, latent, covariance, and error variance edges. Ideally, though, I want users to be able to apply different formatting to arbitrary subsets of nodes or edges as they see fit.