Transition diagnostic classification models (TDCMs) are psychometric models that estimate respondents' latent class membership longitudinally (Madison & Bradshaw, 2018). TDCMs are commonly estimated using maximum likelihood. However, there are benefits to estimating Bayesian models (e.g., Zhan et al., 2019), and we can use Stan to estimate Bayesian TDCMs. The problem is that developing Stan code is time intensive and requires significant technical expertise.
We aimed to address that problem through the tdcmStan package. The goal of tdcmStan is to facilitate the production of Stan code for estimating TDCMs. The tdcmStan package creates the Stan code for estimating a TDCM without requiring extensive knowledge of Stan syntax. For those with experience in writing Stan syntax, the Stan code produced by the tdcmStan package can be further edited to meet specific needs.
You can install the release version of tdcmStan from CRAN:
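A minimal sketch of this step, using the standard CRAN installer from base R:

```r
# Install the released version of tdcmStan from CRAN
install.packages("tdcmStan")
```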
To install the development version from GitHub use:
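One common way to do this is with the remotes package. The repository path used below (atlas-aai/tdcmStan) is an assumption that is not stated in this text, so confirm it against the URL listed on the package's CRAN page before running.

```r
# install.packages("remotes")
# Assumed repository path; verify it before use
remotes::install_github("atlas-aai/tdcmStan")
```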
To estimate models using Stan code, users will need an installation of rstan or cmdstanr. The instructions for installing rstan can be found here: RStan Getting Started for Windows, Mac, and Linux. The instructions for installing cmdstanr can be found here: Getting Started with CmdStanR. A full demonstration of how to install and use rstan and cmdstanr is outside of the scope of this vignette.
Once tdcmStan has been installed, we are ready to create Stan code for estimating a TDCM.
To demonstrate the workflow using tdcmStan, we present an example case for generating Stan code. For this example, we will assume that we are going to be estimating a TDCM for a 5-item, 1-attribute assessment that is administered at two time points. In our example, the resulting Stan code is first saved in the stan_code object, and then the Stan code is saved as a .stan file where it can be read in for model estimation.
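library(tdcmStan)
library(tibble)
library(here)
library(readr)

# Q-matrix: five items (rows), each measuring the single attribute (one column)
q_matrix <- tibble(rep(1, 5))
# Generate the Stan code for the TDCM
stan_code <- create_stan_tdcm(q_matrix)
# Save the code as a .stan file (the Stan/ directory must already exist)
readr::write_lines(stan_code, here("Stan/tdcm.stan"))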
To be consistent with the assumption of item invariance described in Madison and Bradshaw (2018), the create_stan_tdcm() function estimates item parameters such that the parameter values for each item are the same at Time 1 and Time 2.
In many cases, estimating Bayesian TDCMs may be time intensive. To work around this, we can estimate the TDCM with multi-threading, which uses parallel processes to increase estimation efficiency and reduce estimation time.
To estimate a TDCM with multi-threading, we would run:
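A sketch of this step for the same 5-item, 1-attribute example. It assumes that the package exports a create_threaded_stan_tdcm() function that accepts the Q-matrix in the same way as create_stan_tdcm(). The function name and the threaded_stan_code object name are assumptions, so check the package's function index before relying on them.

```r
library(tdcmStan)
library(tibble)

# Same Q-matrix as in the earlier example: five items, one attribute
q_matrix <- tibble(rep(1, 5))

# Assumed function name: generates the multi-threaded version of the Stan code
threaded_stan_code <- create_threaded_stan_tdcm(q_matrix)
```

The resulting object can then be written to a .stan file in the same way as the stan_code object above.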
The number of shards refers to the number of parallel processes that are used in a multi-threaded TDCM. The tdcmStan package includes a shard_calculator() function to automatically calculate how many shards can be used. To do this, we run:
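A hedged sketch of such a call. The three positional arguments (number of respondents, number of item responses per respondent, and number of chains), their order, and the illustrative values are all assumptions about the function's signature rather than something stated in this text. Consult ?shard_calculator for the actual arguments.

```r
library(tdcmStan)

# Hypothetical values chosen only for illustration
num_respondents <- 1000  # respondents in the data set
num_responses <- 5       # item responses per respondent
num_chains <- 4          # chains to be run

# Assumed argument order; see ?shard_calculator to confirm
shards <- shard_calculator(num_respondents, num_responses, num_chains)
shards
```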
The tdcmStan package follows a similar process for producing Stan code for TDCMs with other constraints. In addition to the assumption of item invariance across time points, we can assume item fungibility. Fungibility means that the items measuring an attribute are assumed to have the same item parameter values. In our running example, all five items measure the attribute, so a fungible model has one shared item intercept parameter and one shared item main effect parameter for all five items.
To estimate a fungible TDCM for this 5-item 1-attribute assessment, we would run:
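A sketch of this step, assuming the package exports a create_fng_stan_tdcm() function that takes the Q-matrix in the same way as create_stan_tdcm(). The function name and the fng_stan_code object name are assumptions, so verify them against the package's function index.

```r
library(tdcmStan)
library(tibble)

# Five items, all measuring the single attribute
q_matrix <- tibble(rep(1, 5))

# Assumed function name: generates Stan code with item parameters shared across items
fng_stan_code <- create_fng_stan_tdcm(q_matrix)
```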
The previous examples have created Stan code for estimating a TDCM under the assumption that the respondents are completing the same items at each assessment point; however, this may not always be the case for assessment programs. Some assessment programs may only allow respondents to complete each item once.
To estimate a fungible TDCM for a 1-attribute assessment where respondents complete different items at each assessment point, we would run:
q_matrix <- tibble(rep(1, 10))
fng_no_cmn_items_stan_code <- create_fng_no_common_item_tdcm(q_matrix)
In this example, the assessment consists of five items that are completed at each time point, but respondents do not complete the same item twice. This means that respondents would be completing five of the items at Time 1 and the other five items at Time 2, and the items completed at Time 1 do not necessarily have to be consistent across all of the respondents. For example, Respondent 1 might complete Items 2, 4, 6, 8, and 10 at Time 1 and the remaining items at Time 2, while Respondent 2 might complete Items 1, 2, 3, 4, and 5 at Time 1 and the remaining items at Time 2.
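The item assignment described above can be illustrated with a few lines of base R. This is only a sketch of the assessment design. It does not show the data format that the generated Stan code expects, and all object names are hypothetical.

```r
# Ten items in total; each respondent sees five at each time point
all_items <- 1:10

# Respondent 1 completes the even-numbered items at Time 1
resp1_time1 <- c(2, 4, 6, 8, 10)
resp1_time2 <- setdiff(all_items, resp1_time1)

# Respondent 2 completes the first five items at Time 1
resp2_time1 <- 1:5
resp2_time2 <- setdiff(all_items, resp2_time1)

# Check that no respondent completes the same item twice
stopifnot(length(intersect(resp1_time1, resp1_time2)) == 0)
stopifnot(length(intersect(resp2_time1, resp2_time2)) == 0)
```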