Effect of telehealth on quality of life and psychological outcomes over 12 months
BMJ 2013;346:f653 doi: 10.1136/bmj.f653 (Published 26 February 2013) Page 5 of 20
RESEARCH
depressive symptoms (Center for Epidemiologic Studies Depression Scale (CESD-10)).

The SF-12⁵⁸ is a 12 item measure of general health status and health related QoL that uses norm based scoring derived from the general population of the United States in 1998. The instrument was scored as two subscales, the physical component summary score and the mental component summary score; higher scores represent better health related QoL. The SF-12 has shown good test-retest reliability, validity, and responsiveness, and is recommended for patients with heart failure.⁵⁹

The EQ-5D⁶⁰ assesses five domains of generic health related QoL (mobility, self care, usual activities, pain and discomfort, anxiety and depression). It can generate either a health state (one of 243 possible states) or a single summary score (higher scores reflect better health related QoL). The EQ-5D has shown good validity and responsiveness and has been recommended for patients with diabetes⁶¹ and, more cautiously, for patients with chronic obstructive pulmonary disease⁶² and heart failure.⁵⁹ For current purposes, the summary score was used.

The Brief STAI⁶³ is a six item measure of state anxiety that has shown acceptable reliability and validity.⁶³ ⁶⁴ It is widely used in clinical research, notably in studies of patients with diabetes.⁶⁵ The state version of the Brief STAI, rather than the trait version, was used (higher scores reflect greater state anxiety).

The CESD-10⁶⁶ is a 10 item measure of depressive symptoms covering cognitive, emotional, and behavioural domains. It has acceptable validity and reliability,⁶⁶ and sensitivity and specificity.⁶⁷ The original 20 item version has been used widely with clinical populations, including patients with chronic obstructive pulmonary disease⁶⁸ and heart failure.⁶⁹ However, both versions of the scale include items that confound symptoms of physical illness with symptoms of depression (for example, “I felt that everything I did was an effort”; “My sleep was restless”).⁷⁰ Scores range from 0 to 30, with higher scores indicating more depressive symptoms.

Minimal clinically important differences (MCIDs) have not been established for these patient reported outcomes. To evaluate the magnitude of any treatment effect, we regarded a trial defined MCID as an effect size equivalent to Cohen’s d=0.3. This magnitude represents a “small” effect in the behavioural sciences.⁷¹

Covariates in the analyses
Data were collected on a range of sociodemographic and trial related characteristics that could plausibly be related to the study outcomes. These variables were used as covariates in the main analyses. Date of birth and sex were extracted from general practice records. Ethnicity was assessed by self report, using 16 response categories based on standard UK categories from the Office for National Statistics⁷²; missing responses were subsequently completed using data from medical records, where available. Education was assessed by self report using five response categories ranging from no formal education to graduate or professional level. We used participants’ postcodes to allocate an index of multiple deprivation score.⁷³ Comorbidity was assessed by a count of diagnosed conditions in hospital episode statistics over the three years before the trial began.

The WSD project teams provided three kinds of data: participants’ WSD site; the presence or absence of a diagnosis of chronic obstructive pulmonary disease, diabetes, and heart failure; and the number and type of telehealth peripheral devices installed. The WSD evaluation team held data for participants’ allocation (to telehealth or usual care). It also calculated the duration of exposure to telehealth (in days) at the time each assessment questionnaire was completed. Telehealth duration varied among intervention participants at the short term and long term assessments, so this variable was included as a covariate.

Sample size
For the telehealth questionnaire study, a power calculation was conducted on the basis of detecting a small effect size (Cohen’s d=0.3). It allowed for an intracluster correlation coefficient of 0.05, power of 80%, and P<0.05. This calculation indicated that about 500 patients would be required to allow sufficient power to detect this small difference. The exact number ranged from 420 participants (five from each of 84 practices) to 520 participants (10 from each of 52 practices). These numbers were inflated by 10% to allow for the maximum possible increase in sample size due to variable cluster size.⁷⁴ The required sample size thus increased to 550. For sufficient power in our secondary subgroup analyses (not reported here), we aimed to recruit 550 patients per long term condition, or 1650 overall. The samples in all analyses reported here exceed the required sample size (550), so these analyses are adequately powered.

Statistical methods
Missing self reported data could occur at the questionnaire level or at the item or scale level. A participant who completes the questionnaire battery at baseline could fail to complete the questionnaire at the short term or long term follow-up. Alternatively, a participant who largely completes a questionnaire could nevertheless fail to respond to certain items or miss out whole scales within the battery.

For the outcomes reported, missing values at the questionnaire level were not imputed. We imputed missing values at the item or scale level using two methods. The first applied when a missing value belonged to a scale and at least 50% of responses were available for that scale (for a particular participant). In that case we used the series mean for that scale (for that participant) to fill in missing values. The second applied when a missing item did not belong to a scale (for example, index of multiple deprivation score) or when fewer than 50% of scale items were completed. In those cases, missing values (either for items or scale totals) were multiply imputed (m=10) on the basis of available data from several scales and items across all participants. We did multiple imputation using the Markov chain Monte Carlo function (SPSS).

We repeated analyses on each of the ten imputed datasets. We then used standard multiple imputation procedures to combine the multiple scalar and multivariate estimates⁷⁵⁻⁷⁷ with SPSS (version 19) and NORM.⁷⁸ We explored the influence of missing data at the questionnaire level with two further analyses. Complete case analyses included participants with data for all variables at all time points. Available case analyses included participants with data for all variables at baseline and at least one other time point. Depending on the reasons for missingness, both these approaches can generate biased results, but they are used here as sensitivity analyses to assess the robustness of the findings.

General practices were the unit of randomisation and were directly involved in the delivery of care to all participants. As a result, participants within a practice could be more similar to each other than to participants in other practices. Causes of similarity within practices include pre-existing case mix differences between practice populations, and both general and specific practice effects (for example, factors that facilitate or inhibit access, general practitioner case load, the extent to which care is centred around the patient). To account for practice differences, multilevel modelling was used with observations (at different
No commercial reuse: See rights and reprints http://www.bmj.com/permissions Subscribe: http://www.bmj.com/subscribe
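The EQ-5D paragraph cites 243 possible health states. That count follows from five domains each scored on three levels, since 3⁵ = 243. The three level version is inferred from that count rather than stated in the text. A minimal sketch that enumerates the five digit state codes; the names DOMAINS, LEVELS, and states are illustrative, and the level labels in the comment are a paraphrase.

```python
from itertools import product

# The five EQ-5D domains listed in the text
DOMAINS = ("mobility", "self care", "usual activities",
           "pain/discomfort", "anxiety/depression")
# Assumed three level coding: 1 = no problems, 2 = some problems, 3 = extreme problems
LEVELS = (1, 2, 3)

# Every combination of one level per domain, written as a five digit code
states = ["".join(str(level) for level in combo)
          for combo in product(LEVELS, repeat=len(DOMAINS))]

print(len(states))             # 243
print(states[0], states[-1])   # 11111 33333
```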
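The trial defined MCID in the measures section is a standardised effect (Cohen’s d=0.3). Turning it into scale points therefore requires a standard deviation. The sketch below is a hypothetical helper, not the authors’ code. It computes d with a pooled SD and converts d=0.3 into raw units. For example, on a scale whose SD is 10 points (the norm based SD of SF-12 summary scores in the reference population), d=0.3 corresponds to 3 points. Which SD the trial used for this conversion is not stated here.

```python
from statistics import mean, stdev


def cohens_d(group_a, group_b):
    """Standardised mean difference (a minus b) divided by the pooled SD."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * stdev(group_a) ** 2
                  + (n_b - 1) * stdev(group_b) ** 2) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / pooled_var ** 0.5


def mcid_in_scale_units(sd, d=0.3):
    """Translate an effect size threshold (default d=0.3) into scale points."""
    return d * sd


print(cohens_d([2, 4, 6], [1, 3, 5]))  # 0.5
print(mcid_in_scale_units(10))          # 3.0
```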
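The Sample size paragraph reports 420 participants (84 practices of five) and 520 participants (52 practices of 10). These figures assume d=0.3, an intracluster correlation coefficient of 0.05, 80% power, and P<0.05. A minimal sketch that reproduces them makes three assumptions:
- a two sided test with the normal approximation;
- the usual design effect 1 + (m - 1) × ICC for equal cluster sizes;
- the number of practices rounded up to an even number so they split equally between arms.

The last rule is an assumption that turns 51 practices into 52. The function names are hypothetical, and this is not the authors’ calculation. The final 10% inflation is applied to the rounded figure of about 500, as in the text.

```python
import math
from statistics import NormalDist


def n_per_arm(d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Participants per arm to detect standardised difference d between two
    equal groups (two sided test, normal approximation, no clustering)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96
    z_beta = NormalDist().inv_cdf(power)           # about 0.84
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)


def design_effect(cluster_size: int, icc: float) -> float:
    """Variance inflation for equal cluster sizes: 1 + (m - 1) * ICC."""
    return 1 + (cluster_size - 1) * icc


def clustered_sample(d: float, icc: float, cluster_size: int) -> tuple[int, int]:
    """Return (practices, participants) after applying the design effect.

    Assumption: practices are rounded up to an even number so that they
    can be allocated equally to telehealth and usual care.
    """
    individual_total = 2 * n_per_arm(d)
    inflated = individual_total * design_effect(cluster_size, icc)
    # round() strips floating point noise before taking the ceiling
    practices = math.ceil(round(inflated / cluster_size, 9))
    if practices % 2:
        practices += 1
    return practices, practices * cluster_size


def inflate(n: int, percent: int) -> int:
    """Inflate n by a whole number percentage, rounding up (integer maths)."""
    return -(-n * (100 + percent) // 100)


print(clustered_sample(0.3, 0.05, 5))   # (84, 420)
print(clustered_sample(0.3, 0.05, 10))  # (52, 520)
print(inflate(500, 10))                 # 550
```

Without clustering the sketch needs 175 participants per arm (350 in total). The design effect is 1.2 for five patients per practice and 1.45 for 10, which yields the two ends of the range quoted in the text.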
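The first item level imputation rule in Statistical methods works as follows. If at least 50% of a scale was answered, a participant’s missing items on that scale are filled in; otherwise the values are left for multiple imputation. The sketch below assumes that “the series mean for that scale (for that participant)” means the mean of the participant’s own answered items on that scale. The function name prorate_scale is hypothetical, and the multiple imputation step itself (SPSS Markov chain Monte Carlo) is not reproduced.

```python
from typing import Optional, Sequence


def prorate_scale(items: Sequence[Optional[float]], min_fraction: float = 0.5):
    """Fill a participant's missing scale items with the mean of their
    answered items, provided at least min_fraction of items were answered.

    Returns (items after filling, needs_multiple_imputation flag).
    """
    answered = [x for x in items if x is not None]
    if not items or len(answered) / len(items) < min_fraction:
        # Too sparse for person mean filling: defer to multiple imputation
        return list(items), True
    person_mean = sum(answered) / len(answered)
    return [person_mean if x is None else x for x in items], False


# A 10 item scale (such as the CESD-10) with two unanswered items
filled, needs_mi = prorate_scale([1, 0, 2, None, 1, 3, None, 0, 1, 2])
print(sum(filled), needs_mi)  # 12.5 False
```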
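Statistical methods states that analyses were repeated on each of the m=10 imputed datasets and the estimates then combined with “standard multiple imputation procedures”. The sketch below assumes these are Rubin’s rules for a single scalar estimate. The multivariate case, also handled in SPSS and NORM, is not shown. The function name is hypothetical, and the inputs in the usage line are toy numbers, not study results.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one scalar estimate across m imputed datasets (Rubin's rules).

    estimates: point estimate from each imputed dataset
    variances: squared standard error from each imputed dataset
    Returns (pooled estimate, total variance, degrees of freedom).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need matching estimates and variances from m >= 2 datasets")
    q_bar = mean(estimates)       # pooled point estimate
    w = mean(variances)           # average within-imputation variance
    b = variance(estimates)       # between-imputation variance (m - 1 denominator)
    t = w + (1 + 1 / m) * b       # total variance
    if b == 0:
        df = float("inf")         # no between-imputation variation
    else:
        df = (m - 1) * (1 + w / ((1 + 1 / m) * b)) ** 2
    return q_bar, t, df


q, t, df = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])  # toy values
print(q, round(t ** 0.5, 3))  # 2.0 1.354
```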
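The two sensitivity analysis populations in Statistical methods differ only in which time points must be complete. Complete case analyses need all variables at every time point; available case analyses need all variables at baseline plus at least one follow-up. The sketch below assumes a hypothetical record layout: each participant maps the time points baseline, short_term, and long_term to a dict of variable values, with None for a missing questionnaire or value. The names and toy values are illustrative.

```python
TIME_POINTS = ("baseline", "short_term", "long_term")


def has_all_variables(record, time_point):
    """True if the questionnaire exists and none of its variables is missing."""
    values = record.get(time_point)
    return values is not None and all(v is not None for v in values.values())


def is_complete_case(record):
    """Data for all variables at all time points."""
    return all(has_all_variables(record, t) for t in TIME_POINTS)


def is_available_case(record):
    """Data for all variables at baseline and at least one other time point."""
    return has_all_variables(record, "baseline") and any(
        has_all_variables(record, t) for t in TIME_POINTS[1:])


# Toy participants: all three questionnaires, no short term, baseline only
full = {"baseline": {"score": 1.0}, "short_term": {"score": 2.0},
        "long_term": {"score": 3.0}}
no_short_term = {"baseline": {"score": 1.0}, "short_term": None,
                 "long_term": {"score": 2.0}}
baseline_only = {"baseline": {"score": 1.0}}

print([is_complete_case(p) for p in (full, no_short_term, baseline_only)])
# [True, False, False]
print([is_available_case(p) for p in (full, no_short_term, baseline_only)])
# [True, True, False]
```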