Assessment of Dichotomous Response

Were you happier or healthier any time?

Treatment effects are often evaluated by flagging subjects whose improvement exceeds a threshold value, either at a specific time point or at any time over a schedule of assessments. We will see, through our calculator, that these endpoints can be problematic even when the thresholds represent minimally important differences (MIDs; we use half a standard deviation, a value often regarded as relevant). Flagging a subject as a responder when a threshold is crossed at any time over a large number of assessments is particularly prone to overstating effect. It is like asking whether a subject was happier or healthier at any time over 5 or 10 visits: quite likely, many will have been.

Calculator for Assessment of Response Classifications

In this calculator we evaluate response criteria based on improvement on a measure, from study baseline to post-baseline, by various threshold values. The estimated proportion of subjects likely to be deemed responders under such thresholds is provided for a user-specified aggregate mean improvement post-baseline (in MID units). We set subject-level improvement thresholds at multiples M of the minimally important difference (MID) on the measure of interest and evaluate the proportions of subjects crossing these thresholds given the aggregate effect. The first box of the calculator takes the user's input data. The second box assesses subject-level response rates given an immediate step change post-baseline due to the intervention being studied. The third box assesses subject-level response rates given a gradual linear change post-baseline due to the intervention. We examine three scenarios.
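Under a normal model, the quantities the calculator reports can be sketched as follows. This is our reconstruction under stated assumptions, not the spreadsheet's documented formulas: baseline and post-baseline measures are normal with the same SD σ (2 MID units), ρ is their correlation, Δ_k is the change at post-baseline visit k (k = 1 to K), μ_k is its mean, and Φ is the standard normal CDF. The “at any time” rate treats crossings at different visits as independent; this assumption reproduces the 89.07% and 78.60% figures quoted in the scenarios below.

```latex
\begin{aligned}
\sigma_\Delta &= \sigma\sqrt{2(1-\rho)}, \\
p_k &= \Pr(\Delta_k > M) = 1 - \Phi\!\left(\frac{M - \mu_k}{\sigma_\Delta}\right), \\
P_{\text{last}} &= p_K, \qquad P_{\text{any}} = 1 - \prod_{k=1}^{K} \left(1 - p_k\right).
\end{aligned}
```

For the step change of the second box, μ_k = μ at every visit; for the gradual change of the third box we assume (hypothetically) a linear ramp μ_k = μk/K, which reaches the full effect at the last visit. The deterioration rate Pr(Δ_k < −M) = 1 − Φ((M + μ_k)/σ_Δ) equals p_k when μ_k = 0.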

In the first scenario, there is no net effect due to the intervention: the aggregate mean improvement is 0. The second row allows input of a correlation between the baseline and the post-baseline measures. We entered 0.5 as a convenience, as this gives a standard deviation (SD) of the post-baseline change of the same magnitude as that of the baseline measure, 2 MID units (when baseline and post-baseline SDs are equal, the SD of the change is the baseline SD times √(2(1 − ρ)), which equals the baseline SD only when ρ = 0.5; I had noted incorrectly previously that a correlation of 0.7 results in an SD of the change identical to that for baseline, and I thank a Health Outcomes statistician for this correction). We enter 6 periodic post-baseline measures and a subject-level threshold of 1 MID on the change for classifying a subject as having a response. This results in an estimated 30.85% response rate “at the last visit” despite the null net effect.

This brings back our little paradox, discussed at this page, of a large proportion of subjects discordant with the aggregate effect. If a mean-and-distribution framework were ‘real’, one would discount the 30.85% response rate as unreal, arising from random variation around the mean. Yet a scientist whose data show close to a null aggregate effect will, on poring through individual subject records, see changes of a relevant and important magnitude, and hence ‘real’, for about 30% of the subjects. If we flag response based on the threshold being crossed “at any time”, the estimated response rate under this null net effect rises to 89.07%. This the scientist will more likely see as unreal, since those flagged as having a response may not have an effect that persists over the periodic assessments. A deterioration statistic (a worsening by the same threshold), which one would not typically report, has the same estimated proportions under the null net effect, by symmetry.
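The scenario 1 figures can be checked with a few lines of Python. This is a minimal sketch assuming normally distributed changes, a baseline SD of 2 MID, correlation ρ between baseline and each post-baseline measure, and, for the “at any time” rate, crossings at different visits treated as independent. The function names are our own, not part of the calculator.

```python
from math import sqrt
from statistics import NormalDist

PHI = NormalDist()  # standard normal distribution


def sd_of_change(sd_baseline, rho):
    """SD of (post-baseline - baseline) when both share SD sd_baseline and correlation rho."""
    return sd_baseline * sqrt(2.0 * (1.0 - rho))


def p_cross(mean_change, threshold, sd_change):
    """P(change > threshold) for a normally distributed change."""
    return 1.0 - PHI.cdf((threshold - mean_change) / sd_change)


def p_any_time(per_visit):
    """P(threshold crossed at least once), ASSUMING crossings at different visits are independent."""
    p_never = 1.0
    for p in per_visit:
        p_never *= 1.0 - p
    return 1.0 - p_never


# Scenario 1 inputs in MID units: MID = half an SD, so the baseline SD is 2 MID.
SD_BASELINE = 2.0
RHO = 0.5      # baseline / post-baseline correlation
VISITS = 6     # periodic post-baseline measures
M = 1.0        # response threshold: improvement exceeding 1 MID
MEAN = 0.0     # aggregate mean improvement (null net effect)

sd_d = sd_of_change(SD_BASELINE, RHO)
p_last = p_cross(MEAN, M, sd_d)          # response "at the last visit"
p_any = p_any_time([p_last] * VISITS)    # response "at any time" (step change)
p_det = PHI.cdf((-M - MEAN) / sd_d)      # deterioration: change below -M

print(f"SD of change:      {sd_d:.2f} MID")
print(f"at the last visit: {p_last:.2%}")
print(f"at any time:       {p_any:.2%}")
print(f"deterioration:     {p_det:.2%}")
```

Run as is, it prints an SD of change of 2.00 MID, 30.85% “at the last visit” and 89.07% “at any time”; the deterioration rate matches the last-visit response rate under the null effect.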

In scenario 2 we consider an aggregate improvement of 0.5 MID. The proportion having a response “at the last visit” goes up to 40.13%. This is a little deceptive in contexts without a parallel control; there one should perhaps compare against a putative control rate of 30.85%, the rate we obtained in scenario 1 with no net effect post-baseline. In scenario 3, with an aggregate worsening of 0.5 MID, we see a 22.66% response rate “at the last visit” and a 78.60% response rate when we look at response “at any time”. For the “at any time” assessment, the numbers in the third box of the calculator are somewhat lower for scenario 2 and higher for scenario 3, because the change occurs gradually and the early visits carry only part of the eventual improvement or worsening. Note that high within-subject correlations on a measure help: a lower correlation widens the distribution of the change and so increases the proportion of subjects crossing the threshold. You can enter 0.2 instead of 0.5 for the correlation in the calculator to see this effect.
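A sketch extending the same calculation to scenarios 2 and 3 and to a gradual change follows. The linear ramp (mean improvement μ·k/K at visit k) is our assumption about what “gradual linear change” means, the “at any time” rate again assumes independent crossings across visits, and the function name `rates` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

PHI = NormalDist()  # standard normal distribution


def rates(mean, rho, visits=6, m=1.0, sd_baseline=2.0, ramp=False):
    """Return (rate at the last visit, rate at any time); all inputs in MID units.

    ramp=False: step change, the full mean improvement at every post-baseline visit.
    ramp=True:  HYPOTHETICAL linear ramp, mean improvement mean * k / visits at visit k.
    The any-time rate ASSUMES crossings at different visits are independent.
    """
    sd_change = sd_baseline * sqrt(2.0 * (1.0 - rho))
    per_visit = [
        1.0 - PHI.cdf((m - mean * (k / visits if ramp else 1.0)) / sd_change)
        for k in range(1, visits + 1)
    ]
    p_never = 1.0
    for p in per_visit:
        p_never *= 1.0 - p
    return per_visit[-1], 1.0 - p_never


scenarios = [("scenario 1", 0.0), ("scenario 2", 0.5), ("scenario 3", -0.5)]
for label, mean in scenarios:
    for ramp in (False, True):
        last, any_time = rates(mean, rho=0.5, ramp=ramp)
        shape = "linear" if ramp else "step"
        print(f"{label} ({shape:6s}): last visit {last:.2%}, any time {any_time:.2%}")

# A lower within-subject correlation widens the change distribution.
print(f"null effect, rho = 0.2: last visit {rates(0.0, rho=0.2)[0]:.2%}")
```

Under these assumptions the step-change rows give 40.13% and 22.66% “at the last visit” for scenarios 2 and 3 and 78.60% “at any time” for scenario 3. The ramp leaves the last-visit rates unchanged, since it reaches the full effect at visit K, while lowering the “at any time” rate for scenario 2 and raising it for scenario 3; with ρ = 0.2 the null-effect last-visit rate rises above 30.85%.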

Some Observations on Constructing a Response Classification

If you increase M above the default of 1 MID, you make the threshold more conservative, resulting in a lower ‘zero error’ proportion, i.e. fewer subjects flagged as responders when there is a null net effect. However, too conservative a threshold is likely to lose information in contexts where interventions are only moderately effective. It might then help to report the null-effect proportions and perhaps the symmetric ‘deterioration’ assessment. A stronger response assessment can also be constructed as a composite requiring a threshold to be met on multiple measures.
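One way to see this trade-off is to tabulate, for several multiples M, the null-effect response rate, the response rate under a moderate effect, the gap between them, and the symmetric deterioration rate. This sketch reuses the scenario 1 settings (SD of change 2 MID, i.e. ρ = 0.5) and takes 0.5 MID as an illustrative moderate effect; both choices, and the helper names, are assumptions for illustration.

```python
from statistics import NormalDist

PHI = NormalDist()  # standard normal distribution
SD_CHANGE = 2.0     # SD of change in MID units (baseline SD 2 MID, rho = 0.5)
EFFECT = 0.5        # illustrative moderate aggregate improvement, in MID units


def p_improve(mean_change, m):
    """P(change > m) for a normal change with SD SD_CHANGE."""
    return 1.0 - PHI.cdf((m - mean_change) / SD_CHANGE)


def p_deteriorate(mean_change, m):
    """P(change < -m), the symmetric 'deterioration' classification."""
    return PHI.cdf((-m - mean_change) / SD_CHANGE)


def threshold_table(multiples):
    rows = []
    for m in multiples:
        null_rate = p_improve(0.0, m)        # response rate with no net effect
        effect_rate = p_improve(EFFECT, m)   # response rate with the moderate effect
        det_rate = p_deteriorate(EFFECT, m)  # deterioration rate with the moderate effect
        rows.append((m, null_rate, effect_rate, effect_rate - null_rate, det_rate))
    return rows


print("M    null    effect  gap     deteriorate")
for m, null_rate, effect_rate, gap, det_rate in threshold_table([1.0, 2.0, 3.0]):
    print(f"{m:<4.1f} {null_rate:6.2%} {effect_rate:6.2%} {gap:6.2%} {det_rate:6.2%}")
```

Under these assumptions the null rate falls from about 30.85% at M = 1 to under 10% at M = 3, but the gap between the effect and null rates also narrows, from roughly 9 to roughly 4 percentage points, which is one way a too-conservative threshold loses information.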

The American College of Rheumatology (ACR) 20, 50 and 70% improvement criteria require improvement in both tender and swollen joint counts and in three of five other measures before a patient with rheumatoid arthritis is deemed to have had a response. What strengthens the presentation of response in this context is the use of increasing improvement thresholds, from 20% to 70%, together with the requirement that the thresholds be met on multiple measures. Notice the “and”s in the assessment; you do not want to see “or”, as that would give a weaker criterion than any of the constituent measures used alone. The constituent measures should ideally provide additional, independent information. The ACR measures are moderately correlated, so some residual response rate is likely to be triggered even with stable disease post-baseline, especially for the composite with the lower 20% threshold. A number of studies report placebo response rates on ACR20 of about 15% (the placebo usually given with an older, less effective standard such as methotrexate added in). One may well see an analogously defined ACR(Minus20), looking at deterioration instead, report a similar 15% rate in these control groups if methotrexate lacks effect.
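To illustrate why the “and”s matter and how correlation leaves a residual response rate, here is a stylized sketch. It is not the ACR definition, which uses percentage improvements: we assume two core measures plus five others, each with a standardized normal change sharing a common factor (pairwise correlation r), no net effect, and a per-measure threshold t = 0.5 SD (the 1 MID threshold of scenario 1). The hypothetical ‘and’ rule requires both core measures plus at least 3 of the 5 others; the ‘or’ rule flags a response on any one of the seven. Conditioning on the shared factor makes the measures independent, so each rate is an integral over that factor, computed numerically.

```python
from math import comb, exp, pi, sqrt
from statistics import NormalDist

PHI = NormalDist()  # standard normal distribution


def p_at_least(k, n, q):
    """Binomial tail P(X >= k) for X ~ Binomial(n, q)."""
    return sum(comb(n, j) * q**j * (1.0 - q) ** (n - j) for j in range(k, n + 1))


def composite_rates(t, r, n_other=5, k_other=3, grid=4001, lim=8.0):
    """Null-effect response rates for stylized 'and' and 'or' composite rules.

    Hypothetical one-factor model: change_j = sqrt(r)*W + sqrt(1-r)*E_j
    (W, E_j independent standard normals, 0 <= r < 1); response on measure j
    when change_j > t. Given W = w every measure crosses independently with
    probability q(w), so each rate is an integral over w (trapezoid rule).
    """
    n_all = 2 + n_other
    step = 2.0 * lim / (grid - 1)
    and_rate = or_rate = 0.0
    for i in range(grid):
        w = -lim + i * step
        weight = step * exp(-0.5 * w * w) / sqrt(2.0 * pi)
        if i in (0, grid - 1):
            weight *= 0.5  # trapezoid end points
        q = 1.0 - PHI.cdf((t - sqrt(r) * w) / sqrt(1.0 - r))
        # 'and': both core measures AND at least k_other of the others
        and_rate += weight * q * q * p_at_least(k_other, n_other, q)
        # 'or': any single measure crosses the threshold
        or_rate += weight * (1.0 - (1.0 - q) ** n_all)
    return and_rate, or_rate


T = 0.5  # 1 MID threshold divided by a 2 MID SD of change
single = 1.0 - PHI.cdf(T)
print(f"single measure: {single:.2%}")
for r in (0.0, 0.3, 0.6):
    and_rate, or_rate = composite_rates(T, r)
    print(f"r={r:.1f}  and-rule={and_rate:.2%}  or-rule={or_rate:.2%}")
```

With r = 0 the ‘and’ rate is under 2% and the ‘or’ rate about 92%, against about 31% for a single measure. As r grows the ‘and’ rate rises and the ‘or’ rate falls, so in this model moderately correlated constituents leave a larger residual response under stable disease than independent ones would.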

Edit the blue cells in the spreadsheet to enter your data; the calculations in the bottom box of the spreadsheet will refresh.