Thank you, very enlightening. So do we need to also pay attention to how wide this 95%CI is, or just whether it crosses 1, as we usually do?
Hi Maria. Whether the CI crosses 1 or not is not the most important thing by any means.
The total breadth of included values is what matters, as these would all be considered relatively compatible with the observed point estimate. If the interval includes 1, then a "null" result would be among those compatible values.
Then again, if the lower bound is 0.99 vs 1.01, clearly there is not a very big difference between these two results, even though one "crosses 1" and the other doesn't.
The width also gives us *some* handle on certainty. If the point estimate is 2, but the 95%CI goes from 0.5 to 44 - clearly we simply have no idea what the true effect is likely to be. Equally, if the 95%CI is very narrow (and most importantly the methods are highly robust) then we might feel more confident of the range of possible effects.
One thing to beware of - 95%CIs are heavily influenced by sample size. Very big samples produce very narrow 95%CIs - this is what we call "precision". However, if there is a bias in the study somewhere, this may be high precision around the wrong result! That is a question of "accuracy" (or the lack of it!). Don't confuse precision with accuracy (a very common mistake in studies with big data).
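If it helps to see the sample-size effect in numbers, here is a rough sketch (not from the article - the counts and risks are invented, and it uses the usual Wald approximation for a relative-risk CI on the log scale):

```python
# Rough sketch: same underlying risks (20% vs 25%, i.e. a true RR of 0.8),
# but two very different per-arm sample sizes. All numbers are invented.
import math

def rr_ci(events_trt, n_trt, events_ctl, n_ctl, z=1.96):
    """Wald 95% CI for a relative risk, computed on the log scale."""
    rr = (events_trt / n_trt) / (events_ctl / n_ctl)
    se = math.sqrt(1/events_trt - 1/n_trt + 1/events_ctl - 1/n_ctl)
    return rr, math.exp(math.log(rr) - z*se), math.exp(math.log(rr) + z*se)

for n in (100, 10_000):                  # patients per arm
    events_trt = round(0.20 * n)         # 20% mortality on treatment
    events_ctl = round(0.25 * n)         # 25% mortality on control
    rr, lo, hi = rr_ci(events_trt, n, events_ctl, n)
    print(f"n = {n:>6} per arm: RR {rr:.2f}, 95% CI {lo:.2f}-{hi:.2f}")
```

With the small trial the interval comfortably crosses 1; with the big one it becomes very narrow around the same point estimate. But if the study is biased, that narrow interval is just precisely wrong.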
Hope that helps!
Really interesting. Are there theories or explanations for the basis of the gap between the CI and the subjective probability? Is it tied to the experimental design?
Not experimental design, but perhaps yes for statistical design. Whilst the purist frequentists and Bayesians perhaps wouldn't like me framing it this way, the gap here is "closed" by the prior used in a Bayesian statistical analysis.
You start with a prior distribution of potential effects based on what is known before the experiment, then collect your data, and this results in a posterior distribution incorporating the new knowledge into what was already known (or believed...).
The difficulty here is when people disagree on the prior distribution. To get around this, a range of priors is often provided, including optimistic, pessimistic and flat ones.
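As a very rough illustration of the mechanics (my toy framing, not anything from the article - the priors below are pure assumptions), take a hypothetical observed result of RR 0.8 with 95% CI 0.7-0.9 and combine it with different priors on the log scale:

```python
# Toy sketch of prior-to-posterior updating on the log relative risk scale
# (normal-normal conjugate updating; all priors below are invented for illustration).
import math

# Hypothetical observed result: RR 0.8 with 95% CI 0.7-0.9.
log_rr = math.log(0.8)
se = (math.log(0.9) - math.log(0.7)) / (2 * 1.96)   # back out the SE from the CI

priors = {
    "sceptical (tight around RR = 1)":  (0.0, 0.05),
    "flat-ish (very vague)":            (0.0, 10.0),
    "optimistic (centred on RR = 0.8)": (math.log(0.8), 0.10),
}

for name, (prior_mean, prior_sd) in priors.items():
    # Posterior mean is a precision-weighted average of prior and data.
    w_prior, w_data = 1 / prior_sd**2, 1 / se**2
    post_mean = (w_prior * prior_mean + w_data * log_rr) / (w_prior + w_data)
    post_sd = math.sqrt(1 / (w_prior + w_data))
    lo, hi = math.exp(post_mean - 1.96 * post_sd), math.exp(post_mean + 1.96 * post_sd)
    print(f"{name:35s} posterior RR {math.exp(post_mean):.2f} (95% interval {lo:.2f}-{hi:.2f})")
```

With these made-up priors, the sceptical one pulls the estimate back towards 1, while the vague one essentially reproduces the original interval - that difference is the "gap" being closed (or not) by the prior.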
Thank you for explaining. I now understand why clinical trials would have both types of stats in their protocols. Bit of a different read with my morning coffee before work 😁
Thanks for an interesting article regarding confidence intervals. Your example regarding vitamin C and sepsis has me scratching my head. Perhaps some formulas and maths would have been helpful.
You mentioned the true effect size as being approximately zero. So essentially you're testing the null hypothesis that Vitamin C does not lower mortality, and you find in the hypothetical trial that the RR = 0.8, 95% CI 0.7-0.9. Why would you not reject your null hypothesis?
To me you're just brushing off the hypothetical trial findings as wrong because they don't match everyone else's results. If that's what I found I'd certainly consider it surprising and check for errors, but not matching prior results doesn't make it automatically wrong. If nobody else can reproduce the results, that's when I'd be worried, but that's not the example you've chosen.
How can you use your prior knowledge (assumption) of the result if that's what you're trying to determine? It defeats the point of running the trial in the first place.
You use the term effect size without explaining what it is. Apologies if you've explained it in another article; I've only read this one and Measuring What Matters, which I liked and which prompted me to read this one.
I can see a scenario where both can be true, i.e. a statistically significant RR and an approximately 0 effect size, without requiring the trial result to be random. I would contend the results as stated imply that the finding is in fact unlikely to be random. The right sample size, combined with a small absolute risk reduction and a low incidence of mortality, could make both statements true. You can get the desired RR by dividing two small numbers and still have a treatment that is "effective" while having little or no real-world benefit, i.e. approximately 0 "effect size." Maybe that was your point and I missed it.
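For instance (numbers entirely made up to illustrate the scenario I mean):

```python
# Invented counts: a statistically "significant" relative risk of 0.8
# built on a rare outcome, with almost no absolute benefit.
import math

deaths_trt, n_trt = 400, 100_000   # 0.4% mortality on treatment
deaths_ctl, n_ctl = 500, 100_000   # 0.5% mortality on control

risk_trt, risk_ctl = deaths_trt / n_trt, deaths_ctl / n_ctl
rr = risk_trt / risk_ctl
se = math.sqrt(1/deaths_trt - 1/n_trt + 1/deaths_ctl - 1/n_ctl)   # SE of log-RR
lo, hi = math.exp(math.log(rr) - 1.96*se), math.exp(math.log(rr) + 1.96*se)

print(f"RR {rr:.2f}, 95% CI {lo:.2f}-{hi:.2f}")                    # ~0.80, ~0.70-0.91
print(f"Absolute risk reduction: {risk_ctl - risk_trt:.2%}")       # 0.10%
print(f"Number needed to treat: {1/(risk_ctl - risk_trt):.0f}")    # 1000
```

Roughly the same RR and interval as your example, but an absolute risk reduction of only 0.1 percentage points.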
If you combine the above with measuring the wrong thing or surrogate end points that don’t correlate well with real world outcomes (sort of the subject of your other article) you get things that should work in theory but fail in practice.
Sorry if the comment comes across as a bit harsh; I actually enjoyed the article and it made me think about the relationship between confidence intervals and point estimates. Your explanation is good, but the example is confusing and probably would have benefited from a worked example showing where the numbers come from, though I can also understand why you chose to leave out the math.