The limitations of data and the fracturing of opinion
From within, science appears as usual: parsimonious, slow, pedantic. But on the periphery something has changed.
I grew up in a town that others would describe as backwards. I was told, with curt assurance, that farmers can predict the weather more accurately than meteorologists. People despised scientists, especially academic ones. They shunned the rigour of experiment and preferred personal trial-and-error. They practised dowsing (a pseudoscientific method of finding water underground), lauded echinacea as a panacea, and boasted about disagreements they'd incited with their GP. When I entered science I tried to keep it to myself and was ridiculed by those who found out: “You want to cower from the real world.”
Yet I always retained admiration for their (misplaced) distrust. In a way they were right: as I worked in various regions of the world, I learned that everywhere I went the academics lacked resources, that they were self-promotional and error-prone, and that no one was monitoring them.
With fondness I recognise in online forums today that refusal to tolerate extraneous wisdom. The difference is that they now reciprocate others' curiosity. They ask “what's your source?” whenever a comment arises that might be informed by data. It fills me with contentment. They are engaging with science and demanding a lot from her, including unequivocal prescriptions: “Do masks work or not?” (They do not know that this is best answered by a large pragmatic trial - an administrative nightmare, likely a mess with insufficient quality control, contaminated with non-adherence and missing data.)
There is a crudeness to their interpretation of results, lacking the specificity of instinct that accumulates through practice (e.g., a seasoned programmer can sense where an error resides in ten thousand lines of code). I try to tell myself: perhaps they will do some good by inadvertently energising the open-science movement, which the academics support only nominally. But it seems they are more likely to fracture interpretation when they point to inconsistencies they do not understand and infer that scientists have withheld information or misled them.
In an attempt to foment a shared, coherent scepticism, and to pre-empt the cynical curators of opinion (the content creators), let us lay bare the feebleness of scientific data and the prevarications that often derive from them. For what is needed, primarily, is consistency in the interpretation of results, and that implies a common ability to detect the limitations of data that can make science appear reticent or even contradictory.
Without any disrespect, I refer to those on the periphery as outsiders or amateurs - that is who I am talking to, my old acquaintances back home (as if they still exist). I am just like them, only I have seen behind the curtain and can say: your penchant for scepticism is sound, I'm there with you, but you must be careful who you listen to.
A striking example to get us started is the apparent waning of the estimated effect of the COVID vaccine as we moved from Pharma-funded randomised controlled trials (RCTs) to so-called real-world data (RWD), i.e. post-approval, observational data that are out of the hands of Pharma. A podcaster pondering the disconnect between the RCT and RWD estimates said he felt the RCT had overestimated the effect, indicating a deficiency in the study design and raising the question: why did the regulator allow it? Another insinuated that something underhanded was going on: collusion between Pharma and the regulator to yield an inflated, artificial estimate.
This misunderstanding even ended up in court proceedings in Canada, i.e., in the cross-examination of Celia Lourenco, Director General of Health Canada, in the constitutional challenge brought by the Leader of the People’s Party of Canada et al.:
“[Dr Lourenco] confirms that the [RCTs] are required to demonstrate that the vaccine reduces symptomatic COVID by at least 50%. The lawyer asks Lourenco if given the waning efficacy provided by the Public Health Agency of Canada ... would [the injections] still qualify for authorization ... if that had been observed in the trial”. [The 50% threshold is nominal and used to design the study.]
Dr Lourenco’s affirmation was offered as a shocking revelation by sceptics who appear swayed by the RWD descriptor. But RCT and RWD are not addressing the same question; statistically, they are not estimating the same quantity. The purpose of the RCT is to establish efficacy and accumulate exposure to explore safety; hence the emphasis on patient follow-up, adherence to a study protocol and prespecified, conservative analyses. RWD have a different purpose entirely, and the effect they show may be representative yet diluted; the data suffer from quality issues, missing data and other limitations.
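To see how dilution alone can produce the “waning” pattern, here is a minimal simulation (all numbers invented for illustration; this is not a model of any particular vaccine or dataset). The same true protective effect is estimated twice: once under RCT-like conditions (high adherence, clean exposure records) and once under RWD-like conditions (imperfect adherence, misrecorded vaccination status):

```python
import numpy as np

rng = np.random.default_rng(0)

def estimated_efficacy(n, attack_rate, true_ve, adherence, misclass):
    """Estimate efficacy as 1 - risk ratio, under imperfect adherence
    and misclassification of recorded vaccination status."""
    vax = rng.random(n) < 0.5                      # half assigned/choosing vaccine
    protected = vax & (rng.random(n) < adherence)  # actually receive full protection
    p_inf = np.where(protected, attack_rate * (1 - true_ve), attack_rate)
    infected = rng.random(n) < p_inf
    flip = rng.random(n) < misclass                # some statuses recorded wrongly
    recorded = np.where(flip, ~vax, vax)
    risk_ratio = infected[recorded].mean() / infected[~recorded].mean()
    return 1 - risk_ratio

# RCT-like: high adherence, exact records -> estimate near the true 90%
print(estimated_efficacy(200_000, 0.02, 0.90, adherence=0.98, misclass=0.00))
# RWD-like: weaker adherence, 10% misrecorded -> estimate drops toward 60%
print(estimated_efficacy(200_000, 0.02, 0.90, adherence=0.80, misclass=0.10))
```

The protective effect is identical in the two calls; only the measurement conditions change.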
Also note that we do not have RWD until the drug is approved and on the market; ipso facto, RWD cannot inform approval, and the lawyer's question is nonsensical. It is the same with safety: the safety profile of a drug is not fully articulated until it enters the market against a backdrop of concomitant medications, including off-label use. Safety detection, ultimately, amounts to a type I/type II error problem (over-reacting to limitless spurious signals versus failing to react swiftly to legitimate ones), and to the arbitrary setting of these thresholds. (I have noticed, on a drug's Wikipedia page, the list of potential side effects expand after the patent expires.)
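The threshold trade-off can be sketched in a few lines (a stylised toy, not any agency's actual signal-detection method): monitor many drug-event pairs at once, flag a pair when its observed count sufficiently exceeds the expected background, and see what each threshold costs.

```python
import numpy as np

rng = np.random.default_rng(1)

# 1,000 monitored drug-event pairs; exactly one has a genuinely doubled rate.
n_pairs, expected = 1_000, 10.0          # expected background count per pair
rates = np.full(n_pairs, expected)
rates[0] *= 2.0                          # the single real signal
observed = rng.poisson(rates)

for threshold in (1.5, 2.0, 3.0):        # flag when observed/expected > threshold
    flagged = observed / expected > threshold
    print(f"threshold {threshold}: {flagged.sum():3d} pairs flagged, "
          f"real signal caught: {bool(flagged[0])}")
```

A lenient threshold buries the one real signal among dozens of spurious flags; a strict one is quiet but blind. Wherever the line is drawn, it is drawn arbitrarily.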
It is ironic that some on the political right contrast RWD with RCT to suggest a compromised approvals process (as in the case above). In 2005, conservatives in America were pushing for reform that would permit other forms of data, i.e. non-RCT data, to be considered in order to speed up the approvals process. In other words, they were seeking to lower the threshold of evidentiary strength required for approval, a sentiment that contradicts conservative voices today.
At the time, the Society for Clinical Trials responded to this call from Republicans with a position paper; this paper provides concise examples of non-RCT data leading us astray:
“The theory that beta-carotene could prevent lung cancer was widely accepted based on retrospective (uncontrolled) epidemiologic studies of dietary beta-carotene consumption, yet two very large cancer prevention studies in the 1990s demonstrated convincingly that beta-carotene supplementation to smokers actually increases the incidence of lung cancer, to the great surprise of the medical community. Combined estrogen and progestin therapy was widely believed to reduce the risk of coronary heart disease (CHD) in post-menopausal women, yet the Women’s Health Initiative Study showed convincingly the opposite conclusion, that the treatment actually increased risk of myocardial infarction and CHD death compared to placebo treatment.”
Conclusions extracted from RWD are suggestive: the data are messy and effects are entangled (causal inference experts would chime in here, but their methods produce wide confidence intervals). The appeal of the large-scale RCT is that it yields a primary analysis that is straightforward (a clean estimate of the effect is calculable) and easily communicated. There are, certainly, poorly designed and premature RCTs that we should hesitate to overinterpret. And even high-quality RCTs are routinely misinterpreted by the amateurs in their substacks and podcasts.
Consider MidwesternDoc's substack Forgotten Side of Medicine with over 100 thousand subscribers. In a post titled The Great Ozempic Scam and The Safe Ways to Lose Weight that was retweeted by RFK Jr (generating 1.5M views), referring to the pivotal RCT of semaglutide (Ozempic), the Doc declares: “Most of the participants could not stay on the drugs for a prolonged period”, alluding to safety and compliance concerns.
But the Doc has simply misunderstood the plot (this is not surprising; the Doc is an outsider). This trial, the SELECT trial, had an interim analysis: patients were recruited in a staggered fashion; the interim analysis then occurred at the pre-specified time, according to the accrual of cardiovascular events; and this analysis determined that the study should be terminated. This explains why the number of patients trails off: the follow-up time is variable; it has nothing to do with compliance. The Doc also wanted his readers to know that 8% of those on semaglutide had serious adverse events. He failed to note that the corresponding number for the control group was 12%.
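A toy calculation shows how the trailing-off arises with zero discontinuation (illustrative numbers only; these are not SELECT's actual enrolment figures): spread enrolment over two years, stop the study at a single event-driven cut-off date, and count who is still in follow-up at each study-time point.

```python
import numpy as np

rng = np.random.default_rng(2)

n = 10_000                                # illustrative cohort size
enrol = rng.uniform(0, 24, n)             # enrolment staggered over months 0-24
cutoff = 48.0                             # event-driven data cut at month 48
follow_up = cutoff - enrol                # administrative censoring; nobody drops out

for t in (12, 24, 30, 36, 42, 47):
    at_risk = int((follow_up >= t).sum())
    print(f"study month {t:2d}: {at_risk:5d} patients still in follow-up")
```

Every patient is followed to the cut-off, yet the numbers at late study times fall from 10,000 toward a few hundred: exactly the trailing-off the Doc read as mass discontinuation.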
Aside from the RWD heralded by the amateurs during COVID, we also saw meta-analysis offered as the pinnacle of evidentiary strength. Those who swallowed Goldacre's book Bad Pharma will reiterate this point of view; e.g. Bret Weinstein was excited by the meta-analysis of ivermectin. More recently, Jordan Peterson described the Cochrane collaboration as “the gold standard” when Cochrane's systematic review of masking landed. Others subsequently stole Peterson's sentiment and it spread as fact. Are they right? If I have spoken positively about RCTs, surely I must have a positive opinion of a meta-analysis of RCTs?
Confidence is the hallmark of the outsider. Peterson wouldn't know, for example, that the Cochrane software contained limitations that they didn't bother to fix for years (see Senn for details), that Cochrane leaders themselves describe systematic reviewing as like “searching through rubbish”, and that their reviews often conclude with a call for a well-designed RCT (why would they want to take a step down their own hierarchy of evidence?).
The reality is, meta-analysis has been on the ropes since 1997, when the Lancet published a meta-analysis of 89 homeopathy trials and declared homeopathy superior to placebo. Because homeopathy is a placebo, this was not a test of homeopathy but a test of meta-analysis, which had reliably compounded biases across trials. This result was considered a serious blow from which retrospective meta-analysis hasn't recovered. (The problem is intractable: you cannot repair low-quality data by increasing statistical sophistication; if you try to solve the quality and heterogeneity problem you introduce a subjectivity problem.)
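The compounding mechanism is easy to reproduce (a sketch with invented numbers, not a reanalysis of the 1997 paper): give each of 89 small trials of a pure placebo a small systematic bias, too small to reach significance in any single trial, then pool.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# 89 small placebo trials: true effect 0, each nudged by a small shared bias
n_trials, n_per_arm, bias = 89, 25, 0.15
se = np.sqrt(2 / n_per_arm)               # SE of a standardised mean difference
effects = rng.normal(bias, se, n_trials)  # observed trial effects

pooled = effects.mean()                   # fixed-effect pooling (equal weights)
pooled_se = se / np.sqrt(n_trials)
z = pooled / pooled_se
print(f"individually significant trials: {(effects / se > 1.96).sum()} of {n_trials}")
print(f"pooled effect {pooled:.2f}, z = {z:.1f}, p = {2 * stats.norm.sf(abs(z)):.1e}")
```

Only a handful of trials clear significance on their own, but the pooled estimate is overwhelmingly “significant”: pooling shrinks the random error while faithfully preserving, and effectively amplifying, the shared bias.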
When the ivermectin meta-analysis was called into question due to data quality issues (exactly as predicted by Yuri Deigin), the insiders did not flinch, while the amateurs scrambled to rerun their analyses in Cochrane's plug-and-play online tool (a toy no statistician would ever use). It was obvious to insiders at the time, and should be obvious to all now, that the EMA's position on ivermectin was not dismissive. They were spot on. Those who offer themselves as sceptics were not sufficiently sceptical of meta-analysis. Promoting scepticism on one topic only meant re-allocating their credulity elsewhere.
There is something else, finally, that I would like to clear up. All the talk (from Prasad, Peterson et al.) about the movement of staff between the medicines agency and Pharma: it is never clear what these commentators are concerned about. Perhaps it will suffice to note that those in regulation look askance at their counterparts in industry, and they are equipped to scrutinise them: they have all the company's data and code, and they can run their own analyses - the academics' peer-review process looks futile in comparison.
The allusions to an easy ride for Pharma betray the outsiders' ignorance. Watch a drug company approaching an advisory committee meeting: they are almost out of their minds with anxiety. (I have seen a company produce over 30 thousand outputs in preparation.) The only time I have seen a representative plead with the agency to approve a drug was when I met with the academic investigator responsible for a trial in a rare disease (the key data supporting approval). This is worth repeating, as an antidote to the bleating anti-Pharma types: an academic was pleading with us to approve based on his flimsy data (he would not want to be known as the guy who loudly touted a drug that turned out to be a dud).
Let us bear these various details in mind, and we can fortify and hone the sceptical perspective. Data are coy, suggestive; they tease. Early-phase results may be hyped in academia in an effort to win funding, but there is no analogous incentive for Big Pharma: why would a company want to fool itself into investing further and progressing to Phase III by hyping Phase II results? In the public sphere ambiguity begets disagreement, and the outsiders will try to persuade you that it is their distance (i.e. their ignorance) from the institutions they seek to reform that confirms their integrity. You must instead find honest and informed insiders (scientists, not executives) to listen to. Unlike the outsiders, the insiders are not guessing, and thus not impressionable.