In my previous post on the paper by Gu, et al. [1], which identified a risk of diabetes in red meat consumption, I included a counterexample as well as a statistical analysis showing how the conclusion violated common sense. What was really wrong with the study, however, was the lack of a meaningful idea behind the endeavour. There is lately some bias against red meat that one has to live with, but biological plausibility was absent and the paper was supported only by formal statistics. As such, it should not have been published.
Statistics
The Introduction to a good statistics text will tell you that “what we do in statistics is to put a number on our intuition.” I remember reading that exact line at one time, but I could never track down the original source. The idea is that you start from the science, from the question to be answered, and from what you suspect the outcome will look like. You may then propose or apply a mathematical model to the results of your experiment. In other words, the medical or scientific question comes first. Applied statistics always represents an interpretation.

Often, however, the medical literature does the opposite: many papers try to come up with an intuition to fit a number, trying to derive the science from the statistics. Sometimes the number may not even be the stimulus for an insight. The p value, or some arbitrary number indicating that the results are “statistically significant,” may be taken as the conclusion itself. The implication, in these cases, is that your experiment did not have independent justification and the significance was revealed by the data as processed. The corollary is that the type or class of experiment becomes more important than its quality. An experiment with a large n is considered better simply because of the size of the population studied. It is sometimes said, for example, that randomized controlled trials are good for generating hypotheses, sometimes “only” good for generating hypotheses. That is not generally right. Such a strategy would often be considered a fishing expedition. There are an infinite number of things to test. The experiment you actually perform will follow your hypothesis. In Einstein’s words, your theory determines the experiment you do.
That an association is found to be statistically significant is a mathematical conclusion and does not tell you whether the result has any biological importance. As is often repeated, an association does not necessarily imply causality (although the disclaimer is not always included). Your intuition, typically framed as your hypothesis, will point to what kind of association does imply causality. Your starting hypothesis should be reasonable; it should follow from observations, from previous work, or from deductions from the mathematics. A related point is that an author has an obligation to publish only associations that do support causality, or at least ones that allow a meaningful scientific conclusion.
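To make the point concrete, here is a minimal sketch, with made-up numbers that are mine and not from Gu, et al. or any other study, of how a biologically trivial difference becomes “statistically significant” simply by enlarging n:

```python
# Hypothetical illustration: the same tiny difference between two group means
# (0.05 units on a scale where the standard deviation is 1.0) tested at two
# sample sizes, using a plain two-sample z-test on the normal approximation.
import math

def two_sided_p(mean_a, mean_b, sd, n_per_group):
    """Two-sided p-value for a difference of means; equal SDs and group sizes assumed."""
    se = sd * math.sqrt(2.0 / n_per_group)                # standard error of the difference
    z = (mean_a - mean_b) / se                            # test statistic
    phi = 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2)))   # standard normal CDF at |z|
    return z, 2.0 * (1.0 - phi)

for n in (100, 100_000):
    z, p = two_sided_p(5.05, 5.00, 1.0, n)
    print(f"n per group = {n:>7,}: z = {z:6.2f}, p = {p:.4f}")

# n per group =     100: p ~ 0.72  -> nothing to report
# n per group = 100,000: p < 0.001 -> "significant," yet the effect is still only 0.05 units
```

Whether 0.05 units matters to a patient is a biological question; the p value cannot answer it.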
Of course, there are hunches. Enrico Fermi is famous for explaining that the way he had come up with a particular hypothesis was con intuito formidabile (with formidable intuition). Few of us have Fermi’s intuition, however. Or his luck when we get it wrong. Fermi won the Nobel Prize for “proving” his hypothesis that bombarding uranium with neutrons would lead to new, heavier transuranic elements. His actual accomplishment was to have demonstrated nuclear fission, which was not understood until some time later. (It is undoubtedly good that Hitler did not come to power with nuclear fission as a known entity.) Generally, though, if your theory is a hunch, far-fetched or unconventional, your experiment will have to meet very high standards. The “experiment” (it was epidemiology) in Gu, et al., finding red meat to be a diabetes risk, was unreasonable and neither stems from nor provides any intuitive ideas. My first post on this subject gave a counterexample: the dramatic reduction in red meat consumption during the increase in diabetes over the past fifty years. More generally, much research shows that replacing protein with carbohydrate, the likely effect of the practice of replacing red meat with “plant-based” protein, is detrimental. With the justification of statistical significance in hand, the authors were able to come up with something about red meat as an explanation. In the extreme, Gu, et al. invoked the presence of heme, literally the essence of our life’s blood. The cases where excess heme is pathological are diseases and, as far as I know, are caused not by diet but by genetics. Even if there is some risk from excess heme iron, does anybody think that it will exceed the risk of being iron-deficient? And if you want to claim a nutritional cause, you have to refute the case for the established candidate, dietary carbohydrate.
Standards and rules
It is not just the statistics. The collection of standards, rules, hierarchies of experiments, and levels of evidence in medical research is largely unknown in the physical sciences, where the value of an experiment resides in how well it answers the particular question. Perhaps because medicine is largely applied science, there is a desire for explicit principles, and physicians often have the idea that medicine is a different kind of thing from science.
I usually describe the problem by imagining a physician whose behavior and personality are a pastiche of various well-known attributes. No particular person in mind, but we have a caring, serious practitioner who has coupled insight with learning from patients’ experiences and who provides the best medical care. Then there is a moment when they undertake formal clinical research. Suddenly all flexibility is put aside in favor of standards of care, “levels of evidence,” “gold standards,” and accepted practices that may extend to bizarre activities like “intention to treat.”
Now, there are useful principles, and many problems can be approached in a very systematic way. The danger, however, is the tendency to trust the rules more than your own ideas. I related the presumably apocryphal story in Nutrition in Crisis [2]. A guy comes to Mozart for advice on becoming a composer. Mozart says that he should study theory for a couple of years. He should learn orchestration and become proficient at the piano. He goes on like this until finally the man says, “But you wrote your first symphony when you were 8 years old.” Mozart says, “Yes, but I didn’t ask anybody.”
Mukherjee’s First Law
As I was finishing the draft of this post, I came across The Laws of Medicine by Siddhartha Mukherjee [3]. Mukherjee is the author of several captivating essays and books; his The Emperor of All Maladies is a popular-science masterpiece. Laws promised to be exactly along the lines of this post: the search for principles that might define when Medicine is identifiable as a science. The First Law:
“A strong intuition is much more powerful than a weak test.”
More or less a variation of the precept at the beginning of this post, the “law” was discovered by chance. Mukherjee described a patient with unexplained weight loss and fatigue. The patient had no risk factors, did not smoke, and there was no family history of cancer. CAT scans, colonoscopy, and the intervention of all kinds of specialists provided no clue. In the end, a chance observation of the patient’s behavior outside the clinic led him to a hypothesis that dictated the appropriate tests, which allowed a diagnosis and treatment. (The solution is in the excerpt at the end of this post, for readers who want to think about it first and avoid spoilers.)
The main idea was that “every scrap of evidence — a patient’s medical history, a doctor’s instincts,…physical examination,…behaviors, gossip — raises or lowers the probability. Once the probability tips over a certain point, you order a confirmatory test — and then you read the test in the context of the prior probability.” In other words, with an intuition formed, you now try to put a number on it. Notice that the doctor (the experimenter) is at the heart of the law.
Some readers may recognize, from the phrase in the last sentence, “prior probability,” where this was going: Bayesian statistics. A Bayesian approach may allow us to deal with the problem of intuition and statistics. I will describe it in an upcoming post.
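As a rough preview, and purely as my own illustration (the numbers are made up and are not from Mukherjee’s case), Bayes’ rule shows how the same positive test reads very differently against different pre-test probabilities:

```python
# Hypothetical diagnostic test: 99% sensitive, 99% specific.
# Bayes' rule: P(disease | positive) = P(positive | disease) P(disease) / P(positive)
def post_test_probability(prior, sensitivity, specificity):
    """Probability of disease given a positive result, from the pre-test probability."""
    true_pos = prior * sensitivity                    # P(positive and disease)
    false_pos = (1.0 - prior) * (1.0 - specificity)   # P(positive and no disease)
    return true_pos / (true_pos + false_pos)

# A low-risk patient screened on a whim vs. a patient with strong clinical suspicion.
for prior in (0.001, 0.30):
    post = post_test_probability(prior, sensitivity=0.99, specificity=0.99)
    print(f"pre-test probability {prior:.3f} -> post-test probability {post:.3f}")

# prior 0.001 -> posterior ~ 0.09: the positive result is still probably a false positive
# prior 0.300 -> posterior ~ 0.98: the same result now all but clinches the diagnosis
```

The test itself is unchanged; what the intuition (the prior) contributes is how the number should be read.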
The solution to Mukherjee's diagnostic puzzle is in the following excerpt from his book, The Laws of Medicine:
"Over the next four weeks, we scoured his body for signs of cancer. CAT scans were negative. A colonoscopy, looking for an occult colon cancer, revealed nothing except for an occasional polyp. He saw a rheumatologist—for the fleeting arthritic pains in his fingers—but again, nothing was diagnosed. I sent out another volley of lab tests. The technician in the blood lab complained that Mr. Carlton's veins were so pinched that she could hardly draw any blood.
"For a while nothing happened. It felt like a diagnostic stalemate. More tests came back negative. Mr. Carlton was frustrated; his weight kept dropping, threatening to go all the way down to zero. Then, one evening, returning home from the hospital, I witnessed an event that changed my entire perspective on the case.
"Boston is a small town—and the geography of illness tracks the geography of its neighborhoods (I'll risk admonishment here, but this is how medical interns think). To the northeast lie the Italian neighborhoods of the North End and the rough-and-tumble shipyards of Charlestown and Dorchester, with high densities of smokers and asbestos-exposed ship workers (think lung cancer, emphysema, asbestosis). To the south are desperately poor neighborhoods overrun by heroin and cocaine. Beacon Hill and Brookline, sitting somewhere in the middle, are firmly middle-class bastions, with the spectra of chronic illnesses that generally affect the middle class.
"What happened that evening amounted to this: around six o'clock as I left the hospital after rounds, I saw Mr. Carlton in the lobby, by the Coffee Exchange, conversing with a man whom I had admitted months ago with a severe skin infection related to a heroin needle inserted incorrectly into a vein. The conversation could not have lasted for more than a few minutes. It may have involved something as innocuous as change for a twenty-dollar bill, or directions to the nearest ATM. But on my way home on the train, the image kept haunting me: the Beacon Hill scion chatting with the Mission Hill addict. There was a dissonant familiarity in their body language that I could not shake off—a violation of geography, of accent, of ancestry, of dress code, of class. By the time I reached my station, I knew the answer. Boston is a small town. It should have been obvious all along: Mr. Carlton was a heroin user. Perhaps the man at the Coffee Exchange was his sometime dealer, or an acquaintance of an acquaintance. In retrospect, I should also have listened to the blood-lab worker who had had such a hard time drawing Mr. Carlton's blood: his veins were likely scarred from habitual use.
"The next week, I matter-of-factly offered Mr. Carlton an HIV test. I told him nothing of the meeting that I had witnessed. Nor did I ever confirm that he knew the man from Mission Hill. The test was strikingly positive. By the time the requisite viral-load and the CD4 counts had been completed, we had clinched the diagnosis: Mr. Carlton had AIDS."
I had to stop and comment after the first paragraph. It should be printed and framed, and copies should hang on the walls of research institutions. Biological plausibility comes first. It has to be strong. You don't run a clinical trial in order to generate plausibility hypotheses. Most of all, you have to weigh the marginal effect of your "intervention" in the clinical scenario BEFORE you run the trial. Please go to https://thethoughtfulintensivist.substack.com/