My previous post on the paper by Gu et al. [1], which found a risk of diabetes in red meat, provided a counterexample and some statistical analysis showing how the conclusion violated common sense. What was really wrong was that there was no meaningful idea behind the study. There is some bias against red meat, but biological plausibility was absent and the paper was supported only by formal statistics. As such, it should not have been published.
Statistics
The Introduction to a good statistics text will tell you that “what we do in statistics is to put a number on our intuition.” I remember reading that exact line at one time, but I could never track down the original source. The idea is that you start from the science, from the question to be answered and what the outcome will look like. You then propose or apply a mathematical model to the results of your experiment. In other words, the medical or scientific question comes first. Applied statistics always represents an interpretation. A major defect in the medical literature is that often the opposite is going on: many papers try to come up with an intuition to fit a number, trying to derive the science from the statistics. Sometimes the number may not even be the problem. The p value, or some arbitrary number indicating that the results are “statistically significant,” may be taken as the conclusion. The implication, in these cases, is that your experiment did not have independent justification and that the significance was revealed by the statistics. The corollary is that the type of experiment becomes more important than its quality. An experiment with a large n is considered better simply because of the size of the population studied. It is sometimes said, for example, that randomized controlled trials are good for generating hypotheses, sometimes said to be “only” good for generating hypotheses. That is not right. Such a strategy would be considered a fishing expedition. There are an infinite number of things to test. The experiment you actually perform will follow your hypothesis. In Einstein’s words, your theory determines the experiment you do.
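The fishing-expedition problem is easy to demonstrate in a few lines. A minimal sketch in Python (the data are pure simulated noise with no real associations; the sample size, number of variables and the conventional 0.05 threshold are illustrative choices, not taken from any actual study):

```python
import random
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

random.seed(0)
n = 30          # subjects in our imaginary "study"
trials = 1000   # number of unrelated "exposures" we test
# Critical |r| for p < 0.05 (two-tailed) with n = 30 (standard tables).
critical_r = 0.361

outcome = [random.gauss(0, 1) for _ in range(n)]  # the health outcome: noise
hits = 0
for _ in range(trials):
    exposure = [random.gauss(0, 1) for _ in range(n)]  # also pure noise
    if abs(pearson_r(exposure, outcome)) > critical_r:
        hits += 1

# Roughly 5% of purely random "exposures" come out "statistically
# significant" against a purely random outcome.
print(hits / trials)
```

If you test enough things, the statistics will hand you “significant” associations for free; only a hypothesis formed before the test tells you which associations were worth looking for.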
That an association is found to be statistically significant is a mathematical conclusion and does not tell you whether the result has any biological importance. As is often repeated, an association does not necessarily imply causality (although the disclaimer is not always included). Your intuition, typically framed as your hypothesis, will point to what kind of association does imply causality. Your starting hypothesis should be reasonable; it should follow from observations, from previous work or from deductions from the mathematics. A related point is that an author has an obligation to publish only associations that do support causality, or at least ones that allow a meaningful scientific conclusion.
Of course, there are hunches. Enrico Fermi is famous for explaining that he had come up with a particular hypothesis con intuito formidabile (“with formidable intuition”). Few of us, however, have Fermi’s intuition, or his luck when we get it wrong. Fermi won the Nobel Prize for “proving” his hypothesis that bombarding uranium with neutrons would lead to new, heavier transuranic elements. What he had actually accomplished was nuclear fission, which was not understood until some time later. (It is undoubtedly good that nuclear fission was not a known entity when Hitler came to power.) Generally, though, if your theory is a hunch, is far-fetched or unconventional, your experiment will have to meet high standards. The “experiment” in Gu et al., finding risk of diabetes in red meat, is unreasonable and neither stems from nor provides any intuitive ideas. My first post on this subject gave a counterexample and, in the general case, much research shows that replacing protein with carbohydrate (the likely effect of replacing red meat with “plant-based” protein) is detrimental. With the justification of statistical significance in hand, the authors were able to come up with something about red meat as a justification. In the extreme, Gu et al. invoked the presence of heme, literally the essence of our life’s blood. The cases where excess heme is pathological are diseases and, as far as I know, are caused not by diet but by genetics. Even if there is some risk from excess heme iron, does anybody think that it will exceed the risk of being iron-deficient? And if you want to claim a nutritional cause, you have to refute the strengths of the established candidate, dietary carbohydrate.
Standards and rules
It is not just the statistics. The collection of standards and rules, the hierarchy of experiments and the levels of evidence in medical research are largely unknown in the physical sciences, where the value of an experiment resides in how well it answers the particular question. Perhaps because medicine is largely an applied science, there is a desire for explicit principles, and physicians often have the idea that medicine is a different kind of thing from science.
I usually describe the problem by imagining a physician whose behavior and personality are a pastiche of various well-known attributes. I have no particular person in mind, but we have a caring, serious practitioner who has coupled insight with learning from patients’ experiences and who will provide the best medical care. Then there is a moment when they undertake formal clinical research. Suddenly all flexibility is put aside for standards of care, “levels of evidence,” “gold standards” and accepted practices that may extend to bizarre activities like “intention to treat.”
Now, there are useful principles, and many problems can be approached in a very systematic way. The danger, however, is the tendency to trust the rules more than your ideas. I related the presumably apocryphal story in Nutrition in Crisis [2]. A guy comes to Mozart to get advice on becoming a composer. Mozart says that he should study theory for a couple of years, learn orchestration and become proficient at the piano. He goes on like this until finally the man says, “But you wrote your first symphony when you were eight years old.” Mozart says, “Yes, but I didn’t ask anybody.”
Mukherjee’s First Law
As I was finishing the draft of this post, I came across The Laws of Medicine by Siddhartha Mukherjee [3]. Mukherjee is the author of several captivating essays and books; his The Emperor of All Maladies is a popular science masterpiece. Laws promised to be exactly along the lines of this post: the search for principles that might define when medicine is identifiable as a science. The First Law:
“A strong intuition is much more powerful than a weak test.”
More or less a variation of the description of statistics at the beginning of this post, the “law” was discovered by chance. Mukherjee described a patient with unexplained weight loss and fatigue. The patient had no risk factors, did not smoke, and there was no family history of cancer. CAT scans, colonoscopy and the intervention of all kinds of specialists provided no clue. In the end, a chance observation of the patient’s behavior outside the clinic led him to a hypothesis that dictated the appropriate tests, which allowed a diagnosis and treatment. (For readers who want to think about it first, the solution is in a Comment, to avoid spoilers.)
The main idea was that “every scrap of evidence — a patient’s medical history, a doctor’s instincts,…physical examination,…behaviors, gossip — raises or lowers the probability. Once the probability tips over a certain point, you order a confirmatory test — and then you read the test in the context of the prior probability.” In other words, with an intuition formed, you now try to put a number on it. Notice that the doctor (the experimenter) is at the heart of the law.
Some readers may recognize, from the phrase in the last sentence, “prior probability,” where this was going: Bayesian statistics. A Bayesian approach may allow us to deal with the problem of intuition and statistics. I will describe it in an upcoming post.
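As a preview, the updating that Mukherjee describes can be written out in one line of Bayes’ theorem. A minimal sketch in Python (the prevalence, sensitivity and specificity figures are hypothetical, chosen only to illustrate the arithmetic, not taken from any real test):

```python
def post_test_probability(prior, sensitivity, specificity):
    """Probability of disease given a positive test (Bayes' theorem).

    prior:       pre-test probability of disease (the "prior probability")
    sensitivity: P(positive test | disease)
    specificity: P(negative test | no disease)
    """
    true_pos = sensitivity * prior                # P(positive and disease)
    false_pos = (1 - specificity) * (1 - prior)   # P(positive and no disease)
    return true_pos / (true_pos + false_pos)

# A good test ordered on a weak hunch (prior 1%): a positive result is
# still probably a false positive.
print(post_test_probability(prior=0.01, sensitivity=0.90, specificity=0.95))  # ~0.15

# The same test after history, examination and observation have raised
# the prior to 50%: the positive result is now nearly conclusive.
print(post_test_probability(prior=0.50, sensitivity=0.90, specificity=0.95))  # ~0.95
```

The sketch is the First Law in numbers: the test itself is identical in both runs; what changes the answer is the strength of the intuition brought to it.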
References
1. Gu, X, Drouin-Chartier, J-P, Sacks, FM, Hu, FB, Rosner, B, Willett, WC. Red meat intake and risk of type 2 diabetes in a prospective cohort study of United States females and males. Am. J. Clin. Nutr. (2023) 118, 1153-1163. https://ajcn.nutrition.org/article/S0002-9165(23)66119-2/fulltext
2. Feinman, RD. Nutrition in Crisis. (2019) Chelsea Green Publishing, White River Junction, Vermont.
3. Mukherjee, Siddhartha. The Laws of Medicine: Field Notes From an Uncertain Science. (2015) Simon & Schuster. ISBN 978-1-4767-8484-7; ISBN 978-1-4767-8485-4 (ebook).
Comments

I had to stop and comment after the first paragraph. It should be printed and framed, and copies should hang on the walls of research institutions. Biological plausibility comes first. It has to be strong. You don’t run a clinical trial in order to generate plausibility hypotheses. Most of all, you have to weigh the marginal effect of your “intervention” in the clinical scenario BEFORE you run the trial. Please go to https://thethoughtfulintensivist.substack.com/

If I may ask a clarifying question: how was the red meat and diabetes study not pursuing a meaningful question? Or, more generally, are you saying that before you use a statistical test you should already have a more intuitive way of determining that what you measure is not just noise?