Car inspections and repairs take a small fraction of our total spending on cars, gas, roads, and parking. But imagine that we were so terrified of accidents due to faulty cars that we spent most of our automotive budget having our cars inspected and adjusted every week by Ph.D. car experts. Obsessed by the fear of not finding a defect that might cause an accident, imagine we made sure inspections were heavily regulated and subsidized by government. To feed this obsession, imagine we skimped on spending to make safer roads, cars, and driving patterns, and our constant disassembling and reassembling of cars introduced nearly as many defects as it eliminated.
This is something like our relation to medicine today. Our public today is like a king of old whose military advisors spent most of their time and budget reading omens and making sacrifices, to gain the gods’ favor, instead of hiring soldiers and talking battle strategy. These advisors knew omens and sacrifices mattered little, but they saw the king was comforted, and feared losing favor by talking of battle strategy. A truly loyal advisor would have told the king what he did not want to hear: “You are obsessing about the wrong thing.”
King Solomon famously threatened to cut a disputed baby in half, to expose the fake mother who would permit such a thing. The debate over medicine today is like that baby, but with disputants who won’t fall for Solomon’s trick. The left says markets won’t ensure everyone gets enough of the precious medical baby. The right says governments produce a much inferior baby. I say: cut the baby in half, dollar-wise, and throw half away! Our “precious” medical baby is in fact a vast monster filling our great temple, whose feeding starves our people and future. Half a monster is plenty.
Am I being too allegorical? Then let me speak plainly: our main problem in health policy is a huge overemphasis on medicine. The U.S. spends one sixth of national income on medicine, more than on all manufacturing. But health policy experts know that we see at best only weak aggregate relations between health and medicine, in contrast to apparently strong aggregate relations between health and many other factors, such as exercise, diet, sleep, smoking, pollution, climate, and social status. Cutting half of medical spending would seem to cost little in health, and yet would free up vast resources for other health and utility gains. To their shame, health experts have not said this loudly and clearly enough.
Non-health-policy experts are probably shocked to hear my claims. Most students in my eight years of teaching health economics have simply not believed me, even after a semester of reviewing the evidence. Heroic medicine is just too central to our culture, a culture where economists like me have far less authority than doctors. Worse, even most standard textbooks in health economics fail to make the point clearly.
Children are told that medicine is the reason we live longer than our ancestors, and our media tell us constantly of promising medical advances. Millions of doctors are well aware that most medical journal articles describe gains from particular medical treatments, and these doctors usually give patients optimistic views about particular treatments.
In contrast, few doctors know that historians think medicine has played at best a minor role in our increased lifespans over the centuries. And only a few health policy experts now know about the dozens of studies of the aggregate health effects of medicine. Worse, these studies can seem muddled, with some showing positive, some showing negative, and some showing neutral effects of medicine on health.
So I want to say loudly and clearly what has yet to be said loudly and clearly enough: In the aggregate, variations in medical spending usually show no statistically significant medical effect on health. (At least they do not in studies with enough good controls.) It has long been nearly a consensus among those who have reviewed the relevant studies that differences in aggregate medical spending show little relation to differences in health, compared to other factors like exercise or diet. I not only want to make this point clearly; I want to dare other health policy experts to either publicly agree or disagree with this claim and its apparent policy implications.
By “variations” I mean the large changes in medical spending often induced by observable disturbances, such as changing culture or prices, and by “aggregate” I mean studies of the health effect on an entire population of disturbances that affect a broad range of medical treatments. In contrast, the vast majority of medical studies look at the effects of particular categories of treatments on particular classes of patients.
Note that a muddled appearance of differing studies showing differing effects is to be expected. After all, even if medicine has little effect, random statistical error and biases toward presenting and publishing expected results will ensure that many published studies suggest positive medical benefits.
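A toy simulation can make this selection effect concrete. The sketch below is only an illustration under loudly stated assumptions: the true effect is set to exactly zero, each study's result is a standard normal z-score, and the two publication probabilities (0.9 for a significant positive result, 0.3 for anything else) are hypothetical numbers chosen for the example, not estimates from any literature.

```python
import random

def simulate(n_studies=10_000, z_cutoff=1.645,
             p_publish_positive=0.9, p_publish_other=0.3, seed=0):
    """Return (share of all studies that look positive,
               share of published studies that look positive)."""
    rng = random.Random(seed)
    positives = 0
    published = 0
    published_positives = 0
    for _ in range(n_studies):
        z = rng.gauss(0.0, 1.0)           # true effect is zero: z is pure noise
        is_positive = z > z_cutoff        # "significant benefit" by chance alone
        # Hypothetical publication filter favoring the expected result.
        p_publish = p_publish_positive if is_positive else p_publish_other
        is_published = rng.random() < p_publish
        positives += is_positive
        published += is_published
        published_positives += (is_positive and is_published)
    return positives / n_studies, published_positives / published

share_all, share_published = simulate()
print(f"positive among all studies:       {share_all:.3f}")
print(f"positive among published studies: {share_published:.3f}")
```

Under these assumed numbers, the published record shows a noticeably larger share of "positive" studies than the full set of studies does, even though no study measured a real effect.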
Let me illustrate. (A general review is found in Fuchs, Health Affairs, 2004. A contrarian review is Hadley, Medical Care Research and Review, 2003.) The first study known to me was by Auster, Leveson, & Sarachek, Journal of Human Resources, in 1969. It found that variations across the 50 U.S. states in 1960 age-sex-adjusted death rates were significantly predicted by variations in income, education, fractions of white collar and female workers, and the existence of a local medical school, but not by variations in medical spending, urbanization, and alcohol and cigarette consumption.
Later studies using robust controls to compare similar regions tend to give similar results. For example, a Byrne, Pietz, Woodard, & Petersen Health Economics 2007 study found no significant mortality effects of funding variations across 22 U.S. Veterans Affairs regions over six years. And a Fisher et al. Annals of Internal Medicine 2003 study of 18,000 patients confirmed a Fisher et al. Health Services Research 2000 study, and a related Skinner and Wennberg 1998 study, which together used the largest dataset I know of: five million Medicare patients in 1989 and 1990 across 3,400 U.S. hospital regions.
Regions that paid more to have patients stay in intensive care rooms for one more day during their last six months of life were estimated, at a 2% significance level, to make patients live roughly forty fewer days, even after controlling for: individual age, gender, and race; zipcode urbanity, education, poverty, income, disability, and marital and employment status; and hospital-area illness rates. This same study, using the same controls, also estimated that a region spending $1,000 more overall in the last six months of life gave local patients somewhere between a gain of five days of life and a loss of twenty days of life (95% confidence interval). (I’m using a fifty days lost per 1% added mortality rule of thumb.)
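The rule of thumb in the parenthesis can be run in reverse to see what mortality differences the day figures above correspond to. This is a minimal sketch that assumes the rule is linear and applies in both directions; the function names are made up for the illustration, and the text does not say which mortality rate the rule refers to.

```python
# Rule of thumb from the text: fifty days of life lost per 1% of added mortality.
DAYS_PER_MORTALITY_POINT = 50

def days_lost_from_mortality(added_points):
    """Days of life lost for a given number of added mortality percentage points."""
    return DAYS_PER_MORTALITY_POINT * added_points

def mortality_points_from_days(days_lost):
    """Inverse of the rule: added mortality percentage points for days lost."""
    return days_lost / DAYS_PER_MORTALITY_POINT

# Forty fewer days for one more intensive care day.
icu_points = mortality_points_from_days(40)    # about 0.8 points higher mortality

# The 95% interval for $1,000 more spending: from 5 days gained to 20 days lost.
ci_low = mortality_points_from_days(-5)        # about 0.1 points lower mortality
ci_high = mortality_points_from_days(20)       # about 0.4 points higher mortality

print(icu_points, ci_low, ci_high)
```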
The tiny effect of medicine found in large studies is in striking contrast to the large apparent effects we find even in small studies of other influences. For example, a 1998 Lantz, et al. study in the Journal of the American Medical Association of 3,600 adults over 7.5 years found large and significant lifespan effects: a three year loss for smoking, a six year gain for rural living, a ten year loss for being underweight, and about fifteen year losses each for low income and low physical activity (in addition to the usual effects of age and gender).
Note that someone willing to pay $1,000 to gain 2.5 days of life should be willing to spend about $1,000,000 to gain six years by living rurally, and $2,000,000 to gain fifteen years via high exercise. These figures seem to me to overestimate the observed eagerness to live rurally or to exercise.
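The arithmetic behind these two figures can be checked in a few lines. The sketch assumes a constant price per day of life (no discounting, no diminishing returns) and a 365-day year; both are simplifications introduced here, and the function name is hypothetical.

```python
DAYS_PER_YEAR = 365  # simplification: ignores leap years

def implied_willingness_to_pay(dollars, days_gained, years_gained):
    """Scale a dollars-per-day price of life linearly up to a gain measured in years."""
    dollars_per_day = dollars / days_gained
    return dollars_per_day * years_gained * DAYS_PER_YEAR

# $1,000 for 2.5 days is $400 per day of life.
rural = implied_willingness_to_pay(1_000, 2.5, 6)      # 876,000: "about $1,000,000"
exercise = implied_willingness_to_pay(1_000, 2.5, 15)  # 2,190,000: "$2,000,000"

print(f"rural living: ${rural:,.0f}")
print(f"high exercise: ${exercise:,.0f}")
```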
Of course all of these studies look at correlation, not causation, between health and medicine. So they all leave open the possibility that someday studies with better controls will show stronger effects. For this reason, discussion of the health effects of medical spending variations usually turns eventually to our clearest evidence on the subject: the RAND health insurance experiment.
From 1974 to 1982 this experiment spent about $50 million to randomly assign over two thousand non-elderly families in six U.S. cities to three to five years of a specific medical price, ranging from free to full price, provided by the same set of doctors. (See the 1983 Brook et al. New England Journal of Medicine article, and the 1996 Newhouse et al. book Free for All?) The experiment’s random assignments allowed it to clearly determine causality. Being assigned a low price for medicine caused patients to consume about 30% (or $300) more in per-person annual medical spending, though less for hospital spending and more for dental and “well care.”
The RAND experiment was not quite large enough to see mortality effects directly, and so the plan was to track four general measures of health, combined into a total “general health index,” and also 23 physiological health measures. Their main result: “For the five general health measures, we could detect no significant positive effect of free care for persons who differed by income … and by initial health status.” This summary isn’t fully forthcoming, however. At a 7% significance level they found that poor people in the top 80% of initial health ended up with a 3% lower general health index under free medicine than under full-priced medicine.
Among their many specific findings, the most significant was at the 0.1% level: people with free eyeglasses could see better. But it has long been obvious that eyeglasses help people see, and eyeglasses are basically physics, not “medicine.” The second most significant specific finding was that at a 1% significance level those with free medicine had about one and a half fewer days per year when they could do their normal activities. This effect was also to be expected, due to time needed for doctor appointments.
The third most significant specific finding, and strongest unexpected one, was that people with free medicine had lower blood pressure, at a 3% significance level. But a study that looks at thirty measures in total should, just by chance, find one unexpected finding that seems significant at the 3% level. So taking data mining (i.e., searching for results) into account, this blood pressure result should be set aside.
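The chance-finding argument is simple multiplication, sketched below. It assumes every measure truly shows no effect and, for the second function only, that the thirty tests are independent of one another; real health measures are correlated, so the second number is a rough guide at best.

```python
def expected_false_positives(n_measures, alpha):
    """Expected number of measures that look significant at level alpha by chance."""
    return n_measures * alpha

def prob_at_least_one(n_measures, alpha):
    """Chance of at least one spurious finding, assuming independent tests."""
    return 1 - (1 - alpha) ** n_measures

expected = expected_false_positives(30, 0.03)   # about 0.9, i.e. roughly one finding
at_least_one = prob_at_least_one(30, 0.03)      # roughly 0.6 under independence

print(f"expected chance findings: {expected:.2f}")
print(f"probability of at least one: {at_least_one:.2f}")
```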
Many summaries of the RAND experiment, however, trumpet a “risk-of-dying” index result that ignores data mining effects. After seeing the experimental results, researchers chose an index based on smoking, cholesterol, and blood pressure. While overall those with free medicine did not have a significantly lower risk of dying, researchers found an index threshold such that those initially above this threshold later had a 20% lower risk under free medicine.
Some say this result is significant at the 3% level, but this calculation completely ignores data mining effects. And even if this overall risk-index effect were real, it would represent about fifty days of life gained for the average patient, paid for by roughly 30% more medical spending over a lifetime.
The RAND experiment most clearly addressed the health value of the extra medicine consumed by those with free medicine. But it gave hints about the health value of the common medicine all patients consumed: there were no significant differences in either severity of diagnosis or appropriateness of treatment between common and extra medicine. If common medicine is healthier than extra medicine, it is not because common medicine deals with more serious cases or uses more appropriate treatments.
Let us now summarize and interpret these results. Medicine is composed of a great many specific activities. Presumably some of these activities help patients, some hurt patients, and some are neutral. (Don’t believe medicine can hurt? Consider the high rate of medical errors, and see the Fisher & Welch Journal of the American Medical Association 1999 theory article.)
We have observed many kinds of disturbances which change the distribution of medical activities, such as variations in local medical culture, local wealth levels, medical prices, and so on. Taken at face value, our inability to see much health impact from the disturbances we have observed suggests that such disturbances increase or decrease helpful and harmful medicine in roughly equal amounts.
This in turn suggests that if we were to reduce medical spending via a disturbance similar in character to the types of disturbances we have observed, such a spending reduction would also reduce helpful and harmful medicine in roughly equal amounts. The claim is not that there would be no harmful health effects of such a policy, but rather that harmful effects would be roughly balanced by helpful effects. And the claim is not that harmful and helpful effects would exactly balance, but rather that any net health harm will be small compared to the health gains possible by spending the savings on other health influences, and to the utility gains possible from spending the savings in other ways.
How much could we cut? For the U.S. it seems reasonable to project the 30% cut in the RAND results to a 50% cut, since the U.S. spends so much more than other nations without obvious extra health gains. I thus claim: we could cut U.S. medical spending in half without substantial net health costs. This would give us the equivalent of an 8% pay raise.
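The pay-raise figure follows from the one-sixth share of national income cited earlier. A minimal sketch, assuming the whole saving is returned as income and that national income is otherwise unchanged:

```python
from fractions import Fraction

medical_share = Fraction(1, 6)   # share of national income spent on medicine (from the text)
cut_fraction = Fraction(1, 2)    # the proposed cut: half of medical spending

freed_share = medical_share * cut_fraction   # 1/12 of national income
freed_percent = float(freed_share) * 100     # a little over 8 percent

print(f"freed share of national income: {freed_share} (about {freed_percent:.1f}%)")
```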
How should we cut medical spending? There are many possibilities, and I may prefer some possibilities to others. But I do not want such preferences to distract from the main point: most any way to implement such a cut would likely give big gains. The obvious first place to cut would be our government and corporate subsidies for medicine, including direct payments, tax exemptions, and regulatory requirements. Socially, we should also try to give medicine far less prestige than we now do. After these one could consider taxing medicine, limiting it by law, or nationalizing the industry and using agency budgets to limit spending.
Yes, I know, these are not politically realistic proposals. But at least health policy experts should publicly contradict those who overemphasize medicine, including politicians whose “health policy” is mainly medical policy, and newspapers whose “health” news is mainly medical news. Furthermore, health policy experts should not themselves mainly research and teach medicine.
If health policy experts hesitate on my proposals due to doubts about how much we can rely on the RAND experiment and correlation studies, then they should at the very least immediately and fully support channeling available funding into repeating the RAND experiment today, ideally with more patients treated longer. Treating ten thousand patients should cost only one part in forty thousand of annual U.S. medical spending, an incredible bargain if it has any substantial chance of overcoming resistance to cutting medical spending.
Do you have little voice in health policy or research? Then at least you can change your own medical behavior: if you would not pay for medicine out of your own pocket, then don’t bother to go when others offer to pay; the RAND experiment strongly suggests that on average such medicine is as likely to hurt as to help.
Let me now consider some objections.
What about studies suggesting larger benefits in particular areas, e.g., immunization, infant care, and emergency care?
Yes, there are categories of medicine where larger benefits seem plausible, and where empirical studies support such claims. (See, for example, Filmer & Pritchett Social Science and Medicine 1999 and Joseph Doyle 2007.) And I have no problem supporting policies to increase spending for such medicine relative to other kinds of medicine. But if your argument is that we must spend lots on medicine overall in order to gain benefits from these particular categories, then I think you have missed the whole point here. We already knew some medicine is more helpful and some more harmful than average. It doesn’t really help to know which part is which unless we are willing to somehow act on that information, and treat better parts differently from worse parts.
What about health and innovation externalities?
Your health may give positive benefits to others, but most medicine on the margin seems to have little to do with health. We can (and do) subsidize health directly, such as by paying people for each extra year of life they live. Medical innovation may well increase the possibly high value of the first half of medical spending, and I do not suggest cutting research budgets. I am, however, skeptical about the innovation benefits of the second half of medical spending practice; how much can a practice environment that tolerates as much harmful as helpful medicine really encourage practice to become more helpful?
What if everything has changed recently?
Overreliance on medicine seems to be quite ancient and widespread; historians suggest that until recently our ancestors would have been better off avoiding doctors. Yes, perhaps most of the apparently useful treatments we know were developed in recent years. But we should expect this from the fact that medical spending as a fraction of GDP has been doubling roughly every thirty years, mostly via spending on new treatments. At all times during such an expansion most treatments should be new. But if there are any doubts, please, let us redo the RAND experiment.
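The doubling argument can be sketched numerically. The code assumes perfectly steady exponential growth with a thirty-year doubling time and treats all growth in the spending share as spending on new treatments (the text says only "mostly"), so it measures newness by share of spending rather than by a count of treatments.

```python
def share_added_within(years, doubling_time=30.0):
    """Fraction of today's spending share that was added during the last `years` years,
    assuming the share has doubled every `doubling_time` years."""
    return 1.0 - 2.0 ** (-years / doubling_time)

last_30 = share_added_within(30)   # 0.5: half of the share is at most thirty years old
last_60 = share_added_within(60)   # 0.75: three quarters is at most sixty years old

print(last_30, last_60)
```

The same fractions hold at any point along such a growth path, which is why recently developed treatments dominate at every date.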
How could we be this wrong about medicine?
If you wonder how the usual medical literature could give such a misleading impression of aggregate medical effects on health, I will point to funding and publication selection biases, statistical tests ignoring data mining, leaky placebo effects, differences between lab and field environments, and the fact that most treatments today have no studies. If you wonder how medicine could suffer so much more from such problems than other subjects, I’ll point you to my forthcoming Medical Hypotheses article, wherein I suggest humans long ago evolved a tendency to use medicine to “show that we care,” rather than just to get healthy.
Briefly, the idea is that our ancestors showed loyalty by taking care of sick allies, and that, for such signals, how much one spends matters more than how effective the care is, and commonly-observed clues of quality matter more than private clues. So today we spend enough to distinguish ourselves from people who don’t care as much as we do, and we pay little attention to private clues about the health effectiveness of medicine. Since loyalty signals can be privately beneficial and yet socially wasteful, my proposal to cut medical spending in half could still be a good idea.