Super Freakonomics - Part 8

By this time, Feied and Smith had between them treated more than a hundred thousand patients in various emergency rooms. They found one commodity was always in short supply: information. A patient would come in (conscious or unconscious, cooperative or not, sober or high, with a limitless array of potential problems) and the doctor had to decide quickly how to treat him. But there were usually more questions than answers: Was the patient on medication? What was his medical history? Did a low blood count mean acute internal bleeding or just chronic anemia? And where was the CT scan that was supposedly done two hours ago?

"For years, I treated patients with no more information than the patients could tell me," Feied says. "Any other information took too long, so you couldn't factor it in. We often knew what information we needed, and even knew where it was, but it just wasn't available in time. The critical piece of data might have been two hours away or two weeks away. In a busy emergency department, even two minutes away is too much. You can't do that when you have forty patients and half of them are trying to die."

The problem agitated Feied so badly that he turned himself into the world's first emergency-medicine informaticist. (He made up the phrase, based on the European term for computer science.) He believed that the best way to improve clinical care in the ER was to improve the flow of information.

Even before taking over at WHC, Feied and Smith hired a bunch of medical students to follow doctors and nurses around the ER and pepper them with questions. Much like Sudhir Venkatesh hired trackers to interview Chicago street prostitutes, they wanted to gather reliable, real-time data that were otherwise hard to get. Here are some of the questions the students asked:

Since I last talked to you, what information did you need?

How long did it take to get it?

What was the source: Did you make a phone call? Use a reference book? Talk to a medical librarian?*

Did you get a satisfactory answer to your query?

Did you make a medical decision based on that answer?

How did that decision impact patient care?

What was the financial impact of that decision on the hospital?

The diagnosis was clear: the WHC emergency department had a severe case of "datapenia," or low data counts. (Feied invented this word as well, stealing the suffix from "leucopenia," or low white-blood-cell counts.) Doctors were spending about 60 percent of their time on "information management," and only 15 percent on direct patient care. This was a sickening ratio. "Emergency medicine is a specialty defined not by an organ of the body or by an age group but by time," says Mark Smith. "It's about what you do in the first sixty minutes."

Smith and Feied discovered more than three hundred data sources in the hospital that didn't talk to one another, including a mainframe system, handwritten notes, scanned images, lab results, streaming video from cardiac angiograms, and an infection-control tracking system that lived on one person's computer on an Excel spreadsheet. "And if she went on vacation, God help you if you're trying to track a TB outbreak," says Feied.

To give the ER doctors and nurses what they really needed, a computer system had to be built from the ground up. It had to be encyclopedic (one missing piece of key data would defeat the purpose); it had to be muscular (a single MRI, for instance, ate up a massive amount of data capacity); and it had to be flexible (a system that couldn't incorporate any data from any department in any hospital in the past, present, or future was useless).

It also had to be really, really fast. Not only because slowness kills in an ER but because, as Feied had learned from the scientific literature, a person using a computer experiences "cognitive drift" if more than one second elapses between clicking the mouse and seeing new data on the screen. If ten seconds pass, the person's mind is somewhere else entirely. That's how medical errors are made.

To build this fast, flexible, muscular, encyclopedic system, Feied and Smith turned to their old crush: object-oriented programming. They set to work using a new architecture that they called "data-centric" and "data-atomic." Their system would deconstruct each piece of data from every department and store it in a way that allowed it to interact with any other single piece of data, or any other 1 billion pieces.

Alas, not everyone at WHC was enthusiastic. Institutions are by nature large and inflexible beasts with fiefdoms that must be protected and rules that must not be broken. Some departments considered their data proprietary and wouldn't surrender it. The hospital's strict purchasing codes wouldn't let Feied and Smith buy the computer equipment they needed. One top administrator "hated us," Feied recalls, "and missed no opportunity to try to stonewall and prevent people from working with us. He used to go into the service-request system at night and delete our service requests."

It probably didn't help that Feied was such an odd duck (the contrarianism, the Segway, the original Miró prints on his office wall) or that, when challenged, he wouldn't rest until he found a way to charm or, if need be, threaten his way to victory. Even the name he gave his new computer system seemed grandiose: Azyxxi (uh-ZICK-see), which he told people came from the Phoenician for "one who is capable of seeing far" but which really, he admits with a laugh, "we just made up."

In the end, Feied won. Or, really, the data won. Azyxxi went live on a single desktop computer in the WHC emergency room. Feied put a sign on it: "Beta Test: Do Not Use." (No one ever said he wasn't clever.) Like so many Adams and Eves, doctors and nurses began to peck at the forbidden fruit and found it nothing short of miraculous. In a few seconds they could locate practically any information they needed. Within a week, the Azyxxi computer had a waiting line. And it wasn't just ER docs: they came from all over the hospital to drink up the data. At first glance, it seemed like the product of genius. But no, says Feied. It was "a triumph of doggedness."

Within a few years, the WHC emergency department went from worst to first in the Washington region. Even though Azyxxi quadrupled the amount of information that was actually being seen, doctors were spending 25 percent less time on "information management," and more than twice as much time directly treating patients. The old ER wait time averaged eight hours; now, 60 percent of patients were in and out in less than two hours. Patient outcomes were better and doctors were happier (and less error-prone). Annual patient volume doubled, from 40,000 to 80,000, with only a 30 percent increase in staffing. Efficiencies abounded, and this was good for the hospital's bottom line.

As Azyxxi's benefits became clear, many other hospitals came calling. So did, eventually, Microsoft, which bought it, Craig Feied and all. Microsoft renamed it Amalga and, within the first year, installed the system in fourteen major hospitals, including Johns Hopkins, New York-Presbyterian, and the Mayo Clinic. Although it was developed in an ER, more than 90 percent of its use is currently in other hospital departments. As of this writing, Amalga covers roughly 10 million patients at 350 care sites; for those of you keeping score at home, that's more than 150 terabytes of data.

It would have been enough if Amalga merely improved patient outcomes and made doctors more efficient. But such a massive accumulation of data creates other opportunities. It lets doctors seek out markers for diseases in patients who haven't been diagnosed. It makes billing more efficient. It makes the dream of electronic medical records a straightforward reality. And, because it collects data in real time from all over the country, the system can serve as a Distant Early Warning Line for disease outbreaks or even bioterrorism.

It also allows other, non-medical people (people like us, for instance) to repurpose its data to answer other kinds of questions, such as: who are the best and worst doctors in the ER?

For a variety of reasons, measuring doctor skill is a tricky affair.

The first is selection bias: patients aren't randomly assigned to doctors. Two cardiologists will have two sets of clientele who may differ on many dimensions. The better doctor's patients may even have a higher death rate. Why? Perhaps the sicker patients seek out the best cardiologist, so even if he does a good job, his patients are more likely to die than the other doctor's.

It can therefore be misleading to measure doctor skill solely by looking at patient outcomes. That is generally what doctor "report cards" do and, though the idea has obvious appeal, it can produce some undesirable consequences. A doctor who knows he is being graded on patient outcomes may "cream-skim," turning down the high-risk patients who most need treatment so as to not tarnish his score. Indeed, studies have shown that hospital report cards have actually hurt patients precisely because of this kind of perverse physician incentive.

Measuring doctor skill is also tricky because the impact of a doctor's decisions may not be detectable until long after the patient is treated. When a doctor reads a mammogram, for instance, she can't be sure if there is breast cancer or not. She may find out weeks later, if a biopsy is ordered; or, if she missed a tumor that later kills the patient, she may never find out. Even when a doctor gets a diagnosis just right and forestalls a potentially serious problem, it's hard to make sure the patient follows directions. Did he take the prescribed medication? Did he change his diet and exercise program as directed? Did he stop scarfing down entire bags of pork rinds?

The data culled by Craig Feied's team from the WHC emergency room turn out to be just the thing to answer some questions about doctor skill. For starters, the data set is huge, recording some 620,000 visits by roughly 240,000 different patients over nearly eight years, and the more than 300 doctors who treated them.

It contains everything you might want to know about a given patient (anonymized, of course, for our analysis) from the moment she walks, rolls, or is carried through the ER door until the time she leaves the hospital, alive or otherwise. The data include demographic information; the patient's complaint upon entering the ER; how long it took to see a doctor; how the patient was diagnosed and treated; whether the patient was admitted to the hospital, and the length of stay; whether the patient was later readmitted; the total cost of the treatment; and if or when the patient died. (Even if the patient died two years later outside the hospital, the death would still be included in our analysis as a result of cross-linking the hospital data with the Social Security Death Index.)

The data also show which doctor treated which patients, and we know a good bit about each doctor as well, including age, gender, medical school attended, hospital where residency was served, and years of experience.

When most people think of ERs, they envision a steady stream of gunshot wounds and accident victims. In reality, dramatic incidents like these represent a tiny fraction of ER traffic and, because WHC has a separate Level I trauma center, such cases are especially rare in our ER data. That said, the main emergency room has an extraordinary array of patient complaints, from the life-threatening to the entirely imaginary.

On average, about 160 patients showed up each day. The busiest day is Monday, and weekend days are the slowest. (This is a good clue that many ailments aren't so serious that they can't wait until the weekend's activities are over.) The peak hour is 11:00 A.M., which is five times busier than the slowest hour, which is 5:00 A.M. Six of every ten patients are female; the average age is forty-seven.

The first thing a patient does upon arrival is tell the triage nurse what's wrong. Some complaints are common: "shortness of breath," "chest pains," "dehydration," "flulike symptoms." Others are far less so: "fish bone stuck in throat," "hit over the head with book," and a variety of bites, including a good number of dog bites (about 300) and insect or spider bites (200). Interestingly, there are more human bites (65) than rat bites and cat bites combined (30), including 1 instance of being "bitten by client at work." (Alas, the intake form didn't reveal the nature of this patient's job.)

The vast majority of patients who come to the ER leave alive. Only 1 of every 250 patients dies within a week; 1 percent die within a month, and about 5 percent die within a year. But knowing whether a condition is life-threatening or not isn't always obvious (especially to the patients themselves). Imagine you're an ER doc with eight patients in the waiting room, one each with one of the following eight common complaints. Four of these conditions have a relatively high death rate while the other four are low. Can you tell which ones are which?

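Here's the answer, based on the likelihood of a patient dying within twelve months:*

High risk: clot, fever, infection, shortness of breath

Low risk: chest pains, dizziness, numbness, psychiatric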

Shortness of breath is by far the most common high-risk condition. (It is usually notated as "SOB," so if someday you see that abbreviation on your chart, don't think the doctor hates you.) To many patients, SOB might seem less scary than something like chest pains. But here's what the data say:

So a patient with chest pains is no more likely than the average ER patient to die within a year, whereas shortness of breath more than doubles the death risk. Similarly, roughly 1 in 10 patients who show up with a clot, a fever, or an infection will be dead within a year; but if a patient is dizzy, is numb, or has a psychiatric condition, the risk of dying is only one-third as high.

With all this in mind, let's get back to the question at hand: given all these data, how do we measure the efficacy of each doctor?

The most obvious course would be to simply look at the raw data for differences in patient outcomes across doctors. Indeed, this method would show radical differences among doctors. If these results were trustworthy, there would be few factors in your life as important as the identity of the doctor who happens to draw your case when you show up at the ER.

But for the same reasons you shouldn't put much faith in doctor report cards, a comparison like this is highly deceptive. Two doctors in the same ER are likely to treat very different pools of patients. The average patient at noon, for instance, is about ten years older than one who comes in the middle of the night. Even two doctors working the same shift might see very different patients, based on their skills and interests. It is the triage nurse's job to match patients and doctors as best as possible. One doc may therefore get all the psychiatric cases on a shift, or all the elderly patients. Because an old person with shortness of breath is much more likely to die than a thirty-year-old with the same condition, we have to be careful not to penalize the doctor who happens to be good with old people.

What you'd really like to do is run a randomized, controlled trial so that when patients arrive they are randomly assigned to a doctor, even if that doctor is overwhelmed with other patients or not well equipped to handle a particular ailment.

But we are dealing with one set of real, live human beings who are trying to keep another set of real, live human beings from dying, so this kind of experiment isn't going to happen, and for good reason.

Since we can't do a true randomization, and if simply looking at patient outcomes in the raw data will be misleading, what's the best way to measure doctor skill?

Thanks to the nature of the emergency room, there is another sort of de facto, accidental randomization that can lead us to the truth. The key is that patients generally have no idea which doctors will be working when they arrive at the ER. Therefore, the patients who show up between 2:00 and 3:00 P.M. on one Thursday in October are, on average, likely to be similar to the patients who show up the following Thursday, or the Thursday after that. But the doctors working on those three Thursdays will probably be different. So if the patients who came on the first Thursday have worse outcomes than the patients who came on the second or third Thursday, one likely explanation is that the doctors on that shift weren't as good. (In this ER, there were usually two or three doctors per shift.)

There could be other explanations, of course, like bad luck or bad weather or an E. coli outbreak. But if you look at a particular doctor's record across hundreds of shifts and see that the patients on those shifts have worse outcomes than is typical, you have a pretty strong indication that the doctor is at the root of the problem.

One last note on methodology: while we exploit information about which doctors are working on a shift, we don't factor in which doctor actually treats a particular patient. Why? Because we know that the triage nurse's job is to match patients with doctors, which makes the selection far from random. It might seem counterintuitive, even wasteful, to ignore the specific doctor-patient match in our analysis. But in scenarios where selection is a problem, the only way to get a true answer is, paradoxically, to throw away what at first seems to be valuable information.
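
To make the shift-based idea concrete, here is a minimal sketch in Python. It is not the authors' actual analysis: the three shifts, the doctors "A", "B", and "C", and every outcome in the toy data are invented for illustration, and the field names (`shift`, `treated_by`, `died`) are assumptions. It compares a naive death rate, which counts only the patients each doctor personally treated, with a shift-level rate, which counts every patient who arrived while that doctor was on duty and ignores who treated whom.

```python
from collections import defaultdict

# Hypothetical roster: which doctors were on duty for each shift.
roster = {
    "thu1": ["A", "B"],
    "thu2": ["B", "C"],
    "thu3": ["A", "C"],
}

# Hypothetical visits. In this toy data the triage nurse steers the
# sickest patients to doctor A, so A's own patients die more often.
visits = [
    {"shift": "thu1", "treated_by": "A", "died": 1},
    {"shift": "thu1", "treated_by": "A", "died": 0},
    {"shift": "thu1", "treated_by": "B", "died": 0},
    {"shift": "thu1", "treated_by": "B", "died": 0},
    {"shift": "thu2", "treated_by": "B", "died": 0},
    {"shift": "thu2", "treated_by": "B", "died": 1},
    {"shift": "thu2", "treated_by": "C", "died": 1},
    {"shift": "thu2", "treated_by": "C", "died": 0},
    {"shift": "thu3", "treated_by": "A", "died": 1},
    {"shift": "thu3", "treated_by": "A", "died": 0},
    {"shift": "thu3", "treated_by": "C", "died": 0},
    {"shift": "thu3", "treated_by": "C", "died": 0},
]


def naive_rates(visits):
    """Death rate per doctor, counting only patients that doctor treated."""
    deaths, total = defaultdict(int), defaultdict(int)
    for v in visits:
        total[v["treated_by"]] += 1
        deaths[v["treated_by"]] += v["died"]
    return {d: deaths[d] / total[d] for d in total}


def shift_rates(visits, roster):
    """Death rate per doctor over all patients who arrived on that
    doctor's shifts, deliberately ignoring the doctor-patient match."""
    deaths, total = defaultdict(int), defaultdict(int)
    for v in visits:
        for doctor in roster[v["shift"]]:
            total[doctor] += 1
            deaths[doctor] += v["died"]
    return {d: deaths[d] / total[d] for d in total}


naive = naive_rates(visits)
by_shift = shift_rates(visits, roster)
print("naive:", naive)
print("by shift:", by_shift)
```

In this invented example the naive comparison makes doctor A look worst, while the shift-level comparison shows that shifts with A on duty have the fewest deaths. With only three shifts the difference could easily be luck; as the text notes, the signal becomes credible only across hundreds of shifts.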

So, applying this approach to Craig Feied's massively informative data set, what can we learn about doctor skill?

Or, put another way: if you land in an emergency room with a serious condition, how much does your survival depend on the particular doctor you draw?

The short answer is...not all that much. Most of what looks like doctor skill in the raw data is in fact the luck of the draw, the result of some doctors getting more patients with less-threatening ailments.