Easy Problems, Hard Problems and Healthcare.gov

by

a screenshot from healthcare.gov

This Reuters article about Healthcare.gov has been getting some attention today. Alas, it’s not very good, focusing on client-side optimizations that are probably unrelated to the site’s early woes. Healthcare.gov’s problems are almost certainly occurring at a deeper level of the system, making it very difficult, if not impossible, for an outsider to gauge their seriousness.

To explain, let’s do one of those analogy things. Say that Kathleen is planning a birthday party for herself.

There are a bunch of tasks associated with the party that need to be done. For instance, guests have to be told where and when the party is and whether to bring gifts. This is a pretty easy task to manage: Kathleen prints up a bunch of flyers with the relevant information and asks some friends to hand them out.

This task can be done well or poorly, of course. Maybe she foolishly printed the different bits of information on different pieces of paper instead of a single flyer. Maybe she only asked one friend to hand them out and he’s a flake. These could become real issues if more people want to come to the party than Kathleen anticipated.

But these are easy problems to solve. Printing more flyers is simple. You can hire people to hand out the flyers if your friends aren’t reliable. There’s no real need for these distributors to coordinate or have knowledge of one another.

Some tasks require Kathleen herself, though. Receiving happy birthday wishes, for instance: there could be a huge number of guests, but there’s only one Kathleen. If she doesn’t plan for this properly, it’s conceivable that she’ll be busy receiving congratulations nonstop, unable to enjoy the party. Perhaps her guests will have to waste their time queued up waiting for her, too.

Many web application optimization problems can be categorized similarly. Some processes can be run in parallel, without central coordination. These processes might be implemented wastefully or unprofessionally, but you can usually fix them by throwing more resources at the problem. Cloud hosting architectures often make this trivially easy.

Other problems require coordination or centralization. This produces bottlenecks, and they can be quite severe. You can respond by rewriting, redesigning, tuning or, yes, throwing more resources at the affected systems. Sometimes this works and sometimes it doesn’t; it requires time and expertise, though, not just a credit card and an Amazon account. Sometimes your only real option is to design around these problems: queue the expensive tasks for later execution, or accept a loss of synchronization across your system. You can decide to make do with a birthday card instead of a birthday hug, in other words.

The Reuters article spends a lot of time on how many static resources are loaded into the browser by Healthcare.gov. Sometimes there are good reasons for loading a bunch of that kind of stuff and sometimes there aren’t. The fact that there’s usually room for improvement–as any web optimization tool will tell you–means that it’s pretty simple to construct a critique of virtually any site. That doesn’t guarantee that the critique will have anything to do with the site’s most serious problems, however.

Besides, this class of problems usually exhibits itself through symptoms that are different that the ones afflicting Healthcare.gov. And many of the Healthcare.gov assets in question are served through the Akamai Content Delivery Network, which is probably the best-known brand name when it comes to making sure your servers can handle gigantic quantities of static asset requests.

It is much more likely that Healthcare.gov’s problems are due to more expensive operations related to the insurance application process itself. Checking users’ eligibility and filing their applications requires integration with a separate and more complex set of systems–ones that have little to do with your web browser. Fixing those sorts of bottlenecks can be easy or difficult; the boring truth is that it’s hard to say definitively from outside the system. Much harder than carping about uncompressed Javascript, at any rate.

Parts of Healthcare.gov are down right now, presumably so that technical fixes can be made. Hopefully they improve the system’s throughput. Traffic is likely to even out after the initial crush, which should also help. Before long, I suspect that the site will work just fine.

It’s unfortunate that Healthcare.gov hasn’t made a great first impression. But it still has time to get things right. Once it does, there’ll be lessons to be learned. But they’re probably not going to be ones you can generate automatically from a browser plugin.