The Sheer Size Of Healthcare.gov, Ctd

A few experts weigh in regarding the amount of code that’s been written for the ACA site:

Over many years of research, programmer productivity in lines of code has been observed to range from 3,200 lines per year for small projects, down to just 1,600 lines per year for very large projects. Using the typical numbers for large projects, 500 million lines of code would require 312,500 man-years of programming effort. If true, that would involve the participation of just about all programmers in the US for a full year, and at an average $100K in salary and benefits, an investment of an amount approaching the entire defense budget!
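
The reader's estimate is one division followed by one multiplication. Here is a minimal Python sketch that reproduces it, using only the figures quoted above: 1,600 lines per programmer-year for very large projects and $100K per person-year in salary and benefits.

```python
# Figures taken from the reader's email above.
CLAIMED_LINES = 500_000_000      # the disputed "500 million lines" figure
LINES_PER_YEAR_LARGE = 1_600     # observed productivity on very large projects
COST_PER_PERSON_YEAR = 100_000   # average salary plus benefits, in dollars


def effort_estimate(lines: int, lines_per_year: int, cost_per_year: int) -> tuple[float, float]:
    """Return (person-years, total cost in dollars) needed to write `lines` of code."""
    person_years = lines / lines_per_year
    return person_years, person_years * cost_per_year


if __name__ == "__main__":
    years, cost = effort_estimate(CLAIMED_LINES, LINES_PER_YEAR_LARGE, COST_PER_PERSON_YEAR)
    print(f"{years:,.0f} person-years, about ${cost / 1e9:,.2f} billion")
    # prints: 312,500 person-years, about $31.25 billion
```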

Besides, regarding your notion of “sheer scale”: you presumably understand the ACA pretty well by now. What exactly would account for that scale in the exchange website? It should be intuitively clear that Healthcare.gov is far less complex than the entirety of the code and utilities associated with Windows Server, for example, which public sources say involves about 50 million lines of code.

The story is different, but in some ways even more embarrassing: the Healthcare.gov website is a project of moderate scale and complexity. If it really were such a monster, the failure could be excused. Given the modest scale of the actual project, though, screwing it up this badly is inexcusable!

Another:

Just about every developer who reads you (and I’m guessing it’s a lot) is hitting his head against his keyboard right now. That “500 million” number is pure bullshit from someone who doesn’t understand our craft.

First off, judging any piece of software by lines of code written is scoffed at in our profession for many reasons, not least because it’s impossible to agree on what really counts as “a line of code”. But let’s play along: the old rule of thumb says the average programmer contributes only about 10 lines per day in the long run (programming is so much more than typing code). That rule is old, though, and our tools have gotten much better, so I’d argue many programmers these days can bang out several dozen lines of code a day. I’m willing to be more generous still and allow the Healthcare.gov programmers some 100 lines of code per developer per day.

I have no idea how many programmers were employed at any one time, but I’d eat my hat if it even approached 100 developers simultaneously hammering out code (a few dozen is more likely). Still, let’s be super crazy and say it’s 1,000 programmers, each writing 100 lines of code every single day. It would take this impossible egghead army 5,000 days to author half a billion lines of code. That’s nearly 14 years, weekends included, of the largest software team this world has ever seen writing code much faster than is generally accepted as possible.
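
The reader’s back-of-the-envelope numbers can be checked the same way. A minimal sketch using his deliberately generous assumptions: 1,000 programmers, 100 lines each per day, every day of the year including weekends.

```python
CLAIMED_LINES = 500_000_000   # the disputed total
PROGRAMMERS = 1_000           # the reader's "super crazy" team size
LINES_PER_DAY = 100           # per programmer, far above the old ~10/day rule of thumb
DAYS_PER_YEAR = 365           # weekends included, as in the email


def days_to_write(total_lines: int, team_size: int, lines_per_day: int) -> float:
    """Calendar days needed if the whole team writes code every single day."""
    return total_lines / (team_size * lines_per_day)


if __name__ == "__main__":
    days = days_to_write(CLAIMED_LINES, PROGRAMMERS, LINES_PER_DAY)
    print(f"{days:,.0f} days, or {days / DAYS_PER_YEAR:.1f} years")
    # prints: 5,000 days, or 13.7 years
```

Plugging in the old rule of thumb instead (`LINES_PER_DAY = 10`) stretches the same team’s effort to 50,000 days, roughly 137 years.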

How many lines of code are in Healthcare.gov? Who knows. But to some dipshit who couldn’t write “Hello World” in Java, I suppose 500,000,000 is a big enough number to pull from your ass. That’s the only way that number came about. For my money, the real story here is just how shrill and ridiculous the debate over Obamacare has become.

Another:

I am a software engineer who has worked on both sides of the commercial/government contractor fence. I’d like to stress that I speak for myself, not for my employer.

Source lines of code (SLOC) is a controversial and convoluted metric for evaluating software complexity. The Wikipedia entry gives a good overview of how hard it is to get non-coders on the same page about the size and scope of a project. Given that the stated number is five hundred million lines, I find it unlikely that a qualified engineer did a manual analysis. The quickest and easiest method, not to mention the one that produces the most jaw-dropping numbers, is to run every source file through a line-counting program (wc -l).
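
To see why a raw wc -l tally overstates the size of a project, here is a minimal Python sketch that counts a piece of source text two ways: every physical line (roughly what wc -l reports for a file ending in a newline) and a rough SLOC figure that skips blank lines and whole-line comments. It assumes comments start with `#` or `//` and ignores block comments, docstrings and trailing comments, so it is a simplification rather than a real SLOC tool; the sample source and the function name `count_lines` are made up for illustration.

```python
def count_lines(source: str, comment_prefixes: tuple[str, ...] = ("#", "//")) -> tuple[int, int]:
    """Return (physical_lines, rough_sloc) for a chunk of source text.

    physical_lines counts every line, roughly what `wc -l` reports.
    rough_sloc skips blank lines and lines that are entirely a line comment.
    Block comments, docstrings and trailing comments are NOT handled.
    """
    lines = source.splitlines()
    sloc = 0
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue  # blank line
        if stripped.startswith(comment_prefixes):
            continue  # whole-line comment
        sloc += 1
    return len(lines), sloc


# Hypothetical sample file: a header comment, a blank line and a tiny function.
SAMPLE = """\
// Copyright notice (hypothetical)
// Author: J. Doe (hypothetical)

function isEligible(income, householdSize) {
    // compare against a threshold
    return income < threshold(householdSize);
}
"""

if __name__ == "__main__":
    physical, sloc = count_lines(SAMPLE)
    print(physical, sloc)  # prints: 7 3
```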

In addition, government specifications and rules inflate the count artificially and create hurdles that the commercial projects Healthcare.gov is being compared to would not face.

The rules can restrict which third-party tools may be used to make development more efficient. Freely available code libraries such as NumPy, matplotlib, or jQuery let those familiar with them perform complicated tasks and analysis in a handful of lines. However, unless those tools and libraries have gone through an approval process, the developers working on these projects must re-implement solutions to problems that have been solved countless times over, and so run into the same pitfalls that existing tools ran into and fixed long ago.
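
As a small illustration of the reuse argument, here is a Python sketch comparing a library call with a hand-rolled equivalent. It uses the standard-library `statistics` module as a stand-in for NumPy so the example runs without third-party packages; `sample_stdev_by_hand` is a hypothetical name for the kind of code a team would have to write, test and maintain itself if no approved library were available.

```python
import math
import statistics


def sample_stdev_by_hand(values: list[float]) -> float:
    """Re-implemented sample standard deviation (n - 1 denominator).

    Even this tiny function has to deal with pitfalls a mature library
    already handles, such as too-short input and floating-point accuracy.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared_deviations / (n - 1))


if __name__ == "__main__":
    data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
    # With an approved library, the whole job is one call:
    print(statistics.stdev(data))
    # Without it, the team writes and maintains its own version:
    print(sample_stdev_by_hand(data))
```

The hand-written version is short here, but every such re-implementation is one more piece of code to review, test and count.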

Also, the strict coding guidelines adopted by some projects make individual files seem bigger than their substance warrants. This can be caused by variable or function naming conventions (citizen.numChildren() versus current_user.get_number_of_children()) and allowed/forbidden acronym lists (IRS.query() versus InternalRevenueService.query()), paired with outdated rules on the number of characters allowed per line (Eighty? One hundred and sixty?). Required copyright, code ownership, authorship, and clearance notices, plus version change lists prepended to each file, can easily add over a hundred lines of non-functional text to a file. A software engineer would simply collapse (ignore) that text, but the uninitiated or politically motivated would count it toward the total scope of the project.
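
A rough sketch of how such conventions alone can inflate a line count. The header fields, the 100-entry change log and the file count are hypothetical, chosen only to mirror the kinds of notices described above; Python's `textwrap` stands in for a formatter that enforces a maximum line width on comments.

```python
import textwrap

# Hypothetical mandated file header, modeled on the notices described above.
HEADER_FIELDS = [
    "Copyright (c) <year> <contractor>. All rights reserved.",
    "Code owner: <team>",
    "Author: <name>",
    "Clearance: <notice text>",
]
CHANGE_LOG_ENTRIES = 100  # hypothetical: one line per recorded version change

# Hypothetical doc comment, repeated so it is long enough to wrap several times.
SAMPLE_COMMENT = " ".join(
    ["Returns the number of dependent children recorded for the current user,"
     " as reported on the most recently verified household composition form."] * 3
)


def header_lines() -> int:
    """Non-functional lines prepended to every file by the header and change log."""
    return len(HEADER_FIELDS) + CHANGE_LOG_ENTRIES


def wrapped_comment_lines(text: str, width: int) -> int:
    """Lines a '# '-prefixed comment occupies once wrapped to `width` columns."""
    wrapped = textwrap.wrap(text, width=width, initial_indent="# ", subsequent_indent="# ")
    return len(wrapped)


if __name__ == "__main__":
    files = 10_000  # hypothetical number of source files in the project
    print("header lines across the project:", header_lines() * files)
    for width in (80, 160):
        print(f"comment wrapped at {width} columns: {wrapped_comment_lines(SAMPLE_COMMENT, width)} lines")
```

None of these lines changes what the software does, yet a plain line count includes every one of them.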

Developers tend to be split on whether these seemingly superficial limitations are necessary. No one likes wasting time clearing style-checker warnings about a line of code being 81 characters long, or about needing extra white space between two function parameters. However, being handed an unreadable mess with variables named as though during a national vowel shortage is devastating to developer morale. With government projects passing between multiple subcontractors and development groups, code readability matters.
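
The 81-character complaint is exactly the kind of check a style linter performs. A minimal sketch of such a check follows; the 80-column limit, the message format and the function name `line_length_warnings` are assumptions loosely modeled on common linters, not on any specific government rule.

```python
MAX_LINE_LENGTH = 80  # assumed project limit


def line_length_warnings(source: str, limit: int = MAX_LINE_LENGTH) -> list[str]:
    """Return one warning per line longer than `limit` characters."""
    warnings = []
    for number, line in enumerate(source.splitlines(), start=1):
        if len(line) > limit:
            warnings.append(f"line {number}: {len(line)} characters (limit {limit})")
    return warnings


if __name__ == "__main__":
    # The second line is 4 + 77 = 81 characters: one over the limit.
    code = "x = 1\n" + "y = " + "1" * 77 + "\n"
    for warning in line_length_warnings(code):
        print(warning)  # prints: line 2: 81 characters (limit 80)
```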

Software engineering has few metrics that are both meaningful and fit in an easily understood sound bite. What is coming out now is not helpful. What would be helpful? Release the requirements and the source code. Once they are sanitized of sensitive and personally identifiable information, an army of interested software engineers from both sides would be able to evaluate the project, provide real analysis, and start working on ways to make sure taxpayer funds are used more effectively.