
Tending the Software Garden

As a recent enterprise software project reached maturity, I was asked the age-old question:
"The software is complete and running well in production. Why do we still need developers?"...

I gave the usual answers one gives when the voice of direct experience is unequivocal: "for maintenance", "to fix bugs", "to keep it running". A couple of decades in the industry have long since convinced me that projects that lose their developers usually die an unpleasant death within the year.
But my business questioners were very persistent. Why are we still paying these development people? After all, the business folks pointed out repeatedly, the software was scalable, performing well, and feature complete; new work was, at least for the near term, not on the table. Why keep expensive developers around? Let them go, and bring in new ones if and when new features are needed or problems manifest.
Eventually I gave in and admitted this was a “good question”. I’ve been working on it seriously for about three months. And as with most good questions, I’m finding the answer in a careful examination of the unstated "story": the framing and assumptions about reality that lie behind the question and determine its form and, to some degree, its answer.
Once you buy into a "story" about a problem domain, with or without conscious thought, conclusions tend to tumble out like puppies on a warm day. And if the story is flawed, incomplete or a poor fit for the facts, seemingly natural conclusions will be very far from reality.
In the case of software and developers the story that I reverse engineered from the conversations goes like this:
"Once upon a time people made hardware. They noticed that the hardware was, well, hard. If built well it lasted a long time, ran without difficulty, and required specific and easily predictable operations to keep it working: maintenance. These were tasks like oiling the joints, replacing wear points on a predictable schedule, and so on. The maintenance operations were fundamentally different from hardware design and construction, and were done by different people with different and less expensive skill sets.
Software's like hardware. You build it, and if you did a good job it should run forever, or at least for a very long time. Absent actual flaws, software will require minor periodic operations analogous to oiling or replacing wear points, such as clearing log space and defragmentation, but that's about all that is needed. The people who do these operations need few or none of the skills of the people who designed and built the software."
I'm not saying that the "software as hardware" story is entirely false; there are certainly aspects of software that approximate hardware behavior, and the older the software and the lower the level of abstraction it is written at, the more the hardware story fits. But for modern application software the story chafes in a number of respects, and the automatic conclusions contained within it, like "Why do we still need developers?", are seriously misleading.
What's Wrong with the Hardware Story?
Conceptual categories shape thought. To the extent that we think about software as similar to hardware we’ll expect it to exhibit similar characteristics. But hardware and modern application software differ in some very interesting respects.
Software Cannot be Adequately Described in Words
In order to maintain either hardware or software, project members have to understand both correct and anomalous behavior and operation. Documentation fills this need, describing how something is made, how it can be fixed when broken, and how it operates when working well. But to thoroughly describe and document something, you need a descriptive language that is finer-grained and more complex than the thing you are describing. Otherwise the thing you are describing won't "fit": the complexity of the documented process will overwhelm the language it is documented in, and important detail and nuance will be lost.
Hardware can usually be fully documented in a human language, because human language is almost always more finely nuanced and complex than the hardware it describes.
Moderately complex application software overwhelms the human languages used to describe it. Semi-human languages like the UML attempt to bridge the gap by capturing software structure and dynamics in a form that both humans and machines can read, but most people who have actually worked with UML will tell you that it is of significantly coarser grain than the software it describes, generates or reverse engineers. Much detail is lost for the sake of high level understanding and human readability.
At best, software documentation takes on the character of the old Zen image of a finger pointing at the moon. Good documentation can get a competent developer started toward understanding the software, but full understanding of software ultimately involves observing its behavior and examining its code. This means the person doing the examining is a developer.
As an aside, I think this containment problem is one reason it is so hard to fund and actually get people to write software documentation. It may be that at some unconscious level the people we ask to write it realize that the expectation of completeness is hopeless.
Of course, you can always make the case that there’s no need to understand fully operational software to maintain it. This assumes near zero defects and a more stable operational environment than reality provides. It also assumes that the understanding of the software can be recovered in a cost-effective way at a future time. I doubt that.
It takes a Team of Developers to Understand a Program
I’m deliberately echoing the well-known African proverb that it takes a village to raise a child, to make a point: certain complex constructs are not only too big for a book of documentation, they are too big for an individual brain.
In the high-functioning software projects I’ve seen, the actual knowledge of the software is not in one person or another’s head; it’s somewhere in the discussions and conversations between the people on the team, each of whom has an important piece of the puzzle. I’m not just saying that each person in a good team specializes in one module or service of a software product; I’m saying that people tend to approach the same service from different points of view, each of which provides a critical point of understanding.
One person may be very pattern-oriented and carry the knowledge of how design patterns are woven into various aspects of the software, and where they were adapted and why. Another may think in terms of state and transition, and a third in terms of events. I suspect there are more ways to approach and understand software than there are words to describe the ways, although a study of this area would be fascinating if our society ever becomes interested in basic research again.
In the case of the project that inspired this essay, I was the architect, and did nearly all of the basic design of what was an only slightly complex three-tier system. I happen to be blessed with a very good memory for technical detail, but even when the project team was in full swing I was forever “doing archaeology on my own thinking”, going back to a time past and reconstructing what I had in mind for a particular decision. Sometimes documentation helped, but mostly the memories or memory prompts came from other members of the team who had either been present when the work was done or had talked with people who had been.
In short, the project team was a village, moving forward in time, with traditions, rituals of memory, cultural transmission and all the hallmarks of a mini-society.
I would maintain that a functioning society of developers is the context in which software is molded and understood; that when the society is strong the software is alive and healthy, and that conversely when the society is disrupted, the context dies and the software withers without a community of minds to support it.
There’s an assumption that knowledge once held is retained, but I find the shelf life of software understanding to be unusually short. I call as witness anyone who has taken a technical class, returned to work without using the knowledge immediately, then tried to access it later. Even when members of the development team are kept in the organization, their utility as members of the society that held the software degrades very rapidly; soon their utility as reconstructors of the society begins to approach the utility of talented but uninitiated strangers to the project. Individuals in a project are not at all replaceable units, but they approach that status as they move off a project and onto other things. Myself included.
As a closing aside, my understanding of a society of developers as the home or substrate of a software project is one reason I find open source software products to be, generally speaking, superior to commercial products. A commercial team is assembled deliberately and can be very good, but in the current economic frame of belief it will rarely be better than it needs to be to do the bare basic job, due to the insistence on maximal cash returns in a minimal time frame. Open source projects that survive tend to attract the best minds, who work out of love and interest, typically do far more than is needed, and make serendipitous discoveries along the way. It’s a pity there’s no living in open source, or I’d be doing it full-time myself. The kids need shoes.

Entropy and Catastrophe Reverse Themselves in Software

If we take entropy to refer to the wear-and-tear aspects of maintenance, and catastrophe to refer to the dramatic breakdowns that come from a sudden change, we see very different patterns with hardware and software.

The article's been interrupted by a new job with all the usual expected and unexpected time demands. Here's the gist of the next section:

  • In hardware the typical pattern is entropy followed by catastrophe.
    • For example, a part wears slowly (more slowly if it is well maintained),
    • But it will eventually need to be replaced, or will fail catastrophically when maintenance efforts are inadequate
  • In software the pattern reverses because of the exponential growth in the number of paths through a software program. Catastrophic failures are likely to come first, and to happen outside of maintenance efforts
    • Due to unanticipated and, I would maintain, unanticipatable uses of the software
    • In short, even if you could predict the things your users will do today that will break your software (which I think you cannot), you certainly cannot predict their well-intentioned but catastrophic actions tomorrow
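To make the "exponential" claim concrete, here is a minimal Python sketch under a deliberately simplified, hypothetical model: a function whose only structure is `n` independent, sequential if/else decisions. The name `count_paths` is illustrative only. The sketch enumerates every combination of branch outcomes, and the count doubles with each added decision (2^n paths). Real programs, with loops, shared state and external inputs, are messier still; the growth trend is the point.

```python
from itertools import product


def count_paths(num_branches: int) -> int:
    """Count distinct execution paths through a hypothetical function made of
    `num_branches` independent, sequential two-way (if/else) decisions."""
    # Each decision independently goes one of two ways; enumerate every combination.
    return sum(1 for _ in product((True, False), repeat=num_branches))


if __name__ == "__main__":
    for n in (1, 10, 20):
        print(n, count_paths(n))  # 2, 1024 and 1048576 paths respectively
```

Twenty decisions already yield over a million distinct paths, and a modest application contains far more than twenty decisions.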

In practice this means that software is prone to unexpected failure when subjected to a use case environment (encompassing the software's users and their needs of whatever nature) that is more varied, and growing more rapidly, than the software's ability to contain its behaviour over the set of use cases.

Of course, if we could do perfectly deterministic testing, we could simply rule out all unanticipated use cases and return a graceful error for them. I've known customers who expect this; in fact, it's a common expectation. To the best of my knowledge this kind of testing does not exist outside the imaginations of test tool marketeers and space shuttle software designers, and even then ...
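A rough back-of-the-envelope sketch shows why exhaustive path testing stays imaginary. It assumes, hypothetically, a program whose paths come only from `n` independent two-way decisions (so 2^n paths) and a generous rate of one billion path tests per second. The function `years_to_exhaust` and both parameters are illustrative assumptions, not a real test-tool metric.

```python
# Hypothetical model: 2**num_branches execution paths, one test per path.
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # Julian year, in seconds


def years_to_exhaust(num_branches: int, tests_per_second: float = 1e9) -> float:
    """Years needed to execute every path once at the assumed test rate."""
    paths = 2 ** num_branches
    return paths / tests_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    for n in (32, 64):
        print(f"{n} decisions: {years_to_exhaust(n):.3g} years")
```

Under these assumptions, 32 decisions take a few seconds, but 64 decisions take roughly 585 years. Adding one more decision doubles the time again.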

Well, back to this another day perhaps. For now I return to the construction of good but imperfect software for good but imperfect humans in a climate that encourages the belief that money can purchase perfection.

Comments

A Morning with Really Good Documentation

I spent this morning with a much-loved, overheating Honda. I spent a fair part of my well-spent youth as a mechanic, and while cars have changed a lot since then, I'm not above saving myself a $100/hour diagnostic.

45 minutes to work through an incredibly good technical manual in PDF, 5 minutes to diagnose, 5 to fix. Modern cars, unlike the VWs I used to work with, have two fuse blocks, one of them by the battery in the engine compartment, to serve critical engine components. It was a fuse. Hah.

I could have known that ... but I didn't. The community of modern mechanics, of which I'm no longer a part, would have known it from the start.

The manual, describing a not-quite modern car in exquisite detail, was about 1400 pages long. It made most software documentation I've seen look pretty deficient. And it assumes a common understanding of what a car is, how it works, and all the similarities that cars have with each other that software does not.

Wonder how long the hardware documentation for the space shuttle is? Or for that matter the software documentation.

So maybe ... just maybe ... it's possible to document software that well. Say, software for a lethal weapon, or for a space ship. Something with a budget beyond the comprehension of ordinary mortals.

But even if documentation that works is cost effective for, maybe, very expensive one-off software (lethal things seem to still get the big investments on Earth) or widely replicated software, it's not cost effective for ordinary software. It's probably not even cost effective for widely used middleware like, say, WebSphere, whose documentation, although voluminous, is never even close to enough.