From a lecture by Professor John Ousterhout.
The greatest performance improvement of all is when a system goes from not-working to working
Programmers tend to worry too much and too soon about performance. Many college-level Computer Science classes focus on fancy algorithms to improve performance, but in real life performance rarely matters. Most real-world programs run plenty fast enough on today’s machines without any particular attention to performance. The real challenges are getting programs completed quickly, ensuring their quality, and managing the complexity of large applications. Thus the primary design criterion for software should be simplicity, not speed.
Occasionally there will be parts of a program where performance matters, but you probably won’t be able to predict where the performance issues will occur. If you try to optimize the performance of an application during the initial construction you will add complexity that will impact the timely delivery and quality of the application and probably won’t help performance at all; in fact, it could actually reduce the performance (“faster” algorithms often have larger constant factors, meaning they are slower at small scale and only become more efficient at large scale). I’ve found that in most situations the simplest code is also the fastest. So, don’t worry about performance until the application is running; if it isn’t fast enough, then go in and carefully measure to figure out where the performance bottlenecks are (they are likely to be in places you wouldn’t have guessed). Tune only the places where you have measured that there is an issue.
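As a minimal sketch of "measure before you tune", the Python example below uses the standard-library cProfile and pstats modules to show where time actually goes. The functions build_report, load_rows, and dedupe are hypothetical stand-ins for application code, invented here for illustration.

```python
import cProfile
import io
import pstats


def load_rows(n):
    # Hypothetical stage 1: produce n rows with many duplicates.
    return [i % 50 for i in range(n)]


def dedupe(rows):
    # Hypothetical stage 2: membership tests on a list scan the whole list,
    # so this stage does far more work per row than it appears to.
    seen = []
    for row in rows:
        if row not in seen:
            seen.append(row)
    return seen


def build_report(n):
    # Hypothetical top-level operation that a user reports as "slow".
    return dedupe(load_rows(n))


def profile(func, *args):
    """Run func under cProfile; return (result, text report sorted by cumulative time)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    out = io.StringIO()
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
    return result, out.getvalue()


result, report = profile(build_report, 20000)
print(report)
```

The report lists each function with its call count and cumulative time, so the decision about what to tune rests on a measurement rather than a guess.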
Use your intuition to ask questions, not to answer them
Intuition is a wonderful thing. Once you have acquired knowledge and experience in an area, you start getting gut-level feelings about the right way to handle certain situations or problems, and these intuitions can save large amounts of time and effort. However, it’s easy to become overconfident and assume that your intuition is infallible, and this can lead to mistakes. So, I try to treat intuition as a hypothesis to be verified, not an edict to be followed blindly.
For example, intuition works great when tracking down bugs; if I get a sense for where I think the problem is I can quickly go to the code and verify whether this really is the problem. For more abstract tasks such as design I find that intuition can also be valuable (I get a vague sense that a particular approach is good or bad), but the intuition needs to be followed up with a lot of additional analysis to expose all the underlying factors and verify whether the intuition was correct. The intuition helps me to focus my analysis, but it doesn’t eliminate the need for analysis.
One area where people frequently misuse their intuition is performance analysis. Developers often jump to conclusions about the source of a performance problem and run off to make changes without making measurements to be sure that the intuition is correct (“Of course it’s the xyz that is slow”). More often than not they are wrong, and the change ends up making the system more complicated without fixing the problem.
Another reason for constantly challenging and validating your intuitions is that over time this will sharpen your intuitions so that they work even better for you. Ironically, people who are most dogmatic about their intuitions often seem to have the least well-developed intuitions. If they would challenge their intuitions more, they would find that their intuitions become more accurate.
Facts precede concepts
A fact is a piece of information that can be observed or measured; a concept is a general rule that can be used to predict many facts or a solution to many problems. Concepts are powerful and valuable, and acquiring them is the goal of most learning processes. However, before you can appreciate or develop a concept you need to observe a large number of facts related to the concept. This has implications both for teaching and for working in unfamiliar areas.
In teaching it’s crucial to give lots of examples when introducing a new concept; otherwise the concept won’t make sense to the students. Edward Tufte describes this process as “general-specific-general”: start by explaining the concept, then give several specific examples to show where the concept does and does not apply, then reiterate the concept by showing how all the examples are related.
I also apply this principle when I’m working in a new area and trying to derive the underlying concepts for that area. Initially my goal is just to get experience (facts). Once I have a collection of facts to work from, then I start looking for patterns or themes; eventually these lead to concepts. For example, a few years ago I started working on my first large Web application. My goal was to develop a library of reusable classes on which to base the application, but being new to Web development I had no idea what those classes should be. So, I built the first simple version of the application without any shared code, creating each page separately. Once I had developed a dozen pages I was able to identify areas of functionality that were repeated over and over in different pages, and from this I was able to develop a set of classes that implemented the shared functionality. These classes represented the key concepts of that particular application.
If you don’t know what the problem was, you haven’t fixed it
Here’s a scenario that I have seen over and over:
- A developer is tracking down a difficult problem, often one that is not completely reproducible.
- In a status meeting the developer announces that the problem has been fixed.
- I ask, “What was the cause of the problem?”
- The developer responds “I’m not really sure what the problem was, but I changed xyz and the problem went away.”
Nine times out of ten this approach doesn’t really fix the problem; it just submerges it (for example, the system timing might have changed so that the problem doesn’t happen as frequently). In a few weeks or months the problem will reappear. Don’t ever assume that a problem has been fixed until you can identify the exact lines of code that caused it and convince yourself that the particular code really explains the behavior you have seen. Ideally you should create a test case that reliably reproduces the problem, make your fix, and then use that test case to verify that the problem is gone.
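The "reproduce, fix, verify" loop described above can be sketched in Python as follows. The median functions and the specific bug are hypothetical, chosen only because the bug appears for some inputs and not others, which makes it look intermittent in practice.

```python
def median_buggy(values):
    """Hypothetical original code: wrong whenever the input length is even."""
    ordered = sorted(values)
    return ordered[len(ordered) // 2]


def median_fixed(values):
    """Hypothetical fix: average the two middle elements for even-length input."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def reproduces_problem(median):
    """The reproducing case: True if this implementation still shows the bug."""
    return median([1, 2, 3, 4]) != 2.5


# Step 1: confirm the test case reliably triggers the problem in the old code.
print("buggy code shows the problem:", reproduces_problem(median_buggy))
# Step 2: confirm the same test case no longer triggers it after the fix.
print("fixed code shows the problem:", reproduces_problem(median_fixed))
```

The key point is that the same test case is run against both versions: it has to fail before the fix for its passing afterwards to mean anything.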
If you do end up in a situation where you make a change and the problem mysteriously goes away, don’t stop there. Undo the change and see if the problem recurs. If it doesn’t, then the change is probably unrelated to the problem. If undoing the change causes the problem to recur, then figure out why. For example, try reducing the scope of the change to find the smallest possible modification that causes the problem to come and go. If this doesn’t identify the source of the problem, add additional tracing to the system and compare the “before” and “after” traces to see how the change affected the behavior of the system. In my experience, once I have a code change that makes a problem come and go I can always find the source of the problem fairly quickly.
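One way to reduce the scope of a change, as suggested above, is to drop its pieces one at a time and re-check whether the problem still goes away. The sketch below is a simple greedy reduction, assuming the change can be split into independent pieces ("hunks") and that there is a reliable check; the names minimize_change and problem_goes_away are hypothetical.

```python
def minimize_change(hunks, problem_goes_away):
    """Greedily drop hunks as long as the remaining ones still make the problem go away.

    hunks: list of independent pieces of the original change.
    problem_goes_away: callable taking a list of applied hunks and returning
        True if the problem does not occur with exactly those hunks applied.
    """
    kept = list(hunks)
    for hunk in list(hunks):
        trial = [h for h in kept if h != hunk]
        if problem_goes_away(trial):
            # This hunk is not needed to make the problem disappear.
            kept = trial
    return kept


# Hypothetical example: only the added lock actually matters.
change = ["rename variable", "add lock", "reorder init", "bump timeout"]
essential = minimize_change(change, lambda applied: "add lock" in applied)
print(essential)
```

In a real system the check would rebuild and rerun the reproducing test for each trial, so this only pays off once the problem can be triggered on demand.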
If it hasn’t been used, it doesn’t work
This is one of the biggest frustrations of software development. You design and implement a new feature or application, you test it carefully, and you think you are done. Unfortunately you aren’t. No matter how carefully you have tested, there will be problems as soon as QA gets their hands on it or someone tries to use the feature or application for real work. Either there will be bugs that you missed, or some of the features will be clumsy, or additional features will be needed. Sometimes the entire architecture turns out to be wrong. Unfortunately, the problems come out at a time when you are ready to move on to the next thing (or perhaps you already have moved on), so it’s frustrating to go back and spend more time on a project that you thought was finished. And, of course, you didn’t budget time for this so the cleanup work causes delays in your next project.
I don’t know any solution to this problem except to realize its inevitability and plan for it. My rule of thumb is that when you think you are finished with a software project (coded, tested, and documented, and ready for QA or production use) you are really only 50-75% done. In other words, if you spent 3 months in initial construction, plan on spending another 4-8 weeks in follow-up work. One way to minimize this problem is to get your new software in use as soon as possible. If you can create a skeletal version that is still useful, get people trying it out so you can find out about problems before you think you’re finished. This is one of the ideas behind Agile Development.
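The arithmetic behind the rule of thumb above can be sketched as follows, treating 3 months as roughly 13 weeks (an assumption made here for the calculation). The function name follow_up_weeks is hypothetical.

```python
def follow_up_weeks(initial_weeks, fraction_done):
    """Weeks of follow-up implied if initial_weeks covered fraction_done of the total work."""
    total_weeks = initial_weeks / fraction_done
    return total_weeks - initial_weeks


for fraction in (0.75, 0.60, 0.50):
    extra = follow_up_weeks(13, fraction)
    print(f"{fraction:.0%} done after 13 weeks: about {extra:.1f} more weeks")
```

Under these assumptions, the 4-8 week figure corresponds to being roughly 60-75% done; at exactly 50% done, the follow-up work would take as long as the initial construction.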
Sometimes people just refuse to do the follow-up work: “It’s not my highest priority” or “I will get to it when I have time”. If you take this approach you’ll produce mediocre software. No software is ever right the first time. The only way to produce high-quality software is to keep improving and improving it. There are two kinds of software in the world: software that starts out crappy and eventually becomes great, and software that starts out crappy and stays that way.
The only thing worse than a problem that happens all the time is a problem that doesn’t happen all the time
Not much to say about this one: it’s painful to debug a problem that isn’t reproducible. I have spent as long as 6 months tracking down a single nondeterministic bug. Conversely, in my experience any problem that can be easily reproduced can also be tracked down pretty quickly.
The three most powerful words for building credibility are “I don’t know”
Many people worry that not knowing something is a sign of weakness, and that if a leader seems not to have all the answers they will lose the confidence of their team. Such people try to pretend they have the answer in every situation, making things up if necessary and never admitting mistakes.
However, this approach ultimately backfires. Sooner or later people learn the truth and figure out that the person never admits when they don’t know. When this happens the person loses all credibility: no one can tell whether the person is speaking from authority or making something up, so it isn’t safe to trust anything they say.
On the other hand, if you admit that you don’t know the answer, or that you made a mistake, you build credibility. People are more likely to trust you when you say that you do have the answer, because they have seen that you don’t make things up.
Coherent systems are inherently unstable
A coherent system is one where everything is the same in some respect; the more things that are uniform or shared, the more coherent the system is. For example, a typical cornfield in Iowa is highly coherent: every corn stalk is from the same strain; they’re all planted at the same time, fertilized at the same time, and harvested at the same time. The world of computing is also fairly coherent: most of the world’s computers run one of a few versions of Windows, and almost any computer in the world can be reached using the TCP/IP protocols. Human-engineered systems tend to be coherent.
Natural systems tend not to be coherent. For example, consider the ecosystem of a wetland: there are numerous different species of plant and animal sharing the same area, but behaving very differently with complex interrelationships. The behavior of the overall system is hard to predict from the behavior of any individual in it.
Coherent systems often have advantages of efficiency, which is why humans gravitate towards them. For example, it’s easier to plant the same seed everywhere in a cornfield, and given that some seeds are better than others, it’s more efficient to use the best seed everywhere. It’s also easier to harvest if all of the corn ripens at the same time. It’s more efficient to have a single operating system running most of the world’s computers: once a new facility is implemented for that system, everyone in the world can benefit from it. If there were dozens of different operating systems, then new applications would have to be reimplemented for each of them.
Unfortunately, coherent systems are unstable: if a problem arises it can wipe out the whole system very quickly. For example, a new plant disease could quickly take out a large fraction of U.S. grain production. Computer viruses are another example: a virus that takes advantage of a bug in Windows can potentially impact most of the world’s computers. The U.S. stock market exhibits a certain degree of coherency in the way people think and trade, which results in huge swings up and down as investors move en masse to buy the latest fad or sell when a recession looms.
The incoherence of natural systems gives them greater stability. For example, a particular plant disease could probably only affect a small fraction of the species in a wetland.