Automa Blog

The number one rule for dealing with unstable builds

We'll soon have some pretty revolutionary news for you. While we're working hard to make it happen, here is a short post about what we have learned about how best to deal with an unstable build.

Suppose you have a build that passes most of the time but fails with irreproducible errors every so often. Maybe the build accesses an unreliable service, or performs some automated GUI tests that are not as stable as they should be.

The next time the build fails, do not simply run it again. This is worth repeating: Do not simply re-run a broken build in the hope that it will magically succeed. Yes, you are in the middle of something else and really don't have time to deal with this instability now. But if you ignore instabilities like this too often, you start going down a very dangerous route.

Every time the build fails, you are given the unique opportunity to fix one of your build's (or even software's!) instabilities. This is great! You get to experience first-hand how a bug that will most likely affect your users manifests itself. Now is your chance to fix it. If you simply re-run the build, you will most likely lose valuable information, such as log files.

If you make it a habit to simply re-run the build when an instability occurs, it is likely that more and more instabilities will sneak into your program. The failing builds cost you more and more time, and you feel even more like the last thing you have time for is to fix the occasional failing build. A vicious circle.

If you really don't have time to investigate an instability when it occurs, at least make time to save all information such as log files that might give you a clue as to what caused the problem. Once you have completed your pressing, pending task, go back and investigate what caused the build to fail.
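Saving the evidence can be as small as one function call at the end of a failed build. Here is a minimal sketch in Python, assuming your build writes its logs as *.log files somewhere below a single directory. The function name archive_build_logs, the folder layout and the build_id label are all made up for illustration; adapt them to whatever your build server actually produces.

```python
import shutil
import time
from pathlib import Path


def archive_build_logs(log_dir, archive_root, build_id):
    """Copy every *.log file below log_dir into a timestamped archive folder.

    Hypothetical helper: the *.log pattern and the folder naming scheme are
    assumptions, not something your build tool prescribes.
    """
    log_dir = Path(log_dir)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = Path(archive_root) / f"{build_id}-{stamp}"
    target.mkdir(parents=True, exist_ok=True)

    copied = []
    for log_file in sorted(log_dir.rglob("*.log")):
        # Keep the relative path so equally named logs do not overwrite each other.
        destination = target / log_file.relative_to(log_dir)
        destination.parent.mkdir(parents=True, exist_ok=True)
        # copy2 also preserves modification times, which helps when you
        # later try to line up events from different logs.
        shutil.copy2(log_file, destination)
        copied.append(destination)
    return copied
```

Called from the failure branch of your build script, this leaves you a folder per failed build that you can come back to once the pressing task is done. Keep the archive folder outside the log directory so that archived copies are not picked up again on the next run.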

Sometimes when an instability occurs, you don't have enough information to find out what caused it. Do not take this as an excuse to ignore the instability. Add more logging that might let you find out what the cause of the problem is the next time the instability occurs.
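As an illustration of what "more logging" might look like, here is a small sketch using Python's standard logging module. The decorator name log_step and the logger name "build.flaky" are our own inventions for this example. It simply records when a step starts, how long it took, and the full traceback if it fails, which is often exactly the information that is missing after a one-off failure.

```python
import functools
import logging
import time

# Hypothetical logger name; use whatever fits your build's logging setup.
logger = logging.getLogger("build.flaky")


def log_step(func):
    """Log the start, the duration and any exception of a build or test step."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        logger.info("starting %s args=%r kwargs=%r", func.__name__, args, kwargs)
        started = time.monotonic()
        try:
            result = func(*args, **kwargs)
        except Exception:
            # logger.exception logs at ERROR level and attaches the traceback.
            logger.exception(
                "%s failed after %.2fs", func.__name__, time.monotonic() - started
            )
            raise  # Re-raise so the build still fails as before.
        logger.info(
            "%s finished in %.2fs", func.__name__, time.monotonic() - started
        )
        return result

    return wrapper
```

Put @log_step on the steps you suspect, for example the function that talks to the unreliable service. The step behaves exactly as before, but the next failure leaves a trail.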

One approach we have had very good experiences with is to keep a record of when each particular instability occurred. The problem with an instability is that, since it is not reproducible, it is often not possible to test whether an attempted fix actually works. If you have a record of when the instability occurred, you can estimate roughly how often it occurs (e.g. once a week, or on every 10th build). If, after your fix attempt, you then don't see the instability for twice that interval (two weeks, or 20 builds, in the examples above), you can be pretty confident that your fix attempt was indeed successful.
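To put a rough number on "pretty confident", you can ask how likely such a clean streak would be if the fix had done nothing. The sketch below does that arithmetic in Python. It rests on a simplifying assumption of ours, namely that each build fails independently with the same probability, which real instabilities will only approximate. Both function names are made up for this example.

```python
def estimate_failure_rate(build_passed):
    """Fraction of recorded builds in which the instability showed up.

    build_passed is a list of booleans, one per build: True if the build
    passed, False if it hit the instability.
    """
    if not build_passed:
        raise ValueError("need at least one recorded build")
    failures = sum(1 for passed in build_passed if not passed)
    return failures / len(build_passed)


def chance_of_clean_streak(failure_rate, clean_builds):
    """Probability of clean_builds passes in a row if nothing was fixed.

    Assumes independent builds with a constant failure rate.
    """
    if not 0.0 <= failure_rate <= 1.0:
        raise ValueError("failure_rate must be between 0 and 1")
    if clean_builds < 0:
        raise ValueError("clean_builds must not be negative")
    return (1.0 - failure_rate) ** clean_builds
```

For an instability that shows up on every 10th build, chance_of_clean_streak(0.1, 20) is about 0.12. In other words, under these assumptions there is still roughly a one in eight chance that 20 clean builds are just luck. If that is not enough for you, wait for a few more builds before declaring victory.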

We'll be at the Agile Testing Days in Potsdam, Germany tomorrow. If you see us around, do come and say hi!

Happy automating! :-)
