I took over the software itself from another programmer. His coding was quite nice with only one or two minor oddities which I am happy to overlook. However the unit tests that were available did not consist of much and were not documented (which is fair enough).
The software has had about 20 minutes of downtime in 10 years, so the quality expectations were very high. I assumed I would make mistakes and so took multiple steps to try to reduce the risk. This article concentrates on the testing step.
Use the Tools You Get For Free
I used to write an Apache module in C++, called the clickserver, which handled about $800 million a year in advertising clicks at its peak. One of the most amazing tools for C++ is valgrind, specifically memcheck. It is vital that C++ software be run through valgrind memcheck with representative data. Yes, valgrind does vomit up thousands of references to the STL string library, which is extremely annoying. The most important thing to pay attention to, in my experience, is uninitialised variables. Uninitialised variables will almost certainly cause logic errors in your code, and those matter more than memory leaks: leaks are serious, but not as serious as logic errors.
Play It Again Sam
One of the most successful strategies I used when developing the clickserver was to replay real world interactions from real visitors. This made it easy to find any regression issues with the new software. I would frequently catch one or two major bugs, both regressions and bugs in the new features.
The previous programmer wrote a script to replay the log, but its comparison step just used diff, which ignored the subtleties of the log format: it would always find a difference (e.g. the timestamp). I wrote a more sophisticated tool which would diff the log files and allow elimination of false positives.
There were a few challenges with the replay scenario: environment and false positives.
The output of the clickserver (redirect location, log entry and outgoing cookies) was determined by the input URL, incoming cookies, cached data from the database and configuration. All four inputs must be identical to the original.
The input URL and incoming cookies can be reconstructed from the Apache log, with care to adjust timestamps that were contained in the inputs.
Fortunately the database caches were in file form and were archived. Files were used in case the database went offline and could easily be rolled back to an archive if things were messed up enough. Fortunately that never happened.
The false positives were another challenge. The issue is that bugs are fixed and new features are added in the new software. This will mean that there are differences that make sense. Initially I added an option to my log difference tool to ignore differences in various fields.
The problem is that I only want to ignore a field when the difference is definitely due to a bug fix or new feature. To achieve that I added the Rhino engine (ScriptEngine) that is built into Java to allow a more nuanced elimination of false positives. Performance was a real challenge using ScriptEngine. I eventually split the contents of the ignore script into two parts: one function returned a list of fields to ignore (for example completely new fields or those that changed every time) and one did a one-by-one analysis on each result that was a positive match. I did not design the script system to be able to override false negatives.
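The two-part split can be sketched roughly as follows. This is an illustration in Python, not the original Java and Rhino code; the field names (`timestamp`, `request_id`, `country`) and the example bug fix are invented for the sketch.

```python
from typing import Callable, Dict, List, Set

LogEntry = Dict[str, str]


def ignored_fields() -> Set[str]:
    """Part 1: fields that are never compared (hypothetical names)."""
    return {"timestamp", "request_id"}


def is_expected_change(field: str, old: LogEntry, new: LogEntry) -> bool:
    """Part 2: decide whether one concrete difference is explained
    by a known bug fix or new feature."""
    # Invented example: the old software sometimes logged an empty country.
    if field == "country" and old.get(field) == "":
        return True
    return False


def diff_entries(
    old: LogEntry,
    new: LogEntry,
    skip: Set[str],
    expected: Callable[[str, LogEntry, LogEntry], bool],
) -> List[str]:
    """Return the names of fields whose difference is not explained."""
    unexplained = []
    for field in sorted(set(old) | set(new)):
        if field in skip:
            continue  # cheap filter, no per-entry logic needed
        if old.get(field) == new.get(field):
            continue  # no difference at all
        if expected(field, old, new):
            continue  # difference accounted for by a known change
        unexplained.append(field)
    return unexplained
```

The field list is computed once and applied cheaply to every entry, while the more expensive per-difference rule only runs on entries that actually differ, which is one plausible reason the split helps performance.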
My Eyes Are Glazing Over...
As anyone who has done any form of in-depth testing can attest, having too much information to check can cause one's brain to go into neutral, particularly where we are looking for an error in a thousand or a hundred thousand entries. Having a closed loop unit test is critical in that regard. An automated unit test should require no human intervention. It sounds obvious, but many fall into the trap of creating a bunch of stimuli whose results a human has to go back and check.
A human should only be involved in filtering out false positives and debugging problems. Of course nothing can be perfect.
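As a minimal sketch of what "closed loop" means here: each test supplies the stimulus and also asserts the expected outcome, so the run either passes or fails without anyone reading output. The function `handle_click`, its inputs and the URLs are hypothetical stand-ins; a real suite would send the request over HTTP and read the log file.

```python
import unittest


def handle_click(url, cookies):
    """Hypothetical stand-in for the system under test.

    It only exists to keep this sketch self-contained.
    """
    visitor = cookies.get("visitor_id", "new")
    return {
        "redirect": "http://advertiser.example/landing",
        "log": {"url": url, "visitor": visitor},
    }


class ClickTest(unittest.TestCase):
    def test_known_visitor_is_logged(self):
        result = handle_click("/click?ad=42", {"visitor_id": "abc"})
        # The expected values are part of the test, not checked by eye.
        self.assertEqual(result["redirect"], "http://advertiser.example/landing")
        self.assertEqual(result["log"]["visitor"], "abc")

    def test_missing_cookie_is_treated_as_new_visitor(self):
        result = handle_click("/click?ad=42", {})
        self.assertEqual(result["log"]["visitor"], "new")
```

Saved in a module, this can be run with `python -m unittest`, and the exit status alone says whether anything needs a human.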
Closed Loop Automated Testing
In addition to the log difference utility I also wrote a suite of unit tests. Although they simply used the interfaces presented by the software (HTTP and log file) and did not test each class directly, which could be considered "functional testing", they used a low-level white box approach which should qualify as "unit testing". I used my knowledge of the internals of the software to create tests which exercised the difficult parts of the software. The tests were written in pyunit (unittest2). The tests were built up from a number of sources:
- the (undocumented) unit tests that had been present before
- use cases for visitors, including what the visitor should do and what the visitor can do
- protocol tests: following our protocols, following our protocols in edge cases and breaking our protocols
- bugs (and their fixes)
- new features
Furthermore writing documentation shook out a number of inconsistencies and misunderstandings. I will go into further detail later.
I wrote a number of support classes, including a python implementation of tail. The end result ran in a few seconds, which was important because I wanted to be able to run it frequently. When the tests were slower I ran them less frequently.
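A minimal sketch of such a tail helper, assuming its job was to return only the log lines written since a test started. The class name `LogTail` and its method are hypothetical, not the original code.

```python
import os


class LogTail:
    """Remember a position in a log file and return lines appended after it."""

    def __init__(self, path):
        self.path = path
        # Start at the current end of the file so older entries are skipped.
        self.offset = os.path.getsize(path) if os.path.exists(path) else 0

    def new_lines(self):
        """Return lines appended since the last call (or since construction)."""
        # Binary mode keeps the byte offset valid for seek().
        with open(self.path, "rb") as handle:
            handle.seek(self.offset)
            data = handle.read()
            self.offset = handle.tell()
        return data.decode("utf-8").splitlines()
```

A test would create the `LogTail` before sending its request and then assert on `new_lines()` afterwards, so it never has to scan the whole log.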
Testing And Documentation?
Fortunately my former employer was forward thinking enough to allow me to do a testing project. I also included documentation in this for two good reasons. Firstly it was something else that needed to be done. More importantly, documentation and testing are intertwined. What is the correct behaviour? What if there are conflicts between what makes sense as "correct" behaviour in two different instances? I once heard that generally each software feature is trivial; it is the interactions between those features that generate complexity.
Furthermore, in my experience good documentation improves a design. Having requirements, design, technical documentation or user documentation that has condition upon condition is usually a code smell for bad design. It requires users, programmers and application support to hold more information in their minds.
Some people argue that the code is the documentation. If that is the case then the code by definition cannot have any bugs because it is being tested against itself. Furthermore less technical readers cannot read code. Even technical documentation may be read by application support, testers or sysadmins.
Documentation is required to describe a "contract" that the software will adhere to. The protocols and business requirements also form a "contract" that must be included. These contracts are then used in testing. These contracts are not enough to completely describe the software. I am not an advocate of design by contract (I will write about this in another post) but I am happy to pinch an idea here and there.
So documentation gives you something to test against.
The Future: Statistical Methods
Just before I left my last job I was investigating using statistical methods for monitoring newly deployed software. I started using Student's t-test but found that our data was too noisy to get a decent separation.
We knew anecdotally that our data was affected by a number of factors: local time of day, day of the week, day of the month, day of the year, client expenditure and many others.
Thus a multivariate analysis would have been likely to be helpful but I didn't have a chance to try it.
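For reference, the Student's t statistic for comparing a metric before and after a deployment can be sketched as below. This is only an illustration of the pooled-variance formula using the standard library; the function name is hypothetical and the sample numbers in the usage note are invented, not real clickserver data.

```python
import math
import statistics


def student_t(before, after):
    """Two-sample Student's t statistic with pooled variance."""
    n_a, n_b = len(before), len(after)
    mean_a, mean_b = statistics.mean(before), statistics.mean(after)
    var_a, var_b = statistics.variance(before), statistics.variance(after)
    # Pooled variance assumes both samples share the same true variance.
    pooled = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    standard_error = math.sqrt(pooled * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / standard_error
```

For example, `student_t([2, 4, 6], [1, 2, 3])` gives a statistic of roughly 1.55. With noisy data the pooled variance grows and the statistic shrinks, which matches the poor separation described above.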