Debugger is not a naughty word

Complexity

To a certain extent, software has to be complex. A lot is demanded of it: it has to fulfil complicated requirements in ever-changing environments, while being given only a very simple set of basic operations by the hardware and operating system. The web browser you are reading this with manages to do what it does given only the basic rudiments: the ability to communicate with web servers to fetch pages, the ability to draw text and pictures on the screen, and basic arithmetic (more or less).

But it's often overcomplicated. Usually, this is a consequence of changing requirements. New features are added with the minimum of effort on top of the existing system, and the existing system usually wasn't designed to support the new feature. So one either rewrites parts of the existing system to make it better suited to the new functionality while preserving the old, or one writes the new feature in an unnecessarily complicated way in order to fit it on top of a wonky foundation, or a mixture of the two.

And with deadlines being what they are, the tradeoff usually leans a bit too far towards the latter. That's fine in the short term, getting you the desired function sooner and at less cost, but in the long run the complexity mounts up until you start to have serious trouble with bugs, and find it harder and harder to add new features without breaking existing ones.

This is my main problem with Extreme Programming. Software agility (the ability to adapt the application to changing requirements rapidly), I feel, relies far more on a good up-front design that anticipates change than on just hacking away at things rapidly.

Also, I feel that designing software to be easily changed is a matter of skill - mainly because it hasn't yet been studied enough, at the right level of scrutiny, for theories and guidelines to be fully settled. The best advice I've found so far is to divide your application into modules and make the interfaces between the modules as general as possible. Try to see how little you can assume and still meet the requirements. Decouple those modules as much as you can, ideally with a system of plug-ins. Treating the configuration of the system as dynamic from early on makes expecting change part of the underlying philosophy.
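To make that concrete, here's a minimal sketch of the plug-in idea in C, using a table of function pointers. All the names here (`storage_plugin`, `store_and_read`, and so on) are made up for illustration; the point is only that the calling code assumes nothing beyond the interface:

```c
#include <stddef.h>

/* A hypothetical module interface: callers see only these function
   pointers, never the implementation behind them. */
struct storage_plugin {
    const char *name;
    int (*put)(int key, int value);
    int (*get)(int key);
};

/* One possible implementation. Swapping in another (a file store, a
   network store) requires no change to the calling code below. */
static int mem_table[256];

static int mem_put(int key, int value) {
    if (key < 0 || key >= 256) return -1;
    mem_table[key] = value;
    return 0;
}

static int mem_get(int key) {
    if (key < 0 || key >= 256) return -1;
    return mem_table[key];
}

struct storage_plugin memory_store = { "memory", mem_put, mem_get };

/* Application code assumes as little as possible: just the interface. */
int store_and_read(struct storage_plugin *p, int key, int value) {
    if (p->put(key, value) != 0) return -1;
    return p->get(key);
}
```

Because the application only ever talks to `struct storage_plugin`, the decision about which implementation to use can be deferred to configuration time, which is exactly the "expect change" philosophy above.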

The ease of causing far-reaching unintended consequences

In modern programming languages, it's all too easy to do something that breaks things somewhere else entirely. This has given rise to a wide range of conventions restricting how you use the features of programming languages, in order to try to avoid practices that carry a high risk. As you might imagine, these are often slavishly followed without true understanding.

"Global variables are bad"

The rationale for this rule is that having global mutable state means that the side-effects of an operation are not constrained to only affecting those things explicitly passed to it as arguments. In other words, when invoking an operation, the programmer specifies at that point what bits of data the operation should be performed upon. However, if the operation may also affect other 'global' bits of data, that is not made clear, so the consequences are less obvious.

However, this problem is not really down to global variables: it's down to mutable state in general. We'll look into a more far-reaching solution to that problem below under "functional programming".
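A tiny illustrative contrast (the function names are made up): one operation mutates a hidden global, the other is passed everything it touches.

```c
/* With a mutable global, the call site gives no hint that shared
   state elsewhere is being changed. */
static int total = 0;

int add_to_total(int x) {
    total += x;          /* hidden side effect on the global */
    return total;
}

/* Passing the state explicitly makes every effect visible at the call
   site; nothing outside the arguments can be affected. */
int add_to(int current, int x) {
    return current + x;  /* depends only on its inputs */
}
```

Calling `add_to_total(5)` twice returns 5 and then 10: the same arguments give different answers because of the invisible state, which is exactly the "less obvious consequences" problem described above.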

In particular, the kind of problem this rule tries to address is the one I had with the reminder crash bug, but I wasn't using global variables. Nonetheless, because complex data structures were involved, there were unexpected consequences. The Get function is passed a client context, which refers to the DB4 environment, which refers to every table opened in the environment, which refers to the open cursor. However, the references from the DB4 environment to the table to the cursor are maintained by the DB4 library and not by my code, and the Get function notionally only uses the DB4 environment to access the data tables to read data from (the resetting of the environment being a special case), so I failed to notice the link.

"Structured programming is good; GOTO considered harmful"

Back in 1968, it was suggested that the "GOTO statement (should be) considered harmful", the idea being that software could be written with fewer bugs if the rules of "structured programming" were followed. At that point, many programs were written using "GOTO", meaning that they were a sequence of operations punctuated by "conditional jumps" - things of the form "If <some condition holds> then go to step 7". This meant that programs were threaded through with these conditional jumps, and it was often hard to see what they did without printing them out and actually drawing in the "gotos" as arrows from the line requesting them to the target line. And a hard-to-understand program is one that you are very likely to introduce bugs into when making changes, as it's hard to work out the consequences, and easy to miss things.

Structured programming, however, eschewed 'go to', and divided the program into chunks (usually called 'blocks'). These blocks were then joined together to make larger blocks by using structured programming constructs, the usual set of which are:

if <some condition holds> then <do this block> else <do this other block>

while <some condition holds> <do this block over and over again>

There are others, but they are mainly just variations of the above two. Since the blocks within the constructs were usually written indented on the page, it was very easy to see the beginning and end of each, so the structure of the code was obvious without having to draw any arrows.

However, it is still a restriction, and some problems don't fit very well into the structured programming model. The usual case is where operations may fail, in which case we want the whole program (or subroutine) to quickly give up, perform some clean-up action, then stop.

Written in a structured way, this might come out as:

Do operation A
if it succeeded:
  Do operation B
  if it succeeded:
    Do operation C
    if it succeeded:
      Clean up after success
    else:
      Clean up after errors
  else:
    Clean up after errors
else:
  Clean up after errors

Clearly, the "Clean up after errors" case ends up occurring three times, since any of operations A, B, or C may fail and need cleaning up after. Duplicating code is often a bad idea: if some later change alters what cleanup is required, we have to remember to change every copy - or we've just introduced a bug that will only become visible when one particular sub-operation fails.

There are ways around this, but they all introduce their own problems.

So some programming style guides recommend that "go to" be allowed purely for error handling, which reduces the above to:

Do operation A
If it failed:
  go to 'clean up after error'
Do operation B
If it failed:
  go to 'clean up after error'
Do operation C
If it failed:
  go to 'clean up after error'
Clean up after success
go to 'end'

clean up after error:
  Clean up after errors
  go to 'end'

end:
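In C this pattern is conventionally written with a cleanup label at the end of the function. A minimal sketch, where the three operations are hypothetical stand-ins that each return 0 on success (`fail_step` just lets us simulate a failure at a chosen step):

```c
/* Stand-in operations: each returns 0 on success, -1 on failure. */
int operation_a(int fail_step) { return fail_step == 1 ? -1 : 0; }
int operation_b(int fail_step) { return fail_step == 2 ? -1 : 0; }
int operation_c(int fail_step) { return fail_step == 3 ? -1 : 0; }

/* Returns 0 on success, -1 on failure. The error clean-up appears
   exactly once, however many operations can fail. */
int run_all(int fail_step) {
    if (operation_a(fail_step) != 0) goto error;
    if (operation_b(fail_step) != 0) goto error;
    if (operation_c(fail_step) != 0) goto error;
    /* clean up after success */
    return 0;
error:
    /* clean up after errors: written once, shared by all three */
    return -1;
}
```

The structure of the code now matches the structure of the problem: one success path, one shared failure path.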

However, I personally feel that modelling complex imperative algorithms as state machines is a promising approach. It's worked well for the electrical engineers, who almost invariably model complex discrete behaviour as state machines. But that's the subject of another article.
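As a taste of that approach, here is the same give-up-on-error flow recast as an explicit state machine; this is only an illustrative sketch (the states and `fail_step` parameter are invented for the example), but note how every state and every transition is named and written down in one place:

```c
/* States of the machine: which step we are on, and the two clean-up
   outcomes. */
enum state { STEP_A, STEP_B, STEP_C, CLEANUP_OK, CLEANUP_ERR, DONE };

/* Returns 1 if all steps succeeded, 0 otherwise; fail_step simulates
   a failure at the given step, 0 meaning no failure. */
int run_machine(int fail_step) {
    enum state s = STEP_A;
    int ok = 0;
    while (s != DONE) {
        switch (s) {
        case STEP_A: s = (fail_step == 1) ? CLEANUP_ERR : STEP_B; break;
        case STEP_B: s = (fail_step == 2) ? CLEANUP_ERR : STEP_C; break;
        case STEP_C: s = (fail_step == 3) ? CLEANUP_ERR : CLEANUP_OK; break;
        case CLEANUP_OK:  ok = 1; s = DONE; break;  /* clean up after success */
        case CLEANUP_ERR: ok = 0; s = DONE; break;  /* clean up after errors */
        case DONE: break;
        }
    }
    return ok;
}
```

The transition table can be read (or drawn as a diagram) without tracing any arrows through the code, which is much the same benefit structured programming was after.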

However, clearly, none of this would have avoided the reminder crash bug.

Functional programming

Functional programming is a different style of programming entirely, rather than just a set of guidelines for using the features of a programming language. Although there are "functional languages", it's usually possible to program in a moderately functional style in any language that provides recursive functions; some languages make it easier than others.

Functional programming is the ultimate solution to the problem of unexpected changes occurring: in functional programming, one never changes anything. Rather than writing programs to act upon objects (eg, opening a database table), one simply writes functions that take data as input, and produce an output based upon the input data and read-only globals.

Although this takes a bit of getting used to, for programs that mainly process data it's actually quite intuitive in the end - and often results in drastically fewer bugs than a normally-written program. However, "takes a bit of getting used to" is fatal. In the short term, that means unproductive time spent in training, fewer ready-trained programmers to choose from, and so on.

And, in this industry, far too much weight is given to short-term considerations.

On the downside, of course, functional programming doesn't handle problems that inherently revolve around altering state so well. However, there is a slight hack that lets you write some parts of a functional program in an imperative style if you need to, known as a Monad, and a more elegant but less developed technique known as Uniqueness typing.

There are many other attributes of functional programming that help to avoid bugs: one of which is the use of recursion for iterative behaviour, which tends to make it easier to avoid Off-by-one errors.
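Even in C, the contrast is visible. A quick sketch (function names invented for the example): the iterative sum needs index bookkeeping, which is where off-by-one slips live, while the recursive version simply states the definition:

```c
/* Iterative: correctness hinges on getting the loop bounds right
   (writing < where <= was meant is the classic slip). */
int sum_to_iter(int n) {
    int total = 0;
    for (int i = 1; i <= n; i++)
        total += i;
    return total;
}

/* Recursive: a direct transcription of the definition
   sum(0) = 0, sum(n) = n + sum(n - 1), with no index to get wrong. */
int sum_to_rec(int n) {
    return (n <= 0) ? 0 : n + sum_to_rec(n - 1);
}
```

Both compute the same thing, but the recursive version leaves one fewer place for a boundary mistake to hide.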

It's hard to say what effect functional programming would have had on the reminder bug - because the structure of the whole system would have been quite different under a functional paradigm... Much as it would be hard to consider the effect of an aspect of our design for a house of cards upon a design for a house made of lego bricks.

Garbage collection, range checking, and other language features

Many of the security bugs that worms have exploited are buffer overflows. These arise from a fixed-size area of memory being set aside for something (the buffer) then, due to a failure to check that the data you're dealing with will fit within the buffer, data overflowing the buffer into adjoining bits of memory.

The reason this happens so often is that many programs are written in C, and C doesn't have any inbuilt concept of a fixed-size buffer: it's entirely up to the programmer not to overrun their buffers. Many other languages - Java, LISP, Python, and so on - have an inbuilt concept of a buffer (under any of a number of names) with size checks built in, so buffer operations written in those languages cannot overrun.
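Here is what that discipline looks like when done by hand in C - a small sketch (the function name is my own) of a copy that refuses to overrun, which is precisely the check that languages with inbuilt buffers perform for you on every operation:

```c
#include <string.h>

/* Nothing in C stops a copy from overrunning dest; the bounds check
   is entirely the programmer's job. Returns 0 on success, or -1 if
   src (plus its terminating NUL) would not fit. */
int safe_copy(char *dest, size_t dest_size, const char *src) {
    size_t needed = strlen(src) + 1;   /* +1 for the terminating NUL */
    if (needed > dest_size)
        return -1;                     /* would overflow: refuse */
    memcpy(dest, src, needed);
    return 0;
}
```

Every buffer overflow exploit is, at root, a place where a check like this was forgotten or got the arithmetic wrong.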

That's not the only programming language feature that can help avoid bugs: another is garbage collection, whereby regions of memory that are no longer needed by the program are automatically detected and reused for other things. Without garbage collection, the programmer has to work out at what points in his program things will never be needed again - which can be very difficult. If you get it wrong and don't declare things unused when they are, your program can suffer a memory leak and use ever-growing amounts of memory until it runs out. But worse is when you mistakenly mark a region of memory as unused, and then try to use it again. If it hasn't been reused by anything else since, it may well still work perfectly - only to break erratically (and usually very, very mysteriously) when the same piece of memory happens to be used for two different things at once.
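A small sketch of the burden being described, with invented names; the code shows the correct lifetime, and the comments mark where the two classic mistakes creep in:

```c
#include <stdlib.h>
#include <string.h>

/* Allocates a copy of text; ownership passes to the caller, who must
   eventually hand it to finished_with(). */
char *make_reminder(const char *text) {
    char *copy = malloc(strlen(text) + 1);
    if (copy != NULL)
        strcpy(copy, text);
    return copy;
}

void finished_with(char *reminder) {
    free(reminder);
    /* Mistake 1: never calling this at all - a memory leak.
       Mistake 2: continuing to use 'reminder' after this call - a
       use-after-free, which may appear to work until the memory is
       reused for something else, then fail mysteriously. */
}
```

The hard part is not writing these two functions; it is proving, across the whole program, that every `make_reminder` is matched by exactly one `finished_with`, at a point after the last use.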

But with garbage collection, you're saved from both problems - and saved a lot of effort doing manual memory management, too, meaning you can get on with developing features (and thus free up some time to write unit tests...)

Making things private

Many modern programming languages offer the facility to actually hide things in one part of the program from other parts of the program. The idea is that the program should be written as a set of modules, each responsible for a particular task, and with clearly-defined interfaces to other modules. The other modules need only know the interface, and should know nothing of the internal details of the modules they use. That way they can only affect the other modules in clearly-defined ways, obviously reducing the scope for accidental consequences.

Few people disagree with that, but there is still some suspicion about the idea of enforcing this by making it impossible to access things defined in another module if they are marked as private. Some suggest that this implies the programmers are not trusted, and derisive terms such as "bondage and discipline language" have arisen.

However, I take a slightly different tack. Don't focus on making things private: just focus on making things public. Think carefully about what parts of a module you do expose, for they are where trouble can arise.

Defining a boundary (between two modules, or between two nations) in terms of a fortified fence is perhaps less meaningful than defining it in terms of the passages across that boundary. To remove the fence entirely is to make it very hard to describe the boundary.

That holds regardless of whether you enforce your boundaries through features of the programming language or just through convention.
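Even C, which has no `private` keyword, supports this focus-on-the-public-part style via opaque pointers. A sketch with invented names (in a real program the bare declaration would sit in the header, and the struct definition would live only in the implementation file):

```c
#include <stdlib.h>

/* Public interface: clients learn only that 'struct counter' exists,
   plus the operations below. */
struct counter;

/* Private: the layout is visible only to the implementation, so no
   client code can come to depend on it. */
struct counter { int value; };

struct counter *counter_new(void) {
    struct counter *c = malloc(sizeof *c);
    if (c != NULL)
        c->value = 0;
    return c;
}

void counter_bump(struct counter *c) { c->value++; }
int  counter_read(const struct counter *c) { return c->value; }
void counter_free(struct counter *c) { free(c); }
```

The fence here is only convention plus the compiler's ignorance of the layout, yet the boundary is perfectly well defined: it is exactly the four public functions.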


7 Comments

  • By @ndy Macolleague, Tue 15th Jan 2008 @ 2:02 pm

    Hi,

    Here is a quick "highest number we've seen all numbers less than or equal" algorithm that I just cooked up whilst reading this. It's pretty much at the pseudo code level and I've not even tried to compile it but I think it illustrates a point.

    I don't store the numbers that come in: I just store a set of flags so that we know if we have seen a given number or not.

    int main (void) { int i; /* Most recent value read / int blk, idx; / The block that i lives in and the index into block / int mem[1024]; / Some space. Highest value is 1024 * sizeof(int) * 8. / int h = 0; / The block that we are up to / int val; / The answer so far */

        while (h < 1024) {
                i = read();
                blk = i / (sizeof(i) * 8);
                idx = i % (sizeof(i) * 8);
    
                mem[blk] |= (1 << idx);
    
                for ( ; h<1024, mem[h] != 0xFFFF; h++);
    
                val = ((h-1) * (sizeof(int) * 8)) + pop_count(mem[h]);
    
                /* pop_count is number of bits that are 1 */
    
                printf("Highest with all less than or equal: %d.\n", val);
        }
    

    }

    Seeing as we are tracking h we could probably throw away / reuse the early parts of mem as they fill up. That would mean that mem would only have to be big enough to store the range of values that might be undecided at any given time. To do that would make the illustration above less clear so I didn't bother as it'd be mostly memory management rather than part of the core algorithm.

  • By @ndy Macolleague, Tue 15th Jan 2008 @ 2:04 pm

    Hmph! The formatting of the variable declarations has been messed up. Here it is again:

    int main (void) {

        int i; /* Most recent value read */
    
        int blk, idx; /* The block that i lives in and the index into block */
    
        int mem[1024]; /* Some space. Highest value is 1024 * sizeof(int). */
    
        int h = 0; /* The block that we are up to */
    
        int val; /* The answer so far */
    
    
        while (h < 1024) {
                i = read();
                blk = i / (sizeof(i) * 8);
                idx = i % (sizeof(i) * 8);
    
                mem[blk] |= (1 << idx);
    
                for ( ; h<1024, mem[h] != 0xFFFF; h++);
    
                val = ((h-1) * (sizeof(i) * 8)) + pop_count(mem[h]);
    
                /* pop_count is number of bits that are 1 */
    
                printf("Highest with all less than or equal: %d.\n", val);
        }
    

    }

  • By Gavan Fantom, Tue 15th Jan 2008 @ 2:12 pm

    I would have expected that for nearly-sequential input, storing ranges (ie extents) would have been a reasonable compression. The answer is then the size (well, strictly speaking, the higher end) of the first extent.

    The size of the extent map would then be a reasonable metric for the extent of disorder in the input.

    Depending on the quality of the input, an extent map of all numbers not seen so far may also be a suitable choice.

  • By @ndy Macolleague, Tue 15th Jan 2008 @ 2:19 pm

    Hi,

    This thing probably has loads of bugs... The for loop will prevent the while loop terminating. So change it to "while (val < (1024 * sizeof(int) * 8))" or just change the memory management model and make it "while (1)".

  • By @ndy Macolleague, Tue 15th Jan 2008 @ 2:27 pm

    Well... the for loop should have prevented the while loop from terminating as it should be h<1023 not h<1024 otherwise we end up with a bounds overflow on the next line. Perhaps I should have compiled this... Maybe I still should...

    Oh dear.

  • By Gavan Fantom, Tue 15th Jan 2008 @ 2:36 pm

    In my experience, the biggest factor in improving quality in software is visibility. You touched on a few of the "rules" which have been proposed over time, but the overriding rule as far as I'm concerned is "Write clear and readable code". If you can read it and understand it, you have more chance of spotting when something is not right. This usually extends to having a clear write-up or diagram describing the design clearly so that someone trying to understand the code can understand the design at the same time.

    But that's not the be all and end all of visibility. Unit tests can help here, but only if you actually run them.

    Amazingly, large codebases with multiple developers often suffer from the problem of the head of the repository not even compiling. When everybody is updating all the time it gets noticed and fixed regularly, but when you have people doing the bulk of their development on a snapshot or a branch and only updating to the head or the trunk occasionally, this can become a really big problem. Again the way to solve this is by greater visibility - in this case autobuilding.

    Another frequent failure is to spot a bug (or a potential bug), not fix it straight away, and then forget about it. Two years later it rears its ugly head, and by that time you've completely forgotten about it and have to debug from scratch. It is only after spending weeks debugging that you realise that you've already seen this, and then you wish you'd fixed it at the time, or at least written a TODO item, or tracked it in a bug database.

    And if it's in a bug database, it's visible. It's measurable. You can generate a report of all known bugs.

    So once you have readable code, documented design, (some) unit tests which you run daily right after your nightly builds, and a bug database full of all the bugs which you spotted and didn't have time to fix, you still have bugs that you don't know about. What about them?

    Well, software is not fault-tolerant except in extremely specific cases. So if a fault does occur, make sure your code spots it early and fails noisily. This will increase visibility of the problem. It will also prevent knock-on failures, as the code will simply stop executing once bad data has been detected. And also, hopefully, it will make it easier to isolate where the problem occurred. This is all extremely good news when it comes to debugging. A failed assertion is much more useful than a Segmentation violation.

    The more you can see, and the clearer it is, the higher the quality of the resulting software, given a competent and conscientious programmer.

  • By alaric, Thu 17th Jan 2008 @ 10:29 am

    @ndy: That's similar to the principle I used, I guess. Since the algorithm runs continuously with an ever-increasing (and, eventually, wrapping back to 0) sequence of numbers (yes, before people ask, this is for packet sequence numbering in a network client!), I indeed have a buffer, but set up so that it slides along the space of sequence numbers, so it only stores the "frothy" region; the "solid" region where we have seen all the sequence numbers gets pushed off of the bottom of the buffer ASAP to free up space for froth.

    Gavan: Yep, good points, thanks


Creative Commons Attribution-NonCommercial-ShareAlike 2.0 UK: England & Wales