Debugger is not a naughty word

Software reliability

As you can see, bugs often arise from unintended interactions between things. Making my little change to the client library, I didn't anticipate that it might affect the reminder at all, so I didn't stop and think about the reminder in enough detail to realise that it would. When writing the reminder, I made an assumption - that my cursor would remain valid unless I, in the reminder, did something to stop it from being so, which I didn't. And when writing the table reload functionality, I made another assumption: that nobody would care if I closed and reopened the DB4 environment. But the software system as a whole is sufficiently complex that, to be honest, it's not reasonable for a human to actually consider the consequences of a change for every one of the thousands of lines of code in the system. And so things get missed, and bugs get created.

Which is why software is so unreliable and buggy: it's very complex, and it's very easy to cause far-reaching unintended effects with a small change. It's like building a house of cards, except that the consequences of a mistake aren't immediately apparent. Perhaps it's more like designing a complex house of cards on paper, bound by a picture of what the final result should look like, which other people then try to build under varying conditions. One small, subtle mistake can cause the whole thing to tumble, perhaps only in certain situations. In the case of my reminder bug, there wasn't really a bug in the reminder, and there wasn't really a bug in the client library; both did their jobs perfectly. It's just that the client library, in doing its job, happened to do something that made the reminder, when doing its job, crash. The bug was in the system as a whole.

What can we do about it?

Well, I just listed two reasons why software is buggy: complexity and the ease of causing unintended consequences.


7 Comments

  • By @ndy Macolleague, Tue 15th Jan 2008 @ 2:02 pm

    Hi,

    Here is a quick "highest number such that we've seen all numbers less than or equal to it" algorithm that I just cooked up whilst reading this. It's pretty much at the pseudo-code level and I've not even tried to compile it, but I think it illustrates a point.

    I don't store the numbers that come in: I just store a set of flags so that we know if we have seen a given number or not.

    int main (void) {

        int i; /* Most recent value read */
        int blk, idx; /* The block that i lives in and the index into block */
        int mem[1024]; /* Some space. Highest value is 1024 * sizeof(int) * 8. */
        int h = 0; /* The block that we are up to */
        int val; /* The answer so far */

        while (h < 1024) {
                i = read();
                blk = i / (sizeof(i) * 8);
                idx = i % (sizeof(i) * 8);
    
                mem[blk] |= (1 << idx);
    
                for ( ; h<1024, mem[h] != 0xFFFF; h++);
    
                val = ((h-1) * (sizeof(int) * 8)) + pop_count(mem[h]);
    
                /* pop_count is number of bits that are 1 */
    
                printf("Highest with all less than or equal: %d.\n", val);
        }
    

    }

    Seeing as we are tracking h, we could probably throw away / reuse the early parts of mem as they fill up. That would mean that mem would only have to be big enough to store the range of values that might be undecided at any given time. Doing that would make the illustration above less clear, so I didn't bother, as it'd be mostly memory management rather than part of the core algorithm.

  • By @ndy Macolleague, Tue 15th Jan 2008 @ 2:04 pm

    Hmph! The formatting of the variable declarations has been messed up. Here it is again:

    int main (void) {

        int i; /* Most recent value read */
    
        int blk, idx; /* The block that i lives in and the index into block */
    
        int mem[1024]; /* Some space. Highest value is 1024 * sizeof(int) * 8. */
    
        int h = 0; /* The block that we are up to */
    
        int val; /* The answer so far */
    
    
        while (h < 1024) {
                i = read();
                blk = i / (sizeof(i) * 8);
                idx = i % (sizeof(i) * 8);
    
                mem[blk] |= (1 << idx);
    
                for ( ; h<1024, mem[h] != 0xFFFF; h++);
    
                val = ((h-1) * (sizeof(i) * 8)) + pop_count(mem[h]);
    
                /* pop_count is number of bits that are 1 */
    
                printf("Highest with all less than or equal: %d.\n", val);
        }
    

    }

  • By Gavan Fantom, Tue 15th Jan 2008 @ 2:12 pm

    I would have expected that for nearly-sequential input, storing ranges (i.e. extents) would be a reasonable compression. The answer is then the size (well, strictly speaking, the higher end) of the first extent.

    The size of the extent map would then be a reasonable metric for the extent of disorder in the input.

    Depending on the quality of the input, an extent map of all numbers not seen so far may also be a suitable choice.
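
Gavan's extent-map idea might look something like the following sketch (illustrative only: the names insert and highest_complete are mine, the extent list is a fixed-size sorted array for simplicity, and wrap-around at the very ends of the unsigned range is ignored):

    #include <stdio.h>
    #include <string.h>

    #define MAX_EXTENTS 1024

    struct extent { unsigned lo, hi; };     /* inclusive range of values seen */

    static struct extent map[MAX_EXTENTS];  /* kept sorted, non-overlapping, non-adjacent */
    static int count = 0;

    /* Record value v, extending and merging extents as needed. */
    static void insert(unsigned v)
    {
        int i = 0;
        while (i < count && map[i].hi + 1 < v)      /* skip extents wholly below v */
            i++;
        if (i < count && v + 1 >= map[i].lo) {      /* v touches (or lies in) extent i */
            if (v < map[i].lo) map[i].lo = v;
            if (v > map[i].hi) map[i].hi = v;
            if (i + 1 < count && map[i].hi + 1 >= map[i + 1].lo) {  /* bridged two extents? */
                map[i].hi = map[i + 1].hi;
                memmove(&map[i + 1], &map[i + 2], (count - i - 2) * sizeof map[0]);
                count--;
            }
            return;
        }
        if (count >= MAX_EXTENTS)
            return;                                 /* map full; a real version would cope */
        memmove(&map[i + 1], &map[i], (count - i) * sizeof map[0]);
        map[i].lo = map[i].hi = v;                  /* new standalone extent at position i */
        count++;
    }

    /* The answer so far: the high end of the first extent, if it starts at 0. */
    static long highest_complete(void)
    {
        return (count > 0 && map[0].lo == 0) ? (long)map[0].hi : -1;
    }

    int main(void)
    {
        unsigned demo[] = { 0, 2, 1, 3, 7, 5, 4 };
        int j;
        for (j = 0; j < 7; j++) {
            insert(demo[j]);
            printf("after %u: highest = %ld, extents = %d\n",
                   demo[j], highest_complete(), count);
        }
        return 0;
    }

For nearly-sequential input the map stays tiny, and its length is exactly the "extent of disorder" metric Gavan mentions.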

  • By @ndy Macolleague, Tue 15th Jan 2008 @ 2:19 pm

    Hi,

    This thing probably has loads of bugs... The for loop will prevent the while loop from terminating. So change it to "while (val < (1024 * sizeof(int) * 8))" or just change the memory management model and make it "while (1)".

  • By @ndy Macolleague, Tue 15th Jan 2008 @ 2:27 pm

    Well... the for loop should have prevented the while loop from terminating, as it should be h<1023, not h<1024; otherwise we end up with a bounds overflow on the next line. Perhaps I should have compiled this... Maybe I still should...

    Oh dear.
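
For reference, a version of @ndy's algorithm folding in those fixes might look like the sketch below. Two further bugs are repaired along the way: the comma in the for-loop condition should be &&, and 0xFFFF only tests 16 of an int's (typically 32) bits. read_value is a stand-in input source, and pop_count is replaced by a count of trailing ones, since a plain population count over-counts when the seen bits aren't contiguous from the bottom; the arithmetic for val is adjusted to match.

    #include <stdio.h>
    #include <stdlib.h>
    #include <limits.h>

    #define BLOCKS 1024
    #define BITS   (sizeof(unsigned) * CHAR_BIT)

    /* Stand-in input source: reads one number at a time from stdin. */
    static unsigned read_value(void)
    {
        unsigned v;
        if (scanf("%u", &v) != 1)
            exit(0);                    /* end of input: just stop */
        return v;
    }

    /* Number of consecutive 1 bits starting from bit 0. */
    static int trailing_ones(unsigned w)
    {
        int n = 0;
        while (w & 1u) { w >>= 1; n++; }
        return n;
    }

    int main(void)
    {
        unsigned mem[BLOCKS] = { 0 };   /* one "seen" flag bit per value */
        unsigned h = 0;                 /* first block not yet completely seen */
        long val;                       /* the answer so far; -1 = nothing yet */

        while (h < BLOCKS) {
            unsigned i = read_value();
            if (i / BITS >= BLOCKS)
                continue;               /* out of range for this sketch: ignore */
            mem[i / BITS] |= 1u << (i % BITS);

            while (h < BLOCKS && mem[h] == ~0u)     /* && here, not a comma */
                h++;

            /* blocks 0..h-1 are full, plus a run of low bits in mem[h] */
            val = (long)(h * BITS) - 1;
            if (h < BLOCKS)
                val += trailing_ones(mem[h]);

            printf("Highest with all less than or equal: %ld.\n", val);
        }
        return 0;
    }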

  • By Gavan Fantom, Tue 15th Jan 2008 @ 2:36 pm

    In my experience, the biggest factor in improving quality in software is visibility. You touched on a few of the "rules" which have been proposed over time, but the overriding rule as far as I'm concerned is "Write clear and readable code". If you can read it and understand it, you have more chance of spotting when something is not right. This usually extends to having a clear write-up or diagram describing the design, so that someone trying to understand the code can understand the design at the same time.

    But that's not the be all and end all of visibility. Unit tests can help here, but only if you actually run them.

    Amazingly, large codebases with multiple developers often suffer from the problem of the head of the repository not even compiling. When everybody is updating all the time, it gets noticed and fixed regularly; but when you have people doing the bulk of their development on a snapshot or a branch and only updating to the head or the trunk occasionally, this can become a really big problem. Again, the way to solve this is by greater visibility - in this case, autobuilding.

    Another frequent failure is to spot a bug (or a potential bug), not fix it straight away, and then forget about it. Two years later it rears its ugly head, and by that time you've completely forgotten about it and have to debug from scratch. It is only after spending weeks debugging that you realise that you've already seen this, and then you wish you'd fixed it at the time, or at least written a TODO item, or tracked it in a bug database.

    And if it's in a bug database, it's visible. It's measurable. You can generate a report of all known bugs.

    So once you have readable code, documented design, (some) unit tests which you run daily right after your nightly builds, and a bug database full of all the bugs which you spotted and didn't have time to fix, you still have bugs that you don't know about. What about them?

    Well, software is not fault-tolerant except in extremely specific cases. So if a fault does occur, make sure your code spots it early and fails noisily. This will increase visibility of the problem. It will also prevent knock-on failures, as the code will simply stop executing once bad data has been detected. And also, hopefully, it will make it easier to isolate where the problem occurred. This is all extremely good news when it comes to debugging. A failed assertion is much more useful than a Segmentation violation.

    The more you can see, and the clearer it is, the higher the quality of the resulting software, given a competent and conscientious programmer.
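
As a contrived illustration of Gavan's fail-early point (find_record and struct record are invented for this sketch): an assertion stops execution at the point of misuse, rather than letting bad data flow onwards and crash somewhere far from the cause.

    #include <assert.h>
    #include <stddef.h>

    struct record { int id; };

    /* Fail loudly at the point of misuse, rather than dereferencing garbage
       later and crashing somewhere far from the actual cause. */
    struct record *find_record(struct record *table, size_t n, size_t key)
    {
        assert(table != NULL);          /* caller bug: no table supplied */
        assert(key < n);                /* caller bug: key out of range */
        return &table[key];
    }

    int main(void)
    {
        struct record table[4] = { { 1 }, { 2 }, { 3 }, { 4 } };
        return find_record(table, 4, 2)->id;   /* key 9 here would abort with
                                                  a named, line-numbered failure */
    }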

  • By alaric, Thu 17th Jan 2008 @ 10:29 am

    @ndy: That's similar to the principle I used, I guess. Since the algorithm runs continuously with an ever-increasing (and, eventually, wrapping back to 0) sequence of numbers (yes, before people ask, this is for packet sequence numbering in a network client!), I indeed have a buffer, but set up so that it slides along the space of sequence numbers: it only stores the "frothy" region, and the "solid" region, where we have seen all the sequence numbers, gets pushed off the bottom of the buffer ASAP to free up space for froth.

    Gavan: Yep, good points, thanks
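
The sliding-buffer arrangement alaric describes might be sketched roughly like this (illustrative only: WORDS, record and the word-alignment of base are assumptions of the sketch, and a real client would presumably do more with out-of-window sequence numbers than silently drop them):

    #include <stdio.h>
    #include <limits.h>

    #define WORDS 64                              /* window spans WORDS * BITS numbers */
    #define BITS  (sizeof(unsigned) * CHAR_BIT)

    static unsigned window[WORDS];  /* one "seen" bit per in-window sequence number */
    static unsigned head = 0;       /* ring index of the word containing base */
    static unsigned base = 0;       /* lowest sequence number still "frothy" */

    /* Record sequence number seq; both seq and base may wrap past 0. */
    static void record(unsigned seq)
    {
        unsigned off = seq - base;  /* unsigned subtraction handles wrap-around */
        if (off >= WORDS * BITS)
            return;                 /* outside the frothy window: drop it */
        window[(head + off / BITS) % WORDS] |= 1u << (off % BITS);

        /* the solid region slides off the bottom as soon as words fill up */
        while (window[head] == ~0u) {
            window[head] = 0;       /* recycle this word for fresh froth */
            head = (head + 1) % WORDS;
            base += BITS;           /* eventually wraps back past 0, like seq */
        }
    }

    int main(void)
    {
        record(1); record(0); record(2);          /* out-of-order arrivals */
        printf("base is now %u\n", base);         /* still 0: word 0 not yet full */
        return 0;
    }

Keeping base and seq as plain unsigned values means the distance seq - base keeps working across the wrap back to 0, which is what lets the window slide around the whole sequence-number space indefinitely.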


Creative Commons Attribution-NonCommercial-ShareAlike 2.0 UK: England & Wales