C (by )

I spent a lot of time programming C in my youth. I went through the canonical route: BASIC on an 8-bit home micro, moving up to Pascal and then C as I managed to obtain access to implementations of each. At the time, conventional wisdom was that C was the best language about; the easy access to the underlying model of memory as an array of numbered bytes allowed the programmer to write efficient code to perform low-level data processing operations. The mantra was that there was a tradeoff between expressive power and safety; languages like Pascal made it harder to shoot yourself in the foot, at the cost of preventing you from doing interesting things.

But, with the advent of an Internet connection, I gained access to non-mainstream languages, and my explorations of the wider world of programming language technology began. I still dabbled in C or C++ from time to time, when the situation demanded it, but never on very complicated projects.

Recently, though, I've been working on a large C project. And I've found that I'd quite forgotten just how horrible it is in comparison to the languages I've been using in the meantime...

For a start, I've been spoiled by garbage collection. My C project is a database. To capitalise on an expected locality of reference, several related data items are bundled together into a complex structure which is stored as a single unit on disk. As such, I have a routine that parses a sequence of bytes and generates a fairly complex tree of structs linked together with pointers, and a routine that takes such a tree and folds it back down into a sequence of bytes again.

Initially, I tried to save a lot of mallocing by making the data structure refer to the serialised form for its strings, by having strings zero-terminated in my serialised representation. The data structure merely provided structure to the strings that remained in the loaded char array. However, as the complexity of it grew, the need to perform read-modify-write cycles appeared; one would read a record, parse it into a structure, modify parts of the structure, serialise it to a new record, and write the result back to disk. But the modified parts of the structure would involve making pointers to new strings, which would need to be freed along with the structs at the end of the operation. And so I had to weigh the costs of various approaches:

  • Change my code to not use pointers into the existing buffer, and just strndup strings from the buffer. Whenever any string pointer is replaced by another, the old value must be freed. Then at the end, every string must be freed.
  • Add a flag to every string, indicating whether it needs freeing or not. Unset the flag for strings referencing the input buffer, and set it for fresh strings. When replacing a string with another, check the flag and see if the old value needs freeing first.
  • Point into the existing buffer for loaded strings, and just replace them with malloced strings when altering things, then at the end compare pointers for every string to see if it points into the buffer (and need not be freed) or not (in which case, free it).
  • Keep the scheme of pointers into the buffer, but maintain a separate linked list of strings to free at the end, and put all new strings onto this list.

There are probably others, but I had to choose among those four on two tightly connected criteria: how long would it take to go through the code and change everything to abide by the new conventions, and how easy would it be to miss one and introduce a subtle memory bug? The job was compounded by the fact that there were several different functions involved in these operations, called in different patterns in different places.
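
To make the second option (a flag on every string) concrete, here is a minimal sketch. It is not the code from the project; the names str_slot, slot_borrow, slot_set and slot_release are hypothetical, and it assumes strings in the loaded buffer are zero-terminated as described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical string slot: the pointer plus an ownership flag. */
typedef struct {
    char *ptr;    /* points into the record buffer, or at a malloced copy */
    bool  owned;  /* true if ptr came from malloc and must be freed */
} str_slot;

/* Refer to a zero-terminated string that lives inside the loaded buffer.
   The slot does not own it, so it will never be freed through the slot. */
void slot_borrow(str_slot *s, char *in_buffer)
{
    assert(s != NULL && in_buffer != NULL);
    s->ptr = in_buffer;
    s->owned = false;
}

/* Replace the slot's value with a private copy of new_value.
   Returns 0 on success, or -1 if allocation fails (slot left unchanged). */
int slot_set(str_slot *s, const char *new_value)
{
    assert(s != NULL && new_value != NULL);
    size_t n = strlen(new_value) + 1;
    char *copy = malloc(n);
    if (copy == NULL)
        return -1;
    memcpy(copy, new_value, n);
    if (s->owned)
        free(s->ptr);      /* only free what we allocated ourselves */
    s->ptr = copy;
    s->owned = true;
    return 0;
}

/* Release the slot at the end of the read-modify-write cycle.
   Borrowed strings are left alone; they go away with the buffer. */
void slot_release(str_slot *s)
{
    assert(s != NULL);
    if (s->owned)
        free(s->ptr);
    s->ptr = NULL;
    s->owned = false;
}
```

The cost is visible even in a sketch this small: every place that assigns a string has to go through slot_set or slot_borrow, and a single direct assignment to ptr that forgets the flag is enough to leak memory or free part of the buffer.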

In general, it's a pain having to keep track of memory accountability. Some of my functions that accept strings only read the string within the body of the function, so you can freely call them on static buffers, strings in the heap, substrings of larger strings, or whatever you like. But some require a string pointer they can store away in a data structure and use later, which had better be malloced since the users of the data structure will one day free it. And some store their string pointer argument away, but then just return it to you later, so it's up to you to handle the scope of the string memory itself. And so on.
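
The three conventions described above might look like this in practice. This is only an illustrative sketch: count_spaces, record, record_take_name, record_clear, cursor, cursor_set_label and cursor_label are hypothetical names, not functions from the project. Note that the function signatures alone cannot express who owns what; only the comments do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Borrows: reads s only during the call, so a static buffer, a heap
   string or a substring of a larger string are all acceptable. */
size_t count_spaces(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++)
        if (*s == ' ')
            n++;
    return n;
}

/* Hypothetical record that owns its name. */
typedef struct {
    char *name;   /* malloced (or NULL); freed by record_clear */
} record;

/* Takes ownership: name must come from malloc, because the record
   will free it later. The caller must not free or reuse it. */
void record_take_name(record *r, char *name)
{
    assert(r != NULL);
    free(r->name);        /* free(NULL) is a no-op */
    r->name = name;
}

/* Frees whatever the record owns. */
void record_clear(record *r)
{
    assert(r != NULL);
    free(r->name);
    r->name = NULL;
}

/* Hypothetical cursor that remembers a label but never frees it. */
typedef struct {
    const char *label;    /* not owned by the cursor */
} cursor;

/* Stores without owning: the caller must keep label alive for as
   long as the cursor may hand it back. */
void cursor_set_label(cursor *c, const char *label)
{
    assert(c != NULL);
    c->label = label;
}

/* Returns exactly the pointer that was stored. */
const char *cursor_label(const cursor *c)
{
    assert(c != NULL);
    return c->label;
}
```

Passing a string literal to record_take_name compiles without complaint and fails only when record_clear eventually calls free on it, which is exactly the kind of mismatch between expectation and use that is easy to introduce.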

All of this needs to be documented in comments, and those comments read again later. It's very easy to accidentally get an expectation wrong, and end up with a program that exhibits occasional obscure crashes.

And that's the other thing that I've hated about C: obscure crashes.

Creative Commons Attribution-NonCommercial-ShareAlike 2.0 UK: England & Wales