Category: Sci/Tech

Generalisation (by )

One of the things I instinctively do when designing software, given a client's requirements, is to generalise things as much as possible, to make it easier to deal with changing requirements in future, and to avoid having to write special-case code for the more unusual situations they already need handled.

For example, somebody might say "I want a system to transport email and files between computers in my organisation". So you might think: OK, I'll start by designing a general packet-switching system to transfer data across an interlinked network of computers, with routing algorithms to work out the best paths, retransmission systems to deal with failures, and so on. Then on top of that I'll build an email system and a file transfer system. That way, most of the difficult stuff is done in a single module that deals with getting data from A to B across an unreliable, changing network. Email and file transfer are then much simpler modules, with as little duplication of work between them as possible. So it's easy to add more functions to the system in future, and any improvements to the underlying routing engine benefit email, file transfer, and any other application equally.

Standard good software engineering practice, right? Modularise and have an abstract API between layers of a system?

However, sometimes I do this, but am then faced with an uphill struggle, as the client starts wanting changes that break the abstraction layers between the modules...

For example, they might suddenly start saying that they want all the email to go via their fast but expensive transatlantic cable, so it gets there quickly, while spending as little as possible - they pay by the megabyte, but emails are small. Meanwhile, they'd like the file transfers to go via the cheap satellite link, which is slow. But nobody's in a hurry with a large file transfer.

Ok...

But the nice routing module we designed doesn't care what application is using it; it just gets given a bunch of data and told to send it somewhere.

So we have two main classes of choice:

  1. Make the routing system, at the point where it has to choose between satellite or transatlantic cable, break the layers a bit by peeking inside the bunch of data it's given to decide if it's part of a file transfer or an email, and decide how to route it based on that. This is quick and easy, but it means that the routing system now needs to know a bit about the applications, so it'll now need updating if extra applications are added or the rules change, which increases maintenance overhead and scope for error.
  2. Sit down and have a think about this requirement, and how it might impact future applications (a bit of prediction and guesswork is required here), and design a change to the API to fulfil that need. For example, adding a "type of service" field to every chunk of data given to the routing system, saying whether it needs to get there quickly or cheaply. This creates a more maintainable system in future, but is also more up-front work.
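The second option can be sketched in code. This is a minimal illustration, not a real routing system; all the names here (`ServiceType`, `Packet`, `choose_link`, the link names) are made up for the example. The point is that the application declares its intent through the API, so the routing layer never has to peek inside the payload:

```python
from dataclasses import dataclass
from enum import Enum, auto


class ServiceType(Enum):
    """How the application wants its data handled - the new API field."""
    LOW_LATENCY = auto()  # get it there quickly (e.g. email)
    LOW_COST = auto()     # get it there cheaply (e.g. bulk file transfer)


@dataclass
class Packet:
    destination: str
    payload: bytes
    service: ServiceType  # applications declare intent here


def choose_link(packet: Packet) -> str:
    """The routing layer picks a link purely from the declared service
    type, without knowing anything about email or file transfer."""
    if packet.service is ServiceType.LOW_LATENCY:
        return "transatlantic-cable"
    return "satellite"
```

A new application added later just picks a `ServiceType`; the routing module needs no changes, which is exactly the maintenance win of option 2.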

However, it really makes my life hard when people request a system with so many esoteric variant cases on a complex operation - and the expectation that more variant cases will arrive in future - that it has to be a very modularised system to control the complexity, but where one case is by far the most common; and then start requesting changes to the system that totally ignore the fact that there are any exceptions to the common case.

Which is then a real headache to deal with, as you have to figure out how their feature applies to all the other variant cases as well, and try to explain this to them...

Meaningful results (by )

In this week's New Scientist there is an article about subconscious racism which has annoyed me - they showed the participants pictures of either black people's faces or white people's faces subliminally, then showed them a blurry picture of either an ape or a large cat, and those shown the Afro-Caribbean faces recognised the ape picture quicker than those shown the white faces first.

And this is meaningful? What profile was the cat in? Whole body outline or just face? What colour was the ape? Dark? Try it with one of the lighter-coloured monkeys and see what result you get, or alter the picture so that it's an albino ape, for goodness' sake. Then it turns out that they didn't have enough black people in the study to see if they made the same association.

People are very good at recognising faces - we excel at it, and the other primates' faces are, unsurprisingly, very close to ours. If you then show people something similar in colour, of course they are going to process the information quicker! It said Asians taking the test recognised the apes quicker too, and also that even with just the word being flashed at one group they got the same sort of association - now that is interesting, and something I would have thought would have shown an unconscious (though I'd have thought not very) racism. To me this picture test is worse than useless, and is stirring feelings that probably didn't exist until it was published.

The comments about the "evolution of man" pictures being partially to blame annoyed me too, as I was always teasing my very Caucasian uncle that he was the missing link - quite frankly, with his brow ridge he looks far more like the missing link than any of my ethnic friends.

P.S. I'm not entirely sure which terms are politically correct and which aren't, as I have never dwelt too long on the subject - I think it's stupid; people are people. I do find myself wondering what the results would be like for an experiment like this done properly across different cultures, mind. To be fair to them, it may be a good study and the write-up just a bit iffy, but I find myself doubting it. If anyone knows more about this, let me know.

Apologies if I've offended anyone.

Debugger is not a naughty word (by )

Computers are famed for harbouring bugs, and the high rate of failures in software compared to other industries is a constant cause of embarrassment. I'd like to explore why this is, with an example. And what we might be able to do about it.

Note: Although a lot of the details of the remainder crash are unfortunately very technical, I have done my best to explain things in a way that lay people should be able to make some sense of. However, some things would require a lot of background information, in which case I've just ploughed on without explanation. So if you come across things that you don't understand, feel free to skip ahead a bit; you shouldn't lose too much.

Read more »

Bah, back to Rails! (by )

Gah. I was happily thinking that I was slowly getting rid of my last Ruby on Rails projects, when I've ended up on another one. Despite extensive hype, Rails is a bit of a dog... Ruby has some niceties, but also has a lot of misfeatures, and the implementation is still maturing. But Rails is dreadful, dreadful code. It helps you if you're working within its limited model, but as soon as you come across a problem that doesn't fit into that model, you're quickly working at pretty much the same level as PHP and everything else: writing Ruby code that is called upon an HTTP request and has to output some HTML back, albeit with access to a reasonable template language... And it's full of bugs and gotchas that keep you up all night screaming "Why doesn't this wooooorrrkkkkk?!?!?!"

You see, the project is a major extension of a Rails project we'd mainly finished. We had the choice of rewriting the whole thing in something else (which would involve some time spent catching up with where we already are rather than producing anything new), continuing with Rails, or making a hybrid system, writing the new parts in something else while reusing parts in Rails (which would be a bit ugly).

Since the client wants to see interesting new things appearing after the first week or two, sadly, it had to be Rails...

Excessive mail filtering (by )

I've been taking advantage of some Christmas downtime to bring the Warhead mail system up to scratch.

We now have many layers of defence.

  1. When a remote mail server tries to connect to us to send email, if they are a known blacklisted spammer or have a wrongly configured mail server, we reject them up front.
  2. If they get through that, then unless they are a known good mail server, they are told to go away and come back later. Many spammers don't bother retrying mails if asked to, so this cuts out a lot of spam.
  3. If they are a known good mail server or they do come back later to redeliver the email, then the message is accepted.
  4. It's then sent through a content filter, which checks it for known bad signatures (viruses, scams, some spam, and phishing attempts). If it matches any, it's bounced back to the sender.
  5. The content filter then runs it through SpamAssassin's battery of message scoring tests, which rate the chances of the message being spam. If it looks spammy, it's marked as such with ***SPAM*** in the subject line, but still delivered (since SpamAssassin's tests are statistical in nature, they can snag false positives).
  6. Finally, the message is forwarded on, or delivered to a local mailbox, depending on the recipient.
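The six steps above can be sketched as a single decision function. This is purely illustrative - the real system is an MTA with blacklists, greylisting, and SpamAssassin, not hand-rolled Python - and all the parameter names here are invented for the sketch:

```python
# Conventional SpamAssassin default threshold; the real config may differ.
SPAM_THRESHOLD = 5.0


def filter_message(sender_host, subject, spam_score,
                   blacklist, known_good, seen_before):
    """Return (action, subject) for one incoming message.

    blacklist, known_good: sets of host names.
    seen_before: mutable set recording hosts that were greylisted once.
    """
    # 1. Reject known-bad or misconfigured servers up front.
    if sender_host in blacklist:
        return ("reject", subject)
    # 2. Greylist: unknown servers are told to go away and come back.
    if sender_host not in known_good and sender_host not in seen_before:
        seen_before.add(sender_host)
        return ("try-later", subject)
    # 3-5. Known-good or retrying senders are accepted; spammy-looking
    # messages are tagged in the subject but still delivered.
    if spam_score >= SPAM_THRESHOLD:
        subject = "***SPAM*** " + subject
    # 6. Deliver (forwarding vs. local mailbox is omitted here).
    return ("deliver", subject)
```

Note how step 2 does most of the cheap filtering: a spammer that never retries simply never reaches the expensive content checks.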

From my existing statistics, I know that of about 15,000 messages a day, 13,000 are stopped by the first step alone (which is good, since blocking at this stage saves a whole lot of resources on our mail servers).
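Those figures work out to roughly 87% of traffic being stopped before any content filtering happens at all. A quick check of the arithmetic, using the numbers from the post:

```python
total = 15_000          # messages per day (from the post)
blocked_step1 = 13_000  # rejected at the first, cheapest step
remaining = total - blocked_step1

print(f"Step 1 blocks {blocked_step1 / total:.0%}, "
      f"leaving {remaining} messages a day for the later filters")
```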

I'm looking forward to seeing how many of the surviving 2,000 make it past the rest of the filters 😉

Creative Commons Attribution-NonCommercial-ShareAlike 2.0 UK: England & Wales