What I've since become more interested in is less linear operations between the S-boxes. Analyses of the implications of a small change in the input, as I've done, rely to a certain extent on the simplicity of the diffusion between layers. While the S-boxes provide a high level of good old key-dependent unguessability, I'm interested in finding functions that efficiently diffuse changes across as much width as possible, with things like key dependence being lower priorities.

See: http://www.snell-pym.org.uk/archives/2005/07/04/avalanche-functions/

I posted my own further examinations of the S-box layering later:

http://www.snell-pym.org.uk/archives/2006/09/24/cryptanalysis-2/#more-311

]]>In option 1, you're adding one extra round of S-boxes and diffusion. This buys you an extra avalanche, and therefore a lot more complexity for the attacker. Essentially you've got a three round block cipher with two complete avalanches. No matter how strong your S-boxes, the small number of avalanches will leave it vulnerable to differential cryptanalysis (and possibly other exotic attacks, like impossible linear, and meet in the middle). Your best bet is to have many rounds, so as to increase your total number of avalanches. For me, I'm not satisfied with anything less than 6 complete avalanches. I believe DES had about 5.5, and Blowfish has 7.5. In your case, it takes two passes of S-boxes for the first avalanche, and then one avalanche per round thereafter. However, no matter how many avalanches you have, a 32-bit block cipher is susceptible to a complete "codebook" attack where the interceptor just records plaintext/ciphertext combos for all possible 32-bit inputs -- that table would fit on your iPod several times over.

So option 2 is much stronger. You've now got a 64-bit block cipher. After three rounds you have a complete avalanche, and then every two rounds thereafter (I think). With a 16 round cipher you'd have something like 7.5 full avalanches. Have you considered using several 16-bit S-boxes? In this particular design, 4 separate 16-bit S-boxes would fit quite naturally. Lots more key setup time, but then the attacker is faced with a very, very hard task.

p.s. I loved how you deduced that XORing inputs into an S-box doesn't make the S-box any stronger. In fact, if your S-box is 16 bits and based on only on the user key, then your S-box contains billions of times as much information as your subkeys. Mad entropy, like you said. With key dependent S-boxes, you can scrap subkeys -- they don't add any non-linearity or diffusion, as you so cleanly stated. In my own ciphers, which tend to be based on key dependent S-boxes, I often skip subkeys.

]]>