My first aluminium welding project: A projector protector

The Scout group I help out with meets in a hall with a projector suspended from the ceiling, and because of that, we're not allowed to play ball games; the projector's pretty exposed, and one whack from a ball would probably finish it.

So, I offered to make a metal "cage" to go around it, and decided that this would be a good project to learn aluminium welding on. Aluminium doesn't corrode the way steel does, and is light for its strength; lightness matters for a thing that will be suspended in the air above children.

Welding aluminium is a bit different to welding steel. Aluminium conducts heat much faster, and has a lower melting point; so by the time the bit you're welding is hot enough to melt, the molten puddle spreads quickly. You need a lot of heat, because it's being rapidly conducted away from the actual weld, but if you linger, the entire thing you're working on will melt... The result is that aluminium welds tend to look rather larger and chunkier than welds in steel, and you need to work quickly!

My plan was to make a wireframe cuboid, out of 10mm aluminium square tubing. I made the top and bottom rectangles, then joined them with verticals. So my first welds were the outer corners of the box, and I quickly found that it was all too easy to melt the entire corner into a puddle - but thankfully, I could then just build the weld metal up again and use my belt sander to flatten the result back into a decent corner. So you can't really tell, but the corners are pretty much solid aluminium now...

I finished it, but then spent quite some time worrying about how to actually fix it up there. In the end, I made some L-shaped brackets (with a diagonal brace inside the corner). We fixed these to the pillar supporting the projector with hose clamps, and the bottom of the vertical arms rested nicely on the plate at the bottom of the pillar, so the brackets can't possibly slide down. We then drilled holes through the long arms, and through the corners of the top rectangle of the cage, so the two could be joined with M5 bolts. This arrangement gave us some scope to adjust it as we installed it, which was essential as I didn't have exact measurements for the pillar...

Here's the end result:

Projector cage: side and front views

Projector cage: back and wide views

The yellowy string stuff is some paracord we added at the last minute, just in case a smaller ball manages to sneak in from the sides.

I grabbed a bit of video of myself and another leader figuring out how to attach the thing (I was way too nervous to stand up on the wobbly scaffold platform!), but I'll need a more powerful computer to run Kdenlive before I can edit the footage into a decent enough state to publish!

Physics has become boring

A common trope in contemporary science fiction is a small team in a lab somewhere (be it academia, industry, or just a hobbyist in a basement) fiddling around and discovering some new science - usually, some way of travelling to other universes or something (and then a whole load of plot unfolds as they accidentally unleash some terrible evil or whatever).

But... that kind of thing just doesn't happen any more. Sure, back in the 1600s, you might sit in a lab and discover electricity; back then, many of the fundamental forces that bind the Universe together were ripe for discovery. As the centuries passed, the easy stuff was cleaned up bit by bit; in the early twentieth century, relativity and quantum theory cleared up the last mysteries whose experiments could be recreated on a hobbyist budget. There was some fun to be had in nuclear physics, which could be experimented with if you had institutional-level funds to build particle accelerators and atomic piles; but since then, all the really cool peeling-back-the-mysteries physics has required vast financial resources.

What, then, for the nerd in a basement? Not only is exciting physics beyond their grasp, but the escapism of science fiction is dulled by the difficulty of suspending disbelief when our plucky heroes rewire a microwave oven to generate a disruption field that prevents the formation of a super-wormhole to R'lyeh. There's no shortage of fun things that can be built to demonstrate known physics, such as a desktop fusion reactor; but that's just an engineering challenge. There's no new science. Wikipedia's excellent List of unsolved problems in physics is sadly short of things that can be explored from one's bedroom; they're mostly highly theoretical problems, for a start. Perhaps a better-funded amateur could tinker with high-temperature superconductors, but the others on that list seem to require either vast machinery or just pondering mathematical solutions to problems...

Sudden Snow

When we went to bed last night, it was raining hard. So I was pretty surprised when I got up in the morning to find the world covered in about ten centimetres of snow.

And even more surprised to find that the awning over the back door had fallen down, blocking it so I had to get out via another door to investigate:

Fallen awning

The weight of the snow had been too much, causing the bracket on the left to shear off:

Left-hand bracket

This, in turn, pulled the bracket on the right out of the wall:

Right-hand bracket

And the falling awning crushed our little table:

Ruined table

Jean and I managed to undo the surviving bolts and get it off the wall without further destruction. The wall is undamaged at the left bracket:

Left-hand bracket holes

But the bricks are cracked where the bracket pulled out at the right, and the mortar has plenty of cracking too, so this will need some work:

Right-hand bracket holes

Also, our outside light is mangled and just hanging on by the cables, and will need replacing:

Damaged outside light

Miscarriage (from the father’s eyes)

My family is the single most important thing in my life. I grew up lonely - it was just my mother and I, and I always found portrayals of "typical family life" in popular media slightly painful to watch; I wanted that bustling house, full of children, with grandparents and aunts and uncles visiting. Sure, I'm mad about creating things; I love tinkering with computers and electronics and metalwork and DIY, and designing things around a table with my friends, but my biggest and best creation is my family.

So, I was delighted when, a couple of weeks ago, Sarah decided to do a pregnancy test; and it came out positive. We'd put it off for a while; we had some false starts when conceiving Mary, so we didn't dare get our hopes up too soon. We waited until it looked like the periods were definitely staying away, when it started to feel like if Sarah was pregnant we'd best be getting set up with a midwife and all that. I'd already been rather hopefully resting my hand on Sarah's belly on the sofa; but once the test came back positive, it was time to start snuggling down and talking to the baby. Partly soppiness on my part, and partly because I'm told that even after the first few weeks, babies start to learn their parents' voices (and it's never too early to start learning Lojban).

However, we only had a few days of that before things started to look a little wrong. I kept talking, telling the little grain of rice that there was lots to live for; it would have two siblings, three cats, two chickens, a mummy, a daddy, and a lovely house to live in, and I had so many wonderful things to show it. But the poor thing was probably already dead by then; a few days passed before everything got a bit medical, and I was carrying a bowl full of chunks of womb lining while a nurse wheeled Sarah through a hospital in a wheelchair, wondering if I was (in some grisly sense) at least getting to carry my child in my arms, for a while.

As is usual when things go wrong, Sarah was too ill to do anything, so I went into Caring Husband mode; looking after Jean and Mary, organising meals, cancelling things, supporting Sarah. I'm not shy about my emotions, as many men are; but while there were things that needed doing and nobody else to do them, I didn't have the mental energy to feel those emotions, so I just got on with it all. But once Sarah was home again and I didn't need to worry about her health all the time and things had settled down a bit and I had time to think about it (thanks to my lovely colleagues at both jobs, who covered for me), I finally had my chance to cry. Sarah was working on her picture, and wanted me to choose some colours for the rainbows in the lettering. I put together a spectrum (my choices are at the top of the first L, by the way), and I remembered being a little child, choosing that my favourite colour was blue; I went for sky blue and sunrise yellow at the time (although I've since moved towards darker, purplier blues), and I wondered what this child's favourite colour would have been, and then all the pain came up and I cried on Sarah's shoulder. I still feel a lot of pain, but a good long sob helped me to heal a lot.

Now, I just want to find ways to record the existence of the little thing. When we had the initial scan, I wrote down the dimensions they told me on the appointment letter, because I feared that and the memory of seeing it, fleetingly, on a screen might be all we got to keep. When the ashes have been scattered, I'll try to find the memorial garden in Cheltenham, and go and visit.

I wrote a poem, in Lojban of course. I could translate it to English, but I'd have to either drop all the rich attitudinal indicators (which would make it rather boring) or try and explain them in English (which would make it long and convoluted), so I'll just leave it as-is.

.i .u'anaisai mi ba'ozi te tarbi
.i .uinai mi na ba pamjai le .iu tarbi
.i .uinai mi na ba tavla le .iu tarbi
.i .uinai mi na ba bevri le .iu tarbi
.i .a'onai mi na ba zgana lo nu le .iu tarbi cu ke banro
   joi cisma
   joi klaku
   joi cadzu
   joi tavla
   joi prami
   joi jmive

.i le .iu tarbi na ba djuno fi mi

.i co'o tarbi

Processor architecture

The current state of the art in processor design seems to be a reasonably complex instruction set, interpreted by a front end that translates it into a series of more primitive instructions, which are then fed into some kind of multiple-issue pipelined thingy with speculative execution. You know, the kind of stuff x86 implementations have been doing since the Pentium Pro. 64-bit instructions, vector SIMD instructions, lots of cores and all that are just variations on the theme.

I'm sure this is a local maximum in the space of processor designs. So few of the transistors on each chip seem to be actual ALUs doing useful work; all this translation and pipeline control is a lot of logic that exists only to bridge the impedance mismatch between the ALUs and the instruction set...

So, I'm always interested in more exotic processor architectures, and there are two different threads I'd love to explore (as in, design and simulate in an FPGA) if I had time. The common theme is simple control logic; this means you can fit in more ALUs, or wider ALUs and registers, in the same space - or just fit more cores and more cache on the same die.

Zero-operand stack machines

The idea here is to use a stack instead of a register file. Instructions then just need an operator (eg, "add"), as the operands are implicit - the stack always provides the inputs and outputs. Without operands, instructions can be very small; generally, much smaller than a machine word, so each word loaded can hold several instructions. This reduces the memory bandwidth required to feed the chip with instructions; and since the decode and control logic becomes very simple, you can sustain a high clock rate with minimal pipelining, so reducing the memory bandwidth consumed by instruction loads is handy.

That means you can't fit literals or static addresses inside instructions, though, so you need something like a "load immediate" instruction that fetches the next word from the instruction stream and pushes it, rather than treating it as instructions. If an instruction word contains several "load immediate" instructions, then that many subsequent words of instruction stream could be literals!

One example of this approach is a Minimal Instruction Set Computer, but the concept is broader than that. Large instruction sets can be easily supported.

The control logic boils down to loading an instruction word, then treating it as a FIFO of smaller instructions to execute while the next instruction word is loading. Most instructions just engage an ALU circuit hardwired to the top element or two of the stack, whose output becomes the new top of stack. A few might transfer data to/from a memory access unit or a register, including the instruction pointer to change the flow of control. Not many gates are needed to decode an instruction, leading to the short cycle time.
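To make that concrete, here's a minimal sketch of such a decode loop in Python. The four-opcodes-per-word packing, the opcode numbering, and the instruction names are all my own inventions for illustration, not any particular real design:

```python
# A zero-operand stack machine: each 32-bit word packs four 8-bit
# instructions, executed in order. All encodings here are invented.
HALT, LIT, ADD, MUL, DUP, DROP = range(6)

def run(mem, ip=0):
    stack = []
    while True:
        word = mem[ip]
        ip += 1
        # Treat the loaded word as a FIFO of four small instructions.
        for shift in (24, 16, 8, 0):
            op = (word >> shift) & 0xFF
            if op == HALT:
                return stack
            elif op == LIT:
                # "Load immediate": consume the next word of the
                # instruction stream as data, not as instructions.
                stack.append(mem[ip])
                ip += 1
            elif op == ADD:
                b, a = stack.pop(), stack.pop()
                stack.append(a + b)
            elif op == MUL:
                b, a = stack.pop(), stack.pop()
                stack.append(a * b)
            elif op == DUP:
                stack.append(stack[-1])
            elif op == DROP:
                stack.pop()

# (2 + 3) * 4: the first word packs LIT LIT ADD LIT, so the next three
# words of the stream are its literals; the final word packs MUL HALT.
prog = [(LIT << 24) | (LIT << 16) | (ADD << 8) | LIT,
        2, 3, 4,
        (MUL << 24) | (HALT << 16)]
print(run(prog))  # [20]
```

Note how the first instruction word contains three "load immediate" instructions, so the three words that follow it are literals, exactly as described above.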

Instructions that can't complete in a single cycle present a problem, though. The use of a stack tends to mean that an instruction depends on the result of the previous instruction, so it's tricky to execute several instructions in parallel and thus make progress in the presence of weighty multiply/divide instructions or memory reads.

I can think of three ways of overcoming that, and you can combine all three:

Multiple stacks

The approach taken by the 4stack processor is to have four stacks, each with its own independent ALU. Each instruction word has an instruction for each ALU in it, and they execute in parallel on each clock tick. Presumably, there's some means to transfer results between the stacks - I imagine a bus joining them, an instruction to pop a value from the stack onto the bus, and an instruction to push from the bus. The timings of the bus reads and writes are such that a single instruction word can contain a pop-to-bus on one stack and a push-from-bus on one or more other stacks, performing the transfer in a single cycle.

Due to the synchrony of instructions feeding into each ALU, we can't "stall" an ALU. If one of them executes a weighty instruction or has a cache miss on a memory read, we either stall ALL the ALUs at once, or we mandate that certain instructions are followed by a fixed number of NOPs before another instruction can execute, to allow time for it to complete.

This puts the onus on the compiler to schedule instruction-level parallelism, and means that the compiler needs to know the precise timings (and number of ALUs) of the target CPU - we can't use the same instruction set for a broad range of implementations!
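As a toy model of that arrangement - the lane count, the opcode set, and the one-writer-per-cycle bus rule below are my guesses at how such a machine might behave, not the 4stack's actual design (and the inline operand on LIT is a simplification of the zero-operand encoding):

```python
# Four lanes, each with its own stack and ALU; one (opcode, operand)
# pair per lane per instruction word. A pop-to-bus drives the bus
# early in the cycle, so push-from-bus lanes in the same word see it.
NOP, LIT, ADD, TOBUS, FROMBUS = range(5)

def step(stacks, instr_word):
    bus = None
    for stack, (op, _) in zip(stacks, instr_word):
        if op == TOBUS:
            bus = stack.pop()          # phase 1: drive the bus
    for stack, (op, arg) in zip(stacks, instr_word):
        if op == LIT:
            stack.append(arg)
        elif op == ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == FROMBUS:
            stack.append(bus)          # phase 2: read the bus

stacks = [[], [], [], []]
step(stacks, [(LIT, 7), (NOP, 0), (NOP, 0), (NOP, 0)])
# Broadcast lane 0's top-of-stack to lanes 1 and 2 in a single cycle:
step(stacks, [(TOBUS, 0), (FROMBUS, 0), (FROMBUS, 0), (NOP, 0)])
print(stacks)  # [[], [7], [7], []]
```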

Result registers

Weighty instructions might not put their results straight on the stack; instead, such an instruction pulls its inputs from the stack and starts executing. When it completes, the result is latched into a result register, and a later instruction pushes the contents of the result register (stalling if it's not ready yet). This means that the instruction stream can get on with other stuff while the lengthy instructions run. However, it requires such multi-cycle instructions to inherently work differently; and it puts some onus on the compiler to know how many instructions to wait between starting these instructions and trying to access their results for best performance.
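Sketching that in the same style (the eight-cycle latency and the instruction names in the comments are invented):

```python
# A long-latency unit with a result register: starting the operation
# and collecting the result are separate instructions.
class Divider:
    LATENCY = 8                          # invented cycle count

    def __init__(self):
        self.busy = 0
        self.result = None

    def start(self, a, b):
        self.busy, self.pending = self.LATENCY, (a, b)

    def tick(self):                      # called once per clock cycle
        if self.busy:
            self.busy -= 1
            if self.busy == 0:
                a, b = self.pending
                self.result = a // b     # latch the result register

div = Divider()
stack = [50, 7]
b, a = stack.pop(), stack.pop()
div.start(a, b)          # "DIV_START" pulls inputs and begins dividing
for _ in range(3):
    div.tick()           # ...other instructions execute meanwhile...
while div.busy:
    div.tick()           # "DIV_RESULT" stalls until the latch is ready
stack.append(div.result)
print(stack)             # [7]
```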

Virtual stack

Finally, we can virtualise the values on the stack. A division instruction, for example, might read two actual values from the stack and then push a token that means "Wait for the result coming from division unit 7". If the next instruction is an addition, then it would read that token and (say) a literal value from the next stack position; since one of the inputs is a token it can't execute yet, but it still assigns an addition ALU and loads the literal value. But it tells division unit 7 to, when it completes, push the result into port 1 of addition ALU 3; and it pushes a token that means "Wait for the result coming from addition ALU 3", and so on. Basically, rather than waiting for operations to complete so you can push a value to the stack, you can instead push a reference to an operation in progress; a cluster of ALUs and memory access units connected by suitable buses then becomes a kind of dataflow machine which is fed connections from the instruction stream, in effect taking the condensed zero-operand instruction stream and using it to assign dependencies between instructions, rather than using virtual registers to assign dependencies as in current CPU designs. But this requires the kind of complex control logic that I feel current CPU designs are drowning in.
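The bookkeeping might look something like this toy model, which tracks only the dependency wiring, not the timing; the names and the two-input restriction are invented for illustration:

```python
# A stack that can hold "tokens" for results still in flight. An
# instruction that finds a token among its inputs becomes a dataflow
# node itself, fired when the upstream result finally arrives.
class Pending:
    def __init__(self, op, nargs):
        self.op = op
        self.slots = [None] * nargs
        self.missing = nargs
        self.consumers = []            # (downstream node, input slot)

    def fill(self, i, value):
        self.slots[i] = value
        self.missing -= 1
        if self.missing == 0:
            self.complete(self.op(*self.slots))

    def complete(self, value):
        self.value = value
        for node, i in self.consumers:
            node.fill(i, value)

def apply_op(stack, op):
    b, a = stack.pop(), stack.pop()
    if not isinstance(a, Pending) and not isinstance(b, Pending):
        stack.append(op(a, b))         # both inputs ready: execute now
        return
    node = Pending(op, 2)              # else: wire up a dataflow node
    for i, arg in enumerate((a, b)):
        if isinstance(arg, Pending):
            arg.consumers.append((node, i))
        else:
            node.fill(i, arg)
    stack.append(node)

div = Pending(lambda a, b: a // b, 2)  # a division unit still working
stack = [div, 3]
apply_op(stack, lambda a, b: a + b)    # wires an adder to the divider
div.fill(0, 50)
div.fill(1, 7)                         # division finishes: 50 // 7 = 7
print(stack[0].value)                  # 10 - the addition fired itself
```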

Transport-triggered architecture

Another way to simplify control logic is to build your CPU as a bunch of modules with input and output ports. Arithmetic and logic operation modules have one or two inputs and a single output; a memory reader has an address input and a data output; a memory writer has address and data inputs and no outputs; registers have an input and an output; and so on.

Each instruction contains a few bits to control whether the instruction executes conditionally on bits from a flag register, then an output port to read from, and an input port to write the result to. The decoding consists of checking the conditional execution flags, then either doing nothing, or pushing the input and output port IDs onto two address buses and toggling a strobe line that causes the output port to write its contents to a data bus, and the input port to load from it.

As with the zero-operand stack machines, the instructions are small, so we can probably cram several into a machine word - maybe split into groups that share a single set of conditional execution bits, for even more compactness. These instructions are all operand and no operator!

To insert literal values in the instruction stream, one can again have an output port on the instruction fetch module which, when read, pulls a literal value from the instruction stream and stops it from being interpreted as instructions.
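Here's a toy machine along those lines, with one adder, one general-purpose register, and an "imm" port on the instruction fetcher. The port names and the write-to-trigger adder convention are inventions for illustration:

```python
# A transport-triggered machine: the only operation is "move a value
# from an output port to an input port". All port names are invented.
class TTA:
    def __init__(self, program):
        self.stream = iter(program)   # mixed moves and inline literals
        self.acc = 0                  # adder's latched first operand
        self.sum = 0                  # adder's output register
        self.reg = 0                  # one general-purpose register

    def read(self, src):
        if src == "imm":              # pull a literal from the stream,
            return next(self.stream)  # so it's never decoded as a move
        return {"add.out": self.sum, "reg.out": self.reg}[src]

    def write(self, dst, value):
        if dst == "add.a":
            self.acc = value
        elif dst == "add.trigger":    # writing here fires the adder
            self.sum = self.acc + value
        elif dst == "reg.in":
            self.reg = value

    def run(self):
        for src, dst in self.stream:
            self.write(dst, self.read(src))

# reg = 2 + 3, expressed purely as moves:
m = TTA([("imm", "add.a"), 2,
         ("imm", "add.trigger"), 3,
         ("add.out", "reg.in")])
m.run()
print(m.reg)  # 5
```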

The output of each module is a register, where a value appears as soon as it's ready and waits until it's read - so there's no need to explicitly store it in a general purpose register. However, the CPU might have a few general-purpose registers anyway to store stuff in, as well as the usual instruction pointer, flags, and machine control registers.

This makes it easy to exploit parallelism; the instruction stream can trigger lots of modules and then come back later to read their output registers. The compiler might need to know the cycles required to do various things and not read the outputs until they're ready, or there might be handshaking on the internal bus so that instructions stall until an output is ready, which makes it easier to deal with things like memory reads that can take widely varying numbers of cycles to complete. Even then, the compiler can still benefit from knowing cycle timings in order to schedule stuff better.

Modules could be pipelined. Rather than having four multipliers, you might have one that you can feed (say) four sets of inputs into and then, later, read the output register four times to get the results. The compiler might need to know how deep the pipeline is to avoid overflowing it with results; or the hardware spec might mandate that up to sixteen multiplies can be pipelined, and put a FIFO on the output register to make up the extra capacity needed beyond the number of pipeline stages it has.
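A sketch of such a pipelined module, with an invented three-stage depth; inputs go in one end and finished results accumulate in a FIFO at the other:

```python
from collections import deque

# A pipelined multiplier: feed inputs now, read results later. A real
# module would also bound the FIFO, per the capacity rules above.
class PipelinedMultiplier:
    STAGES = 3                                   # invented depth

    def __init__(self):
        self.pipe = deque([None] * self.STAGES)  # in-flight operations
        self.fifo = deque()                      # completed results

    def _advance(self, entering):
        self.pipe.appendleft(entering)
        leaving = self.pipe.pop()                # entered STAGES ago
        if leaving is not None:
            a, b = leaving
            self.fifo.append(a * b)

    def feed(self, a, b):                        # start one multiply
        self._advance((a, b))

    def tick(self):                              # a cycle with no input
        self._advance(None)

    def read(self):                              # caller stalls if empty
        return self.fifo.popleft()

mul = PipelinedMultiplier()
mul.feed(2, 3)
mul.feed(4, 5)      # a second multiply enters before the first is done
for _ in range(3):
    mul.tick()
print(mul.read(), mul.read())  # 6 20
```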

The downside is that the compiler needs to know how many modules there are and what port numbers are wired up to what. This, again, makes it hard to have a single executable that can run on a wide range of implementations of the design.

However, this looks rather like the execution model behind the virtual-stack machine discussed above - so perhaps we could have a generic stack-based instruction set that is executed by a virtual stack to generate instructions for an underlying transport-triggered machine...

Modules could be quite complex; for instance, an index register module might comprise a register coupled directly to a memory access system. By accessing different input or output registers, it could update the contents of the register, or write to the memory address stored in the register, or read from the memory address in the register; and different input/output ports could be accessed that cause it to pre- or post-increment or -decrement the index register at the same time, allowing for efficient operations on contiguous blocks of memory. Also, the internal data bus might be arbitrarily wide, allowing ALUs to operate on, and registers to store, vectors of several machine words; modules that only operate on a single word at a time might sacrifice a few input-port-select bits in their instructions to select which word from the vector on the data bus to read into their input port.

To save on space taken up by literals, we can have a simple module with output ports that produce some useful constants (0, 1, -1); or dedicate a single bit of the instruction to selecting whether the input port number field specifies an input port, or is a literal to just load onto the data bus. An input port number will be much smaller than a machine word, so this will only cater for small literals, but most literals are small and we can fall back onto fetching an entire word from the instruction module for larger literals. We might want to sign-extend a small literal, however.
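Sign extension of such a field only takes a couple of operations; here's the usual trick, with an invented six-bit literal field:

```python
# Sign-extend a small two's-complement literal field to a full word.
def sign_extend(field, bits=6):
    sign = 1 << (bits - 1)         # the field's sign bit
    return (field ^ sign) - sign   # flip it, then subtract it back out

print(sign_extend(0b111111))  # -1
print(sign_extend(0b000101))  # 5
```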

The data bus might become a bottleneck, but that's OK - we can have several of them, and make the instructions specify an input and output port number for each bus; we then trigger multiple transfers in each instruction cycle. This is very similar to having several instructions in a machine word, except that they execute in parallel rather than in series. We now just need to specify what happens if the same port is read or written by two parallel transfers!

Conclusions

A general theme with many of the above approaches is that the compiler ends up needing to know more about the details of the chip implementation, because the compiler is responsible for more scheduling.

Perhaps this is no bad thing - runtime code generation is becoming the norm anyway, and it would be possible to bootstrap the system by having an initial "minimal instruction set" which is standardised, and allows access to a description of the current chip architecture; the runtime code generator can then be compiled (using a compiler written in the minimal instruction set), and then the processor switched into normal mode. This might even be implemented by having a simple version of the stack architecture as a front-end processor that starts executing code while the main CPU is dormant; it then has an instruction that hands an initial instruction pointer value to the dormant main CPU and starts it up. Multicore systems would need only one front-end processor to bring the whole system up!

Another approach might be to have a transport-triggered architecture with a small set of guaranteed modules available at well-known port numbers in every implementation, with variation occurring in the rest of the port-number space. But this requires the instruction format to have enough bits for the port numbers to allow for the largest imaginable processor, leading to unnecessarily wide instructions for smaller devices. Perhaps this can be handled by having the instruction decoder support both standard narrow instructions and implementation-specific wider instructions, again starting off in standard mode and allowing switching to wide mode once the processor definition has been read and used to compile the compiler that can exploit the full capabilities.

Either way, I think that future processor architectures might be more tightly coupled to the compilers than we're used to.

Creative Commons Attribution-NonCommercial-ShareAlike 2.0 UK: England & Wales