Category: Computing

Processor architecture (by )

The current state of the art in processor design seems to be a reasonably complex instruction set, which is interpreted by a thing that translates it into a series of more primitive instructions which are then fed into some kind of multiple-issue pipelined thingy with speculative execution. You know, the kind of stuff x86 has been since the 386. 64-bit instructions, vector SIMD instructions, lots of cores and all that are just variations on the theme.

I'm sure this is a local maximum in the space of processor designs. So few of the transistors on each chip seem to be actual ALU doing something useful. All this translation and pipeline control seems to be a lot of logic that's just adapting to the impedance mismatch between the ALUs and instruction set...

So, I'm always interested in more exotic processor architectures, and there's two different threads I'd love to explore (as in, design and simulate in an FPGA) if I had time. The common theme is simple control logic; this means you can fit in more ALUs, or wide ALUs and registers, in the same space - or just fit more cores and more cache on the same die.

Zero-operand stack machines

The idea here is to use a stack instead of a register file. This means that instructions just need an operator (eg, "add") as the operands are implicit - the stack always provides the inputs and outputs. This means that the instructions can be very small due to the lack of operands; generally, much smaller than a machine word, so each word loaded can have several instructions in. This can mean that the memory bandwidth required to feed the chip with instructions is reduced; and since the decode and control logic becomes very simple, you can sustain a high clock rate with minimal pipelining, so reducing the memory bandwidth consumed by instruction loads is handy.

That means you can't fit literals or static addresses inside instructions, though, so you need something like a "load immediate" instruction that fetches the next word from the instruction stream and pushes it, rather than treating it as instructions. If an instruction word contains several "load immediate" instructions, then that many subsequent words of instruction stream could be literals!

One example of this approach is a Minimal Instruction Set Computer, but the concept is broader than that. Large instruction sets can be easily supported.

The control logic boils down to loading an instruction word, then treating it as a FIFO of smaller instructions to execute while the next instruction word is loading. Most instructions just engage an ALU circuit hardwired to the top element or two of the stack, whose output becomes the new top of stack. A few might transfer data to/from a memory access unit or a register, including the instruction pointer to change the flow of control. Not many gates are needed to decode an instruction, leading to the short cycle time.

Instructions that can't complete in a single cycle present a problem, though. The use of a stack tends to mean that an instruction depends on the result of the previous instruction, so it's tricky to execute several instructions in parallel and thus make progress in the presence of weighty multiply/divide instructions or memory reads.

I can think of three ways of overcoming that, and you can combine all three:

Multiple stacks

The approach taken by the 4stack processor is to have four stacks, each with its own independent ALU. Each instruction word has an instruction for each ALU in, and they execute in parallel on each clock tick. Presumably, there's some means to transfer results between the stacks - I imagine a bus joining them, an instruction to pop a value from the stack onto the bus, and an instruction to push from the bus. The timings of the bus reads and writes are such that it's possible to have an instruction word with a pop->bus from one stack and push->bus for one or more stacks that do such a transfer in a single cycle.

Due to the synchrony of instructions feeding into each ALU, we can't "stall" an ALU. If one of them executes a weighty instruction or has a cache miss on a memory read, we either stall ALL the ALUs at once, or we mandate that certain instructions are followed by a fixed number of NOPs before another instruction can execute, to allow time for it to complete.

This puts the onus on the compiler to schedule instruction-level parallelism, and means that the compiler needs to know the precise timings (and number of ALUs) of the target CPU - we can't use the same instruction set for a broad range of implementations!

Result registers

Weighty instructions might not put their results straight on the stack; instead, the instruction might cause the inputs to be pulled from the stack and the instruction starts executing. When it completes, the result is latched into a result register, and a later instruction pushes the contents of the result register (stalling if it's not ready yet). This means that the instruction stream can get on with other stuff while the lengthy instructions run. However, it requires such multi-cycle instructions to inherently work differently; and it puts some onus on the compiler to know how many instructions to wait between starting these instructions and trying to access their results for best performance.

Virtual stack

Finally, we can virtualise the values on the stack. A division instruction, for example, might read two actual values from the stack and then push a token that means "Wait for the result coming from division unit 7". If the next instruction is an addition, then it would read that token and (say) a literal value from the next stack position; since one of the inputs is a token it can't execute yet, but it still assigns an addition ALU and loads the literal value. But it tells division unit 7 to, when it completes, push the result into port 1 of addition ALU 3; and it pushes a token that means "Wait for the result coming from addition ALU 3", and so on. Basically, rather than waiting for operations to complete so you can push a value to the stack, you can instead push a reference to an operation in progress; a cluster of ALUs and memory access units connected by suitable buses then becomes a kind of dataflow machine which is fed connections from the instruction stream, in effect taking the condensed zero-operand instruction stream and using it to assign dependencies between instructions, rather than using virtual registers to assign dependencies as in current CPU designs. But this requires the kind of complex control logic that I feel current CPU designs are drowning in.

Transport-triggered architecture

Another way to simplify control logic is to build your CPU as a bunch of modules with input and output ports. Arithmetic and logic operation modules have one or two inputs and a single output; a memory reader has an address input and a data output; a memory writer has address and data inputs and no outputs; registers have an input and an output; and so on.

Each instruction contains a few bits to control whether the instruction executes conditionally on bits from a flag register, then an output port to read from, and an input port to write the result to. The decoding consists of checking the conditional execution flags then either doing nothing, or pushing the input and output port IDs onto two address busses and toggling a strobe line that causes the output port to write its contents to a data bus, and the input port to load from it.

As with the zero-operand stack machines, the instructions are small, so can probably cram several into a machine word - maybe split into groups that share a single set of conditional execution bits, for even more compactness. These instructions are all operand and no operator!

To insert literal values in the instruction stream, one can again have an output port on the instruction fetch module, that when read pulls a literal value from the instruction stream and stops it from being interpreted as instructions.

The output of each module is a register, where a value appears as soon as it's ready and waits until it's read - so there's no need to explicitly store it in a general purpose register. However, the CPU might have a few general-purpose registers anyway to store stuff in, as well as the usual instruction pointer, flags, and machine control registers.

This makes it easy to exploit parallelism; the instruction stream can trigger lots of modules and then come back later to read their output registers. The compiler might need to know the cycles required to do various things and not read the outputs until they're ready, or there might be handshaking on the internal bus so that instructions stall until an output is ready, which makes it easier to deal with things like memory reads that can take widely varying numbers of cycles to complete. Even then, the compiler can still benefit from knowing cycle timings in order to schedule stuff better.

Modules could be pipelined. Rather than having four multipliers, you might have one that you can feed (say) four sets of inputs into and then, later, read the output register four times to get the results. The compiler might need to know how deep the pipeline is to avoid overflowing it with results; or the hardware spec might mandate that up to sixteen multiplies can be pipelined, and put a FIFO on the output register to make up the extra capacity needed beyond the number of pipeline stages it has.

The downside is that the compiler needs to know how many modules there are and what port numbers are wired up to what. This, again, makes it hard to have a single executable that can run on a wide range of implementations of the design.

However, this looks rather like the execution model behind the virtual-stack machine discussed above - so perhaps we could have a generic stack-based instruction set that is executed by a virtual stack to generate instructions for an underlying transport-triggered machine...

Modules could be quite complex; for instance, an index register module might comprise a register coupled directly to a memory access system. By accessing different input or output registers, it could update the contents of the register, or write to the memory address stored in the register, or read from the memory address in the register; and different input/output ports could be accessed that cause it to pre- or post-increment or -decrement the index register at the same time, allowing for efficient operations on contiguous blocks of memory. Also, the internal data bus might be arbitrarily wide, allowing ALUs to operate on, and registers to store, vectors of several machine words; modules that only operate on a single word at a time might sacrifice a few input-port-select bits in their instructions to select which word from the vector on the data bus to read into their input port.

To save on space taken up by literals, we can have a simple module with output ports that produce some useful constants (0, 1, -1); or dedicate a single bit of the instruction to selecting whether the input port number field specifies an input port, or is a literal to just load onto the data bus. An input port number will be much smaller than a machine word, so this will only cater for small literals, but most literals are small and we can fall back onto fetching an entire word from the instruction module for larger literals. We might want to sign-extend a small literal, however.

The data bus might become a bottleneck, but that's OK - we can have several of them, and make the instructions specify an input and output port number for each bus; we then trigger multiple transfers in each instruction cycle. This is very similar to having several instructions in a machine word, except that they execute in parallel rather than serial. We just now need to specify what happens if the same port is read or written by two parallel transfers!

Conclusions

A general theme with many of the above approaches is that the compiler ends up needing to know more about the details of the chip implementation, because the compiler is responsible for more scheduling.

Perhaps this is no bad thing - runtime code generation is becoming the norm anyway, and it would be possible to bootstrap the system by having an initial "minimal instruction set" which is standardised, and allows access to a description of the current chip architecture; the runtime code generator can then be compiled (using a compiler written in the minimal instruction set), and then the processor switched into normal mode. This might even be implemented by having a simple version of the stack architecture as a front-end processor that starts executing code while the main CPU is dormant; it then has an instruction that hands an initial instruction pointer value to the dormant main CPU and starts it up. Multicore systems would need only one front-end processor to bring the whole system up!

Another approach might be to have a transport-triggered architecture with a small set of guaranteed modules available at well-known port numbers in every implementation, with variation occurring in the rest of the port-number space. But this requires the instruction format to have enough bits for the port numbers to allow for the largest imaginable processor, leading to unnecessarily wide instructions for smaller devices. Perhaps this can be handled by having the instruction decoder support both standard narrow instructions and implementation-specific wider instructions, again starting off in standard mode and allowing switching to wide mode once the processor definition has been read and used to compile the compiler that can exploit the full capabilities.

Either way, I think that future processor architectures might be more tightly coupled to the compilers than we're used to.

Cool things I have worked on: Clustered analytic database (by )

In my last blog post, I talked about how I was involved in building a database optimised for low latency on single-record reads.

So it was a bit of a change to later work on a database optimised for high throughput on bulk querying! Rather than getting single records, it was all about finding all the records that match some criterion and doing something with them as fast as possible, where "fast" is measured in "records per second". When I started there, the minimum time to perform a trivial query (eg, SELECT 1) was about three seconds, let alone any that fetched any data. However, on the right hardware, it could process untold numbers of gigabytes of data per second once it had gotten going. That three-second minimal round trip time was negligible when dealing with queries that processed hundreds of terabytes of data in ten minutes. That said, one of my first projects was to fix some low-hanging fruit, which brought the minimal round trip time down to around 300 milliseconds. Read more »

The Hackspace Grand Opening (by )

Those of you who follow me on various social media will have seen the photos of Cheltenham Hackspace new premises being spruced up and that is because tomorrow (well today now!) is the grand opening 😀

Sunday 4th of December - if you are local then please pop along - hackspaces or createspaces are made by and for the community and we'd love to meet you 🙂

I'll be taking along steam punk and textile stuff to be working on and have plenty of spare so others can have a go! There is plenty to see and people to talk to and skill share with 🙂

Also I have been making craft videos for Advent as it turns out it is the tenth Christmas of Salaric Crafts How To Blog!!!

Of course I only have 2 vids up and it is now technically day 4 - annoyingly obsolete laptop is obsolete and is being a pain in the backside but I shall continue limping along with it and hopefully get the other two vids to you some time tomorrow!

I also have a craft fayre Monday at the Costa Coffee on Metz Way in Gloucester 🙂 Mainly taking Wiggly Pet Press stuff with a bit of Steam Punk etc...

Cool things I have worked on: Low-latency highly-available NoSQL data store (by )

I've worked on a bunch of cool things in the past; and since I'm always explaining them to people, I realised it'd be good if I just blogged about them so I can just share the link.

I was asked to help design and then build a database to back a social networking system. The main requirement was to be able to satisfy lots of single-record primary-key fetches from a single thread as quickly as possible, because a PHP web app would be requesting lots of data to build the page; we had to do a few hundred of these "random point queries" in a small fraction of a second to get the page rendered in time. It had to be able to scale sideways, by adding more machines to the cluster. It also needed to be able to update this data, but at a much lower load - updates and creations of single records happened when a user went and did them, but people browsing around the site would hit several pages, each of which would request hundreds of records. Read more »

Programming with state machines (by )

State machines are a really nice way of thinking about systems with state. As a notation for expressing imperative code, I much prefer this:

To this:

x = readchar();
while(x != EOF) {
   if(x == 'a') {
      doA1();
      x = readchar();
      while(x != EOF && x != 'x') {
        doA2(x);
        x = readchar();
      }
      doA3();
   } else if (x == 'b') {
      doB1();
      x = readchar();
      while(x != EOF && x != 'x') {
        doB2(x);
        x = readchar();
      }
      doB3();
   } else {
      doC();
      x = readchar();
   }
}

I've always wished I could write that sort of code as a state machine rather than a bunch of nested structured programming primitives. Although there is some charm to the idea of drawing a diagram on the computer and having that be compiled as executable code, though, that can also be fiddly and isn't well supported by conventional programming tools; so I'd rather actually describe my state machine in text. What I want is a syntax for writing state machines, rather than writing imperative code that implements the state machine.

For instance, the diagram I put above wasn't drawn by me; it was drawn by a tool called PlantUML from the following source code:

@startuml

state Processing {
      Ready --> A : a/doA1
      Ready --> B : b/doB1
      Ready --> Ready : not a or b/doC

      A --> A : not x/doA2
      A --> Ready : x/doA3

      B --> B : not x/doB2
      B --> Ready : x/doB3
}

[*] --> Ready

Processing --> [*] : EOF

@enduml

But the UML state diagram is meant for abstract modelling of systems, and lacks the precision and detail required for actual executable code. The state diagram doesn't make it clear that doA2 and doB2 need to be passed the character that was read, for instance. PlantUML's syntax, therefore, also lacks that kind of detail, so we can't just use that.

Also, in a programming situation, it would be nice to be able to express types of state machines: to be able to say that a group of state machines has some useful properties in common, defined by the "type". The type of a state machine defines some high-level states and what inputs or outputs are possible in each state; any state machine of that type has to follow that behaviour, even though it might break the states defined in the type into many substates. For instance, the example state machine above always accepts any character until it receives an EOF, then stops; that might be expressed as a type having a single Ready state with two transitions, one mapping an EOF to the end state and another mapping a character back to the Ready state, with no outputs. That type would be suitable for any state machine that accepts characters until they finish, so a generic "character input stream" might accept a state machine of that type to process them, without needing to know anything about As and Bs.

These thoughts have led me to the following model of a state machine that's suitable for use as part of a programming language (although I've not defined an actual syntax yet):

State machine type

A state machine type is a set of states. Each state has a name, and a list of message types which are acceptable in that state (I am assuming that the underlying type system is smart enough to include things like "character equal to a" as a type, rather than merely "character"). This list must be disjoint.

For each input message type, there's a list of possible next states and an output message type (which in our example above would be a void type, as we never output messages).

There is also an initial state. It's impossible for the initial state to be the target of a state transition on a message, so it only has transitions out to other states. Those transitions all represent possible constructors of this state machine, because the initial state represents nonexistance.

For instance, a type" for vending machine controller state machines might be (given a Result type which has two values, success and fail):

Initial state:
   input: (Create) output: (Void) next state: (Idle)

Idle state:
   input: (Coin(Value)) output: (Value) next state: (Credit)

Credit state:
   input: (Coin(Value)) output: (Value) next state: (Credit)
   input: (Refund) output: (Value) next state: (Idle)
   input: (Select(Product)) output: (success::Result) next state: (Idle)
   input: (Select(Product)) output: (fail::Result) next state: (Credit)

The Value objects output in response to Coin or Refund inputs happen to reflect the current credit in the machine, a fact which the state machine type alone can't represent (that would need to be explained in a design-by-contract invariant of some kind).

From this type can be derived the type of the state machine's transition functions, which in this case will be:

coin :: Idle -> Value -> (Credit,Value) |
        Credit -> Value -> (Credit,Value)
refund :: Credit -> (Idle,Value)
select :: Credit -> Product -> (success::Result,Idle)|(fail::Result,Credit)

Note that the Create transition isn't represented here, because this state machine can't actually be instantiated - it doesn't specify the behaviour enough. There's no way of saying which possible select result should happen, or of what values are returned as outputs. But the functions we have are automatically implemented by the state transition definition, and will cause the appropriate transitions to occur in any state machine that's a subtype of this one; every function accepts an instance of the state machine in a given state, and an input value, and returns a new instance of the state machine and the output value.

Subtyping

State machine types can be subtypes of other state machine types. A state in the parent type may be split into several states in the subtype; any transition to a state A in the parent type must be represented by a transition in the subtype to some substate of A, and all transitions possible from state A in the parent type must be represented by transitions from every substate of A in the child.

For instance, a subtype of the vending state machine might have an infinite number of substates of "Idle", one for each possible configuration of available things to vend inside the machine. If it vends three different products, then these substates might be called Idle(A,B,C) where A,B and C are nonnegative integers representing the stock of each item. The only transition out of "Idle" is when a coin is inserted, so every Idle(A,B,C) state must have a transition from a coin being inserted, returning a Value, and taking us into the Credit state.

However, our Credit state also tracks the stock, and also tracks the credit inserted so far - so we need an infinite set of states of the form Credit(A,B,C,X) where A,B,C are the stocks and X is another nonnegative integer for the number of pennies of credit. So the parent-type transition from Idle to Credit has to be replaced by a transition from every Idle(A,B,C) to Credit(A,B,C,X) where X is the Value of the coin inserted. All the transitions from Credit in the parent type need to be represented as transitions from every Credit(A,B,C,X) state to other Credit(A,B,C,X) or Idle(A,B,C) types as appropriate. But with that done, it can be shown that all the states and transitions of the parent type correspond to one or more in the subtype.

The one exception is the initial state - the creation transitions from that in the subtype need not correspond to those of the parent type.

Given an extension to the syntax to create "parameterised states" such as Idle(A,B,C), which are shorthand for a potentially infinite set of "concrete states" such as "Idle(4,0,1)", we might represent that subtype like so:

Initial state:
   input: (Create(a::NonNegativeInteger,b::NonNegativeInteger,c::NonNegativeInteger))
     output: (Void) next state: (Idle(A,B,C))

Idle(a::NonNegativeInteger,b::NonNegativeInteger,c::NonNegativeInteger)
   state implements Idle:
   input: (Coin(x::Value)) output: (x::Value) next state: (Credit(a,b,c,x))

Credit(a::NonNegativeInteger,b::NonNegativeInteger,c::NonNegativeInteger,x::NonNegativeInteger)
   state implements Credit:
   input: (Coin(x'::Value)) output: ((x+x')::Value) next state: (Credit(a,b,c,x+x'))
   input: (Refund) output: (x::Value) next state: (Idle(a,b,c))

   input: (Select(Product==A)) when: a>0 output: (Success) next state: (Idle(a-1,b,c))
   input: (Select(Product==A)) when: a==0 output: (Fail) next state: (Credit(a,b,c,x))

   input: (Select(Product==B)) when: b>0 output: (Success) next state: (Idle(a,b-1,c))
   input: (Select(Product==B)) when: b==0 output: (Fail) next state: (Credit(a,b,c,x))

   input: (Select(Product==C)) when: c>0 output: (Success) next state: (Idle(a,b,c-1))
   input: (Select(Product==C)) when: c==0 output: (Fail) next state: (Credit(a,b,c,x))

(We really should have represented the stock as a Map(Product,NonNegativeInteger) type and then not duplicated the select rules...)

Converting a parent state into a parameterised subtype state is only one way of splitting a state in a subtype, too. The syntax above would permit the creation of separate states that all say "implements Credit", too. A very simple vending machine that can only hold one instance of one product might split Idle into two states, Empty and Full, for instance; and probably split Credit into CreditFull(Value) and CreditEmpty(Value) to represent the Credit state while also parameterising CreditFull and CreditEmpty with the current credit.

The above syntax for parameterised states can get unweildy if there's a lot of parameters, especially if generally only a few of them are changed in any given transition (imagine the above example of there were ten different products to keep track of). Therefore, as syntactic sugar, it makes sense for it to be possible to define parameters shared implicitly by one or more states. Transitions into those states from states outside the group need to explicitly specify all the parameter values, but transitions within that group can use a different syntax to represent the next state, which only specifies what parameters have changed. All the rest are passed in unchanged, automatically. It might look like this:

Initial state:
   input: (Create(a::NonNegativeInteger,b::NonNegativeInteger,c::NonNegativeInteger))
    output: (Void)
    next state: (Idle(A,B,C))

parameters a::NonNegativeInteger,b::NonNegativeInteger,c::NonNegativeInteger {

Idle(a,b,c) state implements Idle:
   input: (Coin(x'::Value)) output: (x'::Value) next state: (Credit(x <- x'))

Credit(a,b,c,x::NonNegativeInteger) state implements Credit:
   input: (Coin(x'::Value)) output: ((x+x')::Value) next state: (Credit(x <- x+x'))
   input: (Refund) output: (x::Value) next state: (Idle())

   input: (Select(Product==A)) when: a>0 output: (Success) next state: (Idle(a <- a-1))
   input: (Select(Product==A)) when: a==0 output: (Fail) next state: (Credit())

   input: (Select(Product==B)) when: b>0 output: (Success) next state: (Idle(b <- b-1))
   input: (Select(Product==B)) when: b==0 output: (Fail) next state: (Credit())

   input: (Select(Product==C)) when: c>0 output: (Success) next state: (Idle(c <- c-1))
   input: (Select(Product==C)) when: c==0 output: (Fail) next state: (Credit())
}

Implementing state machine types

Note that we've suddenly started giving names to the Values flying around, and specifying an output value rather than just a type, and splitting transitions into different cases with disjoint boolean "when:" expressions. This, too, further constrains the type of the state machine; indeed, this machine can actually be implemented and will provide a simple model of a vending machine - although nothing will be vended.

This means that this new state machine's state transition functions will include a "create" function from the initial stock levels to an instance of Idle(NonNegativeInteger,NonNegativeInteger,NonNegativeInteger).

A real implementation could further subtype this, adding a uniquely-typed IO interface to the Idle and Credit states, passed into the create transition. The Success-returning select operations can then also use pure IO functions to mutate the IO context to one in which the physical mechanism of vending has happened.

But the implementation of a state machine exists in the expressions that return the output value, parameterise the next state, and the boolean expressions forming the "when:" guards. These are full expressions in the pure functional language I'm imagining this thing implemented on top of, so are fully Turing complete.

There is no syntactic distinction between a type and an implementation; a state machine type can be instantiated if all of its outputs have expressions giving them values (ok, singleton-typed outputs such as Void can be implied!). This is a bit like concrete versus abstract classes in an object-oriented programming language.

Chaining transitions

So far, we've only looked at transitions that accept an input value, return an output value and move to a new state. This means that state machine behaviour is defined entirely in terms of its reactions to external values. Since the values we generate for the output and the parameters of the new state are Turing-complete expressions, that is sufficient to describe any behaviour, but it's not always convenient to express complicated behaviour in an expression. Sometimes a complicated operation of a compound data structure such as a list would be better defined as a state machine in its own right.

In order to support that kind of thing, we can also support chaining transitions, which can drive state machine behaviour purely based on internal state, and so do not cause the generation of state transition functions. These come in three types.

The first kind is one that starts with an input value, but which then leads to a next state without producing a value. This next state must be a special "transient state", the rules for which are explained below.

The second kind is one from a transient state to another transient state. This has no input value and no output value.

The third kind is a transition from a transient state to a non-transient state, which produces an output value.

Transient states may not have any non-chaining transitions point to and from them. Every transient state must be referenced from at least one non-transient state through a chain of chaining transitions, starting with a kind-1 chaining transition then zero or more kind-2 chaining transitions. Every transient state must also lead to at least one non-transient state through a chain of zero or more kind-2 chaining transitions, then a kind-3 chaining transition.

Every such possible chain from a kind-1, through zero or more kind-2, to a final kind-3, is equivalent to a single transition from the initial to the final non-transient states; with the kind-1 transition defining the input value and the kind-3 transition defining the output value. They might be implemented as a set of state-transition functions that just tail-call each other, but the transient states are therefore "hidden" inside the state machine from external perspectives. Those state-transition functions are lexically local to the exposed state-transition functions implementing the equivalent non-chaining transition.

But although they are hidden from users of the state machine, a subtype of the state machine can indeed "see" them for purposes of splitting them into substates, adding transitions between them, etc.

Termination

The final thing I've not covered above is terminating state machines. There's an implicit "final" state available in every state machine, if at least one transition to it exists; no transitions from that state can exist. Any transition to the final state appears in the state transition functions as a function that returns the output value, but not paired with a new state machine instance, because there isn't one. It's gone.

It is entirely possible for a state machine to have no non-transient states apart from the initial and final states. For instance, a state machine that traverses a tree and sums up the labels on the nodes might be implemented in terms of transient parameterised states for the current position in the tree and the running total, with an kind-1 creation transition that accepts the tree as an input and a kind-3 finalisation transition that returns the count. This machine will therefore have a single state transition function - from Tree to Number. The actual state of the machine is entirely represented as state parameters passed between mutually recursive functions implementing the chaining transitions, and never appears as a value visible outside of that.

Conclusion

I think I've defined a workable semantic model for state machines that can be used as a way to define sets of pure functions between instances of state machines, that can handle both synchronous state machines driven by external events, and those that bumble around a data structure or iterate mathematical operations.

It needs a proper syntax; personally, I'd prefer one based on s-expressions or IRON, but suitably consistent forms could be defined for any language.

WordPress Themes

Creative Commons Attribution-NonCommercial-ShareAlike 2.0 UK: England & Wales
Creative Commons Attribution-NonCommercial-ShareAlike 2.0 UK: England & Wales