Cool things I have worked on: Low-latency highly-available NoSQL data store (by )

I've worked on a bunch of cool things in the past; and since I'm always explaining them to people, I realised it'd be good if I blogged about them so I can just share the link.

I was asked to help design and then build a database to back a social networking system. The main requirement was to be able to satisfy lots of single-record primary-key fetches from a single thread as quickly as possible, because a PHP web app would be requesting lots of data to build the page; we had to do a few hundred of these "random point queries" in a small fraction of a second to get the page rendered in time. It had to be able to scale sideways, by adding more machines to the cluster. It also needed to be able to update this data, but at a much lower load - updates and creations of single records happened when a user went and did them, but people browsing around the site would hit several pages, each of which would request hundreds of records.

And it needed to be highly available. The app was to be spread over two datacentres for redundancy purposes, and as long as at least one server was reachable, the whole thing had to keep working as much as possible. We couldn't use "quorum" systems where a majority of the nodes needed to be reachable to make an update, for instance.

This, combined with the projected dataset size being a few terabytes at most, meant we decided to go for full replication. Every node had a copy of the database, and rather than being a daemon you spoke to over a socket like a conventional database, we'd write a shared library that would be loaded into the PHP process, so as to avoid context switch overheads for each query. The shared library would read the data files directly from disk.

The on-disk format was B-trees, but with a multi-version concurrency control setup: update transactions (more on those later) would write all the new data into unused space until it was ready, so reads could still be happening while an update was being prepared, so as not to delay those all-important reads.

Updates were done with a reliable multicast protocol. The shared library would broadcast updates to all nodes currently reachable from the process sending the update. Those updates were received by a daemon process running on every node, which took updates from the network and applied them to the local disk storage. But there were many devils in the details here.

Because every daemon had to process the entire update load of the system in parallel, they had to try and maximise the use of disk bandwidth. We couldn't afford to be rushing around updating every record with its own transaction, with all the sync overheads; we had to batch them somehow to amortize the overheads of committing them to disk reliably. Also, as updates were handled asynchronously and sent from a large pool of PHP processes, we needed to stop the PHP application from sending updates faster than we could process them and overrunning the network buffers. And just to make matters worse, updates could come from the network in unusual orders for various reasons, but we needed to apply them in a consistent order on every node so multiple updates to the same record all produced the same result in the end.

So, every daemon had an update queue. Its highest priority was pulling updates off the network and into the queue; its second priority was processing updates from the queue. But the queue wasn't just a FIFO. We gave every update a sequence number when it was generated, from a sequence number counter in shared memory on every node; and to ensure global consistency, we implemented Lamport timestamps - when an update was received from another node, we immediately updated the local sequence number counter to make sure it was more than the received sequence number, so that all updates issued from a node with knowledge of a given update would carry later timestamps, thereby ensuring that any transaction which read a current value and then updated that to a new value would be applied "later" than the update that provided the old current value.
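
To make that concrete, here's a rough sketch of the clock logic in C (the real code lived in the shared library and daemon and looked nothing like this; the names, and the choice of a bare 64-bit counter with the locking omitted, are mine):

#include <stdint.h>

/* The node's sequence number counter, kept in shared memory in the real
   system; locking is omitted from this sketch. */
static uint64_t local_clock = 0;

/* Called when this node issues an update of its own. */
uint64_t next_sequence_number(void)
{
    return ++local_clock;
}

/* Called when an update arrives from another node: make sure anything we
   issue from now on carries a later timestamp than the one we just saw. */
void observe_sequence_number(uint64_t received)
{
    if (received >= local_clock)
        local_clock = received + 1;
}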

So our queue ordered incoming updates by sequence number (using the ID of the issuing node as a tie breaker, to ensure a global ordering). We also stored the update sequence number on every record on disk, and refused to update a record to an "older" value, to deal with re-orderings over a longer time period than those which met in the queue. The queue also noticed updates to the same record group and merged them into a single update, always replacing older data with newer according to the sequence numbers. This meant that a flurry of updates to the same record would, as long as they all arrived before the queue was flushed, all "coalesce" into a single update to the on-disk data.
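
The "refuse to go backwards" rule for individual records is simple enough to show directly; a sketch, assuming each on-disk record group entry is stamped with the sequence number and issuing node ID of the update that last wrote it (again, the names are mine):

#include <stdbool.h>
#include <stdint.h>

struct version {
    uint64_t seq;      /* Lamport sequence number of the last applied update */
    uint32_t node_id;  /* issuing node ID, used as a tie-breaker */
};

/* True if an incoming update stamped 'incoming' should overwrite data
   currently stamped 'stored'; false means it is older and gets dropped. */
bool should_apply(struct version stored, struct version incoming)
{
    if (incoming.seq != stored.seq)
        return incoming.seq > stored.seq;
    return incoming.node_id > stored.node_id;   /* deterministic tie-break */
}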

Wait, I said "record group"; what's that? Well, in the data model, often a bunch of records would represent parts of one object. Imagine an invoice in the traditional SQL format, with an invoice record and a bunch of line item records that are components of it. We encouraged the use of small records, as we updated records by providing a new value for the whole thing, and lots of small records meant that finer-grained bits of an object could be updated in parallel without trashing each other's updates. So as part of the "schema" we defined a mapping from record primary keys to record groups; many records would go in the same group, and each group was stored as a single entry in the on-disk B-tree, so we wanted to coalesce updates to the same record group into one, thereby getting more records updated in a single read-modify-write cycle.

Under the hood, the queue was two data structures - a priority heap full of record-group update lists, ordered by sequence number; and a hash table from record-group IDs to the same update lists. Incoming updates were mapped to record-group IDs and looked up in the hash; if an update to that group already existed then the update would be merged into it, and if the sequence number was older than the one already in that queue entry, we'd move the entry up the queue by re-inserting it into the priority heap. Otherwise, we'd create a new entry with a single update to the record group, and pop it in the hash and the heap. Keeping the two in synch in all cases was a bit of work, but once we'd covered all the cases, it looked after itself.
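
Here's an illustrative cut-down sketch of that structure in C - just the enqueue/merge path, with the record data, capacity checks, locking and the drain side all left out, and every name and size made up:

#include <stdint.h>

#define QUEUE_CAP 1024
#define HASH_CAP  4096                 /* power of two, comfortably > QUEUE_CAP */

struct entry {
    uint64_t group_id;                 /* which record group this entry updates */
    uint64_t seq;                      /* earliest sequence number merged in     */
    /* ...the merged new record values themselves are omitted... */
};

struct cell { uint64_t group_id; int heap_index; };  /* heap_index -1 == empty */

static struct entry heap[QUEUE_CAP];   /* binary min-heap, ordered by seq       */
static int heap_len;
static struct cell table[HASH_CAP];    /* group ID -> heap slot, linear probing */

static unsigned hash_group(uint64_t g)
{
    return (unsigned)(g * 0x9E3779B97F4A7C15ULL) & (HASH_CAP - 1);
}

/* Find the hash cell holding this group, or the free cell it should go in. */
static unsigned find_cell(uint64_t g)
{
    unsigned i = hash_group(g);
    while (table[i].heap_index != -1 && table[i].group_id != g)
        i = (i + 1) & (HASH_CAP - 1);
    return i;
}

/* Restore the heap property upwards from slot i, keeping the hash in step. */
static void sift_up(int i)
{
    while (i > 0) {
        int parent = (i - 1) / 2;
        if (heap[parent].seq <= heap[i].seq)
            break;
        struct entry tmp = heap[parent]; heap[parent] = heap[i]; heap[i] = tmp;
        table[find_cell(heap[parent].group_id)].heap_index = parent;
        table[find_cell(heap[i].group_id)].heap_index = i;
        i = parent;
    }
}

void queue_init(void)
{
    heap_len = 0;
    for (int i = 0; i < HASH_CAP; i++)
        table[i].heap_index = -1;
}

/* Enqueue an update: merge it into any pending entry for the same record
   group, otherwise create a new entry; either way the entry ends up ordered
   by the oldest sequence number it contains. */
void enqueue_update(uint64_t group_id, uint64_t seq /*, new record data... */)
{
    unsigned cell = find_cell(group_id);
    if (table[cell].heap_index != -1) {
        int i = table[cell].heap_index;
        /* merge the record data here, newest sequence numbers winning */
        if (seq < heap[i].seq) {       /* older timestamp: move entry earlier */
            heap[i].seq = seq;
            sift_up(i);
        }
        return;
    }
    int i = heap_len++;                /* new entry (capacity check omitted)   */
    heap[i].group_id = group_id;
    heap[i].seq = seq;
    table[cell].group_id = group_id;
    table[cell].heap_index = i;
    sift_up(i);
}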

The queue also helped us prevent overload. Every node measured how big the backlog was, in three different ways: there was a maximum number of record group updates we wanted to hold in the queue, a maximum number of bytes of new record data in the queue, and a maximum average latency (in seconds, between the PHP code requesting the update and it getting written to disk on the node). We tracked the latter by putting wall-clock timestamps in the update messages and keeping an exponentially-weighted moving average of the difference (like a UNIX load average), and the first two came directly from the queue data structure. On each node, we compared each of the three measurables to its target and scaled it to a 0-100% capacity measure; we then took the worst of the three as the backlog score for that node, a measure of how "far behind" it was.
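
As a sketch (with entirely made-up target numbers), the scoring looked something like this:

#define TARGET_QUEUE_ENTRIES  10000.0            /* record-group updates queued */
#define TARGET_QUEUE_BYTES    (64.0 * 1024 * 1024)
#define TARGET_LATENCY_SECS   2.0                /* PHP send -> applied on disk */
#define LATENCY_EWMA_ALPHA    0.1                /* smoothing factor            */

static double latency_avg;                       /* exponentially-weighted moving
                                                    average of update latency   */

/* Called for each update applied to disk, with its end-to-end delay. */
void record_latency(double seconds)
{
    latency_avg += LATENCY_EWMA_ALPHA * (seconds - latency_avg);
}

static double pct(double value, double target) { return 100.0 * value / target; }

static double max3(double a, double b, double c)
{
    double m = a > b ? a : b;
    return m > c ? m : c;
}

/* The node's backlog score: the worst of the three measures, as a percentage
   of its target.  It can exceed 100% when the node is genuinely overloaded. */
double backlog_score(double queued_entries, double queued_bytes)
{
    return max3(pct(queued_entries, TARGET_QUEUE_ENTRIES),
                pct(queued_bytes,   TARGET_QUEUE_BYTES),
                pct(latency_avg,    TARGET_LATENCY_SECS));
}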

These were broadcast to the other nodes in the network, to compute an overall "worst backlog of any node" for the entire cluster. We were a bit clever with this - if a node's backlog wasn't the worst, then there was no point in it telling any other nodes about it, so only the nodes that were at the worst backlog level in the system were broadcasting their state. Again, there were lots of edge cases in this, which we carefully handled to make sure the system as a whole knew the worst backlog state of any node in the system, with the minimum of network noise.

So each node now knew how backlogged the cluster as a whole was; and we used that information to apply back pressure to the PHP clients. The backlog score was stored in the shared memory segment, and the function that was called from PHP to send an update consulted it. If the backlog was more than a minimum, then gentle pressure was applied to slow the PHP application down - before sending the message, we would sleep for a duration that depended upon the backlog level. If it was at or more than 100%, we'd actually sleep in a loop until it abated; but otherwise, we'd sleep for a time that increased smoothly from zero to infinity as the backlog went from 0% to 100% (accounting for the situation where the backlog dropped while we were sleeping, to a point where we'd already slept enough). This meant that if the PHP app was trying to generate writes faster than we could get them to disk, there'd be a certain amount of "stretchiness" in the system as the implicit queue in the network and our update queue took up the slack. But if the update queue kept growing, the PHP app would face increasing delays in getting updates applied. This created a negative feedback loop (as there was a finite limit to how many PHP processes could exist across the cluster), and we tuned the shape of the feedback curve such that the system would settle to a stable backlog level (and, thus, a stable sleep duration, and thus a stable write rate) rather than oscillating.
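
The shape of that curve is the interesting bit; as an illustration (with an arbitrary scale factor, and leaving out the "already slept enough" refinement), something along these lines does the job:

#include <unistd.h>

#define BASE_SLEEP_USECS 1000.0     /* arbitrary scale factor for the curve */

/* Worst backlog of any node, as a fraction (0.0 to 1.0 or beyond); in the
   real system this was read from the shared memory segment. */
static double cluster_backlog;

/* Called from the client library before sending an update message. */
void throttle_before_send(void)
{
    for (;;) {
        double b = cluster_backlog;
        if (b <= 0.0)
            return;                          /* no backlog: send immediately */
        if (b >= 1.0) {
            usleep(100 * 1000);              /* saturated: wait for it to abate */
            continue;
        }
        /* b/(1-b) rises smoothly from 0 at 0% towards infinity at 100%. */
        usleep((useconds_t)(BASE_SLEEP_USECS * b / (1.0 - b)));
        return;
    }
}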

This meant that when the PHP code did an update, it would be some time before that update was visible on every node. For some parts of the system, that was fine, but for other parts, an immediate response to updates was required, so we gave the caller of the read function a choice of "consistency level". At the most basic and fastest level, it would just check the local state, and potentially miss out on updates that were still flying through the network. At the "immediate" level, however, it would first hash the record key and use that to select a "primary node" for that record. That node was responsible for storing, in memory, the latest state of that record if it was currently being updated, because the update function also offered a consistency level setting: the most basic one just sent the update out onto the network, while the "immediate" level first sent the update to the primary node's in-memory update buffer. The use of specific nodes for this threatened our reliability guarantee; if that node was unavailable, or crashed and restarted while an update was in progress, we'd lose the update from the buffer - but it would still proceed through the normal mechanism. So if updates and reads to a record both requested "immediate" consistency, updates would be visible immediately (except when nodes failed), at the cost of extra network round trips adding latency.
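
The "primary node" selection is the usual hash-the-key trick; a sketch (FNV-1a standing in for whatever hash we actually used, and ignoring how the node list is maintained):

#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* FNV-1a, standing in for the real system's hash function. */
static uint64_t hash_key(const char *key, size_t len)
{
    uint64_t h = 0xcbf29ce484222325ULL;
    for (size_t i = 0; i < len; i++) {
        h ^= (unsigned char)key[i];
        h *= 0x100000001b3ULL;
    }
    return h;
}

/* Pick the node responsible for buffering in-flight "immediate" updates to a
   given record; 'nodes' is the current list of node IDs in the cluster. */
uint32_t primary_node_for(const char *key, const uint32_t *nodes, size_t n_nodes)
{
    return nodes[hash_key(key, strlen(key)) % n_nodes];
}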

Updates had a third consistency level, too, which was meant for creating new records with an external primary key (such as a user record, keyed by email address): if this was selected, then a two-phase commit process was used. The record creation would be proposed on every reachable node, which would cause that primary key value to be marked as reserved in the database, or the proposal would be rejected if it already existed or was already reserved. If a majority of the nodes in the system were reachable and OK with the update, then it would be actually performed; otherwise, it would be rejected. These were slow, and we tried to avoid them, but some parts of the application required that level of atomic global consensus.

You might be worrying that these asynchronous updates could get lost between a PHP process sending them to the network and at least one node receiving them, but all updates were received by the local node intrinsically; and once an update is on one node, it'll get to all the others. The reliable multicast protocol lets each node know if it's missed some updates from others (by using a per-node monotonic sequence number on every message, and looking for gaps). If that happened, the sender would retransmit them if they were still in its send buffers; if not, then a network break would be declared, and the nodes would know they'd missed some updates from some other nodes. This would start a process by which all nodes that had missed some updates would declare the range of global sequence numbers they seemed to be missing, and the nodes that had updates in that range would then re-transmit them all (by using a special index from update sequence number to record across all tables). They would be retransmitted with the original sequence numbers, not new ones, so no matter what order the nodes received them in, they would eventually get to all nodes and end up in the same global ordering, so every node would end up with the same final state.
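
The gap detection itself is the standard trick of tracking the next sequence number expected from each peer; roughly (with illustrative names):

#include <stdbool.h>
#include <stdint.h>

/* Per-peer state for spotting gaps in the per-node message sequence numbers
   carried by the multicast protocol.  Names are illustrative. */
struct peer {
    uint32_t node_id;
    uint64_t next_expected;    /* next message sequence number we expect */
};

/* Returns true if this message can be processed now; false if it is a
   duplicate, or if a gap was detected and the missing range
   [next_expected, msg_seq) needs retransmitting (or, failing that, a
   network break declared) before we carry on. */
bool check_message(struct peer *p, uint64_t msg_seq)
{
    if (msg_seq < p->next_expected)
        return false;                       /* duplicate or already replayed */
    if (msg_seq > p->next_expected)
        return false;                       /* gap: ask for a retransmission */
    p->next_expected = msg_seq + 1;
    return true;
}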

A node starting up after a reboot, or from fresh, used the same process - it knew the last sequence number it had been fully in sync with the cluster at, so it could request the latest state of all records updated since then. A totally new node would have seen no updates, so would request all records changed since timestamp zero, which would be all records. In these cases, the node would request that the playback be unicast to it alone, as opposed to the network partition case where nodes would cooperate to request broadcasts of all the missed updates to get the whole network back together.

And so we had a system that could (on my 2008-era laptop in a VM) do several thousand random record reads per second from a single thread, while handling bursty asynchronous updates of a thousand or so records per second (many many more if they all happened to fall into a smaller number of record groups!), while nodes came and went from the cluster, and the cluster was occasionally split into arbitrary sub-clusters due to network link failures. When the cluster was working properly, it would provide a completely consistent view of the database; but when the cluster was in a failure state, all nodes could service reads and updates (except the super-consistent unique primary key creation updates, which would all fail if a quorum of nodes couldn't be contacted to agree), but updates could then be delayed. I've not seen a system since that can provide that level of service.

Programming with state machines (by )

State machines are a really nice way of thinking about systems with state. As a notation for expressing imperative code, I much prefer a state diagram like this one (the PlantUML source it was drawn from appears further down):

[State machine diagram: from the start, enter a Ready state inside a Processing superstate; 'a' goes to state A (doA1) and 'b' to state B (doB1), anything else loops on Ready (doC); A and B loop on characters other than 'x' (doA2/doB2) and return to Ready on 'x' (doA3/doB3); EOF leaves Processing for the end state]

To this:

int x = readchar();
while(x != EOF) {
   if(x == 'a') {            /* enter state A */
      doA1();
      x = readchar();
      while(x != EOF && x != 'x') {
        doA2(x);             /* stay in state A */
        x = readchar();
      }
      doA3();                /* back to Ready */
   } else if (x == 'b') {    /* enter state B */
      doB1();
      x = readchar();
      while(x != EOF && x != 'x') {
        doB2(x);             /* stay in state B */
        x = readchar();
      }
      doB3();                /* back to Ready */
   } else {
      doC();                 /* stay in Ready */
      x = readchar();
   }
}

I've always wished I could write that sort of code as a state machine rather than a bunch of nested structured programming primitives. Although there is some charm to the idea of drawing a diagram on the computer and having that be compiled as executable code, that can also be fiddly and isn't well supported by conventional programming tools; so I'd rather actually describe my state machine in text. What I want is a syntax for writing state machines, rather than writing imperative code that implements the state machine.

For instance, the diagram I put above wasn't drawn by me; it was drawn by a tool called PlantUML from the following source code:

@startuml

state Processing {
      Ready --> A : a/doA1
      Ready --> B : b/doB1
      Ready --> Ready : not a or b/doC

      A --> A : not x/doA2
      A --> Ready : x/doA3

      B --> B : not x/doB2
      B --> Ready : x/doB3
}

[*] --> Ready

Processing --> [*] : EOF

@enduml

But the UML state diagram is meant for abstract modelling of systems, and lacks the precision and detail required for actual executable code. The state diagram doesn't make it clear that doA2 and doB2 need to be passed the character that was read, for instance. PlantUML's syntax, therefore, also lacks that kind of detail, so we can't just use that.

Also, in a programming situation, it would be nice to be able to express types of state machines: to be able to say that a group of state machines has some useful properties in common, defined by the "type". The type of a state machine defines some high-level states and what inputs or outputs are possible in each state; any state machine of that type has to follow that behaviour, even though it might break the states defined in the type into many substates. For instance, the example state machine above always accepts any character until it receives an EOF, then stops; that might be expressed as a type having a single Ready state with two transitions, one mapping an EOF to the end state and another mapping a character back to the Ready state, with no outputs. That type would be suitable for any state machine that accepts characters until they finish, so a generic "character input stream" might accept a state machine of that type to process them, without needing to know anything about As and Bs.

These thoughts have led me to the following model of a state machine that's suitable for use as part of a programming language (although I've not defined an actual syntax yet):

State machine type

A state machine type is a set of states. Each state has a name, and a list of message types which are acceptable in that state (I am assuming that the underlying type system is smart enough to include things like "character equal to a" as a type, rather than merely "character"). The message types in this list must be disjoint, so any given message matches at most one of them.

For each input message type, there's a list of possible next states and an output message type (which in our example above would be a void type, as we never output messages).

There is also an initial state. It's impossible for the initial state to be the target of a state transition on a message, so it only has transitions out to other states. Those transitions all represent possible constructors of this state machine, because the initial state represents nonexistence.

For instance, a type for vending machine controller state machines might be (given a Result type which has two values, success and fail):

Initial state:
   input: (Create) output: (Void) next state: (Idle)

Idle state:
   input: (Coin(Value)) output: (Value) next state: (Credit)

Credit state:
   input: (Coin(Value)) output: (Value) next state: (Credit)
   input: (Refund) output: (Value) next state: (Idle)
   input: (Select(Product)) output: (success::Result) next state: (Idle)
   input: (Select(Product)) output: (fail::Result) next state: (Credit)

The Value objects output in response to Coin or Refund inputs happen to reflect the current credit in the machine, a fact which the state machine type alone can't represent (that would need to be explained in a design-by-contract invariant of some kind).

From this type can be derived the type of the state machine's transition functions, which in this case will be:

coin :: Idle -> Value -> (Credit,Value) |
        Credit -> Value -> (Credit,Value)
refund :: Credit -> (Idle,Value)
select :: Credit -> Product -> (success::Result,Idle)|(fail::Result,Credit)

Note that the Create transition isn't represented here, because this state machine can't actually be instantiated - it doesn't specify the behaviour enough. There's no way of saying which possible select result should happen, or of what values are returned as outputs. But the functions we have are automatically implemented by the state transition definition, and will cause the appropriate transitions to occur in any state machine that's a subtype of this one; every function accepts an instance of the state machine in a given state, and an input value, and returns a new instance of the state machine and the output value.

Subtyping

State machine types can be subtypes of other state machine types. A state in the parent type may be split into several states in the subtype; any transition to a state A in the parent type must be represented by a transition in the subtype to some substate of A, and all transitions possible from state A in the parent type must be represented by transitions from every substate of A in the child.

For instance, a subtype of the vending state machine might have an infinite number of substates of "Idle", one for each possible configuration of available things to vend inside the machine. If it vends three different products, then these substates might be called Idle(A,B,C) where A,B and C are nonnegative integers representing the stock of each item. The only transition out of "Idle" is when a coin is inserted, so every Idle(A,B,C) state must have a transition from a coin being inserted, returning a Value, and taking us into the Credit state.

However, our Credit state also tracks the stock, and also tracks the credit inserted so far - so we need an infinite set of states of the form Credit(A,B,C,X) where A,B,C are the stocks and X is another nonnegative integer for the number of pennies of credit. So the parent-type transition from Idle to Credit has to be replaced by a transition from every Idle(A,B,C) to Credit(A,B,C,X) where X is the Value of the coin inserted. All the transitions from Credit in the parent type need to be represented as transitions from every Credit(A,B,C,X) state to other Credit(A,B,C,X) or Idle(A,B,C) states as appropriate. But with that done, it can be shown that all the states and transitions of the parent type correspond to one or more in the subtype.

The one exception is the initial state - the creation transitions from that in the subtype need not correspond to those of the parent type.

Given an extension to the syntax to create "parameterised states" such as Idle(A,B,C), which are shorthand for a potentially infinite set of "concrete states" such as "Idle(4,0,1)", we might represent that subtype like so:

Initial state:
   input: (Create(a::NonNegativeInteger,b::NonNegativeInteger,c::NonNegativeInteger))
     output: (Void) next state: (Idle(a,b,c))

Idle(a::NonNegativeInteger,b::NonNegativeInteger,c::NonNegativeInteger)
   state implements Idle:
   input: (Coin(x::Value)) output: (x::Value) next state: (Credit(a,b,c,x))

Credit(a::NonNegativeInteger,b::NonNegativeInteger,c::NonNegativeInteger,x::NonNegativeInteger)
   state implements Credit:
   input: (Coin(x'::Value)) output: ((x+x')::Value) next state: (Credit(a,b,c,x+x'))
   input: (Refund) output: (x::Value) next state: (Idle(a,b,c))

   input: (Select(Product==A)) when: a>0 output: (Success) next state: (Idle(a-1,b,c))
   input: (Select(Product==A)) when: a==0 output: (Fail) next state: (Credit(a,b,c,x))

   input: (Select(Product==B)) when: b>0 output: (Success) next state: (Idle(a,b-1,c))
   input: (Select(Product==B)) when: b==0 output: (Fail) next state: (Credit(a,b,c,x))

   input: (Select(Product==C)) when: c>0 output: (Success) next state: (Idle(a,b,c-1))
   input: (Select(Product==C)) when: c==0 output: (Fail) next state: (Credit(a,b,c,x))

(We really should have represented the stock as a Map(Product,NonNegativeInteger) type and then not duplicated the select rules...)

Converting a parent state into a parameterised subtype state is only one way of splitting a state in a subtype. The syntax above would also permit the creation of separate states that each say "implements Credit". A very simple vending machine that can only hold one instance of one product might split Idle into two states, Empty and Full, for instance; and probably split Credit into CreditFull(Value) and CreditEmpty(Value) to represent the Credit state, parameterising each with the current credit.

The above syntax for parameterised states can get unwieldy if there are a lot of parameters, especially if generally only a few of them are changed in any given transition (imagine the above example if there were ten different products to keep track of). Therefore, as syntactic sugar, it makes sense for it to be possible to define parameters shared implicitly by one or more states. Transitions into those states from states outside the group need to explicitly specify all the parameter values, but transitions within that group can use a different syntax to represent the next state, which only specifies what parameters have changed. All the rest are passed in unchanged, automatically. It might look like this:

Initial state:
   input: (Create(a::NonNegativeInteger,b::NonNegativeInteger,c::NonNegativeInteger))
    output: (Void)
    next state: (Idle(a,b,c))

parameters a::NonNegativeInteger,b::NonNegativeInteger,c::NonNegativeInteger {

Idle(a,b,c) state implements Idle:
   input: (Coin(x'::Value)) output: (x'::Value) next state: (Credit(x <- x'))

Credit(a,b,c,x::NonNegativeInteger) state implements Credit:
   input: (Coin(x'::Value)) output: ((x+x')::Value) next state: (Credit(x <- x+x'))
   input: (Refund) output: (x::Value) next state: (Idle())

   input: (Select(Product==A)) when: a>0 output: (Success) next state: (Idle(a <- a-1))
   input: (Select(Product==A)) when: a==0 output: (Fail) next state: (Credit())

   input: (Select(Product==B)) when: b>0 output: (Success) next state: (Idle(b <- b-1))
   input: (Select(Product==B)) when: b==0 output: (Fail) next state: (Credit())

   input: (Select(Product==C)) when: c>0 output: (Success) next state: (Idle(c <- c-1))
   input: (Select(Product==C)) when: c==0 output: (Fail) next state: (Credit())
}

Implementing state machine types

Note that we've suddenly started giving names to the Values flying around, and specifying an output value rather than just a type, and splitting transitions into different cases with disjoint boolean "when:" expressions. This, too, further constrains the type of the state machine; indeed, this machine can actually be implemented and will provide a simple model of a vending machine - although nothing will be vended.

This means that this new state machine's state transition functions will include a "create" function from the initial stock levels to an instance of Idle(NonNegativeInteger,NonNegativeInteger,NonNegativeInteger).
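
It's perhaps easiest to see what those generated transition functions amount to by writing them out by hand. Here's a rough equivalent in C, for consistency with the earlier example - the post imagines a pure functional language, so treat this purely as a sketch of the semantics, with all names mine:

#include <stdbool.h>

/* State values are plain immutable structs; each transition function takes a
   state and an input and returns the next state plus the output value. */
struct idle   { int a, b, c; };          /* stock of each product            */
struct credit { int a, b, c; int x; };   /* stock plus pennies of credit     */

typedef enum { PRODUCT_A, PRODUCT_B, PRODUCT_C } product;

/* Create: initial stock levels -> Idle */
struct idle create(int a, int b, int c)
{
    return (struct idle){ a, b, c };
}

/* Coin in Idle: move to Credit; the output value is the credit shown. */
struct credit coin_idle(struct idle s, int value, int *out_value)
{
    *out_value = value;
    return (struct credit){ s.a, s.b, s.c, value };
}

/* Coin in Credit: accumulate credit. */
struct credit coin_credit(struct credit s, int value, int *out_value)
{
    s.x += value;
    *out_value = s.x;
    return s;
}

/* Refund: Credit -> Idle, outputting the credit returned. */
struct idle refund(struct credit s, int *out_value)
{
    *out_value = s.x;
    return (struct idle){ s.a, s.b, s.c };
}

/* Select: succeeds (Credit -> Idle, stock decremented) if the chosen product
   is in stock, otherwise fails and stays in Credit; the next state comes
   back through whichever pointer applies. */
bool select_product(struct credit s, product p,
                    struct idle *on_success, struct credit *on_fail)
{
    int *stock = p == PRODUCT_A ? &s.a : p == PRODUCT_B ? &s.b : &s.c;
    if (*stock > 0) {
        (*stock)--;
        *on_success = (struct idle){ s.a, s.b, s.c };
        return true;                      /* success::Result */
    }
    *on_fail = s;
    return false;                         /* fail::Result */
}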

A real implementation could further subtype this, adding a uniquely-typed IO interface to the Idle and Credit states, passed into the create transition. The Success-returning select operations can then also use pure IO functions to mutate the IO context to one in which the physical mechanism of vending has happened.

But the implementation of a state machine exists in the expressions that return the output value, parameterise the next state, and the boolean expressions forming the "when:" guards. These are full expressions in the pure functional language I'm imagining this thing implemented on top of, so are fully Turing complete.

There is no syntactic distinction between a type and an implementation; a state machine type can be instantiated if all of its outputs have expressions giving them values (ok, singleton-typed outputs such as Void can be implied!). This is a bit like concrete versus abstract classes in an object-oriented programming language.

Chaining transitions

So far, we've only looked at transitions that accept an input value, return an output value and move to a new state. This means that state machine behaviour is defined entirely in terms of its reactions to external values. Since the values we generate for the output and the parameters of the new state are Turing-complete expressions, that is sufficient to describe any behaviour, but it's not always convenient to express complicated behaviour in an expression. Sometimes a complicated operation on a compound data structure such as a list would be better defined as a state machine in its own right.

In order to support that kind of thing, we can also support chaining transitions, which can drive state machine behaviour purely based on internal state, and so do not cause the generation of state transition functions. These come in three types.

The first kind is one that starts with an input value, but which then leads to a next state without producing a value. This next state must be a special "transient state", the rules for which are explained below.

The second kind is one from a transient state to another transient state. This has no input value and no output value.

The third kind is a transition from a transient state to a non-transient state, which produces an output value.

Transient states may not have any non-chaining transitions pointing to or from them. Every transient state must be reachable from at least one non-transient state through a chain of chaining transitions, starting with a kind-1 chaining transition followed by zero or more kind-2 chaining transitions. Every transient state must also lead to at least one non-transient state through a chain of zero or more kind-2 chaining transitions, then a kind-3 chaining transition.

Every such possible chain from a kind-1, through zero or more kind-2, to a final kind-3, is equivalent to a single transition from the initial to the final non-transient states; with the kind-1 transition defining the input value and the kind-3 transition defining the output value. They might be implemented as a set of state-transition functions that just tail-call each other, but the transient states are therefore "hidden" inside the state machine from external perspectives. Those state-transition functions are lexically local to the exposed state-transition functions implementing the equivalent non-chaining transition.

But although they are hidden from users of the state machine, a subtype of the state machine can indeed "see" them for purposes of splitting them into substates, adding transitions between them, etc.

Termination

The final thing I've not covered above is terminating state machines. There's an implicit "final" state available in every state machine, if at least one transition to it exists; no transitions from that state can exist. Any transition to the final state appears in the state transition functions as a function that returns the output value, but not paired with a new state machine instance, because there isn't one. It's gone.

It is entirely possible for a state machine to have no non-transient states apart from the initial and final states. For instance, a state machine that traverses a tree and sums up the labels on the nodes might be implemented in terms of transient parameterised states for the current position in the tree and the running total, with a kind-1 creation transition that accepts the tree as an input and a kind-3 finalisation transition that returns the total. This machine will therefore have a single state transition function - from Tree to Number. The actual state of the machine is entirely represented as state parameters passed between mutually recursive functions implementing the chaining transitions, and never appears as a value visible outside of that.
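
In C-ish terms that compiles down to something like the sketch below (glossing over the fact that the post imagines pure functions and guaranteed tail calls - the left-branch call here is ordinary recursion rather than a tail call - and all names are mine):

#include <stddef.h>

struct tree {
    int label;
    struct tree *left, *right;
};

/* The transient "summing" state: its parameters (current position and running
   total) only ever exist as arguments passed between these internal
   functions, never as a value visible to the caller. */
static int summing(const struct tree *node, int total)
{
    if (node == NULL)
        return total;                          /* nothing here: pass total on */
    total = summing(node->left, total + node->label);
    return summing(node->right, total);        /* kind-2 chain onwards        */
}

/* The machine's single exposed state transition function: Tree -> Number.
   The kind-1 creation transition accepts the tree; the kind-3 finalisation
   transition returns the total, and the machine ceases to exist. */
int sum_tree_labels(const struct tree *root)
{
    return summing(root, 0);
}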

Conclusion

I think I've defined a workable semantic model for state machines that can be used as a way to define sets of pure functions between instances of state machines, that can handle both synchronous state machines driven by external events, and those that bumble around a data structure or iterate mathematical operations.

It needs a proper syntax; personally, I'd prefer one based on s-expressions or IRON, but suitably consistent forms could be defined for any language.

Our visit to Maker Station (by )

As a member of two hack spaces (and co-founder, secretary, and treasurer of one), I couldn't pass up the opportunity to visit a local hack space during our visit to South Africa. A quick web search later and I found that Cape Town's hackspace is called Maker Station. I dropped them a message asking if I could drop by and say hi as an ambassador from the UK, and asking them to suggest a time - UK hack spaces tend to have an open evening sometime in the week when random people can turn up to look around and meet people; so I was expecting something like that, but didn't see a time advertised on the Web site. But they suggested I suggest a time, so I did.

If we'd planned this a bit better, we'd have brought stickers (a traditional hack space give-away gift) from Bristol and Cheltenham Hack Spaces, but we didn't - so, instead, Sarah painted a picture from each hack space:

Bristol Hackspace picture Cheltenham Hackspace picture

For those not familiar with them, UK hack spaces tend to be run like a club or society - a constitution document of some kind sets out rules for people to become members, and for members to vote on a board who are in charge of making sure the space meets its legal obligations and controls the flow of money. Usually, members get unlimited access to the space and voting rights in exchange for a monthly membership fee (with some tools that use expensive consumables requiring extra usage fees on top to cover that). The monthly memberships go into a bank account, and the elected board choose to spend that on rent, insurance, electricity, broadband, consumables, and so on, and the money left over each month piles up until it's enough to buy a fancy new tool requested by the members. Most members have day jobs and hack on projects in their spare time, so hack spaces tend to be quiet during the day and busy in evenings and at weekends; and, as I mention above, there's usually an open evening every week for potential new members to come along and visit, which is also the day members come along to socialise, thereby ensuring there's a good population present to welcome new people. There are no paid staff; the board are all volunteers, and members are expected to unlock and lock up if they visit at a time when nobody else is around.

So I was quite interested to find out that Maker Station was different. We were met inside the entrance by Felix, one of the founders. The entrance led directly to a cafe area, with leaflets of hackerly interest lying around, and a range of drinks and crisps and stuff (including Maker Station logo biscuits they'd made in their own rocket oven!) for sale. The space is staffed and open during business hours; the two founders are there during the day as it's actually their day job, and they have two employees to help (the cost of living in South Africa is much lower than in the UK, which is what makes this practical).

Maker Station cafe area

Chatting with Felix in the cafe

Beyond the cafe was the hack space itself. Much of the space is divided into benches (or larger studios), which are either rented by the day in a sort of hot-desking arrangement:

Somebody making things on a Maker Station

...or dedicated to a single user who pays regularly for it, so has extensive tool and work-in-progress storage dedicated to them:

One of the Maker Station studios

People normally used it during the day, but if people were still hacking when 6pm came, they'd keep it open in the evening as well. One user I spoke to there was making a commission for a client, suggesting that the member demographic was more people hacking on stuff for a living than evening hobbyists. Felix and his brother (the other founder) don't quite make enough to run the place from memberships alone; the shortfall is made up by them working on paid commissions of their own in the space. Felix showed us some current projects they were working on, an exhibit for a local science centre and a small wind tunnel for somebody experimenting with wind turbine designs:

Felix shows off a current project

I didn't get the impression there was quite the sense of community that UK hackspaces have, with their busy open evenings and highly decentralised governance; Felix said that he often found himself acting as a "broker" between people who wanted some skill and people who had it, or who knew who a good supplier was for something, while in the UK, such connections usually arise organically on the open evening. So I suggested he might like to set up a weekly social slot in the cafe (and maybe a wiki for sharing information like supplier lists, like we have at Cheltenham Hackspace).

I was very impressed by their facilities. A proper cafe! Lots of space! Many, many, tools, including a decent metal lathe, forge, foundry, and welding gear!

Welding stuff Assorted metalworking stuff Big metalworking lathe

Interestingly, they didn't tend to go in for the stock UK hack space tools of laser cutters and 3D printers. It turns out that in Cape Town there are several suppliers who will do small-job CNC cutting and lasering and 3D printing at a reasonable price using high-end equipment, within easy travel of Maker Station. As far as I can tell, it's prohibitively expensive to get that sort of thing done in the UK other than in industrial quantities, which is why UK hack spaces end up buying their own equipment!

Felix seems to be really good at community outreach and education - something we're looking to expand at Cheltenham Hackspace, not to mention a speciality of Sarah's, so we were interested to hear about that. Here's a video of Felix giving a talk to students about prototyping. One thing that impressed me was that he runs things he calls "disassembly workshops"; take a pile of unwanted appliances, and unleash a bunch of children on them with screwdrivers (and some expert help) to tear them apart. This is fun in itself, and provides an opportunity to learn how the things you're taking apart work, as well as building skills in using the tools and working out how to get things to bits.

Once you have a pile of bits, depending on the age range and abilities, you can let the kids stick the bits together to make art to take home - or teach them electronics by wiring them up to do new things, maybe even so far as building robots out of the mechanical and electronic parts.

Here's some photos from a recent disassembly workshop they did: 1 2 3 4 5.

We enjoyed our visit to Maker Station. It was refreshing to see a different take on the usual hack space financial model, and interesting to see how the differing economics of South Africa affected what a hack space needed to be and could do. And Felix was inspiring as an educator and speaker! I'm keeping a close eye on his Twitter feed for good ideas to use in my own sci/tech outreach activities :-)

Public perceptions (by )

Let us consider two arguments.

  1. Immigration drives the UK economy, creating more and better-paid jobs, and cheaper products and services. This is supported by actual data obtained from tax records and other reputable sources; immigrants have tended to come here in pursuit of work, to fill demand that is not being met locally, so have provided a valuable workforce for industry to grow upon, and consumed less benefits than native people. Growing industry creates more jobs and provides cheaper products and services, and a growing population creates more demand for products and services, which also stimulates industry.
  2. People are poor, due to unemployment or having to work for low-paid jobs as it's all they can get, and benefit cuts because there's not enough money to go round. So why are we letting foreigners in to compete for our jobs and benefits?

Clearly, the former is a more correct argument, as it's backed by facts; the latter does not cite its sources at all (Are people really poor? Compared to what? If so, is that really why people are poor? Are the "foreigners" actually competing for a fixed pool of jobs and benefits? Are there enough of them that it would actually make any difference if they weren't "competing", or are there much bigger issues we should worry about?)

Yet recent events suggest that the latter argument has swayed the public opinion better than the former.

Why is that? Why is my opinion of the two arguments so different to what the majority of my fellow Brits make of them?

I come from a culture (nerdy, educated) that values arguments backed up by data and mathematics. This seems self-evident to us: "But it's measurement of the actual world, processed through well-tested statistical techniques! What higher standard of generalisation about the world can we have than the Scientific Method?"

But what if you're not from our nerdy, educated culture, who've sat in classrooms and been shown the wonderful things the scientific method has given us?

Imagine yourself being presented with two arguments. One is given by a person who's not like you (they're all nerdy and they talk posh). It appeals to mathematics and science, which you found boring and irrelevant at school; you much preferred English, French and Art. That doesn't mean you're stupid; you have applied your skills as a signwriter, and your work is highly respected, leading you to now run a thriving small business. But the nerdy woman saying that her maths show immigration is great reminds you of the kids who liked science and maths at school, and you didn't get on with them well.

The nerd is upset you don't seem to respect her viewpoint, and starts to explain that the evidence is all valid, but it's just a sea of numbers. She says the maths show a correlation with a great confidence level, and when you ask what that means, it quickly becomes apparent that you'll need to sit through maths lessons to actually have it all explained, and you had quite enough of that at school.

Meanwhile, somebody else gives you a short quip. "People are poor" (yeah, you can see that) "due to unemployment or having to work for low-paid jobs as it's all they can get, and benefit cuts because there's not enough money to go round" (yep, that's the sort of thing you've heard from friends and family, and seen in the papers) "So why are we letting foreigners in to compete for our jobs and benefits?" Hell yeah!

The nerd might start complaining about the problems with that statement, but it makes perfect sense. The sort of person who comes up with sensible observations like that is clever, but not brainy. They're somebody who's cut through the bullshit and spotted the simple truth at the heart of a problem; not somebody who's built up a complicated argument with maths and abstract theories. You can respect that sort of cleverness, and want to hear what they think about other problems people face. They offer simple common-sense solutions.

And that, I think, is the problem us nerdy types have with public perception. Our methods produce correct answers, but the way we justify those answers to the majority of people who don't have backgrounds in the scientific method and statistical analysis is way off the mark. And while understanding this stuff suggests intelligence, not understanding it does not suggest absence of intelligence - yet the implication that it does is embedded into our culture, making us sigh and shake our heads at people who don't understand. "Oh, just trust us," we say. "We've worked it all out."

We need to keep using science and maths to find the truths of the world; and those scientific arguments are the best way of justifying our conclusions to each other. But having done so, we need to find better ways to explain and justify those truths to the world.

Brexit (by )

I think the best analysis of the possible consequences, whichever way the referendum went, was this: Martin Lewis' guide to voting in the EU referendum.

In other words, nobody really had any good arguments as to which was better - in or out. The EU has costs and benefits. The problem is, the referendum wasn't about whether Britain would be better in or out; it was about whether Britain should remain or leave, which is a slightly different point. The difference is: the cost of change also enters the equation. Given that the consequences of being in or out are unclear, the question becomes: is it worth the costs of leaving?

Personally, I don't think so: Even though the consequences either way were unclear, I suspect that the average outcomes are probably slightly better if we'd stayed in. All the talk of immigration (we still need immigration to afford to look after our ageing population), sovereignty (the British parliament is hardly more accountable to us than the European one), and £350m a week were largely red herrings, spectres summoned to try and mislead the population; the real issues were far subtler and more pedestrian.

But that difference between the best predictions of the impacts of staying or leaving on our quality of life is small compared to the cost of change. Today's drop in the value of the pound and British shares is not a measure of the predicted economic weakness of a non-EU UK; it's a measure of the uncertainty as to how effective British business will be, and how easy it will be for multinational corporations to operate in Britain. The world cannot predict how fiscal and commercial relationships with Britain will be in five years, let alone ten or more, and those are the kinds of periods over which major investments are planned; so that investment will be directed to safer places. Maybe Britain will become a new economic powerhouse without EU regulations - or maybe it will become a dingy backwater. The world doesn't know, so it's moving its money elsewhere. Funnily enough, that reduces the chances of Britain being able to become an economic powerhouse, because we're poorer to begin with.

Another effect that's far larger than any predictions of the effects of being in or out is the effect of the referendum process. We are now in a situation where half the country is furious with the other half for having ruined their country, and possibly the world. Meanwhile, that half is furious with the first half for having nearly prevented them from saving their country, and possibly the world. This is a rather toxic and explosive situation to now be attempting to plan what's going to happen over the next five years. Many decisions will be made based on personal grudges rather than rational consideration. Meanwhile, in the populace at large, a lot of resentment is simmering; if living conditions drop in ways that are attributable to our leaving Europe, the half of the population that voted for it will be considered personally responsible for ruined lives. That could get nasty.

Another effect of the Leave result that probably dwarfs the actual cost of not being in the EU is that the result has emboldened the more right-wing figures in British politics: folks who have traditionally acted in the interests of big business and the rich, while cynically appealing to the fears of the masses in order to get their way. I'm concerned that their influence - previously more rhetorical than actual - will grow in the political changes coming, which could have negative long-term consequences.

So, I'm sad on many levels about how this referendum turned out; but I wouldn't have been very much happier if we'd voted to remain.
