Category: ARGON

Configuring replication (by )

Storing all your data on one disk, or even inside one computer, is a risky thing to do. Anything stored in only one, small, physical location is all too easily destroyed by flood, fire, idiots, or deliberate action; and any one electronic device is prone to failure, as its continued functioning depends on the functioning of many tiny components that are not very easily replaced.

So it's sensible to store multiple copies, ideally in physically remote locations.

One way of doing this is by taking backups; this involves taking a copy of the data and putting it into a special storage system, such as compressed files on another disk, magnetic tape, a Ugarit vault, etc.

If the original data is lost, the backed-up data can't generally be used as-is, but has to be restored from the backup storage.

Another way is by replicating the data, which means storing multiple, equivalent, copies. Any of those copies can then be used to read the data, which is useful - there's no special restore process to get the data back, and if you have lots of requests to read the data, you can service those requests from your nearest copy of it (reducing delays and long-distance communication costs). Or you can spread the read workload across multiple copies in order to increase your total throughput.

Replication provides a better quality of service, but it has a downside; as all the copies are equally important, you can't use cheaper, slower, more compact storage methods for your extra copies, as you can with backups onto slower disks or tapes.

And then there's hybrid systems, perhaps were you have a primary copy and replicate onto slower disks as a "backup", while only using the primary copy for day-to-day use; if it fails then you switch to the slower "backup replica", and tolerate slower service until a new primary copy is made.

Traditionally, replicated storage systems such as HDFS require the administrator to specify a "replication factor", either system-wide or on a per-file basis. This is the number of replicas that must be made of the file. Two is the minimum to actually get any replication, but three is popular - if one replica is lost, then you still have two replicas to keep you going while you rebuild the missing replica, meaning you have to be unlucky and have two failures in quick succession before you're down to a single copy of anything.

However, this is a crude and nasty way of controlling replication. Needless to say, I've been considering how to configure replication of blocks within a Ugarit vault, and have designed a much fancier way.

For Ugarit replication, I want to cobble together all sorts of disks to make one large vault. I want to replicate data between disks to protect me against disk failures, and to make it possible to grow the vault by adding more disks, rather than having to transfer a single monolithic vault onto a larger disk when it gets full.

But as I'm a cheapskate, I'll be dealing with disks of varying reliability, capacity, and performance. So how do I control replication in such a complex, heterogeneous, environment?

What I've decided is to give each "shard" of the vault four configurable parameters.

The most interesting one is the "trust". This is a percentage. For a block to be considered sufficiently replicated, then copies of it must exist on enough shards that the sum of the trusts of the shards is more than or equal to 100%.

So a simple system with identical disks, where I want to replicate everything three times, can be had by giving each disk a trust of 34%; any three of them will sum to 102%, so every block will be copied three times.

But disks I trust less could be given a trust of 20%, requiring five copies if a block is stored only on such disks - or some combination of good and less-good disks.

That allows for simple homogeneous configurations, as well as complex heterogeneous ones, with a simple and intuitive configuration parameter. Nice!

The second is "write weighting". This is a dimensionless number, which defaults to 1 (it's not compulsory to specify it). Basically, when the system is given a block to store, it will pick shards at random until it has enough to meet the trust limit of 100%. But the write weighting is used as a weighting when making that random choice - a shard with a write weightinh of 2 will get twice as many blocks written to it as a normal block, on average.

So if I have two disks, one of which has 2TiB free and the other of which has 1TiB free, I can give a write weighting of 2 to the first one, and they'll fill so that they're both full at about the same time.

Of course, if I have disks that are now completely full in my vault, I can set their write weighting to 0 and they'll never be picked for writing new blocks to. They'll still be available for reading all the blocks they already have. If I left the write weighting untouched everything would still work, as the write requests failing would cause another shard to be picked for the write, but setting the weighting to 0 would speed things up by stopping the system from trying the write in the first place.

The third parameter is a read priority, which is also optional and defaults to 1. When a block must be read, the list of shards it's replicated on is looked up, and a shard picked in read priority order. If there are multiple shards with the same read priority, then one is picked at random. If the read fails, we repeat the process (excluding already-tried shards), so the read priority can be used to make sure we consult a fast, nearby, cheap-to-access local disk before trying to use a remote shard, for instance.

By default, all shards have the same read priority, so read requests will be randomly spread across them, sharing the load.

Finally, we have a read weighting, which defaults to 1. When we randomly pick a shard to read from, out of a set of alternatives with the same priority, we weight the random choice with this weighting. So if we have a disk that's twice as fast as another, we can give it twice the weighting, and on a busy system it'll get twice as many reads as the other, spreading the load fairly.

I like this approach, since it can be dumbed down to giving defaults for everything - 33% trust (for a three-way replication), and all the weightings and priorities at 1 (to spread everything evenly).

Or you can fine-tune it based on details of your available storage shards.

Or you can use extreme values for various special cases.

Got a "memcached backend" that offers fast storage, but will forget things? Give it a 0% trust and a high write weighting, so everything gets written there, but also gets properly replicated to stable storage; and give it a high read priority, so it gets checked first. Et voila, it's working as a cache.

Got 100% reliable storage shards, and just want to "stripe" them together to create a single, larger, one? Give them 100% trust, so every block is only written to one, but use read/write weightings to distribute load between them.

Got a read-only shard, perhaps due to its disk being full, or because you've explicitly copied it onto some protected read-only media (eg, optical) for security reasons? Just set the write weighting to 0, and it'll be there for reading.

Got some crazy combination of the above? Go for it!

Also, systems such as HDFS let you specify the replication factor on a per-file basis, requiring more replication for more important files (increasing the number of shard failures required to totally lose them) and to make them more widely avilable in the cluster (increasing the total read throughput available on that file, useful for small-but-widely-required files such as configuration or reference data). We can do that to! By default, every block written needs to be replicated enough to attain 100% trust - but this could be overriden on a per-block basis. Indeed, you could store a block on every shard by setting a trust target of "infinity"; normally, when given a trust target it can't meet (even with every shard), the system would do its best and emit a warning that the system is in danger, but a trust target of "infinity" should probably suppress that warning as it can be taken to mean "every shard".

The trust target of a block should be stored along with it, because the system needs to be able to check that blocks are still sufficiently replicated when shards are removed (or lost), and replicate them to new shards until every block has met its trust target again.

Tell me what you think. I designed this for Ugarit's replicated storage backend and WOLFRAM replicated storage in ARGON, but I think it could be a useful replication control framework in other projects, too.

The only extension I'm considering is having a write priority as well as a write weighting, just as we do with reads - because that would be a better way of enforcing all writes go to a "fast local cache" backend than just giving it a weighting of 99999999 or something, but I'm not sure it's necessary and four numbers is already a lot. What do you think?

A user interface design for a scrolling log viewer with varying levels of importance (by )

Like many people involved with computer programming and systems administration, I spend a lot of time looking at rapidly scrolling logs.

These logs tend to have lines of varying importance in them. This can fall into two kinds, that I see - one is where the lines have a "severity" (ranging from fatal errors down to debugging information). Another is where there's an explicit structure, with headings and subheadings.

Both suffer from a shared problem: important events or top-level headings whoosh past amidst a stream of minutae, and can be missed. A fatal error message can be obscured by thousands of routine notifications.

What I think might help is a tool that can be shoved in a pipe when viewing such a log, that uses some means (regexps, etc) to classify log lines with a numerical "importance" as appropriate, and then relaying them to the output.

However, it will use terminal control sequences to:

  1. Colour the lines according to their importance
  2. Ensure that the most recent entry at each level of importance remains onscreen, unless superceded by a later entry with a higher importance.

The latter deserves some explanation.

To start with, if we just have two levels of importance - ERROR and WARNING, for instance - it means that in a stream of output, as an ERROR scrolls up the screen, when it gets to the top it will "stick" and not scroll off, even while WARNINGs scroll by beneath it.

If a new ERROR appears at the bottom of the screen, it supercedes the old one, which can now disappear - letting the new ERROR scroll up until it hits the top and sticks.

Likewise, if you have three levels - ERROR, WARNING and INFO - then the most recent ERROR and WARNING will be stuck at the top of the screen (the WARNING below the ERROR) while INFOs scroll by. If a new WARNING appears, then the old one will unstick and scroll away until the new WARNING hits the top. If a new ERROR appears, then the old ERROR and WARNING at the top will become unstuck and scroll away until the new ERROR reaches the top.

So the screen is divided into two areas; the stuck things at the top, and the scrolling area at the bottom. Messages always scroll up through the scrolling area as they come, but any message that scrolls off the top will stick in the stuck things area unless there's another message at the same or higher level further down the scrolling area. And the emergence of a message into the bottom of the scrolling area automatically unsticks any message at that, or a less important, level from the stuck area.

That way, you can quickly look at the screen and see a scrolling status display, as well as (for activity logs from servers) the most recent FATAL, ERROR, WARNING, etc. message; or for the kinds of logs generated by long-running batch jobs, which tend to have lots of headings and subheadings, you'll always instantly see the headings/subheadings in effect for the log items you're reading.

This is related somewhat to the idea of having ERRORs and WARNINGs be situations with a beginning and an end (rather than just logged when they arise), such as "being low on disk space"; such a "situation alert" (rather than an event alert, as a single log message is) should linger on-screen somewhere until it's cancelled by the software that raised it emitting a corresponding "situation is over" event. Also related is the idea that event alerts above a certain severity should cause some kind of beeping/flashing to happen, which persists until manually stopped by pushing a button to acknowledge all current alerts. Such facilities can be integrated into the system.

This is relevant for a HYDROGEN console UI and pertinent to my previous thoughts on user interfaces for streams of events and programming interfaces to logging systems.

Thoughts on Programming and Tracing (by )

I was recently pointed at this interesting article: Learnable Programming.

It's a good read, overturning many assumptions the software industry has picked up over the years, and propagated without thought since.

The first part suggests allowing a programmer to trace the flow of execution of a program graphically, using an interactive timeline. My first thought was that this was all well and good, but would rely on every library in the language annotating every operation with information about how to present it - producing the little thumbnails to go in the timeline, or exposing numeric values that can be plotted onto charts. Also, highlighting the "current" drawing operation in red on the canvas relies on those operations being things that affect a canvas; more abstract operations, such as writing to a database (or even generating images to be encoded directly into a file rather than onto the screen) would require a more explicit "object preview".

However, those are not insurmountable goals. And, perhaps, things that can be built on top of my ideas about logging and tracing, making it possible to use such an interface to go through traces of execution captured from production servers, rather than just within a cute live-coding IDE; the trace entries generated by operations in your libraries could, with the help of a meta-library of trace visualisation rules, generate those little thumbnails. However, it would need to be augmented with dynamic scope information provided by the programming environment itself to know which line of code caused the trace event; the kind of thing one finds in a stack trace.

He asks "Another example. Most programs today manipulate abstract data structures and opaque objects, not pictures. How can we visualize the state of these programs?"; so I suggest that the abstract data structures and opaque objects be annotated with code that summarises their state. Many languages have a notion of "return a string representation of this object", generally aimed at debug logging - Python's repr() versus str(), for instance. Perhaps if we moved to expecting objects to return HTML representations of themselves, we could take a step in that direction.

The second part (and I'm taking some temporal liberties here, as some concepts I've included in the first part are touched upon in the second and vice versa) is also inspiring; it looks at the bigger picture, considering how libraries and code-editing environments can be designed to make it much easier for programmers to identify what operations their libraries are making available to them, rather than requiring the first step to be the reading of documentation. It touches on topics such as the dangers of mutable state (preaching to the converted here!), and the choice of library function names to make code using them clear (I'm also a big fan of smalltalk / Cocoa-style function call syntax, and how it might be brought into the Lisp family of languages...)

I've written before that I think modifying software should be a much more widely-practiced activity; and I think that should be achieved through removing unnecessary obstacles, rather than forcing everyone through complicated programming classes. I'm always interested in more thoughts on how to make that happen!

Insomnia (by )

There's something about the combination of having spent many weeks in a row without more than the odd half-hour here and there to myself (time when I get to do whatever I like, rather than merely choosing which of the list of things I need to get done urgently I will do next, or just having no choice at all), and knowing I need to get up even earlier the next morning than usual (to dive straight into a long day of scheduled activities), that makes it very, very, hard for me to sleep.

So, although I got to bed in good time for somebody who has to wake up at six o'clock, I have given up laying there staring at the ceiling, and come down to eat some more food (I get the munchies past midnight), read my book without disturbing Sarah with my bedside light, and potter on my laptop. I need to be up in five hours, so hopefully emptying my brain of whirling thoughts will enable me to sleep.

There's lots of things I want to do. Even though it's something I need to get done by a deadline, I'm actually enthusiastic about continuing the project I was working on today; making an enclosure for our chickens. This is necessary for us to be able to go away from the house for more than one night, which is something we want to do over Christmas; thus the deadline.

Three of the edges of the enclosure will be built onto existing walls or woodwork, but one of them needs to cut across some ground, so I've dug a trench across said bit of ground, laid an old concrete lintel and some concrete blocks in the trench after levelling the base with ballast, and then mixed and rammed concrete around them. When I next get to work on it, I'll mix up a large batch of concrete and use it to level the surface neatly (and then ram any left-overs into remaining gaps) to just below the level of the soil, then lay a row of engineering bricks (frog down) on a mortar bed on top of that in order to make a foundation that I can screw a wooden batten to. With that done, and some battens screwed into the tops of existing walls that don't already have woodwork on, I'll be able to build the frame of the enclosure (including a door), then attach fox-proof mesh to it, and our chickens will have a new home they can run around in safely.

Thinking about how I'm going to lay the next batch of concrete in a nice level run, working around the fact that I only have a short spirit level by placing a long piece of wood in there and levelling it with wedges and then using it as a reference to level the concrete to, has been one of the things running around in my head this evening.

Another has been the next steps from last Friday, when I had a fascinating meeting with a bunch of interesting people in the information security world. You see, I've always been interested in the foundation technologies upon which we build software, such as storage management, distributed computing, parallel computing, programming languages, operating systems, standard libraries, fault tolerance, and security. I was lucky enough to find a way into the world of database development a few years ago, which (with a move to a company that produces software to run SQL queries across a cluster) has broadened to cover storage management, distribution, parallelism, AND programming languages. So imagine my delight when said company starts to develop the security features in the product, and I can get involved in that; and even more when (through old contacts) I'm invited to the inaugural meeting of a prestigious group of peopled interested in security. That landed me an invite to the second meeting (chaired by an actual Lord, and held in the House of Lords!), the highlight of which was of course getting to talk to the participants after the presentations. I found out about the Global Identity Foundation, who are working pn standardising the kind of pseudonymous identity framework I have previous pined for; I'm going to see if I can find a way to get more involved in that. But I need to do a lot of reading-up on the organisations and people involved in this stuff, and figuring out how I can contribute to it with my time and money restrictions.

I'd really like to have some quiet time to work on my secret fiction project, too. And I want to investigate Ugarit bugs. Some bugs in the Chicken Scheme system have been found and fixed lately, so I need to re-test all these bugs to see if any of the more mysterious ones were artefacts of that. I'm in a bit of a vicious circle with that; the longer it is since I've been tinkering with the Ugarit internals, the longer it'll take me to get back into it, and the more nervous I feel about doing so. I think I might need to pick off some lighter bit of work with good rewards (adding a new feature, say) and handle that first, to get back into the swing of things. Either way, I'll need a good solid day to dig into it all again; trying to assemble that from sporadic hours just won't cut it.

I'm still mulling over issues in the design of ARGON. Right now I'm reading a book on handling updates to logical databases - adding new facts to them, and handling the conflicts when the new facts contradict older ones, in order to produce a new state of the database where the new fact is now true, but no contradictions remain. I need to work this out to settle on a final semantics for CARBON, which will be required to implement distributed storage of knowledge within TUNGSTEN. I need a semantics that can converge towards a consensus on the final state of the system, despite interruptions in internal network connectivity within the cluster causing updates to arrive in different orders in different places; doing that efficiently is, well, easier said than done.

I really want to finish rebuilding my furnace, which I hoped to get done this Summer, but I'm still assembling the structural supports for it. I've made a mould to cast shaped refractory bricks for the lining of the furnace, but I've yet to mix up the heatproof insulating material the bricks need to be made out of and start casting the bricks, as I still need to work out how I'll form the tuyere.

I want to get Ethernet cabled to my workshop, because currently I don't have a proper place for working on my laptop; I have to do it on the sofa in the lounge to be within range of the wifi, which isn't very ergonomic, doesn't give me access to my external screens, and is prone to interruption by children. I find it very motivating to be in "my space", too; the computer desk in the workshop is all set up the way I like it. And just for fun, I'd like to rig the workshop with computer-controlled sensors and gizmos (that kind of thing is a childhood dream of mine...).

This past year, I've tried booking two weekend days a month for my projects, in our shared calendar. This worked well at the start of the year, with projects such as the workshop ladder and eaves proceeding well, but it started to falter around the Summer when we got really busy with festivals and the like. I started having to fit half-days in around other things, which meant spending too much time getting started and clearing up compared to actually getting things done, so my morale faltered; and with so much other stuff on, I've been increasingly inclined to spend my free time just relaxing rather than getting anything done. On a couple of occasions I've tried taking a week off work to pursue my projects, but I then feel guilty about it and start allocating days to spending more time with the children or tidying the house, and before I know it, five days off becomes one day of actual project work. I need to stop feeling guilty about taking time to do the things I enjoy, because if I don't, I'll be too tired and miserable to do a good job of the things I should be doing! And rather than booking my monthly project days around other stuff that's going on, next year I'm going to mark out my two days each month in advance, and then move them elsewhere in the month if Sarah needs me to do something on that particular day, to decrease the chance of ending up having to scrape together half-days around the month (or to skip project days entirely, as I ended up doing last month). I feel awful about saying I'm going to spend days doing what I feel like doing rather than the things the rest of my family need me to drive them to, but if I don't, I think I'm going to fall apart!

Now... off and on I've spent forty minutes writing this blog post. So with my whirling thoughts dumped out, I'm going to go back to bed and see if I can sleep this time around. Wish me luck!

Privacy (by )

I have a looser attitude towards privacy than most people, but I have began to reconsider that lately.

Generally, I believed (and still do) that anything I do in public is pretty much exempt from privacy. I have no privacy objection to pervasive CCTV, because if I do anything in a public place, somebody could be watching me anyway. The fact that my enemies can now just consult massive archives of CCTV to find me rather than having to get somebody to follow me around isn't, in my view, a huge deal. Indeed, I quite like the idea of sousveillance, having my own recording of what happens around me. It might be inappropriate to be doing that in circumstances that the people around me consider "private", so I'd turn it off for their comfort when it seemed right to do so, but I would still assume that anything I do in the presence of other people is basically recorded to some extent - after all, it's in their memory, at least!

Likewise with monitoring my network traffic at my ISP; I have never had any illusion of privacy there. I encrypt traffic that matters, and accept that the existence and destination/origin of encrypted traffic might be used by my enemies for traffic analysis.

So, I didn't really have any objections to mass surveillance; I had far more objection to the facts that encryption is far from ubiquitous and that information security is not taught in schools. My feeling was that if I can't stop an enemy that doesn't abide by the law (eg, organised criminals) from performing traffic analysis on me, then I can't assume it's private; I can stop them reading my stuff or impersonating me by using public key cryptography, so as long as the law doesn't hinder that, I'm content.

As such, I always wished that Web browsers would just include some kind of unique user ID in the headers, ideally backing it up with a public-key signature of the entire HTTP request. Then we could dispense with session cookies, logins, and even things like OpenID; we'd just authenticate to our browser by supplying the keypair in some browser-dependent way, and then head out onto the secure-single-sign-on Web. There's no loss in privacy compared to the current status quo that people are happy to identify themselves to web sites with email addresses, but it'd be a whole lot simpler for users and for developers. And so that, basically, is the security model I developed for ARGON.

However, I am starting to change my mind.

I've always felt that the "hole" in my approach to privacy was that it depended on my own knowledge of security and my enlightened use of encryption; I wanted sufficient education to bring everyone to that level. Encryption tools are generally a bit clunky, but if more people wanted to use them, that would create demand for better tools (or, more pertinently, better integration into the tools they already use). I felt that if we could just get people to encrypt and sign their communications, and encrypt their storage, and use Tor for things where the cost is worth the protection against traffic analysis, everything would be fine.

However, what has made me start to change my mind is the move towards storing one's data on third-party servers. By which I mean, living your life through Facebook, or letting Google store your email and your documents. People are moving away from having a computer full of their stuff, and communicating semi-directly with their peer's computers, towards letting third parties hold all their stuff. Often third parties they don't pay money to and are in no contract with, so they have little or no leverage over.

It's easy to say that educating people in computer security would make them realise that's a bad idea, but I use many of these services despite not trusting them one bit; I do it because network effects force me to. I could run my own StatusNet server on my own hardware, but instead I use Twitter in order to make it easy for people to communicate with me. I use Facebook because it's the easiest way to keep up with my many peers that do, and sometimes because I am forced to; an organisation I am a member of uses a Facebook group for important announcements. Many people do not publish an email address, but instead require me to contact them through various third-party services.

In effect, we are being forced to hand our information to third parties, and to trust them with it. Variations on these services that store your information on hardware you control exist; variations on those services where you actually pay a service provider to store it on their hardware (in exchange for them looking after maintenance, amortizing up-front costs, and so on for you, and where they are more incentivised to keep your stuff secure so you trust them than to try and find ways to make money out of it) also exist.

But they are not popular, as the big "free" providers have the vast majority of the users, and the value of these services is in all your peers already being on them. Now that worries me.

I'd really like to see more push-back against this. If enough people used decentralised software like Diaspora or ran their own mail systems, then the network effects would benefit those, rather than centralised commercial outfits. Clearly, some large incentive needs to be found to push people over, and an unpleasant transition period where everyone needs to be on both. Eventually, organisations like Facebook, Twitter and Google would find themselves forced to interoperate with the decentralised protocol or lose their place in the market, and then would find themselves having to compete on points such as "privacy" when the same ease-of-use and functionality can be had elsewhere for little cost.

But, we need technical measures as well. Build sensible public-key infrastructure into the core of applications (including Web browsers). Ditch cookies, and replace them with explicit authentication: provide a system of public-key-signing HTTP requests as I suggest, but turn it off by default, and force web servers to request it with a status code, as is already done for HTTP authentication (not that that is used for web applications, alas). Let browsers seamlessly support multiple identities, and when a web site requests identification, let the user choose which identity to use; and then colour the border of the Web page according to the identity in use so they don't forget. And while providing identity management through that (controlled) mechanism, try as hard as possible to remove all other means of identification - don't send headers leaking lots of information about the user-agent and its capabilities and settings, and disallow Javascript from querying that sort of thing. Bundle Tor with browsers, so it can be turned on and off with the click of a button, as part of the "private browsing mode" found in many browsers.

I still don't think there's much point in trying to fix this with making information gathering and retention illegal (the recent PRISM scandals suggest that legitimate authorities will find ways to work around limitations on their information gathering, and organised criminals simply won't give a damn anyway); we need better technology that makes us anonymous by default and pseudonymous when we want to be. But there may be some value in legislation helping to break the stranglehold on the social software market held by big centralised organisations!

I'm updating the ARGON security model to work like this (not that that makes a difference to the Real World, mind...)

WordPress Themes

Creative Commons Attribution-NonCommercial-ShareAlike 2.0 UK: England & Wales
Creative Commons Attribution-NonCommercial-ShareAlike 2.0 UK: England & Wales