The biggest currency exchange market in the Bitcoin world is MtGox. When it goes down, either due to a DDoS attack or sheer high load due to everyone panic-selling, then people who hold bitcoins and care about their value in dollars get the panicky realisation that they can't easily sell them - which causes two things:
A drop in the value of bitcoins; people care about their ability to turn them back into fiat money, and will continue to do so until lots of things can be bought directly with bitcoins.
Widespread angst that MtGox is a central point of failure for the Bitcoin economy, complaining that they are vulnerable to DDoSes and get high trading lag when under load, and so on.
So, as a high-performance systems developer, I thought I'd write some notes on how to build a more resilient exchange platform. Perhaps MtGox will do something like this, but perhaps more ideally, one of their competitors will, and thereby win some of MtGox's market share, and thus decentralise the exchange market somewhat.
So I resigned myself to the fact that Alaric would turn the new TV into a giant computer screen.
Of which my main concern is that he will work on it rather than going down to the 'lab' to work, but instead he set things up so Jean could play with Turtle Graphics which is a new version of an old thing he has implemented. Basically she has to actually programme in order to get a little turtle to draw. She has produced me some lovely flowers and a house and things
The pictures she's created are on Al's laptop so he'll have to pop them on here when he gets a spare moment (haha). She is using Chicken Scheme and using a programme called Simple Graphics which is what Al created initially though I believe it is open source and others have since added stuff
She also started to 'compose' scenes for photography today - which was a shock to me!
I have a hypothesis. The hypothesis is that, in software, the goals of logging stuff for audit purposes, reporting progress to a user on the course of a long-winded process, profiling a system to see where it is spending its time in the long run, and debugging certain kinds of problems (in particular, hangs of multithreaded systems come to mind, although many other kinds of problems too), can all fundamentally be met by a single piece of software infrastructure.
I say this because, largely, I write very similar code to do all four of them, and end up building unnervingly similar-looking ad-hoc solutions, despite the fact that a single well-developed tool could do all of them really well.
Basically, all of them boil down to a bit of code stating what it's about to do and what it's just done at interesting points.
Let's look at each in turn.
Software often writes, either to a log file for unattended bits of software such as daemons, or direct to the user if there is one, lines of text describing the current state of affairs. Generally, it logs when steps in the process are started (to document what it's about to do), when steps end (to document the results), and sometimes during iterations over something or other (although that's really just a special case of the previous cases, as each iteration is itself a step). This is often done for auditing reasons - to find out what operations led up to some state of the system, for a plethora of reasons - and sometimes to offer a kind of primitive progress reporting; you can look at the log to see how far something has gotten.
But we have more sophisticated progress reporting, sometimes. Rather than using a dumb logging facility to say "I'm about to process 1000 objects", then "I have processed N objects so far" periodically and letting the user read the text and do the maths to work out how far through we are, we can use a more evolved progress reporting infrastructure (usually called a "progress bar"), telling it when the process starts (displaying the bar) and ends (removing the bar from the display) and periodically updating it on progress by telling it how many steps we have to do and how many are currently done. Given that higher-level information, it can produce a more intuitive display, and even estimate the time to completion and report a rate of progress.
Although the display is rather different, there is a great commonality in the interface - we are stating the fact that a process has started (and how many steps it will take), and periodically stating how far we have done, and then confirming when we are finished. A high-level progress reporting interface could just as easily generate lines in a log file as display a progress bar on a graphical screen, from the same software interface.
I've also used logging for debugging, quite extensively. Rather than logging user-level messages that relate to the user's model of the state of the system, I can also log messages about the internal state of things "under the covers", and use that to guide my own investigations of problems caused by the internal state going awry. Sometimes I will do that by just printing those messages to a console I can observe, and removing or commenting out the code that does so when I'm not debugging; but more featureful logging infrastructures allow for "debug logging" that is hidden away unless specially requested, meaning I can leave the logging statements present in "production code" and turn them on at run-time, which can be a great boon.
Meanwhile, in profiling and debugging server applications with multiple threads handling requests in parallel, I have often used a particular technique. I've given each thread a thread-local "current status string", and then peppered my code at interesting points with a call to set the current thread's status string with a summary of what it's about to do. I can then easily ask the system to display a dump of all running threads and what they're currently doing. Java makes this easy with threads having a name string and the ability to introspect the tree of threads and thread groups; in a system written in C using different processes rather than threads, I've written my own infrastructure to do it using a shared memory segment with each process getting a fixed-sized slot therein and some lock-free atomic update mechanisms to avoid races between the processes and the sampling tool.
This lets me do two things. Firstly, when the system is grindingly slow, I can quickly see what every thread is up to. Is everyone all blocked in the same operation, all queuing for the same limited resource? Secondly, when something hangs, I can look at the state of the unresponsive threads to see what they're doing. Generally, this shows me what lock they're stalled on, and who has the lock (or, more tellingly, if nobody seems to have the lock). And finally, I can profile the system during heavy load by periodically sampling the status of each thread and then building a histogram of statuses, to see which statuses take up most of the time. However, I had to be careful with this - it worked well if the status string merely recorded what step was in progress, but not if the status string included details of the data being worked upon, because that made the strings different even if the system was at the same step, so they didn't count for the same histogram bucket. The solution was to mark such parameter data in some way (such as by always quoting it with square brackets) so that profiling tools can make themselves blind to them.
Sometimes I've run into trouble with the fact that the same procedure might be called in many different places; so the thread status shows me that I'm using that procedure (and where I am in it) but doesn't tell me why. If I was attaching a debugger I could view the entire stacktrace, but that pauses execution (which may interfere with the delicate race condition I'm hunting down) and it's fiddly to do that for every thread or process in a system to get a good snapshot of the overall state, which is why I prefer the explicit-status approach in the first place. The solution is simple: rather than replacing the entire status string, each procedure should just append their status to the existing string, and remove it when it's done. In Java I did this by capturing the current thread name and then assigning "originalName + processStep" each time, then restoring "originalName" at the end of the process; in C, I did it by recording a pointer to the end of the current string and just writing our new string there, and then setting it to "\0" at the end (being careful not to overrun the buffer!). This turned the status string into a kind of mini-backtrace, but rather than logging each and every procedure call, it only logs things considered important enough to log. When looking at a snapshot of a system with a hundred concurrent requests in progress, this is a great time-saver.
But, clearly, all of the above really just comes down to "reporting the current status of a thread of execution". All that changes is what the infrastructure does with that information. There's no reason why the same notification from the software cannot generate a line in a log file, an on-screen log message to a waiting user, the creation, updating, or removal of a progress bar, the updating of a global scoreboard on what all the threads in the system are up to, and the updating of a profiling histogram of where the system is spending its resources.
So here's my proposal for an API.
(task-start! name [args...])=> task handle
- Starts a task, which is some unit of work. Tasks may be dynamically nested within a thread, opening tasks within other tasks. Returns a task handle. The name is a string describing the task, which should itself be static, but may refer to the additional arguments positionally with
(task-status! handle status [args...])=> void
- Notifies the system that the given task's current status is the given status string, which again may refer to the arguments positionally with
(counted-task-start! steps unit-s unit-p name [args...])=> task handle
- Starts a task with the given number of counted steps. The singular name of a step is
unit-sand the plural name is
"bytes". Otherwise, as per
(counted-task-status! handle steps-completed [status [args...]])=> void
- Notifies the system that the given counted task's current status is the given status string, which again may refer to the arguments positionally with
%<number>syntax, and that the given number of steps have been completed so far. The status and arguments can be omitted if there's really nothing more to say then the number of steps completed.
(task-end! handle)=> void
- Ends a task (counted or not), given its handle.
(task-fault! handle message [args...])=> void
- Logs a fatal fault (internal failure within the task) of a task. The message should be a static string, referring positionally to the arguments with the usual syntax.
(task-error! handle message [args...])=> void
- Logs an fatal error (invalid inputs, or invalid behaviour from a subcomponent, but not a problem with the task itself) encountered by a task. The message is as above.
(task-warning! handle message [args...])=> void
- Logs a non-fatal problem encountered by a task. The message is as above.
(task-debug! handle message [args...])=> void
- Logs a debug-level event encountered by a task. The message is as above.
Now, this looks like a very ordinary logging framework, with the exception of an explicit hierarchy of nested tasks and the explicit mention of "counted" tasks with a known number of steps. Yet those two additions allow for the same interface to cover all of the above goals.
Seeing how they might generate a log file for auditing is trivial. The explicit knowledge of task nesting lets us give context to common subtasks used in lots of different places, be it by something as simple as indenting each log message according to the subtask nesting depth or creating a an aggregrate "task pathname" by combining all the names of parent tasks into one long string to log.
Generating a profile is also trivial; using the task names and status strings without interpolating the arguments, we can obtain a measure of what bit of the code we're in - either in isolation, or including the entire context of parent tasks, as desired. And we can also generate histograms of the arguments for each different status if we want; if a given subtask takes widely varying amounts of time depending on how it's called, we can find out what arguments make it run slowly to narrow down the problem.
Debugging is helped by turning on the display of debug-level messages in the log, and by making a snapshot of the current status (status and name of the current task and all parent tasks) of each thread/process in the system available for introspecting the current state of the system. That's a useful thing to include in a crash log, too.
But reporting progress to the user is where having a common infrastructure really shines. Rather than needing the explicit construction of "progress dialogs" in interactive applications, the infrastucture could notice when a thread has spent more than a few seconds inside a counted task and produce on automatically. It would display the progress of the highest-level parent counted task of the current task hierarchy, as the overall progress measure of the operation in progress; but if a subtask takes more than a few seconds, then it becomes worthwhile automatically expanding the dialog to list the entire hierarchy of nested counted tasks with progress bars (the very tip of the hierarchy, that has not gained several seconds of age, should be elided to avoid spamming the user with endless wizzing progress bars for short-lived subtasks; only display subtasks that have shown themselves to take user-interesting amounts of time). And the display of non-counted subtasks and their textual statuses within the hierarchy can be turned on as an option for users who want to see "more verbose" information, perhaps along with each task then growing a scrollable log showing that task's history of statuses, warnings, starting subtasks, and so on.
How best to handle multithreaded tasks is an open question, depending really on the threading model of the language you're using. Perhaps tasks should require an explicit parent-task handle to be passed in, to make it clear what the task hierarchy is in the presence of threads; or perhaps newly created threads should inherit their parent's current task. Either way, with threading, it's possible for a task to be parent to more than one active subtask, and a progress reporting user interface will need to handle that, perhaps by dividing the space underneath the parent task into multiple vertical columns for each concurrent subtask, when that level of detail is required.
Also left unspecified is more detail on the "log level" concept; I've just suggested a few (fault, error, warning, status update (called "notice" or "info" in most existing logging systems) and debug, but the system really needs to make some intelligent decision as to who to notify. A server-based interactive application given a fault condition really needs to apologise to the user in vague terms, while sending a detailled crash log and raising an alarm to the system's administrators, for instance. And more complex systems composed of multiple components may have fairly complex rules about who should be notified about what level of event in what context, which I've not even begun to worry about here...
But, to conclude, I think it's a shame that there's so many very different bits of infrastructure that have very similar interfaces and are used in similar ways by software, but with very different goals. I think it'd be great to create a single progress-reporting interface for applications to use, and make different goals into pluggable components beneath that interface!
I need a holiday! (by alaric)
This year, I've been alternating between bleak depression and enthusiastic elation.
Luckily, it's easy to see a pattern - the elation is when I let myself get distracted by interesting things; the depression is when I have to tear my attention back to what needs to be done rather than what I feel like doing!
It's been a funny year. On the one hand we've moved into a much larger house, with much better facilities, that's warmer and easier to keep clean and tidy. My work is great, and I've managed to catch up on some things that have been hanging over me for years - tax paperwork, terminating my limited company (that had become nothing more than a thorn in my side since I stopped freelancing), simplifying and upgrading my server setup, tidying up my home directory and organising my life. On the other hand, I've been so busy that the new home has mainly been a place to eat and sleep rather than something I've had much chance to enjoy, and I'm behind on the (small, reasonable) list of projects I wanted to do this year - with no year left to do them; I've so far spent only a handful of days on my own projects in the entire year.
I spent a whole day sorting out my workshop on my birthday in April, and ended that day with a few little things to finish off - which are still waiting for me. I've not finished the ring casting, which should only take a couple more days, nor rebuilt my furnace, which should take a few days more.
I've done a bit better on computer-based projects as I can do them wherever I have my laptop; I've done some work on my fiction project, and made progress on my organisational infrastructure to convert a huge pile of "things that need investigating to even begin to decide what needs doing about them" into a tractable TODO list, and done some writing for the ARGON project web site.
But, with my ability to concentrate on what I'm supposed to be doing rapidly waning, it's clear that I need some time off. So, I've booked the week before Christmas off of work, and I hope to:
Do what I can to fix the roof in the workshop.
- It leaks. This will be hard to fix properly, as it'll require spending lots of money on materials; and possibly can't be done until there's some warmer, drier, weather to dry the decking out. But I'll see if I can improve on the current bodge somewhat, at least to give the decking a chance to dry properly without regular re-soakings.
- There's great big gaps in the eaves, all round the walls, varying from a centimetre up to about twenty centimetres, through which an icy wind blows. All the warm air from the heater disappears, and ivy creeps in. I need to seal them up (minus a controllable air vent to let out humid air and fumes from welding - perhaps an air vent plus an extractor fan with a fume hood would be the way to go in the long run). I plan to saw some strips of wood to length so they can go between the rafters, nail them in place, and use judicious amounts of sealant to keep the tenacious ivy at bay and to account for my general inability to cut wood to exact lengths properly.
Run Ethernet to the workshop so I have an Internet connection there. This will involve spending some money on outdoor-suitable conduit and fittings, and trunking for the interior runs, then drilling lots of holes in walls and running cables through and sealing the gaps. But the result will be that I can actually do computer work at a desk with a comfy chair, rather than hunched over a laptop on the sofa with children tugging at me.
Start building the computer infrastructure in the workshop. I'm looking at a battery-backed low-voltage power system feeding a Raspberry Pi (which I already have, waiting - Sarah got me one for my birthday), bristling with sensors. Because sensors are fun.
If the weather and time permit, work on my ring casting and the furnace, although that somewhat requires dry weather. We'll see.
Chill out, play computer games, write fiction and ARGON prose.
But I need to take care that next year isn't like this one. Taking on so many responsibilities that I struggle to maintain my productivity means I get less stuff done, not more, and makes it hard to prioritise my effort sensibly. I'm going to book three weekend days each month, in advance, for my projects or simple relaxation, rather than just thinking I'll do them "when I get a free day" only to find that all of my weekends are booked up months in advance. I'll be open to rearranging them in order to fit around the days when Sarah or the children need me, or we're visiting people for events - most of the time, it doesn't matter what actual day I do things on. Sometimes this will involve getting a whole weekend, and then just a single day at the other end of a month; that's fine, just as long as it lets me keep making progress on my projects, and giving me a chance to unwind from the stresses of constantly doing what I must do, rather than what I want to do.