Category: Computing

Public key cryptography wish list (by )

I have opined in the past about how I'd like better support for public key infrastructure in applications and user interfaces, and a few ideas for how to generalise the signature infrastructure a bit, but I've since been accumulating even more things I think should happen to bring the benefits of strong public-key crypto to the masses.

  • I should be able to use a PGP key to sign my HTTP requests, as an HTTP authentication mechanism. For web apps that support it, the option of choosing a PGP key from my private keyring should appear on login boxes.

  • I mentioned before that I'd like to be able to sign blog comments and posts and other content I submit to web apps in text areas; but I feel like re-iterating it, and point out that this could be handled more neatly by having an extra HTML attribute on the <textarea> suggesting that it accepts signed content, thereby causing my browser to send a detached signature in the submission (as if placed in a second text area, whose name is the value of the attribute, but which does not need to actually exist as an HTML element) if I opt to take it up on the offer. That would be better than the hack recommended in my previous post.

  • Seamless support for signing all, or part, of a Web page, using an element wrapping the content which also refers to the signature (as a URI, or including it inline). For cases like where Markdown has been used to process the original entered content to make it into a Web page, the app should offer a link to the original content wrapped in the original signature; the app could have access to its own private key in order to sign the generated HTML as well, but that's orthogonal to the issue of the original author of the content signing it. Indicating to the user that a region of the page is signed needs to be done in a way that the page itself can't fake with CSS and JavaScript! Given the presence of canvas elements, this will presumably mean it has to involve some UI element outside of the rendering area of the web page - eg, in the browser toolbar.

  • Signing should really be the default state for files, messages sent via various means, etc - my user interface should be marking unsigned messages and files in red!

  • Public key management user interfaces should learn from Petnames, in order to provide a nice user interface while making impersonation attacks hard to do.

  • Seamless support for PGP-signed tar files. No need for a detached signature to download (it's in the tar file itself). Basically, I'd like to have tar able to detect a signed file and check the signature and seamlessly unwrap it to feed into the decompressor and then onto the actual tar file reading itself. This would be particularly pleasing, but in general I'd still argue for every app that reads a file to silently accept PGP-signed files without needing to explicitly unwrap them!

Needless to say, I am mulling infrastructure in ARGON to make public-key infrastructure an integral part of CARBON, and I'd suggest a Petname-based user interface for the management of entity IDs and CARBON global names!

Felix Winkelmann interviewed in Atomic Spin (by )

Here's an interview with my favourite Scheme implementer!

Building an online currency exchange (by )

The biggest currency exchange market in the Bitcoin world is MtGox. When it goes down, either due to a DDoS attack or sheer high load due to everyone panic-selling, then people who hold bitcoins and care about their value in dollars get the panicky realisation that they can't easily sell them - which causes two things:

  1. A drop in the value of bitcoins; people care about their ability to turn them back into fiat money, and will continue to do so until lots of things can be bought directly with bitcoins.

  2. Widespread angst that MtGox is a central point of failure for the Bitcoin economy, complaining that they are vulnerable to DDoSes and get high trading lag when under load, and so on.

So, as a high-performance systems developer, I thought I'd write some notes on how to build a more resilient exchange platform. Perhaps MtGox will do something like this, but perhaps more ideally, one of their competitors will, and thereby win some of MtGox's market share, and thus decentralise the exchange market somewhat.

Read more »

Jean Programming Again :) (by )

So I resigned myself to the fact that Alaric would turn the new TV into a giant computer screen.

Alaric turning the TV into a giant computer monitor

Of which my main concern is that he will work on it rather than going down to the 'lab' to work, but instead he set things up so Jean could play with Turtle Graphics which is a new version of an old thing he has implemented. Basically she has to actually programme in order to get a little turtle to draw. She has produced me some lovely flowers and a house and things 🙂

Jean Age 7 programming in Scheme

The pictures she's created are on Al's laptop so he'll have to pop them on here when he gets a spare moment (haha). She is using Chicken Scheme and using a programme called Simple Graphics which is what Al created initially though I believe it is open source and others have since added stuff 🙂

She also started to 'compose' scenes for photography today - which was a shock to me!

Logging, Profiling, Debugging, and Reporting Progress (by )

I have a hypothesis. The hypothesis is that, in software, the goals of logging stuff for audit purposes, reporting progress to a user on the course of a long-winded process, profiling a system to see where it is spending its time in the long run, and debugging certain kinds of problems (in particular, hangs of multithreaded systems come to mind, although many other kinds of problems too), can all fundamentally be met by a single piece of software infrastructure.

I say this because, largely, I write very similar code to do all four of them, and end up building unnervingly similar-looking ad-hoc solutions, despite the fact that a single well-developed tool could do all of them really well.

Basically, all of them boil down to a bit of code stating what it's about to do and what it's just done at interesting points.

Let's look at each in turn.

Software often writes, either to a log file for unattended bits of software such as daemons, or direct to the user if there is one, lines of text describing the current state of affairs. Generally, it logs when steps in the process are started (to document what it's about to do), when steps end (to document the results), and sometimes during iterations over something or other (although that's really just a special case of the previous cases, as each iteration is itself a step). This is often done for auditing reasons - to find out what operations led up to some state of the system, for a plethora of reasons - and sometimes to offer a kind of primitive progress reporting; you can look at the log to see how far something has gotten.

But we have more sophisticated progress reporting, sometimes. Rather than using a dumb logging facility to say "I'm about to process 1000 objects", then "I have processed N objects so far" periodically and letting the user read the text and do the maths to work out how far through we are, we can use a more evolved progress reporting infrastructure (usually called a "progress bar"), telling it when the process starts (displaying the bar) and ends (removing the bar from the display) and periodically updating it on progress by telling it how many steps we have to do and how many are currently done. Given that higher-level information, it can produce a more intuitive display, and even estimate the time to completion and report a rate of progress.

Although the display is rather different, there is a great commonality in the interface - we are stating the fact that a process has started (and how many steps it will take), and periodically stating how far we have done, and then confirming when we are finished. A high-level progress reporting interface could just as easily generate lines in a log file as display a progress bar on a graphical screen, from the same software interface.

I've also used logging for debugging, quite extensively. Rather than logging user-level messages that relate to the user's model of the state of the system, I can also log messages about the internal state of things "under the covers", and use that to guide my own investigations of problems caused by the internal state going awry. Sometimes I will do that by just printing those messages to a console I can observe, and removing or commenting out the code that does so when I'm not debugging; but more featureful logging infrastructures allow for "debug logging" that is hidden away unless specially requested, meaning I can leave the logging statements present in "production code" and turn them on at run-time, which can be a great boon.

Meanwhile, in profiling and debugging server applications with multiple threads handling requests in parallel, I have often used a particular technique. I've given each thread a thread-local "current status string", and then peppered my code at interesting points with a call to set the current thread's status string with a summary of what it's about to do. I can then easily ask the system to display a dump of all running threads and what they're currently doing. Java makes this easy with threads having a name string and the ability to introspect the tree of threads and thread groups; in a system written in C using different processes rather than threads, I've written my own infrastructure to do it using a shared memory segment with each process getting a fixed-sized slot therein and some lock-free atomic update mechanisms to avoid races between the processes and the sampling tool.

This lets me do two things. Firstly, when the system is grindingly slow, I can quickly see what every thread is up to. Is everyone all blocked in the same operation, all queuing for the same limited resource? Secondly, when something hangs, I can look at the state of the unresponsive threads to see what they're doing. Generally, this shows me what lock they're stalled on, and who has the lock (or, more tellingly, if nobody seems to have the lock). And finally, I can profile the system during heavy load by periodically sampling the status of each thread and then building a histogram of statuses, to see which statuses take up most of the time. However, I had to be careful with this - it worked well if the status string merely recorded what step was in progress, but not if the status string included details of the data being worked upon, because that made the strings different even if the system was at the same step, so they didn't count for the same histogram bucket. The solution was to mark such parameter data in some way (such as by always quoting it with square brackets) so that profiling tools can make themselves blind to them.

Sometimes I've run into trouble with the fact that the same procedure might be called in many different places; so the thread status shows me that I'm using that procedure (and where I am in it) but doesn't tell me why. If I was attaching a debugger I could view the entire stacktrace, but that pauses execution (which may interfere with the delicate race condition I'm hunting down) and it's fiddly to do that for every thread or process in a system to get a good snapshot of the overall state, which is why I prefer the explicit-status approach in the first place. The solution is simple: rather than replacing the entire status string, each procedure should just append their status to the existing string, and remove it when it's done. In Java I did this by capturing the current thread name and then assigning "originalName + processStep" each time, then restoring "originalName" at the end of the process; in C, I did it by recording a pointer to the end of the current string and just writing our new string there, and then setting it to "\0" at the end (being careful not to overrun the buffer!). This turned the status string into a kind of mini-backtrace, but rather than logging each and every procedure call, it only logs things considered important enough to log. When looking at a snapshot of a system with a hundred concurrent requests in progress, this is a great time-saver.

But, clearly, all of the above really just comes down to "reporting the current status of a thread of execution". All that changes is what the infrastructure does with that information. There's no reason why the same notification from the software cannot generate a line in a log file, an on-screen log message to a waiting user, the creation, updating, or removal of a progress bar, the updating of a global scoreboard on what all the threads in the system are up to, and the updating of a profiling histogram of where the system is spending its resources.

So here's my proposal for an API.

(task-start! name [args...]) => task handle
Starts a task, which is some unit of work. Tasks may be dynamically nested within a thread, opening tasks within other tasks. Returns a task handle. The name is a string describing the task, which should itself be static, but may refer to the additional arguments positionally with %<number> syntax.
(task-status! handle status [args...]) => void
Notifies the system that the given task's current status is the given status string, which again may refer to the arguments positionally with %<number> syntax.
(counted-task-start! steps unit-s unit-p name [args...]) => task handle
Starts a task with the given number of counted steps. The singular name of a step is unit-s and the plural name is unit-p, eg "byte" and "bytes". Otherwise, as per task-start!
(counted-task-status! handle steps-completed [status [args...]]) => void
Notifies the system that the given counted task's current status is the given status string, which again may refer to the arguments positionally with %<number> syntax, and that the given number of steps have been completed so far. The status and arguments can be omitted if there's really nothing more to say then the number of steps completed.
(task-end! handle) => void
Ends a task (counted or not), given its handle.
(task-fault! handle message [args...]) => void
Logs a fatal fault (internal failure within the task) of a task. The message should be a static string, referring positionally to the arguments with the usual syntax.
(task-error! handle message [args...]) => void
Logs an fatal error (invalid inputs, or invalid behaviour from a subcomponent, but not a problem with the task itself) encountered by a task. The message is as above.
(task-warning! handle message [args...]) => void
Logs a non-fatal problem encountered by a task. The message is as above.
(task-debug! handle message [args...]) => void
Logs a debug-level event encountered by a task. The message is as above.

Now, this looks like a very ordinary logging framework, with the exception of an explicit hierarchy of nested tasks and the explicit mention of "counted" tasks with a known number of steps. Yet those two additions allow for the same interface to cover all of the above goals.

Seeing how they might generate a log file for auditing is trivial. The explicit knowledge of task nesting lets us give context to common subtasks used in lots of different places, be it by something as simple as indenting each log message according to the subtask nesting depth or creating a an aggregrate "task pathname" by combining all the names of parent tasks into one long string to log.

Generating a profile is also trivial; using the task names and status strings without interpolating the arguments, we can obtain a measure of what bit of the code we're in - either in isolation, or including the entire context of parent tasks, as desired. And we can also generate histograms of the arguments for each different status if we want; if a given subtask takes widely varying amounts of time depending on how it's called, we can find out what arguments make it run slowly to narrow down the problem.

Debugging is helped by turning on the display of debug-level messages in the log, and by making a snapshot of the current status (status and name of the current task and all parent tasks) of each thread/process in the system available for introspecting the current state of the system. That's a useful thing to include in a crash log, too.

But reporting progress to the user is where having a common infrastructure really shines. Rather than needing the explicit construction of "progress dialogs" in interactive applications, the infrastucture could notice when a thread has spent more than a few seconds inside a counted task and produce on automatically. It would display the progress of the highest-level parent counted task of the current task hierarchy, as the overall progress measure of the operation in progress; but if a subtask takes more than a few seconds, then it becomes worthwhile automatically expanding the dialog to list the entire hierarchy of nested counted tasks with progress bars (the very tip of the hierarchy, that has not gained several seconds of age, should be elided to avoid spamming the user with endless wizzing progress bars for short-lived subtasks; only display subtasks that have shown themselves to take user-interesting amounts of time). And the display of non-counted subtasks and their textual statuses within the hierarchy can be turned on as an option for users who want to see "more verbose" information, perhaps along with each task then growing a scrollable log showing that task's history of statuses, warnings, starting subtasks, and so on.

How best to handle multithreaded tasks is an open question, depending really on the threading model of the language you're using. Perhaps tasks should require an explicit parent-task handle to be passed in, to make it clear what the task hierarchy is in the presence of threads; or perhaps newly created threads should inherit their parent's current task. Either way, with threading, it's possible for a task to be parent to more than one active subtask, and a progress reporting user interface will need to handle that, perhaps by dividing the space underneath the parent task into multiple vertical columns for each concurrent subtask, when that level of detail is required.

Also left unspecified is more detail on the "log level" concept; I've just suggested a few (fault, error, warning, status update (called "notice" or "info" in most existing logging systems) and debug, but the system really needs to make some intelligent decision as to who to notify. A server-based interactive application given a fault condition really needs to apologise to the user in vague terms, while sending a detailled crash log and raising an alarm to the system's administrators, for instance. And more complex systems composed of multiple components may have fairly complex rules about who should be notified about what level of event in what context, which I've not even begun to worry about here...

But, to conclude, I think it's a shame that there's so many very different bits of infrastructure that have very similar interfaces and are used in similar ways by software, but with very different goals. I think it'd be great to create a single progress-reporting interface for applications to use, and make different goals into pluggable components beneath that interface!

WordPress Themes

Creative Commons Attribution-NonCommercial-ShareAlike 2.0 UK: England & Wales
Creative Commons Attribution-NonCommercial-ShareAlike 2.0 UK: England & Wales