Designing a Hardware Abstraction Layer: A VM for an operating system

It's now quite commonplace to define virtual machines for "userland" code - the Java VM, the CLR, the Parrot virtual machine and 'hidden' VMs that underlie various programming language implementations such as CPython and Scheme 48.

However, it's more unusual to define a virtual machine for an operating system kernel to run on, which is exactly what I set out to do with HYDROGEN, a kernel-level virtual machine for the ARGON project.

My motivation for doing so was simple: to make ARGON easy to port to different platforms. In particular, I wanted an implementation of ARGON that would compile and run on any POSIX-like system (from Linux to Microsoft Windows) with a C compiler, while also being able to produce native bootable machine code images for bare-metal platforms, from embedded processors to that holy grail of operating system implementation - x86. All with the minimum possible amount of effort on my part, as ease of implementation is a major design goal for ARGON.

To distinguish a kernel-level virtual machine from a userland one, I propose to use the (strictly synonymous, but used more in the context of operating systems) term "Hardware abstraction layer", or HAL, for the former. So HYDROGEN is a HAL rather than just a VM.

How Linux, NetBSD, etc. do it

It's instructive to start by looking at how conventional operating systems are "ported". Really, there are four parts to it:

Code generation

The first challenge is how to implement that most essential operation of a computer, computing itself - how to make the target platform even execute your code. Generally, the operating system and its applications are written in a platform-independent language, such as C, which must be 'compiled' to the platform's assembly language. For the overwhelming majority of operating systems, the way this is handled is by setting up gcc to output code for your target platform.

Most of the operating system and applications are written in portable C, but some parts may need to be written in assembly language directly, for various reasons; these parts need to have a version available for your target platform's processor. However, most of this code falls into the next categories:
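As a rough illustration of where such assembly fragments end up, here is a minimal sketch that hides a one-instruction "I am busy-waiting" hint behind a portable C function. The names cpu_relax and spin_until_set are made up for this example, the inline assembly assumes gcc or clang syntax, and the set of architectures covered is arbitrary; it is not taken from any particular kernel.

```c
#include <stdatomic.h>

/* Hypothetical helper: tell the CPU we are busy-waiting.
   Each architecture needs its own instruction; unknown targets
   fall back to a plain compiler barrier. */
static inline void cpu_relax(void)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__ volatile("pause" ::: "memory");
#elif defined(__GNUC__) && defined(__aarch64__)
    __asm__ volatile("yield" ::: "memory");
#else
    atomic_signal_fence(memory_order_seq_cst);
#endif
}

/* Portable code above the boundary: poll the flag up to max_spins
   times. Returns 1 if the flag was seen set, 0 if we gave up. */
int spin_until_set(atomic_int *flag, unsigned long max_spins)
{
    for (unsigned long i = 0; i < max_spins; i++) {
        if (atomic_load(flag))
            return 1;
        cpu_relax();
    }
    return 0;
}
```

Everything above cpu_relax is ordinary C; porting to a new processor means adding one more branch to the #if chain, which is the pattern the per-platform assembly tends to follow.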

Booting

The process of loading the operating system from wherever it's kept and starting it when the computer is turned on varies widely between platforms. And even once that is done, the process required to get the computer into the mode required to run the kernel (such as setting up the initial address space) depends on the details of the processor, memory system, and core peripherals such as interrupt controllers, system management modules, and bus bridges.

Usually, the code to do this has to use a lot of assembly language anyway, so the bootstrap tends to be written from scratch for each platform.

Device drivers

Some very low level parts of the device driver suite will need to be written in assembly language - such as operations to write to I/O address spaces that can't be mapped into the normal memory range, wrappers for interrupt handlers to establish correct machine state and to restore state after the handler completes, and so on.

Above that, the device drivers themselves will usually be written in C, but the scope for sharing device drivers between platforms varies. There may or may not be overlap in which devices are actually connectable to different platforms, in the first place; and even when common peripheral busses such as PCI are used, the way the PCI bus is accessed may vary between platforms, requiring either a platform-dependent abstraction layer or for drivers to be rewritten.

NetBSD does a good job of this, using a bus abstraction layer that allows a driver for a given peripheral to be used regardless of the bus (PCI, ISA, etc.) the peripheral is attached through, let alone how the bus is actually accessed on the platform.
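To make the idea concrete, here is a toy sketch of a bus abstraction in C. It is not NetBSD's actual interface (that is documented in bus_space(9)); every name here (bus_ops, bus_handle, the "toy" device and its register layout) is invented for illustration, and both buses are simulated with plain arrays so the example runs anywhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bus interface: how a register is reached is the bus's
   business, not the driver's. */
struct bus_ops {
    uint8_t (*read8)(void *ctx, size_t offset);
    void (*write8)(void *ctx, size_t offset, uint8_t value);
};

struct bus_handle {
    const struct bus_ops *ops;
    void *ctx; /* bus-private state */
};

/* Bus A: registers are directly addressable bytes (memory-mapped style). */
static uint8_t mmio_read8(void *ctx, size_t offset)
{
    return ((volatile uint8_t *)ctx)[offset];
}
static void mmio_write8(void *ctx, size_t offset, uint8_t value)
{
    ((volatile uint8_t *)ctx)[offset] = value;
}
const struct bus_ops mmio_bus = { mmio_read8, mmio_write8 };

/* Bus B: a register is selected through an index latch, then accessed
   through a single data window (index/data port style). Simulated. */
struct indexed_ctx {
    uint8_t index;
    uint8_t regs[16];
};
static uint8_t indexed_read8(void *ctx, size_t offset)
{
    struct indexed_ctx *c = ctx;
    c->index = (uint8_t)(offset & 0x0F); /* "write the index port" */
    return c->regs[c->index];            /* "read the data port" */
}
static void indexed_write8(void *ctx, size_t offset, uint8_t value)
{
    struct indexed_ctx *c = ctx;
    c->index = (uint8_t)(offset & 0x0F);
    c->regs[c->index] = value;
}
const struct bus_ops indexed_bus = { indexed_read8, indexed_write8 };

/* Driver for an imaginary device, written once against bus_handle. */
enum { TOY_REG_CTRL = 0, TOY_CTRL_ENABLE = 0x01 };

void toy_enable(const struct bus_handle *bus)
{
    assert(bus != NULL && bus->ops != NULL);
    uint8_t ctrl = bus->ops->read8(bus->ctx, TOY_REG_CTRL);
    bus->ops->write8(bus->ctx, TOY_REG_CTRL,
                     (uint8_t)(ctrl | TOY_CTRL_ENABLE));
}

int toy_is_enabled(const struct bus_handle *bus)
{
    return (bus->ops->read8(bus->ctx, TOY_REG_CTRL) & TOY_CTRL_ENABLE) != 0;
}
```

The driver functions never learn which bus they are talking to: pairing mmio_bus with a byte array, or indexed_bus with an indexed_ctx, yields a bus_handle that toy_enable accepts unchanged.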

Fundamental Architecture

When porting an operating system to a new platform, a high-level decision must always be made about how to meet the operating system's needs given the fundamental architecture of the platform. For example, the VAX platform had a fixed address layout, splitting the 32-bit virtual address space into four sections based on the top two bits, so the kernel region shared between all processes had to be in a certain address range; x86, by contrast, lets one handle that isolation in a number of ways, using the independent paged or segmented protection systems. The privilege model likewise varies between platforms. Most operating systems play it safe by assuming that there are at least two modes, kernel and user; that some address space may be shared between processes and made readable only in kernel mode; and that there is some mechanism for switching from user mode to kernel mode and back again. The details of how this is all done can be handled in a platform-specific module, and the rest of the kernel just deals in terms of that 'model'.
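The VAX layout described above is simple enough to write down. The sketch below classifies a 32-bit virtual address by its top two bits, using the usual VAX region names (P0 and P1 per process, S0 for the shared system region, and a fourth reserved region); those names and the function names are additions for this example rather than anything defined earlier in this post.

```c
#include <stdint.h>

/* The four 1 GiB regions of the VAX virtual address space. */
enum vax_region {
    VAX_P0,       /* 00: per-process program region */
    VAX_P1,       /* 01: per-process control region (stacks) */
    VAX_S0,       /* 10: system region, shared by all processes */
    VAX_RESERVED  /* 11: reserved */
};

/* Select the region from bits 31..30 of the virtual address. */
enum vax_region vax_region_of(uint32_t va)
{
    switch (va >> 30) {
    case 0:  return VAX_P0;
    case 1:  return VAX_P1;
    case 2:  return VAX_S0;
    default: return VAX_RESERVED;
    }
}

/* On such a platform the kernel has no choice about where it lives:
   shared kernel mappings must fall in the system region. */
int vax_is_shared_kernel_address(uint32_t va)
{
    return vax_region_of(va) == VAX_S0;
}
```

A portable kernel cannot bake a rule like this into its platform-independent half; it has to ask the platform-specific module where the shared kernel region is.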

How HYDROGEN does it

It's clear that kernels such as Linux or NetBSD really split the problem into two - mapping portable source code into platform-specific machine code (handled by gcc) and platform-specific code (handled by splitting the kernel into platform-specific and platform-independent parts). However, this forces us to use the gcc model of code generation (as a batch process, which then has to be done in advance and does not provide the many benefits of runtime code generation); and also (and slightly relatedly) the boundary between platform-specific and platform-independent code in the kernel is often complex, having developed in an ad-hoc manner.

So the HYDROGEN HAL is designed to handle both aspects at once; all the platform-dependent code, including the mechanism of generating platform-specific machine code, is provided by HYDROGEN, behind a well-defined interface. Not that it's impossible to write platform-specific code above that interface; the semantics of many of the HYDROGEN operations are parametrised on properties of the implementation such as word size, but the interface is being carefully designed to make it easy to write programs that work portably by avoiding uses that would expose these variations in semantics. C has a similar model, but I'd like to think I've handled it a little better than C, with its notoriously vague int type hierarchy...
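For comparison, here is what "semantics parametrised on word size" looks like in C itself. This is only an illustration of the C side of the analogy, not of any HYDROGEN operation, and the function names are invented for the example: the first function's result depends on how wide unsigned int happens to be on the platform, while the second gives the same answer everywhere.

```c
#include <limits.h>
#include <stdint.h>

/* Exposes the platform: the sum wraps at UINT_MAX, which the C
   standard only guarantees to be at least 65535. */
unsigned int add_native(unsigned int a, unsigned int b)
{
    return a + b;
}

/* Hides the platform: the sum always wraps at 2^32, whatever the
   native word size is. */
uint32_t add_wrap32(uint32_t a, uint32_t b)
{
    return (uint32_t)(a + b);
}
```

Code that only ever relies on behaviour like add_wrap32 is portable by construction; code that relies on where add_native wraps is quietly tied to one family of platforms.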

In subsequent posts, I will write more on various aspects of the design for HYDROGEN.

3 Comments

  • By Faré, Sun 12th Jul 2009 @ 2:21 pm

    What do you think of the VPRI's COLAs and the attempts at running it on the bare metal on some platforms such as the OLPC?

  • By Faré, Sun 12th Jul 2009 @ 2:22 pm

    Also of course, Synthetix, the SPIN kernel, Genera, etc.

  • By alaric, Tue 14th Jul 2009 @ 10:56 am

    COLA on the OLPC: Judging from http://www.vpri.org/fonc_wiki/index.php/XO_Hacking it looks fun, but they don't seem to be making an attempt at a general HAL, as opposed to just porting their compiler to a target platform and playing about with twiddling the hardware once it's booted.

    Synthetix: Got a URL?

    SPIN and Genera: What of them?

Creative Commons Attribution-NonCommercial-ShareAlike 2.0 UK: England & Wales