HYDROGEN: Implementation

As promised, here is the fifth part of my series on HYDROGEN, where I will discuss implementation strategies.

Introduction

In the previous posts, I've spoken about the ways in which various aspects of the system might be implemented on different types of platform. In particular, I've spoken of implementations on the bare metal of systems ranging from embedded devices with a hundred or so kilobytes of RAM up to 64-bit multiprocessors with stupendous amounts of RAM and large hard disks. I've also spoken of implementations that run as processes on POSIX-like operating systems (by which I really mean anything with processes and a filesystem; the sort of model that the C standard libraries aim for), either generating native code or producing threaded code for efficient interpretation, thereby avoiding an architecture dependency.

One might query the value of an operating-system hardware abstraction layer implementation that runs on top of an existing operating system; but it has several uses.

Firstly, it's invaluable in development. If HYDROGEN lets your OS kernel run in a process on your Mac laptop while you're working on it, that saves you fiddling with virtualisation and the like to have a development environment.

Secondly, it is actually a reasonable production environment. Suppose I get enough free time, or enough money to hire somebody, to implement all I have planned for ARGON. If it could only run on a dedicated machine because there were only bare-metal versions, that would slow its adoption tremendously. Allowing ARGON to run as something like a distributed fault-tolerant load-balancing super-duper JVM means it can be run on existing operating systems, alongside native software, and using those operating systems' existing device drivers.

Thirdly, I can get a version that runs on top of a POSIX OS finished a lot more quickly than a bare-metal version. So, looked at another way, the design of the HYDROGEN hardware abstraction layer means that I can get ARGON written and running on a wide variety of platforms relatively quickly, while still being able to provide the performance, reliability and maintenance benefits of ditching all that legacy infrastructure and porting it to bare metal in future without needing to rewrite too much code.

The structure of an implementation

Regardless of what sort of platform it's built on, most implementations will consist of two parts.

The core

The first part is a minimal kernel written in some platform-specific form (assembly language or C for most platforms; but maybe things like Java or Python if we decide to cover the JVM or Parrot, in a fit of madness). This will consist of basic bootstrap code to set up an execution environment, the implementations of primitive subroutines, then the implementation of the compiler and interpreter.

Difficult Bits

A lot of the core of a bare-metal environment is quite difficult grunt-work, such as supporting the plethora of network and disk controllers one might find in x86 machines.

Luckily, I have not produced any amazing innovation concerning the internal structure of a device driver, so we can potentially reuse existing code to do this. The NetBSD kernel is nicely modular, so it might be practical to remove the paging, process management and scheduling side of things (leaving a stub process per CPU to keep state), resulting in a bootable environment (for so many platforms!) that gives us a NetBSD device tree, a bunch of free memory, and a supervisor-mode execution environment upon which we can build a HYDROGEN runtime.

The library

Once that much is in place, the rest of the HYDROGEN system can be written in HYDROGEN. It will be a crude dialect of HYDROGEN at first, as it will lack all the standard components that have yet to be written; and it will be a platform-specific dialect, as everything defined in the HYDROGEN specification is there because its implementation is platform-dependent (anything that is unlikely to benefit from platform-dependent implementations, like data structures, is left out of the HYDROGEN specification; that's a job for higher layers). The core will probably define some platform-dependent primitives that provide access to the parts of the system that are required by the library for implementing the rest of the HYDROGEN specification.

However, this does not mean that only subroutines defined in the minimal core will be "native" to the platform, with everything else compiled by the HYDROGEN code generator. There's no reason why there can't be other code generators than the HYDROGEN one available - in particular, assemblers.

Native Code

FORTH has a good history of embedding an assembler. In a similar vein, we can define platform-specific parsing words that read assembly language and assemble it into a subroutine. Such words are platform- and implementation-dependent, not just because the CPU architecture defines the assembly language, but because different implementations will implement different mappings from virtual-machine concepts such as stacks and registers to physical processor resources.

So although we can't specify assemblers in the HYDROGEN spec itself, we can specify a general process whereby a performance-critical subroutine might have a definition in HYDROGEN source code that uses the standard portable code generation framework, then optional tailored implementations for specific platforms and implementations, which may then be written in assembly language.

The basic process is easy: we already have the feature? word to check for the existence of a feature, and any implementation-specific platform-specific extensions (such as assemblers) should be listed as features, so the same mechanism can be used; but for the common special case of providing different native-code versions of a subroutine definition, we standardise NATIVE, the word for invoking an assembler, to work thusly:

  NATIVE <platform/implementation name>
     ...native assembly code for that platform...
  NATIVE <another platform/implementation name>
     ...native assembly code for that platform...
  FALLBACK ( ... normal code ... )

Alternatively, for the sad case where no fallback can be provided, we can write FALLBACK NONE.

The implementation of NATIVE must parse its input, skipping NATIVE blocks that name a platform/implementation that we aren't. If it finds one that matches, then it can assemble as far as the next NATIVE or FALLBACK, then continue skipping. If none matched, then if FALLBACK ( is found, drop into the compiler as if it had been just ( all along; if FALLBACK NONE is found, then fail with an error.

Either way, the eventual effect of the code is to raise an error or to push a subroutine handle, pointing either to a native assembly version of the subroutine or to the fallback compiled implementation.
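
The selection rule above can be sketched outside HYDROGEN itself. The following is only an illustration in Python, not the real parsing word: it works line by line rather than word by word, returns the chosen text instead of assembling or compiling it, and the function name, the exception, the platform names and the ( dup * ) fallback body are all made up for the example. Treating a form with no FALLBACK clause as malformed is also my assumption.

```python
class NoImplementationError(Exception):
    """No NATIVE block matched and the source said FALLBACK NONE."""


def select_implementation(source, platform):
    """Return ("native", asm_text) or ("fallback", code_text) for `platform`."""
    lines = source.splitlines()
    native_body = None   # body of the first NATIVE block that names `platform`
    collecting = False   # True while we are inside that matching block

    for index, line in enumerate(lines):
        words = line.split()
        if not words:
            continue
        if words[0] == "NATIVE":
            # Start of a new block: collect it only if it names us and
            # nothing has matched yet; otherwise skip over it.
            collecting = native_body is None and " ".join(words[1:]) == platform
            if collecting:
                native_body = []
        elif words[0] == "FALLBACK":
            if native_body is not None:
                # A native version was found, so the fallback is skipped.
                return ("native", "\n".join(native_body))
            # Everything after the word FALLBACK goes to the normal compiler.
            rest = line.split("FALLBACK", 1)[1]
            fallback = "\n".join([rest] + lines[index + 1:]).strip()
            if fallback == "NONE":
                raise NoImplementationError("no native version for " + platform)
            return ("fallback", fallback)
        elif collecting:
            native_body.append(line.strip())

    # Assumption: a form with no FALLBACK clause at all is malformed.
    raise ValueError("NATIVE form has no FALLBACK clause")


# Hypothetical source text; the platform names and instructions are placeholders.
EXAMPLE = """\
NATIVE x86-64/demo
    imul rax, rax
NATIVE arm64/demo
    mul x0, x0, x0
FALLBACK ( dup * )
"""
```

Calling select_implementation(EXAMPLE, "arm64/demo") picks the second block, while a platform that is not listed gets the fallback text, which a real implementation would hand to the compiler as if it had started with ( all along.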

Even the portable interpreter in C can have a NATIVE platform/implementation type - one that calls arbitrary C functions via dlopen, dlsym and libffi!
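
To give a feel for that trick without writing the C interpreter, here is a small sketch in Python. On POSIX systems, CPython's ctypes module sits on top of the same dlopen, dlsym and libffi machinery, so it shows the same three steps: open a library, look up a symbol, and describe the C signature so the call can be marshalled. The helper name load_native is made up for this example, and the sketch assumes a POSIX system where the C library's strlen is reachable.

```python
import ctypes
import ctypes.util


def load_native(library, symbol, restype, argtypes):
    """Look up `symbol` in a shared library and attach its C signature."""
    # find_library may return None; CDLL(None) then opens the running
    # process itself, which on POSIX still exposes the C library's symbols.
    path = ctypes.util.find_library(library) if library else None
    handle = ctypes.CDLL(path)        # roughly dlopen()
    func = getattr(handle, symbol)    # roughly dlsym(); AttributeError if missing
    func.restype = restype            # the signature tells libffi how to
    func.argtypes = argtypes          # marshal arguments and the result
    return func


# Bind the C library's strlen(const char *) -> size_t.
strlen = load_native("c", "strlen", ctypes.c_size_t, [ctypes.c_char_p])
```

A NATIVE block of this kind would then only need to name the library, the symbol and the argument types; the handle pushed onto the stack would wrap a function obtained much like strlen here.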

This facility ought to be useful in external libraries that do CPU-heavy things like numerical processing; it's not meant as a way to allow things outside of the HYDROGEN core to break the abstraction layer and access hardware devices other than ones that can speed up computational tasks. Anything like that should be written as a device driver in the correct manner for your implementation, and inserted into your implementation in an implementation-specific way.

Roadmap

My first implementation is going to be the vmgen-based portable C one that runs on top of POSIX systems. It'll use POSIX Threads to implement multiple virtual CPUs, and the libffi/dlopen tricks for loading external libraries (or wrappers for them written in C), and will use nonblocking I/O.
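
As a very rough picture of the "one thread per virtual CPU" idea, here is a sketch in Python, whose threading module is backed by POSIX Threads on POSIX systems. Everything about it is an assumption made for illustration: the shared work queue, the function name run_virtual_cpus and the way results are collected are not part of any HYDROGEN design, and the real implementation will be in C.

```python
import queue
import threading


def run_virtual_cpus(tasks, cpu_count=2):
    """Run each callable in `tasks` on one of `cpu_count` worker threads.

    Returns a list of (cpu_id, result) pairs in completion order.
    """
    work = queue.Queue()
    for task in tasks:
        work.put(task)

    results = []
    results_lock = threading.Lock()

    def cpu_loop(cpu_id):
        # Each virtual CPU keeps taking work until the queue is empty.
        while True:
            try:
                task = work.get_nowait()
            except queue.Empty:
                return
            value = task()
            with results_lock:
                results.append((cpu_id, value))

    threads = [
        threading.Thread(target=cpu_loop, args=(cpu_id,), name="vcpu-%d" % cpu_id)
        for cpu_id in range(cpu_count)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

Which virtual CPU ends up running which task is not deterministic, so callers should rely only on the set of results, not on their order.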

That's the quickest way to get a prototype working. And then I will play with the language. I need to write some actual code in a prototype of the language, to find any rough edges, cases I've not anticipated, and functionality that should be in the core language. After some tinkering, I'm sure I'll have valuable feedback into the language spec, from which I can plan the second prototype.

Creative Commons Attribution-NonCommercial-ShareAlike 2.0 UK: England & Wales