Garbage collection

In particular, I was thinking about how to handle memory management in ARGON.

The HELIUM scheduler in ARGON is in charge of allocating CPU time and memory to bits of application software. Unlike the schedulers in most other systems, the emphasis is not on long-running processes that sit there suspended waiting for events, but on event handlers that are activated in response to an event, and finish quickly.

So it had occurred to me that I might be able to avoid a certain amount of garbage collection trouble by keeping track of all the memory requested during the running of a particular event handler and, when it finishes, just freeing up all the memory that doesn't have a special reason to hang around (eg, it's a piece of data that the event handler requested be written to disk, but it's being kept in memory so that if the next event handler asks to read it back from the disk there's already a copy in memory to be used - or it's a piece of data that the event handler requested be sent somewhere, and it has yet to get there).

Only if an event handler starts using an excessive amount of memory, or running for a long time, would it need actual garbage collection. Each event handler would be given a "quota" of running time and memory; if it started getting close to the memory quota, garbage collection should kick in to try and free up space for it. If it hits its memory quota, it would be suspended until the garbage collector manages to free some memory for it, and if the garbage collector can't find any, then the event handler would have to be aborted due to trying to use more resources than allocated to it.
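
To make that a little more concrete, here's a minimal sketch in C of what such per-handler tracking might look like; the names (handler_ctx, handler_alloc, RETAIN) are made up for illustration and aren't anything HELIUM actually defines:

    /* A minimal sketch (invented names, not HELIUM's interface) of
     * per-event-handler allocation tracking with a memory quota.  Every
     * block allocated while a handler runs is chained onto that
     * handler's list; when the handler finishes, everything without a
     * reason to hang around is freed in one sweep. */
    #include <stdlib.h>
    #include <stddef.h>

    enum { RETAIN = 1 };            /* stands in for "cached disk data",
                                       "not yet sent", and so on        */

    typedef struct block {
        struct block *next;         /* chain of blocks owned by the handler */
        size_t        size;
        unsigned      flags;
        /* payload follows */
    } block;

    typedef struct handler_ctx {
        block  *blocks;             /* all memory this handler has requested */
        size_t  used, quota;        /* bytes used vs. bytes allowed          */
    } handler_ctx;

    /* Allocate on behalf of a handler; returns NULL when the quota is
     * hit, at which point the scheduler would start garbage collection,
     * suspend the handler, or abort it. */
    void *handler_alloc(handler_ctx *h, size_t n, unsigned flags)
    {
        if (h->used + n > h->quota)
            return NULL;                    /* over quota */
        block *b = malloc(sizeof *b + n);
        if (!b)
            return NULL;
        b->size   = n;
        b->flags  = flags;
        b->next   = h->blocks;
        h->blocks = b;
        h->used  += n;
        return b + 1;                       /* pointer past the header */
    }

    /* Called when the event handler finishes: free everything except
     * blocks flagged to stay, which are left for whichever subsystem
     * (disk cache, network send queue) takes them over. */
    void handler_finish(handler_ctx *h)
    {
        block **p = &h->blocks;
        while (*p) {
            block *b = *p;
            if (b->flags & RETAIN) {
                p = &b->next;
            } else {
                *p = b->next;
                h->used -= b->size;
                free(b);
            }
        }
    }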

But I was wondering what kind of garbage collector to use. One that causes pauses in execution, like mark-sweep, wouldn't be the end of the world since it could be applied only to the memory blocks used by a single event handler - so only that event handler would pause, and not the whole system; other event handlers would still run. And since it would only be invoked for event handlers that had already consumed a lot of resources, it would only be delaying something likely to take a long time anyway.

I felt drawn back to the Armstrong/Virding algorithm, however. It was so elegant and efficient, and ARGON event handlers will be mainly functional, anyway. Could I extend it to handle non-functional programs?

The authors did briefly consider how this might be done at the end of the paper, but without a satisfactory answer.

However, I think I have found one. I need to do a bit more thinking to make sure I've not missed anything, but I suspect that one could use their system, except that you treat a modified memory block as if it's been deleted and a new one made - the fact that the 'new' memory block is in exactly the same location as the old one makes no difference. Whenever modifying an object, move it up to the top of the list, as if it were newly created, thus preserving the ordering of the list.

This is a moderately costly operation - at least four locations in memory need to be modified to move an entry in a list to its head. That's a bit worse than the cost of reference counting, but it is the only extra effort required when a memory block is modified; with reference counting, every time a reference is updated, a count in each of the old and the new memory block referred to needs updating. With my system, any number of changes within a single memory block requires just those four modifications, rather than two for every reference updated.
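
To illustrate, here's roughly what that move-to-the-head operation looks like in C, assuming the allocation chain is a doubly linked list (none of these names come from the real ARGON code); the handful of pointer writes below are the cost estimated above:

    /* Illustrative only: the allocation chain kept in creation order,
     * newest at the head, with modification handled by moving the block
     * back to the head as if it had just been created. */
    #include <stddef.h>

    typedef struct gc_block {
        struct gc_block *newer, *older;   /* neighbours in the chain */
        /* ...payload, mark bits, etc... */
    } gc_block;

    typedef struct {
        gc_block *head;                   /* newest block */
    } gc_chain;

    /* Call after modifying a block in place: treat it as if the old
     * version died and a new one had just been allocated at the same
     * address, so blocks still only ever refer to older ones.
     * Assumes b is already somewhere in the (non-empty) chain. */
    void gc_touch(gc_chain *c, gc_block *b)
    {
        if (c->head == b)
            return;                       /* already newest */

        /* unlink from its current position... */
        if (b->newer) b->newer->older = b->older;
        if (b->older) b->older->newer = b->newer;

        /* ...and relink at the head, as if freshly created */
        b->older = c->head;
        b->newer = NULL;
        c->head->newer = b;
        c->head = b;
    }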

And since the ARGON event handlers are mainly functional, actual modifications of memory regions are relatively rare - most event handlers will request that data be read from disk or from over the network (creating memory blocks containing input data - or reusing existing blocks containing that data left behind from other event handlers), generate memory blocks containing intermediate results (which will all end up being cleared away by garbage collection, or discarded when the event handler finishes), and finally generate memory blocks containing the final result (which will be handled by the runtime system, then discarded).

I was struck by a beautiful symmetry with the technique I settled on for the TUNGSTEN file system - when a disk block is to be modified, it is left as-is and a new copy of it placed elsewhere on the disk, with the modifications applied. In TUNGSTEN, this enables protection against disk failure during a complex multi-block update operation. In HELIUM, the illusion of copy-on-write can be used to allow very efficient (in so many senses of the word!) garbage collection.
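
For concreteness, a toy sketch in C of that style of update - nothing like TUNGSTEN's real on-disk layout, just the shape of the idea:

    /* Toy version: a "disk" of fixed-size blocks in which a block is
     * never overwritten in place.  The modified copy goes to a fresh
     * block, and only once the whole multi-block update is complete
     * does the root pointer switch over; a failure mid-update leaves
     * the old, consistent version untouched. */
    #include <string.h>
    #include <stdint.h>

    #define BLOCK_SIZE 4096
    #define NUM_BLOCKS 1024

    static uint8_t disk[NUM_BLOCKS][BLOCK_SIZE];
    static int     next_free = 1;     /* block 0 holds the root pointer */

    /* Write a modified copy of block `old` into a fresh block and
     * return the new block's number; the original is left as-is. */
    int cow_update(int old, size_t offset, const void *data, size_t len)
    {
        if (next_free >= NUM_BLOCKS)
            return -1;                           /* toy allocator is full */
        int fresh = next_free++;
        memcpy(disk[fresh], disk[old], BLOCK_SIZE);
        memcpy(disk[fresh] + offset, data, len); /* apply the change */
        return fresh;   /* caller now updates whatever pointed at `old`,
                           bubbling the change up towards the root */
    }

    /* The final, atomic step: point the root at the new structure.
     * Until this write happens, the old structure is still the one
     * that counts. */
    void commit_root(int new_root_block)
    {
        memcpy(disk[0], &new_root_block, sizeof new_root_block);
    }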

I was so delighted by these thoughts coming together as I left the train that I was grinning like an idiot and talking to myself under my breath...


4 Comments

  • By Faré, Fri 18th Nov 2005 @ 3:21 am

    So as to be able to move your object to the top of the list, yet preserve the invariant that no object points to a newer object, you need to also move up all the objects that point to your born-again object, and so on recursively. In practice, this is only affordable if your object has no one pointing to it except the current context -- in which case what you have is a linearity constraint on objects.

  • By Alaric Snell-Pym, Fri 18th Nov 2005 @ 10:31 am

    Ah, but I do have a linearity constraint on objects 😉 That's why I was amused by the similarity to "copy on write" handling of tree structures on disk, which basically works the same way (to safely update a tree structure like a B-Tree, make copies of the changed leaf nodes into empty space, then work out what intermediate nodes would need changing to reflect the new locations of the leaf nodes and do the same with them, then continue bubbling the change up the tree until you have a new root pointer, etc).

    I'm sure there's some Fundamental Truth in the fact that the same underlying technique turns out to be useful both on disk and in memory, yet in very different contexts (ACID properties on disk, fast garbage collection in RAM).

  • By Alaric Snell-Pym, Sun 20th Nov 2005 @ 2:33 pm

    Oooh, while sawing up logs (a great time for thinking about abstract stuff) I was struck by a flaw...

    When a memory block is modified and gets brought up to the head of the chain, if the collector has not yet reached that block in this pass then it never will: the head is a point the collector has already gone past, so the block won't be visited again until the next pass, and the blocks it references won't get marked. If nothing else refers to those blocks, they'll be freed. Oops!

    So we need to make sure that a block moved to the top of the chain still gets seen. My first thought was that the application could just quickly scan the block and mark all the referenced blocks, but that's wrong - it's the collector's job.

    So my next thought was to have a stack of 'touched' objects (either shared and lock-free, or per-processor, to stop it becoming a point of contention on SMP systems); when altering an object, the application would merely need to push a reference onto this stack (it wouldn't even need to do the move to the head of the chain). Now the collector, whenever it's about to examine the next object in the chain, would first look at the stack(s) and go through any memory blocks on them, marking all the referenced blocks. That way, it will never be considering a block for freeing unless it has already 'scanned' all modified objects, so there's no chance of it mistakenly freeing something. And whenever the collector has scanned a block from the stack, it can also do the chore of moving that block to the head of the chain, taking that task off the application code.

    However, there is a problem - an application that just sits there modifying the same large array of pointers over and over again would keep the collector forever rescanning that large array, never getting any real collection done. What we need is to only stack memory blocks for scanning if they've not already been scanned. This is easily resolved: have a 'scanned' flag in each block, which the collector sets whenever it scans a block, whether that's because the block was on the stack or because it was reached by traversing the chain. Newly allocated objects also have the 'scanned' flag set, since all the objects they refer to must have been reachable anyway, and thus will be marked - they don't need rescanning until the next pass. When the collector finishes scanning the chain and is about to start again, it has to clear all the 'scanned' flags; but rather than walking the chain doing this, it's easier to just reverse the interpretation of the 'scanned' flag. Then newly created blocks will need to be marked as 'unscanned' for the next scan.

    There's a potential race condition: if the collector changes the global variable that says what newly created blocks should be marked as, in between the application reading the current setting and putting its newly created block at the head of the chain, that block could end up with the wrong setting. Therefore, before doing the swap, the collector should take a copy of the pointer to the current head of the chain; then, when it starts its walk of the chain, it should force the correct value into all the blocks it examines until it hits the point it marked, thereby ensuring nothing gets missed.
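
    A rough, single-threaded sketch in C of how these pieces might fit together - the touched stack, the 'scanned' flag whose meaning flips at the end of each pass, and the snapshot of the chain head taken at the flip. All the names are invented, and per-processor stacks, locking, allocation, root marking and the actual freeing of dead blocks are left out:

        #include <stdbool.h>
        #include <stddef.h>

        #define MAX_REFS    16
        #define STACK_DEPTH 1024

        typedef struct blk {
            struct blk *older;              /* allocation chain, newest first   */
            bool        scanned;            /* compared against scanned_value   */
            bool        marked;             /* something reachable refers to it */
            struct blk *refs[MAX_REFS];     /* outgoing references, NULL-ended  */
        } blk;

        static blk *chain_head;             /* newest block in the chain        */
        static bool scanned_value = true;   /* what "scanned this pass" reads as */

        static blk *touched[STACK_DEPTH];   /* stack fed by the write barrier   */
        static int  touched_top;

        /* Write barrier: the application calls this after modifying a block.
         * Blocks already scanned this pass don't need pushing again, which is
         * what stops a repeatedly-modified array from stalling the collector. */
        void gc_write_barrier(blk *b)
        {
            if (b->scanned != scanned_value && touched_top < STACK_DEPTH)
                touched[touched_top++] = b;
        }

        /* Mark everything a block refers to, and note that it has been scanned. */
        static void scan(blk *b)
        {
            for (int i = 0; i < MAX_REFS && b->refs[i]; i++)
                b->refs[i]->marked = true;
            b->scanned = scanned_value;
        }

        /* Drain the touched stack; the collector does this before considering
         * each block in the chain, so nothing a just-modified block points to
         * can be mistaken for garbage.  (A fuller version would also move the
         * touched block to the head of the chain here.) */
        static void drain_touched(void)
        {
            while (touched_top > 0) {
                blk *t = touched[--touched_top];
                t->marked = true;               /* the running code holds it */
                if (t->scanned != scanned_value)
                    scan(t);
            }
        }

        /* One pass down the chain, newest to oldest.  Blocks newer than the
         * head recorded at the previous flip are conservatively treated as
         * not yet scanned, closing the race described above. */
        void gc_pass(void)
        {
            static blk *prev_flip_head;         /* head as it was at the last flip */

            for (blk *b = chain_head; b; b = b->older) {
                if (b == prev_flip_head)
                    prev_flip_head = NULL;          /* reached the snapshot point */
                if (prev_flip_head)
                    b->scanned = !scanned_value;    /* conservatively: unscanned  */

                drain_touched();
                if (b->marked && b->scanned != scanned_value)
                    scan(b);
                /* ...an unmarked block would be freed here in the real thing... */
            }

            /* End of pass: flip what 'scanned' means rather than walking the
             * chain clearing flags, remembering where the head was at the time. */
            prev_flip_head = chain_head;
            scanned_value  = !scanned_value;
        }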

Other Links to this Post

  1. Snell-Pym » Garbage collection — Tue 31st Jul 2007 @ 5:56 pm


Creative Commons Attribution-NonCommercial-ShareAlike 2.0 UK: England & Wales