CHROME and IRON data typing (by )

IRON schemas and the CHROME type system are intertwined... more accurately, I'd say the latter was a superset of the former.

Let me give a few examples of my requirements for the system. CHROME is mainly a pure functional language; only explicitly mutable data structures may be altered, and then under various constraints to allow for concurrent access. So there needs to be two versions of every basic data structure (list, set, array, etc); one with purely functional operations, and one with mutating operations. And ideally, methods to extract an immutable snapshot of the current state of a mutable structure.

There are no "global variables" in the normal sense - instead, CHROME programs refer to TUNGSTEN for data storage. That way, all state shared between activities is persistent, and distributed.

However, a given thread may split into two or more threads that share temporary mutable storage in-memory. Sometimes this is the best type of algorithm to solve the problem at hand.

Objects may need to be serialised for communication across a network - and serialisation will also be part of what TUNGSTEN does to store the persistent state of an entity on disk.

Mutable data structures that represent the entity's global state, and those that are just used to communicate between threads running on a node, ought to have the same interface to programmers, to avoid unnecessary duplication.

I propose that IRON includes a 'schema language' that can be used to specify the structure of user-defined types in terms of a small set of base types, and other user-defined types. The set of base types need not be large, since so many types (like dates and strings) can be made user-defined in terms of them. I'm looking to ASN.1 and s-expressions for inspiration here, maybe inspired by some of the work done with XML. The immutable base types might be:

  • Integer(min,max). The lower bound may be -∞ and the upper bound may be ∞, although nodes with finite memory may not be able to represent all integers...
  • Character. Selected from the entire 4-byte Unicode character set. With plentiful constraints, such as the surrogate codes not being legal characters since we directly represent the full character in one Character object, and accented characters that can be represented by an existing character plus a combining character being disallowed for canonicalisation reasons. Constraints on which control codes are legal; only those required for abstract representation of text - 32 for a space, 2028 for a line seperator, 2029 for a paragraph seperator. Functions on characters to extract all the Unicode database properties. Functions to convert an integer in the range 0..255, as a character index in any of a range of 8-bit character encodings, into a valid Character - mapping codes as appropriate, including mapping illegal input codes to FFFD ("REPLACEMENT CHARACTER"). Likewise for 16 bit and 32 bit Unicode characters, in which case the system need only deal with input characters that are illegal under our strict subset of Unicode. The intent here being to keep Characters as simple as possible, to release software that operates upon them from having to deal with the problems created by all the other character encodings out there
    • by putting that complexity at the edge of the system, when communicating with other systems, where it can more easily be updated to track changes.
  • Floating point number types. I think the IEEE standards for these would be a good place to get a starting selection.
  • Null. The often-useful type that has only one value... itself.
  • Symbol. Its definition at this level does not tie it to a string representation (for strings are defined higher up the type stack), but a symbol represents objects that, at heart, have only one operation: an equality test. However, for transport, they will be represented as a namespaced string.

There's no need to define mutable versions of the base types, when we can define:

  • MutableCell(type). A trivial optimisation of a single-element mutable array of the specified type, with get and set methods.

On top of these, we can layer constructed types. The list of constructed types is rather longer, since TUNGSTEN can benefit heavily from knowing as much as possible about the access pattern expected when implementing the mutable types as distributed transactional state - eg, a MutableQueue can easily be implementead as a MutableList, but TUNGSTEN can be far more efficient at implementing a queue if it knows that's what it's dealing with.

  • Array(type,list of dimensions). An N-dimensional immutable array; each "dimension" is either a specified size, or a minimum and maximum size. So a 4x4 array would be Array(foo,[4,4]), while just any two dimensional array would be Array(foo,[0..,0..])
    • although we may want a shorthand for such cases, perhaps Array(foo,[*,*]).
  • MutableArray(type,list of dimensions). As per Array, except that the contents of the cells are mutable. However, the dimensions aren't.
  • MutableList(type). A mutable variable-length list. There's no point in an immutable version - it would be equivalent to an array.
  • Dictionary(keyType,valueType). An immutable dictionary, with an operation to get the value corresponding to a given key, and to obtain the key set.
  • MutableDictionary(keyType,valueType). Mutable version of the above, with operations to get and set the value corresponding to a key, add and remove keys, and to obtain an immutable snapshot of the set of all keys.
  • Record(list of parents,dictionary of symbols to types). An immutable version of C's struct, with named fields of defined types. However, with the list of parents, the record type can inherit from zero or more existing record types, more like C++'s struct. This forms a subtype relationship.
  • MutableRecord(list of parents,dictionay of symbols to types). Likewise, but mutable.
  • Union(list of parents,dictionary of symbols to types). A discriminated union, with inheritance.
  • MutableUnion(list of parents.dictionary of symbols to types). Likewise, but mutable, in that you can alter which variant of the union it is, as well as the encapsulated value.
  • Set(type)
  • MutableSet(type)
  • MutableQueue(type). There is no need for an immutable version of this - Array quite suffices.
  • MutableTable(record type). A table of records, like an SQL table. However, due to the subtyping relationships between records, it can be polymorphic - unlike standard SQL tables. It can also be considered akin to a Linda tuple space.
  • Table(record type). Immutable version of the above.

Pages: 1 2

1 Comment

Other Links to this Post

  1. Snell-Pym » A syntax for IRON — Wed 4th Jul 2007 @ 3:12 am

RSS feed for comments on this post. TrackBack URI

Leave a comment

WordPress Themes

Creative Commons Attribution-NonCommercial-ShareAlike 2.0 UK: England & Wales
Creative Commons Attribution-NonCommercial-ShareAlike 2.0 UK: England & Wales