Type systems

There are a number of type systems out there in different programming languages. In fact, there's zillions, but they boil down to a few basic approaches.

One of the biggest splits is between statically decidable type systems and more dynamic ones. In a totally static, non-polymorphic type system like the one C uses, every variable and every member of a data structure has a single type, meaning that its value must be a member of a known set of values. Generally these sets are entirely exclusive - there is nothing that's both an integer and a string - but often there's a special case called NULL that bends the rules slightly.
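For example, a minimal sketch in C-flavoured C++ (the variable names are made up for illustration):

    #include <cstddef>

    int main() {
        int count = 42;           // count can only ever hold integers
        const char *name = "Bob"; // name can only ever hold a pointer to characters

        // count = "Bob";         // rejected at compile time: a string is not an int
        name = NULL;              // but NULL is quietly accepted for almost any pointer type

        return count;
    }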

Then you can add a bit of polymorphism. In C++, types can be subtypes of other types due to class inheritance. In Haskell you can be a bit more flexible, but it's the same sort of thing. Lots of variables and members are still monomorphic, and where they aren't, this is handled either with tagged values or by invisibly creating multiple instances of a function with one or more polymorphic parameters, each instance specialised for a possible type assignment so that it is actually monomorphic.
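A rough C++ sketch of both flavours (the class and function names here are invented for illustration):

    #include <iostream>

    // Subtype polymorphism: a Square can go anywhere a Shape is expected.
    struct Shape  { virtual double area() const = 0; virtual ~Shape() {} };
    struct Square : Shape {
        double side;
        explicit Square(double s) : side(s) {}
        double area() const override { return side * side; }
    };

    // Parametric polymorphism: the compiler silently stamps out a separate,
    // monomorphic copy of this function for each type it's actually used with.
    template <typename T>
    T larger(T a, T b) { return a < b ? b : a; }

    int main() {
        Square sq(3.0);
        const Shape &s = sq;                   // used via its supertype
        std::cout << s.area() << "\n";
        std::cout << larger(2, 5) << "\n";     // instantiates larger<int>
        std::cout << larger(2.5, 1.5) << "\n"; // instantiates larger<double>
    }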

In C and C++, the syntax forces you to assign a type to every variable definition, every function argument, and every member of a data structure. The compiler's type-checking job is then quite simple: it assigns a type to every expression by taking the declared types of the inputs and applying rules (int + int = int, and so on), and by looking up the type signatures of the functions called, complaining where there is no rule for the operator and types used, or where a function does not accept parameters of those types. It then checks that the place the value of the expression is being sent accepts that type, and complains if not.
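In C++ terms, roughly (half is a made-up example function):

    #include <string>

    double half(double x) { return x / 2.0; }

    int main() {
        int a = 2, b = 3;
        int c = a + b;          // rule applied: int + int = int
        double d = half(c);     // c converts to double, so half's signature accepts it

        std::string s = "hello";
        // int bad  = a + s;    // error: no rule for int + std::string
        // double e = half(s);  // error: half does not take a std::string
        return 0;
    }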

Some languages make things a bit easier with type inference. In Haskell, type declarations are optional: the compiler infers types for everything by tracing where values come from, in a vaguely similar way to how a C or C++ compiler types an expression, except with added complexity due to polymorphism and to inference across function call boundaries. The system knows the types of all the "standard" values and functions and so on, so it can in principle deduce the types of an entire program without any type declarations. Declarations still help, though: they reduce the time the compiler spends inferring types, and they improve error reporting, since by declaring their intentions the programmer lets the compiler complain if it finds a type for a function that contradicts those intentions, helping to locate problems.
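Haskell's whole-program inference has no real C++ equivalent, but auto and template argument deduction give a faint flavour of the same idea (a loose analogue only, not how Haskell does it):

    #include <vector>

    // The compiler infers T from each call site - a very limited cousin of
    // inferring types across function call boundaries.
    template <typename T>
    T twice(T x) { return x + x; }

    int main() {
        auto total = 1 + 2.5;                // inferred as double from the expression
        auto xs = std::vector<int>{1, 2, 3}; // inferred from the initialiser

        auto a = twice(4);                   // T inferred as int
        auto b = twice(0.5);                 // T inferred as double
        return 0;
    }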

Either way, when it comes to generating code, the type of every value is generally either known or restricted to a small set of types which the implementation carefully ensures can be treated as one type within the scope in question (e.g. by using a virtual method table). This makes for efficient code (in general, the more information a code generator has, the more efficient its output will be) and means that there is reduced scope for run-time errors; most type errors are picked up at compile time.
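In C++, for instance, this is exactly what virtual functions buy you (a small sketch; the classes are invented):

    #include <iostream>

    struct Animal { virtual const char *noise() const = 0; virtual ~Animal() {} };
    struct Dog : Animal { const char *noise() const override { return "woof"; } };
    struct Cat : Animal { const char *noise() const override { return "meow"; } };

    // The compiler only knows "some kind of Animal" here, but the object's
    // vtable pointer sends the call straight to the right noise() at run time,
    // with no general-purpose type check.
    void speak(const Animal &a) { std::cout << a.noise() << "\n"; }

    int main() {
        Dog d;
        Cat c;
        speak(d);
        speak(c);
    }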

I say "most" type errors. Some still get through. For a start, the type system isn't necessarily as fine-grained as you'd like. In C one can say that a variable is an int, but not that it's a percentage and so must be in the range 0 to 100, so the compiler won't complain about mySuccessRate = 250;, nor optimise away the test in if (mySuccessRate < 150), even though a genuine percentage would always pass it.
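Concretely, all of this compiles without a murmur:

    int main() {
        int mySuccessRate = 95;      // meant to be a percentage, 0 to 100

        mySuccessRate = 250;         // nonsense as a percentage, but a perfectly good int

        if (mySuccessRate < 150) {   // always true for a genuine percentage, but the
            // ...                   // compiler can't know that, so the test remains
        }
        return 0;
    }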

More interestingly, C, C++, and Java, amongst other similar languages, offer the ability to explicitly cast values to another type. In C and C++ this can be a dangerous operation, since you can use it to lie to the compiler, forcing it to treat a value as one type when it's really another, with implementation-dependent results. Java, however, stores type information with values, so the actual type of any arbitrary object can be found at run time - something that is impossible in C and optional in C++ - which means a Java cast may throw a ClassCastException at run time if the provided value is not actually compatible with the type being cast to.
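In C++ the two flavours of cast sit side by side, which makes the contrast easy to see (a sketch; dynamic_cast relies on the optional run-time type information mentioned above, and the pointer form reports failure with a null pointer rather than by throwing):

    #include <iostream>

    struct Base      { virtual ~Base() {} };
    struct Derived   : Base { int payload = 42; };
    struct Unrelated : Base {};

    int main() {
        Base *b = new Unrelated;

        // Dangerous: the compiler believes us and asks no questions.
        // The result is only meaningful if b really points at a Derived;
        // actually using d1 here would be undefined behaviour.
        Derived *d1 = static_cast<Derived *>(b);
        (void)d1;

        // Checked: like a Java cast, but yielding a null pointer instead
        // of a ClassCastException when the object isn't really a Derived.
        Derived *d2 = dynamic_cast<Derived *>(b);
        std::cout << (d2 ? "really a Derived" : "not a Derived") << "\n";

        delete b;
        return 0;
    }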

At the other end of the spectrum, we have entirely dynamically typed languages, where data structure members and variables are generally (or always) not assigned any particular type, and type is a purely run-time notion. This is where Lisp, Python, PHP, and so on operate (I exclude Perl from that since it has a form of partial syntactic typing, like BASIC used to, but Perl objects are still dynamically typed). Here the compiler does no type checking whatsoever, so if you try to call a string-only operation on an integer, you get a run-time error. This means that mistakes are only picked up when the bit of code containing the mistake is run (which may be long after the software is released, or only during an expensive and time-consuming unit test run, rather than on every compilation), and that the code generator has to emit code that checks the types of the arguments to every operation before performing it, which slows down the generated code.
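C++ is the wrong place to look for this style, but std::any gives a feel for what the generated code has to do: the static type tells you nothing, and every use involves a run-time check (a loose analogue, not how Lisp or Python are actually implemented):

    #include <any>
    #include <iostream>
    #include <string>

    int main() {
        std::any v = 42;                      // right now v happens to hold an int

        try {
            // Treating it as a string is only found to be wrong at run time.
            std::string s = std::any_cast<std::string>(v);
            std::cout << s.size() << "\n";
        } catch (const std::bad_any_cast &) {
            std::cout << "run-time type error\n";
        }

        v = std::string("now I'm a string");  // same variable, new type
        std::cout << std::any_cast<std::string>(v) << "\n";
        return 0;
    }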

However, it provides a lot of flexibility - every function and data structure is as polymorphic as it can be. Many data structures never do anything with their members other than regurgitate them, so they will accept values of any type; and if all a function does with two of its arguments is compare them with the = operator, it will work for any two arguments that can be compared. Static type systems, by contrast, are sometimes limiting, particularly with respect to complex higher-order or recursive functions and structures, rejecting perfectly correct programs because inferring or expressing types for parts of those programs is beyond the capability of the type system.

One can take a hybrid approach, starting with a dynamic language and then, at points the programmer deems important, putting in explicit type checks such as (assert (number? x)). This is still a check that is generated and run at run time, but it means that errors are detected earlier, closer to the point where the mistake actually is, which makes debugging easier and increases the chance of the bug being spotted in the first place.
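Carrying on the std::any analogue from above, the same idea in C++ might look like this (a sketch; square is a made-up function, and the assert plays the role of (assert (number? x))):

    #include <any>
    #include <cassert>
    #include <iostream>
    #include <typeinfo>

    // A "dynamically typed" function that insists, up front, on being given a number.
    double square(const std::any &x) {
        assert(x.type() == typeid(double));   // fail here, at the call, not somewhere deeper
        double d = std::any_cast<double>(x);
        return d * d;
    }

    int main() {
        std::cout << square(std::any(3.0)) << "\n"; // fine
        // square(std::any(std::string("oops")));  // would trip the assert right here
        return 0;
    }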

Which is best? It's a matter of taste, and it also depends on your circumstances. For a safety-critical system that must not fail in unexpected ways at run time, strict, up-front static type checking at compile time is a very logical choice. Likewise, for performance-critical code, the lack of run-time type checks is very valuable. But the restrictions a static type system places on the flexibility of programming - which often force an explicit run-time cast that is either dangerous (undefined behaviour if the value is not compatible with the target type) or liable to raise a run-time error, as in Java - make these approaches unattractive for rapid development environments.

Creative Commons Attribution-NonCommercial-ShareAlike 2.0 UK: England & Wales