Snell-Pym

The implementation of Web applications (by alaric)

I first started Web app development in PHP in 1998. Although PHP as a programming language has many, many shortfalls, the fundamental model - take an HTML file, change its file extension to bring it to the attention of the PHP module, then stick bits of code in where needed - was great... for the kinds of pages that are the results of simple GET requests; idempotent data-gathering. Code that's purely functional, at least macroscopically.

However, once you started bringing forms beyond search boxes into the mix, things started to go downhill. This first struck me when I had to develop a series of pages that allowed people to register domain names. At the time, this required gathering four sets of contact details (legal registrant, administrative contact, billing contact, and technical contact), along with some technical details. Since, most of the time, all of these sets of contact details would be one and the same, it was decided that we'd start off with a page with a form for the legal registrant's details. Its "Next" button would lead to a page with a form for the administrative contact's details, plus a button that would invoke JavaScript to fill the form with the legal registrant's details, so they could be submitted as-is or modified slightly (perhaps a different person's name, at the same company and postal address). That form's "Next" button led to the next set of contact details, this time with buttons to prefill with the legal registrant's details or, if they differed, the administrative contact's details. And so on.

And, of course, there was validation; any of these "Next" buttons might well instead bring you back to the page you just came from, with an error flagged, rather than to the next page.

Computing | alaric | Sun 17th Dec 2006 2:37 am

6 Comments

By Ben, Sun 17th Dec 2006 @ 10:25 am

Continuations have already been used in web app frameworks. This article...

http://www-128.ibm.com/developerworks/java/library/j-cb03216/index.html

... has a nice list of frameworks at the bottom. I think Jetty is the most famous continuation based web app framework.

I disagree that web app frameworks have got things the wrong way round. You are writing an application which happens to be delivered over a web browser, not a web site which happens to have lots of code.

I suspect your view is formed by the nature of a lot of your work, where a bit of code has to be added to a bunch of marketing visuals and text. This is not what the frameworks were designed for -- the assumption was that there would be more code than fluff.

By alaric, Sun 17th Dec 2006 @ 1:45 pm

The way Jetty have added continuations, however, is a bit clunky to say the least... when your code hits the suspend point, IIRC, it throws a special exception to terminate the handler but keep the HTTP response pending; when the event comes in to resume the request, the handler is invoked again but is informed that it is dealing with a request already in progress. As I understand it, it was designed for the particular case of AJAX event getters, those GET requests which return a pending event if there is one, or else wait without replying until one comes. A central event dispatcher can find all requests 'waiting' on that event, put it in their local queues, and awaken them.
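
The suspend-and-reinvoke pattern described above can be sketched in a few lines of Python. This is not Jetty's API: Suspend, Dispatcher and event_getter are hypothetical names, and the sketch only assumes the behaviour recalled in the paragraph above (an exception ends the handler while the response stays pending, and the handler is invoked again once an event arrives).

```python
class Suspend(Exception):
    """Raised by a handler to end this invocation while leaving the response pending."""


class Dispatcher:
    """Hypothetical central event dispatcher that tracks waiting requests."""

    def __init__(self):
        self.waiting = {}  # request id -> (handler, local event queue)

    def handle(self, request_id, handler):
        """First invocation of a handler; it may reply at once or suspend."""
        queue = []
        try:
            return handler(queue, resumed=False)
        except Suspend:
            self.waiting[request_id] = (handler, queue)
            return None  # no reply yet; the HTTP response stays open

    def publish(self, event):
        """Put the event in every waiting request's queue and re-invoke its handler."""
        replies = {}
        for request_id, (handler, queue) in list(self.waiting.items()):
            queue.append(event)
            del self.waiting[request_id]
            replies[request_id] = handler(queue, resumed=True)
        return replies


def event_getter(queue, resumed):
    """An AJAX-style event getter: reply with a pending event, or wait for one."""
    if not queue:
        raise Suspend()
    return {"event": queue.pop(0), "resumed": resumed}
```

Note that the handler runs from the top both times; nothing of its local state survives the suspension except what the dispatcher keeps for it, which is what makes the approach feel clunky next to a real continuation.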

As for the code vs. fluff dispute - well, the particular case I had in mind was UpMyStreet.com.

UMS (for those who don't know it) was (and kind of still is, but it's nothing like it used to be) a site where one could enter a location - a postcode, part of a postcode (just SW7, for example) or a town name or a region name (London or Camden, for example), and see lots of information about that area: council tax rates, crime rates, property prices, the socio-economic grouping used by insurance companies and credit agencies, etc.

Originally, this was implemented with a common include file which defined the Location class, an instance of which was fed the location string supplied in the URL and then proceeded to compute assorted general bits of metadata. Each information page then contained code which would use the tools in the Location class to extract the appropriate data from the database and slot the results into the HTML.

However, the structure of the site was changing towards having more, smaller pages. To spare the programmers a lot of simple work rearranging pages for design reasons, we moved to an architecture where every page included a common header file that set up the Location class and automatically created an instance of it based upon the URL parameters, then neatly wrapped the data-getters into a suite of easy-to-use library functions. We then taught the HTML designers enough PHP to:
1. Reference include files, which we showed worked just like Server Side Includes, so they could reuse headers and other components as they saw fit.
2. Start every page that showed data about a location with the "magic text" that included the common header file (which set up a Location instance), and leave it out of pages such as the About page.
3. Use a dictionary of magic words to extract and format information. For example, one might want to get the name of the local council covering the location, and the band D tax rate. To do this, one needs to consult the council module (which gets a whole lot of information, since it's all in the same row of the table) and then extract the desired data, and format it appropriately, something like:
  
  <?php $council = load_council(); ?> ...Your local council is <?php echo $council->name ?> and the band D tax rate is £<?php number($council->tax_d, 2) ?> per annum...

We had a few simple functions like number, which would format a number nicely with commas and the given number of decimal places. The load_* functions would access the global Location instance, and spit out an object with fields that could easily be accessed from template code (with strings already HTML-entity-escaped, for instance).
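
As a rough illustration of what such helpers do, here is a minimal Python sketch (Python rather than PHP, purely so that it is self-contained). The names number and load_council come from the text above; the row contents, the council name and the exact escaping behaviour are made up for the example and are not the original UMS code.

```python
import html
from types import SimpleNamespace

# Illustrative stand-in for the council table row of the current location.
_COUNCIL_ROW = {"name": "Example & District Council", "tax_d": 1234.5}


def number(value, places):
    """Format a number with thousands separators and a fixed number of decimals."""
    return f"{value:,.{places}f}"


def load_council(row=_COUNCIL_ROW):
    """Return an object whose string fields are already HTML-entity-escaped."""
    fields = {
        key: html.escape(val) if isinstance(val, str) else val
        for key, val in row.items()
    }
    return SimpleNamespace(**fields)


council = load_council()
line = (
    f"Your local council is {council.name} and the band D tax rate "
    f"is £{number(council.tax_d, 2)} per annum"
)
```

Because the escaping happens inside the loader, the template author can drop council.name straight into the page without having to think about entities.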

Having done this, the programmers didn't need to care what went on what page, generally. The HTML designers could make any page extract whatever data they wanted, and display it as they wanted. They could add and remove pages willy-nilly, which under the controller-based models of Rails et al. would need a programmer.

With a decent object store, many of the 'data getter functions' wouldn't need writing, either, if the HTML developer was just given read access to the store - most of the UMS ones really just encapsulated an SQL query, appropriately setting it up then processing the results.

Perhaps one area in which controller-based structure is the way to go is when you have form posts that do something; the above code is all idempotent data extraction. However, I have a theory that having HTTP POSTs return content is a bad design pattern. For a start, it results in a page that has no URL - if you have a page that came from a POST in your browser, you can't bookmark the URL and expect to get that page back if you go back to the URL. What I think should be done instead is to have POSTs go to handlers that are entirely programmer-written code to perform the action, which then return with a redirect to a URL that, when GETed, shows the appropriate result page.

In that case, the POST handler might be considered as a kind of controller (and it's far closer to the original meaning of 'controller' in the MVC model, IMHO). But it'd only be in the loop for the particular case of a POST form, and even then, only loosely bound (by returning a URL to GET) to the resulting view.
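
A minimal sketch of that POST-then-redirect arrangement, in Python with no framework. The handler names, the /basket URL and the in-memory BASKET list are assumptions made up for illustration; the only real protocol detail relied on is that a 303 See Other response makes the browser fetch the Location URL with GET.

```python
from urllib.parse import urlencode

BASKET = []  # illustrative in-memory state; a real site would use a database


def post_add_to_basket(form):
    """POST handler: perform the action, then redirect instead of returning content."""
    item = form.get("item", "").strip()
    if not item:
        location = "/basket?" + urlencode({"error": "missing-item"})
    else:
        BASKET.append(item)
        location = "/basket?" + urlencode({"added": item})
    # 303 See Other tells the browser to fetch the result page with GET.
    return 303, {"Location": location}, b""


def get_basket(params):
    """GET view: no side effects, so the resulting page has a bookmarkable URL."""
    lines = [f"Basket: {', '.join(BASKET) or 'empty'}"]
    if "added" in params:
        lines.append(f"Added {params['added']}")
    if "error" in params:
        lines.append(f"Error: {params['error']}")
    body = "\n".join(lines).encode()
    return 200, {"Content-Type": "text/plain; charset=utf-8"}, body
```

The POST handler knows nothing about how the result is rendered; its only link to the view is the URL it redirects to.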

Now, what are most sites like? I think the average public Web site is mainly pages that display information, which I think are most easily handled by giving the HTML developers a template language that can access a library of data accessors rather than bothering programmers with them. The POSTs tend to be fewer and further between; adding something to your basket, submitting a message, checking out, or the aforementioned awkward case of a 'dialogue' to perform some complex interactive process. All of which, I surmise, can be neatly handled within an overall site structure that remains in the hands of the HTML designers, with the programmers providing either simple POST handler 'controllers' that redirect to a results page URL afterwards, or POSTing or GET-linking to the start pages of dialogue processes.

By alaric, Sun 17th Dec 2006 @ 5:24 pm

I think Seaside (from Ben's link) is the closest to how I'd handle my 'dialogues' - but not normal idempotent view pages with links between them. They're best done statelessly. I'd like seamless integration between stateful dialogues, view pages with GET params, simple POST forms with controllers that return redirects, AJAXy edit pages with non-JS fallback POST handling, and any other models that are applicable to parts of a site. In other words, a general architecture that brings them all together; and I think this can be done with Apache modules that handle 'special files' in a public_html directory, be those files page templates that can access library modules, dialogue 'scripts', 'controller scripts' that accept a POST and return a redirect (or an error...), static resources, etc.

The one thing that sucks about that, of course, is that your URLs expose your technology: how many URLs are of the form "view.php?id=123"? I'd tweak the Apache filename matching slightly so that if the file named in the URL does not exist, but a file with the same name but with an extension appended matching a script or template handler DOES exist, then that file is used. So if one had a template language called "bob" one would store HTML templates in files ending ".html.bob" and access them with URLs ending in ".html". Or in files ending with ".bob" and access them with URLs with no extension on. These decisions should be cached by the Web server, to avoid having to do lots of stat()ing on every request.
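
The filename matching could look roughly like this. It is a sketch in Python rather than an Apache module; resolve and HANDLER_EXTENSIONS are hypothetical names, ".bob" is the made-up template language from the paragraph above, and functools.lru_cache stands in for whatever cache the Web server would really use.

```python
from functools import lru_cache
from pathlib import Path

HANDLER_EXTENSIONS = (".bob",)  # hypothetical template language from the text


@lru_cache(maxsize=1024)
def resolve(docroot, url_path):
    """Map a URL path to a file, trying handler extensions if no exact match exists."""
    root = Path(docroot).resolve()
    candidate = (root / url_path.lstrip("/")).resolve()
    if candidate == root or root not in candidate.parents:
        return None  # the root itself, or a path that escapes the document root
    if candidate.is_file():
        return str(candidate)
    for ext in HANDLER_EXTENSIONS:
        fallback = candidate.with_name(candidate.name + ext)
        if fallback.is_file():
            return str(fallback)
    return None
```

With this, a request for /about.html is served from about.html.bob when no literal about.html exists. The cache also remembers misses, so a real server would need to invalidate entries when files are added or removed.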

Since POST controller scripts return a redirect, they could use absolutely any template language you had installed on your system to generate their result, which nicely decouples things. And the sending of pages for the dialogue system could be implemented by creating something like an Apache virtual request to generate the output page, nominating a page template in any language and just passing make-believe GET parameters to it.

That way, you'd have a toolkit of different structures for different parts of a site that can be connected together freely and openly.

I'd wondered about serialising continuations and putting them in hidden form fields rather than burdening the server with sleeping processes, which would make the server properly stateless (albeit opening some cans of worms regarding state external to the application itself, so not encapsulated in the continuation - external databases, for example) - but I felt that a lot of continuations could get rather large, putting a big strain on the network and client, and there'd be mobility issues with changing class versions etc. on the server.

Seaside, to my surprise, stores the state of the handler when it's suspended and gives the client an ID referring to that state, but when it's resumed it keeps the stored state and starts the handler off with a copy of it, so one can use the Back button to return to previous states! Although this means it achieves that without sending huge continuations across the network, it does mean that every time you send an HTTP request, a new continuation is created on the server and kept until it times out, which will take up an awful lot of RAM 🙁

My own research into using continuations, rather than just suspending a process on the server, revolved around lightweight continuations. A lot of the environment in each continuation will be identical (eg, the innards of the request handling parts of the framework will be at the top of every stack), so the options include representing continuations as "diffs" against a standard template, making sure the dispatchy bits of the framework use tail calls (which don't take up stack space, being a JUMP rather than a CALL), or having the framework start off the user application handler in a 'clean thread' with a very minimal stack, so the framework's state is not part of the continuation.
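
The ID-plus-copy behaviour attributed to Seaside above can be imitated with plain data snapshots. The sketch below is not Seaside's implementation and does not capture real continuations; ContinuationStore and step are hypothetical names, and the "state" is just a dictionary copied with copy.deepcopy.

```python
import copy
import itertools


class ContinuationStore:
    """Keeps one saved handler state per response, addressed by an opaque ID."""

    def __init__(self):
        self._saved = {}
        self._ids = itertools.count(1)

    def suspend(self, state):
        """Save a snapshot of the state and return the ID to embed in the page."""
        cont_id = f"k{next(self._ids)}"
        self._saved[cont_id] = copy.deepcopy(state)
        return cont_id

    def resume(self, cont_id):
        """Return a copy, so the saved snapshot can be resumed again later."""
        return copy.deepcopy(self._saved[cont_id])


def step(store, cont_id, form):
    """Resume a dialogue, apply one form submission, and suspend again."""
    state = store.resume(cont_id)
    state["answers"].append(form["answer"])
    return store.suspend(state)
```

Resuming the same ID twice (as happens after pressing Back) yields two independent branches. The store also gains one entry per request and never drops any, which is the memory cost complained about above; a real version would need a timeout.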

By alaric, Sun 17th Dec 2006 @ 5:35 pm

Ooh, a vaguely related point I might as well stick in here (just showing that this is a 'stream of ideas' post rather than one I've carefully thought out in advance) is that pages (even the idempotent ones) should really have a header in the template or script or whatever that defines the parameters, and gives them types.

This has a few uses.
1. It allows for strict type checking. Any unexpected parameters can be ignored by the template, making sure the template developer remembers to declare his parameters. Any parameters that should be integers but aren't can cause a nice 'invalid request' error to be sent back. This sort of stuff helps to prevent more unpleasant error messages occurring later in the code.
2. It helps documentation. I've seen too many scripts in various Web app languages where it's hard to tell what all the legal params to a complex view page are, because they're dotted around in the source code. When writing PHP pages with complex parameters, I tend to end up putting a big comment at the top documenting them, since if you think of the page as an RPC, they're the arguments to it; its API.
3. It enables automatic preprocessing of parameters, for convenience. Integer parameters can be parsed as a base 10 integer and made available to the script or template as an actual integer, not a string. More usefully, parameters which are IDs of objects in the object database can be referenced in the DB; if they don't exist, the page can return a 404 rather than a confusing error later in the code. And the script or template can be passed the resulting object, rather than the ID.
4. It makes it easy to have useful functions such as "give me a URL to this page but with one of the parameters changed", meaning that in a complex page with many params, you can easily have a link back to the same page with, for example, the next 10 results displayed, but the search query remaining the same.

This will all sound familiar to Ben - he had lovely declared page parameters in his very own Web app framework, and they worked well.
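
The four uses listed above might be sketched like this in Python. It is not Ben's framework: Page, integer, article, BadRequest, NotFound and the ARTICLES dictionary are all names invented for the example, with ARTICLES standing in for the object database.

```python
from urllib.parse import urlencode


class BadRequest(Exception):
    """A declared parameter failed to parse; maps to an 'invalid request' error."""


class NotFound(Exception):
    """A parameter named an object that does not exist; maps to a 404."""


ARTICLES = {123: "HTTP caching"}  # illustrative stand-in for an object database


def integer(raw):
    """Parse a base 10 integer parameter."""
    try:
        return int(raw, 10)
    except ValueError:
        raise BadRequest(f"not an integer: {raw!r}")


def article(raw):
    """Parse an object ID and hand the template the object itself."""
    key = integer(raw)
    if key not in ARTICLES:
        raise NotFound(f"no article {key}")
    return ARTICLES[key]


class Page:
    def __init__(self, path, **declared):
        self.path = path
        self.declared = declared  # parameter name -> converter; the page's API

    def parse(self, query):
        """Convert declared parameters and ignore undeclared ones."""
        return {
            name: convert(query[name])
            for name, convert in self.declared.items()
            if name in query
        }

    def url_with(self, query, **changes):
        """Build a URL to this page with some parameters changed."""
        merged = {k: v for k, v in query.items() if k in self.declared}
        merged.update(changes)
        return self.path + "?" + urlencode(sorted(merged.items()))
```

A search page declared as Page("/search", q=str, start=integer) can then link to its own next page of results with url_with(query, start=20), keeping the search query unchanged.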

By Ben, Sun 17th Dec 2006 @ 8:48 pm

Yes, URLs need explicit typed parameters. I dislike RoR's style; it starts off well with /controller/action/id but then after that it just uses key=value parameters. And nothing is typed or parsed automatically for you. I may end up rewriting the 'routing' to do something about it.

On continuations, it doesn't necessarily have to be implemented as a proper continuation. Another model is a state machine with data. This could be handled by some funky pre-processor or compiler support.
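
A hand-written version of the state-machine-with-data model, applied to the domain registration dialogue from the post, might look like this Python sketch. The step names follow the four contact sets named in the post; the REQUIRED fields and the advance and prefill functions are assumptions made up for illustration.

```python
STEPS = ["registrant", "admin", "billing", "technical", "done"]
REQUIRED = ("name", "email")  # assumed required fields, for illustration only


def validate(details):
    """Return the names of required fields that are missing or blank."""
    return [field for field in REQUIRED if not details.get(field, "").strip()]


def advance(state, form):
    """Handle one 'Next' click and return (new_state, errors)."""
    step = state["step"]
    if step == "done":
        return state, []
    errors = validate(form)
    if errors:
        return state, errors  # stay on the same page with the errors flagged
    contacts = dict(state["contacts"], **{step: dict(form)})
    next_step = STEPS[STEPS.index(step) + 1]
    return {"step": next_step, "contacts": contacts}, []


def prefill(state, source):
    """Server-side equivalent of the 'copy the details from...' buttons."""
    return dict(state["contacts"].get(source, {}))
```

The whole dialogue state is the current step plus the data gathered so far, so it is small enough to keep in a session or round-trip through a hidden form field, at the price of writing the control flow out by hand.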

By alaric, Wed 24th Jan 2007 @ 11:48 am

I've since found out just how terribly badly most web application frameworks handle HTTP cache control headers, so have written a post about that which gives more ideas on what a decent framework needs: http://snell-pym.org.uk/archives/2007/01/24/http-caching/

Sarah and Alaric Snell-Pym living in interesting times
