Cool things I have worked on: Customer Relationship Management

The Java server was a database bottleneck

So, the core function of a CRM is to manage interactions with customers, and in our system those fell into three groups:

  1. Sending emails, which usually happened as a big batch operation: the user would, via the Java desktop app, select a subset of their customer base, enter an email message, and ask for it to be sent to that subset.
  2. Receiving emails: replies to messages we sent, bounces, incoming emails from customers - they were either forwarded on to a human, or processed automatically.
  3. HTTP events. If users clicked links in the emails, the click came to our HTTP server, which redirected them on to the actual destination. There were also tracking images in the emails so we knew if people had opened the HTML version of them, and more HTTP events that I added later.

These actions were handled by separate Java processes that ran on dedicated servers and called into the central server via RMI, using service accounts set up for them. And all of these operations logged what happened - the CRM being all about recording interactions with customers - so a large fraction of the RMI calls being made were to the methods that logged events. And a small fraction, but with the largest total run time, was the method the email service called to request the list of people to send an email to - the underlying SQL query didn't take much time in the JDBC stats, but creating a hundred thousand Java objects from its result set and serialising them into the RMI reply sure did, and tended to make the server start to eat swap space too.

Well, both of these were easily fixed. I gave the email and HTTP services direct access to the PostgreSQL database, and let them insert entries directly into the log tables, and had the email server query the customer table directly, streaming customer objects from the DB in the same loop that composed the messages to each.
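
To give a flavour of the streaming approach, here is a rough sketch (the table and column names are made up, and the real code composed and queued each message inside the same loop):

// A sketch of the streaming send loop, assuming hypothetical table/column names.
// PostgreSQL's JDBC driver only streams rows (honours setFetchSize) with auto-commit off.
import java.sql.*;

public class StreamingSend {

    public static void sendToConsentedCustomers(Connection conn) throws SQLException {
        conn.setAutoCommit(false); // required for cursor-based fetching in the PostgreSQL driver
        try (PreparedStatement stmt = conn.prepareStatement(
                "SELECT customerid, email, firstname FROM customers WHERE consented = true")) {
            stmt.setFetchSize(1000); // pull rows in batches instead of materialising the whole list
            try (ResultSet rs = stmt.executeQuery()) {
                while (rs.next()) {
                    // compose and hand off this one message, then let the row be garbage collected
                    composeAndQueue(rs.getLong("customerid"), rs.getString("email"),
                                    rs.getString("firstname"));
                }
            }
        }
        conn.commit();
    }

    private static void composeAndQueue(long customerId, String email, String firstName) {
        // placeholder for the real message composition and SMTP handoff
    }
}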

Batching log inserts

But, when you knock down one bottleneck, the next one just rears its head as load increases, so before long I was facing the database being slowed by committing thousands of tiny transactions per second, caused by individual inserts into the logging tables from the email and HTTP services. So I made a logging wrapper, shared by both services, which streamed log events into a file on the local disk, plus a thread that once per second would pick up that file (starting a new file for subsequent events to go into) and stream its events into the database in a single transaction. This also meant that if the database went down the services could still work (if they didn't need the database for anything else), and would just continue to queue log events on disk and replay them into the DB later. Of course, I also added code that, on startup, would identify any event spool files that hadn't yet been sent to the DB (because the service had crashed before flushing them) and send them over.
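
For the curious, the shape of that wrapper was roughly this (a simplified sketch; the real schema, file format, and error handling were more involved):

// Hypothetical sketch of the spool-file approach: events append to a local file,
// and once per second a flusher thread rotates the file and writes its contents
// to the database in a single transaction. Names and schema are illustrative.
import java.io.*;
import java.nio.file.*;
import java.sql.*;
import java.util.concurrent.*;

public class SpoolingEventLogger {
    private final Path spoolDir;
    private BufferedWriter current;
    private Path currentPath;

    public SpoolingEventLogger(Path spoolDir) throws IOException {
        this.spoolDir = spoolDir;
        openNewSpool();
        Executors.newSingleThreadScheduledExecutor()
                 .scheduleAtFixedRate(this::flushToDatabase, 1, 1, TimeUnit.SECONDS);
    }

    // Called by the email/HTTP services for each event; just a cheap local append.
    public synchronized void log(long customerId, String eventType) throws IOException {
        current.write(customerId + "\t" + eventType + "\t" + System.currentTimeMillis());
        current.newLine();
        current.flush(); // a crash loses at most the event being written
    }

    private synchronized Path rotate() throws IOException {
        current.close();
        Path closed = currentPath;
        openNewSpool();
        return closed;
    }

    private void openNewSpool() throws IOException {
        currentPath = spoolDir.resolve("events-" + System.nanoTime() + ".spool");
        current = Files.newBufferedWriter(currentPath);
    }

    private void flushToDatabase() {
        try {
            Path closed = rotate();
            try (Connection conn = DriverManager.getConnection("jdbc:postgresql:crm");
                 PreparedStatement ins = conn.prepareStatement(
                     "INSERT INTO eventlog (customerid, eventtype, at) VALUES (?, ?, ?)")) {
                conn.setAutoCommit(false);
                for (String line : Files.readAllLines(closed)) {
                    String[] f = line.split("\t");
                    ins.setLong(1, Long.parseLong(f[0]));
                    ins.setString(2, f[1]);
                    ins.setTimestamp(3, new Timestamp(Long.parseLong(f[2])));
                    ins.addBatch();
                }
                ins.executeBatch();
                conn.commit();        // one transaction per second, not one per event
                Files.delete(closed); // only delete once safely committed
            }
        } catch (Exception e) {
            // If the database is down, leave the spool file in place; it (and any
            // others found at startup) can be replayed into the DB later.
        }
    }
}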

Parallel sending

As volumes grew, having a single thread in the email service performing any one message send became a problem, too - so I made the system parallelise large sends, by running multiple threads each with a slight variation on the SQL query used to select the customers to send to: effectively, ... and customerID mod NUMBER_OF_THREADS = INDEX_OF_THIS_THREAD.
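
In outline, the sharding looked something like this (illustrative names only; thread management in the real service was a bit more careful):

// Illustrative sketch: N worker threads, each selecting a disjoint slice of the
// recipient set by taking the customer ID modulo the number of threads.
import java.sql.*;

public class ParallelSender {
    public static void send(int numThreads, String jdbcUrl) {
        for (int i = 0; i < numThreads; i++) {
            final int shard = i;
            new Thread(() -> {
                try (Connection conn = DriverManager.getConnection(jdbcUrl);
                     PreparedStatement stmt = conn.prepareStatement(
                         "SELECT customerid, email FROM customers " +
                         "WHERE consented = true AND customerid % ? = ?")) {
                    stmt.setInt(1, numThreads);
                    stmt.setInt(2, shard);
                    try (ResultSet rs = stmt.executeQuery()) {
                        while (rs.next()) {
                            // compose and send to this customer, as in the streaming loop above
                        }
                    }
                } catch (SQLException e) {
                    e.printStackTrace();
                }
            }, "send-worker-" + shard).start();
        }
    }
}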

Restructuring the customer database

The next problem we hit was with importing customer lists. Most of our users had a customer list in their e-commerce system, which they initially imported into our system and then periodically re-imported to keep it up to date. Our customer database had a bunch of core fields the system needed to manage them - our ID, an ID from the upstream customer database if it had one, name, email address, whether they had consented to receive email, etc - and then a bunch of "user-defined fields" (UDFs); usually stuff like how much they'd spent that quarter. The database structure we stored this in was a Customers table with the core fields in, and a bunch of tables with names something like IntegerColumn that combined a customer ID, a UDF ID, and an integer value.

So the import process for each row in the input CSV consisted of:

  1. Checking for an existing record, based on the upstream customer ID if one was present, or the email address if not.
  2. If one was found, reading it into memory, and then reading all its UDFs from the ...Column tables.
  3. Either updating the in-memory customer record with the incoming data, or creating a new customer record if there wasn't an existing one.
  4. If updating an existing customer, deleting all their ...Column records, updating the Customers record, and inserting new ...Column records for the current UDF values.
  5. If creating a new customer, inserting into the Customers table and all the ...Column tables.

This was a lot of SQL queries per row, and the latency of each round trip became a bottleneck. So we ditched the ...Column tables and made our own lightweight serialisation format for a mapping from integer UDF IDs to values of each supported type (date, boolean, integer, string - from memory). We then stored all of a customer's UDFs in a udfs binary field in the Customers table, meaning that we could add or update a customer with a single INSERT or UPDATE and didn't need to join onto or separately query the ...Column tables when reading customer data. This sped the import process up greatly, and reduced the load on the PostgreSQL server.
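
I don't remember the exact byte layout any more, but the idea was just a compact list of (UDF ID, type tag, value) entries; something along these lines, which is a guess at the shape rather than the original format:

// Hypothetical sketch of a lightweight UDF serialisation: a count, then
// (udfId, typeTag, value) triples. The actual layout differed.
import java.io.*;
import java.util.*;

public class UdfCodec {
    private static final byte T_INT = 1, T_BOOL = 2, T_STRING = 3, T_DATE = 4;

    public static byte[] encode(Map<Integer, Object> udfs) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeInt(udfs.size());
        for (Map.Entry<Integer, Object> e : udfs.entrySet()) {
            out.writeInt(e.getKey()); // the UDF ID
            Object v = e.getValue();
            if (v instanceof Integer) { out.writeByte(T_INT); out.writeInt((Integer) v); }
            else if (v instanceof Boolean) { out.writeByte(T_BOOL); out.writeBoolean((Boolean) v); }
            else if (v instanceof String) { out.writeByte(T_STRING); out.writeUTF((String) v); }
            else if (v instanceof java.util.Date) { out.writeByte(T_DATE); out.writeLong(((java.util.Date) v).getTime()); }
            else throw new IllegalArgumentException("unsupported UDF type: " + v.getClass());
        }
        return buf.toByteArray();
    }

    // Pull out a single integer UDF without deserialising the whole map --
    // the same trick the server-side extraction functions relied on.
    public static Integer extractInteger(byte[] blob, int wantedId) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(blob));
        int n = in.readInt();
        for (int i = 0; i < n; i++) {
            int id = in.readInt();
            byte tag = in.readByte();
            switch (tag) {
                case T_INT: { int v = in.readInt(); if (id == wantedId) return v; break; }
                case T_BOOL: in.readBoolean(); break;
                case T_STRING: in.readUTF(); break;
                case T_DATE: in.readLong(); break;
                default: throw new IOException("corrupt UDF blob");
            }
        }
        return null; // this UDF isn't set for this customer
    }
}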

However, we needed to use those UDF values in the WHERE clauses of the SQL statements that selected customers to send an email to, and in the SQL statements used to browse the customer database via the Java desktop client. We could have done the filtering of records in Java, but that would have wasted a lot of CPU and RAM (and network bandwidth). So we wrote a bunch of simple functions in C to extract values from our binary format, and wrapped them in a PostgreSQL extension that we loaded as a shared library, so our queries could look like ...AND udf_extract_integer(Customer.udfs, ?) = ?; the UDF ID of the desired integer UDF and the desired value being interpolated when the statement was prepared. This let PostgreSQL do the filtering for us deep in its innards, and avoided burdening anything else in the pipeline with unwanted records.
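
From the Java side, using it looked roughly like this (table and column names are illustrative; udf_extract_integer is the C function loaded into PostgreSQL as described above):

// Sketch of querying against the packed UDF blob, with filtering done inside PostgreSQL.
import java.sql.*;
import java.util.*;

public class UdfQueryExample {
    // e.g. "all consented customers whose 'spend this quarter' UDF equals some value"
    public static List<Long> customersWithIntegerUdf(Connection conn, int udfId, int wanted)
            throws SQLException {
        List<Long> ids = new ArrayList<>();
        try (PreparedStatement stmt = conn.prepareStatement(
                "SELECT customerid FROM customers " +
                "WHERE consented = true AND udf_extract_integer(customers.udfs, ?) = ?")) {
            stmt.setInt(1, udfId);   // which UDF to pull out of the packed blob
            stmt.setInt(2, wanted);  // the value to match
            try (ResultSet rs = stmt.executeQuery()) {
                while (rs.next()) ids.add(rs.getLong(1)); // only matching rows come back
            }
        }
        return ids;
    }
}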

But why?

Now, dear reader, you may be marvelling at this tale so far. Those of you who have worked in the Internet software industry will probably have found that performance is rarely a concern compared to rushing features out of the door; I have just described many person-months of work that added no new features to the system, but just made it faster. What the hell?

Well, this was all happening around the early-2000s dotcom crash. I'd joined a company of about twenty people working from a nice office in Chelsea; before long, all but two of us had been made redundant and we were working from home so we didn't need to pay for that office any more. I was responsible for software development, systems administration, going into the datacentre to rotate backups, manning the customer support phone line and email address, writing documentation, and going to customer sites to train them. But, we had a product that people wanted and demand was growing, so I had to support an ever-increasing customer base without any budget for new servers. Our customers were happy with our feature set, but not happy with performance! So I continued to tear out generic components and replace them with hand-written ones optimised to our use case, to wring a bit more performance out of our hardware...

Adding features

However, I did find some time to add features too! A popular request from users was the addition of a "forward to a friend" feature. Apparently, when they sent out emails with some fancy offer or deal in, sometimes people forwarded these emails to other people they thought might be interested - and if they then clicked the links, we couldn't tell that they weren't the original recipient of the message.

Now, I can't really imagine somebody getting junk mail from somebody they've bought something from and going "Wow, what a great deal, I need to forward this to my mate Pete", but apparently it was a problem, and the proposed solution was to add a form to the email saying "Want to tell your friend about this great deal? Put their email address here and we'll send them a copy". A few of our competitors had support for this.

You can put a form in an HTML email, and when submitted it goes to an HTTP URL; so to make this happen we needed a new interface on the HTTP service that would accept these posts, find an existing or create a new customer record so they had an ID (and mark them as just a 'friend' if created anew so they didn't start getting regular customer emails), and schedule a send of the email to them. However, that HTTP form post handler also needed to return some content to the user's browser; most of our competitors at the time returned a generic "Thanks!" HTML page, but that looked nasty to me.

So I decided to solve this problem by solving a somewhat larger problem. I created a new concept in the system of a "Website": customers, through the Java desktop app, could create websites and upload arbitrary content to be served. The HTTP service would accept URLs of the form .../hash/Wid/filename and look up the file filename within the website with the given id and return its contents. However, if there were additional parts before that initial /Wid/ then they would engage additional processing. Perhaps the most important were /Cid/, which told the system a customer ID and was automatically added when links were generated within a context that knew the customer ID (such as in a generated email, or when returning an HTTP response to a request that itself had a /Cid/), and /Sid/, which did a similar thing with an email send ID. These enabled us to log access to hosted sites in just the same way as we logged redirects to external links. Oh, and that hash was a hash of the rest of the URL plus a secret we kept on the server, so that we could check whether a URL had been tampered with; people couldn't just make up their own URLs.
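
The tamper check itself was nothing exotic; something like the following, although the actual hash algorithm and encoding are lost to memory, so treat SHA-256 and hex here as placeholders:

// Sketch of the tamper-check: the first path segment is a hash over the rest of
// the path plus a secret held only on the server.
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class SignedUrls {
    private static final String SECRET = "server-side-secret"; // never sent to clients

    // Hash of the secret plus everything after the hash segment, e.g. "/C98976/W12345/thankyou.html"
    public static String hashFor(String pathAfterHash) throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        byte[] digest = md.digest((SECRET + pathAfterHash).getBytes(StandardCharsets.UTF_8));
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) hex.append(String.format("%02x", b));
        return hex.toString();
    }

    // Full path as received: "<hash>/C98976/W12345/thankyou.html"
    public static boolean verify(String fullPath) throws Exception {
        int slash = fullPath.indexOf('/');
        if (slash < 0) return false;
        String claimed = fullPath.substring(0, slash);
        String rest = fullPath.substring(slash);
        return MessageDigest.isEqual( // constant-time comparison of the two hashes
            claimed.getBytes(StandardCharsets.UTF_8),
            hashFor(rest).getBytes(StandardCharsets.UTF_8));
    }
}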

I should mention at this point that we had a personalisation engine used to personalise email content, which I also used to personalise content with text/* MIME types returned by the new website hosting service. This let you write stuff like ${firstName} and have it replaced by the user's first name, and it also supported things like ${#IF booleanFieldName}...${#ELSE}...${#END IF}. I added a ${#LINK <website ID> <filename>} command to generate links into websites with the available context information included (and put some UI in the Java desktop client to let you easily get a link to a website file in this form, to paste into your content, rather than typing in the website IDs by hand). There was already a bunch of infrastructure in place for this in the HTTP handler that displayed email content in your browser, used both to preview messages before sending and to provide a "View this message in your browser" link for customers.

But what was all this for? Well, I also added command tags that could be included in the URL, so the forward-to-a-friend form in an email could be structured like this:

<form method="post" action="${#LINK forward 12345 thankyou.html}">
  Email: <input type="text" name="email.1">
  Name: <input type="text" name="firstName.1">
  <input type="submit">
</form>

The link would then contain a forward command, as requested by the forward argument to ${#LINK ...} and look something like:

http://...hostname of web service.../...hash.../forward/C98976/S21234/W12345/thankyou.html

That forward tag would cause the "forward" handler to be passed the request, which would pick the friend's details out of the POST and perform the forward-to-a-friend action, before passing the request on to the normal content handler, which would return the contents of thankyou.html after personalising it. Oh, and any HTTP form submission values such as email.1 above would be made available in the personalisation context as fields, something like ${FORM:email.1}, so you could say Thanks for forwarding this email to ${FORM:firstName.1} if you wanted.

This made for a forward-to-friend system that returned customer-supplied personalisable thank you pages, which was bleeding edge tech in the CRM email space at the time, but more importantly - it provided a framework I would build upon going forward.

The first obvious step was to provide commands such as update, which would update fields in the customer record based on the form submission; unsubscribe-ID to unsubscribe them from a campaign, subscribe-ID to subscribe them to a campaign ("campaigns" being subsets of the customer base used to manage them), nomail to withdraw their email consent, etc. These were quickly seized upon by our more savvy customers to make forms to let people update their marketing preferences; and because the same personalisation language worked in emails and in hosted websites, they could create those forms once and link to them from all their emails, and so on. And because you could put multiple command tags in a URL - they'd be processed from left to right - one could create a form that would update your preferences and subscribe you to two campaigns.
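
The dispatch logic for those command segments was conceptually just a left-to-right loop over the URL parts; roughly like this (all the handler bodies are placeholders, and the segment matching here is simplified):

// Sketch of the command chain: segments after the hash are processed left to right;
// context segments (Cnnn, Snnn) set state, command segments trigger actions, and the
// Wnnn segment hands over to the content handler.
import java.util.List;

public class CommandChain {
    static class RequestContext { String customerId, sendId, websiteId; }

    // e.g. segments = ["update", "subscribe-7", "C98976", "S21234", "W12345", "thankyou.html"]
    public static void handle(List<String> segments, RequestContext ctx) {
        for (String seg : segments) {
            if (seg.matches("C\\d+")) { ctx.customerId = seg.substring(1); continue; } // customer context
            if (seg.matches("S\\d+")) { ctx.sendId = seg.substring(1); continue; }     // send context
            if (seg.matches("W\\d+")) { ctx.websiteId = seg.substring(1); break; }     // content follows
            switch (baseName(seg)) {
                case "forward":     /* find or create the friend's record, schedule the send */  break;
                case "update":      /* copy submitted form fields into the customer record */    break;
                case "subscribe":   /* subscribe the customer to the campaign after the dash */  break;
                case "unsubscribe": /* remove them from that campaign */                         break;
                case "nomail":      /* withdraw their email consent */                           break;
                default:            /* unknown segment: reject the request */                    break;
            }
        }
        // finally, serve the requested file from website ctx.websiteId, personalised for ctx
    }

    private static String baseName(String seg) {
        int dash = seg.indexOf('-');              // "subscribe-7" -> "subscribe"
        return dash < 0 ? seg : seg.substring(0, dash);
    }
}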

And we had some very creative customers. One of them was a charity that used this system to create a whole bunch of quizzes and processes that people could go through - not as a fundraising thing as you might expect, but actually organising a citizen science project furthering the goals of the charity itself. The user's progression through stages was handled by unsubscribing and subscribing them to campaigns for each stage, and their data was stored in UDFs.

Now, one of the advantages of me running the entire show was that I was very close to the users, and the person who was running this charity's campaigns was often in touch with me to ask for details of things - he was a hacker at heart, and wanted to push the system to its limits by doing things I'd not thought of! He wanted to be able to send the customers an email when they moved to a new stage, so I added a thing similar to the forward-to-a-friend command that triggered a send of a chosen email message (by ID) to the customer performing the command; this let him make the system instantly email the user as they progressed without having to manually perform a send to the campaign for each stage on each day.

Conclusions

None of this was groundbreaking technology from a computer science perspective, unlike the previous Cool Things in this series; but it was still an interesting experience I learnt a lot from. In every job since, I have missed the direct connection with customers that I got from being the software developer, technical architect, technical writer, trainer, and support desk all at once - but I have certainly not missed not having the time to properly perform preventative maintenance on the rack full of servers, basically fighting fires whenever something broke!

But I think that having held all of those roles at once has been really useful to me. When I write server software, I can appreciate the needs of the people who have to run it in production, and I make sure I keep it simple (no unnecessary moving parts!) and observable (all the moving parts can be seen!). I always try to solve a more general problem than the precise one at hand, as with the website system; it's usually only a tiny bit more work in the short term (the generic command infrastructure I wrote was, like, half a day's work?) but saves so much time going forward (building something that could support the charity's citizen science system from scratch just wouldn't have been practical at all).

Being able to actually talk to the customers and go back and forth on their requirements and possible solutions is so much more efficient and satisfying for everyone concerned than having indirection through too many layers of people. At the analytical database job, feature requests came from sales people, who reported them to their boss, who reported them to our boss, who passed them down to us; it was hard enough decoding what we got to find the actual customer requirement, and there was no scope for discussion and dialogue with the end user. Although it's good to have a "product owner" / "stakeholder" representing the users, with whom the buck stops when it comes to making decisions between conflicting requirements, I think it's still incredibly useful to have the technical architects of a system talking directly to actual end users.

And finally, the process of scaling a system by instrumenting to find bottlenecks and eliminating them one by one was very rewarding. When working on database engines later in my career, I've followed the same process (and taken inspiration from the same techniques) to great success...

Creative Commons Attribution-NonCommercial-ShareAlike 2.0 UK: England & Wales