Repeatedly Retreating Software Development

Repeatedly Retreating Software Development

I needed to add a characterize to our web application, that required making calls to a vendor’s far away web service. This call had to be made soon after the webapp user made a buy, inside the app – but didn’t need to happen right away.

In this situation, you don’t want to make that far away API call while you’re handling the user’s web request. For all sorts of reasons.

The “right” way to do this:

Set up a job queue, that will asynchronously make that web service request.

Your webapp can just push the task to this queue, which is quick and reliable, then quickly return the response to the customer. Don’t want to keep them waiting, after all, especially when they just gave you money.

This kind of job queue is a little complicate to set up, but not too bad. And I had four days. Plenty of time.

So I started doing it the “right way”… and long story short, it didn’t work.

I produced a new node that would great number Celery – a Pythonic task queue – backed by RabbitMQ. Plus another, unnamed tool, that wrapped this node in a nice microservice API.

Unnamed, because after two days of work, I discovered this tool liked to pin the CPU to 99% every hour or two. Until I manually restarted the service.

Turns out, this was a bug that upstream had been working on for a YEAR. I wasn’t going to solve it on my own, in the two days I had left.

I thought that in the long term, it was better to set up the separate job queue node. But I needed to get this characterize deployed, and decided to set up Celery alongside Django, on the same node the webserver was running on.

I figured that would be quick and easy. Because usually, it is.

Spoiler alert: it wasn’t.

In fact, after another complete day, I ran into ANOTHER show-stopper with that approach. I won’t already bore you explaining what it was – it was stupid and frustrating. But bottom line, I couldn’t do that, either.

So with only a few hours left… what did I do?

I set up an hourly “cron” job.

If you don’t know about Cron, it’s a facility baked into Linux (and Unix), that will run a script for you on a regular basis. It’s simple, rock-substantial, and I’ve used it a zillion times.

So I wrote a quick script that would check who made purchases recently, and make the API to that web service for each of them. And I told cron to run it once per hour.

Quick and easy. It’s been running fine for months now, doing what it needs to do. No problems.

At some point, I’ll go back and fix my first approach. It’s better if the task runs right after the buy, instead of an hour later, and we’ll ultimately need that separate task-queue node running for other things, anyway.

But I think this story illustrates some real-world, functional aspects of writing software.

Sometimes we have to make tactical retreats. Because getting an imperfect solution deployed is, often, far better than a perfect solution late. And being willing to recognize when something is at risk of not working out, and we need to change our approach.

leave your comment