🪓 Launching our v1 Engine
March 24th, 2025
For the past several months, we've been working on a complete rewrite of the Hatchet queue with a focus on performance and on a set of requested features that weren't possible in the v0 architecture. The v1 engine has been available in preview for several weeks, and is currently running over 200 million tasks/month on Hatchet Cloud.
Migration Guide
To upgrade to the v1 engine, please see our migration guide in our docs.
In the rest of this document, we'll cover the major architectural improvements that we made to the Hatchet engine -- while you don't need to worry about these as a Hatchet user, perhaps they'll be interesting to other developers who are working on similar Postgres scaling problems.
TL;DR - nearly every usage pattern of Hatchet is faster and more efficient in v1, including usage of concurrency keys, high-throughput workflows, and long-running workflows. We've seen a 30% reduction in latency, a 10x increase in throughput, and a 5x reduction in CPU and IOPs load on the database.
Throughput Improvements
One of the main bottlenecks in the previous Hatchet engine was throughput -- we could only handle about 1k tasks/second on the ingestion side, even on a relatively large database, despite being able to queue at a much faster rate once work was ingested into the system.
Hatchet is a durable task queue built on Postgres, which means that every result and intermediate task event needs to be persisted to the database in a transactionally-safe manner. One of the main ways to increase throughput in Postgres is to perform batch inserts of data, which can considerably cut down on transaction overhead and round trips to the database.
While the v0 engine was utilizing batch inserts on a few high-volume tables, the v1 engine now uses batch inserts on almost every table in the system. This has allowed us to increase throughput by an order of magnitude -- we're now able to queue up to 10k tasks/second on a single Hatchet instance, and we've been able to sustain that rate over a 24 hour period.
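As a rough illustration of the idea (not Hatchet's actual code), here's what a batch insert over Postgres' COPY protocol looks like with the pgx driver; the table and column names here are hypothetical:

```go
package main

import (
	"context"

	"github.com/jackc/pgx/v5"
)

// insertTasks writes a whole batch of task payloads in a single COPY
// statement instead of issuing one INSERT per task, amortizing transaction
// overhead and round trips across the batch.
func insertTasks(ctx context.Context, conn *pgx.Conn, payloads [][]byte) (int64, error) {
	rows := make([][]any, 0, len(payloads))
	for _, p := range payloads {
		rows = append(rows, []any{p})
	}

	// "tasks" and "payload" are illustrative names, not Hatchet's schema.
	return conn.CopyFrom(
		ctx,
		pgx.Identifier{"tasks"},
		[]string{"payload"},
		pgx.CopyFromRows(rows),
	)
}
```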
However, throughput and latency are always a tradeoff -- so how did the Hatchet engine get faster as well? The answer comes down to dynamic buffering...
Latency Improvements
The v1 engine is able to achieve a 30% reduction in latency, from 30ms to 20ms average queue time. The main reason for this is that we run many buffers in parallel in engine memory, which are flushed on an interval.
At relatively low volume, we'd like to write to the database with only one item in the buffer, since batched inserts don't provide much benefit at that scale. So when the first tasks enter the buffers, we flush as soon as there's a single task in the buffer.
After we've flushed a buffer, we wait a minimum of 10ms before flushing it again. This means that at a low volume of 100 tasks/s, we're still flushing to the database immediately, while at higher volume we're batching tasks in the buffers before flushing them to the database.
This naturally isn't the full story -- if we receive too many tasks, we'd like our buffers to exert some sort of backpressure on the caller. So there's an internal saturation threshold on the buffers, where callers will be blocked from adding more tasks to the buffer until the buffer is flushed.
It turns out this strategy is extremely effective: queueing stays fast at relatively low volume, while queue duration stays roughly constant at much higher volume!
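As a simplified sketch of this kind of flushing loop (not Hatchet's implementation -- all names, sizes, and intervals here are illustrative), a bounded buffer with a minimum flush interval and blocking backpressure might look like this:

```go
package buffer

import (
	"context"
	"time"
)

// flushBuffer batches items and hands them to flushFn. The first item after a
// quiet period flushes almost immediately; under load, items accumulate into
// batches because flushes are spaced at least minFlushInterval apart. The
// bounded channel provides backpressure: once it's full, Push blocks until
// the next flush drains it.
type flushBuffer[T any] struct {
	items            chan T
	flushFn          func([]T)
	minFlushInterval time.Duration
	maxBatchSize     int
}

func newFlushBuffer[T any](capacity, maxBatchSize int, minFlushInterval time.Duration, flushFn func([]T)) *flushBuffer[T] {
	return &flushBuffer[T]{
		items:            make(chan T, capacity),
		flushFn:          flushFn,
		minFlushInterval: minFlushInterval,
		maxBatchSize:     maxBatchSize,
	}
}

// Push blocks when the buffer is saturated, exerting backpressure on the caller.
func (b *flushBuffer[T]) Push(ctx context.Context, item T) error {
	select {
	case b.items <- item:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// Run drains the buffer, flushing at most once per minFlushInterval.
func (b *flushBuffer[T]) Run(ctx context.Context) {
	var lastFlush time.Time
	for {
		select {
		case <-ctx.Done():
			return
		case first := <-b.items:
			// At low volume this wait has already elapsed, so a single item
			// flushes right away; at high volume, items pile up during it.
			if wait := b.minFlushInterval - time.Since(lastFlush); wait > 0 {
				time.Sleep(wait)
			}

			batch := []T{first}
		drain:
			for len(batch) < b.maxBatchSize {
				select {
				case item := <-b.items:
					batch = append(batch, item)
				default:
					break drain
				}
			}

			b.flushFn(batch)
			lastFlush = time.Now()
		}
	}
}
```

Running many of these buffers in parallel in engine memory keeps any single flush small while still getting the benefit of batching under load.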
Deletion Improvements
One of the main bottlenecks in the v0 engine was deletion of old data. Not only did large Hatchet instances get bloated on disk quickly, but deletion operations would commonly time out. The reason for this is that we were relying on UUIDs as the primary key for most tables, whose random values scatter index entries across pages in the database.
In the v1 engine, we've added table partitioning by range to almost every large table in the system. This has allowed us to drop old data much more efficiently, because we can simply drop the partition that contains the old data.
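For illustration (the table name and daily range are assumptions, not Hatchet's actual schema), dropping aged-out data from a range-partitioned table reduces to a couple of fast DDL statements rather than a large DELETE:

```go
package main

import (
	"context"

	"github.com/jackc/pgx/v5"
)

// Example parent table, range-partitioned on a timestamp column:
//
//   CREATE TABLE task_events (
//       id         bigint GENERATED ALWAYS AS IDENTITY,
//       created_at timestamptz NOT NULL,
//       payload    jsonb
//   ) PARTITION BY RANGE (created_at);
//
//   CREATE TABLE task_events_2025_03_24 PARTITION OF task_events
//       FOR VALUES FROM ('2025-03-24') TO ('2025-03-25');

// dropExpiredPartition detaches and drops a single partition. This is a
// metadata-level operation, unlike deleting millions of individual rows whose
// index entries are scattered across pages.
func dropExpiredPartition(ctx context.Context, conn *pgx.Conn, partition string) error {
	// The partition name comes from trusted scheduling code, not user input.
	if _, err := conn.Exec(ctx, "ALTER TABLE task_events DETACH PARTITION "+partition); err != nil {
		return err
	}
	_, err := conn.Exec(ctx, "DROP TABLE "+partition)
	return err
}
```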
Other Improvements
We've also made a number of other improvements to the engine, including:
- We changed our high-write and high-churn table definitions to use identity columns and tuned autovacuum settings on a per-table basis (see the first sketch after this list).
- We now use separate tables for the actual queue versus for listing workflow runs in the API. This allowed us to remove almost all indexes on the queue tables, which speeds up queueing significantly.
- We process status updates for tasks in a deferred fashion, using tables which are hash-partitioned by the workflow run id (see the second sketch after this list). This was one of the most difficult parts of the rewrite -- it turns out that computing task statuses is a surprisingly difficult problem when task events can arrive out of order. We explored using Clickhouse for storing task events and rendering task statuses, but it didn't perform the way we were expecting.
- A single task implements a state machine, and the state machine transitions now use Postgres triggers (see the third sketch after this list). Using triggers keeps most updates and processing on the database server, which increases performance.
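As an illustration of the first point (the table and the specific settings are examples, not Hatchet's actual schema), a high-churn table with an identity primary key and its own autovacuum thresholds might be defined like this:

```go
package migrations

// Illustrative DDL for a high-churn table: a sequential identity primary key
// instead of a random UUID, plus per-table autovacuum storage parameters so
// dead tuples are cleaned up more aggressively than the global defaults.
const createQueueItems = `
CREATE TABLE queue_items (
    id      bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    queue   text   NOT NULL,
    task_id bigint NOT NULL
);

ALTER TABLE queue_items SET (
    autovacuum_vacuum_scale_factor = 0.05,
    autovacuum_vacuum_threshold    = 500,
    autovacuum_vacuum_cost_limit   = 2000
);`
```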
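For the deferred status updates, hash partitioning by workflow run id might look roughly like this (again, the names and partition count are assumptions for the sake of the example):

```go
package migrations

// Illustrative DDL: status events are hash-partitioned on the workflow run
// id, so events for the same run land in the same partition and can be
// aggregated into a status in a deferred pass.
const createTaskStatusEvents = `
CREATE TABLE task_status_events (
    workflow_run_id uuid        NOT NULL,
    task_id         bigint      NOT NULL,
    event_type      text        NOT NULL,
    created_at      timestamptz NOT NULL DEFAULT now()
) PARTITION BY HASH (workflow_run_id);

CREATE TABLE task_status_events_p0 PARTITION OF task_status_events
    FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE task_status_events_p1 PARTITION OF task_status_events
    FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE task_status_events_p2 PARTITION OF task_status_events
    FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE task_status_events_p3 PARTITION OF task_status_events
    FOR VALUES WITH (MODULUS 4, REMAINDER 3);`
```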
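And for the last point, a trigger reacting to a task's state transition could be sketched as follows; the states, tables, and function are hypothetical, and are only meant to show how transition logic stays on the database server:

```go
package migrations

// Illustrative DDL: when a task row transitions to QUEUED, a trigger inserts
// the corresponding queue item in the same transaction, so the transition
// logic runs on the database server instead of in the engine process.
const createTaskTransitionTrigger = `
CREATE FUNCTION on_task_transition() RETURNS trigger AS $$
BEGIN
    IF NEW.status = 'QUEUED' AND OLD.status IS DISTINCT FROM 'QUEUED' THEN
        INSERT INTO queue_items (queue, task_id) VALUES (NEW.queue, NEW.id);
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER task_transition
    AFTER UPDATE OF status ON tasks
    FOR EACH ROW EXECUTE FUNCTION on_task_transition();`
```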
Get Started with the v1 Engine
The v1 engine is available on Hatchet Cloud today. To get started, simply create a new Hatchet Cloud account and start queuing tasks. If you're an existing Hatchet user or are self-hosting, please see our migration guide to get started.