Hatchet Managed Compute

Hatchet Compute is a cloud runtime for async tasks, focused on the use-cases behind faster, more reliable AI apps. We provision long-running machines with no timeouts, while preserving the operational benefits of serverless infrastructure, like easily scaling to many machines or back down to 0 during periods of inactivity.

Hatchet Compute is enabled by default for all Hatchet Cloud instances and is available from the Managed Compute tab in the dashboard.

You can test it out today by deploying a managed worker template without needing to link a credit card.

Why did we build this?

Until now, our users have been using Hatchet as a managed queue — we queue and invoke your tasks on workers which are running on your infrastructure (like AWS ECS, GCP Cloud Run, a Kubernetes cluster). Hundreds of developers use Hatchet to offload critical, near-real-time work to a background process, complete with fair queueing, retries and error handling, and scatter/gather workflows. For example, Ellipsis uses Hatchet to perform AI code reviews, fix bugs, and automate standup updates, processing over 1.5 million requests per month.
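To make that concrete, here's a minimal sketch of what this looks like with Hatchet's Python SDK (the event name, step bodies, and worker name are invented for illustration):

```python
from hatchet_sdk import Hatchet, Context

hatchet = Hatchet()  # reads HATCHET_CLIENT_TOKEN from the environment

# A two-step workflow: each step is queued, retried on failure,
# and executed on whichever worker picks it up.
@hatchet.workflow(on_events=["review:requested"])
class CodeReview:
    @hatchet.step(timeout="5m", retries=3)
    def fetch_diff(self, context: Context):
        repo = context.workflow_input()["repo"]
        return {"diff": f"...diff for {repo}..."}

    @hatchet.step(parents=["fetch_diff"])
    def run_review(self, context: Context):
        diff = context.step_output("fetch_diff")["diff"]
        # call an LLM, post the review, etc.
        return {"ok": True}

# A long-lived worker process that can run many steps concurrently.
worker = hatchet.worker("review-worker", max_runs=10)
worker.register_workflow(CodeReview())
worker.start()
```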

But we discovered that while many platforms are great for running web services, they are optimized for short-lived, ephemeral requests. These platforms often treat workers designed for long-lived work as an afterthought (a toggle on a settings page with minimally documented behavior).

This is particularly important in the era of AI agents, where workers may use a combination of local inference and remote LLM providers, and many workloads are resource intensive. Among the issues we've seen are:

  • Using serverless runtimes like AWS Lambda or Vercel Functions and getting hit with function timeouts, cold starts, DB connection issues, or the inability to cache files to disk
  • Tasks getting interrupted by worker re-deployments
  • Difficulty provisioning different classes of compute for different parts of the workload: for example, a CPU-intensive video-processing step followed by a local inference step

The solution — Hatchet Compute

Hatchet Compute is a better runtime for async tasks in a few ways:

  1. Long-lived workers

    Each worker in Hatchet is a long-lived process that can run many functions concurrently. There are no timeouts on these workers, but Hatchet will scale workers down by stopping them during periods of inactivity. A stopped worker still has access to the same filesystem when it starts back up, and because workers are meant to run many functions at once, you can easily pool connections or re-use caches within the same worker (see the first sketch after this list).

    And with our worker affinity feature, you can express a preference for your workloads to run on certain machines, for example machines that already have a certain model loaded into vRAM (see the second sketch after this list).

  2. Fully integrated with your queue

    One of the most common requests we hear is the ability to scale on queue depth. Because we also manage the queue, we're building this functionality directly into our managed workers — when there are no items in the queue, workers scale down to a minimum replica count that you've set; when there are more items in the queue than your current worker pool can handle, we scale your workers up (to a maximum that you've set).
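To make the long-lived-worker point concrete, here's a sketch of in-process re-use: because the same worker process serves many runs, something expensive (a model, a connection pool) can be loaded once and shared across runs. The `load_weights` helper and cache path are hypothetical stand-ins:

```python
import functools

from hatchet_sdk import Hatchet, Context

hatchet = Hatchet()

@functools.lru_cache(maxsize=1)
def get_model():
    # Loaded once per worker process; the persistent filesystem means
    # the weights survive the worker being stopped and started again.
    return load_weights("/var/cache/model.bin")  # hypothetical loader

@hatchet.workflow(on_events=["inference:requested"])
class Inference:
    @hatchet.step(timeout="10m")
    def infer(self, context: Context):
        model = get_model()  # cache hit on every run after the first
        return {"output": model.run(context.workflow_input()["prompt"])}
```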
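And a sketch of worker affinity. The shape below (a `labels` argument on the worker and `desired_worker_labels` on the step) follows Hatchet's worker-labels concept, but treat the exact parameter names as an assumption and check the SDK docs for the current shape:

```python
from hatchet_sdk import Hatchet, Context

hatchet = Hatchet()

@hatchet.workflow(on_events=["generate:requested"])
class Generate:
    # Ask the scheduler to route this step to workers that already
    # have the right model in vRAM (assumed API shape; see the docs).
    @hatchet.step(desired_worker_labels={
        "model": {"value": "llama-3-8b", "required": True},
    })
    def generate(self, context: Context):
        prompt = context.workflow_input()["prompt"]
        return {"text": f"...completion for {prompt}..."}

# This worker advertises the model it has loaded.
worker = hatchet.worker("gpu-worker", labels={"model": "llama-3-8b"})
worker.register_workflow(Generate())
worker.start()
```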

Get started

Hatchet Compute is available for all Hatchet Cloud users. To get started, simply log in to your Hatchet Cloud account and navigate to the Managed Compute tab in the dashboard.
