TL;DR - we're kicking off our second launch week with a major shift in how you inspect and debug Hatchet tasks and workflows! We've gone all-in on support for OpenTelemetry: the Hatchet waterfall view is completely refactored and otel-native, and you can now instrument your own spans and send them to Hatchet for contextual debugging. You can try this today on Hatchet Cloud or run the latest version of Hatchet locally (as always, these changes are bundled into the open-source offering).
Over the past 2 years, we've helped Hatchet users debug everything from blocked event loops to runaway CPU usage to esoteric application errors. One of the tools we nearly always recommend (and reach for ourselves) is OpenTelemetry: simply instrument your application and send the resulting traces to your observability vendor of choice.
But this can still be tricky: lots of teams are relying on Hatchet as the primary source of truth for application state and health, and needing to dig through two separate platforms to understand what happened in a task can be cumbersome. So instead of recommending that you set up (yet another) third-party vendor, we're bringing otel support into the Hatchet platform.
When you use Hatchet Cloud, or run the self-hosted version with SERVER_OBSERVABILITY_ENABLED=true, the Hatchet engine emits spans with engine-side information about your tasks, such as when they were queued, started, and failed. These default spans provide useful context on their own.
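To make the queued, started, and failed timestamps concrete, here is a small sketch of how they split a single run into the segments a waterfall bar shows: time spent waiting in the queue versus time spent executing. `TaskLifecycle` and `waterfall_segments` are hypothetical names for illustration only; they are not part of the Hatchet SDK, and the engine's actual span structure may differ.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class TaskLifecycle:
    # Hypothetical engine-side timestamps (in seconds) for one task run.
    queued_at: float
    started_at: Optional[float] = None
    finished_at: Optional[float] = None
    failed: bool = False


def waterfall_segments(run: TaskLifecycle) -> List[Tuple[str, float]]:
    """Split a run into (phase, seconds) segments, as a waterfall bar would show them."""
    segments: List[Tuple[str, float]] = []
    if run.started_at is None:
        return segments  # still waiting in the queue; no completed phases yet
    segments.append(("queued", run.started_at - run.queued_at))
    if run.finished_at is not None:
        phase = "running (failed)" if run.failed else "running"
        segments.append((phase, run.finished_at - run.started_at))
    return segments


if __name__ == "__main__":
    run = TaskLifecycle(queued_at=100.0, started_at=102.5, finished_at=110.0, failed=True)
    print(waterfall_segments(run))  # [('queued', 2.5), ('running (failed)', 7.5)]
```

Keeping the two phases separate is what lets you tell a backed-up queue apart from a slow or failing task at a glance.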

You can choose to augment these spans with your own application-side spans, which provide additional information about the state of the task as it was running. For example, you can emit spans for tracking timing of database queries, external API calls, and additional context for application errors.
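As a rough illustration of what application-side spans add, the sketch below wraps a task in an outer span and times a database query and an external API call as child spans. It uses a hypothetical in-memory `SpanRecorder` built on the standard library so it runs without dependencies; it is not the OpenTelemetry SDK or the Hatchet instrumentor, and in practice you would create these spans with an OpenTelemetry tracer. The `time.sleep` calls stand in for real I/O.

```python
import time
import uuid
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional


@dataclass
class Span:
    # Hypothetical span record: a name, parent link, timing, and attributes.
    name: str
    span_id: str
    parent_id: Optional[str]
    start: float
    end: Optional[float] = None
    attributes: Dict[str, object] = field(default_factory=dict)

    @property
    def duration_ms(self) -> float:
        end = self.end if self.end is not None else self.start
        return (end - self.start) * 1000


class SpanRecorder:
    """Hypothetical in-memory tracer: nests spans using a stack of open spans."""

    def __init__(self) -> None:
        self.finished: List[Span] = []
        self._stack: List[Span] = []

    @contextmanager
    def span(self, name: str, **attributes: object) -> Iterator[Span]:
        parent_id = self._stack[-1].span_id if self._stack else None
        s = Span(name, uuid.uuid4().hex[:16], parent_id, time.perf_counter(),
                 attributes=dict(attributes))
        self._stack.append(s)
        try:
            yield s
        finally:
            # Close the span even if the wrapped code raised.
            s.end = time.perf_counter()
            self._stack.pop()
            self.finished.append(s)


tracer = SpanRecorder()


def process_order(order_id: int) -> str:
    # The outer span stands in for the task; inner spans time individual steps.
    with tracer.span("process_order", order_id=order_id):
        with tracer.span("db.query", statement="SELECT * FROM orders WHERE id = ?"):
            time.sleep(0.01)  # stand-in for a database round trip
        with tracer.span("http.request", url="https://api.example.com/charge"):
            time.sleep(0.02)  # stand-in for an external API call
    return "ok"


if __name__ == "__main__":
    process_order(42)
    for s in tracer.finished:
        print(f"{s.name:<15} parent={s.parent_id} {s.duration_ms:.1f} ms")
```

Because each inner span records its parent's ID, the query and API call can be drawn as nested bars under the task, which is the shape a waterfall view relies on.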
To set up application-side spans, add the Hatchet instrumentor to your Hatchet client. The setup is covered in detail in the docs: https://docs.hatchet.run/v1/opentelemetry
When a task fails, the engine automatically emits an error span with the stack trace and error message, so you can see exactly where and why the task failed. You can also set your own otel status codes for additional context.
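The sketch below shows the two error paths in miniature: an exception is recorded on the span (with its type, message, and stack trace) and marks it as errored, while a business-level failure that doesn't raise sets the status explicitly. The `StatusCode` enum mirrors OpenTelemetry's UNSET/OK/ERROR codes and the event keys follow its exception semantic conventions, but `Span`, `span`, and `charge_card` are hypothetical stand-ins written with the standard library, not the OpenTelemetry or Hatchet APIs.

```python
import traceback
from contextlib import contextmanager
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Iterator, List, Optional


class StatusCode(Enum):
    # Mirrors the three span status codes defined by OpenTelemetry.
    UNSET = 0
    OK = 1
    ERROR = 2


@dataclass
class Span:
    # Hypothetical stand-in for an OpenTelemetry span.
    name: str
    status: StatusCode = StatusCode.UNSET
    description: Optional[str] = None
    events: List[Dict[str, str]] = field(default_factory=list)

    def set_status(self, status: StatusCode, description: Optional[str] = None) -> None:
        self.status = status
        self.description = description

    def record_exception(self, exc: BaseException) -> None:
        # Event keys follow OpenTelemetry's exception semantic conventions.
        self.events.append({
            "name": "exception",
            "exception.type": type(exc).__name__,
            "exception.message": str(exc),
            "exception.stacktrace": "".join(
                traceback.format_exception(type(exc), exc, exc.__traceback__)
            ),
        })


@contextmanager
def span(name: str, sink: List[Span]) -> Iterator[Span]:
    s = Span(name)
    try:
        yield s
    except Exception as exc:
        # Record the stack trace and mark the span as errored, then re-raise.
        s.record_exception(exc)
        s.set_status(StatusCode.ERROR, str(exc))
        raise
    finally:
        sink.append(s)


def charge_card(amount: int, sink: List[Span]) -> None:
    # Hypothetical task step used only to exercise both error paths.
    with span("charge_card", sink) as s:
        if amount <= 0:
            raise ValueError(f"invalid amount: {amount}")
        if amount > 10_000:
            # A business-level failure that doesn't raise: set the status explicitly.
            s.set_status(StatusCode.ERROR, "amount exceeds review threshold")
            return
        s.set_status(StatusCode.OK)
```

Setting the status by hand is useful when a step "succeeds" from the runtime's point of view but still deserves to stand out in the trace.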

Hatchet automatically links spans from a spawned child task to its parent, so a parent task with many subtasks can be debugged from a single view.
This is particularly useful for scatter-gather tasks and AI agents.
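Linking child spans to a parent generally works through context propagation: the parent's trace ID and span ID travel with each spawned child, so every child span joins the same trace. The sketch below illustrates the idea for a scatter-gather fan-out using the W3C `traceparent` header format. The post doesn't describe how Hatchet propagates context internally, so the helper functions here are hypothetical and only show the general technique.

```python
import secrets
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class SpanContext:
    name: str
    trace_id: str                 # 32 hex chars, shared by every span in one trace
    span_id: str                  # 16 hex chars, unique per span
    parent_span_id: Optional[str] = None


def start_root_span(name: str) -> SpanContext:
    return SpanContext(name, secrets.token_hex(16), secrets.token_hex(8))


def inject(ctx: SpanContext) -> Dict[str, str]:
    # Serialize the context as a W3C traceparent header: version-trace_id-span_id-flags.
    return {"traceparent": f"00-{ctx.trace_id}-{ctx.span_id}-01"}


def start_child_span(name: str, carrier: Dict[str, str]) -> SpanContext:
    # The child reuses the parent's trace_id and records the parent's span_id.
    _version, trace_id, parent_span_id, _flags = carrier["traceparent"].split("-")
    return SpanContext(name, trace_id, secrets.token_hex(8), parent_span_id)


def scatter_gather(items: List[str]) -> List[SpanContext]:
    parent = start_root_span("scatter_gather")
    carrier = inject(parent)  # would travel with each spawned child task's payload
    children = [start_child_span(f"process:{item}", carrier) for item in items]
    return [parent, *children]


def render_tree(spans: List[SpanContext]) -> str:
    # Group spans by parent to print a single waterfall-style view of the trace.
    by_parent: Dict[Optional[str], List[SpanContext]] = {}
    for s in spans:
        by_parent.setdefault(s.parent_span_id, []).append(s)
    lines: List[str] = []

    def walk(parent_id: Optional[str], depth: int) -> None:
        for s in by_parent.get(parent_id, []):
            lines.append("  " * depth + s.name)
            walk(s.span_id, depth + 1)

    walk(None, 0)
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_tree(scatter_gather(["a", "b", "c"])))
```

Running the module prints scatter_gather with process:a, process:b, and process:c indented beneath it, since all four spans share one trace ID and each child points back at the parent's span ID.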

You can check out the docs at https://docs.hatchet.run/v1/opentelemetry, or try it today on Hatchet Cloud.