I want to introduce Heimdall: a dashboard for operating Flink jobs and deployments. I’ve been working on it for the last several weeks, and we’ve been using it at Goldsky to manage 100+ Flink deployments.
First of all, why is it needed? Doesn’t Flink come with a built-in UI? It does, and Heimdall doesn’t try to replace it. The Flink UI is amazing for managing a single job, and it also works well for managing multiple jobs deployed to a Session mode cluster. However, these days, especially on Kubernetes, most teams choose to deploy Flink as many standalone jobs in Application mode (as “services”). Once you’re running more than a handful of jobs, tracking them and navigating between them becomes challenging.
Heimdall is still in its infancy: it’s a read-only application with some very basic functionality (mostly on the frontend), and it currently only supports jobs deployed with the Flink Kubernetes Operator. But I believe that, with time, it can be improved and turned into a full-fledged control plane. Even in its current form, it can be extremely useful and save a lot of time. Here is what it offers today:
- Each job is displayed with its name, status, JobManager and TaskManager resources, start time, parallelism, Flink version and Docker image version (some of these can be hidden).
- Flink jobs can be searched by name and filtered by status.
- Flink jobs can be sorted by name, start time and resources (replica count).
- Flink jobs are automatically refreshed (the interval is configurable).
- Four standard endpoints are available for each job: Flink UI, Flink API, Metrics and Logs (all configurable).
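To make the search, filter, and sort features above concrete, here is a minimal Python sketch of equivalent logic. It is not Heimdall’s implementation (which lives mostly in the frontend); the `FlinkJob` record, its fields, the `SORT_KEYS` table, and the `select_jobs` helper are all hypothetical, and the "resources" sort simply uses a replica count.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class FlinkJob:
    # Hypothetical job record; the fields mirror a few of the dashboard columns.
    name: str
    status: str
    start_time: datetime
    replicas: int  # replica count, used as the "resources" sort key


# Sort keys for the three sortable columns (name, start time, resources).
SORT_KEYS = {
    "name": lambda job: job.name,
    "start_time": lambda job: job.start_time,
    "resources": lambda job: job.replicas,
}


def select_jobs(
    jobs: List[FlinkJob],
    query: str = "",
    status: Optional[str] = None,
    sort_by: str = "name",
    descending: bool = False,
) -> List[FlinkJob]:
    """Keep jobs whose name contains `query` (case-insensitive) and, if given,
    whose status equals `status`; then sort them by the chosen column."""
    needle = query.lower()
    matched = [
        job
        for job in jobs
        if needle in job.name.lower() and (status is None or job.status == status)
    ]
    return sorted(matched, key=SORT_KEYS[sort_by], reverse=descending)
```

For example, `select_jobs(jobs, query="orders", status="RUNNING")` narrows the list the same way typing in the search box and picking a status filter would.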
All configuration options can be found here. In general, you don’t need to change much.
HEIMDALL_JOBLOCATOR_K8S_OPERATOR_NAMESPACE_TO_WATCH specifies the Kubernetes namespace to watch (no need to configure it if you use
HEIMDALL_PATTERNS_DISPLAY_NAME can be used to modify the displayed job name using job metadata, which is currently obtained from Kubernetes labels. For example, at Goldsky, we use the following pattern:
HEIMDALL_ENDPOINT_PATH_PATTERNS_* are four variables that configure the endpoints for accessing the Flink UI, Flink API, metrics, and logs of each Flink job.
In these patterns, $jobName is replaced with the actual Flink job name for every row. Every company may deploy Flink differently when it comes to networking and may use a different observability tool, so instead of trying to support every setup, Heimdall simply exposes a set of URL patterns in the config.
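Here is a minimal Python sketch of this per-row substitution, using the standard library’s `string.Template`, whose `$identifier` syntax happens to match `$jobName`. The pattern keys and `example.com` URLs below are hypothetical placeholders, not Heimdall’s defaults or the actual variable suffixes, and Heimdall itself may perform the replacement differently.

```python
from string import Template

# Hypothetical endpoint patterns, one per endpoint type. In Heimdall these come
# from the HEIMDALL_ENDPOINT_PATH_PATTERNS_* variables; keys and URLs are made up.
ENDPOINT_PATTERNS = {
    "flink_ui": "https://flink.example.com/$jobName/",
    "flink_api": "https://flink.example.com/$jobName/api/",
    "metrics": "https://grafana.example.com/d/flink?var-job=$jobName",
    "logs": "https://logs.example.com/search?q=job:$jobName",
}


def resolve_endpoints(job_name: str, patterns: dict = ENDPOINT_PATTERNS) -> dict:
    """Return every endpoint URL with $jobName replaced by the given job name."""
    return {
        key: Template(pattern).substitute(jobName=job_name)
        for key, pattern in patterns.items()
    }
```

`Template.substitute` raises a `KeyError` for any placeholder it cannot fill, which surfaces typos in a pattern early; `safe_substitute` would leave such placeholders untouched instead.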
As for the name, there are two reasons:
- I love Norse mythology.
- “[Heimdall] is attested as possessing foreknowledge and keen senses, particularly eyesight and hearing.” That sounds like a great fit for what this project is trying to achieve 🙂.
Please provide feedback! Something’s not working? Have an idea? Feel free to create an issue.