Solving the dynamic Elixir cluster problem with ClusterHelper
Author: Manh Vu
Published: 2025-03-23

Intro

I’m deploying an Elixir cluster of 5 to 7 nodes on Kubernetes (K8s). That is the dev environment; in production each node is a Deployment with one or more replicas. Inside the Elixir cluster, our system uses EasyRpc (a wrapper around :erpc) to save development time. This works perfectly for a static, fixed cluster, but it is hard to apply to a dynamic Elixir cluster running on K8s.

Why is it so hard?

K8s is designed for dynamic workloads such as web services, which are usually stateless and do not join a cluster the way Elixir nodes do. A Pod’s IP address and hostname are dynamic: they change every time the Pod restarts, and the Elixir node name changes with them. That does not fit our system, because RPC calls between Elixir nodes target a node name. We could work around this with a headless Service, or with the Gossip strategy of :libcluster plus an app-name prefix in the node name that we check at runtime, but that is more complicated and feels wrong for a platform that otherwise handles distributed systems so well.
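To make the prefix workaround concrete, here is a minimal sketch of the runtime check it implies. It assumes node names of the form `app@pod-ip`; the module and function names are hypothetical and only illustrate the idea.

```elixir
defmodule PrefixLookupSketch do
  # Hypothetical illustration of the prefix workaround: with names like
  # :"billing@10.1.2.3" the host part changes on every Pod restart, so the
  # only stable thing left to match on is the app-name prefix.
  def nodes_with_prefix(nodes, prefix) do
    Enum.filter(nodes, fn n ->
      # Match "billing@" rather than "billing" so "billing_worker@..." is excluded.
      n |> Atom.to_string() |> String.starts_with?(prefix <> "@")
    end)
  end
end
```

A caller would run something like `PrefixLookupSketch.nodes_with_prefix(Node.list(), "billing")`. Every caller has to know the naming convention, and a node can advertise only one service through its name.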

We could also use something else, such as a message broker or message bus, or wrap services behind gRPC/REST APIs, but that makes both development and deployment more complicated. We want to keep using Elixir’s RPC for fast development (one of the benefits of a dynamically typed language).

Our way

We built a library named ClusterHelper that maps roles to Elixir node names at runtime. When a node joins the cluster, its roles are automatically propagated to the other nodes in the cluster. The library runs on every Elixir node in our cluster, so any node can look up node names by role. Each role has one or more nodes, depending on our scaling strategy.
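One way such runtime syncing could work is sketched below. It assumes one locally registered GenServer per node that subscribes to node up/down events with `:net_kernel.monitor_nodes/1` and exchanges role lists with peers. The module name `RoleSyncSketch` and its `:hello`/`:roles` messages are hypothetical; this is not ClusterHelper’s actual API or implementation.

```elixir
defmodule RoleSyncSketch do
  # Hypothetical sketch, not the ClusterHelper API: each node runs one
  # registered GenServer holding a node => roles cache for the cluster.
  use GenServer

  def start_link(roles), do: GenServer.start_link(__MODULE__, roles, name: __MODULE__)

  # Lookup is served from the local cache, with no remote round trip.
  def nodes_for(role), do: GenServer.call(__MODULE__, {:nodes_for, role})

  @impl true
  def init(roles) do
    # Subscribe to {:nodeup, node} / {:nodedown, node} messages.
    :net_kernel.monitor_nodes(true)
    # Introduce ourselves to nodes that are already connected.
    GenServer.abcast(Node.list(), __MODULE__, {:hello, node(), roles})
    {:ok, %{own: roles, peers: %{node() => roles}}}
  end

  @impl true
  def handle_call({:nodes_for, role}, _from, state) do
    nodes = for {n, rs} <- state.peers, role in rs, do: n
    {:reply, Enum.sort(nodes), state}
  end

  @impl true
  # A peer introduced itself: cache its roles and answer with ours.
  def handle_cast({:hello, from, roles}, state) do
    GenServer.cast({__MODULE__, from}, {:roles, node(), state.own})
    {:noreply, put_in(state.peers[from], roles)}
  end

  # A peer answered our introduction.
  def handle_cast({:roles, from, roles}, state) do
    {:noreply, put_in(state.peers[from], roles)}
  end

  @impl true
  def handle_info({:nodeup, n}, state) do
    GenServer.cast({__MODULE__, n}, {:hello, node(), state.own})
    {:noreply, state}
  end

  def handle_info({:nodedown, n}, state) do
    {:noreply, update_in(state.peers, &Map.delete(&1, n))}
  end

  def handle_info(_other, state), do: {:noreply, state}
end
```

Both directions announce themselves (on `:nodeup` and in `init/1`), so if a `:hello` cast reaches a node before its GenServer is registered and is silently dropped, the other side’s announcement still fills the cache.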

In our system, a node can have one or several roles, and a role can have several nodes. Role and node information is synced automatically and cached on every node.
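Because the mapping is many-to-many and read far more often than written, an ETS `:bag` table is one natural way to cache it on each node. The sketch below is an assumption about how such a cache could look, not how ClusterHelper stores its data; `RoleCacheSketch` and its functions are hypothetical.

```elixir
defmodule RoleCacheSketch do
  # Hypothetical local cache for a many-to-many role <-> node mapping.
  # Each {role, node} pair is one row in a :bag table, so a role with
  # several nodes and a node with several roles need no extra structure.
  @table :role_cache_sketch

  def new do
    :ets.new(@table, [:bag, :named_table, :public, read_concurrency: true])
  end

  # Replace everything cached about `node` with its current roles.
  def put_node(node, roles) do
    remove_node(node)
    :ets.insert(@table, for(role <- roles, do: {role, node}))
    :ok
  end

  # Drop every row for `node`, e.g. after a :nodedown.
  def remove_node(node) do
    :ets.match_delete(@table, {:_, node})
    :ok
  end

  def nodes_for(role) do
    @table |> :ets.lookup(role) |> Enum.map(fn {_role, n} -> n end) |> Enum.sort()
  end

  def roles_of(node) do
    @table |> :ets.match({:"$1", node}) |> Enum.map(fn [role] -> role end) |> Enum.sort()
  end
end
```

Lookups by role are a single key lookup; lookups by node scan the table, which is acceptable for the small clusters described here.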

Working with roles brings us some benefits: scaling in and out is easy, and we don’t need to care about Elixir node names or how they map onto K8s. Scaling a service is just a matter of adding more nodes with that service’s role.

We have another library, EasyRpc, that helps us work smoothly when several nodes share the same role.
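The basic idea of calling by role rather than by node name can be sketched with plain `:erpc.call/5`. This is a hypothetical helper, not EasyRpc’s API: it assumes a `lookup` function that returns the nodes currently holding a role (for example, one that reads a local role cache), and it picks one of them at random.

```elixir
defmodule RoleRpcSketch do
  # Hypothetical helper, not the EasyRpc API: run mod.fun(args) on any
  # node that currently has `role`, choosing one at random.
  # `lookup` is injected so the sketch works with any role cache.
  def call(lookup, role, mod, fun, args, timeout \\ 5_000) do
    case lookup.(role) do
      [] -> {:error, {:no_node_for_role, role}}
      nodes -> {:ok, :erpc.call(Enum.random(nodes), mod, fun, args, timeout)}
    end
  end
end
```

Random selection is the simplest spreading strategy; round-robin or retrying on another node after an `:erpc` exception are obvious alternatives when several nodes share a role.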