Elixir Supervisor, a powerful thing that can help devs & DevOps sleep well!
Author: Manh Vu
Published: 2024-06-29

Intro

After many years in my career, the supervisor model is the thing I most wish other languages had. For example, I have to work with Golang. Its performance is very good, but it still misses one important thing: a way to control goroutines. A lot of mistakes from me & my colleagues brought services down, and we had to fix the bugs ASAP (some bugs don't need a fast fix on their own, but they affect other running tasks, so we still need to fix them ASAP). I also made a library named easyworker to make working with goroutines easier. But overall, I have no way to handle issues coming from third-party code. That is a limitation of Golang.

However, Elixir is a different story. We can control processes (similar to goroutines in Golang) very well. Let's go through the supervisor model for more details.

Supervisors in Elixir are simple and work well for almost all cases. We will go through every kind of supervisor in Elixir: Supervisor, DynamicSupervisor, PartitionSupervisor.

For people coming from other languages, I will explain a little bit before diving deep into Elixir supervisors.

Example: people work in a factory, and a production line can stop when a worker can't work for any reason. To solve that problem, the factory always has backup workers, so if one worker can't work, another worker can replace them. At runtime, a supervisor monitors all workers; if a worker is ill or tired, the supervisor calls another worker to support or replace them. In a big factory we can have many supervisors, with higher-level supervisors monitoring them.

In Elixir we have the same model for solving the problem (with many more options). Basically, we have a supervisor process that monitors worker processes (or other supervisors). If any worker dies (crashes, finishes its task,…), the supervisor receives a signal and decides whether to restart it or not, depending on the strategy & configs. A supervisor can also kill any worker if needed.

Elixir has three kinds of supervisors: Supervisor, DynamicSupervisor & PartitionSupervisor.

Based on how children are added, we have two types of supervisors: one adds children at app start time (fixed/static type), the other adds children at runtime (dynamic type).

Fixed/Static supervisor

For the static type we have Supervisor. We declare it and add it to the application's supervisor (the common way), and its list of children is declared up front. For children that are added on demand at runtime, Elixir provides DynamicSupervisor.

An Elixir application typically has one supervisor, called the root supervisor, and commonly we add the children (workers or other supervisors) that belong to the application there.

Supervisor tree (a basic supervisor tree in which supervisor_2 is a child of supervisor_1)

The three main things we need to care about are the restart strategy of the Supervisor, the child spec with the entry point function & arguments (which the supervisor calls when it starts the child), & the child's restart option (which affects how the supervisor's restart strategy is applied). We also need to care about other configs such as the maximum number of restarts (:max_restarts) allowed within a time window (:max_seconds),…
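As a minimal sketch of these options (not code from the app in this post), here is a hand-written child spec together with the supervisor-level restart limits. The id :sketch_worker and the use of Agent as a stand-in worker are assumptions made only for illustration.

```elixir
# Hand-written child spec; :sketch_worker and the Agent worker are illustrative.
child_spec = %{
  id: :sketch_worker,
  # Entry point: the supervisor calls Agent.start_link(fn -> %{} end).
  start: {Agent, :start_link, [fn -> %{} end]},
  # :permanent = always restart, :transient = restart only after an abnormal
  # exit, :temporary = never restart.
  restart: :transient
}

{:ok, sup} =
  Supervisor.start_link([child_spec],
    strategy: :one_for_one,
    # Allow at most 3 restarts within 5 seconds (these are also the defaults);
    # beyond that the supervisor itself gives up and exits.
    max_restarts: 3,
    max_seconds: 5
  )

# Prints a map with the number of specs, active children, workers & supervisors.
IO.inspect(Supervisor.count_children(sup))
```

When the restart limit is exceeded, the supervisor exits and its own parent supervisor decides what to do next, which is why these numbers matter in a tree.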

An easy way to make a worker process is to use GenServer (I have explained it in another post): just declare the callbacks and add the module directly to a supervisor without caring about the child spec.

Supervisor is designed for a wide range of use cases, so we need to understand its main ideas to adapt it well to our needs.

For the Supervisor restart strategy we have 3 options: :one_for_one, :one_for_all & :rest_for_one.

:one_for_one strategy

Every child runs independently. If one child crashes, only that child is restarted; the other children are not affected.

:one_for_one (only worker_2 is restarted)

:one_for_all strategy

All children are tied together. If one child crashes, the supervisor terminates all the other children and then restarts all of them.

:one_for_all (all workers are restarted)

:rest_for_one strategy

Remember that all children in a supervisor are declared in a list, and they are started in that order.

With this strategy, if one child crashes, the children declared after the crashed child in the list are terminated and restarted together with it.

:rest_for_one (worker_2 & worker_3 are restarted, worker_1 keeps working normally)
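The three strategies can be compared with a small experiment. This is only a sketch: the module name StrategySketch, the worker ids and the Agent workers are made up for illustration. It starts three workers, kills worker_2, and reports which children received a new pid.

```elixir
defmodule StrategySketch do
  # Starts three Agent workers under a supervisor with the given strategy,
  # kills :worker_2, and returns the ids of the children that got a new pid.
  def restarted_after_crash(strategy) do
    children =
      for id <- [:worker_1, :worker_2, :worker_3] do
        Supervisor.child_spec({Agent, fn -> 0 end}, id: id)
      end

    {:ok, sup} = Supervisor.start_link(children, strategy: strategy)
    old = pids(sup)

    Process.exit(old[:worker_2], :kill)
    wait_for_new_pid(sup, :worker_2, old[:worker_2])

    new = pids(sup)
    restarted = for {id, pid} <- old, new[id] != pid, do: id
    Supervisor.stop(sup)
    Enum.sort(restarted)
  end

  # Map of child id => pid as currently known by the supervisor.
  defp pids(sup) do
    for {id, pid, _type, _modules} <- Supervisor.which_children(sup), into: %{}, do: {id, pid}
  end

  # Polls until the supervisor reports a fresh pid for the crashed child.
  defp wait_for_new_pid(sup, id, old_pid) do
    case pids(sup)[id] do
      pid when is_pid(pid) and pid != old_pid ->
        :ok

      _ ->
        Process.sleep(10)
        wait_for_new_pid(sup, id, old_pid)
    end
  end
end

IO.inspect(StrategySketch.restarted_after_crash(:one_for_one))
# => [:worker_2]
IO.inspect(StrategySketch.restarted_after_crash(:one_for_all))
# => [:worker_1, :worker_2, :worker_3]
IO.inspect(StrategySketch.restarted_after_crash(:rest_for_one))
# => [:worker_2, :worker_3]
```

Killing a child makes the supervisor log an error report, so expect some log output next to the printed lists.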

Remember that a supervisor can be declared as a child of another supervisor, which gives us a very flexible supervisor tree. It is easy to group processes for isolated tasks, specific tasks,…
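A minimal sketch of such a tree, assuming made-up ids (:supervisor_2, :worker_1, :worker_a) and Agent as a stand-in worker. The inner supervisor is described with a plain child spec map whose type is :supervisor.

```elixir
# supervisor_2 is described with a plain child spec map of type :supervisor.
# All ids and the Agent workers are made up for this sketch.
supervisor_2 = %{
  id: :supervisor_2,
  type: :supervisor,
  start:
    {Supervisor, :start_link,
     [
       [Supervisor.child_spec({Agent, fn -> 0 end}, id: :worker_a)],
       [strategy: :one_for_one]
     ]}
}

children = [
  Supervisor.child_spec({Agent, fn -> 0 end}, id: :worker_1),
  supervisor_2
]

# supervisor_1 is the top of this small tree.
{:ok, supervisor_1} = Supervisor.start_link(children, strategy: :one_for_one)

# Print the id and type (:worker or :supervisor) of each direct child.
for {id, _pid, type, _modules} <- Supervisor.which_children(supervisor_1) do
  IO.inspect({id, type})
end
```

In a real app the inner supervisor would more commonly be its own module with `use Supervisor`; the map form is used here only to keep the sketch short.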

Now let's go to a simple example that uses a GenServer as a worker.

We declare the worker code like this:

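defmodule DemoEctoForm.Ets do
  # :transient -> the supervisor restarts this worker only after an abnormal exit.
  use GenServer, restart: :transient

  ### Public API ###

  # The argument passed by the supervisor is ignored; the process is
  # registered under the module name.
  def start_link(_) do
    GenServer.start_link(__MODULE__, :ok, name: __MODULE__)
  end

  # ... (the rest of the module, such as the callbacks, is omitted here)
end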

This module implements a GenServer and sets the restart config in its child spec to :transient (the child is restarted only if it exits abnormally). The module has a start_link/1 function that ignores its params (options/config) and calls GenServer.start_link with :name set to the module name, which means only one worker of this module can run at a time on the local node.

In the application module of the app we declare:

defmodule DemoEctoForm.Application do
  # See https://hexdocs.pm/elixir/Application.html
  # for more information on OTP Applications
  @moduledoc false

  use Application

  @impl true
  def start(_type, _args) do
    children = [
      DemoEctoForm.Ets,
      # ...
    ]

    opts = [strategy: :one_for_one, name: DemoEctoForm.Supervisor]
    Supervisor.start_link(children, opts)
  end
end

In the start function of the module, we just add the name of the GenServer, DemoEctoForm.Ets, to the children list. That is enough because GenServer takes care of generating the child spec for us.
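To see what gets generated, we can call child_spec/1 ourselves. The module below is a hypothetical stand-in (SketchWorker, with a trivial init/1), not the DemoEctoForm.Ets module from the app.

```elixir
defmodule SketchWorker do
  # `use GenServer` defines child_spec/1; the :restart option is merged into it.
  use GenServer, restart: :transient

  def start_link(_opts) do
    GenServer.start_link(__MODULE__, :ok, name: __MODULE__)
  end

  @impl true
  def init(:ok), do: {:ok, %{}}
end

# When only the module name is listed as a child, the supervisor calls
# child_spec([]) to obtain the spec.
spec = SketchWorker.child_spec([])
IO.inspect(spec.id)      # => SketchWorker
IO.inspect(spec.start)   # => {SketchWorker, :start_link, [[]]}
IO.inspect(spec.restart) # => :transient

# Listing just the module name is therefore enough to start it.
{:ok, _sup} = Supervisor.start_link([SketchWorker], strategy: :one_for_one)
IO.inspect(is_pid(Process.whereis(SketchWorker))) # => true
```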

In the opts variable we see that :strategy is :one_for_one and the name of the supervisor is DemoEctoForm.Supervisor.

The full source code of the app is here

Dynamic supervisor

For this type of supervisor we have two options: DynamicSupervisor & PartitionSupervisor.

DynamicSupervisor supports adding children at runtime (on demand) through start_child/2. This lets us start tasks (processes) dynamically, depending on input. The downside is that it supports only the :one_for_one strategy; :one_for_all & :rest_for_one are not available.

To use a DynamicSupervisor we need to add it to the root supervisor of the app or to another supervisor.

Example of adding a DynamicSupervisor to the root supervisor:

children = [
  {DynamicSupervisor, name: MyApp.DynamicSupervisor, strategy: :one_for_one}
]

Supervisor.start_link(children, strategy: :one_for_one)

(Adds a DynamicSupervisor named MyApp.DynamicSupervisor; code from the Elixir docs)

At runtime, when a new event arrives, we can add a new worker like this:

{:ok, counter1} = DynamicSupervisor.start_child(MyApp.DynamicSupervisor, {Counter, 0})

The first param is the name of the supervisor, the second is the child spec. We can add as many children as we need, but remember one thing: a DynamicSupervisor is a single process, so it can become a bottleneck if we add children too fast (in that case we need to use PartitionSupervisor or a self-made supervisor).

DynamicSupervisor has some other basic functions for managing children: terminate_child to kill a child, count_children to get the number of children in the supervisor, which_children to get info about all children.
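A short sketch of these functions in use. The Agent children are stand-ins for real workers, and the supervisor is started directly here instead of under an application's root supervisor, just to keep the example self-contained.

```elixir
# Started directly for the sketch; in an app it would sit in a supervision tree.
{:ok, sup} = DynamicSupervisor.start_link(strategy: :one_for_one)

# Add two children on demand.
{:ok, first} = DynamicSupervisor.start_child(sup, {Agent, fn -> 0 end})
{:ok, _second} = DynamicSupervisor.start_child(sup, {Agent, fn -> 0 end})

IO.inspect(DynamicSupervisor.count_children(sup).active)
# => 2

# terminate_child/2 takes the child's pid; the child is stopped and removed.
:ok = DynamicSupervisor.terminate_child(sup, first)

IO.inspect(DynamicSupervisor.count_children(sup).active)
# => 1

# Each entry describes one remaining child (its pid, type and modules).
IO.inspect(length(DynamicSupervisor.which_children(sup)))
# => 1
```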

For more flexibility with DynamicSupervisor, we can use linked processes to group children, or simply use several DynamicSupervisors, one per group.

To fix the bottleneck of DynamicSupervisor we can use PartitionSupervisor.

PartitionSupervisor is one of the things that differ from the original Erlang supervisors. It starts a number of processes (configurable; the default equals the number of online schedulers of the Erlang VM). Each process is a partition, and when we add a new child, the supervisor uses a key to calculate which partition the child will be started on.

Because PartitionSupervisor uses multiple processes to start children, we can add a lot of children in parallel. But remember, performance is still limited by the time needed to init a child. If you want the best performance, you need to design & optimize the init time of the child (worker).

Using PartitionSupervisor is quite simple. In the root supervisor or your own supervisor, add a declaration like:

  def start(_type, _args) do
    children = [
      {PartitionSupervisor,
       child_spec: DynamicSupervisor,
       name: LocationSimulator.DynamicSupervisor}
    ]

    opts = [strategy: :one_for_one, name: LocationSimulator.Supervisor]
    Supervisor.start_link(children, opts)
  end

As we see, PartitionSupervisor uses DynamicSupervisor for its child spec & API.

To add a child under a PartitionSupervisor, we call DynamicSupervisor with a :via tuple that carries an extra key, like:

key = :rand.uniform(1_000)

DynamicSupervisor.start_child(
  {:via, PartitionSupervisor, {LocationSimulator.DynamicSupervisor, key}},
  spec
)

In this code, we generate a random integer for the key, so the supervisor effectively selects a random partition for starting the child. The same key is always routed to the same partition.
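The routing can be observed with a small sketch (PartitionSupervisor needs Elixir 1.14 or later). The name SketchPartitions, the fixed count of 4 partitions and the Agent child are assumptions for illustration only.

```elixir
# Four DynamicSupervisor partitions under one PartitionSupervisor.
{:ok, _sup} =
  PartitionSupervisor.start_link(
    child_spec: DynamicSupervisor,
    name: SketchPartitions,
    partitions: 4
  )

# Helper that builds the :via tuple for a given routing key.
via = fn key -> {:via, PartitionSupervisor, {SketchPartitions, key}} end

# Resolving the same key twice gives the same partition process.
IO.inspect(GenServer.whereis(via.(42)) == GenServer.whereis(via.(42)))
# => true

# Start a child on the partition selected by the key.
{:ok, _pid} = DynamicSupervisor.start_child(via.(42), {Agent, fn -> 0 end})

IO.inspect(PartitionSupervisor.partitions(SketchPartitions))
# => 4
IO.inspect(DynamicSupervisor.count_children(via.(42)).active)
# => 1
```

According to the PartitionSupervisor docs, integer keys are routed with rem(abs(key), partitions) and other terms are hashed, so a key taken from the data (for example a user id) keeps related children on the same partition, while a random key spreads the load.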

Another benefit of using supervisors is that it is easy to shut down children in order, either gracefully or by force (brutal kill), which was a pain point for me when I worked with Golang.
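A minimal sketch of an ordered, graceful shutdown. The module ShutdownSketch and the labels :first and :second are hypothetical. Each worker traps exits so that its terminate/2 callback runs, and the :shutdown value in the child spec is how long the supervisor waits before killing the child (:brutal_kill skips the wait).

```elixir
defmodule ShutdownSketch do
  use GenServer

  def start_link({label, report_to}) do
    GenServer.start_link(__MODULE__, {label, report_to})
  end

  @impl true
  def init(state) do
    # Trap exits so terminate/2 runs when the supervisor asks us to shut down.
    Process.flag(:trap_exit, true)
    {:ok, state}
  end

  @impl true
  def terminate(_reason, {label, report_to}) do
    # Clean-up work would go here; the sketch only reports that it ran.
    send(report_to, {:stopped, label})
  end
end

children = [
  # shutdown: 5_000 gives a child up to 5 seconds to stop;
  # shutdown: :brutal_kill would kill it immediately instead.
  Supervisor.child_spec({ShutdownSketch, {:first, self()}}, id: :first, shutdown: 5_000),
  Supervisor.child_spec({ShutdownSketch, {:second, self()}}, id: :second, shutdown: 5_000)
]

{:ok, sup} = Supervisor.start_link(children, strategy: :one_for_one)
:ok = Supervisor.stop(sup)

# Collect the two reports in the order they arrived.
order =
  for _ <- 1..2 do
    receive do
      {:stopped, label} -> label
    after
      1_000 -> :timeout
    end
  end

# Children are stopped in reverse start order.
IO.inspect(order)
# => [:second, :first]
```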

Another dynamic supervisor is Task.Supervisor, but it's specific to Task & I have a separate post about it.

Now we have the concept of supervisors in Elixir. Apply it to our code and we can sleep well!

Some recommended docs: