Linking, Monitoring, and Supervising in Elixir
One of the benefits of microservices is that part of the system can go down without bringing the entire system down.
With Elixir, each process is in essence a microservice. It’s a small, isolated process that communicates with other processes via message passing, all orchestrated by the Erlang BEAM VM.
No memory is shared between processes, so the failure of one process does not, by default, affect other processes. But the key ability of Elixir isn’t just how processes work; it’s how they can be linked together, monitor one another, and use supervising functionality to determine what to do if a process fails.
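As a minimal sketch of isolated processes talking only through messages, the snippet below spawns a process that answers a single `:ping`. The `PingSketch` module and its function names are hypothetical and exist only for illustration.

```elixir
defmodule PingSketch do
  # Spawn an isolated process that waits for one :ping and replies with :pong.
  def start do
    spawn(fn ->
      receive do
        {:ping, from} -> send(from, {:pong, self()})
      end
    end)
  end

  # Send :ping to the given process and wait up to one second for its reply.
  def ping(pid) do
    send(pid, {:ping, self()})

    receive do
      {:pong, ^pid} -> :pong
    after
      1_000 -> :timeout
    end
  end
end

pid = PingSketch.start()
IO.inspect(PingSketch.ping(pid))
```

The two processes never touch each other's data; everything they know about one another arrives in a mailbox.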
In this article, we will touch on linking, monitoring, and supervisors, with an example of how to implement a simple caching GenServer that’s supervised by a supervisor.
Linking Processes
In Part I of this series, we looked at how to spawn a process and execute some code:
    spawn(fn -> IO.puts "In #{inspect self()} process" end)
But if this process happens to fail for whatever reason, we’ll never know about it. It is completely isolated and won’t affect our current process at all.
    spawn(fn ->
      IO.puts "Uh oh..."
      raise("I have failed you.")
    end)

    :timer.sleep(500)
    IO.puts "I'm done."
The process that spawned this code wasn’t affected because the two aren’t linked. You’ll notice that it still printed the "I'm done." message. Linking ties two or more processes together: if one process fails, so does the process that is linked to it. To begin a new linked process, you change the above code only slightly to call the spawn_link function instead.
    spawn_link(fn ->
      IO.puts "Uh oh..."
      raise("I have failed you.")
    end)

    :timer.sleep(500)
    IO.puts "I'm done."
Now that we have linked the process to our current (self()) process, it won’t print the "I'm done." message. When the linked process went down, so did our current one. Links are bidirectional. It doesn’t matter which process fails; by linking them, they are both affected.
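To make the bidirectional part concrete, here is a small sketch that asks each side of a link which processes it is linked to. `LinkSketch` is a hypothetical module name; `Process.info/2` with the `:links` key is the only inspection call it relies on.

```elixir
defmodule LinkSketch do
  # Spawn a linked process that stays alive until told to stop, then
  # report whether each side lists the other among its links.
  def linked_both_ways? do
    parent = self()

    child =
      spawn_link(fn ->
        receive do
          :stop -> :ok
        end
      end)

    {:links, parent_links} = Process.info(parent, :links)
    {:links, child_links} = Process.info(child, :links)

    # Let the child finish normally; a normal exit does not take the parent down.
    send(child, :stop)

    child in parent_links and parent in child_links
  end
end

IO.inspect(LinkSketch.linked_both_ways?())
```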
So what if we actually did want to recover from a failure in a linked process? To do this, we will have to do something called trapping exits. When a linked process fails, we are given an opportunity to recover from it. We can listen for exits using a receive block, the typical way that messages are passed from one process to another.
    # Tell our current process that we want to trap exits
    Process.flag(:trap_exit, true)

    # Spawn a linked process which will fail
    spawn_link(fn ->
      IO.puts "Uh oh..."
      raise("I have failed you.")
    end)

    # Receive the trapped exit message
    receive do
      {:EXIT, pid, :normal} ->
        IO.puts "Normal exit from #{inspect pid}"

      {:EXIT, pid, msg} ->
        IO.puts ":EXIT received from #{inspect pid}"
        IO.inspect msg
    end

    :timer.sleep(500)
    IO.puts "I'm done."
You’ll notice that in the receive block above, I am actually pattern matching for two different :EXIT messages. The first one is what happens when a process exits normally upon finishing its task. The second one will catch errors and in our case will output:
    :EXIT received from #PID<0.73.0>
    {%RuntimeError{message: "I have failed you."},
     [{:elixir_compiler_0, :"-__FILE__/1-fun-0-", 0,
       [file: 'error_linking_traps.exs', line: 5]}]}
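The trap-and-receive pattern above can be wrapped into a helper that turns a linked process's exit into an ordinary return value. This is only a sketch: `TrapSketch.run_linked/2` is a hypothetical name, and the one-second default timeout is an assumption rather than anything the article prescribes.

```elixir
defmodule TrapSketch do
  # Run `fun` in a linked process and translate its exit into a return value.
  def run_linked(fun, timeout \\ 1_000) do
    # Process.flag/2 returns the previous setting so it can be restored later.
    previous = Process.flag(:trap_exit, true)
    pid = spawn_link(fun)

    result =
      receive do
        {:EXIT, ^pid, :normal} -> :ok
        {:EXIT, ^pid, reason} -> {:error, reason}
      after
        timeout ->
          # Give up: drop the link and stop the straggler so that it cannot
          # take the caller down once exits are no longer trapped.
          Process.unlink(pid)
          Process.exit(pid, :kill)
          :timeout
      end

    Process.flag(:trap_exit, previous)
    result
  end
end

IO.inspect(TrapSketch.run_linked(fn -> :finished end))
IO.inspect(TrapSketch.run_linked(fn -> exit(:gave_up) end))
```

Pinning `^pid` in the patterns keeps the helper from swallowing exit messages that belong to some other linked process.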
Monitoring Processes
Links are bidirectional, but monitoring on the other hand is unidirectional. It allows you to monitor (hence the name) the status of another process without linking yourself to it. You’re observing it at a safe distance. Unlike linking, an error in a monitored process won’t bring down your current one; you’ll just be notified of it.
    # Spawn a new process and grab its pid
    pid = spawn(fn ->
      :timer.sleep 500
      raise("Sorry, my friend.")
    end)

    # Set up a monitor for this pid
    ref = Process.monitor(pid)

    # Wait for a down message for given ref/pid
    receive do
      {:DOWN, ^ref, :process, ^pid, :normal} ->
        IO.puts "Normal exit from #{inspect pid}"

      {:DOWN, ^ref, :process, ^pid, msg} ->
        IO.puts "Received :DOWN from #{inspect pid}"
        IO.inspect msg
    end
We’ll see the following:
    Received :DOWN from #PID<0.73.0>
    {%RuntimeError{message: "Sorry, my friend."},
     [{:elixir_compiler_0, :"-__FILE__/1-fun-0-", 0,
       [file: 'error_monitoring.exs', line: 3]}]}
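A slightly more compact variant, sketched below, uses `spawn_monitor/1` to spawn and monitor in one step, and also shows what happens when you monitor a pid that is already dead: the `:DOWN` message arrives right away with the reason `:noproc`. `MonitorSketch` and its function names are hypothetical.

```elixir
defmodule MonitorSketch do
  # Run `fun` in an unlinked, monitored process and return its exit reason.
  def exit_reason(fun, timeout \\ 1_000) do
    {pid, ref} = spawn_monitor(fun)
    await_down(pid, ref, timeout)
  end

  # Monitor an existing pid and return the reason it went down.
  # A pid that is already dead yields :noproc.
  def watch(pid, timeout \\ 1_000) do
    ref = Process.monitor(pid)
    await_down(pid, ref, timeout)
  end

  defp await_down(pid, ref, timeout) do
    receive do
      {:DOWN, ^ref, :process, ^pid, reason} -> reason
    after
      timeout ->
        # Stop monitoring and discard a :DOWN message that may have just arrived.
        Process.demonitor(ref, [:flush])
        :timeout
    end
  end
end

IO.inspect(MonitorSketch.exit_reason(fn -> :ok end))
IO.inspect(MonitorSketch.exit_reason(fn -> exit(:boom) end))
```

In both calls the monitoring process keeps running; it only learns why the other process stopped.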
Supervising
Linking and monitoring are available when you need them, but Elixir also comes with Supervisor functionality built on top of them. This allows us to easily define what behavior should occur when the code that is being supervised fails. We’ll use an example of a cache store, which fetches the cached value as long as it hasn’t expired.
In the code below, we first fetch the total value, providing a function to call if it doesn’t exist or if it has already expired. We then identify the pid of this named process, which is 0.109.0. After sending a :kill exit signal to the process, we then identify the pid again and can see that it is now 0.115.0. It has automatically been restarted by its supervisor and is now able to fetch the total value again (which would need to be recalculated because all state was lost when the process was killed).
    iex(1)> CashMan.Cache.fetch('total', fn -> 20 end)
    20
    iex(2)> pid = Process.whereis(CashMan.Cache)
    #PID<0.109.0>
    iex(3)> Process.exit(pid, :kill)
    true
    iex(4)> Process.whereis(CashMan.Cache)
    #PID<0.115.0>
    iex(5)> CashMan.Cache.fetch('total', fn -> 20 end)
    20
Because this example runs as an application, we’ll implement the start function which is called automatically. Its job in this case is to start the Supervisor module for this application by calling the start_link function. Supervisors, like any other concurrent code in Elixir, are simply a specialized process.
    defmodule CashMan do
      use Application

      def start(_type, _args) do
        CashMan.Supervisor.start_link()
      end
    end
The implementation for the Supervisor module includes the use Supervisor statement. This gives us all of the functionality which comes built into Elixir for this behavior.
We’ll call the start_link function that comes with Supervisor, passing it the __MODULE__ (our current module, to use as the supervising module), an initial value, which in our case is simply :ok, and the name of this process.
The init function is then called automatically, which is where we can define the exact behavior for this specific supervisor: which children it will supervise, and which strategies should be used in case they fail.
Supervisors can supervise children (GenServers), but they can also supervise other supervisors, creating a supervision hierarchy or tree. Benjamin Tan Wei Hao produced an excellent cheatsheet detailing all of the different functions and options for supervisors.
    defmodule CashMan.Supervisor do
      use Supervisor

      def start_link do
        Supervisor.start_link(__MODULE__, :ok, name: CashMan.Supervisor)
      end

      def init(:ok) do
        children = [
          worker(CashMan.Cache, [CashMan.Cache])
        ]

        supervise(children, [strategy: :one_for_one])
      end
    end
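The worker and supervise helpers used above come from Supervisor.Spec, which has been deprecated since Elixir 1.5 in favor of child specifications and `Supervisor.init/2`. The sketch below shows roughly what the same shape looks like in that newer style, assuming Elixir 1.5 or later. The `Sketch.*` modules are hypothetical stand-ins (an empty GenServer takes the place of the cache), and the polling helper exists only so the script can observe the restart.

```elixir
defmodule Sketch.Worker do
  use GenServer

  def start_link(name), do: GenServer.start_link(__MODULE__, %{}, name: name)

  @impl true
  def init(state), do: {:ok, state}
end

defmodule Sketch.Supervisor do
  use Supervisor

  def start_link do
    Supervisor.start_link(__MODULE__, :ok, name: __MODULE__)
  end

  @impl true
  def init(:ok) do
    # {module, arg} expands to module.child_spec(arg), which `use GenServer`
    # defines so that the child is started with module.start_link(arg).
    children = [
      {Sketch.Worker, Sketch.Worker}
    ]

    Supervisor.init(children, strategy: :one_for_one)
  end
end

defmodule Sketch.Demo do
  # Kill the named worker and poll until the supervisor has registered a new pid.
  def kill_and_wait(name, attempts \\ 100) do
    old_pid = Process.whereis(name)
    Process.exit(old_pid, :kill)
    wait_for_new_pid(name, old_pid, attempts)
  end

  defp wait_for_new_pid(_name, _old_pid, 0), do: :timeout

  defp wait_for_new_pid(name, old_pid, attempts) do
    case Process.whereis(name) do
      pid when is_pid(pid) and pid != old_pid ->
        {old_pid, pid}

      _ ->
        Process.sleep(10)
        wait_for_new_pid(name, old_pid, attempts - 1)
    end
  end
end

{:ok, _sup} = Sketch.Supervisor.start_link()
{old_pid, new_pid} = Sketch.Demo.kill_and_wait(Sketch.Worker)
IO.puts("restarted: #{inspect(old_pid)} -> #{inspect(new_pid)}")
```

The two pids printed at the end differ, which is the same effect the iex session earlier showed for CashMan.Cache.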
A good overview of the different strategies can be found in this article, and although it is speaking about Erlang, the restart strategies are identical in Elixir.
I chose to use :one_for_one in the example above because the supervisor is only supervising one child. You would also use this strategy when it is an isolated process that shouldn’t affect any other children that the supervisor is supervising.
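To see how the choice of strategy plays out with more than one child, here is a rough sketch that supervises two named Agents, kills the first, and reports whether its sibling was replaced too. `StrategySketch`, the `:sketch_first` and `:sketch_second` names, and the polling loop are all assumptions made for the demonstration.

```elixir
defmodule StrategySketch do
  # Supervise two named Agents with the given strategy, kill the first one,
  # and report whether the second one was taken down or replaced as well.
  def run(strategy) do
    children = [
      %{id: :first, start: {Agent, :start_link, [fn -> 0 end, [name: :sketch_first]]}},
      %{id: :second, start: {Agent, :start_link, [fn -> 0 end, [name: :sketch_second]]}}
    ]

    {:ok, sup} = Supervisor.start_link(children, strategy: strategy)

    first = Process.whereis(:sketch_first)
    second = Process.whereis(:sketch_second)

    Process.exit(first, :kill)
    wait_for_restart(:sketch_first, first, 100)

    sibling_replaced? = Process.whereis(:sketch_second) != second

    Supervisor.stop(sup)
    sibling_replaced?
  end

  # Poll until `name` is registered to a pid other than `old_pid`.
  defp wait_for_restart(_name, _old_pid, 0), do: :timeout

  defp wait_for_restart(name, old_pid, attempts) do
    case Process.whereis(name) do
      pid when is_pid(pid) and pid != old_pid ->
        :ok

      _ ->
        Process.sleep(10)
        wait_for_restart(name, old_pid, attempts - 1)
    end
  end
end

IO.inspect(StrategySketch.run(:one_for_one))
IO.inspect(StrategySketch.run(:one_for_all))
```

With :one_for_one only the killed child is replaced, while :one_for_all replaces every child whenever any one of them fails.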
Below we have the child, which implements the GenServer behavior. If you are looking for more details on how a GenServer works, please refer to my previous article on Concurrency Abstractions in Elixir.
    defmodule CashMan.Cache do
      use GenServer

      @default_expiry 60

      def start_link(name) do
        GenServer.start_link(__MODULE__, %{}, name: name)
      end

      # Allow async fetching, which returns a `Task`,
      # allowing you to call `Task.await()` at a later date.
      def async_fetch(key, func, expiry \\ @default_expiry) do
        Task.async(fn -> fetch(key, func, expiry) end)
      end

      # Fetch the fresh value for a given key.
      # If missing or expired, re-generate a new value and store it in the cache.
      def fetch(key, func, expiry \\ @default_expiry) do
        case GenServer.call(__MODULE__, {:fetch, key, expiry}) do
          :missing ->
            value = Task.async(fn -> func.() end) |> Task.await()
            store(key, value, expiry)
            value

          value ->
            value
        end
      end

      # Store a given value in the cache, providing its expiry time in seconds
      def store(key, value, expiry) do
        GenServer.cast(__MODULE__, {:store, key, value, expiry})
      end

      # Remove all expired entries from the cache
      def prune do
        GenServer.cast(__MODULE__, :prune)
      end

      # Return the current state of the cache
      def current do
        GenServer.call(__MODULE__, :current)
      end

      # Server

      # Start with the state passed to start_link (an empty map).
      # Newer Elixir versions warn when init/1 is not defined explicitly.
      def init(state) do
        {:ok, state}
      end

      def handle_call({:fetch, key, _expiry}, _from, state) do
        {answer, new_state} =
          case Map.fetch(state, key) do
            {:ok, {expired_at, value}} ->
              case expired?(expired_at) do
                true -> {:missing, Map.delete(state, key)}
                false -> {value, state}
              end

            :error ->
              {:missing, state}
          end

        {:reply, answer, new_state}
      end

      def handle_call(:current, _from, state) do
        {:reply, state, state}
      end

      def handle_cast({:store, key, value, expiry}, state) do
        new_state = Map.put(state, key, {calc_expired_at(expiry), value})
        {:noreply, new_state}
      end

      def handle_cast(:prune, state) do
        new_state =
          Enum.reduce(state, %{}, fn {key, {expired_at, value}}, new_state ->
            if expired?(expired_at) do
              new_state
            else
              Map.put(new_state, key, {expired_at, value})
            end
          end)

        {:noreply, new_state}
      end

      def calc_expired_at(expiry) do
        (DateTime.utc_now() |> DateTime.to_unix()) + expiry
      end

      def expired?(expired_at) do
        DateTime.to_unix(DateTime.utc_now()) > expired_at
      end
    end
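The expiry bookkeeping inside the GenServer is easier to follow when it is pulled out into plain functions over a map. The sketch below is not part of CashMan; `ExpirySketch` is a hypothetical module that mirrors the same `{expired_at, value}` layout and takes the current Unix time as an argument, so its behavior does not depend on the clock.

```elixir
defmodule ExpirySketch do
  # Store a value together with the Unix time (in seconds) at which it expires.
  def put(cache, key, value, ttl, now) do
    Map.put(cache, key, {now + ttl, value})
  end

  # Return {:ok, value} while the entry is still fresh, otherwise :missing.
  # An entry counts as expired once `now` is greater than its expiry time.
  def fetch(cache, key, now) do
    case Map.fetch(cache, key) do
      {:ok, {expired_at, value}} when now <= expired_at -> {:ok, value}
      _ -> :missing
    end
  end

  # Drop every entry whose expiry time has passed.
  def prune(cache, now) do
    cache
    |> Enum.reject(fn {_key, {expired_at, _value}} -> now > expired_at end)
    |> Map.new()
  end
end

cache = ExpirySketch.put(%{}, "total", 20, 60, 1_000)
IO.inspect(ExpirySketch.fetch(cache, "total", 1_060)) # {:ok, 20}
IO.inspect(ExpirySketch.fetch(cache, "total", 1_061)) # :missing
```

Passing the time in explicitly is a design choice for this sketch only; the GenServer above reads the clock itself through DateTime.utc_now().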
By calling :observer.start in any iex console, you will be able to see examples of supervisors. The logger application contains one which you can explore! Ours from the example above shows CashMan.Supervisor with CashMan.Cache as its only child.
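If the graphical observer isn't available, a supervisor can also be inspected from code. The following sketch starts a throwaway supervisor with a single Agent child and asks it about its children; `InspectSketch` and the `:counter` id are hypothetical names chosen for the example.

```elixir
defmodule InspectSketch do
  # Start a supervisor with a single Agent child to have something to look at.
  def start do
    children = [
      %{id: :counter, start: {Agent, :start_link, [fn -> 0 end]}}
    ]

    Supervisor.start_link(children, strategy: :one_for_one)
  end
end

{:ok, sup} = InspectSketch.start()

# One {id, pid, type, modules} tuple per child.
IO.inspect(Supervisor.which_children(sup))

# A map with counts of specs, active children, supervisors, and workers.
IO.inspect(Supervisor.count_children(sup))
```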
Conclusion
For a deeper dive into the world of Elixir and OTP, I recommend The Little Elixir and OTP Guidebook. It does a great job of exploring each of the subjects we touched on above in far more detail. The topic of supervisors in Elixir is more nuanced than I could have hoped to cover in a single article. As usual, the Elixir website also has an excellent guide on supervisors.
The cool thing about Elixir is that all of the more advanced/abstracted functionality is built on top of the building blocks of processes, which can link themselves to other processes and send and receive messages from one process to another. Everything else is an abstraction, including a supervisor which is just a specialized GenServer that comes with the language.
Reference: Linking, Monitoring, and Supervising in Elixir from our WCG partner Leigh Halliday at the Codeship Blog.