Understanding Elixir Types
When I first started getting into Elixir, one thing that never quite made sense to me was the type system. It worked…but it wasn’t clear why it was set up the way it was or what benefits come from it. Hopefully, I can get you through some of my initial confusion about Elixir types.
The Why: Message Passing
My initial confusion came from just trying to dive in after coming from other languages. When I started exploring Elixir, I wasn’t quite clear on what made Elixir different or the tradeoffs in language design that facilitated those differences. I got into the nuts and bolts of those differences in my post about comparing Elixir and Go, but I’m going to cut to the chase regarding how it affects types first.
Take this example that initially just confused me:
def do_it({:starsky, do_what}), do: do_what
def do_it({:hutch, do_what}), do: do_what
This is a very simple and contrived example of a tuple pattern being used in a function definition. Elixir uses pattern matching to pick a clause when the function is called, so if I call the function like so:
do_it({:hutch, "Something cool needs to be done here"})
The result is that it will call the second function because it will match on :hutch. My first thought in looking at that was simply, “Why not just create a :hutch type? Why not a separate hutch function?”
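To see the dispatch happen, here is a minimal sketch that wraps those two clauses in a module. The module name Partners is made up for illustration, and I've changed the return values (they are not in the original example) so you can tell which clause ran.

```elixir
defmodule Partners do
  # Hypothetical module name. Each clause matches a two-element tuple
  # whose first element is a specific atom.
  def do_it({:starsky, do_what}), do: "Starsky: " <> do_what
  def do_it({:hutch, do_what}), do: "Hutch: " <> do_what
end

# The :hutch atom in the argument selects the second clause.
IO.puts(Partners.do_it({:hutch, "Something cool needs to be done here"}))
```

The atom at the front of the tuple acts like a tag, and the function head is the only place that tag is given any meaning.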
There are also guard clauses that can check the types at runtime, like so:
def do_it({:starsky, do_what}) when is_binary(do_what), do: do_what
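Here is a small sketch of what that guard does at runtime. The module name GuardedPartners is an assumption for the example; the point is that a call whose argument fails the guard has no clause to land in, so Elixir raises a FunctionClauseError.

```elixir
defmodule GuardedPartners do
  # Hypothetical module name. The guard only admits binaries (strings).
  def do_it({:starsky, do_what}) when is_binary(do_what), do: do_what
end

# A string passes the guard and is returned unchanged.
IO.inspect(GuardedPartners.do_it({:starsky, "stake out the docks"}))

# An integer fails the guard, so no clause matches.
result =
  try do
    GuardedPartners.do_it({:starsky, 42})
  rescue
    FunctionClauseError -> :no_matching_clause
  end

IO.inspect(result)
```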
As a follow-up to that, why can’t we define types in a JSON API? We can define a type on either end, but everything is just going to be serialized to a string, sent over the wire, and then deserialized on the other end back into whatever type you’ve defined.
That’s an important detail when you consider that Elixir works using message passing. Elixir functions are set up so that they can transparently be called across processes, heaps, or even machines in a cluster. You might say you’re sending some fancy, contrived type that you’ve created, but underneath it all it’s just a collection of basic data types with a name attached to it.
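One way to convince yourself of this is to round-trip a tagged tuple through Erlang's external term format. This is only a sketch: I'm using :erlang.term_to_binary/1 and :erlang.binary_to_term/1 by hand here to stand in for what happens when a term leaves one node and arrives at another.

```elixir
message = {:hutch, "Something cool needs to be done here"}

# Serialize the term to a plain binary, as if it were going over the wire.
wire = :erlang.term_to_binary(message)
IO.inspect(is_binary(wire))

# Deserialize it on the "other side". The result is the same atom-and-string tuple.
received = :erlang.binary_to_term(wire)
IO.inspect(received == message)
```

No class definition or schema had to exist on the receiving side; the atom and the string are all there is.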
That’s all those tuples are in this example. Here is my :starsky payload, and here is my :hutch payload. As long as the rest of the structure matches the pattern of what’s being passed in, it will call the appropriate function. If the structure doesn’t match, the call will fail.
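The same tagged tuples work unchanged as messages between processes. The following is a minimal sketch, with made-up message contents and a made-up :done reply tag, of a worker process that pattern matches on whichever payload arrives in its mailbox.

```elixir
parent = self()

# Spawn a worker that waits for one message and matches on its tag.
worker =
  spawn(fn ->
    receive do
      {:starsky, do_what} -> send(parent, {:done, :starsky, do_what})
      {:hutch, do_what} -> send(parent, {:done, :hutch, do_what})
    end
  end)

# Send the worker a :hutch payload.
send(worker, {:hutch, "tail the suspect"})

# Wait for the reply, giving up after one second.
reply =
  receive do
    {:done, who, what} -> {who, what}
  after
    1_000 -> :timeout
  end

IO.inspect(reply)
```

The receive block uses the same matching rules as the function heads above, which is why the two styles feel interchangeable.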
Because transparent distribution across nodes, outside of any single heap space, is critical to the functionality of Elixir, any approach more elaborate than this would involve serializing, passing, and deserializing for every function call before we even get into contract management.
Message passing is at the root of everything in Elixir, and it’s critical to operating millions of isolated heaps across multiple machines. Those isolated heap spaces make extreme fault-tolerance possible since they can be killed and restarted without impacting other parts of the system.
These are our building blocks.
Strong, Dynamic, Gradual Typing
Wait…what?
That was my reaction until I attended Jason Voegele’s excellent Optimistic Type Checking talk at ElixirConf this past September. If you have the time, it’s worth a watch, but I’m going to summarize with some of his excellent examples.
The example in the previous section looked like dynamic typing because we weren’t doing anything with it. Elixir does actually have strict type checking between primitives.
my_string = "Bob"
my_int = 5
my_string + my_int # This is an error
In other languages, you might use a + for concatenation of strings or addition. In Elixir, there is no operator overloading. You’ll see a whole set of operators that exist for different types.
1 + 1 # = 2 Math
[1,2,3] ++ [4,5,6] # = [1,2,3,4,5,6] List concatenation
"foo" <> "bar" # = "foobar" String concatenation
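To make the strictness concrete, here is a small sketch showing what happens when + is handed a string. The Operators module and its function names are hypothetical; the error itself is the ArithmeticError that Elixir raises at runtime for bad arithmetic arguments.

```elixir
defmodule Operators do
  # Hypothetical helper: + only works on numbers.
  def add(a, b), do: a + b

  # Wraps add/2 so the failure shows up as a value instead of a crash.
  def safe_add(a, b) do
    {:ok, add(a, b)}
  rescue
    ArithmeticError -> {:error, :bad_arithmetic}
  end
end

IO.inspect(Operators.safe_add(1, 1))
IO.inspect(Operators.safe_add("Bob", 5))

# The string operator needs a string on both sides, so convert explicitly.
IO.inspect("Bob" <> Integer.to_string(5))
```

Nothing is coerced for you: mixing a string and an integer means choosing an operator and converting one side yourself.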
The Dialyzer takes advantage of this by scanning your code and inferring types based on the operators used with them. Variables next to a plus can only be numbers, and so on. With this knowledge, Dialyzer can look through your code and identify instances where a variable is being passed a type that doesn’t belong. Dialyzer works on compiled BEAM files and ships with Erlang, but it can be easily plugged into your Elixir project with Dialyxir or Dialyze.
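If you go the Dialyxir route, wiring it in is usually a one-line dependency. Treat this as a sketch rather than official setup instructions: the version requirement and the environments listed are assumptions, so check the Dialyxir README for the current recommendation.

```elixir
# mix.exs (fragment) - the version and env list are assumptions
defp deps do
  [
    # Development-only tool; it does not need to run inside your app.
    {:dialyxir, "~> 1.4", only: [:dev, :test], runtime: false}
  ]
end
```

After fetching dependencies with mix deps.get, running mix dialyzer analyzes the project. The first run takes a while because it has to build its lookup tables for Erlang and Elixir themselves.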
The combination of pattern matching and Dialyzer will catch most of your type violations. You get strong typing plus static analysis of your compiled code before it ever runs, which strikes a nice balance between the flexibility of dynamic types and the strictness of static types.
You may want more than those; implicit checks aren’t always enough, and that’s where typespecs come in handy. Sometimes it’s possible to create open-ended type definitions, and I’ll use Voegele’s example to demonstrate that here.
def add(x, y), do: x + y # add(number, number) :: number
def divide(x, y), do: x / y # divide(number, number) :: float
def and(false, _), do: false
def and(_, false), do: false
def and(true, true), do: true # and(any, any) :: boolean
When declaring a pattern for a function with an unused variable, you leave the pattern open-ended. In these cases, you can define a typespec to explicitly declare the definition.
@spec and(boolean, boolean) :: boolean
def and(false, _), do: false
def and(_, false), do: false
def and(true, true), do: true
By adding that spec declaration, Dialyzer can now check every place in the code that the and/2 function is called to ensure that Booleans are being passed for both arguments and that what’s being returned is also a Boolean. This job is made a lot easier because of immutable data; it doesn’t have to worry that a string it checked will be reassigned an integer later on.
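Here is a version of that example you can paste into a file and run. Two assumptions to flag: the module name Logic is made up, and I've renamed the function to both/2 because and is an operator in Elixir and, as far as I know, can't be used as an ordinary function name.

```elixir
defmodule Logic do
  # The spec narrows what the open-ended patterns would otherwise allow.
  @spec both(boolean(), boolean()) :: boolean()
  def both(false, _), do: false
  def both(_, false), do: false
  def both(true, true), do: true
end

IO.inspect(Logic.both(true, true))
IO.inspect(Logic.both(true, false))

# At runtime the underscore still accepts anything, so this call succeeds.
# The spec is what lets a tool like Dialyzer flag it as a contract violation.
IO.inspect(Logic.both(false, "not a boolean"))
```

The spec changes nothing about how the code runs. It only gives the static analysis a contract to hold callers to.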
Typespecs work with all of the basic types: primitives, atoms, lists, maps, tuples, and even custom types.
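As a sketch of the custom-type case, the tagged tuples from the first section can be given a name with @type and then used in a spec. The Assignment module, the type names, and the describe/1 function are all hypothetical.

```elixir
defmodule Assignment do
  # A custom type built from atoms, a tuple, and a string.
  @type partner :: :starsky | :hutch
  @type t :: {partner(), String.t()}

  @spec describe(t()) :: String.t()
  def describe({partner, do_what})
      when partner in [:starsky, :hutch] and is_binary(do_what) do
    "#{partner}: #{do_what}"
  end
end

IO.puts(Assignment.describe({:hutch, "write the report"}))
```

Note that the custom type is still just a label over an atom and a string, which keeps it compatible with everything said earlier about message passing.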
This is gradual typing
The end result here is that you get dynamic typing with implicit type checking from pattern matching and Dialyzer’s inference. You can utilize runtime enforcement with guards, and you can explicitly check any questionable types with typespecs.
Immutable data makes verifying these easier while the adherence to message-passing-compatible structure makes distributed computing and transparent clustering possible. In other words, you gradually increase your type strictness in the same way that you gradually iterate on your development approach.
This approach is an excellent compromise that gives you the best of both worlds with dynamic and static typing, while avoiding the drawbacks of each.
Efficient Usage
Earlier in this post, I showed the string concatenation operator for sake of example, but that’s not very efficient.
Raw string concatenation results in creating a new string from two other strings. This is inefficient because now our memory usage is the combination of both prior strings and the result. Erlang’s secret weapon is to instead use IO Lists to pass and render arrays of string parts rather than combining them in memory.
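A minimal sketch of the difference, using only functions from Elixir's standard IO module: instead of building one new string, you hand over a list of the parts and let the IO layer write them out in order.

```elixir
name = "Bob"

# Concatenation allocates a brand new binary holding a copy of every part.
concatenated = "Hello, " <> name <> "!"

# An IO list just references the existing parts; nothing is copied yet.
greeting = ["Hello, ", name, "!"]

# IO functions accept the list directly.
IO.puts(greeting)

# The byte count is available without flattening.
IO.inspect(IO.iodata_length(greeting))

# Flatten to a single binary only if you really need one.
IO.inspect(IO.iodata_to_binary(greeting) == concatenated)
```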
If you want to go deep on this subject, the folks over at Big Nerd Ranch did a two-part blog post exploring the concept of IO lists, as well as the implementation used within the Phoenix view layer that makes its microsecond response times possible.
Those posts go into more detail and numbers than I will here, but picture this. You create a multilevel template within Phoenix. You’ve got a layout, views, and smaller pieces, but intermixed with all of your code and variables are bits of HTML.
When Phoenix is compiled, those bits are separated into immutable pieces of memory…once. Rendering a view and passing it back to the requester is no longer a task of outputting all of the combined HTML intermixed with the data. Instead, rendering returns a list of memory references to pieces of HTML alongside the data that will fill those gaps.
When that list gets back to the socket, each piece is sent directly to the socket in proper order…byte by byte…without using any more RAM than it did after compilation. Iterating through a list surrounded by "<li>" and "</li>"? Each of those parts is a single memory reference no matter how many list items you combine.
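To picture the list item case, here is a toy sketch of a render function that returns nested IO lists. This is not Phoenix's actual implementation, and the ListView module is made up; it only illustrates the shape of the data that gets handed to the socket.

```elixir
defmodule ListView do
  # Hypothetical renderer: returns nested lists of parts, never one big string.
  def render(items) do
    ["<ul>", Enum.map(items, fn item -> ["<li>", item, "</li>"] end), "</ul>"]
  end
end

html = ListView.render(["one", "two"])

# The nested structure is what would be written out piece by piece.
IO.inspect(html)

# Flattening here is only to show what the client would end up receiving.
IO.puts(IO.iodata_to_binary(html))
```

The "<li>" and "</li>" literals appear once in the source, and every list item simply points back to them.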
On the one hand, I felt a responsibility to include that; showing off string concatenation and explaining how Dialyzer handles it is great, but that particular case isn’t efficient. On the other hand, knowing where that blazing Phoenix speed comes from is pretty cool…and knowing is half the battle.
Reference: Understanding Elixir Types from our WCG partner Florian Motlik at the Codeship Blog.