Advanced Enumeration with Ruby
Enumeration by definition is “the action of mentioning a number of things one by one.” In programming, instead of mentioning, we choose any action we may want to perform, whether it simply be printing out the item to a display or performing some sort of selection and/or transformation on the item.
In programming, we can select and process a collection in many ways at once by chaining each additional transformation on as a step. Each step can either consume the entire collection before handing its results off to the next step, or it can be handled “lazily,” passing one or more items at a time through all of the transformations.
How Ruby Does Enumeration
In this post, I’ll start with a quick review of what blocks and yield do. The blocks we’re interested in for Ruby are sections of code defined within methods or procs/lambdas. You can think of yield as a place where a code block gets pasted into the current code block from elsewhere. Let me demonstrate.
    def my_printer
      puts "Hello World!"
    end

    def thrice
      3.times do
        yield
      end
    end

    thrice(&method(:my_printer))
    # Hello World!
    # Hello World!
    # Hello World!

    thrice { puts "Ruby" }
    # Ruby
    # Ruby
    # Ruby
Methods accept two forms of blocks for yield: procs or blocks. The method method returns an object wrapping a method definition, and the & operator converts it into a proc, which can then be passed in as a block, as with my_printer in the example above.

Where yield is written above, it’s as if the code passed as a block were written in its place. So in the first case, simply imagine yield replaced with puts "Hello World!" and, in the second, yield replaced with puts "Ruby".

yield can also work as a simple enumerator. You can pass any value in as a parameter to the block/proc by adding it after yield.
    def simple_enum
      yield 4
      yield 3
      yield 2
      yield 1
      yield 0
    end

    simple_enum do |value|
      puts value
    end
    # 4
    # 3
    # 2
    # 1
    # 0
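A common companion to a yielding method is to return an Enumerator when no block is given, so the same method supports both styles of iteration. The following is a minimal sketch of that idiom; the Countdown class and its each_value method are hypothetical names chosen for illustration and do not appear in the original post.

```ruby
# Hypothetical class for illustration only.
class Countdown
  def initialize(from)
    @from = from
  end

  # Yields @from, @from - 1, ..., 0. Without a block, returns an Enumerator
  # built from this same method, so callers can use next, map, and so on.
  def each_value
    return to_enum(__method__) unless block_given?

    @from.downto(0) { |value| yield value }
  end
end

countdown = Countdown.new(4)

# Internal iteration: the block receives every yielded value.
collected = []
countdown.each_value { |value| collected << value }

# External iteration: no block, so we get an Enumerator back.
enum = countdown.each_value
first_two = [enum.next, enum.next]

# The Enumerator also exposes the Enumerable methods.
doubled = countdown.each_value.map { |value| value * 2 }
```

Here collected holds every value in the order it was yielded, while enum lets the caller pull values one at a time.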
Minimum Enumerator Requirements
Ruby’s standard way of producing an enumerator is the each method, which yields values. With this, you can define an each method on any Ruby object and then take advantage of more than 50 methods for processing and evaluating collections from the Enumerable module. Simply add include Enumerable within the class that has a valid each method, and you can fully utilize all of those methods.
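As a minimal sketch of that requirement, the class below defines only each and includes Enumerable. The Playlist class and its track names are assumptions made up for this illustration.

```ruby
# Hypothetical collection class; the names are assumptions for illustration.
class Playlist
  include Enumerable

  def initialize(*tracks)
    @tracks = tracks
  end

  # The only method Enumerable requires: yield each element in turn.
  def each
    @tracks.each { |track| yield track }
  end
end

playlist = Playlist.new("Intro", "Bridge", "Anthem", "Outro")

# None of these methods are defined on Playlist; they come from Enumerable.
sorted      = playlist.sort
long_titles = playlist.select { |title| title.size > 5 }
lengths     = playlist.map(&:size)
has_outro   = playlist.include?("Outro")
```

Note that sort additionally relies on the elements themselves responding to <=>, which strings do.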
Enumerators aren’t limited to simple collections such as Array; they work with any collection that provides an each method (and such collections will typically have the Enumerable module in their ancestors).
    Array.ancestors
    # => [Array, Enumerable, Object, Kernel, BasicObject]

    Hash.ancestors
    # => [Hash, Enumerable, Object, Kernel, BasicObject]

    Hash.method_defined? :each
    # => true

    require "set"
    Set.ancestors
    # => [Set, Enumerable, Object, Kernel, BasicObject]
Lazy and Not Lazy Enumeration
Lazy enumeration is often considered a better way of processing a collection, as it will allow you to step through infinite sequences as far as you’d like to go.
Think of an assembly line of people to make a pizza where each person is responsible for only one step in the pizza’s transformation/creation. The first person tosses the dough into the right shape, the next person adds the sauce, the next the cheese, a person for each topping, one to put it in the oven, and the last person to deliver the ready pizza to you. In this example, Ruby’s lazy version of this is to have any number of orders of pizza, but everyone takes the time to do just the first pizza through every step of the process before continuing on to the next pizza to make.
If you don’t use lazy enumeration, then each step would have to wait for the entire collection to be done one step at a time. For example, if you have 20 orders of pizza, the person who tosses the pizza dough will have to do 20 of them before any of them get sauce added on by the next person. And each step in the line waits in a similar manner. Now, the bigger the collection you need to process, the more ridiculous it seems to make the rest of the assembly line wait.
A more real-world example would be processing emails to be sent out to all users. If there is an error in the code and it’s not being handled lazily, then it’s quite likely no one would have received an email. But in the case of lazy evaluation, you could potentially get most of your users emailed before an account information issue causes a problem. If a record is kept of successful emails sent, it’s easier to track down where the issue may lie.
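The assembly-line difference can be observed directly by logging the order in which each step runs. This is only a sketch: the "dough" and "sauce" labels stand in for the pizza steps described above, and the log arrays are names introduced for the illustration.

```ruby
orders = [1, 2, 3]

# Eager: every order gets dough before any order gets sauce.
eager_log = []
orders
  .map { |n| eager_log << "dough #{n}"; n }
  .map { |n| eager_log << "sauce #{n}"; n }

# Lazy: each order passes through both steps before the next one starts.
lazy_log = []
orders.lazy
  .map { |n| lazy_log << "dough #{n}"; n }
  .map { |n| lazy_log << "sauce #{n}"; n }
  .to_a # nothing runs until the lazy chain is forced

puts eager_log.join(", ")
puts lazy_log.join(", ")
```

The eager log groups entries by step, while the lazy log groups them by order.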
Creating a lazy enumerator in Ruby is as simple as calling lazy on an object that includes Enumerable, or to_enum.lazy on an object that defines each.
    class Thing
      def each
        yield "winning"
        yield "not winning"
      end
    end

    a = Thing.new.to_enum.lazy

    Thing.include Enumerable
    b = Thing.new.lazy

    a.next
    # => "winning"
    b.next
    # => "winning"
Calling to_enum returns an object that is both an Enumerator and an Enumerable, so it has access to all of their methods.
It is important to pay attention to which Enumerable methods consume the entire collection and which work with lazy evaluation. For example, the partition method consumes the entire collection, so it’s unsuitable for infinite collections. Better options for lazy evaluation are methods like chunk or select.
    x = (0..Float::INFINITY)

    y = x.chunk(&:even?)
    # => #<Enumerator::Lazy: #<Enumerator: #<Enumerator::Generator:0x0055eb840be350>:each>>
    y.next
    # => [true, [0]]
    y.next
    # => [false, [1]]
    y.next
    # => [true, [2]]

    z = x.lazy.select(&:even?)
    # => #<Enumerator::Lazy: #<Enumerator::Lazy: 0..Infinity>:select>
    z.next
    # => 0
    z.next
    # => 2
    z.next
    # => 4
When using select with an infinite sequence, you must first call the lazy method. Otherwise, select will try to consume the entire collection and the program will hang, never reaching the end of infinity.
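As a small sketch of the same point, a lazy select can be combined with first to pull a fixed number of results from an infinite range. The variable names here are illustrative only.

```ruby
naturals = (0..Float::INFINITY)

# Lazy: select examines only as many numbers as first(3) asks for.
first_evens = naturals.lazy.select(&:even?).first(3)

# Methods that can stop early are safe on an infinite range even without lazy.
small = naturals.take_while { |n| n < 3 }
found = naturals.find { |n| n > 10 && (n % 7).zero? }

# By contrast, naturals.select(&:even?) without lazy would never return.
```

Methods such as take_while and find stop as soon as they have their answer, which is why they do not need lazy here.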
Creating a Lazy Enumerator
Ruby has the Enumerator::Lazy class, which allows you to write your own lazy enumerator methods like Ruby’s take.
(0..Float::INFINITY).take(2) # => [0, 1]
For a good example, we’ll implement FizzBuzz, which will start at any integer and allow infinite FizzBuzz results.
    def divisible_by?(num)
      ->input{ (input % num).zero? }
    end

    def fizzbuzz_from(value)
      Enumerator::Lazy.new(value..Float::INFINITY) do |yielder, val|
        yielder << case val
        when divisible_by?(15)
          "FizzBuzz"
        when divisible_by?(3)
          "Fizz"
        when divisible_by?(5)
          "Buzz"
        else
          val
        end
      end
    end

    x = fizzbuzz_from(7)
    # => #<Enumerator::Lazy: 7..Infinity:each>

    9.times { puts x.next }
    # 7
    # 8
    # Fizz
    # Buzz
    # 11
    # Fizz
    # 13
    # 14
    # FizzBuzz
With Enumerator::Lazy, whatever you give to yielder will be the value returned at each step in the progression. Enumerators keep track of their current position when you use next. But if you call each after a few calls to next, it will start from the beginning of the collection.
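The difference between next and each can be seen with a small finite enumerator. This sketch uses a plain array enumerator rather than the FizzBuzz one so that each terminates; the variable names are illustrative.

```ruby
enum = [10, 20, 30].each # an Enumerator over the array

# next advances an internal position one value at a time.
a = enum.next
b = enum.next

# each ignores that position and walks the whole collection from the start.
all = []
enum.each { |value| all << value }

# rewind resets the position used by next.
enum.rewind
after_rewind = enum.next
```

Calling next past the last element raises StopIteration, which is worth keeping in mind with finite collections.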
The parameter you pass to Enumerator::Lazy.new is the collection to be enumerated over. If you write this method for Enumerable or a compatible object, you can simply pass self as the parameter. val will be one value produced at a time from the collection’s each method, and yielder is the object that must receive every value you want to pass along to the next step, much as yield hands values to a block in each.
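Here is a minimal sketch of passing self to Enumerator::Lazy.new from within an enumerable object. The LazySquares module, its lazy_squares method, and the Naturals class are all hypothetical names invented for this illustration.

```ruby
# Hypothetical module; mix it into anything that defines each.
module LazySquares
  # Wraps the receiver (self) in a lazy enumerator that yields each square.
  def lazy_squares
    Enumerator::Lazy.new(self) do |yielder, val|
      yielder << val * val
    end
  end
end

# Hypothetical infinite collection: 1, 2, 3, ...
class Naturals
  include Enumerable
  include LazySquares

  def each
    n = 1
    loop do
      yield n
      n += 1
    end
  end
end

squares      = Naturals.new.lazy_squares.first(4)
even_squares = Naturals.new.lazy_squares.select(&:even?).first(2)
```

Because lazy_squares returns an Enumerator::Lazy, further lazy methods such as select can be chained onto it before anything is computed.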
Advanced Enumerator Usages
When processing collections of data, it is recommended to put your limitation filters first in the chain of transformations you process. This way, it takes less work for the code to process the data. If you’re getting data from a database to process, have your limitation filters implemented in the database’s own language before Ruby if possible. That will likely be much more efficient.
    require "prime"

    x = (0..34).lazy.select(&Prime.method(:prime?))
    x.next # => 2
    x.next # => 3
    x.next # => 5
    x.next # => 7
    x.next # => 11
After the select method above, you could have other methods appended to it to process the data. Those methods will only deal with the limited selection of data within prime numbers and not the rest.
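To illustrate why filter order matters, the sketch below counts how many times a transformation block runs when the filter comes first versus last. The counters and the squaring step are assumptions chosen only to make the difference visible.

```ruby
numbers = (1..20)

# Filter first: the transformation only runs on the items that survive.
calls_filter_first = 0
result_a = numbers.select(&:even?).map do |n|
  calls_filter_first += 1
  n * n
end

# Filter last: the transformation runs on every item, then most are discarded.
calls_filter_last = 0
result_b = numbers.map do |n|
  calls_filter_last += 1
  n * n
end.select(&:even?)

puts "filter first: #{calls_filter_first} calls"
puts "filter last:  #{calls_filter_last} calls"
```

Both chains produce the same result here, because a square is even exactly when its root is even, but the second does twice the transformation work.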
Grouping
One nice way to process data for splitting into columns is to use group_by to convert the results into a hash of groups. After that, just retrieve the values, as that’s all we’re interested in.
    [0,1,2,3,4,5,6,7,8].group_by.with_index {|_,index| index % 3 }.values
    # => [[0, 3, 6], [1, 4, 7], [2, 5, 8]]
If you print the above results onto a web page, the data would be ordered as follows:
    0 3 6
    1 4 7
    2 5 8
The group_by code above passes both a value and an index into the code block. We use an underscore for the value from the array to indicate we don’t care about that value and are only interested in the index. What gets returned by that is a hash with the keys of 0, 1, and 2 pointing to each of the groups of values we grouped. Since we don’t care about the keys, we call values on that hash to get the array of arrays to display as we please.
If we wanted to arrange the collection from left to right in columns, we could simply do this:
    threes = (0..2).cycle
    [0,1,2,3,4,5,6,7,8].slice_when { threes.next == 2 }.to_a
    # => [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
The threes enumerator simply cycles through 0 to 2 infinitely, in a lazy fashion. This permits the display to be:

    0 1 2
    3 4 5
    6 7 8
Ruby also has a transpose method, which will flip the above results from one to the other.
    x = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]

    x = x.transpose
    # => [[0, 3, 6], [1, 4, 7], [2, 5, 8]]

    x = x.transpose
    # => [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
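As an alternative sketch for the row-wise grouping, Enumerable#each_slice produces the same rows without a helper enumerator, and transpose then yields the column-wise layout. This is a swapped-in technique rather than the approach used above, and the variable names are illustrative.

```ruby
# Rows of three, left to right.
rows = (0..8).each_slice(3).to_a

# Flip rows into columns.
columns = rows.transpose

# With a count that is not a multiple of three, the last row is shorter.
ragged = (0..9).each_slice(3).to_a

# transpose requires rows of equal length and raises IndexError otherwise.
transpose_failed = begin
  ragged.transpose
  false
rescue IndexError
  true
end
```

If the collection size may not divide evenly, the short final row needs padding before transpose can be used.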
Folding
Let’s look at ways to reduce a collection down to a single result. In other languages, this is commonly done with a method named fold. In Ruby, it has long been done with reduce and inject. A more recent addition, often preferred when building up a mutable object, is each_with_object. The basic idea behind all of these is to process one collection into another value or collection as the result.
Summing a collection of integers is as simple as:
    [1,2,3].reduce(:+)
    # => 6
    [1,2,3].inject(:+)
    # => 6

    class AddStore
      def add(num)
        @value = @value.to_i + num
      end

      def inspect
        @value.to_s # inspect is expected to return a String
      end
    end

    [1,2,3].each_with_object(AddStore.new) {|val, memo| memo.add(val) }
    # => 6

    # As of Ruby 2.4
    [1,2,3].sum
    # => 6
each_with_object typically needs an object that can be updated in place. Integers are immutable, so you can’t change one from within the block, which is why for this trivial example we created an AddStore object.
These methods are better demonstrated by taking data from one collection and placing it into another. Note that inject and reduce are aliases for the same method in Ruby; the value returned at the end of the block becomes the accumulator that the next iteration builds upon. each_with_object does not need the block to return the object being built.
    collection = [:a, 2, :p, :p, 6, 7, :l, :e]

    collection.reduce("") { |memo, value|
      memo << value.to_s if value.is_a? Symbol
      memo # Note the return value needs to be the object/collection we're building
    }
    # => "apple"

    collection.each_with_object("") { |value, memo|
      memo << value.to_s if value.is_a? Symbol
    }
    # => "apple"
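The same contrast shows up when the object being built is a hash. This sketch counts the elements of the collection by class; the variable names are assumptions for the illustration.

```ruby
collection = [:a, 2, :p, :p, 6, 7, :l, :e]

# each_with_object: the memo is the second block parameter and the block's
# return value is ignored.
counts = collection.each_with_object(Hash.new(0)) do |value, memo|
  memo[value.class] += 1
end

# reduce: the memo is the first block parameter and must be returned
# explicitly, because memo[...] += 1 evaluates to an Integer, not the hash.
counts_with_reduce = collection.reduce(Hash.new(0)) do |memo, value|
  memo[value.class] += 1
  memo
end
```

Forgetting the trailing memo in the reduce version would hand an Integer to the next iteration and raise an error.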
Structs
Ruby struct objects are also enumerable objects, which can make for some convenient objects to write methods in.
    class Pair < Struct.new(:first, :second)
      def same?; inject(:eql?) end
      def add; inject(:+) end
      def subtract; inject(:-) end
      def multiply; inject(:*) end
      def divide; inject(:/) end

      def swap!
        members.zip(entries.reverse) {|a,b| self[a] = b}
      end
    end

    x = Pair.new(23, 42)
    x.same?
    # => false
    x.first
    # => 23
    x.swap!
    x.first
    # => 42
    x.multiply
    # => 966
Structs aren’t usually used for large collections but rather as useful data objects, a way to pass organized data together, which permits clear purpose with data rather than data clumps.
Data clumps are when two or more variables are always used as a group and it wouldn’t make sense to use one of the variables by itself. This group of variables should be extracted into an object/class.
So structs in Ruby are generally small collections of data, but there isn’t anything to say that the data itself couldn’t be other collections of data. In that case, a struct could be a way to implement transformations over those collections, much as you could by writing a class of your own.
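As a sketch of that idea, the struct below holds a collection in each member and defines transformations across them. The Scores struct, its members, and its methods are hypothetical names invented for this illustration.

```ruby
# Hypothetical struct: each member holds a collection of points.
Scores = Struct.new(:home, :away) do
  # Struct includes Enumerable, so map iterates over the member values.
  def totals
    map(&:sum)
  end

  # Returns the member name whose collection has the largest total.
  def leader
    members.max_by { |name| self[name].sum }
  end
end

scores = Scores.new([3, 1, 4], [2, 2, 5])

totals = scores.totals
leader = scores.leader
```

Array#sum requires Ruby 2.4 or later, as noted earlier for summing collections.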
Summary
Ruby’s pretty fantastic with how easy it is to work with and manage collections of data. Learning each piece of what Ruby has to offer allows you to write far more elegant code and to test and optimize for better implementations.
If performance is key, then benchmark alternative implementations and be sure to put your filters and limits as early in the process as you can. Consider limiting your input source to smaller chunks when you can, such as using the readline method on files rather than read or readlines, or using LIMIT number in SQL.
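For files, one way to sketch this is File.foreach, which returns an Enumerator when called without a block and reads one line at a time. This is an illustrative alternative to calling readline in a loop; the temporary file and its contents are made up so the example is self-contained.

```ruby
require "tempfile"

# Create a small throwaway file for the demonstration.
file = Tempfile.new("log")
file.write("ok 1\nerror 2\nok 3\nerror 4\nerror 5\n")
file.close

# Lines are read only until two matches have been found; the rest of the
# file is never processed.
first_errors = File.foreach(file.path)
                   .lazy
                   .select { |line| line.start_with?("error") }
                   .map(&:chomp)
                   .first(2)

file.unlink
```

By contrast, File.readlines would load every line into memory before any filtering began.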
Lazy iteration can help greatly with splitting tasks off for different threads or background jobs to handle. Lazy iteration has few downsides beyond a small per-item overhead, as you can still choose to consume an entire collection at any point. It offers the greatest flexibility, and some languages, such as Rust with its iterators, have made lazy evaluation their default.
The possibilities are endless when it comes to how to manage and transform data sets. And it’s a fun process to learn and create each way of handling our data sets by programming. Ruby has well-documented examples for each of its enumerable methods, so it helps to learn from the examples given. I encourage you to experiment and discover many new things which will help make programming all the more enjoyable.
Reference: Advanced Enumeration with Ruby, from our WCG partner Daniel P. Clark at the Codeship Blog.