Nested data structures with functional lenses

The concept I’m going to describe here is far from new, however I found that there wasn’t a good Elixir library that implemented it so I wrote one (https://hex.pm/packages/lens). If you’re familiar with lenses, then you’ll probably not find anything new in the post, just go and use the library.

Still here? Great. So what does lens do? You can think about this like Elixir’s get_in/update_in on steroids. Where in the case of get_in/update_in you can describe a path to a single value (with the narrow exception of Access.all), lens allows you to describe a path to many values and get or update all of them with one function call. In fact lenses in lens are compatible with get_in/update_in. Also note that when I say “update” I mean that in the functional sense - create a new datastructure that’s mostly the same except for the parts we’re “updating”.

For example you can use Lens.filter like Access.all that only focuses on certain elements:

import Integer

data = 1..10
lens = Lens.filter(&is_odd/1)

get_in(data, [lens])
# => [1, 3, 5, 7, 9]
update_in(data, [lens], &(-&1))
# => [-1, 2, -3, 4, -5, 6, -7, 8, -9, 10]

The neat thing about lenses is that you can use functions from Lens to compose them into more complex lenses. The simplest example is Lens.seq - it applies the first lens, and then the second one in sequence. It behaves very similarly to listing many keys in a get_in/update_in call, but returns a lens, so the result can be composed further. For example:

import Integer

data = %{items: 1..10}
# Lens.key focuses on the given key in a map
lens = Lens.seq(Lens.key(:items), Lens.filter(&is_odd/1))

get_in(data, [lens])
# => [1, 3, 5, 7, 9]
update_in(data, [lens], &(-&1))
# => %{items: [-1, 2, -3, 4, -5, 6, -7, 8, -9, 10]}

Another useful composition function is Lens.both (and it’s generalisation Lens.multiple) - it focuses on what both of it’s argument lenses focus on.
For example:

import Integer

data = %{a: 1..10, b: 1..10}
lens = Lens.seq(
  Lens.both(Lens.key(:a), Lens.key(:b)),
  Lens.filter(&is_odd/1))

get_in(data, [lens])
# => [1, 3, 5, 7, 9, 1, 3, 5, 7, 9]
update_in(data, [lens], &(-&1))
# => %{a: [-1, 2, -3, 4, -5, 6, -7, 8, -9, 10],
#      b: [-1, 2, -3, 4, -5, 6, -7, 8, -9, 10]}

Now for a bigger example. Let’s imagine we have devised some kind of structure to represent mathematical expressions.

3                   # This just represents the number 3
{:+, 1, 2}          # This is 1 + 2
{:*, {:+, 1, 2}, 3} # This is (1 + 2) * 3
{:+, 1, :x}         # This is 1 + x

This is less theoretical than it may seem - if you’re writing some kind of parser you might end up with something that’s very similar. Now let’s say we would like to validate that only allowed operations have been used in the expression (and we allow +, -, *, and /). So we roll up our sleeves and type out the following function:

def allowed_operators?({operator, operand1, operand2}) do
  Enum.member?([:+, :*, :-, :/], operator) and
    allowed_operators?(operand1) and
    allowed_operators?(operand2)
end
def allowed_operators?(_), do: true

So far, so good. Now let’s add another function, this one substituting a value for a variable, so that the expression can be evaluated:

def substitute({operator, operand1, operand2}, variable, value) do
  {
    operator,
    operand1,
    substitute(operand2, variable, value)
  }
end
def substitute(variable, variable, value), do: value
def substitute(value, _, _), do: value

And finally a function to evaluate an expression:

def eval({operator, operand1, operand2}) do
  operand1 = eval(operand1)
  operand2 = eval(operand2)
  case operator do
   :+ -> operand1 + operand2
   :- -> operand1 - operand2
   :* -> operand1 * operand2
   :/ -> operand1 / operand2
  end
end
def eval(value), do: value

What is wrong with this picture? If you look at it from the right angle it turns out that only a small part of the code is actually dealing with what those functions are supposed to be doing.

# This
Enum.member?([:+, :*, :-, :/], operator)

# This
def substitute(variable, variable, value), do: value

# ... and this
case operator do
 :+ -> operand1 + operand2
 :- -> operand1 - operand2
 :* -> operand1 * operand2
 :/ -> operand1 / operand2
end

The rest is there in each of these functions to traverse the expression structure. You might have also noticed that there is a bug in one of the traversals - substitute only goes into the right branch of the tree. Couldn’t we somehow just do the traversal once and only provide the changing part? That would also make it less likely to make a bug in one of the cases.

One approach would be to write a traversal function that takes an expression and an action to perform. It then recursively calls itself with subexpressions of that expression and the same action, calling the action at each subexpression. This is fine, although it might work out a bit complicated if you need different kinds of traversals. For full generality it will need to be a fold (https://en.wikipedia.org/wiki/Fold_(higher-order_function)).

What I want to propose is to use lens instead to create a composable description of the traversal that can be used to implement all the cases we need.

def subexpressions do
  Lens.match(fn
    {_, _, _} -> Lens.multiple([
      Lens.seq(Lens.at(1), subexpressions()),
      Lens.seq(Lens.at(2), subexpressions()),
      Lens.root()
    ])
    _ -> Lens.root()
  end)
end

expression = {:+, {:*, 2, :x}, 3}
get_in(expression, [subexpressions()]) |> IO.inspect
# => [2, :x, {:*, 2, :x}, 3, {:+, {:*, 2, :x}, 3}]

This introduces some new concepts. Lens.match passes the datastructure to the given choice function and uses the lens returned from that function. Lens.at focuses on the given index in a list or tuple. Lens.root focuses on the whole datastructure. On a high level the first clause in that Lens.match implements the recursive branch of the traversal while the second clause traverses a leaf.

Notice that the structure is visited in the order part-whole. Functions in lens follow this convention and it will be very useful when implementing eval.

Using that we can reimplement the operations as follows:

def allowed_operators?(expression) do
  get_in(expression, [subexpressions()])
  |> Enum.filter(&match?({_, _, _}, &1))
  |> Enum.map(fn {op, _, _} -> op end)
  |> Enum.all?(&Enum.member?([:+, :-, :*, :/], &1))
end

def substitute(expression, variable, value) do
  update_in(expression, [subexpressions()], fn
    ^variable -> value
    other -> other
  end)
end

def eval(expression) do
  update_in(expression, [subexpressions()], fn
    {:+, a, b} -> a + b
    {:-, a, b} -> a - b
    {:*, a, b} -> a * b
    {:/, a, b} -> a / b
    a -> a
  end)
end

There is still some ugliness, especially in allowed_operators?, but you can see that the traversal logic has been almost completely moved out of these functions. Now we can use the composition features I talked before to create another lens based on subexpressions that only focuses on operators:

def operators do
  subexpressions()
  |> Lens.satisfy(&match?({_, _, _}, &1))
  |> Lens.at(0)
end

Lens.satisfy is a little bit like Lens.filter but instead of always acting on an Enumerable it refines what a previous lens focuses on. Only those items that return true from the filter will be focused on in the composed lens. We also rely on the fact that most functions in lens come in 2 variants so that |> Lens.at(0) is equivalent to |> Lens.seq(Lens.at(0)), providing a very natural way to sequence them.

With that in hand we can simplify the implementation of allowed_operators? like so:

def allowed_operators?(expression) do
  get_in(expression, [operators()])
  |> Enum.all?(&Enum.member?([:+, :-, :*, :/], &1))
end

I hope this short introduction convinced you to try lenses in your project. Check out the full documentation at https://hexdocs.pm/lens/readme.html. If you have suggestions, issues, or want to contribute - visit the repo at https://github.com/obrok/lens.

 
48
Kudos
 
48
Kudos

Now read this

Benefits of the erlang scheduler

Elixir has a certain advantage when it comes to writing actor code over other systems of that type. This advantage is thanks to running on top of erlang’s BEAM VM. What’s more it cannot be easily replicated when concurrency is based on... Continue →