Exploring Terraform: Modules

July 8, 2015

At This End Out, we’re using terraform to manage our AWS infrastructure. I’m planning on sharing our official design and toolset once it’s matured and seen more production time. In the meantime, I’m exploring the theory and practical constraints around using terraform in a continuous integration and deployment mindset. To begin, I’d like to look at building modular terraform configurations.

Consider a terraform configuration Prod. Prod describes an infrastructure that is instantiated when needed. It has internal variables and relationships between resources. It may be a single instance or an entire multi-provider infrastructure. Creating an instance of Prod is easy in terraform; you cd into the directory and run terraform plan && terraform apply. This method can work regardless of how large Prod gets.

Complications arise when you want to duplicate some or all of Prod for another environment, such as Dev (you may also run into this if you have a lot of duplication within Prod itself). Prod and Dev need to be as similar as possible, but they will necessarily have different configurations or parameters; otherwise, there could be conflicts when the resources are created. The easiest way to duplicate Prod would be to copy all of Prod’s resources and manually edit the values to make Dev. This may work, but it often leads to problems down the line because it isn’t obvious which values must be the same and which must differ per environment. If you’re working on a team (or your future self returns to this file after days or weeks), the next duplication or update may result in a missed parameter and a conflict. If you’re lucky, you’ll get an error when you try to create the resource. If not, you’ll spend hours hunting down the source of some weird behavior in production.

There’s also the problem of duplication. Once you have two or three copies of your infrastructure, how do you add a resource to all of them? Right now, the only option is more cut-and-paste. Also, there will likely be parts of your infrastructure that are shared (as a singleton) between these environments. So there is a need to ensure that multiple copies of shared infrastructure are not created for each environment.

Enter modules, which give the ability to re-use and share configuration between instances of your infrastructure. Terraform currently (0.5) has two types of modules: local (the regular kind) and remote. In short, modules allow you to create encapsulated terraform configurations per-folder and reference them from other configurations.
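As a minimal sketch of what that looks like, here is a module in its own folder, referenced by two environments with different parameters. The folder layout, the module name network, and the variable names env and cidr_block are all hypothetical; this is not our actual configuration.

```hcl
# modules/network/main.tf (hypothetical module)
# Inputs the caller must supply.
variable "env" {}
variable "cidr_block" {}

resource "aws_vpc" "main" {
  cidr_block = "${var.cidr_block}"

  tags {
    Name = "${var.env}-vpc"
  }
}

# Values exposed to whoever instantiates this module.
output "vpc_id" {
  value = "${aws_vpc.main.id}"
}

# prod/main.tf (hypothetical environment)
module "network" {
  source     = "../modules/network"
  env        = "prod"
  cidr_block = "10.0.0.0/16"
}

# dev/main.tf (hypothetical environment)
module "network" {
  source     = "../modules/network"
  env        = "dev"
  cidr_block = "10.1.0.0/16"
}
```

The only values that differ between Prod and Dev are the ones passed in at the module block, which makes the per-environment differences explicit. Each environment folder needs a terraform get before terraform plan so the module source is fetched.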

As a way of setting the context, consider an infrastructure resource A. A requires some input parameters in order to complete its configuration. Once configured, A outputs some values that may be used by other resources. Enter related resource B. Either A needs information from B or A provides information to B. If A and B need information from each other, then we have a circular dependency. This shouldn’t happen at the resource level. If A and B are modules, they should probably be combined into one module: AB.

In this way we can identify A and B as either data producer or data consumer. (It’s also possible to reverse the concept and consider B to be a producer of a service which A consumes. I’m going to concentrate on the variable/data flow to maintain a fixed vantage point.) In a simple system, the consumer has no knowledge of the producer and is simply instantiated with the correct inputs. Let’s use A => B to mean data flows from A to B.

So how do we represent this in terraform? At a resource level, we can use the resource attributes to refer to the ‘outputs’ of a resource. At the module level, we have a few options:

1. A creates an instance of B by using the module resource-type. Now, A owns a local copy of B and passes the data B requires.
2. A is created as a standalone module with defined outputs. B creates a copy of A and uses the outputs of A to configure itself.
3. A is created as a standalone module with defined outputs. B creates a reference to A using a remote module resource-type. Here, B makes use of a pre-existing A.
4. Create a meta-module C in which we instantiate A and B. We must ‘manually’ wire the outputs of A into the inputs of B.
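To make the fourth option (the meta-module C) concrete, here is a rough sketch. The folder names a, b, and c, the choice of a VPC as the producer and a subnet as the consumer, and the vpc_id name are assumptions made purely for illustration.

```hcl
# a/main.tf (hypothetical producer)
resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"
}

output "vpc_id" {
  value = "${aws_vpc.main.id}"
}

# b/main.tf (hypothetical consumer; it knows nothing about A)
variable "vpc_id" {}

resource "aws_subnet" "main" {
  vpc_id     = "${var.vpc_id}"
  cidr_block = "10.0.1.0/24"
}

# c/main.tf (meta-module that instantiates both and does the wiring)
module "a" {
  source = "../a"
}

module "b" {
  source = "../b"
  vpc_id = "${module.a.vpc_id}" # A => B
}
```

The first two options differ only in where the module block lives. If A owns B, the module "b" block moves into a/main.tf and receives "${aws_vpc.main.id}" directly. If B owns A, the module "a" block moves into b/main.tf and B reads "${module.a.vpc_id}" instead of taking a variable.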

These options highlight some interesting aspects of this problem:

- We can opt to have A own B, B own A, or have no direct ownership between them.
- We have to consider whether A or B are singletons or can have multiple copies created.
- Remote modules are output-only. Since they are assumed to be managed and created elsewhere, we don’t have the option to create B inside of A if A is a remote module.
- Options 1 and 2 seem roughly interchangeable in isolation. As we add more resources to the dependency graph, though, it’s likely that one of the options will cease to be viable and one of the resources will be forced to be the wrapper/parent module.
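One way to picture the remote case is with the terraform_remote_state resource, which reads the outputs of a configuration that was applied somewhere else. Treat that mapping as my assumption about what the remote option means in practice. The bucket, key, and region below are placeholders, and the sketch assumes the s3 backend is available in your version of terraform and that A defines an output named vpc_id.

```hcl
# b/main.tf
# Assumption: A was applied elsewhere and its state lives in a
# hypothetical S3 bucket. B can read A's outputs but cannot change A.
resource "terraform_remote_state" "a" {
  backend = "s3"

  config {
    bucket = "example-terraform-state"
    key    = "a/terraform.tfstate"
    region = "us-east-1"
  }
}

resource "aws_subnet" "main" {
  # Consume the output that A published in its state.
  vpc_id     = "${terraform_remote_state.a.output.vpc_id}"
  cidr_block = "10.0.1.0/24"
}
```

Because B only reads A’s state, there is no place to pass inputs into A from here, which is the output-only constraint described above.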

In all four cases, we achieve A => B. In choosing, we have to consider ownership (both in code and within the organization) and arity. As we introduce resources D, E, and F we’ll need to be more purposeful about the design. It’s likely that adding more components will help expose which components are foundational and belong at the top of the hierarchy. This introduces a different set of problems, though, when it comes to managing the lifecycle of environments that have a shared parent. I plan to explore the problems of environment lifecycles in the next post.