Exploring Terraform: Modules
At This End Out, we’re using terraform to manage our AWS infrastructure. I’m planning on sharing our official design and toolset once it’s matured and seen more production time. In the meantime, I’m exploring the theory and practical constraints around using terraform in a continuous integration and deployment mindset. To begin, I’d like to look at building modular terraform configurations.
Consider a terraform configuration `Prod`. `Prod` describes an infrastructure that is instantiated when needed. It has internal variables and relationships between resources. It may be a single instance or an entire multi-provider infrastructure. Creating an instance of `Prod` is easy in terraform; you `cd` into the directory and run `terraform plan && terraform apply`. This method can work regardless of how large `Prod` gets.
Complications arise when you want to duplicate some or all of `Prod` for another environment, such as `Dev`. (You may also run into this if you have a lot of duplication within `Prod` itself.) `Prod` and `Dev` need to be as similar as possible, but they will necessarily differ in some configuration values or parameters; otherwise, there could be conflicts when the resources are created. The easiest way to duplicate `Prod` would be to copy all of `Prod`’s resources and manually edit the values to make `Dev`. This may work but often leads to problems down the line, because it isn’t obvious which values must be the same and which must differ per environment. If you’re working on a team (or your future self returns to this file after days or weeks), the next duplication or update may result in a missed parameter and a conflict. If you’re lucky, you’ll get an error when you try to create the resource. If not, you’ll spend hours hunting down the source of some weird behavior in production.
There’s also the problem of duplication. Once you have two or three copies of your infrastructure, how do you add a resource to all of them? Right now, the only option is more cut-and-paste. Also, there will likely be parts of your infrastructure that are shared (as a singleton) between these environments. So there is a need to ensure that multiple copies of shared infrastructure are not created for each environment.
Enter modules, which give the ability to re-use and share configuration between instances of your infrastructure. Terraform currently (0.5) has two types of modules: local (the regular kind) and remote. In short, modules allow you to create encapsulated terraform configurations per-folder and reference them from other configurations.
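As a rough sketch of what "one configuration per folder, referenced from another" could look like, here is a hypothetical module and a configuration that uses it. The folder layout, the `environment` variable, the `instance_id` output, and the AMI ID are all made up for illustration, and the syntax follows the 0.5-era conventions (string interpolation, block-style `tags`).

```hcl
# --- modules/web/main.tf (hypothetical module folder) ---

# Input the caller must supply; the name is an assumption for this example.
variable "environment" {}

resource "aws_instance" "web" {
  ami           = "ami-12345678" # placeholder, not a real AMI ID
  instance_type = "t2.micro"

  tags {
    Name = "web-${var.environment}"
  }
}

# Value exposed to whichever configuration instantiates this module.
output "instance_id" {
  value = "${aws_instance.web.id}"
}

# --- prod/main.tf (hypothetical configuration that uses the module) ---

module "web" {
  source      = "../modules/web" # local path to the module folder
  environment = "prod"           # the only value that differs per environment
}
```

A `dev/main.tf` would contain the same `module` block with `environment = "dev"`, so the per-environment differences are visible in one place instead of being scattered through copied resources. With this layout you would typically run `terraform get` in `prod/` before `terraform plan` so the module source is fetched.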
As a way of setting the context, consider an infrastructure resource `A`. `A` requires some input parameters in order to complete its configuration. Once configured, `A` outputs some values that may be used by other resources. Enter related resource `B`. Either `A` needs information from `B` or `A` provides information to `B`. If `A` and `B` need information from each other, then we have a circular dependency. This shouldn’t happen at the resource level. If `A` and `B` are modules, they should probably be combined into one module: `AB`.
In this way we can identify `A` and `B` as either data producer or data consumer. (It’s also possible to reverse the concept and consider `B` to be a producer of a service which `A` consumes. I’m going to concentrate on the variable/data flow to maintain a fixed vantage point.) In a simple system, the consumer has no knowledge of the producer and is simply instantiated with the correct inputs. Let’s use `A => B` to mean data flows from `A` to `B`.
So how do we represent this in terraform? At a resource level, we can use the resource attributes to refer to the ‘outputs’ of a resource. At the module level, we have a few options:
1. `A` creates an instance of `B` by using the `module` resource-type. Now, `A` owns a local copy of `B` and passes the data `B` requires.
2. `A` is created as a standalone module with defined outputs. `B` creates a copy of `A` and uses the outputs of `A` to configure itself.
3. `A` is created as a standalone module with defined outputs. `B` creates a reference to `A` using a remote module resource-type. Here, `B` makes use of a pre-existing `A`.
4. Create a meta-module `C` in which we instantiate `A` and `B`. We must ‘manually’ wire the outputs of `A` into the inputs of `B`.
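The meta-module option (the last one in the list) is probably the easiest to picture in code. The sketch below assumes module `A` lives in `../a` and declares an output named `id`, and that module `B` lives in `../b` and declares a variable named `upstream_id`; none of these names come from a real configuration.

```hcl
# c/main.tf: hypothetical meta-module C that instantiates both A and B

module "a" {
  source = "../a"      # assumed folder containing module A
  name   = "example"   # assumed input variable declared by A
}

module "b" {
  source = "../b"      # assumed folder containing module B

  # The manual wiring: A's output becomes B's input (A => B).
  # Assumes A declares output "id" and B declares variable "upstream_id".
  upstream_id = "${module.a.id}"
}
```

Neither `A` nor `B` knows about the other here. The reference to `module.a.id` is what tells terraform that `B` depends on `A`, so `C` is the only place where the data flow is spelled out.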
These options highlight some interesting aspects of this problem:
- We can opt to have `A` own `B`, `B` own `A`, or have no direct ownership between them.
- We have to consider whether `A` or `B` are singletons or can have multiple copies created.
- Remote-modules are output-only. Since they are assumed to be managed and created elsewhere, we don’t have the option to create `B` inside of `A` if `A` is a remote module.
- Options 1 and 2 seem roughly interchangeable in isolation. As we add more resources to the dependency graph, though, it’s likely that one of the options will cease to be viable and one of the resources will be forced to be the wrapper/parent module.
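To make the ownership question a little more concrete, here is one way the first option (`A` owns `B`) might look. This is only a sketch: treating `A` as a VPC and `B` as a subnet, along with every variable and output name below, is an assumption chosen for illustration.

```hcl
# --- a/main.tf: hypothetical module A, which owns a nested copy of B ---

variable "cidr_block" {}

resource "aws_vpc" "main" {
  cidr_block = "${var.cidr_block}"
}

# A instantiates B and passes along the data B requires (A => B).
module "b" {
  source = "../b"
  vpc_id = "${aws_vpc.main.id}"
}

# B's outputs are not visible to A's callers unless A re-exports them.
output "b_subnet_id" {
  value = "${module.b.subnet_id}"
}

# --- b/main.tf: hypothetical module B, unaware of who instantiates it ---

variable "vpc_id" {}

resource "aws_subnet" "main" {
  vpc_id     = "${var.vpc_id}"
  cidr_block = "10.0.1.0/24" # placeholder range, must fall inside A's cidr_block
}

output "subnet_id" {
  value = "${aws_subnet.main.id}"
}
```

In this arrangement every instance of `A` gets its own `B`, which is convenient when the two always travel together but rules out sharing a single `B` between several copies of `A`.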
In all four cases, we achieve `A => B`. In choosing, we have to consider ownership (both in code and within the organization) and arity. As we introduce resources `D`, `E`, and `F`, we’ll need to be more purposeful about the design. It’s likely that adding more components will help expose which components are foundational and belong at the top of the hierarchy. This introduces a different set of problems, though, when it comes to managing the lifecycle of environments that have a shared parent. I plan to explore the problems of environment lifecycles in the next post.