Introduction to Terraform

July 28, 2022 | 8 min Read

What is Terraform?

Developed by HashiCorp, creators of Vagrant and Packer, Terraform is a tool that allows us to define our infrastructure as code. Terraform processes our code, compares it with the current state of the specified service provider, and builds an execution plan so that the deployed infrastructure ends up matching what the code defines.

In practical terms: we can add or modify instances and resources in our code (SSH keys, network connectivity, firewall rules) and apply those changes to the remote infrastructure without worrying about how.

Terraform supports all the leading providers of on-premises or cloud infrastructure, such as Amazon AWS, Microsoft Azure, OpenStack, VMware vSphere and DigitalOcean.

The complete list can be found on the Terraform website.
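
As a minimal sketch of what such a configuration looks like (the provider version, region and bucket name below are illustrative assumptions, not taken from any particular setup):

terraform {
    required_providers {
        aws = {
            source  = "hashicorp/aws"
            version = "~> 4.0"
        }
    }
}

provider "aws" {
    region = "eu-west-1"
}

# A single managed resource; the bucket name is purely an example.
resource "aws_s3_bucket" "example" {
    bucket = "my-terraform-example-bucket"
}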

Terraform Workflow

[Figure: Terraform workflow]

The Terraform workflow is based on five key steps: write, init, plan, apply and destroy. The details and actions of each step vary between workflows.

  • Write: author or modify the infrastructure code.

  • Init: initializes the working directory, downloading the providers and modules your code requires.

  • Plan: shows the changes Terraform would make, so you can review them and decide whether to accept them.

  • Apply: accepts the changes and applies them to the actual infrastructure.

  • Destroy: destroys all the infrastructure created by the configuration.

Any difference between the remote state and the locally defined configuration is applied irreversibly once the terraform apply confirmation is accepted, so it is advisable to review the plan before executing changes.
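
In practice, these steps map onto a handful of CLI commands; a minimal sketch (the plan file name is just an example):

terraform init
terraform plan -out=tfplan    # review the proposed changes
terraform apply tfplan        # apply exactly what was planned
terraform destroy             # tear down everything this configuration manages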

What is a State?

[Figure: Terraform state]

Terraform must store the state of the infrastructure and configuration it manages. It uses this state to map real-world resources to your configuration, keep track of metadata, and improve performance for large infrastructures.

This state is stored by default in a local file called terraform.tfstate, but can also be stored remotely, which works best in a team environment.
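
For example, a remote backend on AWS S3 could be declared as follows; the bucket name is a placeholder, while the key and region follow the examples used later in this article:

terraform {
    backend "s3" {
        bucket  = "[bucket name]"
        key     = "vpc/terraform.tfstate"
        region  = "eu-west-1"
        encrypt = true
    }
}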

Terraform uses this state to create plans and make changes to your infrastructure. Before any operation, Terraform refreshes the state so that it matches the real infrastructure.

The main purpose of Terraform state is to store the bindings between objects in a remote system and the resource instances declared in your configuration.

When Terraform creates a remote object in response to a configuration change, it records the identity of that remote object against a particular resource instance, and may then update or delete that object in response to future configuration changes.
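
A couple of read-only commands make that mapping visible without changing anything:

terraform state list    # resource instances currently tracked in the state
terraform show          # human-readable view of the state contents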

What variable types does Terraform offer?

The Terraform language uses the following types for its values:

  • String: a sequence of Unicode characters representing some text, such as "hello".

  • Number: a numeric value; the number type can represent both integers and fractional values, such as 6.283185.

  • Bool: a boolean value, either true or false, which can be used in conditional logic.

  • List or tuple: a sequence of values, such as ["us-west-1a", "us-west-1c"]. The elements of a list or tuple are identified by consecutive integers, starting with zero.

  • Map or object: a group of values identified by named labels, such as {name = "Mabel", age = 52}.

Strings, numbers and booleans are sometimes referred to as primitive types. Lists/tuples and maps/objects are sometimes called complex, structural, or collection types.

Finally, there is a special value that has no type:

  • Null: represents absence or omission. If you set an argument of a resource to null, Terraform behaves as if you had omitted it entirely: it will use the argument's default value if it has one, or raise an error if the argument is mandatory.

    Null is most useful in conditional expressions, so you can dynamically omit an argument if a condition is not met.
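
As a quick, illustrative sketch of these types in variable declarations (all names and defaults here are made-up examples):

variable "instance_name" {
    type    = string
    default = "hello"
}

variable "instance_count" {
    type    = number
    default = 2
}

variable "enable_monitoring" {
    type    = bool
    default = false
}

variable "availability_zones" {
    type    = list(string)
    default = ["us-west-1a", "us-west-1c"]
}

variable "tags" {
    type    = map(string)
    default = { name = "Mabel", team = "platform" }
}

# A default of null means "omitted unless the caller provides a value".
variable "key_name" {
    type    = string
    default = null
}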

What is an output for?

The terraform output command is used to extract the value of an output variable from the state file.
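
For example, assuming a configuration that already manages a resource named aws_instance.example (an illustrative name), an output can expose one of its attributes:

output "instance_ip" {
    description = "Public IP of the example instance"
    value       = aws_instance.example.public_ip
}

After an apply, running terraform output instance_ip prints the value stored in the state, and terraform output with no arguments lists all outputs.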

What is a module and how to use it?

Terraform modules allow us to create the infrastructure in a simpler and faster way. In addition, they allow us to reuse our code across multiple environments or applications; we just have to change the appropriate parameters to get the results we want.

Module structure

Reusable modules are defined using the same configuration language concepts that we use in root modules.

Most commonly, modules use:

  • Input variables: to accept values from the calling module.

  • Output values: to return results to the calling module, which it can then use to complete arguments elsewhere.

  • Resources: to define one or more infrastructure objects to be managed by the module.

In order to define a module, you must create a new directory for it and place one or more .tf files inside it, just as you would for a root module.
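
A minimal sketch of such a module and its caller (all names and paths here are illustrative assumptions): a directory modules/bucket with its own input variable, resource and output, called from the root module with a module block:

# modules/bucket/variables.tf
variable "bucket_name" {
    type = string
}

# modules/bucket/main.tf
resource "aws_s3_bucket" "this" {
    bucket = var.bucket_name
}

# modules/bucket/outputs.tf
output "bucket_arn" {
    value = aws_s3_bucket.this.arn
}

# root module: main.tf
module "logs_bucket" {
    source      = "./modules/bucket"
    bucket_name = "my-logs-bucket"
}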

Terraform can load modules from local relative paths or from remote repositories; if a module will be reused by many configurations, you may want to place it in its own version control repository.

Modules can also call other modules using a module block, but we recommend keeping the module tree relatively flat and using module composition as an alternative to a deeply nested module tree, because this makes individual modules easier to reuse in different combinations.

In addition, we can wrap existing modules in modules that have a simpler interface or add an extra layer of configuration to the infrastructure. All we need to do is place them in a repository and version them.

What is a Workspace?

Since version 0.10 of Terraform there is the concept of workspaces, which, through the “terraform workspace” command, lets you create different workspaces that automatically point to different Terraform state files.

terraform workspace new [workspace_name]
-> to create a new workspace
terraform workspace select [workspace_name]
-> to work in that workspace
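
Two more commands are useful for checking which workspaces exist and which one is active:

terraform workspace list
-> lists the existing workspaces, marking the current one with an asterisk
terraform workspace show
-> prints the name of the current workspace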

When running “plan” and “apply”, different variable files can be used per workspace, as well as different plan output files, using the -var-file and -out options:

terraform plan -var-file='../../dev/dev.tfvars' -out='../../dev/terraform-dev.tfplan'
terraform apply '../../dev/terraform-dev.tfplan'

This way we would have in our repository a variables file and a file with the output of the “terraform plan” command for each environment, but the same resource definition for all of them.

Another option, if we want to avoid maintaining different variable files per environment, is to use a local value, local.env, defined from the built-in terraform.workspace value in our variable definition file. Its value follows the workspace selected with the “terraform workspace select” command.

For example, if we run “terraform workspace select dev”, the value of local.env is “dev”.

In this way we can use a locals map like the following to define, from a single variables file, the different characteristics of each environment:

locals {
    # env takes the name of the currently selected workspace (dev, pre or pro)
    env = terraform.workspace

    instances = {
        "dev"    = "m4.large"
        "pre"    = "m4.large"
        "pro"    = "m5.large"
    }
    instance_type = lookup(local.instances, local.env)
}
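
That local value can then be consumed wherever the instance type is needed; for example (the AMI variable is an assumption defined elsewhere):

resource "aws_instance" "app" {
    ami           = var.ami_id           # assumed to be declared in the variables file
    instance_type = local.instance_type  # m4.large in dev/pre, m5.large in pro
}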

This option is quite elegant, because from a single file you can quickly and intuitively see the different configurations of the different environments. It is also simpler because it is not necessary to include in the plan and apply commands any reference to the environment or to the specific file of variables to be used.

Using multiple workspaces allows us to use the same code for all our environments and to maintain, almost transparently, a different tfstate per environment within the same backend.

That is, supposing we had a single resource file, a bucket as backend and three workspaces called development, preproduction and production, we would have the following content in our bucket:

[bucket]/
      ├── development.tfstate
      ├── preproduction.tfstate
      └── production.tfstate

If we also had the resources divided into independent directories (as indicated above), the backend with a prefix according to the name of the resource, and the three workspaces already mentioned, we would have the following division in the backend:

[bucket]/vpc
      ├── development.tfstate
      ├── preproduction.tfstate
      └── production.tfstate
[bucket]/subnetwork
      ├── development.tfstate
      ├── preproduction.tfstate
      └── production.tfstate
[bucket]/gke
      ├── development.tfstate
      ├── preproduction.tfstate
      └── production.tfstate
[...]

In the case of the AWS S3 backend, Terraform also adds this prefix inside the bucket:

env:/${terraform.workspace}/${resource}

Finally, a tip when using the “outputs” generated from one resource directory (with its corresponding backend and its tfstate) from another resource directory:

If we define all our Terraform resources in the same directory, when referencing one from another it is enough to refer directly to the resource, module or data source by name, like this:

resource_type.resource_name.attribute
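
For example, within a single directory a subnet can reference the VPC it belongs to directly (resource names and CIDR blocks are illustrative):

resource "aws_vpc" "main" {
    cidr_block = "10.0.0.0/16"
}

resource "aws_subnet" "private" {
    vpc_id     = aws_vpc.main.id
    cidr_block = "10.0.1.0/24"
}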

Let’s imagine instead that we have the resources divided by directories and we need to reference, from the /eks directory, the outputs of the directory where we manage the creation of the VPC (that is, to reference some attribute of the resources created in the /vpc directory, such as the identifier of the created VPC).

In that case, with this division by folders, we have to import the remote tfstate created from the vpc directory, using “terraform_remote_state” as follows:

data "terraform_remote_state" "vpc" {
    backend = "[nombre backend]"
    config {
        encrypt = true
        bucket = "[nombre bucket]"
        key = "env:/${terraform.workspace}/vpc/terraform.tfstate"
        region = "eu-west-1"
    }
}

Then you can use the vpc_id (VPC identifier) as follows:

vpc_id= "${data.terraform_remote_state.vpc.vpc_id}"

In other words, the backend configuration used when we create the resources from the original directory (the vpc directory in the example) only indicates the prefix we want the key to include for that resource:

key   = "vpc/terraform.tfstate"

Terraform then transparently creates the path

env:/[workspace name]/vpc/terraform.tfstate

However, when consuming this remote tfstate from a directory other than the original one (the eks directory in the example), it is necessary to define the full key as shown above:

key = "env:/${terraform.workspace}/vpc/terraform.tfstate"

It is not complicated, but because it is not very intuitive, it can lead to errors.

When should it be used?

It is worth pointing out when it makes sense to use a technology and when it does not, since they all have their pros and cons. In this case, the recommendation is to always use Terraform, even for that small project that usually ends up growing bigger.

The only case where it is not recommended is for quick tests (especially small ones).

The use of IaC provides us with repeatable, maintainable, reusable, versioned and more stable infrastructure, while making us more aware of what we do.

Of course, it takes a little more time to write the code than to follow a guide where everything is done with manual commands, but in the end the benefits more than compensate for this extra time.