Introduction

Over the past year at my previous company (a telecom provider), I spent a great deal of time automating infrastructure. The primary approach was codifying middleware deployments with Ansible – replacing what had been manual processes – to enable efficient deployment and ongoing operations. However, cloud resources such as virtual machines and domain names were still provisioned manually through web consoles. When setting up new data centers, nearly ten thousand servers were all created by hand through click-ops.

The industry already has a mature tool for this: Terraform. Chinese cloud providers such as Alibaba Cloud and Tencent Cloud also support it well. This post documents the approach used by the Arch Linux DevOps team and how I apply it to my own infrastructure.

This article covers:

  • How to codify cloud resources
  • How to encrypt sensitive information, such as access keys (AK) and secret keys (SK), and store it locally alongside the Terraform code

IaC & Terraform

Introduction to Terraform

Terraform is declarative (similar to Kubernetes) and uses its own configuration language, HCL. For example, if you have 10 servers, the infrastructure code repository written in HCL will contain declarations for each server’s security groups, instance specification, and image. In practice, modules provide the abstraction that keeps these declarations from being repeated.
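As a rough illustration of what such declarations can look like, here is a minimal sketch using the AWS provider. The resource names, the AMI ID, the instance type, and the ./modules/server module with its name variable are all placeholders invented for this example; they are not taken from a real repository.

```hcl
# Security group allowing inbound HTTPS (illustrative rule).
resource "aws_security_group" "web" {
  name = "web"

  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

# One server: image, instance specification, and security group.
resource "aws_instance" "web" {
  ami                    = "ami-00000000000000000" # placeholder image ID
  instance_type          = "t3.micro"
  vpc_security_group_ids = [aws_security_group.web.id]
}

# The same pattern wrapped in a hypothetical local module and
# instantiated once per server name, instead of repeating the blocks.
module "server" {
  source   = "./modules/server" # hypothetical module path
  for_each = toset(["web-1", "web-2", "web-3"])

  name = each.key # assumes the module declares a "name" variable
}
```

Note that nothing here says how to create the servers; the code only states what should exist, and Terraform works out the API calls.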

HCL declares only the essential fields. Attributes that are not declared, such as instance IDs and image IDs, are stored as state in a location called the state backend. Together, the local HCL code and the data in the state backend fully describe every cloud resource.

The default state backend is a local .tfstate file on the operator’s machine. A local file creates a problem: anyone else who wants to see the current state of the infrastructure has to request the latest state file from whoever last ran Terraform. Teams therefore typically use a remote state backend, which can be a database or object storage such as S3 on AWS. I use PostgreSQL as my remote state backend, so all contributors to the infrastructure code simply need secure access to the remote backend, which in my case is the PostgreSQL instance.
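For comparison, an object storage backend is configured in much the same way as a database one. The following is only a sketch of an S3 backend; the bucket name and key are hypothetical, and it is not the configuration used in this post.

```hcl
terraform {
  backend "s3" {
    bucket = "example-terraform-state"          # hypothetical bucket name
    key    = "infrastructure/terraform.tfstate" # object path of the state file
    region = "us-east-1"
  }
}
```

A configuration can have only one backend block, so this would replace a PostgreSQL backend rather than sit next to it.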

Even with a remote state backend, a local .tfstate file still exists, but it contains only the connection information for the remote state backend, not the cloud resource data. Terraform writes this file during initialization. For example, my Terraform initialization command looks like this:

# My secrets are managed via Ansible Vault. `get_key.py` is a script written by the Arch Linux DevOps Team for reading and formatting Ansible Vault encrypted content.
terraform init -backend-config="conn_str=postgres://terraform:$(../misc/get_key.py ../group_vars/all/vault_terraform.yml vault_terraform_db_password)@state.jinmiaoluo.com?sslmode=verify-full"

This generates a local .terraform directory (remember to add it to .gitignore), which contains the .tfstate file.

My Cloud Infrastructure

My main cloud infrastructure consists of:

  • AWS
    • One DNS domain: jinmiaoluo.com
    • A sub-account for automated Let’s Encrypt DNS-01 certificate renewal
    • A sub-account for Terraform to manage cloud resources
  • Alibaba Cloud
    • A lightweight application server in Hong Kong, running Xray and Trojan-go for proxy purposes
    • A sub-account for Terraform to manage cloud resources
  • Tencent Cloud
    • A lightweight application server in Hong Kong, running FRPS to expose the blog to the public internet (high bandwidth but high latency)
    • A lightweight application server in Guangzhou, running FRPS to expose WireGuard to the public internet (low bandwidth but low latency), allowing me to access physical and virtual machines on my home LAN from anywhere – such as a coffee shop. Common use cases include accessing my self-hosted Jira/GitLab/Samba, and remote development via VSCode Remote (similar to GitPod).
    • A sub-account for Terraform to manage cloud resources

How I Use Terraform

I use PostgreSQL as my remote state backend. The PostgreSQL instance is deployed in my virtualization environment, accessible from anywhere via WireGuard:

terraform {
  backend "pg" {
    schema_name = "terraform_remote_state_aws"
  }
}

I adopted the approach from the Arch Linux infrastructure repository. The AK and SK for AWS, Alibaba Cloud, and Tencent Cloud are encrypted with ansible-vault and stored in separate YAML files, following the layout of the Arch Linux Terraform vault file.

I also reuse their get_key.py script. Terraform calls it through the external provider’s data source, and I call it directly during Terraform initialization as well:

data "external" "vault_aws" {
  program = [
    "${path.module}/../misc/get_key.py",
    "${path.module}/../misc/vaults/vault_aws.yml",
    "aws_ak",
    "aws_sk",
    "--format", "json"
  ]
}

data "external" "vault_alicloud" {
  program = [
    "${path.module}/../misc/get_key.py",
    "${path.module}/../misc/vaults/vault_alicloud.yml",
    "alicloud_ak",
    "alicloud_sk",
    "--format", "json"
  ]
}

data "external" "vault_tencentcloud" {
  program = [
    "${path.module}/../misc/get_key.py",
    "${path.module}/../misc/vaults/vault_tencentcloud.yml",
    "tencentcloud_ak",
    "tencentcloud_sk",
    "--format", "json"
  ]
}

provider "aws" {
  region     = "us-east-1"
  access_key = data.external.vault_aws.result.aws_ak
  secret_key = data.external.vault_aws.result.aws_sk
}

provider "alicloud" {
  region     = "cn-hongkong"
  access_key = data.external.vault_alicloud.result.alicloud_ak
  secret_key = data.external.vault_alicloud.result.alicloud_sk
}

provider "tencentcloud" {
  region     = "ap-guangzhou"
  secret_id  = data.external.vault_tencentcloud.result.tencentcloud_ak
  secret_key = data.external.vault_tencentcloud.result.tencentcloud_sk
}
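Since Terraform 0.13, providers published outside the hashicorp namespace have to be declared with an explicit source address, so a configuration like the one above also needs a required_providers block. A minimal sketch follows; version constraints are left out, and where this block lives in my repository is not shown here, so treat the placement as an assumption.

```hcl
terraform {
  required_providers {
    aws = {
      source = "hashicorp/aws"
    }
    alicloud = {
      source = "aliyun/alicloud"
    }
    tencentcloud = {
      source = "tencentcloudstack/tencentcloud"
    }
    external = {
      source = "hashicorp/external"
    }
  }
}
```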

The actual execution flow is as follows:

  1. The Terraform CLI reads the .tfstate file in .terraform to obtain the remote state backend connection details (username, password, database name)
  2. The external data source invokes the get_key.py script
  3. The script launches GnuPG, which interactively prompts for the GnuPG passphrase to decrypt the ciphertext
  4. Decrypting the ciphertext yields the Ansible Vault key
  5. The Ansible Vault key decrypts the corresponding encrypted file, yielding the cloud provider’s AK and SK
  6. The script prints the AK and SK to stdout as JSON, where the external data source captures them
  7. Terraform stores the data source result, including the AK and SK, in the remote state backend (PostgreSQL)

As a result, other developers only need to run the initialization command and ensure they can access the PostgreSQL database to view and manage all cloud resource information.
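The stdout handoff between the script and the external data source can be tried in isolation. The sketch below swaps get_key.py for a one-line shell command that prints a fixed JSON object; the names demo, demo_ak, and demo_sk and the values are made up for illustration, and it assumes sh is available on the machine running Terraform.

```hcl
# Stand-in for get_key.py: prints a flat JSON object of string values.
data "external" "demo" {
  program = ["sh", "-c", "echo '{\"demo_ak\":\"example-ak\",\"demo_sk\":\"example-sk\"}'"]
}

# Each key of the JSON object becomes an entry under .result.
output "demo_ak" {
  value     = data.external.demo.result.demo_ak
  sensitive = true # keeps the value out of plan and apply output
}
```

The external data source expects the program to print a single JSON object whose values are all strings, which is why the real script is called with --format json.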

Common Commands

Terraform relies on information in the .terraform directory, so all commands must be executed from the directory containing .terraform:

  • List resources in the state backend: terraform state list
  • Show details of a specific resource in the state backend: terraform state show data.external.vault_alicloud
  • Import existing resources into the state backend: terraform import ADDRESS ID
  • Apply cloud resource changes (e.g., create or update DNS records): terraform apply
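To make the last two commands concrete, here is a sketch of importing an existing DNS zone and then managing a record in it. It assumes the domain is hosted in Route 53; the zone ID, the record name, and the IP address are placeholders. Instead of the terraform import command it uses an import block, which is a different mechanism available since Terraform 1.5 that has the same effect on the next apply.

```hcl
# Hosted zone that already exists in AWS.
resource "aws_route53_zone" "main" {
  name = "jinmiaoluo.com"
}

# Declarative equivalent of:
#   terraform import aws_route53_zone.main Z0000000000000000000
import {
  to = aws_route53_zone.main
  id = "Z0000000000000000000" # placeholder hosted zone ID
}

# A record that `terraform apply` would create or update.
resource "aws_route53_record" "blog" {
  zone_id = aws_route53_zone.main.zone_id
  name    = "blog.jinmiaoluo.com" # hypothetical record name
  type    = "A"
  ttl     = 300
  records = ["203.0.113.10"] # documentation-range address, placeholder
}
```

Running terraform plan before apply shows the pending import and the record change without touching anything.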