Terraform Complete Master Guide¶
Terraform is an Infrastructure as Code (IaC) tool that lets you define and provision infrastructure using configuration files.
Installation Guide¶
Install Terraform on Linux (Ubuntu/Debian)¶
# Install required packages
sudo apt update && sudo apt install -y wget unzip
# Download Terraform (check latest version at https://www.terraform.io/downloads)
wget https://releases.hashicorp.com/terraform/1.6.6/terraform_1.6.6_linux_amd64.zip
# Extract and install
unzip terraform_1.6.6_linux_amd64.zip
sudo mv terraform /usr/local/bin/
# Verify installation
terraform version
# Expected output: Terraform v1.6.6
Install Terraform on macOS¶
# Using Homebrew (recommended)
brew tap hashicorp/tap
brew install hashicorp/tap/terraform
# Or manually with curl
curl -o terraform.zip https://releases.hashicorp.com/terraform/1.6.6/terraform_1.6.6_darwin_amd64.zip
unzip terraform.zip
sudo mv terraform /usr/local/bin/
# Verify installation
terraform version
Install Terraform on Windows¶
# Using Chocolatey
choco install terraform
# Or manually download from https://www.terraform.io/downloads
# Extract terraform.exe to a dedicated folder (e.g. C:\terraform)
# and add that folder to your PATH environment variable
# (avoid C:\Windows\System32 — it is reserved for system binaries)
# Verify in PowerShell
terraform version
Verify Installation & Enable Auto-completion¶
# Check version
terraform version
# Enable bash completion
terraform -install-autocomplete
# Initialize a working directory (run inside a directory containing
# .tf files; downloads providers into a .terraform folder)
terraform init
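Once the CLI is installed, teams often pin the range of CLI versions allowed to run a configuration, so everyone fails fast on an incompatible release. A minimal sketch (the version bounds are illustrative — adjust to what your team has tested):

```hcl
# versions.tf - constrain which Terraform CLI versions may run this config
terraform {
  # Illustrative bounds: any 1.6+ release, but not a future major version
  required_version = ">= 1.6.0, < 2.0.0"
}
```

With this in place, `terraform init` (and every later command) aborts with a clear error when the running CLI falls outside the constraint.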
BEGINNER LEVEL: Your First Infrastructure¶
Scenario 1: Creating Your First Terraform File¶
Understanding the basic workflow: init → plan → apply
sequenceDiagram
participant User as Developer
participant Terminal as Command Line
participant TF as Terraform CLI
participant Config as main.tf File
participant State as Terraform State
participant File as Local File System
User->>Config: Create main.tf
Config->>Config: Define local_file resource
User->>Terminal: terraform init
Terminal->>TF: Execute init command
TF->>TF: Download local provider
TF->>State: Create .terraform directory
TF->>Terminal: Show "Terraform initialized"
User->>Terminal: terraform plan
Terminal->>TF: Execute plan command
TF->>Config: Read resource definition
TF->>State: Check current state
TF->>Terminal: Show "Will create 1 resource"
User->>Terminal: terraform apply
Terminal->>TF: Execute apply command
TF->>User: Prompt for approval
User->>TF: Type "yes"
TF->>File: Create hello.txt
TF->>State: Save resource state
TF->>Terminal: Show "Apply complete!"
Note over State: State tracks what Terraform created
Code:
# Create a file named main.tf
# This is your Terraform configuration file
# Pin the provider via required_providers; the "version" argument
# inside a provider block is deprecated since Terraform 0.13
terraform {
required_providers {
local = {
source = "hashicorp/local"
version = "~> 2.4"
}
}
}
# Resource defines infrastructure to create
resource "local_file" "welcome" {
# Content to write to file
content = "Hello, this is my first Terraform resource!"
# File path (creates in current directory)
filename = "${path.module}/hello.txt"
}
# Output shows information after creation
output "file_location" {
value = local_file.welcome.filename
}
output "file_size" {
value = filesize(local_file.welcome.filename)
}
Step-by-step execution:
# 1. Create a project directory
mkdir terraform-beginner
cd terraform-beginner
# 2. Create the main.tf file (paste the code above)
nano main.tf
# 3. Initialize Terraform (downloads provider plugins)
terraform init
# Expected output:
# Initializing the backend...
# Initializing provider plugins...
# - Finding hashicorp/local versions matching "~> 2.4"...
# - Installing hashicorp/local v2.4.0...
# Terraform has been successfully initialized!
# 4. See what Terraform will do (dry run)
terraform plan
# Expected output:
# Plan: 1 to add, 0 to change, 0 to destroy.
# 5. Create the resource (type "yes" when prompted)
terraform apply
# Expected output:
# Do you want to perform these actions?
# Terraform will perform the actions described above.
# Only 'yes' will be accepted to approve.
#
# Enter a value: yes
#
# local_file.welcome: Creating...
# local_file.welcome: Creation complete after 0s
# Apply complete! Resources: 1 added, 0 changed, 0 destroyed.
#
# Outputs:
# file_location = "/home/user/terraform-beginner/hello.txt"
# 6. Verify the file was created
ls -la hello.txt
cat hello.txt
# Should show: "Hello, this is my first Terraform resource!"
# 7. See current state
terraform show
# Shows the state file contents
# 8. Destroy the resource (cleanup)
terraform destroy
# Expected output:
# Do you really want to destroy all resources? yes
# local_file.welcome: Destroying... [id=6c63204f5a9cfd5f0d8ec0d6bd5d6c82bc173290]
# local_file.welcome: Destruction complete after 0s
# Destroy complete! Resources: 1 destroyed.
Scenario 2: Understanding Variables & Outputs¶
Making configurations dynamic and reusable
sequenceDiagram
participant User as Developer
participant Var as variables.tf
participant Main as main.tf
participant TF as Terraform
participant State as State File
User->>Var: Define variable inputs
User->>Main: Reference variables
Main->>TF: Process variable values
User->>TF: terraform apply -var="name=prod"
TF->>State: Store variable values
TF->>User: Display output values
Note over Var: Variables make code reusable
Code:
# variables.tf - Define inputs
variable "filename" {
description = "Name of the file to create"
type = string
default = "message.txt"
}
variable "content" {
description = "Content to write in file"
type = string
default = "Default message"
}
variable "file_permissions" {
description = "Unix file permissions"
type = string
default = "0644"
}
# main.tf - Use variables
resource "local_file" "message" {
content = var.content
filename = "${path.module}/${var.filename}"
file_permission = var.file_permissions
}
# outputs.tf - Show results
output "file_details" {
description = "Details about created file"
value = {
name = var.filename
size_bytes = length(var.content) # character count; Terraform has no filesize() function
permissions = var.file_permissions
}
}
output "created_at" {
value = timestamp()
}
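Variables can also enforce constraints at plan time with `validation` blocks, which turn bad inputs into clear errors instead of broken resources. A sketch extending the `filename` variable above (the rule itself is a hypothetical example):

```hcl
variable "filename" {
  description = "Name of the file to create"
  type        = string
  default     = "message.txt"

  # Reject path separators so the file stays inside the module directory
  validation {
    condition     = !can(regex("/", var.filename))
    error_message = "filename must not contain '/'."
  }
}
```

Running `terraform plan -var='filename=../escape.txt'` would now fail immediately with the error message above.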
Execution with variables:
# Method 1: Use defaults
terraform apply -auto-approve
# Creates message.txt with "Default message"
# Method 2: Override with -var flags
terraform apply -var="filename=custom.txt" -var="content=Custom content!" -auto-approve
# Method 3: Use variables file
echo 'filename = "vars.txt"' > prod.tfvars
echo 'content = "Production values"' >> prod.tfvars
terraform apply -var-file="prod.tfvars" -auto-approve
# Method 4: Environment variables (TF_VAR_name)
export TF_VAR_filename="env.txt"
export TF_VAR_content="From environment"
terraform apply -auto-approve
# View outputs
terraform output
# Shows:
# file_details = {
# "name" = "env.txt"
# "permissions" = "0644"
# "size_bytes" = 19
# }
# created_at = "2024-11-30T10:00:00Z"
Scenario 3: Working with Lists and Maps¶
Managing multiple resources efficiently
sequenceDiagram
participant User as Developer
participant Var as Variables (list/map)
participant TF as Terraform Core
participant Resources as Multiple Resources
participant State as State Management
User->>Var: Define list of filenames
User->>Var: Define map of file contents
TF->>Resources: Create resources in loop
Resources->>State: Track each resource
TF->>User: Show count of resources created
Note over Resources: for_each creates many from one definition
Code:
# Create multiple files dynamically
variable "files_map" {
description = "Map of filenames to their contents"
type = map(string)
default = {
"readme.txt" = "# Project README\nThis is auto-generated"
"config.yaml" = "app:\n environment: production\n version: 1.0"
"license.txt" = "MIT License\nCopyright 2024"
"authors.md" = "Authors:\n- DevOps Team"
}
}
variable "permissions_map" {
description = "Map of file extensions to permissions"
type = map(string)
default = {
"txt" = "0644"
"md" = "0644"
"yaml" = "0600"
"sh" = "0755"
}
}
# Create multiple resources using for_each
resource "local_file" "project_files" {
# Loop through each entry in the map
for_each = var.files_map
# each.key is the filename, each.value is the content
filename = "${path.module}/${each.key}"
content = each.value
# Set permissions based on file extension (last dot-separated segment,
# so names containing several dots also resolve correctly)
file_permission = lookup(
var.permissions_map,
element(split(".", each.key), length(split(".", each.key)) - 1),
"0644"
)
}
# Create a single script file
resource "local_file" "setup_script" {
filename = "${path.module}/setup.sh"
content = <<EOF
#!/bin/bash
# Auto-generated setup script
echo "Creating project structure..."
mkdir -p logs configs
touch logs/app.log
echo "Setup complete!"
EOF
file_permission = "0755"
}
# Count how many files we created
output "total_files" {
value = length(local_file.project_files)
}
# Show all created file paths
output "all_files" {
value = [for f in local_file.project_files : f.filename]
}
# Example of conditional resource
resource "local_file" "optional_file" {
count = var.create_debug_file ? 1 : 0
filename = "${path.module}/debug.log"
content = "Debug mode enabled at ${timestamp()}"
}
variable "create_debug_file" {
type = bool
default = false
}
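When only the names vary and no per-file content is needed, `for_each` also accepts a set of strings. A sketch creating empty marker files from a list (the filenames are illustrative):

```hcl
variable "marker_files" {
  type    = list(string)
  default = ["placeholder-a.txt", "placeholder-b.txt"]
}

resource "local_file" "markers" {
  # for_each requires a set or map; convert the list with toset()
  for_each = toset(var.marker_files)

  # For sets, each.key and each.value are the same string
  filename = "${path.module}/${each.value}"
  content  = ""
}
```

State addresses then use the string itself as the key, e.g. `local_file.markers["placeholder-a.txt"]`, which survives list reordering (unlike `count` indexes).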
Execution:
# Create all files
terraform apply -auto-approve
# Check created files
ls -la *.txt *.md *.yaml *.sh
# Should show:
# -rw-r--r-- readme.txt
# -rw------- config.yaml
# -rwxr-xr-x setup.sh
# etc.
# Try with debug file
terraform apply -var="create_debug_file=true" -auto-approve
ls debug.log
# Inspect state
terraform state list
# Shows:
# local_file.optional_file[0]
# local_file.project_files["authors.md"]
# local_file.project_files["config.yaml"]
# etc.
Scenario 4: Understanding State Management¶
How Terraform tracks your infrastructure
sequenceDiagram
participant Dev as Developer
participant TF as Terraform
participant State as terraform.tfstate
participant Lock as State Lock (.terraform.tfstate.lock.info)
participant Backup as State Backup
Dev->>TF: terraform apply
TF->>Lock: Create lock file
Lock->>State: Prevent concurrent writes
TF->>State: Read current state
TF->>State: Compare with desired state
TF->>State: Update with new resources
State->>Backup: Create backup file
Lock->>Lock: Remove lock file
Note over State: State file is single source of truth
Code:
# main.tf
resource "local_file" "state_demo" {
content = "State management example"
filename = "${path.module}/demo.txt"
}
# Backend configuration for remote state (AWS S3)
terraform {
# This block configures where state is stored
backend "s3" {
bucket = "my-terraform-state-bucket"
key = "beginner/state-demo.tfstate"
region = "us-east-1"
encrypt = true
dynamodb_table = "terraform-state-lock"
}
}
# Data source: Read existing file
data "local_file" "existing" {
filename = "${path.module}/demo.txt"
# This doesn't create resource, just reads it
depends_on = [local_file.state_demo]
}
# Output from data source
output "file_content" {
value = data.local_file.existing.content
sensitive = false
}
# Manage state with commands
# terraform state show local_file.state_demo
# terraform state list
# terraform state mv local_file.old local_file.new
# terraform state rm local_file.unwanted
State management commands:
# Initialize with backend
terraform init
# Show current state
terraform show
# Human-readable dump of all resources (use "terraform show -json" for JSON)
# List all resources in state
terraform state list
# Show details of specific resource
terraform state show local_file.state_demo
# Simulate a problem: Manually delete the file
rm demo.txt
# Terraform detects drift (difference between state and reality)
terraform plan
# Shows: Plan: 1 to add, 0 to change, 0 to destroy.
# It wants to recreate the deleted file
# Sync state with real infrastructure without proposing changes
# (the standalone "terraform refresh" command is deprecated)
terraform apply -refresh-only
# Remove resource from state (but don't destroy it)
terraform state rm local_file.state_demo
# Import an existing resource into state
# (only works for resource types whose provider implements import;
# shown here to illustrate the command)
# First, create the file manually
echo "Import me" > demo.txt
# Then import it
terraform import local_file.state_demo demo.txt
# Move resource to new address
terraform state mv local_file.state_demo local_file.renamed_demo
# Backup state manually
cp terraform.tfstate terraform.tfstate.backup
# Restore from backup if corrupted
cp terraform.tfstate.backup terraform.tfstate
terraform refresh
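Since Terraform 1.1, the `terraform state mv` rename shown above can also be recorded declaratively in configuration, which keeps the refactoring reviewable in version control. A sketch:

```hcl
# Declarative equivalent of:
#   terraform state mv local_file.state_demo local_file.renamed_demo
# Terraform applies the rename on the next plan/apply instead of via the CLI
moved {
  from = local_file.state_demo
  to   = local_file.renamed_demo
}
```

On the next `terraform plan`, the resource is reported as moved rather than destroyed and recreated; once every state has been migrated, the `moved` block can be deleted.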
INTERMEDIATE LEVEL: Real Cloud Infrastructure¶
Scenario 5: Deploying an AWS EC2 Web Server¶
Complete web server with security group and networking
sequenceDiagram
participant TF as Terraform
participant AWS as AWS API
participant SG as Security Group
participant Key as SSH Key Pair
participant EC2 as EC2 Instance
participant User as Developer
User->>TF: terraform apply
TF->>AWS: Create SSH key pair
AWS->>Key: Generate key material
TF->>AWS: Create security group
AWS->>SG: Allow HTTP (80) and SSH (22)
TF->>AWS: Launch EC2 instance
AWS->>EC2: Use Amazon Linux 2 AMI
EC2->>SG: Attach security group
EC2->>Key: Use SSH key
AWS->>TF: Return instance details
TF->>User: Show public IP & connection info
Note over EC2: Web server is live at http://<public-ip>
Code:
# provider.tf - Configure AWS provider
terraform {
required_providers {
aws = {
source = "hashicorp/aws"
version = "~> 5.0"
}
}
# Backend for state storage
backend "s3" {
bucket = "my-terraform-state"
key = "intermediate/web-server/terraform.tfstate"
region = "us-east-1"
dynamodb_table = "terraform-locks"
encrypt = true
}
}
provider "aws" {
region = var.aws_region
# Use shared credentials file (~/.aws/credentials)
# Or specify access keys (not recommended for production)
# access_key = var.aws_access_key
# secret_key = var.aws_secret_key
}
# variables.tf
variable "aws_region" {
description = "AWS region to deploy resources"
type = string
default = "us-east-1"
}
variable "instance_type" {
description = "EC2 instance type"
type = string
default = "t3.micro" # Free tier eligible
}
variable "key_name" {
description = "Name of SSH key pair"
type = string
default = "web-server-key"
}
# data.tf - Get latest Amazon Linux 2 AMI
data "aws_ami" "amazon_linux" {
most_recent = true
owners = ["amazon"]
filter {
name = "name"
values = ["amzn2-ami-hvm-*-x86_64-gp2"]
}
filter {
name = "virtualization-type"
values = ["hvm"]
}
}
# security.tf - Create security group
resource "aws_security_group" "web_server" {
name_prefix = "web-server-"
description = "Security group for web server"
# Allow HTTP from anywhere
ingress {
description = "HTTP from internet"
from_port = 80
to_port = 80
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
# Allow HTTPS from anywhere
ingress {
description = "HTTPS from internet"
from_port = 443
to_port = 443
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
# Allow SSH from your IP (restrict this!)
ingress {
description = "SSH from office"
from_port = 22
to_port = 22
protocol = "tcp"
cidr_blocks = ["203.0.113.0/24"] # Replace with your IP
}
# Allow all outbound traffic
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
tags = {
Name = "web-server-sg"
}
}
# key-pair.tf - Generate SSH key
resource "aws_key_pair" "web_server" {
key_name = var.key_name
public_key = tls_private_key.web_server.public_key_openssh
}
resource "tls_private_key" "web_server" {
algorithm = "RSA"
rsa_bits = 4096
}
# instance.tf - Create EC2 instance
resource "aws_instance" "web_server" {
ami = data.aws_ami.amazon_linux.id
instance_type = var.instance_type
key_name = aws_key_pair.web_server.key_name
vpc_security_group_ids = [aws_security_group.web_server.id]
# User data to install web server on first boot
user_data = <<-EOF
#!/bin/bash
yum update -y
amazon-linux-extras install -y nginx1
systemctl start nginx
systemctl enable nginx
echo "<h1>Hello from Terraform!</h1>" > /usr/share/nginx/html/index.html
EOF
# Termination protection (set to true to prevent accidental termination)
disable_api_termination = false
# Add tags for organization
tags = {
Name = "terraform-web-server"
Environment = "development"
ManagedBy = "terraform"
}
}
# outputs.tf - Display important information
output "instance_id" {
description = "ID of the EC2 instance"
value = aws_instance.web_server.id
}
output "public_ip" {
description = "Public IP address of the instance"
value = aws_instance.web_server.public_ip
}
output "public_dns" {
description = "Public DNS of the instance"
value = aws_instance.web_server.public_dns
}
output "ssh_command" {
description = "Command to SSH into the instance"
value = "ssh -i ${aws_key_pair.web_server.key_name}.pem ec2-user@${aws_instance.web_server.public_ip}"
}
# Save private key locally; use local_sensitive_file so the value is
# redacted in plan output (plain local_file has no "sensitive" argument)
resource "local_sensitive_file" "private_key" {
content = tls_private_key.web_server.private_key_pem
filename = "${path.module}/${aws_key_pair.web_server.key_name}.pem"
file_permission = "0600" # Only owner can read/write
}
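Terraform-side safeguards complement `disable_api_termination`. A sketch of a `lifecycle` block that could be added inside the `aws_instance.web_server` resource above:

```hcl
# Goes inside: resource "aws_instance" "web_server" { ... }
lifecycle {
  # Build the replacement instance before destroying the old one,
  # reducing downtime on changes that force replacement
  create_before_destroy = true

  # Ignore out-of-band AMI drift instead of forcing a rebuild
  ignore_changes = [ami]

  # Set to true in protected environments: any plan that would
  # destroy this resource then fails with an error
  prevent_destroy = false
}
```

Note that `prevent_destroy` is evaluated by Terraform itself, so it also blocks `terraform destroy`, whereas `disable_api_termination` only guards against termination through the AWS API.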
Deployment workflow:
# 1. Configure AWS credentials
aws configure
# Enter AWS Access Key ID
# Enter AWS Secret Access Key
# Enter region: us-east-1
# Enter output format: json
# 2. Create S3 bucket for state storage (do this once)
aws s3 mb s3://my-terraform-state --region us-east-1
aws s3api put-bucket-encryption \
--bucket my-terraform-state \
--server-side-encryption-configuration '{"Rules":[{"ApplyServerSideEncryptionByDefault":{"SSEAlgorithm":"AES256"}}]}'
# Create DynamoDB table for locking
aws dynamodb create-table \
--table-name terraform-locks \
--attribute-definitions AttributeName=LockID,AttributeType=S \
--key-schema AttributeName=LockID,KeyType=HASH \
--billing-mode PAY_PER_REQUEST \
--region us-east-1
# 3. Initialize Terraform
terraform init
# 4. Plan the deployment
terraform plan
# 5. Apply (creates real AWS resources costing money)
terraform apply
# 6. Access the web server
# Outputs will show:
# public_ip = 203.0.113.45
# ssh_command = ssh -i web-server-key.pem ec2-user@203.0.113.45
# Open in browser: http://203.0.113.45
# 7. SSH into instance
chmod 600 web-server-key.pem
ssh -i web-server-key.pem ec2-user@203.0.113.45
# Verify Nginx is running: systemctl status nginx
# 8. Check AWS Console to see created resources
# - EC2 instance running
# - Security group with rules
# - Key pair registered
# 9. Destroy everything when done
terraform destroy
# Confirms deletion of all resources
Scenario 6: Creating and Using Terraform Modules¶
Organizing code into reusable components
sequenceDiagram
participant Root as Root Module
participant Module as Module (./modules/web-app)
participant Resources as Module Resources
participant State as Terraform State
participant Registry as Terraform Registry
Root->>Module: Call with variables
Module->>Resources: Create SG, EC2, EBS
Resources->>State: Store all resources
Module->>Root: Return outputs (IP, DNS)
Root->>Registry: Can publish module
Registry->>Other: Reuse in other projects
Note over Module: Self-contained, reusable infrastructure
Code - Module Structure:
project/
├── main.tf
├── variables.tf
├── outputs.tf
└── modules/
└── web-app/
├── main.tf
├── variables.tf
├── outputs.tf
└── README.md
modules/web-app/main.tf
# This is a reusable module for deploying a web application
resource "aws_security_group" "app" {
name = "${var.app_name}-sg"
description = "Security group for ${var.app_name}"
vpc_id = var.vpc_id
# Dynamic ingress rules based on input
dynamic "ingress" {
for_each = var.allowed_ports
content {
from_port = ingress.value
to_port = ingress.value
protocol = "tcp"
cidr_blocks = var.allowed_cidrs
}
}
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
tags = {
Name = "${var.app_name}-sg"
}
}
resource "aws_instance" "app" {
count = var.instance_count
ami = var.ami_id
instance_type = var.instance_type
key_name = var.key_name
vpc_security_group_ids = [aws_security_group.app.id]
subnet_id = var.subnet_ids[count.index % length(var.subnet_ids)]
root_block_device {
volume_type = "gp3"
volume_size = var.root_volume_size
encrypted = true
}
ebs_block_device {
device_name = "/dev/sdf"
volume_type = "gp3"
volume_size = var.data_volume_size
encrypted = true
}
user_data = base64encode(templatefile("${path.module}/user_data.sh", {
app_name = var.app_name
environment = var.environment
app_version = var.app_version
}))
tags = merge(
var.tags,
{
Name = "${var.app_name}-${count.index + 1}"
}
)
}
resource "aws_eip" "app" {
count = var.assign_eip ? var.instance_count : 0
instance = aws_instance.app[count.index].id
domain = "vpc"
tags = {
Name = "${var.app_name}-eip-${count.index + 1}"
}
}
data "template_file" "user_data" {
template = file("${path.module}/user_data.sh")
}
modules/web-app/variables.tf
variable "app_name" {
description = "Name of the application"
type = string
}
variable "environment" {
description = "Environment name"
type = string
default = "dev"
}
variable "vpc_id" {
description = "VPC ID"
type = string
}
variable "subnet_ids" {
description = "List of subnet IDs"
type = list(string)
}
variable "ami_id" {
description = "AMI ID for instances"
type = string
}
variable "instance_type" {
description = "EC2 instance type"
type = string
default = "t3.micro"
}
variable "instance_count" {
description = "Number of instances"
type = number
default = 2
}
variable "key_name" {
description = "SSH key pair name"
type = string
}
variable "allowed_ports" {
description = "List of allowed ports"
type = list(number)
default = [22, 80, 443]
}
variable "allowed_cidrs" {
description = "List of allowed CIDR blocks"
type = list(string)
default = ["0.0.0.0/0"]
}
variable "root_volume_size" {
description = "Root volume size in GB"
type = number
default = 20
}
variable "data_volume_size" {
description = "Data volume size in GB"
type = number
default = 50
}
variable "app_version" {
description = "Application version"
type = string
default = "latest"
}
variable "assign_eip" {
description = "Assign Elastic IPs"
type = bool
default = false
}
variable "tags" {
description = "Additional tags"
type = map(string)
default = {}
}
modules/web-app/outputs.tf
output "instance_ids" {
description = "IDs of EC2 instances"
value = aws_instance.app[*].id
}
output "public_ips" {
description = "Public IPs of instances"
value = aws_eip.app[*].public_ip
}
output "private_ips" {
description = "Private IPs of instances"
value = aws_instance.app[*].private_ip
}
output "security_group_id" {
description = "Security group ID"
value = aws_security_group.app.id
}
modules/web-app/user_data.sh
#!/bin/bash -xe
# User data script passed to EC2 instances
# Update system
yum update -y
# Install Docker
amazon-linux-extras install -y docker
systemctl start docker
systemctl enable docker
# Install app
mkdir -p /opt/${app_name}
cd /opt/${app_name}
# Pull application
docker pull myorg/${app_name}:${app_version}
# Run container
docker run -d \
--name ${app_name} \
-p 80:8080 \
-e ENVIRONMENT=${environment} \
-e VERSION=${app_version} \
myorg/${app_name}:${app_version}
# Setup CloudWatch logging
yum install -y awslogs
systemctl start awslogsd
systemctl enable awslogsd
Root main.tf (using the module)
# main.tf - Root module that calls our reusable web-app module
terraform {
required_providers {
aws = {
source = "hashicorp/aws"
version = "~> 5.0"
}
}
backend "s3" {
bucket = "my-terraform-state"
key = "intermediate/web-app-module/terraform.tfstate"
region = "us-east-1"
dynamodb_table = "terraform-locks"
encrypt = true
}
}
provider "aws" {
region = var.region
}
# Get VPC and subnets data
data "aws_vpc" "default" {
default = true
}
data "aws_subnets" "default" {
filter {
name = "vpc-id"
values = [data.aws_vpc.default.id]
}
}
# Generate SSH key for this deployment
resource "tls_private_key" "app" {
algorithm = "RSA"
rsa_bits = 4096
}
resource "aws_key_pair" "app" {
key_name = "${var.project}-${var.environment}-key"
public_key = tls_private_key.app.public_key_openssh
}
resource "local_file" "private_key" {
content = tls_private_key.app.private_key_pem
filename = "${path.module}/${aws_key_pair.app.key_name}.pem"
file_permission = "0600"
sensitive = true
}
# Deploy web app module for production
module "production_web_app" {
source = "./modules/web-app"
app_name = "my-web-app"
environment = "production"
vpc_id = data.aws_vpc.default.id
subnet_ids = data.aws_subnets.default.ids
ami_id = data.aws_ami.amazon_linux.id
instance_type = "t3.medium"
instance_count = 3
key_name = aws_key_pair.app.key_name
allowed_ports = [22, 80, 443, 8080]
allowed_cidrs = ["203.0.113.0/24"] # Your office IP
root_volume_size = 30
data_volume_size = 100
app_version = "2.1.0"
assign_eip = true
tags = {
Project = var.project
CostCenter = var.cost_center
ManagedBy = "terraform"
}
}
# Deploy dev version (smaller, cheaper)
module "development_web_app" {
source = "./modules/web-app"
app_name = "my-web-app"
environment = "development"
vpc_id = data.aws_vpc.default.id
subnet_ids = data.aws_subnets.default.ids
ami_id = data.aws_ami.amazon_linux.id
instance_count = 1
key_name = aws_key_pair.app.key_name
allowed_cidrs = ["0.0.0.0/0"] # Open for dev
app_version = var.app_version
assign_eip = false
tags = {
Environment = "dev"
CostCenter = var.cost_center
}
}
# Data source for AMI
data "aws_ami" "amazon_linux" {
most_recent = true
owners = ["amazon"]
filter {
name = "name"
values = ["amzn2-ami-hvm-*-x86_64-gp2"]
}
}
Root variables.tf
variable "region" {
description = "AWS region"
type = string
default = "us-east-1"
}
variable "project" {
description = "Project name"
type = string
default = "web-app-project"
}
variable "environment" {
description = "Environment name"
type = string
default = "production"
}
variable "cost_center" {
description = "Cost center for tracking"
type = string
default = "engineering"
}
variable "app_version" {
description = "Application version"
type = string
default = "latest"
}
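The `terraform output` commands in this scenario's execution steps only work if the root module re-exports the module outputs. A sketch of the matching output blocks (names follow the commands used below):

```hcl
# Root outputs.tf - re-export module outputs so `terraform output` can read them
output "production_web_app_public_ips" {
  description = "Public IPs of the production instances"
  value       = module.production_web_app.public_ips
}

output "development_web_app_private_ips" {
  description = "Private IPs of the development instances"
  value       = module.development_web_app.private_ips
}
```

Module outputs are otherwise visible only to the calling module; without these blocks, `terraform output` at the root reports nothing for them.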
Execution:
# 1. Initialize (downloads module dependencies)
terraform init
# 2. Plan both environments
terraform plan
# 3. Apply (creates 4 instances: 3 prod, 1 dev)
terraform apply
# 4. Access outputs (requires output blocks in the root module that
# re-export the module outputs, e.g. value = module.production_web_app.public_ips)
terraform output production_web_app_public_ips
# ["203.0.113.45", "203.0.113.46", "203.0.113.47"]
terraform output development_web_app_private_ips
# 5. Test production load balancer
# Install a load balancer in front:
# - Use AWS ALB target group with these instances
# - Or use module output to configure external LB
# 6. Update module (change instance type)
# Edit module call, then:
terraform plan -target="module.production_web_app"
# 7. Destroy dev environment only
terraform destroy -target="module.development_web_app"
# 8. Publish module to registry
# Tag version:
cd modules/web-app
git tag v1.0.0
git push origin v1.0.0
# Use from registry:
module "web_app" {
source = "app.terraform.io/myorg/web-app/aws"
version = "1.0.0"
# ... configuration
}
Scenario 7: Remote State & Data Sources¶
Sharing state between Terraform configurations
sequenceDiagram
participant Network as Network Team
participant App as App Team
participant State as Remote State (S3)
participant Data as Data Sources
participant AWS as AWS Resources
Network->>AWS: Create VPC, subnets
Network->>State: Store network state
App->>Data: Read network state
Data->>State: Fetch VPC/subnet IDs
App->>AWS: Deploy app in existing network
Note over State: Single source of truth across teams
Code - Network Team (creates shared infrastructure):
# network-team/main.tf
terraform {
backend "s3" {
bucket = "shared-terraform-state"
key = "network/terraform.tfstate"
region = "us-east-1"
dynamodb_table = "terraform-locks"
}
}
resource "aws_vpc" "shared" {
cidr_block = "10.0.0.0/16"
enable_dns_hostnames = true
enable_dns_support = true
tags = {
Name = "shared-vpc"
ManagedBy = "network-team"
}
}
# Availability zones referenced by the subnets below
data "aws_availability_zones" "available" {
state = "available"
}
resource "aws_subnet" "public" {
count = 3
vpc_id = aws_vpc.shared.id
cidr_block = "10.0.${count.index + 1}.0/24"
availability_zone = data.aws_availability_zones.available.names[count.index]
tags = {
Name = "public-subnet-${count.index + 1}"
Tier = "public"
}
}
resource "aws_subnet" "private" {
count = 3
vpc_id = aws_vpc.shared.id
cidr_block = "10.0.${count.index + 11}.0/24"
availability_zone = data.aws_availability_zones.available.names[count.index]
tags = {
Name = "private-subnet-${count.index + 1}"
Tier = "private"
}
}
resource "aws_internet_gateway" "shared" {
vpc_id = aws_vpc.shared.id
tags = {
Name = "shared-igw"
}
}
resource "aws_route_table" "public" {
vpc_id = aws_vpc.shared.id
route {
cidr_block = "0.0.0.0/0"
gateway_id = aws_internet_gateway.shared.id
}
tags = {
Name = "public-route-table"
}
}
resource "aws_route_table_association" "public" {
count = 3
subnet_id = aws_subnet.public[count.index].id
route_table_id = aws_route_table.public.id
}
# Outputs that other teams will consume
output "vpc_id" {
description = "Shared VPC ID"
value = aws_vpc.shared.id
}
output "public_subnet_ids" {
description = "List of public subnet IDs"
value = aws_subnet.public[*].id
}
output "private_subnet_ids" {
description = "List of private subnet IDs"
value = aws_subnet.private[*].id
}
output "vpc_cidr" {
description = "VPC CIDR block"
value = aws_vpc.shared.cidr_block
}
Code - Application Team (consumes shared network):
# app-team/main.tf
terraform {
backend "s3" {
bucket = "shared-terraform-state"
key = "apps/webapp/terraform.tfstate"
region = "us-east-1"
dynamodb_table = "terraform-locks"
}
}
# Data source to read network team's state
data "terraform_remote_state" "network" {
backend = "s3"
config = {
bucket = "shared-terraform-state"
key = "network/terraform.tfstate"
region = "us-east-1"
}
}
# Use data sources to query AWS
data "aws_security_groups" "default" {
filter {
name = "group-name"
values = ["default"]
}
filter {
name = "vpc-id"
values = [data.terraform_remote_state.network.outputs.vpc_id]
}
}
# Deploy application in existing network
module "web_app" {
source = "./modules/web-app"
app_name = "customer-portal"
environment = "production"
# Use shared network resources
vpc_id = data.terraform_remote_state.network.outputs.vpc_id
subnet_ids = data.terraform_remote_state.network.outputs.public_subnet_ids
# Other configuration...
instance_count = 3
key_name = aws_key_pair.app.key_name
}
# Create security group referencing shared VPC
resource "aws_security_group" "app" {
name_prefix = "customer-portal-"
vpc_id = data.terraform_remote_state.network.outputs.vpc_id
ingress {
from_port = 443
to_port = 443
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
tags = {
Name = "customer-portal-sg"
}
}
# Create database in private subnets
resource "aws_db_subnet_group" "app" {
name = "customer-portal-db-subnet-group"
subnet_ids = data.terraform_remote_state.network.outputs.private_subnet_ids
tags = {
Name = "Customer Portal DB Subnet Group"
}
}
resource "aws_db_instance" "app" {
identifier = "customer-portal-prod"
engine = "postgres"
engine_version = "15.3"
instance_class = "db.t3.medium"
allocated_storage = 100
db_subnet_group_name = aws_db_subnet_group.app.name
vpc_security_group_ids = [aws_security_group.app.id]
# Database credentials (use secrets in production!)
db_name = "customerportal"
username = "dbadmin"
password = var.db_password
backup_retention_period = 7
backup_window = "03:00-04:00"
tags = {
Name = "customer-portal-db"
Environment = "production"
}
}
# Declare the password variable referenced above; mark it sensitive
# so Terraform redacts it in plan and apply output
variable "db_password" {
description = "Master password for the database"
type = string
sensitive = true
}
Scenario 8: Terraform Workspaces for Environments¶
Managing dev, staging, production with same configuration
sequenceDiagram
participant Dev as Developer
participant TF as Terraform Workspaces
participant StateDev as State: dev
participant StateStage as State: staging
participant StateProd as State: prod
participant AWS as AWS Resources
Dev->>TF: workspace new dev
TF->>StateDev: Create dev.tfstate
Dev->>TF: workspace new staging
TF->>StateStage: Create staging.tfstate
Dev->>TF: workspace new prod
TF->>StateProd: Create prod.tfstate
Dev->>TF: workspace select dev
TF->>StateDev: Switch context
Dev->>AWS: Deploy dev resources
Dev->>TF: workspace select prod
TF->>StateProd: Switch context
Dev->>AWS: Deploy prod resources
Note over TF: Same code, isolated states
Code:
# main.tf
terraform {
backend "s3" {
bucket = "my-terraform-state"
key = "workspaces/app/terraform.tfstate"
region = "us-east-1"
dynamodb_table = "terraform-locks"
# Workspace key prefix enables multiple states
workspace_key_prefix = "workspaces"
}
}
provider "aws" {
region = var.aws_region
}
# Tagging strategy based on workspace
locals {
environment = terraform.workspace
common_tags = {
Environment = local.environment
ManagedBy = "terraform"
Project = var.project_name
}
}
# Choose instance size based on environment
locals {
instance_config = {
dev = {
type = "t3.micro"
count = 1
}
staging = {
type = "t3.small"
count = 2
}
prod = {
type = "t3.medium"
count = 3
}
}
current_config = lookup(local.instance_config, local.environment, local.instance_config["dev"])
}
resource "aws_instance" "web" {
count = local.current_config.count
ami = data.aws_ami.amazon_linux.id
instance_type = local.current_config.type
tags = merge(local.common_tags, {
Name = "${var.project_name}-${local.environment}-${count.index + 1}"
})
}
# Different CIDR blocks per environment
variable "vpc_cidrs" {
type = map(string)
default = {
dev = "10.0.0.0/16"
staging = "10.1.0.0/16"
prod = "10.2.0.0/16"
}
}
resource "aws_vpc" "main" {
cidr_block = lookup(var.vpc_cidrs, local.environment, "10.0.0.0/16")
tags = merge(local.common_tags, {
Name = "${var.project_name}-vpc"
})
}
# Environment-specific cost allocations
variable "cost_centers" {
type = map(string)
default = {
dev = "dev-team"
staging = "qa-team"
prod = "production"
}
}
resource "aws_ec2_tag" "cost_allocation" {
resource_id = aws_vpc.main.id
key = "CostCenter"
value = lookup(var.cost_centers, local.environment, "unknown")
}
# Read this backend's outputs for the matching workspace
data "terraform_remote_state" "network" {
backend = "s3"
config = {
bucket = "my-terraform-state"
key = "workspaces/app/terraform.tfstate"
region = "us-east-1"
# Must match the backend's workspace_key_prefix (defaults to "env:")
workspace_key_prefix = "workspaces"
}
workspace = local.environment
}
Workspace commands:
# List available workspaces
terraform workspace list
# * default
# Create new workspace
terraform workspace new dev
# Created and switched to workspace "dev"!
# Create staging workspace
terraform workspace new staging
# Create production workspace
terraform workspace new prod
# List again
terraform workspace list
# default
# * dev
# prod
# staging
# Switch workspace
terraform workspace select prod
# Switched to workspace "prod".
# Show current workspace
terraform workspace show
# prod
# Plan for dev
terraform workspace select dev
terraform plan -var="project_name=myapp"
# Plan for prod (different resources)
terraform workspace select prod
terraform plan -var="project_name=myapp"
# With workspace_key_prefix = "workspaces" and the key above, state files in S3 will be:
# default: s3://my-terraform-state/workspaces/app/terraform.tfstate
# dev:     s3://my-terraform-state/workspaces/dev/workspaces/app/terraform.tfstate
# prod:    s3://my-terraform-state/workspaces/prod/workspaces/app/terraform.tfstate
# Delete a workspace (must be empty first)
terraform workspace select default
terraform workspace delete dev
# Use in automation
if [ "${CI_ENVIRONMENT_NAME}" == "production" ]; then
terraform workspace select prod
terraform apply -auto-approve
else
terraform workspace select dev
terraform apply -auto-approve
fi
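The snippet above assumes every workspace already exists; `terraform workspace select` fails otherwise. Terraform 1.4+ supports `terraform workspace select -or-create <name>`; on older versions the usual guard is `select || new`. A runnable sketch of that fallback idiom, using hypothetical stand-in functions (`select_ws`/`new_ws`) in place of the real terraform commands so the pattern itself can be demonstrated without a backend:

```shell
#!/bin/sh
# Stand-ins for: terraform workspace select "$1" / terraform workspace new "$1"
select_ws() { grep -qx "$1" workspaces.txt 2>/dev/null; }
new_ws() { echo "$1" >> workspaces.txt; }

echo "default" > workspaces.txt
# First run: select fails, so the workspace is created
select_ws dev || new_ws dev
# Second run: select succeeds and the fallback is skipped
select_ws dev && echo "dev exists"   # prints "dev exists"
```

In CI the same shape becomes `terraform workspace select "$ENV" || terraform workspace new "$ENV"`.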
ADVANCED LEVEL: Production-Ready Patterns¶
Scenario 9: Multi-Tier Architecture with Load Balancer¶
Complete production stack: ALB, ASG, RDS, ElastiCache
sequenceDiagram
participant Client as User
participant ALB as Application LB
participant ASG as Auto Scaling Group
participant EC2 as EC2 Instances
participant RDS as RDS Database
participant Cache as ElastiCache
participant S3 as S3 Bucket
participant TF as Terraform
Client->>ALB: HTTPS request
ALB->>ASG: Distribute traffic
ASG->>EC2: Launch 2-10 instances
EC2->>RDS: Query database
EC2->>Cache: Cache session data
EC2->>S3: Store uploads
TF->>ALB: Configure listener rules
TF->>ASG: Set scaling policies
TF->>RDS: Create Postgres cluster
TF->>Cache: Create Redis cluster
Note over ASG: Health checks & auto-healing
Code:
# main.tf - Complete multi-tier architecture
terraform {
required_version = ">= 1.0"
required_providers {
aws = {
source = "hashicorp/aws"
version = "~> 5.0"
}
}
backend "s3" {
bucket = "prod-terraform-state"
key = "advanced/multi-tier/terraform.tfstate"
region = "us-east-1"
dynamodb_table = "prod-terraform-locks"
encrypt = true
}
}
provider "aws" {
region = var.aws_region
default_tags {
tags = {
Project = var.project_name
Environment = var.environment
ManagedBy = "terraform"
}
}
}
# Variables
variable "project_name" {
type = string
default = "multi-tier-app"
}
variable "environment" {
type = string
default = "production"
}
variable "aws_region" {
type = string
default = "us-east-1"
}
variable "app_version" {
type = string
default = "v2.5.0"
}
# Data sources
data "aws_availability_zones" "available" {
state = "available"
}
data "aws_ami" "amazon_linux" {
most_recent = true
owners = ["amazon"]
filter {
name = "name"
values = ["amzn2-ami-hvm-*-x86_64-gp2"]
}
}
data "aws_elb_service_account" "main" {}
data "aws_caller_identity" "current" {}
# VPC and networking
module "vpc" {
source = "terraform-aws-modules/vpc/aws"
version = "5.0.0"
name = "${var.project_name}-vpc"
cidr = "10.0.0.0/16"
azs = slice(data.aws_availability_zones.available.names, 0, 3)
public_subnets = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
private_subnets = ["10.0.101.0/24", "10.0.102.0/24", "10.0.103.0/24"]
database_subnets = ["10.0.201.0/24", "10.0.202.0/24"]
enable_nat_gateway = true
single_nat_gateway = false
enable_vpn_gateway = true
tags = {
"kubernetes.io/cluster/${var.project_name}-eks" = "shared"
}
public_subnet_tags = {
"kubernetes.io/role/elb" = "1"
}
private_subnet_tags = {
"kubernetes.io/role/internal-elb" = "1"
}
}
# S3 bucket for assets
resource "aws_s3_bucket" "assets" {
bucket = "${var.project_name}-assets-${data.aws_caller_identity.current.account_id}"
}
resource "aws_s3_bucket_versioning" "assets" {
bucket = aws_s3_bucket.assets.id
versioning_configuration {
status = "Enabled"
}
}
resource "aws_s3_bucket_public_access_block" "assets" {
bucket = aws_s3_bucket.assets.id
block_public_acls = true
block_public_policy = true
ignore_public_acls = true
restrict_public_buckets = true
}
# S3 bucket for logs
resource "aws_s3_bucket" "logs" {
bucket = "${var.project_name}-logs-${data.aws_caller_identity.current.account_id}"
}
resource "aws_s3_bucket_policy" "logs" {
bucket = aws_s3_bucket.logs.id
policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Sid = "ELBWriteAccess"
Effect = "Allow"
Principal = {
AWS = data.aws_elb_service_account.main.arn
}
Action = "s3:PutObject"
Resource = "${aws_s3_bucket.logs.arn}/logs/alb/AWSLogs/*"
},
]
})
}
# Application Load Balancer
resource "aws_lb" "main" {
name = "${var.project_name}-alb"
internal = false
load_balancer_type = "application"
security_groups = [aws_security_group.alb.id]
subnets = module.vpc.public_subnets
enable_deletion_protection = false
access_logs {
bucket = aws_s3_bucket.logs.bucket
prefix = "logs/alb"
enabled = true
}
tags = {
Name = "${var.project_name}-alb"
}
}
resource "aws_lb_target_group" "app" {
name = "${var.project_name}-tg"
port = 80
protocol = "HTTP"
vpc_id = module.vpc.vpc_id
target_type = "instance"
health_check {
enabled = true
healthy_threshold = 3
unhealthy_threshold = 3
timeout = 5
interval = 30
path = "/health"
matcher = "200-299"
}
stickiness {
type = "lb_cookie"
cookie_duration = 86400
}
}
resource "aws_lb_listener" "https" {
load_balancer_arn = aws_lb.main.arn
port = "443"
protocol = "HTTPS"
ssl_policy = "ELBSecurityPolicy-2016-08"
certificate_arn = aws_acm_certificate_validation.main.certificate_arn
default_action {
type = "forward"
target_group_arn = aws_lb_target_group.app.arn
}
}
resource "aws_lb_listener" "http_redirect" {
load_balancer_arn = aws_lb.main.arn
port = "80"
protocol = "HTTP"
default_action {
type = "redirect"
redirect {
port = "443"
protocol = "HTTPS"
status_code = "HTTP_301"
}
}
}
# ACM Certificate
resource "aws_acm_certificate" "main" {
domain_name = "*.example.com"
validation_method = "DNS"
lifecycle {
create_before_destroy = true
}
}
resource "aws_route53_record" "cert_validation" {
for_each = {
for dvo in aws_acm_certificate.main.domain_validation_options : dvo.domain_name => {
name = dvo.resource_record_name
record = dvo.resource_record_value
type = dvo.resource_record_type
}
}
allow_overwrite = true
name = each.value.name
records = [each.value.record]
ttl = 60
type = each.value.type
zone_id = data.aws_route53_zone.main.zone_id
}
resource "aws_acm_certificate_validation" "main" {
certificate_arn = aws_acm_certificate.main.arn
validation_record_fqdns = [for record in aws_route53_record.cert_validation : record.fqdn]
}
# Launch Template for Auto Scaling
resource "aws_launch_template" "app" {
name_prefix = "${var.project_name}-"
image_id = data.aws_ami.amazon_linux.id
instance_type = var.instance_type
key_name = aws_key_pair.app.key_name
network_interfaces {
associate_public_ip_address = true
security_groups = [aws_security_group.app.id]
}
iam_instance_profile {
name = aws_iam_instance_profile.app.name
}
block_device_mappings {
device_name = "/dev/xvda"
ebs {
volume_size = 20
volume_type = "gp3"
encrypted = true
delete_on_termination = true
}
}
user_data = base64encode(templatefile("${path.module}/user_data.sh", {
app_version = var.app_version
log_group = aws_cloudwatch_log_group.app.name
region = var.aws_region
db_host = aws_db_instance.app.address
db_user = var.db_username
db_pass = var.db_password
redis_host = aws_elasticache_cluster.app.cache_nodes[0].address
}))
tag_specifications {
resource_type = "instance"
tags = {
Name = var.project_name
}
}
lifecycle {
create_before_destroy = true
}
}
# Auto Scaling Group
resource "aws_autoscaling_group" "app" {
name = "${var.project_name}-asg"
vpc_zone_identifier = module.vpc.public_subnets
target_group_arns = [aws_lb_target_group.app.arn]
health_check_type = "ELB"
min_size = 2
max_size = 10
desired_capacity = 3
launch_template {
id = aws_launch_template.app.id
version = "$Latest"
}
termination_policies = ["OldestLaunchTemplate", "ClosestToNextInstanceHour"]
tag {
key = "Name"
value = var.project_name
propagate_at_launch = true
}
lifecycle {
create_before_destroy = true
}
}
# Auto Scaling Policies
resource "aws_autoscaling_policy" "cpu_scale_up" {
name = "${var.project_name}-cpu-scale-up"
autoscaling_group_name = aws_autoscaling_group.app.name
policy_type = "TargetTrackingScaling"
target_tracking_configuration {
predefined_metric_specification {
predefined_metric_type = "ASGAverageCPUUtilization"
}
target_value = 60.0
}
}
resource "aws_autoscaling_policy" "cpu_scale_down" {
name = "${var.project_name}-cpu-scale-down"
autoscaling_group_name = aws_autoscaling_group.app.name
policy_type = "TargetTrackingScaling"
target_tracking_configuration {
predefined_metric_specification {
predefined_metric_type = "ASGAverageCPUUtilization"
}
target_value = 30.0
}
}
# RDS PostgreSQL Database
resource "aws_db_subnet_group" "app" {
name = "${var.project_name}-db-subnet-group"
subnet_ids = module.vpc.database_subnets
tags = {
Name = "${var.project_name}-db-subnet-group"
}
}
resource "aws_db_parameter_group" "app" {
name = "${var.project_name}-db-params"
family = "postgres15"
parameter {
name = "log_connections"
value = "1"
}
parameter {
name = "log_disconnections"
value = "1"
}
parameter {
name = "log_duration"
value = "1"
}
}
resource "aws_db_instance" "app" {
identifier = "${var.project_name}-db"
engine = "postgres"
engine_version = "15.3"
instance_class = "db.t3.medium"
allocated_storage = 100
storage_type = "gp3"
storage_encrypted = true
db_name = var.db_name
username = var.db_username
password = var.db_password
db_subnet_group_name = aws_db_subnet_group.app.name
vpc_security_group_ids = [aws_security_group.rds.id]
parameter_group_name = aws_db_parameter_group.app.name
backup_retention_period = 7
backup_window = "03:00-04:00"
maintenance_window = "sun:04:00-sun:05:00"
skip_final_snapshot = false
final_snapshot_identifier = "${var.project_name}-db-final-${formatdate("YYYYMMDDhhmmss", timestamp())}"
tags = {
Name = "${var.project_name}-db"
}
}
# ElastiCache Redis
resource "aws_elasticache_subnet_group" "app" {
name = "${var.project_name}-cache-subnet-group"
subnet_ids = module.vpc.private_subnets
}
resource "aws_elasticache_cluster" "app" {
cluster_id = "${var.project_name}-cache"
engine = "redis"
node_type = "cache.t3.medium"
num_cache_nodes = 1
parameter_group_name = "default.redis7"
engine_version = "7.0"
port = 6379
subnet_group_name = aws_elasticache_subnet_group.app.name
security_group_ids = [aws_security_group.cache.id]
tags = {
Name = "${var.project_name}-cache"
}
}
# CloudWatch Log Group
resource "aws_cloudwatch_log_group" "app" {
name = "/${var.project_name}/app"
retention_in_days = 30
tags = {
Name = "${var.project_name}-logs"
}
}
# IAM Role for instances
resource "aws_iam_role" "app" {
name = "${var.project_name}-instance-role"
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Action = "sts:AssumeRole"
Effect = "Allow"
Principal = {
Service = "ec2.amazonaws.com"
}
}
]
})
}
resource "aws_iam_role_policy" "app" {
name = "${var.project_name}-instance-policy"
role = aws_iam_role.app.id
policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Effect = "Allow"
Action = [
"logs:CreateLogStream",
"logs:PutLogEvents",
"logs:CreateLogGroup"
]
Resource = "arn:aws:logs:*:*:*"
},
{
Effect = "Allow"
Action = [
"s3:GetObject",
"s3:PutObject"
]
Resource = [
"${aws_s3_bucket.assets.arn}/*",
"${aws_s3_bucket.logs.arn}/*"
]
}
]
})
}
resource "aws_iam_instance_profile" "app" {
name = "${var.project_name}-instance-profile"
role = aws_iam_role.app.name
}
# Security Groups
resource "aws_security_group" "alb" {
name = "${var.project_name}-alb-sg"
description = "ALB Security Group"
vpc_id = module.vpc.vpc_id
ingress {
from_port = 443
to_port = 443
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
}
resource "aws_security_group" "app" {
name = "${var.project_name}-app-sg"
description = "Application Security Group"
vpc_id = module.vpc.vpc_id
ingress {
from_port = 80
to_port = 80
protocol = "tcp"
security_groups = [aws_security_group.alb.id]
}
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
}
resource "aws_security_group" "rds" {
name = "${var.project_name}-rds-sg"
description = "RDS Security Group"
vpc_id = module.vpc.vpc_id
ingress {
from_port = 5432
to_port = 5432
protocol = "tcp"
security_groups = [aws_security_group.app.id]
}
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
}
resource "aws_security_group" "cache" {
name = "${var.project_name}-cache-sg"
description = "ElastiCache Security Group"
vpc_id = module.vpc.vpc_id
ingress {
from_port = 6379
to_port = 6379
protocol = "tcp"
security_groups = [aws_security_group.app.id]
}
}
# TLS Private Key
resource "tls_private_key" "app" {
algorithm = "RSA"
rsa_bits = 4096
}
resource "aws_key_pair" "app" {
key_name = "${var.project_name}-${var.environment}-key"
public_key = tls_private_key.app.public_key_openssh
}
resource "local_file" "private_key" {
content = tls_private_key.app.private_key_pem
filename = "${path.module}/${aws_key_pair.app.key_name}.pem"
file_permission = "0600"
}
# Route53 Zone (assumes zone exists)
data "aws_route53_zone" "main" {
name = var.domain_name
private_zone = false
}
# CloudWatch Alarms
resource "aws_cloudwatch_metric_alarm" "cpu_high" {
alarm_name = "${var.project_name}-cpu-high"
comparison_operator = "GreaterThanThreshold"
evaluation_periods = "2"
metric_name = "CPUUtilization"
namespace = "AWS/EC2"
period = "300"
statistic = "Average"
threshold = "80"
alarm_description = "CPU utilization is high"
dimensions = {
AutoScalingGroupName = aws_autoscaling_group.app.name
}
alarm_actions = [aws_sns_topic.alerts.arn]
}
resource "aws_sns_topic" "alerts" {
name = "${var.project_name}-alerts"
}
# Variables
variable "domain_name" {
type = string
default = "example.com"
}
variable "instance_type" {
type = string
default = "t3.micro"
}
variable "db_name" {
type = string
default = "myappdb"
}
variable "db_username" {
type = string
sensitive = true
}
variable "db_password" {
type = string
sensitive = true
}
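The execution notes below reference the ALB DNS name and database endpoint, but the configuration declares no outputs for them. A sketch of the outputs worth adding (names are illustrative):

```hcl
output "alb_dns_name" {
  value = aws_lb.main.dns_name
}

output "db_endpoint" {
  value = aws_db_instance.app.address
}

output "redis_endpoint" {
  value = aws_elasticache_cluster.app.cache_nodes[0].address
}
```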
user_data.sh
#!/bin/bash -xe
# Install CloudWatch agent and Docker (script uses docker below)
yum install -y awslogs
amazon-linux-extras install -y docker
systemctl enable --now docker
# Configure CloudWatch
cat > /etc/awslogs/awslogs.conf <<EOF
[general]
state_file = /var/lib/awslogs/agent-state
[/var/log/messages]
file = /var/log/messages
log_group_name = ${log_group}
log_stream_name = {instance_id}/var/log/messages
[/var/log/docker]
file = /var/log/docker
log_group_name = ${log_group}
log_stream_name = {instance_id}/var/log/docker
EOF
systemctl start awslogsd
systemctl enable awslogsd
# Start application
docker run -d \
--name app \
-p 80:8080 \
-e DB_HOST=${db_host} \
-e DB_USER=${db_user} \
-e DB_PASS=${db_pass} \
-e REDIS_HOST=${redis_host} \
-e VERSION=${app_version} \
myorg/app:${app_version}
Execution:
# Initialize with module dependencies
terraform init
# Set database credentials
export TF_VAR_db_username="admin"
export TF_VAR_db_password="VeryStrongPassword123!"
# Plan infrastructure (review costs)
terraform plan
# Apply with approval
terraform apply
# After completion, test:
# 1. Access ALB DNS name: https://multi-tier-app-alb-123456789.us-east-1.elb.amazonaws.com
# 2. Check health status: /health
# 3. SSH to instance (requires adding a port 22 ingress rule to the app SG): ssh -i multi-tier-app-production-key.pem ec2-user@<instance-ip>
# 4. Check logs: docker logs app
# 5. Database connectivity: psql -h <db-endpoint>
# Simulate high CPU to test autoscaling
# stress-ng --cpu 4 --timeout 600
# Monitor in AWS Console:
# - CloudWatch metrics
# - ALB target health
# - RDS performance
# - ElastiCache metrics
# Costs: ~$200/month (t3.medium DB, t3.micro instances, ALB)
# Destroy when done
terraform destroy
Scenario 10: Custom Provider Development¶
Extending Terraform with a custom provider
sequenceDiagram
participant Dev as Provider Developer
participant SDK as Terraform Plugin SDK
participant API as Custom API
participant Schema as Provider Schema
participant Resource as Resource Implementation
participant TF as Terraform CLI
participant User as End User
Dev->>SDK: Implement Provider interface
Dev->>SDK: Define resource schema
SDK->>Schema: Generate CRUD callbacks
User->>TF: terraform init
TF->>Dev: Download custom provider
User->>TF: terraform apply
TF->>Resource: Call Create()
Resource->>API: POST /api/resource
API->>Resource: Return resource data
Resource->>TF: Set resource ID
TF->>User: Resource created successfully
Note over Dev: Go programming required
Code - Custom Provider Skeleton:
// main.go - Entry point for custom provider
package main
import (
"context"
"fmt"
"github.com/hashicorp/terraform-plugin-sdk/v2/diag"
"github.com/hashicorp/terraform-plugin-sdk/v2/helper/schema"
"github.com/hashicorp/terraform-plugin-sdk/v2/plugin"
)
func main() {
plugin.Serve(&plugin.ServeOpts{
ProviderFunc: func() *schema.Provider {
return &schema.Provider{
Schema: map[string]*schema.Schema{
"api_key": {
Type: schema.TypeString,
Required: true,
DefaultFunc: schema.EnvDefaultFunc("CUSTOM_API_KEY", nil),
Description: "API key for the custom service",
},
"base_url": {
Type: schema.TypeString,
Optional: true,
Default: "https://api.custom-service.com",
Description: "Base URL for the API",
},
},
ResourcesMap: map[string]*schema.Resource{
"custom_server": resourceCustomServer(),
"custom_database": resourceCustomDatabase(),
},
DataSourcesMap: map[string]*schema.Resource{
"custom_region": dataSourceCustomRegion(),
},
ConfigureContextFunc: providerConfigure,
}
},
})
}
func providerConfigure(ctx context.Context, d *schema.ResourceData) (interface{}, diag.Diagnostics) {
apiKey := d.Get("api_key").(string)
baseURL := d.Get("base_url").(string)
// Initialize API client
client := NewClient(apiKey, baseURL)
return client, nil
}
// resource_custom_server.go
func resourceCustomServer() *schema.Resource {
return &schema.Resource{
CreateContext: resourceCustomServerCreate,
ReadContext: resourceCustomServerRead,
UpdateContext: resourceCustomServerUpdate,
DeleteContext: resourceCustomServerDelete,
Schema: map[string]*schema.Schema{
"name": {
Type: schema.TypeString,
Required: true,
ForceNew: true,
},
"region": {
Type: schema.TypeString,
Required: true,
},
"size": {
Type: schema.TypeString,
Optional: true,
Default: "small",
},
"status": {
Type: schema.TypeString,
Computed: true,
},
"ip_address": {
Type: schema.TypeString,
Computed: true,
},
"metadata": {
Type: schema.TypeMap,
Optional: true,
Elem: &schema.Schema{Type: schema.TypeString},
},
},
}
}
func resourceCustomServerCreate(ctx context.Context, d *schema.ResourceData, meta interface{}) diag.Diagnostics {
client := meta.(*Client)
name := d.Get("name").(string)
// Create API request
server := &Server{
Name: name,
Region: d.Get("region").(string),
Size: d.Get("size").(string),
Metadata: expandMap(d.Get("metadata")),
}
// Call API
created, err := client.CreateServer(ctx, server)
if err != nil {
return diag.FromErr(fmt.Errorf("error creating server %s: %w", name, err))
}
// Set resource ID
d.SetId(created.ID)
// Wait for server to be ready
if err := waitForServerReady(ctx, client, created.ID); err != nil {
return diag.FromErr(err)
}
return resourceCustomServerRead(ctx, d, meta)
}
func resourceCustomServerRead(ctx context.Context, d *schema.ResourceData, meta interface{}) diag.Diagnostics {
client := meta.(*Client)
// Get server from API
server, err := client.GetServer(ctx, d.Id())
if err != nil {
if isNotFoundError(err) {
d.SetId("")
return nil
}
return diag.FromErr(err)
}
// Update state
d.Set("name", server.Name)
d.Set("region", server.Region)
d.Set("size", server.Size)
d.Set("status", server.Status)
d.Set("ip_address", server.IPAddress)
d.Set("metadata", server.Metadata)
return nil
}
// client.go - API client implementation
package main
import (
"bytes"
"context"
"encoding/json"
"fmt"
"net/http"
)
type Client struct {
apiKey string
baseURL string
httpClient *http.Client
}
type Server struct {
ID string `json:"id"`
Name string `json:"name"`
Region string `json:"region"`
Size string `json:"size"`
Status string `json:"status"`
IPAddress string `json:"ip_address"`
Metadata map[string]string `json:"metadata"`
}
func NewClient(apiKey, baseURL string) *Client {
return &Client{
apiKey: apiKey,
baseURL: baseURL,
httpClient: &http.Client{},
}
}
func (c *Client) CreateServer(ctx context.Context, server *Server) (*Server, error) {
body, err := json.Marshal(server)
if err != nil {
return nil, err
}
req, err := http.NewRequestWithContext(ctx, "POST",
fmt.Sprintf("%s/api/v1/servers", c.baseURL), bytes.NewReader(body))
if err != nil {
return nil, err
}
req.Header.Set("Authorization", fmt.Sprintf("Bearer %s", c.apiKey))
req.Header.Set("Content-Type", "application/json")
resp, err := c.httpClient.Do(req)
if err != nil {
return nil, err
}
defer resp.Body.Close()
if resp.StatusCode != http.StatusCreated {
return nil, fmt.Errorf("unexpected status code: %d", resp.StatusCode)
}
var created Server
if err := json.NewDecoder(resp.Body).Decode(&created); err != nil {
return nil, err
}
return &created, nil
}
// utils.go - Helper functions (this file imports "time" for the polling delay)
func expandMap(v interface{}) map[string]string {
if v == nil {
return nil
}
result := make(map[string]string)
for key, value := range v.(map[string]interface{}) {
result[key] = value.(string)
}
return result
}
func waitForServerReady(ctx context.Context, client *Client, serverID string) error {
for {
server, err := client.GetServer(ctx, serverID)
if err != nil {
return err
}
if server.Status == "running" {
return nil
}
select {
case <-ctx.Done():
return ctx.Err()
case <-time.After(10 * time.Second):
// Continue polling
}
}
}
Build and Install Custom Provider:
# 1. Build the provider
cd terraform-provider-custom
go mod init terraform-provider-custom
go mod tidy
go build -o terraform-provider-custom
# 2. Install in Terraform plugins directory
mkdir -p ~/.terraform.d/plugins/registry.terraform.io/myorg/custom/1.0.0/linux_amd64/
mv terraform-provider-custom ~/.terraform.d/plugins/registry.terraform.io/myorg/custom/1.0.0/linux_amd64/
# 3. Use in Terraform configuration
cat > main.tf <<'EOF'
terraform {
required_providers {
custom = {
source = "myorg/custom"
version = "1.0.0"
}
}
}
provider "custom" {
api_key = var.custom_api_key
}
resource "custom_server" "web" {
name = "web-server-1"
region = "us-east-1"
size = "medium"
metadata = {
owner = "devops-team"
cost-center = "engineering"
}
}
output "server_ip" {
value = custom_server.web.ip_address
}
EOF
# 4. Initialize and use
terraform init
terraform apply
Scenario 11: Terraform Cloud/Enterprise Workflows¶
Collaborative infrastructure with VCS integration
sequenceDiagram
participant Dev as Developer
participant Git as GitHub
participant TFC as Terraform Cloud
participant Agent as TFC Agent
participant AWS as AWS
Dev->>Git: Push to feature branch
Git->>TFC: Webhook triggers run
TFC->>Agent: Queue plan
Agent->>Git: Fetch configuration
Agent->>AWS: terraform plan
AWS->>Agent: Return plan details
Agent->>TFC: Post plan results
TFC->>Dev: Show plan in UI
Dev->>TFC: Approve plan
TFC->>Agent: Queue apply
Agent->>AWS: terraform apply
AWS->>Agent: Provision resources
Agent->>TFC: Report completion
TFC->>Git: Update commit status
Note over TFC: Remote execution, audit logs
Code - Terraform Cloud Configuration:
# terraform.tf - Configure Terraform Cloud backend
terraform {
cloud {
organization = "my-org"
workspaces {
name = "app-prod"
# Or use tags: tags = ["production", "aws"]
}
# Agent pools are assigned in the workspace settings (TFC UI/API), not in this block
}
required_providers {
aws = {
source = "hashicorp/aws"
version = "~> 5.0"
}
}
}
# Configure provider with variables from Terraform Cloud
provider "aws" {
region = var.aws_region
access_key = var.aws_access_key
secret_key = var.aws_secret_key
default_tags {
tags = {
TerraformCloudWorkspace = terraform.workspace
CostCenter = var.cost_center
}
}
}
# Sentinel policies (Terraform Cloud)
# policies/sentinel/enforce-mandatory-labels.sentinel
import "tfplan/v2" as tfplan
mandatory_tags = ["Environment", "CostCenter", "ManagedBy"]
# Managed aws_instance resources missing any mandatory tag
violations = filter tfplan.resource_changes as _, rc {
rc.mode is "managed" and
rc.type is "aws_instance" and
any mandatory_tags as t { t not in keys(rc.change.after.tags else {}) }
}
main = rule { length(violations) is 0 }
# cost-policy.sentinel
# Cost estimates come from the tfrun import in Terraform Cloud
import "tfrun"
import "decimal"
max_budget = decimal.new(1000) # $1000/month
main = rule {
decimal.new(tfrun.cost_estimate.proposed_monthly_cost).less_than_or_equals(max_budget)
}
Workspace Configuration (.terraformrc):
# ~/.terraformrc
credentials "app.terraform.io" {
token = "YOUR_TERRAFORM_CLOUD_TOKEN"
}
# Agent configuration (for private infrastructure)
# tfc-agent is configured via environment variables
export TFC_AGENT_TOKEN="AGENT_TOKEN_FROM_TFC"
export TFC_AGENT_NAME="agent-01"
# Auto-update policy: disabled, patch, or minor
export TFC_AGENT_AUTO_UPDATE="minor"
export TFC_AGENT_LOG_LEVEL="info"
# Each agent runs one job at a time; start more agents for concurrency
GitHub Actions Integration:
# .github/workflows/terraform.yml
name: Terraform Cloud Run
on:
push:
branches:
- main
pull_request:
jobs:
terraform:
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@v3
- name: Setup Terraform
uses: hashicorp/setup-terraform@v2
with:
terraform_version: 1.6.6
cli_config_credentials_token: ${{ secrets.TF_API_TOKEN }}
- name: Terraform Format Check
id: fmt
run: terraform fmt -check -recursive
- name: Terraform Init
id: init
run: terraform init
- name: Terraform Validate
id: validate
run: terraform validate
- name: Create Plan Run
id: plan
if: github.event_name == 'pull_request'
run: |
terraform plan -no-color -out=tfplan \
-var="db_password=${{ secrets.DB_PASSWORD }}"
- name: Apply on Merge
if: github.ref == 'refs/heads/main' && github.event_name == 'push'
run: |
terraform apply -auto-approve \
-var="db_password=${{ secrets.DB_PASSWORD }}"
- name: Comment PR
uses: actions/github-script@v6
if: github.event_name == 'pull_request'
with:
script: |
const output = `#### Terraform Format and Style 🖌\`${{ steps.fmt.outcome }}\`
#### Terraform Initialization ⚙️\`${{ steps.init.outcome }}\`
#### Terraform Validation 🤖\`${{ steps.validate.outcome }}\`
#### Terraform Plan 📖\`${{ steps.plan.outcome }}\`
<details><summary>Show Plan</summary>
\`\`\`terraform\n${{ steps.plan.outputs.stdout }}\n\`\`\`
</details>
*Pushed by: @${{ github.actor }}, Action: \`${{ github.event_name }}\`*`;
github.rest.issues.createComment({
issue_number: context.issue.number,
owner: context.repo.owner,
repo: context.repo.repo,
body: output
})
Terraform Cloud API Usage:
# Trigger run via API
curl \
--header "Authorization: Bearer $TF_API_TOKEN" \
--header "Content-Type: application/vnd.api+json" \
--request POST \
--data @- \
https://app.terraform.io/api/v2/workspaces/ws-123456/runs <<'EOF'
{
"data": {
"type": "runs",
"attributes": {
"message": "Triggered via API",
"auto-apply": false
}
}
}
EOF
# Get run status
curl \
--header "Authorization: Bearer $TF_API_TOKEN" \
https://app.terraform.io/api/v2/runs/run-123456
# Download state
curl \
--header "Authorization: Bearer $TF_API_TOKEN" \
--header "Content-Type: application/vnd.api+json" \
https://app.terraform.io/api/v2/workspaces/ws-123456/current-state-version
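The run responses follow the JSON:API convention, so the run status lives at `data.attributes.status`. A sketch of extracting it, using a canned (abbreviated, illustrative) payload so the parsing step is self-contained; in practice you would pipe the `curl` output instead:

```shell
#!/bin/sh
# Canned example of a TFC run payload (fields abbreviated)
cat > run.json <<'EOF'
{"data":{"id":"run-123456","type":"runs","attributes":{"status":"planned"}}}
EOF
# Extract the status with python3 (jq -r '.data.attributes.status' works equally well)
python3 -c 'import json; print(json.load(open("run.json"))["data"]["attributes"]["status"])'
# prints: planned
```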
Scenario 12: Testing and CI/CD Integration¶
Automated testing of Terraform configurations
sequenceDiagram
participant Git as GitHub
participant CI as CI Pipeline
participant Lint as Terraform Lint
participant Sec as Security Scan
participant Cost as Cost Estimation
participant Plan as Terraform Plan
participant TFC as Terraform Cloud
participant Apply as Terraform Apply
Git->>CI: Pull request opened
CI->>Lint: terraform fmt -check
Lint->>CI: Pass/Fail
CI->>Sec: tfsec + checkov
Sec->>CI: Security report (PASS/WARN/FAIL)
CI->>Cost: infracost breakdown
Cost->>CI: Cost estimate ($150/month)
CI->>Plan: terraform plan -out=tfplan
Plan->>CI: Plan details (24 resources to add)
CI->>Git: Post PR comment with results
Git->>CI: PR approved & merged
CI->>TFC: Trigger remote run
TFC->>Apply: terraform apply tfplan
Apply->>TFC: Apply complete
TFC->>Git: Update commit status (success)
Note over CI, TFC: Automated quality gates
Code: Testing and CI/CD Integration
# Security scanning with tfsec and Checkov
# .github/workflows/security-scan.yml
name: Security Scan
on:
pull_request:
paths:
- '**/*.tf'
jobs:
security:
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@v3
- name: Run tfsec
uses: tfsec/tfsec-sarif-action@master
with:
sarif_file: tfsec.sarif
- name: Upload SARIF file
uses: github/codeql-action/upload-sarif@v2
with:
sarif_file: tfsec.sarif
- name: Run Checkov
uses: bridgecrewio/checkov-action@master
with:
framework: terraform
output_format: sarif
output_file_path: checkov.sarif
- name: Upload Checkov SARIF
uses: github/codeql-action/upload-sarif@v2
with:
sarif_file: checkov.sarif
Terraform test framework (native .tftest.hcl tests shipped in 1.6; mock_provider requires 1.7+):
# tests/web_app.tftest.hcl
mock_provider "aws" {}
variables {
environment = "test"
region = "us-east-1"
}
run "create_instance" {
command = apply
# Input variables for this test
variables {
instance_type = "t3.micro"
instance_count = 1
}
# Assertions to verify behavior
assert {
condition = length(aws_instance.web) == 1
error_message = "Should create exactly 1 instance"
}
assert {
condition = aws_instance.web[0].instance_type == "t3.micro"
error_message = "Instance should be t3.micro"
}
assert {
condition = can(regex(".+-test-.+", aws_instance.web[0].tags.Name))
error_message = "Name tag should contain environment"
}
}
run "enforce_security_group_rules" {
command = apply
variables {
allowed_ports = [22, 443]
}
assert {
condition = length(aws_security_group.web.ingress) == 2
error_message = "Security group should have 2 ingress rules"
}
assert {
condition = alltrue([
for rule in aws_security_group.web.ingress :
contains([22, 443], rule.from_port)
])
error_message = "Only ports 22 and 443 should be allowed"
}
}
# Note: run blocks only support command = plan or apply.
# terraform test automatically destroys everything it created
# when the file finishes, so no explicit destroy run is needed.
Terratest example (Go-based testing):
// test/terraform_web_app_test.go
package test
import (
"fmt"
"strings"
"testing"
"time"
"github.com/gruntwork-io/terratest/modules/aws"
http_helper "github.com/gruntwork-io/terratest/modules/http-helper"
"github.com/gruntwork-io/terratest/modules/terraform"
"github.com/stretchr/testify/assert"
)
func TestTerraformWebApp(t *testing.T) {
t.Parallel()
// Configure Terraform options
terraformOptions := &terraform.Options{
TerraformDir: "../examples/web-app",
Vars: map[string]interface{}{
"environment": "test",
"instance_type": "t3.micro",
"instance_count": 1,
},
EnvVars: map[string]string{
"AWS_DEFAULT_REGION": "us-east-1",
},
}
// Cleanup resources at end of test
defer terraform.Destroy(t, terraformOptions)
// Initialize and apply
terraform.InitAndApply(t, terraformOptions)
// Get outputs
instanceID := terraform.Output(t, terraformOptions, "instance_id")
publicIP := terraform.Output(t, terraformOptions, "public_ip")
// Verify EC2 is running
instance := aws.GetInstance(t, "us-east-1", instanceID)
assert.Equal(t, "running", *instance.State.Name)
// Verify HTTP endpoint returns 200
url := fmt.Sprintf("http://%s", publicIP)
http_helper.HttpGetWithRetryWithCustomValidation(
t,
url,
nil,
30,
10*time.Second,
func(statusCode int, body string) bool {
return statusCode == 200 && strings.Contains(body, "Hello")
},
)
// Verify tags
tags := aws.GetTagsForEc2Instance(t, "us-east-1", instanceID)
assert.Equal(t, "test", tags["Environment"])
}
// Test performance under load
func TestWebAppScaling(t *testing.T) {
terraformOptions := &terraform.Options{
TerraformDir: "../examples/web-app",
Vars: map[string]interface{}{
"environment": "load-test",
"instance_count": 5,
},
}
defer terraform.Destroy(t, terraformOptions)
terraform.InitAndApply(t, terraformOptions)
	// Run load test (runLoadTest is a project-specific helper, not shown here)
	albDNS := terraform.Output(t, terraformOptions, "alb_dns_name")
	runLoadTest(t, albDNS, 1000, 30*time.Second)
	// Verify ASG scales up
	asgName := terraform.Output(t, terraformOptions, "autoscaling_group_name")
	desiredCapacity := aws.GetDesiredCapacityForAsg(t, "us-east-1", asgName)
	assert.GreaterOrEqual(t, desiredCapacity, 5)
}
Scenario 13: Secrets Management with Vault¶
Integrating HashiCorp Vault for dynamic secrets
sequenceDiagram
participant TF as Terraform
participant Vault as HashiCorp Vault
participant AWS as AWS Resources
participant App as Application
participant Audit as Audit Log
TF->>Vault: Request dynamic AWS credentials
Vault->>AWS: Generate temporary IAM role
Vault->>TF: Return short-lived credentials
TF->>Vault: Request database credentials
Vault->>AWS: Create temporary DB user
Vault->>TF: Return rotating password
TF->>AWS: Provision resources using Vault secrets
AWS->>App: Pass secrets to application
App->>Vault: Renew lease periodically
Audit->>Vault: Log all secret access
Note over Vault: Secrets automatically expire & rotate
Code:
# provider.tf
terraform {
required_providers {
aws = {
source = "hashicorp/aws"
version = "~> 5.0"
}
vault = {
source = "hashicorp/vault"
version = "~> 3.20"
}
}
}
# Configure Vault provider
provider "vault" {
address = "https://vault.example.com:8200"
# Either use a token from the environment or a variable...
# token = var.vault_token
# ...or authenticate via AWS IAM (don't configure both):
auth_login {
  path = "auth/aws/login"
  parameters = {
    role = "terraform-deployer"
  }
}
}
# Dynamic AWS credentials from Vault
data "vault_aws_access_credentials" "deploy" {
backend = "aws" # Mount path in Vault
role = "terraform-deploy-role"
# Renew credentials before expiration
renew = true
}
# Use dynamic credentials with AWS provider
provider "aws" {
access_key = data.vault_aws_access_credentials.deploy.access_key
secret_key = data.vault_aws_access_credentials.deploy.secret_key
# STS token if using IAM roles
token = data.vault_aws_access_credentials.deploy.security_token
region = var.aws_region
}
# Database credentials
resource "vault_database_secret_backend_connection" "postgres" {
backend = "database"
name = "postgres-prod"
allowed_roles = ["app-read", "app-write"]
postgresql {
connection_url = "postgres://${var.db_admin_user}:${var.db_admin_pass}@db.example.com:5432/prod"
}
}
resource "vault_database_secret_backend_role" "app_read" {
backend = vault_database_secret_backend_connection.postgres.backend
name = "app-read"
db_name = vault_database_secret_backend_connection.postgres.name
creation_statements = [
"CREATE ROLE \"{{name}}\" WITH LOGIN PASSWORD '{{password}}' VALID UNTIL '{{expiration}}';",
"GRANT SELECT ON ALL TABLES IN SCHEMA public TO \"{{name}}\";"
]
default_ttl = "1h"
max_ttl = "24h"
}
# Generate dynamic credentials in Terraform
data "vault_generic_secret" "db_creds" {
path = "database/creds/app-read"
}
# Use credentials to provision database
resource "postgresql_database" "app" {
name = "customer_app"
owner = data.vault_generic_secret.db_creds.data["username"]
}
# PostgreSQL provider configuration
provider "postgresql" {
host = "db.example.com"
port = 5432
database = "postgres"
username = data.vault_generic_secret.db_creds.data["username"]
password = data.vault_generic_secret.db_creds.data["password"]
# Superuser for schema changes
superuser = false
}
# Kubernetes secrets integration
data "vault_kv_secret_v2" "app_config" {
mount = "kv"
name = "apps/webapp/config"
}
resource "kubernetes_secret" "app" {
metadata {
name = "app-secret"
}
data = {
db_host = data.vault_kv_secret_v2.app_config.data["db_host"]
db_password = data.vault_kv_secret_v2.app_config.data["db_password"]
api_key = data.vault_kv_secret_v2.app_config.data["api_key"]
}
# This secret will be created with values from Vault
}
# Certificate management
resource "vault_pki_secret_backend_role" "app_cert" {
backend = "pki"
name = "app.example.com"
ttl = "720h"
max_ttl = "8760h"
allow_ip_sans = true
allowed_domains = ["app.example.com"]
allow_subdomains = true
}
data "vault_pki_secret_backend_cert" "app" {
backend = vault_pki_secret_backend_role.app_cert.backend
name = vault_pki_secret_backend_role.app_cert.name
common_name = "app.example.com"
ttl = "720h"
}
# Use Vault-managed certificate
resource "aws_acm_certificate" "app" {
# Use Vault-generated certificate instead of AWS
private_key = data.vault_pki_secret_backend_cert.app.private_key_pem
certificate_body = data.vault_pki_secret_backend_cert.app.certificate_pem
certificate_chain = data.vault_pki_secret_backend_cert.app.issuing_ca_pem
}
# Encryption key
data "vault_transit_encrypt" "app" {
backend = "transit"
key = "app-key"
plaintext = base64encode(var.secret_data)
}
# Store encrypted value
resource "aws_ssm_parameter" "secret" {
name = "/prod/app/secret"
type = "SecureString"
value = data.vault_transit_encrypt.app.ciphertext
tags = {
Source = "Vault-Transit"
}
}
# Approle authentication (for CI/CD)
resource "vault_approle_auth_backend_role" "ci" {
backend = "approle"
role_name = "ci-terraform"
token_policies = ["terraform-deployer"]
token_ttl = 300
token_max_ttl = 600
}
data "vault_approle_auth_backend_role_id" "ci" {
backend = vault_approle_auth_backend_role.ci.backend
role_name = vault_approle_auth_backend_role.ci.role_name
}
resource "vault_approle_auth_backend_secret_id" "ci" {
backend = vault_approle_auth_backend_role.ci.backend
role_name = vault_approle_auth_backend_role.ci.role_name
}
# Output CI credentials (sensitive!)
output "approle_role_id" {
value = data.vault_approle_auth_backend_role_id.ci.role_id
sensitive = true
}
output "approle_secret_id" {
value = vault_approle_auth_backend_secret_id.ci.secret_id
sensitive = true
}
Vault Policies for Terraform:
# vault-policies/terraform-deployer.hcl
path "aws/creds/terraform-deploy-role" {
capabilities = ["read"]
}
path "database/creds/*" {
capabilities = ["read"]
}
path "kv/data/apps/*" {
capabilities = ["read"]
}
path "pki/issue/*" {
capabilities = ["update"]
}
path "transit/encrypt/app-key" {
capabilities = ["update"]
}
# For managing secrets
path "kv/data/apps/webapp/*" {
capabilities = ["create", "read", "update", "delete"]
}
# For workspace-specific access, define one policy per workspace —
# Vault policy templating only supports identity parameters, not
# Terraform values such as the workspace name
path "aws/creds/dev-deploy" {
capabilities = ["read"]
}
Set up AppRole for CI/CD:
# Enable AppRole auth
vault auth enable approle
# Create policy
vault policy write terraform-deployer vault-policies/terraform-deployer.hcl
# Create role
vault write auth/approle/role/ci-terraform \
secret_id_ttl=600 \
token_ttl=300 \
token_max_ttl=600 \
token_policies=terraform-deployer
# Get credentials
vault read auth/approle/role/ci-terraform/role-id
vault write -f auth/approle/role/ci-terraform/secret-id
# Use in GitHub Actions
# Add these as secrets
export TF_VAR_vault_role_id=${{ secrets.VAULT_ROLE_ID }}
export TF_VAR_vault_secret_id=${{ secrets.VAULT_SECRET_ID }}
# Configure provider
provider "vault" {
auth_login {
path = "auth/approle/login"
parameters = {
role_id = var.vault_role_id
secret_id = var.vault_secret_id
}
}
}
Scenario 14: Policy as Code with OPA¶
Open Policy Agent for advanced policy enforcement
sequenceDiagram
participant Dev as Developer
participant Git as VCS
participant OPA as Open Policy Agent
participant TF as Terraform Plan (JSON)
participant Policy as Rego Policies
participant Result as Policy Decision
Dev->>Git: Push Terraform code
Git->>OPA: Webhook triggers policy check
OPA->>TF: Parse plan file to JSON
TF->>Policy: Evaluate against policies
Policy->>Result: Check: No public S3 buckets
Policy->>Result: Check: Cost under $1000
Policy->>Result: Check: Required tags present
Result->>OPA: Pass/Fail decision
OPA->>Git: Block PR if failed
Note over Policy: Declarative policy language
Code:
# Policy evaluation setup
# policies/enforce.rego
package terraform.policy

import future.keywords.in
# Deny public S3 buckets
deny[msg] {
resource := input.resource_changes[_]
resource.type == "aws_s3_bucket_public_access_block"
resource.change.after.block_public_acls == false
msg := sprintf("S3 bucket %s must block public ACLs", [resource.address])
}
deny[msg] {
resource := input.resource_changes[_]
resource.type == "aws_s3_bucket_public_access_block"
resource.change.after.block_public_policy == false
msg := sprintf("S3 bucket %s must block public policies", [resource.address])
}
# Require encryption on all storage
deny[msg] {
resource := input.resource_changes[_]
resource.type == "aws_s3_bucket"
not resource.change.after.server_side_encryption_configuration
msg := sprintf("S3 bucket %s must have encryption enabled", [resource.address])
}
# Enforce tagging
deny[msg] {
resource := input.resource_changes[_]
not startswith(resource.address, "data.")
tags := resource.change.after.tags
not tags.Environment
msg := sprintf("Resource %s must have Environment tag", [resource.address])
}
# Cost control
deny[msg] {
resource := input.resource_changes[_]
resource.type == "aws_instance"
instance_type := resource.change.after.instance_type
# No instances larger than t3.xlarge in dev/staging
input.workspace.name != "prod"
not startswith(instance_type, "t3.")
msg := sprintf("Instance %s type %s too large for non-prod", [
resource.address, instance_type
])
}
# Allowed regions only
deny[msg] {
resource := input.resource_changes[_]
resource.provider_name == "registry.terraform.io/hashicorp/aws"
region := input.configuration.provider_config.aws.expressions.region.constant_value
not region in allowed_regions
msg := sprintf("Region %s not allowed. Use one of: %s", [
region, concat(", ", allowed_regions)
])
}
# No hardcoded credentials
deny[msg] {
resource := input.resource_changes[_]
credential_fields := ["password", "access_key", "secret_key", "token"]
some field in credential_fields
resource.change.after[field]
msg := sprintf("Resource %s has hardcoded credential field: %s", [
resource.address, field
])
}
# Network security: No 0.0.0.0/0 in SG
deny[msg] {
resource := input.resource_changes[_]
resource.type == "aws_security_group_rule"
resource.change.after.cidr_blocks[_] == "0.0.0.0/0"
resource.change.after.type == "ingress"
msg := sprintf("Security group %s cannot allow 0.0.0.0/0", [resource.address])
}
# Shared data: allowed regions
allowed_regions := ["us-east-1", "us-west-2", "eu-west-1"]
# Test file (policies/test.rego)
package terraform.policy
test_deny_public_s3_bucket {
input := {
"resource_changes": [{
"address": "aws_s3_bucket_public_access_block.example",
"type": "aws_s3_bucket_public_access_block",
"change": {
"after": {"block_public_acls": false}
}
}]
}
count(deny) == 1
}
test_allow_encrypted_s3_bucket {
input := {
"resource_changes": [{
"address": "aws_s3_bucket.example",
"type": "aws_s3_bucket",
"change": {
"after": {
"server_side_encryption_configuration": {
"rule": {
"apply_server_side_encryption_by_default": {
"sse_algorithm": "AES256"
}
}
}
}
}
}]
}
count(deny) == 0
}
CI/CD Integration:
#!/bin/bash
# .github/scripts/opa-evaluate.sh
# Generate Terraform plan JSON
terraform show -json tfplan.binary > tfplan.json
# Download OPA
curl -L -o opa https://github.com/open-policy-agent/opa/releases/latest/download/opa_linux_amd64_static
chmod +x opa
# Run policy unit tests
./opa test policies/
# Evaluate against the Terraform plan; --fail-defined exits non-zero
# when the deny set is non-empty
if ! ./opa eval --format pretty \
  --data policies/enforce.rego \
  --input tfplan.json \
  --fail-defined \
  "data.terraform.policy.deny"; then
  echo "OPA policy violations detected!"
  exit 1
fi
# For PR comments
./opa eval --format json \
--data policies/enforce.rego \
--input tfplan.json \
"data.terraform.policy.deny" > opa-results.json
# Parse and comment on PR
python3 .github/scripts/comment-pr.py opa-results.json
Conftest integration:
# Alternative to OPA CLI
conftest test tfplan.json --policy policies/
# With specific namespace
conftest test tfplan.json --policy policies/ --namespace terraform.policy
# Output in TAP format
conftest test tfplan.json --output tap
Scenario 15: Dynamic Provider Configuration¶
Multi-account, multi-region deployments
sequenceDiagram
participant Config as Configuration
participant TF as Terraform
participant Alias as Provider Aliases
participant AWS1 as AWS Account 1
participant AWS2 as AWS Account 2
Config->>TF: Define provider configs
TF->>Alias: Create 3 AWS provider aliases
Alias->>AWS1: Provider "aws.dev"
Alias->>AWS1: Provider "aws.staging"
Alias->>AWS2: Provider "aws.prod"
TF->>TF: for_each = environments
TF->>Alias: Use different provider per env
TF->>AWS1: Deploy to dev & staging
TF->>AWS2: Deploy to prod
Note over Alias: Dynamic provider selection
Code:
# Configure multiple AWS providers
terraform {
required_providers {
aws = {
source = "hashicorp/aws"
version = "~> 5.0"
}
}
}
# Default provider (for resources without alias)
provider "aws" {
region = "us-east-1"
assume_role {
role_arn = "arn:aws:iam::${var.accounts.dev}:role/TerraformRole"
}
}
# Provider for staging
provider "aws" {
alias = "staging"
region = "us-west-2"
assume_role {
role_arn = "arn:aws:iam::${var.accounts.staging}:role/TerraformRole"
}
}
# Provider for production (different account)
provider "aws" {
alias = "prod"
region = "us-east-1"
assume_role {
role_arn = "arn:aws:iam::${var.accounts.prod}:role/TerraformRole"
session_name = "terraform-prod"
external_id = var.prod_external_id
}
# Different credentials profile
profile = "prod-admin"
}
# Variable for account IDs
variable "accounts" {
type = object({
dev = string
staging = string
prod = string
})
default = {
dev = "123456789012"
staging = "123456789013"
prod = "123456789014"
}
}
# Deploy to all environments
# NOTE: the provider meta-argument must be a static reference — it cannot
# be computed from a local value or chosen per for_each instance. To deploy
# one copy per environment, instantiate a module (or resource block) once
# per provider alias instead.
module "web_dev" {
  source = "./modules/web" # hypothetical module wrapping the AMI lookup + instance

  providers = {
    aws = aws
  }

  environment   = "dev"
  instance_type = var.instance_types["dev"]
}

module "web_staging" {
  source = "./modules/web"

  providers = {
    aws = aws.staging
  }

  environment   = "staging"
  instance_type = var.instance_types["staging"]
}

module "web_prod" {
  source = "./modules/web"

  providers = {
    aws = aws.prod
  }

  environment   = "prod"
  instance_type = var.instance_types["prod"]
}

# Inside ./modules/web, the data source and instance use whichever
# provider the caller passed in:
data "aws_ami" "amazon_linux" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    values = ["amzn2-ami-hvm-*-x86_64-gp2"]
  }
}

resource "aws_instance" "web" {
  ami           = data.aws_ami.amazon_linux.id
  instance_type = var.instance_type

  tags = {
    Environment = var.environment
    Name        = "web-server-${var.environment}"
  }
}
# Conditional provider usage
resource "aws_s3_bucket" "logs" {
# Only in prod account
provider = aws.prod
count = var.environment == "prod" ? 1 : 0
bucket = "prod-logs-${data.aws_caller_identity.prod.account_id}"
}
# Cross-account resource reference
data "aws_caller_identity" "prod" {
provider = aws.prod
}
# Shared KMS key (in security account)
provider "aws" {
alias = "security"
region = "us-east-1"
assume_role {
role_arn = "arn:aws:iam::${var.accounts.security}:role/SecurityAdmin"
}
}
resource "aws_kms_key" "shared" {
provider = aws.security
description = "Shared encryption key"
deletion_window_in_days = 7
enable_key_rotation = true
policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Sid = "EnableIAMUserPermissions"
Effect = "Allow"
Principal = {
AWS = "arn:aws:iam::${var.accounts.security}:root"
}
Action = "kms:*"
Resource = "*"
},
{
Sid = "AllowProdAccountUse"
Effect = "Allow"
Principal = {
AWS = "arn:aws:iam::${var.accounts.prod}:root"
}
Action = [
"kms:Encrypt",
"kms:Decrypt",
"kms:ReEncrypt*",
"kms:GenerateDataKey*",
"kms:DescribeKey"
]
Resource = "*"
}
]
})
}
# Use shared KMS key from production
resource "aws_s3_bucket_server_side_encryption_configuration" "prod_data" {
provider = aws.prod
bucket = aws_s3_bucket.prod_data.id
rule {
apply_server_side_encryption_by_default {
kms_master_key_id = aws_kms_key.shared.key_id
sse_algorithm = "aws:kms"
}
}
}
# Multi-region deployment
variable "regions" {
type = map(object({
provider = string
region = string
}))
default = {
us-east-1 = {
provider = "aws"
region = "us-east-1"
}
eu-west-1 = {
provider = "aws.eu"
region = "eu-west-1"
}
ap-south-1 = {
provider = "aws.apac"
region = "ap-south-1"
}
}
}
# Configure regional providers
provider "aws" {
alias = "eu"
region = "eu-west-1"
}
provider "aws" {
alias = "apac"
region = "ap-south-1"
}
# Deploy to all regions
# NOTE: the provider meta-argument must be a static reference, so it
# cannot be picked via for_each or lookup() — write one resource block
# (or module call) per regional provider alias
resource "aws_s3_bucket" "assets_us_east_1" {
  bucket = "global-assets-us-east-1-${data.aws_caller_identity.current.account_id}"

  tags = {
    Region = "us-east-1"
  }
}

resource "aws_s3_bucket" "assets_eu_west_1" {
  provider = aws.eu
  bucket   = "global-assets-eu-west-1-${data.aws_caller_identity.current.account_id}"

  tags = {
    Region = "eu-west-1"
  }
}

# Repeat for ap-south-1 with provider = aws.apac
# Provider meta-argument on a resource
resource "aws_instance" "staging_web" {
  provider = aws.staging

  ami           = data.aws_ami.amazon_linux.id # assumes an AMI data source exists
  instance_type = "t3.micro"

  lifecycle {
    # Changing a resource's provider forces replacement; create the new
    # instance before destroying the old one
    create_before_destroy = true
  }
}
Scenario 16: Performance Optimization at Scale¶
Speeding up plans and applies for large infrastructures
sequenceDiagram
participant Dev as Developer
participant TF as Terraform Core
participant Graph as Dependency Graph
participant Cache as Build Cache
participant State as State File
participant AWS as AWS API
Dev->>TF: terraform apply -parallelism=20
TF->>Graph: Build resource dependency graph
Graph->>TF: Identify parallelizable resources
TF->>Cache: Check for cached providers/modules
Cache->>TF: Return cached data
TF->>State: Read current state
State->>TF: Return state (partial refresh)
TF->>AWS: Parallel API calls (20 concurrent)
AWS->>TF: Return resource statuses
TF->>AWS: Create/Update resources in parallel batches
AWS->>TF: Confirm operations
TF->>Dev: Show progress with reduced time
Note over TF: Optimized execution with -target & -refresh=false
Code:
# Parallel resource creation
resource "aws_instance" "worker" {
for_each = { for i in range(var.worker_count) : tostring(i) => i } # for_each keys must be strings
# Parallel provisioning
provisioner "remote-exec" {
inline = [
"sudo yum update -y",
"sudo systemctl start worker"
]
connection {
type = "ssh"
user = "ec2-user"
private_key = file(var.private_key_path)
host = self.public_ip
timeout = "2m"
}
}
}
# Graph dependencies optimization
# Explicit depends_on for complex dependencies — a resource may declare
# depends_on only once, so combine entries into a single list
resource "aws_db_instance" "app" {
  # Wait for the subnet group, security group, and network
  depends_on = [
    aws_db_subnet_group.app,
    aws_security_group.rds,
    module.vpc,
  ]
}
# Use -parallelism flag (default 10)
# terraform apply -parallelism=20
# Refresh only specific resources (terraform refresh is deprecated;
# prefer `terraform apply -refresh-only`)
terraform refresh -target=aws_instance.web
# State manipulation for speed
# Move resources instead of recreating
terraform state mv aws_instance.old[0] aws_instance.new[0]
# Remove unused resources from state (faster than destroy)
terraform state rm aws_instance.unused
# Data sources are re-read on every plan; a postcondition can guard the result
data "aws_ami" "amazon_linux" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    values = ["amzn2-ami-hvm-*-x86_64-gp2"]
  }

  lifecycle {
    postcondition {
      condition     = self.id != ""
      error_message = "Failed to fetch AMI"
    }
  }
}
# Limit provider calls with lifecycle
resource "aws_s3_bucket" "logs" {
# Prevent recreation on tag changes
lifecycle {
ignore_changes = [
tags["LastUpdated"],
server_side_encryption_configuration[0].rule[0].apply_server_side_encryption_by_default[0].kms_master_key_id
]
}
}
# Use -refresh=false for faster plans
terraform plan -refresh=false
# Refresh specific resources only
terraform plan -target=aws_instance.web
# Split monolithic state
# backend.tf — backend blocks cannot interpolate variables or
# terraform.workspace; pass those values at init time instead:
#   terraform init -backend-config="bucket=app-state-dev"
terraform {
  backend "s3" {
    key = "infrastructure.tfstate"
  }
}
# Use data sources to reference other states
data "terraform_remote_state" "network" {
backend = "s3"
config = {
bucket = "network-state-${terraform.workspace}"
key = "network.tfstate"
}
}
# Disable unnecessary providers
terraform {
required_providers {
aws = {
source = "hashicorp/aws"
version = "~> 5.0"
}
# configuration_aliases is meaningful in child modules: it declares
# aliases the calling module must pass in via `providers`
vault = {
  source                = "hashicorp/vault"
  version               = "~> 3.0"
  configuration_aliases = [vault.prod]
}
}
}
provider "vault" {
alias = "prod"
# Only configured for prod
}
# Conditional provider usage
resource "vault_generic_secret" "prod_secret" {
count = terraform.workspace == "prod" ? 1 : 0
provider = vault.prod
path = "secret/prod/app"
}
Scenario 17: Cross-Region Replication¶
Global infrastructure patterns
sequenceDiagram
participant TF as Terraform
participant Primary as Primary Region (us-east-1)
participant Dr as DR Region (us-west-2)
participant Replicate as Replication Service
participant App as Application
participant DNS as Global DNS
TF->>Primary: Create DynamoDB table
Primary->>Replicate: Enable global tables
TF->>Dr: Create replica table
Replicate->>Dr: Sync data continuously
TF->>Primary: Launch RDS cluster
Primary->>Dr: Create cross-region replica
TF->>DNS: Configure Route53 failover
App->>DNS: Query app.example.com
alt Primary Healthy
DNS->>Primary: Route traffic to ALB
else Primary Down
DNS->>Dr: Failover to DR region
end
Note over Replicate: Async replication with <1s lag
Code:
# Global table with DynamoDB
resource "aws_dynamodb_table" "global" {
name = "global-data"
billing_mode = "PAY_PER_REQUEST"
hash_key = "id"
attribute {
name = "id"
type = "S"
}
# Enable global tables
replica {
  region_name = "us-west-2"
}

replica {
  region_name = "eu-west-1"
}
}
# Cross-region VPC peering
resource "aws_vpc_peering_connection" "east_to_west" {
vpc_id = aws_vpc.east.id
peer_vpc_id = aws_vpc.west.id
peer_region = "us-west-2"
auto_accept = false
tags = {
Name = "east-west-peering"
}
}
# Accept peering in west region
resource "aws_vpc_peering_connection_accepter" "west_accepter" {
provider = aws.west
vpc_peering_connection_id = aws_vpc_peering_connection.east_to_west.id
auto_accept = true
tags = {
Name = "west-accepter"
}
}
# Route tables for peering (routes attach to route tables, not subnets)
resource "aws_route" "east_to_west" {
  count = length(aws_route_table.east_private)

  route_table_id            = aws_route_table.east_private[count.index].id
  destination_cidr_block    = aws_vpc.west.cidr_block
  vpc_peering_connection_id = aws_vpc_peering_connection.east_to_west.id
}
# Aurora Global Database
resource "aws_rds_global_cluster" "global" {
global_cluster_identifier = "prod-global-db"
engine = "aurora-postgresql"
engine_version = "15.3"
database_name = "globaldb"
}
resource "aws_rds_cluster" "primary" {
provider = aws.primary
engine = "aurora-postgresql"
engine_version = "15.3"
cluster_identifier = "prod-primary"
master_username = var.db_username
master_password = var.db_password
global_cluster_identifier = aws_rds_global_cluster.global.id
db_subnet_group_name = aws_db_subnet_group.primary.name
}
resource "aws_rds_cluster_instance" "primary" {
provider = aws.primary
cluster_identifier = aws_rds_cluster.primary.id
instance_class = "db.r5.large"
}
# Secondary region
resource "aws_rds_cluster" "secondary" {
provider = aws.secondary
engine = "aurora-postgresql"
engine_version = "15.3"
cluster_identifier = "prod-secondary"
global_cluster_identifier = aws_rds_global_cluster.global.id
db_subnet_group_name = aws_db_subnet_group.secondary.name
# Copy from primary
source_region = "us-east-1"
}
# CloudFront global distribution
resource "aws_cloudfront_distribution" "global" {
origin {
domain_name = aws_lb.primary.dns_name
origin_id = "primary"
custom_origin_config {
http_port = 80
https_port = 443
origin_protocol_policy = "https-only"
origin_ssl_protocols = ["TLSv1.2"]
}
}
origin {
domain_name = aws_lb.dr.dns_name
origin_id = "dr"
custom_origin_config {
http_port = 80
https_port = 443
origin_protocol_policy = "https-only"
origin_ssl_protocols = ["TLSv1.2"]
}
}
enabled = true
is_ipv6_enabled = true
default_root_object = "index.html"
# Primary origin with DR backup
origin_group {
origin_id = "group"
failover_criteria {
status_codes = [403, 404, 500, 502, 503, 504]
}
member {
origin_id = "primary"
}
member {
origin_id = "dr"
}
}
default_cache_behavior {
target_origin_id = "group"
viewer_protocol_policy = "redirect-to-https"
allowed_methods = ["GET", "HEAD", "OPTIONS", "PUT", "POST", "PATCH", "DELETE"]
cached_methods = ["GET", "HEAD"]
forwarded_values {
query_string = true
cookies {
forward = "all"
}
}
}
restrictions {
geo_restriction {
restriction_type = "none"
}
}
viewer_certificate {
acm_certificate_arn = aws_acm_certificate.global.arn
ssl_support_method = "sni-only"
}
}
Scenario 18: Edge Cases & Troubleshooting¶
Common issues and solutions
sequenceDiagram
participant Dev as Developer
participant TF as Terraform CLI
participant State as State Lock
participant API as AWS API
participant Log as Debug Log
participant Fix as Resolution
Dev->>TF: terraform apply
TF->>State: Request state lock
alt State Locked
State->>TF: Error: State locked by another process
TF->>Dev: Show lock ID
Dev->>State: Investigate lock holder
Dev->>TF: terraform force-unlock <ID>
else API Timeout
API->>TF: Request timeout (30s)
TF->>Log: Log timeout error
Log->>Dev: Show error details
Dev->>TF: Increase timeout/retry
TF->>API: Retry with backoff
else Resource Exists
API->>TF: Error: Resource already exists
TF->>Dev: Suggest import command
Dev->>TF: terraform import <resource> <ID>
TF->>State: Import successful
end
TF->>Dev: Apply successful
Note over Log: Enable TF_LOG=DEBUG for details
Code:
# Handle resource already exists
resource "aws_s3_bucket" "example" {
bucket = var.bucket_name
# Import existing bucket instead of failing
lifecycle {
prevent_destroy = false
}
}
# Then run: terraform import aws_s3_bucket.example my-existing-bucket
# Handle circular dependencies
# A cycle occurs when two resources reference each other (directly or via
# depends_on). Referencing B from A is fine on its own — the cycle appears
# only if B also references A.
resource "aws_instance" "a" {
  # The expression below already creates the dependency edge on B,
  # so an explicit depends_on = [aws_instance.b] would be redundant
  user_data = <<-EOF
    #!/bin/bash
    echo "Instance B IP: ${aws_instance.b.private_ip}"
  EOF
}
# Handle provider version conflicts
# .terraform.lock.hcl
# Commit this file to lock provider versions
# Handle large state files
terraform state pull > state.json
# Edit state.json
terraform state push state.json
# Handle timeout issues
resource "aws_db_instance" "large" {
# Increase timeout for large DB
timeouts {
create = "2h"
delete = "2h"
update = "2h"
}
}
# Handle count/for_each errors
# Use locals to preprocess data
locals {
healthy_instances = {
for k, v in var.instances : k => v if v.status == "healthy"
}
}
resource "aws_instance" "web" {
for_each = local.healthy_instances
# This avoids errors from invalid instances
}
# Handle provider authentication issues
# Use explicit credentials
provider "aws" {
# Instead of relying on env vars, be explicit
access_key = var.aws_access_key
secret_key = var.aws_secret_key
region = var.aws_region
# For SSO
profile = "prod-admin"
shared_config_files = ["~/.aws/config"]
shared_credentials_files = ["~/.aws/credentials"]
}
# Handle resource drift
resource "aws_security_group_rule" "example" {
# Add lifecycle ignore for externally managed rules
lifecycle {
ignore_changes = all
}
}
# Or detect drift
terraform plan -detailed-exitcode
# Exit code 0 = no changes
# Exit code 1 = error
# Exit code 2 = changes detected
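The exit-code contract above is what makes `-detailed-exitcode` useful in CI. A minimal sketch — `classify_plan` is a hypothetical helper of my own, not a Terraform feature:

```shell
# Map `terraform plan -detailed-exitcode` exit codes to CI actions.
# classify_plan is a hypothetical helper, not part of Terraform.
classify_plan() {
  case "$1" in
    0) echo "no-changes" ;;       # state matches configuration
    2) echo "drift-detected" ;;   # plan succeeded, changes pending
    *) echo "plan-error" ;;       # exit code 1 (or anything else)
  esac
}

# Usage in a pipeline:
#   terraform plan -detailed-exitcode -out=tfplan; classify_plan "$?"
classify_plan 2
# → drift-detected
```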
# Handle large resources with pagination
data "aws_instances" "all" {
# This may timeout for large accounts
# Instead use filters
filter {
name = "instance-state-name"
values = ["running"]
}
filter {
name = "tag:Environment"
values = ["production"]
}
}
# Handle eventual consistency
resource "aws_iam_role_policy" "example" {
role = aws_iam_role.example.name
# Wait for role to propagate
depends_on = [time_sleep.iam_propagation]
}
resource "time_sleep" "iam_propagation" {
depends_on = [aws_iam_role.example]
create_duration = "30s"
}
# Handle API rate limiting
provider "aws" {
# Add delays between requests
max_retries = 10
retry_mode = "adaptive"
# Custom endpoints for debugging
endpoints {
ec2 = "http://localhost:4566" # LocalStack
}
}
# Handle state locks
# Force unlock (use with caution!)
terraform force-unlock LOCK_ID
# Prevent lock issues: use a DynamoDB table for locking, and keep
# applies short so locks are released quickly
terraform {
  backend "s3" {
    dynamodb_table = "terraform-locks"
  }
}
# Handle sensitive values
variable "db_password" {
type = string
sensitive = true
validation {
condition = length(var.db_password) >= 16
error_message = "Password must be at least 16 characters."
}
}
# Output with sensitive flag
output "db_endpoint" {
value = aws_db_instance.app.endpoint
description = "Database endpoint"
sensitive = true
}
# Read a sensitive output value (redacted in plain `terraform output`)
terraform output -json | jq -r '.db_endpoint.value'
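If `jq` is not available on a build agent, the same extraction can be done with `sed`. A dependency-free sketch — the sample `outputs.json` is a stand-in mimicking (in compact form) the shape `terraform output -json` emits:

```shell
# Stand-in for `terraform output -json` (assumed compact shape)
cat > outputs.json <<'EOF'
{"db_endpoint":{"sensitive":true,"type":"string","value":"db.example.com:5432"}}
EOF

# Pull the value field out with sed instead of jq
sed -n 's/.*"value":"\([^"]*\)".*/\1/p' outputs.json
# → db.example.com:5432
```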
# Handle destroy errors
# Prevent destroy for critical resources
resource "aws_dynamodb_table" "critical" {
lifecycle {
prevent_destroy = true
}
}
# Or use targeted destroy
terraform destroy -target=aws_instance.web
# Handle module version conflicts
module "vpc" {
source = "terraform-aws-modules/vpc/aws"
version = "5.0.0" # Pin exact version
# Use version constraints
# '~> 5.0' means >= 5.0.0, < 6.0.0
}
# Handle complex expressions
# Use local values to simplify
locals {
# Complex logic here
instance_type = (
var.environment == "prod" ? "m5.large" :
var.environment == "staging" ? "m5.medium" :
"t3.micro"
)
}
resource "aws_instance" "web" {
instance_type = local.instance_type
}
# Debugging tips
# Enable detailed logging
export TF_LOG=DEBUG
export TF_LOG_PATH=terraform.log
# Trace provider calls
TF_LOG=TRACE terraform plan
# Graph dependencies
terraform graph | dot -Tpng > graph.png
# Validate JSON syntax
terraform show -json | jq .
# Check provider schemas
terraform providers schema -json
# Performance insight (Terraform has no built-in CPU profiler; time runs instead)
time terraform plan -refresh=false
# Resource tracing
TF_LOG_PROVIDER=TRACE terraform apply
# Provider lock maintenance
# Pre-populate .terraform.lock.hcl for every platform your team uses
terraform providers lock -platform=linux_amd64 -platform=darwin_amd64
# Normalize the state file for inspection (sorted keys)
terraform state pull | jq -S . > state-sorted.json
# Push edited state only with care — the serial and lineage must stay valid
terraform state push state-sorted.json
Scenario 19: Final Production Checklist¶
Pre-deployment validation script
sequenceDiagram
participant Dev as Developer
participant Script as Validation Script
participant TF as Terraform CLI
participant Linters as Linters/Scanners
participant API as Cloud APIs
participant Result as Final Report
Dev->>Script: Run pre-flight-check.sh
Script->>TF: terraform version
TF->>TF: terraform fmt -check
TF->>TF: terraform validate
Script->>Linters: Run tfsec, checkov
Linters->>Script: Security scan results
Script->>API: Check state lock table
API->>Script: Lock status
Script->>Script: Verify variables & tfvars
Result->>Dev: Print pass/fail report
alt All Checks Pass
Dev->>TF: terraform apply
else Checks Fail
Dev->>Script: Review errors & fix
end
Note over Script: Pre-deployment quality gate
Code:
#!/bin/bash
# pre-flight-check.sh
set -e
echo "Running Terraform pre-flight checks..."
# Check version
terraform version
if ! terraform version | grep -q "1.6"; then
echo "WARNING: Terraform version should be 1.6.x"
fi
# Format check
if ! terraform fmt -check -recursive; then
echo "ERROR: Terraform files not formatted. Run 'terraform fmt -recursive'"
exit 1
fi
# Validate
terraform validate
# Security scan
tfsec .
# Cost estimation
infracost breakdown --path .
# Check for hardcoded secrets
if grep -r "password\|secret\|key" --include="*.tf" . | grep -v "variable\|data\|local"; then
echo "WARNING: Potential hardcoded secrets found"
fi
# Check state lock
aws dynamodb describe-table --table-name terraform-locks --region us-east-1
# Validate variables
if [ ! -f "terraform.tfvars" ]; then
echo "WARNING: No terraform.tfvars file found"
fi
# Check provider locks
if [ ! -f ".terraform.lock.hcl" ]; then
echo "WARNING: Provider lock file missing. Run 'terraform init -upgrade'"
fi
# List outputs flagged as sensitive (review these before sharing logs)
terraform output -json | jq -r 'to_entries[] | select(.value.sensitive == true) | .key'
echo "All checks passed!"
echo "Ready for: terraform apply"
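The hardcoded-secrets grep in the script is only a text heuristic, and it can be exercised on its own. A small sketch against a throwaway `.tf` file (the directory and file names are illustrative):

```shell
# Set up a throwaway directory with one offending file
mkdir -p /tmp/tf-secret-demo
cat > /tmp/tf-secret-demo/main.tf <<'EOF'
resource "aws_db_instance" "db" {
  password = "hunter2"   # hardcoded secret: the grep should flag this line
}
variable "db_password" {} # declared as a variable: the grep should skip this
EOF

# Same heuristic as in pre-flight-check.sh
if grep -r "password\|secret\|key" --include="*.tf" /tmp/tf-secret-demo \
     | grep -v "variable\|data\|local"; then
  echo "WARNING: Potential hardcoded secrets found"
fi
```

Note that the heuristic produces false positives (any attribute named `key`) and false negatives (secrets in variable defaults), so treat it as a prompt for review, not a guarantee.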
Quick Reference: Essential Commands¶
| Command | Description | Level |
|---|---|---|
| `terraform init` | Initialize the working directory | Beginner |
| `terraform plan` | Show changes to be applied | Beginner |
| `terraform apply` | Apply the changes | Beginner |
| `terraform destroy` | Destroy the infrastructure | Beginner |
| `terraform validate` | Check if configuration is valid | Beginner |
| `terraform fmt` | Format configuration files | Beginner |
| `terraform output` | Show output values | Beginner |
| `terraform state list` | List resources in state | Intermediate |
| `terraform state show <resource>` | Show resource details | Intermediate |
| `terraform state rm <resource>` | Remove resource from state | Intermediate |
| `terraform state mv <old> <new>` | Move resource in state | Intermediate |
| `terraform import <resource> <id>` | Import existing resource | Intermediate |
| `terraform taint <resource>` | Mark resource for recreation (deprecated; prefer `terraform apply -replace=<resource>`) | Intermediate |
| `terraform untaint <resource>` | Remove taint from resource | Intermediate |
| `terraform providers` | List providers | Advanced |
| `terraform providers lock` | Lock provider versions | Advanced |
| `terraform providers mirror <dir>` | Mirror providers to a local directory | Advanced |
| `terraform workspace list` | List workspaces | Advanced |
| `terraform workspace new <name>` | Create new workspace | Advanced |
| `terraform workspace select <name>` | Select a workspace | Advanced |
| `terraform workspace delete <name>` | Delete a workspace | Advanced |
Pro Tips for All Levels¶
- Always use version control: Keep your Terraform configurations in a Git repository.
- Use modules: Organize your infrastructure into reusable modules.
- Define variables: Use variables for flexibility and reusability.
- Set resource limits: Define timeouts and retry policies for resources.
- Use data sources: Fetch existing resources instead of recreating them.
- Check dependencies: Use terraform graph to visualize dependencies.
- Validate configurations: Regularly run terraform validate and terraform fmt.
- Use remote state: Store state in a remote backend like S3 or Terraform Cloud.
- Lock state files: Prevent concurrent modifications with state locks.
- Monitor costs: Use cost estimation tools like Infracost.
- Automate testing: Integrate Terraform with CI/CD pipelines and testing frameworks.
- Use Sentinel policies: Enforce compliance and best practices with Terraform Cloud.
- Keep Terraform updated: Stay on supported versions and regularly update providers.
- Use workspaces: Manage multiple environments with workspaces.
- Use remote execution: Leverage Terraform Cloud or Enterprise for remote runs.
- Backup state files: Regularly back up your state files to prevent data loss.
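Several of these tips (remote state, state locking, and the terraform-locks table the pre-flight script checks) come together in a single backend block. A minimal sketch, with the bucket, key, and table names as placeholders to replace with your own:

```hcl
terraform {
  backend "s3" {
    bucket         = "my-terraform-state"      # placeholder bucket name
    key            = "prod/terraform.tfstate"  # path of the state object
    region         = "us-east-1"
    dynamodb_table = "terraform-locks"         # enables state locking
    encrypt        = true                      # server-side encryption at rest
  }
}
```

After adding or changing a backend block, run terraform init so Terraform can migrate the existing state to the new backend.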
Happy infrastructure management! 🧱