Infrastructure as Code :: Terraform
The main reason Terraform is so highly sought after is that it is vendor agnostic: Terraform syntax and structure are consistent across cloud providers, and it can be extended to other providers such as local. Hence, to work with Terraform, one only needs to understand how Terraform itself works and its components.
Using Terraform for this demo application, I am going to create a VPC in which I will deploy an EKS cluster to host our demo application with all of its microservices. We will also be using Terraform features like a remote backend and state locking for better management of our Terraform state files.
Life cycle of Terraform
- Initialize
- Plan
- Apply
- Destroy
Assuming I have already written all the required files for creating the infrastructure, the Terraform lifecycle starts with initializing the project using terraform init. This makes sure that all the configuration required for proper execution of the code is in place, and initializes the backend, whether it is a remote or a local backend.
terraform plan is the second stage of the lifecycle. This command is essentially a dry run of the code that tells us which resources will be created or removed based on the previous state recorded in the statefile. Finally, with terraform apply, all the planned resources are created. I have included terraform destroy in the lifecycle because this command can be used to destroy all the resources that were created, which are tracked in the statefile of the project.
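As a quick reference, the lifecycle maps onto the CLI like this (a minimal sketch; the plan file name is arbitrary and only an example):

terraform init                      # download providers and initialize the (remote or local) backend
terraform plan -out plan.tfplan     # dry run; record the proposed changes in a plan file
terraform apply plan.tfplan         # create or update exactly the planned resources
terraform destroy                   # tear down everything tracked in the statefile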
Terraform State Files
Terraform state files are essentially the brains of Terraform. Suppose I create an S3 bucket for the first time; Terraform will remember the resources it deployed by recording them in a state file. This prevents users from creating the same resource multiple times and keeps all of our instructions in sync, enabling us to manage the entirety of the infrastructure as code.
Because of this statefile, we can keep creating new and different resources using the same project files without re-deploying any of the existing resources. If I delete a particular resource, Terraform will also remove that resource from its statefile.
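To see what Terraform is currently tracking, the state subcommands can be used; a small sketch (the resource address refers to the backend bucket defined later in this post):

terraform state list                                    # list every resource recorded in the statefile
terraform state show aws_s3_bucket.remote-backend-tf    # show the recorded attributes of a single resource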
Statefile Management
The files that Terraform generates, which include statefiles, lock files, etc., can grow quite large. Managing these statefiles is important, especially when an organization is using the same Terraform repository for deploying infrastructure. Also, since the statefile contains unmasked sensitive information, it should not be shared via public registries or version control systems like git. With a local statefile, other members of the team cannot build on top of our existing infrastructure without creating a drift between the infrastructure and the statefile.
Hence, Remote Backend and State Locking are introduced. Instead of storing statefiles locally, we can use an S3 bucket or a GitLab-managed backend to store our Terraform statefile remotely. This makes sure that everyone on the team can build infrastructure on top of the existing infrastructure without causing any drift, as the statefile is available to all the personnel within the team. Now that our Terraform statefile is in a remote backend, multiple people have access to make changes to the infrastructure. When more than one person tries to update the infrastructure at the same time, this causes a conflict, especially if both are trying to apply similar changes. To avoid this, we use state locking: the statefile is locked so that only one person can make changes to the infrastructure at any given time.
The most popular method of locking in AWS is DynamoDB. Terraform has recently released native S3 locking as well, but DynamoDB locking is still widely used. Records are maintained on who holds the lock and what changes were made, which ensures we avoid any conflicts within our infrastructure.
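As a rough sketch of what this looks like in configuration (the bucket, key, region, and table names below are placeholders, not the exact values used in this project):

terraform {
  backend "s3" {
    bucket         = "<remote-backend-bucket>"   # S3 bucket that stores the statefile
    key            = "eks/terraform.tfstate"     # path of the statefile inside the bucket
    region         = "us-west-2"
    dynamodb_table = "<lock-table-name>"         # DynamoDB table used for state locking
    encrypt        = true
  }
}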
For this project, I am going to create an EKS cluster using terraform.
Structure of Terraform
Just to reiterate, any Terraform configuration starts with a provider declaration, followed by resource blocks along with variables and outputs.
resource "aws_s3_bucket" "remote-backend-tf" {
bucket = "<name>"
lifecycle {
prevent_destroy = false
}
tags = {
//
}
}
Modules in Terraform
This is similar to modules in Bicep: these are reusable blocks of code that can be used for any resource creation. For example, instead of writing new VPC and EKS code blocks every time we need to create an EKS cluster, we reuse the same code to deploy any number of similar resources instead of starting from scratch. In Terraform, this is made possible by Modules.
Based on the requirements of this project, I am going to create modules for VPC and EKS.
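Before getting into the individual components, here is a rough sketch of how the root configuration might call these modules (the source paths, variable names, and CIDR values are assumptions for illustration, not the exact ones used in this repository):

module "vpc" {
  source = "./modules/vpc"            # local path to the VPC module (assumed layout)

  vpc_cidr        = "10.0.0.0/16"
  private_subnets = ["10.0.1.0/24", "10.0.2.0/24"]
  public_subnets  = ["10.0.101.0/24", "10.0.102.0/24"]
}

module "eks" {
  source = "./modules/eks"            # local path to the EKS module (assumed layout)

  cluster_name = "<cluster_name>"
  subnet_ids   = module.vpc.private_subnet_ids   # assumes the VPC module exposes this output
}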
The VPC has a private subnet associated with a NAT gateway so that applications or clusters can talk out to the internet, along with a public subnet associated with an IGW so that the internet can reach our cluster/application.
For this infrastructure, specifically for the VPC, the components are:
- VPC
- Private Subnet
- Public Subnet
- NAT Gateway (for the private subnet)
- IGW (for the public subnet)
- Route Tables
Similarly for EKS, for the master (control plane) and nodes:
- IAM Role & Policy - Cluster
- IAM Role & Policy - Node
- Attach policy to IAM Role
- EKS Cluster
- Node Group
- Associate Node group to Master
For EKS, we are NOT going to use Fargate; instead, we will be leveraging EC2 node groups.
The backend code is only used for creating the resources needed for storing the backend statefile, namely the S3 bucket and the DynamoDB table.
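A minimal sketch of that lock table resource, assuming on-demand billing (the table name is a placeholder; the LockID hash key is the attribute Terraform's DynamoDB locking expects):

resource "aws_dynamodb_table" "terraform_lock" {
  name         = "<lock-table-name>"
  billing_mode = "PAY_PER_REQUEST"    # on-demand capacity, no read/write units to manage
  hash_key     = "LockID"             # attribute Terraform uses for its lock records

  attribute {
    name = "LockID"
    type = "S"
  }
}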
Scalable Application :: Kubernetes implementation of the microservices
Before moving on to what we are going to do with the Kubernetes cluster we created using Terraform, let's first make sure that we can access our cluster and set up our environment for future deployments.
Accessing the kubernetes cluster
As covered in the Terraform documentation above, we have created an EKS cluster using Terraform with a remote backend. Now, in order to access the cluster from my workstation (which is an EC2 instance), we need to get the kubeconfig of the cluster and set it as the current context. This can be done with the single AWS CLI command below.
aws eks update-kubeconfig --region <cluster_region> --name <cluster_name>
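Once the kubeconfig is updated, a quick sanity check confirms that kubectl is pointed at the new cluster (the output will of course vary per cluster):

kubectl config current-context   # should print the ARN of the EKS cluster we just created
kubectl get nodes                # the EC2 node group instances should show up as Ready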
To know more about how I have also set up my own Kubernetes cluster, follow the links below.
There are two ways to access a cluster:
- User Accounts :: For a user to communicate with a cluster to do any kind of activity
- Service Accounts :: If we want a service to make changes or communicate with pods, that's where service accounts come in
Kubernetes Service Accounts
Basically, a service account allows services within Kubernetes to interact with the cluster and allows a pod to make changes to our Kubernetes cluster. In general, it is recommended to have a service account associated with a pod. By default, when EKS creates a cluster, default service accounts are created. But in order for these services to actually talk to something like the API server, these service accounts need to be bound to a cluster role.
Similar to policy attachments in AWS IAM, we have to bind the service account to a cluster role if it has to make changes to or read data from the cluster.
By default, Kubernetes creates a service account named default with default permissions, which helps us get started with simple tasks without having to create a service account manually. In our scenario, we do not need any additional service accounts at this point, as all our microservices are running independently.
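To make the binding concrete, here is a minimal sketch of a service account bound to a cluster role (the account name and namespace are made up for illustration; the built-in read-only view ClusterRole is used as the example role):

apiVersion: v1
kind: ServiceAccount
metadata:
  name: demo-reader              # hypothetical service account name
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: demo-reader-view
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: view                     # built-in read-only ClusterRole
subjects:
  - kind: ServiceAccount
    name: demo-reader
    namespace: default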
Deployment :: Healing Problem
The main reason organizations prefer container orchestration over plain containers like Docker is two important features that Kubernetes provides: scalability and Service Discovery. Scaling in Kubernetes includes two main features: the first is the scaling of pods, and the other is auto-healing, which is also a major part of Kubernetes. When an application receives more traffic than a single pod can handle, during a peak sale for example, we can scale up from 1 pod to 2 to serve the traffic, and terminate the second pod to scale back down when the traffic returns to normal.
Now, all of the above features can be managed using a Deployment. The lowest-level object we can deploy with a deployment file in k8s is a pod. The Deployment handles the pods we are creating through an intermediate ReplicaSet (which is responsible for deploying our pods and deciding how many). This ReplicaSet is also responsible for bringing up a new pod if something goes wrong with an existing pod and it gets destroyed. The ReplicaSet will always make sure that the specified number of pods is available. This is Auto Healing.
Services :: Service Discovery Problem
A Service solves one of the most important problems within the world of containers, which is Service Discovery. The main problem with containers is that when a container restarts, its IP address changes. This causes a lot of overhead to fix the IP mapping, especially if that container is talking to other containers. This problem is called the service discovery problem.
With Kubernetes, we create a deployment for a frontend service and another deployment for a backend service, each with some number of replicas. As mentioned above, a Deployment only solves scaling and auto-healing, not the service discovery issue. Kubernetes goes a bit deeper to solve it with a resource called Service. In our example, the frontend does not talk to the backend directly; rather, the frontend talks to a proxy called a Service, which talks to the backend. A Service does not use IP addresses; it identifies the backend based on labels and selectors. This way, even if a pod is destroyed and its IP address changes, it does not matter, because the Service is not using the IP address to identify the pod (the backend in this example) it communicates with.
Using a Service, we can also expose an external IP to reach any of the pods within Kubernetes; that is simply a different type of Service. There are three different types of Services, covered further below.
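As a small sketch of how a Service selects pods by labels rather than IPs (the name, label, and ports below are illustrative, not taken from the demo manifests):

apiVersion: v1
kind: Service
metadata:
  name: backend              # hypothetical service name
spec:
  type: ClusterIP            # default type; only reachable inside the cluster
  selector:
    app: backend             # matches pods carrying this label, whatever their IPs
  ports:
    - port: 8080             # port the Service exposes
      targetPort: 8080       # container port on the selected pods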
Deploying the microservices that we have containerized to kubernetes
For all deployments within Kubernetes, apiVersion and kind are important and are a must for any deployment file. Then comes metadata, which is basically the name, labels, and annotations used to recognize the resources created by this deployment so that they can be identified by a Service, as discussed in the section above. Metadata is followed by a spec, which contains all the specification required for the deployment, such as replicas; within spec we also have a template field, which is essentially the pod template that this deployment will deploy.
When using metadata's labels and annotations, use as many as needed and be specific about which pods they apply to, as these fields can be used by a Service for communication between pods. An annotation looks something like app.kubernetes.io/name: <service_name>. Within the template that deploys a pod, we need to specify the containers to be deployed into the pod (a pod can have multiple containers, but that is not recommended). The container spec includes the Docker image name with its repo, the ports that need to be open, env variables, volumes, etc.
NOTE
- The metadata and annotations of the Deployment itself are not used for service discovery; the labels and annotations in the pod template are the ones used for service discovery (see the sketch below)
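Putting these fields together, a minimal Deployment might look like the following sketch (the names, image, and port are placeholders rather than the actual demo manifests; note the pod-template labels, which a Service would select on):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: backend                        # hypothetical deployment name
spec:
  replicas: 2                          # the ReplicaSet keeps this many pods running (auto-healing)
  selector:
    matchLabels:
      app: backend                     # must match the pod template labels below
  template:
    metadata:
      labels:
        app: backend                   # labels a Service uses for discovery
    spec:
      containers:
        - name: backend
          image: <repo>/<image>:<tag>  # placeholder container image
          ports:
            - containerPort: 8080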
Before we start deploying the microservices into Kubernetes, let's create our own service account instead of using Kubernetes' default account. Once we create this SA, we are going to use it for communication between pods. After creating this service account, we are going to deploy all the manifest files we have created for our microservices.
Types of Services in Kubernetes
There are three types of Services in Kubernetes:
- ClusterIP
- NodePort
- LoadBalancer
When we deploy a Kubernetes cluster, an internal cluster network is created, which is not accessible from the internet. This network is for internal pod communication and for services within the cluster to talk to each other. Whenever a service is of type ClusterIP, it is not exposed to the internet and cannot be accessed from outside the cluster.
Now, if we want to communicate with the cluster from the internet, we would ideally need to use the IP of a node within the cluster. This is the NodePort type of service. When we expose a service via NodePort, Kubernetes assigns a port which, combined with the node IP, allows external users who can reach that node IP to connect from the internet. Internally, Kubernetes uses iptables rules managed by kube-proxy to receive and forward these requests. With NodePort, anyone who can reach the node IP will be able to access the service hosted within the cluster.
If we want to expose a service to the internet, we use a service of type LoadBalancer. When the type is specified as LoadBalancer, the API server talks to the CCM (cloud controller manager), which in turn talks to the respective cloud provider (AWS in this case) to grant an external IP. If the Kubernetes distribution doesn't support a CCM (like kind or minikube), this will not change anything in the configuration of the service. Once an external IP address is assigned, the service hosted behind it can be accessed by any external user over the internet.
Let's expose our frontend microservice using a LoadBalancer so the project can be accessed over the internet.
kubectl edit svc <frontend_proxy_service_name>
  type: LoadBalancer   # in the editor, change the service's spec.type to LoadBalancer
When we make this change, AWS will create a load balancer for us with a DNS record, which can be used to access the project we just deployed.
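We can confirm this by checking the service's external address (the service name here is a placeholder):

kubectl get svc <frontend_proxy_service_name>
# the EXTERNAL-IP column should show the DNS name of the load balancer AWS created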
LoadBalancer Effective?
A LoadBalancer service is very easy to deploy and there is no operational overhead (such as maintaining Helm releases), but it comes with a lot of disadvantages, like:
- The whole flow of this LoadBalancer is not declarative. Any changes we make to the load balancer have to be made in the AWS portal, not in a manifest file.
- Not cost effective: if we want to expose 10 services using LoadBalancers, AWS will create 10 load balancers, which increases the cost of our infrastructure.
- LoadBalancer creation is also tied to the CCM, and the CCM in AWS will always create the one load balancer type it knows how to provision; we cannot deploy any other type of load balancer. If we want to deploy this in an environment without any CCM, like kind or minikube, this approach will not work.
Hence, to overcome these disadvantages, we are going to use Ingress, which is declarative, cost effective (as we can have one load balancer route requests to any number of services), and lets us deploy Nginx, F5, or Traefik style load balancers, which gives us more control over the asset we are trying to deploy and manage.
Ingress and Ingress Controller
Kubernetes has a resource of kind Ingress which helps us define the routes of incoming traffic into the cluster. The most basic example of Ingress routing would be a webpage like Amazon being reachable at amazon.com while the same webpage cannot be accessed via the raw IP behind amazon.com. This is because the routing rules of the application within the Ingress forward requests arriving for amazon.com but not for the bare IP of the domain. This routing can be done based on path or host.
Ingress -> Ingress Controller -> LoadBalancer
First, we will have to deploy an Ingress Controller, without which an Ingress is just a dangling resource in Kubernetes. Before this, we will have to bind a service account to an IAM role using the OIDC connector, so that a pod is able to deploy any AWS resources it needs to create.
First, install eksctl using the official documentation: eksctl installation.
Get the OIDC provider ID of the cluster using the following command:
oidc_id=$(aws eks describe-cluster --name $cluster_name --query "cluster.identity.oidc.issuer" --region <aws_cluster_region> --output text | cut -d '/' -f 5)
Now, using eksctl, let's associate the IAM OIDC provider with the cluster:
eksctl utils associate-iam-oidc-provider --cluster $cluster_name --region us-west-2 --approve
We will need to create an IAM role with the required permissions and a service account for our ingress load balancer controller to attach to. AWS provides the IAM policy required by the load balancer controller.
curl -O https://raw.githubusercontent.com/kubernetes-sigs/aws-load-balancer-controller/v2.11.0/docs/install/iam_policy.json
Now let's create the IAM policy using the policy document we just downloaded.
aws iam create-policy --policy-name <name-of-the-policy> --policy-document file://<name_of_the_document>
Let's create a service account and attach the policy using the following command. Behind the scenes, eksctl is going to create CloudFormation templates to deploy this service account and attach the IAM policy to it.
eksctl create iamserviceaccount --cluster=$cluster_name \
--region us-west-2 \
--namespace=kube-system \
--name=aws-load-balancer-controller \
--role-name AWSLoadBalancerControllerIAMPolicy \
--attach-policy-arn=arn:aws:iam::296062547772:policy/AWSLoadBalancerControllerIAMPolicy \
--approve
Using Helm, let's install the AWS Load Balancer Controller.
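The chart comes from the AWS EKS charts repository; if that repository has not been added to Helm yet, it may need to be added first:

helm repo add eks https://aws.github.io/eks-charts   # add the AWS EKS charts repository
helm repo update                                     # refresh the local chart index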
helm install aws-load-balancer-controller eks/aws-load-balancer-controller \
-n kube-system \
--set clusterName=$cluster_name \
--set serviceAccount.create=false \
--set serviceAccount.name=<sa_name_we_just_created> \
--set region=<aws_eks_region> \
--set vpcId=<vpc_id_associated_with_eks>
Let's check the load balancer controller pods to verify the installation. We should see two pods for this controller:
NAME READY STATUS RESTARTS AGE
aws-load-balancer-controller-6c4f465789-dz9lr 1/1 Running 0 2m56s
aws-load-balancer-controller-6c4f465789-t8jsw 1/1 Running 0 2m56s
Let's revert the LoadBalancer service type back to NodePort and edit the deployment file, after which we will start with our Ingress deployment (note the ingress controller annotations):
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: <metadata_name>
  annotations: ## annotations can be found in the AWS Load Balancer Controller documentation
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
spec:
  ingressClassName: <ingress_class_name> ## so that only the matching load balancer controller handles this Ingress
  rules:
    - host: <custom_domain> ## update the DNS records in Cloudflare to point at the load balancer
      http:
        paths:
          - path: "/"
            pathType: Prefix
            backend:
              service:
                name: opentelemetry-demo-frontendproxy
                port:
                  number: 8080
Now, this will create a load balancer in AWS with a DNS record. To access the project, we have to point the domain name specified in the Ingress at the record associated with the load balancer, because, per the ingress rules, we cannot access the project directly via any other IP or domain.
In my case, I am using Cloudflare; I updated the DNS records in the Cloudflare dashboard and was able to access the application on the specified domain. This can also be done using Route 53 within AWS: a hosted zone within Route 53 essentially acts as the request forwarder.