
Terraform EKS - 03(Observability)

카이도스 2025. 2. 23. 22:23

2025.02.10 - [EKS] - Terraform EKS - 01(VPC)

2025.02.14 - [EKS] - Terraform EKS - 02(Network, Storage)


1. Deploy the Base Environment

  • Deploy the base environment with CloudFormation
# Deploy the CloudFormation stack
aws cloudformation create-stack \
  --stack-name my-basic-infra \
  --template-body file://basic_infra.yaml

# [Monitor] CloudFormation stack status
while true; do 
  date
  AWS_PAGER="" aws cloudformation list-stacks \
    --stack-status-filter CREATE_IN_PROGRESS CREATE_COMPLETE CREATE_FAILED DELETE_IN_PROGRESS DELETE_FAILED \
    --query "StackSummaries[*].{StackName:StackName, StackStatus:StackStatus}" \
    --output table
  sleep 1
done
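
Alternatively, instead of the polling loop, the AWS CLI's built-in waiter can block until creation finishes (a minimal sketch; the stack name is assumed to match the create-stack call above):

# Wait until the stack reaches CREATE_COMPLETE (exits non-zero on failure)
aws cloudformation wait stack-create-complete --stack-name my-basic-infra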

 


  • variable.tf
variable "KeyName" {
  description = "Name of an existing EC2 KeyPair to enable SSH access to the instances."
  type        = string
}

variable "MyDomain" {
  description = "Your Domain Name."
  type        = string
}

variable "MyIamUserAccessKeyID" {
  description = "IAM User - AWS Access Key ID."
  type        = string
  sensitive   = true
}

variable "MyIamUserSecretAccessKey" {
  description = "IAM User - AWS Secret Access Key."
  type        = string
  sensitive   = true
}

variable "SgIngressSshCidr" {
  description = "The IP address range that can be used to SSH to the EC2 instances."
  type        = string
  validation {
    condition     = can(regex("^(\\d{1,3})\\.(\\d{1,3})\\.(\\d{1,3})\\.(\\d{1,3})/(\\d{1,2})$", var.SgIngressSshCidr))
    error_message = "The SgIngressSshCidr value must be a valid IP CIDR range of the form x.x.x.x/x."
  }
}

variable "MyInstanceType" {
  description = "EC2 instance type."
  type        = string
  default     = "t3.medium"
  validation {
    condition     = contains(["t2.micro", "t2.small", "t2.medium", "t3.micro", "t3.small", "t3.medium"], var.MyInstanceType)
    error_message = "Invalid instance type. Valid options are t2.micro, t2.small, t2.medium, t3.micro, t3.small, t3.medium."
  }
}

variable "ClusterBaseName" {
  description = "Base name of the cluster."
  type        = string
  default     = "pjh-dev-eks"
}

variable "KubernetesVersion" {
  description = "Kubernetes version for the EKS cluster."
  type        = string
  default     = "1.30"
}

variable "WorkerNodeInstanceType" {
  description = "EC2 instance type for the worker nodes."
  type        = string
  default     = "t3.xlarge"
}

variable "WorkerNodeCount" {
  description = "Number of worker nodes."
  type        = number
  default     = 3
}

variable "WorkerNodeVolumesize" {
  description = "Volume size for worker nodes (in GiB)."
  type        = number
  default     = 30
}

variable "TargetRegion" {
  description = "AWS region where the resources will be created."
  type        = string
  default     = "ap-northeast-2"
}

variable "availability_zones" {
  description = "List of availability zones."
  type        = list(string)
  default     = ["ap-northeast-2a", "ap-northeast-2b", "ap-northeast-2c", "ap-northeast-2d"]
}

 

    • ec2.tf
data "aws_ssm_parameter" "ami" {
  name = "/aws/service/canonical/ubuntu/server/22.04/stable/current/amd64/hvm/ebs-gp2/ami-id"
}

resource "aws_security_group" "eks_sec_group" {
  vpc_id = data.aws_vpc.service_vpc.id

  name        = "${var.ClusterBaseName}-bastion-sg"
  description = "Security group for ${var.ClusterBaseName} Host"

  ingress {
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = [var.SgIngressSshCidr]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = {
    Name = "${var.ClusterBaseName}-HOST-SG"
  }
}

resource "aws_instance" "eks_bastion" {
  ami                         = data.aws_ssm_parameter.ami.value
  instance_type               = var.MyInstanceType
  key_name                    = var.KeyName
  subnet_id                   = data.aws_subnet.eks_public_1.id
  associate_public_ip_address = true
  private_ip                  = "172.20.10.100"
  vpc_security_group_ids      = [aws_security_group.eks_sec_group.id]

  tags = {
    Name = "${var.ClusterBaseName}-bastion-EC2"
  }

  root_block_device {
    volume_type           = "gp3"
    volume_size           = 30
    delete_on_termination = true
  }

  user_data = <<-EOF
    #!/bin/bash
    hostnamectl --static set-hostname "${var.ClusterBaseName}-bastion-EC2"

    # Config convenience
    echo 'alias vi=vim' >> /etc/profile
    echo "sudo su -" >> /home/ubuntu/.bashrc
    timedatectl set-timezone Asia/Seoul

    # Install Packages
    apt update
    apt install -y tree jq git htop unzip

    # Install kubectl & helm
    curl -O https://s3.us-west-2.amazonaws.com/amazon-eks/1.30.0/2024-05-12/bin/linux/amd64/kubectl
    install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl
    curl -s https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3 | bash

    # Install eksctl
    curl -sL "https://github.com/eksctl-io/eksctl/releases/latest/download/eksctl_Linux_amd64.tar.gz" | tar xz -C /tmp
    mv /tmp/eksctl /usr/local/bin

    # Install aws cli v2
    curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
    unzip awscliv2.zip >/dev/null 2>&1
    ./aws/install
    complete -C '/usr/local/bin/aws_completer' aws
    echo 'export AWS_PAGER=""' >> /etc/profile
    echo "export AWS_DEFAULT_REGION=${var.TargetRegion}" >> /etc/profile

    # Install YAML Highlighter
    wget https://github.com/andreazorzetto/yh/releases/download/v0.4.0/yh-linux-amd64.zip
    unzip yh-linux-amd64.zip
    mv yh /usr/local/bin/

    # Install kube-ps1
    echo 'source <(kubectl completion bash)' >> /root/.bashrc
    echo 'alias k=kubectl' >> /root/.bashrc
    echo 'complete -F __start_kubectl k' >> /root/.bashrc
            
    git clone https://github.com/jonmosco/kube-ps1.git /root/kube-ps1
    cat <<"EOT" >> /root/.bashrc
    source /root/kube-ps1/kube-ps1.sh
    KUBE_PS1_SYMBOL_ENABLE=false
    function get_cluster_short() {
      echo "$1" | grep -o '${var.ClusterBaseName}[^/]*' | cut -c 1-14 
    }
    KUBE_PS1_CLUSTER_FUNCTION=get_cluster_short
    KUBE_PS1_SUFFIX=') '
    PS1='$(kube_ps1)'$PS1
    EOT

    # kubecolor
    apt install -y kubecolor
    echo 'alias kubectl=kubecolor' >> /root/.bashrc

    # Install kubectx & kubens
    git clone https://github.com/ahmetb/kubectx /opt/kubectx >/dev/null 2>&1
    ln -s /opt/kubectx/kubens /usr/local/bin/kubens
    ln -s /opt/kubectx/kubectx /usr/local/bin/kubectx

    # Install Docker
    curl -fsSL https://get.docker.com -o get-docker.sh
    sh get-docker.sh
    systemctl enable docker

    # Create SSH Keypair
    ssh-keygen -t rsa -N "" -f /root/.ssh/id_rsa

    # IAM User Credentials
    export AWS_ACCESS_KEY_ID="${var.MyIamUserAccessKeyID}"
    export AWS_SECRET_ACCESS_KEY="${var.MyIamUserSecretAccessKey}"
    export ACCOUNT_ID=$(aws sts get-caller-identity --query 'Account' --output text)
    echo "export AWS_ACCESS_KEY_ID=$AWS_ACCESS_KEY_ID" >> /etc/profile
    echo "export AWS_SECRET_ACCESS_KEY=$AWS_SECRET_ACCESS_KEY" >> /etc/profile
    echo "export ACCOUNT_ID=$(aws sts get-caller-identity --query 'Account' --output text)" >> /etc/profile

    # CLUSTER_NAME
    export CLUSTER_NAME="${var.ClusterBaseName}"
    echo "export CLUSTER_NAME=$CLUSTER_NAME" >> /etc/profile

    # VPC & Subnet
    export VPCID=$(aws ec2 describe-vpcs --filters "Name=tag:Name,Values=service" | jq -r .Vpcs[].VpcId)
    echo "export VPCID=$VPCID" >> /etc/profile
    export PublicSubnet1=$(aws ec2 describe-subnets --filters "Name=tag:Name,Values=service-public-01" | jq -r '.Subnets[] | select(.CidrBlock | startswith("172.31")).SubnetId')
    export PublicSubnet2=$(aws ec2 describe-subnets --filters "Name=tag:Name,Values=service-public-02" | jq -r '.Subnets[] | select(.CidrBlock | startswith("172.31")).SubnetId')
    export PublicSubnet3=$(aws ec2 describe-subnets --filters "Name=tag:Name,Values=service-public-03" | jq -r '.Subnets[] | select(.CidrBlock | startswith("172.31")).SubnetId')
    echo "export PublicSubnet1=$PublicSubnet1" >> /etc/profile
    echo "export PublicSubnet2=$PublicSubnet2" >> /etc/profile
    echo "export PublicSubnet3=$PublicSubnet3" >> /etc/profile
    export PrivateSubnet1=$(aws ec2 describe-subnets --filters "Name=tag:Name,Values=eks-private-01" | jq -r '.Subnets[] | select(.CidrBlock | startswith("172.31")).SubnetId')
    export PrivateSubnet2=$(aws ec2 describe-subnets --filters "Name=tag:Name,Values=eks-private-02" | jq -r '.Subnets[] | select(.CidrBlock | startswith("172.31")).SubnetId')
    export PrivateSubnet3=$(aws ec2 describe-subnets --filters "Name=tag:Name,Values=eks-private-03" | jq -r '.Subnets[] | select(.CidrBlock | startswith("172.31")).SubnetId')
    echo "export PrivateSubnet1=$PrivateSubnet1" >> /etc/profile
    echo "export PrivateSubnet2=$PrivateSubnet2" >> /etc/profile
    echo "export PrivateSubnet3=$PrivateSubnet3" >> /etc/profile
    
    # Domain Name
    export MyDomain="${var.MyDomain}"
    echo "export MyDomain=$MyDomain" >> /etc/profile

    # ssh key-pair
    aws ec2 delete-key-pair --key-name kp_node
    aws ec2 create-key-pair --key-name kp_node --query 'KeyMaterial' --output text > ~/.ssh/kp_node.pem
    chmod 400 ~/.ssh/kp_node.pem

  EOF
  
  user_data_replace_on_change = true
  
}
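
Everything above runs through user_data at first boot, so the bastion can take a few minutes to finish bootstrapping. One way to follow its progress after SSH-ing in (assuming the default Ubuntu cloud-init log path):

# Follow user_data / cloud-init execution on the bastion
tail -f /var/log/cloud-init-output.log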

 

  • main.tf
# provider
provider "aws" {
  region = var.TargetRegion
}

# caller_identity data
data "aws_caller_identity" "current" {}

# vpc data
data "aws_vpc" "service_vpc" {
  tags = {
    Name = "service"
  }
}

# subnet data
data "aws_subnet" "eks_public_1" {
  vpc_id = data.aws_vpc.service_vpc.id
  tags = {
    Name = "eks-public-01"
  }
}
data "aws_subnet" "eks_public_2" {
  vpc_id = data.aws_vpc.service_vpc.id
  tags = {
    Name = "eks-public-02"
  }
}
data "aws_subnet" "eks_public_3" {
  vpc_id = data.aws_vpc.service_vpc.id
  tags = {
    Name = "eks-public-03"
  }
}
data "aws_subnet" "eks_private_1" {
  vpc_id = data.aws_vpc.service_vpc.id
  tags = {
    Name = "eks-private-01"
  }
}
data "aws_subnet" "eks_private_2" {
  vpc_id = data.aws_vpc.service_vpc.id
  tags = {
    Name = "eks-private-02"
  }
}
data "aws_subnet" "eks_private_3" {
  vpc_id = data.aws_vpc.service_vpc.id
  tags = {
    Name = "eks-private-03"
  }
}

# route table data
data "aws_route_table" "service_public" {
  vpc_id = data.aws_vpc.service_vpc.id
  tags = {
    Name = "service-public"
  }
}
data "aws_route_table" "service_private" {
  vpc_id = data.aws_vpc.service_vpc.id
  tags = {
    Name = "service-private"
  }
}

# node_group sg
resource "aws_security_group" "node_group_sg" {
  name        = "${var.ClusterBaseName}-node-group-sg"
  description = "Security group for EKS Node Group"
  vpc_id      = data.aws_vpc.service_vpc.id

  tags = {
    Name = "${var.ClusterBaseName}-node-group-sg"
  }
}
resource "aws_security_group_rule" "allow_ssh" {
  type        = "ingress"
  from_port   = 22
  to_port     = 22
  protocol    = "tcp"
  cidr_blocks = ["172.20.10.100/32"]

  security_group_id = aws_security_group.node_group_sg.id
}

# eks module
module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~>20.0"

  cluster_name    = var.ClusterBaseName
  cluster_version = var.KubernetesVersion

  cluster_endpoint_private_access = true
  cluster_endpoint_public_access  = false

  cluster_addons = {
    coredns = {
      most_recent = true
    }
    kube-proxy = {
      most_recent = true
    }
    vpc-cni = {
      most_recent = true
      configuration_values = jsonencode({
        enableNetworkPolicy = "true"
      })
    }
    eks-pod-identity-agent = {
      most_recent = true
    }
    aws-ebs-csi-driver = {
      most_recent = true
      service_account_role_arn = module.irsa-ebs-csi.iam_role_arn
    }
    snapshot-controller = {
      most_recent = true
    }
    aws-efs-csi-driver = {
      most_recent = true
      service_account_role_arn = module.irsa-efs-csi.iam_role_arn
    }
    aws-mountpoint-s3-csi-driver = {
      most_recent = true
      service_account_role_arn = module.irsa-s3-csi.iam_role_arn
    }
  }

  vpc_id     = data.aws_vpc.service_vpc.id
  subnet_ids = [data.aws_subnet.eks_private_1.id, data.aws_subnet.eks_private_2.id, data.aws_subnet.eks_private_3.id]

  eks_managed_node_group_defaults = {
    ami_type = "AL2_x86_64"

  }

  eks_managed_node_groups = {
    default = {
      name             = "${var.ClusterBaseName}-node-group"
      use_name_prefix  = false
      instance_types   = [var.WorkerNodeInstanceType]
      desired_size     = var.WorkerNodeCount
      max_size         = var.WorkerNodeCount + 2
      min_size         = var.WorkerNodeCount - 1
      disk_size        = var.WorkerNodeVolumesize
      subnets          = [data.aws_subnet.eks_private_1.id, data.aws_subnet.eks_private_2.id, data.aws_subnet.eks_private_3.id]
      key_name         = "kp_node"
      vpc_security_group_ids = [aws_security_group.node_group_sg.id]
      iam_role_name    = "${var.ClusterBaseName}-node-group-eks-node-group"
      iam_role_use_name_prefix = false
   }
  }

  # Cluster access entry
  enable_cluster_creator_admin_permissions = false
  access_entries = {
    admin = {
      kubernetes_groups = []
      principal_arn     = data.aws_caller_identity.current.arn

      policy_associations = {
        myeks = {
          policy_arn = "arn:aws:eks::aws:cluster-access-policy/AmazonEKSClusterAdminPolicy"
          access_scope = {
            namespaces = []
            type       = "cluster"
          }
        }
      }
    }
  }
}

# cluster sg - ingress rule add
resource "aws_security_group_rule" "cluster_sg_add" {
  type              = "ingress"
  from_port         = 443
  to_port           = 443
  protocol          = "tcp"
  cidr_blocks       = ["172.20.0.0/16"]
  security_group_id = module.eks.cluster_security_group_id
  depends_on        = [module.eks]
}

# shared node sg - ingress rule add
resource "aws_security_group_rule" "node_sg_add" {
  type              = "ingress"
  from_port         = 443
  to_port           = 443
  protocol          = "tcp"
  cidr_blocks       = ["172.20.0.0/16"]
  security_group_id = module.eks.node_security_group_id
  depends_on        = [module.eks]
}

# vpc endpoint
module "vpc_vpc-endpoints" {
  source  = "terraform-aws-modules/vpc/aws//modules/vpc-endpoints"
  version = "5.8.1"

  vpc_id             = data.aws_vpc.service_vpc.id
  security_group_ids = [module.eks.node_security_group_id]

  endpoints = {
    # gateway endpoints
    s3 = {
      service = "s3"
      route_table_ids = [data.aws_route_table.service_private.id]
      tags = { Name = "s3-vpc-endpoint" }
    }
    
    # interface endpoints
    ec2 = {
      service = "ec2"
      subnet_ids = [data.aws_subnet.eks_private_1.id, data.aws_subnet.eks_private_2.id, data.aws_subnet.eks_private_3.id]
      tags = { Name = "ec2-vpc-endpoint" }
    }
    elasticloadbalancing = {
      service = "elasticloadbalancing"
      subnet_ids = [data.aws_subnet.eks_private_1.id, data.aws_subnet.eks_private_2.id, data.aws_subnet.eks_private_3.id]
      tags = { Name = "elasticloadbalancing-vpc-endpoint" }
    }
    ecr_api = {
      service = "ecr.api"
      subnet_ids = [data.aws_subnet.eks_private_1.id, data.aws_subnet.eks_private_2.id, data.aws_subnet.eks_private_3.id]
      tags = { Name = "ecr-api-vpc-endpoint" }
    }
    ecr_dkr = {
      service = "ecr.dkr"
      subnet_ids = [data.aws_subnet.eks_private_1.id, data.aws_subnet.eks_private_2.id, data.aws_subnet.eks_private_3.id]
      tags = { Name = "ecr-api-vpc-endpoint" }
    }
    sts = {
      service = "sts"
      subnet_ids = [data.aws_subnet.eks_private_1.id, data.aws_subnet.eks_private_2.id, data.aws_subnet.eks_private_3.id]
      tags = { Name = "sts-vpc-endpoint" }
    }
  }
}

# efs Create
module "efs" {
  source = "terraform-aws-modules/efs/aws"
  version = "1.6.3"

  # File system
  name           = "${var.ClusterBaseName}-efs"
  encrypted      = true

  lifecycle_policy = {
    transition_to_ia = "AFTER_30_DAYS"
  }

  # Mount targets
  mount_targets = {
    "${var.availability_zones[0]}" = {
      subnet_id = data.aws_subnet.eks_private_1.id
    }
    "${var.availability_zones[1]}" = {
      subnet_id = data.aws_subnet.eks_private_2.id
    }
    "${var.availability_zones[2]}" = {
      subnet_id = data.aws_subnet.eks_private_3.id
    }
  }

  # security group (allow - tcp 2049)
  security_group_description = "EFS security group"
  security_group_vpc_id      = data.aws_vpc.service_vpc.id
  security_group_rules = {
    vpc = {
      description = "EFS ingress from VPC private subnets"
      cidr_blocks = ["172.20.0.0/16"]
    }
  }

  tags = {
    Terraform   = "true"
    Environment = "dev"
  }
}

# S3 Bucket Create
resource "aws_s3_bucket" "main" {
  bucket = "${var.ClusterBaseName}-${var.MyDomain}"

  tags = {
    Name        = "${var.ClusterBaseName}-s3-bucket"
  }
}

 

  • iam.tf
#####################
# Create IAM Policy #
#####################

# AWSLoadBalancerController IAM Policy
resource "aws_iam_policy" "aws_lb_controller_policy" {
  name        = "${var.ClusterBaseName}AWSLoadBalancerControllerPolicy"
  description = "Policy for allowing AWS LoadBalancerController to modify AWS ELB"

  policy = <<EOF
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "iam:CreateServiceLinkedRole"
            ],
            "Resource": "*",
            "Condition": {
                "StringEquals": {
                    "iam:AWSServiceName": "elasticloadbalancing.amazonaws.com"
                }
            }
        },
        {
            "Effect": "Allow",
            "Action": [
                "ec2:DescribeAccountAttributes",
                "ec2:DescribeAddresses",
                "ec2:DescribeAvailabilityZones",
                "ec2:DescribeInternetGateways",
                "ec2:DescribeVpcs",
                "ec2:DescribeVpcPeeringConnections",
                "ec2:DescribeSubnets",
                "ec2:DescribeSecurityGroups",
                "ec2:DescribeInstances",
                "ec2:DescribeNetworkInterfaces",
                "ec2:DescribeTags",
                "ec2:GetCoipPoolUsage",
                "ec2:DescribeCoipPools",
                "elasticloadbalancing:DescribeLoadBalancers",
                "elasticloadbalancing:DescribeLoadBalancerAttributes",
                "elasticloadbalancing:DescribeListeners",
                "elasticloadbalancing:DescribeListenerCertificates",
                "elasticloadbalancing:DescribeSSLPolicies",
                "elasticloadbalancing:DescribeRules",
                "elasticloadbalancing:DescribeTargetGroups",
                "elasticloadbalancing:DescribeTargetGroupAttributes",
                "elasticloadbalancing:DescribeTargetHealth",
                "elasticloadbalancing:DescribeTags",
                "elasticloadbalancing:DescribeListenerAttributes"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "cognito-idp:DescribeUserPoolClient",
                "acm:ListCertificates",
                "acm:DescribeCertificate",
                "iam:ListServerCertificates",
                "iam:GetServerCertificate",
                "waf-regional:GetWebACL",
                "waf-regional:GetWebACLForResource",
                "waf-regional:AssociateWebACL",
                "waf-regional:DisassociateWebACL",
                "wafv2:GetWebACL",
                "wafv2:GetWebACLForResource",
                "wafv2:AssociateWebACL",
                "wafv2:DisassociateWebACL",
                "shield:GetSubscriptionState",
                "shield:DescribeProtection",
                "shield:CreateProtection",
                "shield:DeleteProtection"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "ec2:AuthorizeSecurityGroupIngress",
                "ec2:RevokeSecurityGroupIngress"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "ec2:CreateSecurityGroup"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "ec2:CreateTags"
            ],
            "Resource": "arn:aws:ec2:*:*:security-group/*",
            "Condition": {
                "StringEquals": {
                    "ec2:CreateAction": "CreateSecurityGroup"
                },
                "Null": {
                    "aws:RequestTag/elbv2.k8s.aws/cluster": "false"
                }
            }
        },
        {
            "Effect": "Allow",
            "Action": [
                "ec2:CreateTags",
                "ec2:DeleteTags"
            ],
            "Resource": "arn:aws:ec2:*:*:security-group/*",
            "Condition": {
                "Null": {
                    "aws:RequestTag/elbv2.k8s.aws/cluster": "true",
                    "aws:ResourceTag/elbv2.k8s.aws/cluster": "false"
                }
            }
        },
        {
            "Effect": "Allow",
            "Action": [
                "ec2:AuthorizeSecurityGroupIngress",
                "ec2:RevokeSecurityGroupIngress",
                "ec2:DeleteSecurityGroup"
            ],
            "Resource": "*",
            "Condition": {
                "Null": {
                    "aws:ResourceTag/elbv2.k8s.aws/cluster": "false"
                }
            }
        },
        {
            "Effect": "Allow",
            "Action": [
                "elasticloadbalancing:CreateLoadBalancer",
                "elasticloadbalancing:CreateTargetGroup"
            ],
            "Resource": "*",
            "Condition": {
                "Null": {
                    "aws:RequestTag/elbv2.k8s.aws/cluster": "false"
                }
            }
        },
        {
            "Effect": "Allow",
            "Action": [
                "elasticloadbalancing:CreateListener",
                "elasticloadbalancing:DeleteListener",
                "elasticloadbalancing:CreateRule",
                "elasticloadbalancing:DeleteRule"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "elasticloadbalancing:AddTags",
                "elasticloadbalancing:RemoveTags"
            ],
            "Resource": [
                "arn:aws:elasticloadbalancing:*:*:targetgroup/*/*",
                "arn:aws:elasticloadbalancing:*:*:loadbalancer/net/*/*",
                "arn:aws:elasticloadbalancing:*:*:loadbalancer/app/*/*"
            ],
            "Condition": {
                "Null": {
                    "aws:RequestTag/elbv2.k8s.aws/cluster": "true",
                    "aws:ResourceTag/elbv2.k8s.aws/cluster": "false"
                }
            }
        },
        {
            "Effect": "Allow",
            "Action": [
                "elasticloadbalancing:AddTags",
                "elasticloadbalancing:RemoveTags"
            ],
            "Resource": [
                "arn:aws:elasticloadbalancing:*:*:listener/net/*/*/*",
                "arn:aws:elasticloadbalancing:*:*:listener/app/*/*/*",
                "arn:aws:elasticloadbalancing:*:*:listener-rule/net/*/*/*",
                "arn:aws:elasticloadbalancing:*:*:listener-rule/app/*/*/*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "elasticloadbalancing:AddTags"
            ],
            "Resource": [
                "arn:aws:elasticloadbalancing:*:*:targetgroup/*/*",
                "arn:aws:elasticloadbalancing:*:*:loadbalancer/net/*/*",
                "arn:aws:elasticloadbalancing:*:*:loadbalancer/app/*/*"
            ],
            "Condition": {
                "StringEquals": {
                    "elasticloadbalancing:CreateAction": [
                        "CreateTargetGroup",
                        "CreateLoadBalancer"
                    ]
                },
                "Null": {
                    "aws:RequestTag/elbv2.k8s.aws/cluster": "false"
                }
            }
        },
        {
            "Effect": "Allow",
            "Action": [
                "elasticloadbalancing:ModifyLoadBalancerAttributes",
                "elasticloadbalancing:SetIpAddressType",
                "elasticloadbalancing:SetSecurityGroups",
                "elasticloadbalancing:SetSubnets",
                "elasticloadbalancing:DeleteLoadBalancer",
                "elasticloadbalancing:ModifyTargetGroup",
                "elasticloadbalancing:ModifyTargetGroupAttributes",
                "elasticloadbalancing:DeleteTargetGroup"
            ],
            "Resource": "*",
            "Condition": {
                "Null": {
                    "aws:ResourceTag/elbv2.k8s.aws/cluster": "false"
                }
            }
        },
        {
            "Effect": "Allow",
            "Action": [
                "elasticloadbalancing:RegisterTargets",
                "elasticloadbalancing:DeregisterTargets"
            ],
            "Resource": "arn:aws:elasticloadbalancing:*:*:targetgroup/*/*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "elasticloadbalancing:SetWebAcl",
                "elasticloadbalancing:ModifyListener",
                "elasticloadbalancing:AddListenerCertificates",
                "elasticloadbalancing:RemoveListenerCertificates",
                "elasticloadbalancing:ModifyRule"
            ],
            "Resource": "*"
        }
    ]
}
EOF
}

# ExternalDNS IAM Policy
resource "aws_iam_policy" "external_dns_policy" {
  name        = "${var.ClusterBaseName}ExternalDNSPolicy"
  description = "Policy for allowing ExternalDNS to modify Route 53 records"

  policy = <<EOF
{
    "Version": "2012-10-17",
    "Statement": [
      {
        "Effect": "Allow",
        "Action": [
          "route53:ChangeResourceRecordSets"
        ],
        "Resource": [
          "arn:aws:route53:::hostedzone/*"
        ]
      },
      {
        "Effect": "Allow",
        "Action": [
          "route53:ListHostedZones",
          "route53:ListResourceRecordSets"
        ],
        "Resource": [
          "*"
        ]
      }
    ]
  }
EOF
}

# Mountpoint S3 CSI Driver IAM Policy
resource "aws_iam_policy" "mountpoint_s3_csi_policy" {
  name        = "${var.ClusterBaseName}MountpointS3CSIPolicy"
  description = "Mountpoint S3 CSI Driver Policy"

  policy = <<EOF
{
   "Version": "2012-10-17",
   "Statement": [
        {
            "Sid": "MountpointFullBucketAccess",
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::${var.ClusterBaseName}-${var.MyDomain}"
            ]
        },
        {
            "Sid": "MountpointFullObjectAccess",
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:PutObject",
                "s3:AbortMultipartUpload",
                "s3:DeleteObject"
            ],
            "Resource": [
                "arn:aws:s3:::${var.ClusterBaseName}-${var.MyDomain}/*"
            ]
        }
   ]
}
EOF
}

#####################
# Create IRSA Roles #
#####################

# ebs-csi irsa
data "aws_iam_policy" "ebs_csi_policy" {
  arn = "arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy"
}

module "irsa-ebs-csi" {
  source  = "terraform-aws-modules/iam/aws//modules/iam-assumable-role-with-oidc"
  version = "5.39.0"

  create_role                   = true
  role_name                     = "AmazonEKSTFEBSCSIRole-${module.eks.cluster_name}"
  provider_url                  = module.eks.oidc_provider
  role_policy_arns              = [data.aws_iam_policy.ebs_csi_policy.arn]
  oidc_fully_qualified_subjects = ["system:serviceaccount:kube-system:ebs-csi-controller-sa"]
}

# efs-csi irsa
data "aws_iam_policy" "efs_csi_policy" {
  arn = "arn:aws:iam::aws:policy/service-role/AmazonEFSCSIDriverPolicy"
}

module "irsa-efs-csi" {
  source  = "terraform-aws-modules/iam/aws//modules/iam-assumable-role-with-oidc"
  version = "5.39.0"

  create_role                   = true
  role_name                     = "AmazonEKSTFEFSCSIRole-${module.eks.cluster_name}"
  provider_url                  = module.eks.oidc_provider
  role_policy_arns              = [data.aws_iam_policy.efs_csi_policy.arn]
  oidc_fully_qualified_subjects = ["system:serviceaccount:kube-system:efs-csi-controller-sa"]
}

# s3-csi irsa
module "irsa-s3-csi" {
  source  = "terraform-aws-modules/iam/aws//modules/iam-assumable-role-with-oidc"
  version = "5.39.0"

  create_role                   = true
  role_name                     = "AmazonEKSTFS3CSIRole-${module.eks.cluster_name}"
  provider_url                  = module.eks.oidc_provider
  role_policy_arns              = [aws_iam_policy.mountpoint_s3_csi_policy.arn]
  oidc_fully_qualified_subjects = ["system:serviceaccount:kube-system:s3-csi-driver-sa"]
  oidc_fully_qualified_audiences = ["sts.amazonaws.com"]
}


# aws-load-balancer-controller irsa
module "irsa-lb-controller" {
  source  = "terraform-aws-modules/iam/aws//modules/iam-assumable-role-with-oidc"
  version = "5.39.0"

  create_role                   = true
  role_name                     = "AmazonEKSTFLBControllerRole-${module.eks.cluster_name}"
  provider_url                  = module.eks.oidc_provider
  role_policy_arns              = [aws_iam_policy.aws_lb_controller_policy.arn]
  oidc_fully_qualified_subjects = ["system:serviceaccount:kube-system:aws-load-balancer-controller-sa"]
  oidc_fully_qualified_audiences = ["sts.amazonaws.com"]
}

# external-dns irsa
module "irsa-external-dns" {
  source  = "terraform-aws-modules/iam/aws//modules/iam-assumable-role-with-oidc"
  version = "5.39.0"

  create_role                   = true
  role_name                     = "AmazonEKSTFExternalDnsRole-${module.eks.cluster_name}"
  provider_url                  = module.eks.oidc_provider
  role_policy_arns              = [aws_iam_policy.external_dns_policy.arn]
  oidc_fully_qualified_subjects = ["system:serviceaccount:kube-system:external-dns-sa"]
  oidc_fully_qualified_audiences = ["sts.amazonaws.com"]
}

 

  • out.tf
output "public_ip" {
  value       = aws_instance.eks_bastion.public_ip
  description = "The public IP of the myeks-host EC2 instance."
}

 

  • Declare Terraform variables and deploy
# Save Terraform variables as environment variables
export TF_VAR_KeyName=[ssh keypair]
export TF_VAR_MyDomain=[Domain Name]
export TF_VAR_MyIamUserAccessKeyID=[IAM user access key ID]
export TF_VAR_MyIamUserSecretAccessKey=[IAM user secret access key]
export TF_VAR_SgIngressSshCidr=$(curl -s ipinfo.io/ip)/32

# Terraform deployment: takes about 12-13 minutes
terraform init
terraform plan
terraform apply -auto-approve
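
Once the apply completes, a quick sanity check that the main resources exist (a hedged sketch; the cluster name below assumes the default ClusterBaseName of pjh-dev-eks):

# List resources now tracked in the Terraform state
terraform state list | grep -E 'module.eks|aws_instance'

# Confirm the cluster is ACTIVE
aws eks describe-cluster --name pjh-dev-eks --query 'cluster.status' --output text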

 

    • Basic setup (including aws-loadbalancer-controller and externaldns installation)
# Connect to the bastion EC2 instance
terraform output
ssh -i /Users/jeongheepark/.ssh/PJH-aws-test.pem ubuntu@$(terraform output -raw public_ip)

# Update the EKS cluster kubeconfig
aws eks update-kubeconfig --region $AWS_DEFAULT_REGION --name $CLUSTER_NAME

# Set the namespace for kubectl commands
kubens default

# Save node IPs as variables
N1=$(kubectl get node --label-columns=topology.kubernetes.io/zone --selector=topology.kubernetes.io/zone=ap-northeast-2a -o jsonpath={.items[0].status.addresses[0].address})
N2=$(kubectl get node --label-columns=topology.kubernetes.io/zone --selector=topology.kubernetes.io/zone=ap-northeast-2b -o jsonpath={.items[0].status.addresses[0].address})
N3=$(kubectl get node --label-columns=topology.kubernetes.io/zone --selector=topology.kubernetes.io/zone=ap-northeast-2c -o jsonpath={.items[0].status.addresses[0].address})
echo "export N1=$N1" >> /etc/profile
echo "export N2=$N2" >> /etc/profile
echo "export N3=$N3" >> /etc/profile
echo $N1, $N2, $N3

# Verify SSH access to the nodes
for node in $N1 $N2 $N3; do ssh -i ~/.ssh/kp_node.pem -o StrictHostKeyChecking=no ec2-user@$node hostname; done

# Install and verify aws-loadbalancer-controller
kubectl create serviceaccount aws-load-balancer-controller-sa -n kube-system
kubectl annotate serviceaccount aws-load-balancer-controller-sa \
  -n kube-system \
  eks.amazonaws.com/role-arn=arn:aws:iam::${ACCOUNT_ID}:role/AmazonEKSTFLBControllerRole-${CLUSTER_NAME}

helm repo add eks https://aws.github.io/eks-charts
helm repo update
helm install aws-load-balancer-controller eks/aws-load-balancer-controller -n kube-system \
  --set clusterName=$CLUSTER_NAME \
  --set serviceAccount.create=false \
  --set serviceAccount.name=aws-load-balancer-controller-sa

kubectl get pods -n kube-system -l app.kubernetes.io/name=aws-load-balancer-controller
kubectl get deployment -n kube-system aws-load-balancer-controller
  
# Install and verify externaldns
MyDnsHostedZoneId=`aws route53 list-hosted-zones-by-name --dns-name "${MyDomain}." --query "HostedZones[0].Id" --output text`
echo "export MyDnsHostedZoneId=$MyDnsHostedZoneId" >> /etc/profile
echo $MyDnsHostedZoneId

kubectl create serviceaccount external-dns-sa -n kube-system
kubectl annotate serviceaccount external-dns-sa \
  -n kube-system \
  eks.amazonaws.com/role-arn=arn:aws:iam::${ACCOUNT_ID}:role/AmazonEKSTFExternalDnsRole-${CLUSTER_NAME}

helm repo add external-dns https://kubernetes-sigs.github.io/external-dns/
helm repo update
helm install external-dns external-dns/external-dns -n kube-system \
  --set serviceAccount.create=false \
  --set serviceAccount.name=external-dns-sa \
  --set domainFilters={${MyDomain}} \
  --set txtOwnerId=${MyDnsHostedZoneId}
  
kubectl get pods -n kube-system -l app.kubernetes.io/name=external-dns
kubectl get deployment -n kube-system external-dns
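
If DNS records do not show up later, the external-dns logs are the first thing to check (an optional verification step, not part of the original flow):

# Confirm external-dns assumed the IRSA role and found the hosted zone
kubectl logs -n kube-system deployment/external-dns --tail=30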

2. Install Observability Backends

  • Prerequisites
# Declare variables
export CERT_ARN=`aws acm list-certificates --query 'CertificateSummaryList[].CertificateArn[]' --output text`
echo "export CERT_ARN=$CERT_ARN" >> /etc/profile; echo $CERT_ARN

export NICKNAME=pjh
echo "export NICKNAME=$NICKNAME" >> /etc/profile; echo $NICKNAME

export OIDC_ARN=$(aws iam list-open-id-connect-providers --query 'OpenIDConnectProviderList[*].Arn' --output text)
echo "export OIDC_ARN=$OIDC_ARN" >> /etc/profile; echo $OIDC_ARN

export OIDC_URL=${OIDC_ARN#*oidc-provider/}
echo "export OIDC_URL=$OIDC_URL" >> /etc/profile; echo $OIDC_URL

# Create a gp3 storage class
cat <<EOT | kubectl apply -f -
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: gp3
allowVolumeExpansion: true
provisioner: ebs.csi.aws.com
volumeBindingMode: WaitForFirstConsumer
parameters:
  type: gp3
  allowAutoIOPSPerGBIncrease: 'true'
  encrypted: 'true'
EOT
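
A quick check that the new storage class is registered before the backends request volumes from it:

# Confirm the gp3 StorageClass exists
kubectl get sc gp3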

# Add helm repo - grafana
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update

# Create namespaces
kubectl create ns monitoring
kubectl create ns logging
kubectl create ns tracing
kubectl create ns grafana

# Download the values files
wget https://github.com/cloudneta/cnaeelab/raw/master/_data/values.zip
unzip values.zip; rm values.zip
cd values
tree
--
.
├── grafana-temp-values.yaml
├── loki-temp-values.yaml
├── mimir-temp-values.yaml
├── otelcol-values.yaml
└── tempo-temp-values.yaml

  • [Metrics] Mimir
# [Monitor 1] Watch pods, PVs, PVCs, and ConfigMaps in the monitoring namespace
watch kubectl get pod,pv,pvc,cm -n monitoring

# [Monitor 2] Watch EBS volumes created by dynamic provisioning
while true; do aws ec2 describe-volumes \
  --filters Name=tag:ebs.csi.aws.com/cluster,Values=true \
  --query "Volumes[].{VolumeId: VolumeId, VolumeType: VolumeType, InstanceId: Attachments[0].InstanceId, State: Attachments[0].State}" \
  --output text; date; sleep 1; done
  
mkdir ~/irsa; cd ~/irsa; echo $NICKNAME

# Create an S3 bucket for Mimir
aws s3api create-bucket \
  --bucket mimir-${NICKNAME} \
  --region $AWS_DEFAULT_REGION \
  --create-bucket-configuration LocationConstraint=$AWS_DEFAULT_REGION
{
    "Location": "http://mimir-pjh.s3.amazonaws.com/"
}

# Save the S3 bucket name as a variable
export MIMIR_BUCKET_NAME="mimir-${NICKNAME}"
echo "export MIMIR_BUCKET_NAME=$MIMIR_BUCKET_NAME" >> /etc/profile; echo $MIMIR_BUCKET_NAME
mimir-pjh

# Create the grafana-mimir-s3-policy.json file
cat >grafana-mimir-s3-policy.json <<EOF
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "MimirStorage",
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket",
                "s3:PutObject",
                "s3:GetObject",
                "s3:DeleteObject"
            ],
            "Resource": [
                "arn:aws:s3:::${MIMIR_BUCKET_NAME}",
                "arn:aws:s3:::${MIMIR_BUCKET_NAME}/*"
            ]
        }
    ]
}
EOF
cat grafana-mimir-s3-policy.json

# Create the aws-mimir-s3 IAM policy
aws iam create-policy --policy-name aws-mimir-s3 --policy-document file://grafana-mimir-s3-policy.json

# Create the Mimir IAM role trust relationship
cat >trust-rs-mimir.json <<EOF
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Federated": "${OIDC_ARN}"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                "StringEquals": {
                    "${OIDC_URL}:sub": "system:serviceaccount:monitoring:mimir",
                    "${OIDC_URL}:aud": "sts.amazonaws.com"
                }
            }
        }
    ]
}
EOF
cat trust-rs-mimir.json

# Create AWS-Mimir-Role
aws iam create-role --role-name AWS-Mimir-Role --assume-role-policy-document file://trust-rs-mimir.json

# Attach the IAM policy to the IAM role
aws iam attach-role-policy --role-name AWS-Mimir-Role --policy-arn arn:aws:iam::${ACCOUNT_ID}:policy/aws-mimir-s3

# Declare the Mimir IAM role ARN as a variable
export MIMIR_ROLE_ARN=arn:aws:iam::${ACCOUNT_ID}:role/AWS-Mimir-Role
echo "export MIMIR_ROLE_ARN=$MIMIR_ROLE_ARN" >> /etc/profile; echo $MIMIR_ROLE_ARN

# Check mimir-values.yaml
export | grep MIMIR # confirm the variables are set
cd ~/values; envsubst < mimir-temp-values.yaml > mimir-values.yaml
cat mimir-values.yaml | yh
--
image: 
  repository: grafana/mimir
  tag: 2.10.3
  pullPolicy: IfNotPresent
mimir: 
  structuredConfig: 
    limits: 
      max_label_names_per_series: 60
      compactor_blocks_retention_period: 30d
    blocks_storage: 
      backend: s3
      s3: 
        bucket_name: mimir-pjh
        endpoint: s3.ap-northeast-2.amazonaws.com
        region: ap-northeast-2
      tsdb: 
        retention_period: 13h
      bucket_store: 
        ignore_blocks_within: 10h
    querier: 
      query_store_after: 12h
    ingester: 
      ring: 
        replication_factor: 3
serviceAccount: 
  create: true
  name: "mimir"
  annotations: 
    "eks.amazonaws.com/role-arn": "arn:aws:iam::20....:role/AWS-Mimir-Role"
minio: 
  enabled: false
alertmanager: 
  enabled: false
ruler: 
  enabled: false
compactor: 
  persistentVolume: 
    enabled: true
    annotations: {}
    accessModes: 
      - ReadWriteOnce
    size: 5Gi
    storageClass: gp3
ingester: 
  zoneAwareReplication: 
    enabled: false
  persistentVolume: 
    enabled: true
    annotations: {}
    accessModes: 
      - ReadWriteOnce
    size: 5Gi
    storageClass: gp3
store_gateway: 
  zoneAwareReplication: 
    enabled: false
  persistentVolume: 
    enabled: true
    annotations: {}
    accessModes: 
      - ReadWriteOnce
    size: 5Gi
    storageClass: gp3

# Install Mimir as a helm chart using the mimir-values file
helm install mimir grafana/mimir-distributed -n monitoring -f mimir-values.yaml --version 5.4.0
--
NAME: mimir
LAST DEPLOYED: Wed Feb 19 13:28:53 2025
NAMESPACE: monitoring
STATUS: deployed
REVISION: 1
NOTES:
Welcome to Grafana Mimir!
Remote write endpoints for Prometheus or Grafana Agent:
Ingress is not enabled, see the nginx.ingress values.
From inside the cluster:
  http://mimir-nginx.monitoring.svc:80/api/v1/push

Read address, Grafana data source (Prometheus) URL:
Ingress is not enabled, see the nginx.ingress values.
From inside the cluster:
  http://mimir-nginx.monitoring.svc:80/prometheus

**IMPORTANT**: Always consult CHANGELOG.md file at https://github.com/grafana/mimir/blob/main/operations/helm/charts/mimir-distributed/CHANGELOG.md and the deprecation list there to learn about breaking changes that require action during upgrade.
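
A simple post-install check that the components are up and the gateway service referenced in the NOTES exists (optional):

# All Mimir pods should reach Running, and mimir-nginx is the push/read entry point
kubectl get pods -n monitoring
kubectl get svc -n monitoring mimir-nginx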


    • [Logs] Loki Backend
# [Monitor 1] Watch pods, PVs, and PVCs in the logging namespace
watch kubectl get pod,pv,pvc -n logging

# [Monitor 2] Watch EBS volumes created by dynamic provisioning
while true; do aws ec2 describe-volumes \
  --filters Name=tag:ebs.csi.aws.com/cluster,Values=true \
  --query "Volumes[].{VolumeId: VolumeId, VolumeType: VolumeType, InstanceId: Attachments[0].InstanceId, State: Attachments[0].State}" \
  --output text; date; sleep 1; done
  
cd ~/irsa

# Create an S3 bucket for Loki
aws s3api create-bucket \
  --bucket loki-${NICKNAME} \
  --region $AWS_DEFAULT_REGION \
  --create-bucket-configuration LocationConstraint=$AWS_DEFAULT_REGION

# Save the S3 bucket name as a variable
export LOKI_BUCKET_NAME="loki-${NICKNAME}"
echo "export LOKI_BUCKET_NAME=$LOKI_BUCKET_NAME" >> /etc/profile; echo $LOKI_BUCKET_NAME

# Create the grafana-loki-s3-policy.json file
cat >grafana-loki-s3-policy.json <<EOF
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "LokiStorage",
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket",
                "s3:PutObject",
                "s3:GetObject",
                "s3:DeleteObject"
            ],
            "Resource": [
                "arn:aws:s3:::${LOKI_BUCKET_NAME}",
                "arn:aws:s3:::${LOKI_BUCKET_NAME}/*"
            ]
        }
    ]
}
EOF
cat grafana-loki-s3-policy.json

# Create the aws-loki-s3 IAM policy
aws iam create-policy --policy-name aws-loki-s3 --policy-document file://grafana-loki-s3-policy.json

# Create the Loki IAM role trust relationship
cat >trust-rs-loki.json <<EOF
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Federated": "${OIDC_ARN}"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                "StringEquals": {
                    "${OIDC_URL}:sub": "system:serviceaccount:logging:loki",
                    "${OIDC_URL}:aud": "sts.amazonaws.com"
                }
            }
        },
        {
            "Effect": "Allow",
            "Principal": {
                "Federated": "${OIDC_ARN}"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                "StringEquals": {
                    "${OIDC_URL}:sub": "system:serviceaccount:logging:loki-compactor",
                    "${OIDC_URL}:aud": "sts.amazonaws.com"
                }
            }
        }          
    ]
}
EOF
cat trust-rs-loki.json

# Create AWS-Loki-Role
aws iam create-role --role-name AWS-Loki-Role --assume-role-policy-document file://trust-rs-loki.json

# Attach the IAM policy to the IAM role
aws iam attach-role-policy --role-name AWS-Loki-Role --policy-arn arn:aws:iam::${ACCOUNT_ID}:policy/aws-loki-s3

# Declare the Loki IAM role ARN as a variable
export LOKI_ROLE_ARN=arn:aws:iam::${ACCOUNT_ID}:role/AWS-Loki-Role
echo "export LOKI_ROLE_ARN=$LOKI_ROLE_ARN" >> /etc/profile; echo $LOKI_ROLE_ARN

# Check loki-values.yaml
export | grep LOKI # confirm the variables are set
cd ~/values; envsubst < loki-temp-values.yaml > loki-values.yaml
cat loki-values.yaml | yh
--
image: 
  repository: grafana/loki
  tag: 2.9.8
  pullPolicy: IfNotPresent
loki: 
  server: 
    http_listen_port: 3100
    grpc_server_max_recv_msg_size: 8938360
    grpc_server_max_send_msg_size: 8938360
  structuredConfig: 
    auth_enabled: false
    compactor: 
      apply_retention_interval: 1h
      compaction_interval: 10m
      retention_delete_delay: 2h
      retention_delete_worker_count: 150
      retention_enabled: true
      shared_store: s3
      working_directory: /var/loki/compactor
    limits_config: 
      max_global_streams_per_user: 100000
      max_streams_per_user: 100000
      reject_old_samples: false
      retention_period: 30d
      per_stream_rate_limit: 3MB
      per_stream_rate_limit_burst: 10MB
      max_query_parallelism: 90
      ingestion_rate_mb: 512
      ingestion_burst_size_mb: 1024
    ingester: 
      max_transfer_retries: 0
      chunk_idle_period: 1h
      chunk_target_size: 1572864
      max_chunk_age: 2h
      chunk_encoding: snappy
      lifecycler: 
        ring: 
          kvstore: 
            store: memberlist
          replication_factor: 3
        heartbeat_timeout: 10m
      wal: 
        dir: /var/loki/wal
        replay_memory_ceiling: 800mb
    storage_config: 
      aws: 
        region: ap-northeast-2
        bucketnames: loki-pjh
        s3forcepathstyle: false
        insecure: false
      tsdb_shipper: 
        shared_store: s3
        active_index_directory: /var/loki/tsdb-index
        cache_location: /var/loki/tsdb-cache
      index_queries_cache_config: 
        memcached: 
          batch_size: 100
          parallelism: 100
    schema_config: 
      configs: 
        - from: 2023-10-31
          store: tsdb
          object_store: s3
          schema: v12
          index: 
            prefix: loki_index_
            period: 24h
    chunk_store_config: 
      max_look_back_period: 48h
      chunk_cache_config: 
        memcached: 
          batch_size: 100
          parallelism: 100
      write_dedupe_cache_config: 
        memcached: 
          batch_size: 100
          parallelism: 100
    querier: 
      max_concurrent: 16
    query_scheduler: 
      max_outstanding_requests_per_tenant: 32768
serviceAccount: 
  create: true
  name: "loki"
  annotations: 
    "eks.amazonaws.com/role-arn": "arn:aws:iam::20....:role/AWS-Loki-Role"
  automountServiceAccountToken: true
ingester: 
  replicas: 3
  maxUnavailable: 1
  resources: 
    requests: 
      cpu: 100m
      memory: 256Mi
    limits: 
      memory: 1Gi
  persistence: 
    enabled: true
    claims: 
      - name: data
        size: 10Gi
        storageClass: gp3
distributor: 
  resources: 
    requests: 
      cpu: 100m
      memory: 256Mi
    limits: 
      memory: 256Mi
querier: 
  resources: 
    requests: 
      cpu: 100m
      memory: 512Mi
    limits: 
      memory: 512Mi
queryFrontend: 
  resources: 
    requests: 
      cpu: 100m
      memory: 512Mi
    limits: 
      memory: 512Mi
gateway: 
  resources: 
    requests: 
      cpu: 100m
      memory: 512Mi
    limits: 
      memory: 512Mi
compactor: 
  enabled: true
  serviceAccount: 
    create: true
    name: "loki-compactor"
    annotations: 
      "eks.amazonaws.com/role-arn": "arn:aws:iam::20....:role/AWS-Loki-Role"
    automountServiceAccountToken: true
indexGateway: 
  enabled: true
memcachedChunks: 
  enabled: true
  extraArgs: 
    - -I 32m
memcachedFrontend: 
  enabled: true
  extraArgs: 
    - -I 32m
memcachedIndexQueries: 
  enabled: true
  extraArgs: 
    - -I 32m

# Install Loki as a helm chart using the loki-values file
helm install loki grafana/loki-distributed -n logging -f loki-values.yaml --version 0.79.1
--
NAME: loki
LAST DEPLOYED: Wed Feb 19 13:34:42 2025
NAMESPACE: logging
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
***********************************************************************
 Welcome to Grafana Loki
 Chart version: 0.79.1
 Loki version: 2.9.8
***********************************************************************

Installed components:
* gateway
* ingester
* distributor
* querier
* query-frontend
* compactor
* index-gateway
* memcached-chunks
* memcached-frontend
* memcached-index-queries
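
As with Mimir, a quick post-install check (optional); the gateway service name is the one referenced later by the OTel collector values:

# All Loki pods should reach Running, and the gateway receives log pushes
kubectl get pods -n logging
kubectl get svc -n logging loki-loki-distributed-gateway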


    • [Traces] Tempo Backend
# [Monitor 1] Watch pods, PVs, and PVCs in the tracing namespace
watch kubectl get pod,pv,pvc -n tracing

cd ~/irsa

# Create an S3 bucket for Tempo
aws s3api create-bucket \
  --bucket tempo-${NICKNAME} \
  --region $AWS_DEFAULT_REGION \
  --create-bucket-configuration LocationConstraint=$AWS_DEFAULT_REGION
{
    "Location": "http://tempo-pjh.s3.amazonaws.com/"
}

# Save the S3 bucket name as a variable
export TEMPO_BUCKET_NAME="tempo-${NICKNAME}"
echo "export TEMPO_BUCKET_NAME=$TEMPO_BUCKET_NAME" >> /etc/profile; echo $TEMPO_BUCKET_NAME

# Create the grafana-tempo-s3-policy.json file
cat >grafana-tempo-s3-policy.json <<EOF
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "TempoStorage",
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket",
                "s3:PutObject",
                "s3:GetObject",
                "s3:DeleteObject"
            ],
            "Resource": [
                "arn:aws:s3:::${TEMPO_BUCKET_NAME}",
                "arn:aws:s3:::${TEMPO_BUCKET_NAME}/*"
            ]
        }
    ]
}
EOF
cat grafana-tempo-s3-policy.json

# Create the aws-tempo-s3 IAM policy
aws iam create-policy --policy-name aws-tempo-s3 --policy-document file://grafana-tempo-s3-policy.json

# Create the Tempo IAM role trust relationship
cat >trust-rs-tempo.json <<EOF
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Federated": "${OIDC_ARN}"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                "StringEquals": {
                    "${OIDC_URL}:sub": "system:serviceaccount:tracing:tempo",
                    "${OIDC_URL}:aud": "sts.amazonaws.com"
                }
            }
        }
    ]
}
EOF
cat trust-rs-tempo.json

# Create AWS-Tempo-Role
aws iam create-role --role-name AWS-Tempo-Role --assume-role-policy-document file://trust-rs-tempo.json

# Attach the IAM policy to the IAM role
aws iam attach-role-policy --role-name AWS-Tempo-Role --policy-arn arn:aws:iam::${ACCOUNT_ID}:policy/aws-tempo-s3

# Declare the Tempo IAM role ARN as a variable
export TEMPO_ROLE_ARN=arn:aws:iam::${ACCOUNT_ID}:role/AWS-Tempo-Role
echo "export TEMPO_ROLE_ARN=$TEMPO_ROLE_ARN" >> /etc/profile; echo $TEMPO_ROLE_ARN

# Check the IAM policies and IAM roles for Mimir, Loki, and Tempo
aws iam list-policies \
  --query 'Policies[?contains(PolicyName, `mimir`) || contains(PolicyName, `loki`) || contains(PolicyName, `tempo`)].PolicyName' \
  --output text
--
aws-loki-s3     aws-mimir-s3    aws-tempo-s3

aws iam list-roles \
  --query 'Roles[?contains(RoleName, `Mimir`) || contains(RoleName, `Loki`) || contains(RoleName, `Tempo`)].RoleName' \
  --output text
--
AWS-Loki-Role   AWS-Mimir-Role  AWS-Tempo-Role

# Check tempo-values.yaml
export | grep TEMPO # confirm the variables are set
cd ~/values; envsubst < tempo-temp-values.yaml > tempo-values.yaml
cat tempo-values.yaml | yh
--
image: 
  repository: grafana/tempo
  tag: 2.2.3
  pullPolicy: IfNotPresent
traces: 
  otlp: 
    http: 
      enabled: true
    grpc: 
      enabled: true
tempo: 
  structuredConfig: 
    ingester: 
      lifecycler: 
        ring: 
          replication_factor: 3
      max_block_bytes: 104857600
      max_block_duration: 10m
      complete_block_timeout: 15m
storage: 
  trace: 
    backend: s3
    s3: 
      region: ap-northeast-2
      bucket: "tempo-pjh"
      endpoint: "s3.ap-northeast-2.amazonaws.com"
      insecure: true
    search: 
      cache_control: 
        footer: true
    pool: 
      max_workers: 400
      queue_depth: 20000
    wal: 
      path: /var/tempo/wal
distributor: 
  replicas: 2
  config: 
    log_received_spans: 
      enabled: true
ingester: 
  replicas: 3
  persistence: 
    enabled: true
    size: 10Gi
    storageClass: gp3
serviceAccount: 
  create: true
  name: "tempo"
  annotations: 
    "eks.amazonaws.com/role-arn": "arn:aws:iam::20....:role/AWS-Tempo-Role"
  automountServiceAccountToken: true

# Install Tempo as a helm chart using the tempo-values file
helm install tempo grafana/tempo-distributed -n tracing -f tempo-values.yaml --version 1.15.2
--
NAME: tempo
LAST DEPLOYED: Wed Feb 19 16:17:28 2025
NAMESPACE: tracing
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
***********************************************************************
 Welcome to Grafana Tempo
 Chart version: 1.15.2
 Tempo version: 2.5.0
***********************************************************************

Installed components:
* ingester
* distributor
* querier
* query-frontend
* compactor
* memcached


  • Verification
# Check the observability backends installed via helm charts
for ns in monitoring logging tracing; do helm list -n $ns; done
--
NAME    NAMESPACE       REVISION        UPDATED                                 STATUS          CHART                   APP VERSION
mimir   monitoring      1               2025-02-19 16:12:02.594137962 +0900 KST deployed        mimir-distributed-5.4.0 2.13.0     
NAME    NAMESPACE       REVISION        UPDATED                                 STATUS          CHART                   APP VERSION
loki    logging         1               2025-02-19 16:13:06.131815687 +0900 KST deployed        loki-distributed-0.79.1 2.9.8      
NAME    NAMESPACE       REVISION        UPDATED                                 STATUS          CHART                         APP VERSION
tempo   tracing         1               2025-02-19 16:17:28.168600647 +0900 KST deployed        tempo-distributed-1.15.2      2.5.0

3. Install OpenTelemetry Operator & Collector

    • Install cert-manager
# Add the helm repository
helm repo add jetstack https://charts.jetstack.io --force-update
helm repo update

# Install cert-manager
helm install \
  cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --create-namespace \
  --version v1.15.1 \
  --set crds.enabled=true
--
NAME: cert-manager
LAST DEPLOYED: Wed Feb 19 16:20:24 2025
NAMESPACE: cert-manager
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
cert-manager v1.15.1 has been deployed successfully!
....

# Verify cert-manager resources
kubectl get all -n cert-manager
NAME                                           READY   STATUS    RESTARTS   AGE
pod/cert-manager-84489bc478-mtcmh              1/1     Running   0          50s
pod/cert-manager-cainjector-7477d56b47-z49d6   1/1     Running   0          50s
pod/cert-manager-webhook-6d5cb854fc-j9p6q      1/1     Running   0          50s

NAME                           TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
service/cert-manager           ClusterIP   10.100.49.72     <none>        9402/TCP   50s
service/cert-manager-webhook   ClusterIP   10.100.129.211   <none>        443/TCP    50s

NAME                                      READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/cert-manager              1/1     1            1           50s
deployment.apps/cert-manager-cainjector   1/1     1            1           50s
deployment.apps/cert-manager-webhook      1/1     1            1           50s

NAME                                                 DESIRED   CURRENT   READY   AGE
replicaset.apps/cert-manager-84489bc478              1         1         1       50s
replicaset.apps/cert-manager-cainjector-7477d56b47   1         1         1       50s
replicaset.apps/cert-manager-webhook-6d5cb854fc      1         1         1       50s

kubectl get pod -n cert-manager
NAME                                       READY   STATUS    RESTARTS   AGE
cert-manager-84489bc478-mtcmh              1/1     Running   0          20s
cert-manager-cainjector-7477d56b47-z49d6   1/1     Running   0          20s
cert-manager-webhook-6d5cb854fc-j9p6q      1/1     Running   0          20s

# Check the cert-manager CRDs
kubectl get crd | grep cert-manager
--
certificaterequests.cert-manager.io                        2025-02-19T07:20:26Z
certificates.cert-manager.io                               2025-02-19T07:20:26Z
challenges.acme.cert-manager.io                            2025-02-19T07:20:26Z
clusterissuers.cert-manager.io                             2025-02-19T07:20:26Z
issuers.cert-manager.io                                    2025-02-19T07:20:26Z
orders.acme.cert-manager.io                                2025-02-19T07:20:26Z
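
The operator installed next depends on cert-manager's webhook being reachable, so it can help to wait for the deployments explicitly (optional):

# Block until all cert-manager deployments report Available
kubectl -n cert-manager wait --for=condition=Available deployment --all --timeout=120s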

 

    • Install the OTel Kubernetes Operator
# [Monitor] the otel namespace
watch kubectl get all -n otel

# Add the helm repository
helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
helm repo update

# Install the otel-operator
helm install opentelemetry-operator open-telemetry/opentelemetry-operator \
  --namespace otel \
  --create-namespace \
  --set "manager.collectorImage.repository=otel/opentelemetry-collector-k8s"
--
NAME: opentelemetry-operator
LAST DEPLOYED: Wed Feb 19 16:21:52 2025
NAMESPACE: otel
STATUS: deployed
REVISION: 1
NOTES:
[WARNING] No resource limits or requests were set. Consider setting resource requests and limits via the `resources` field.


opentelemetry-operator has been installed. Check its status by running:
  kubectl --namespace otel get pods -l "app.kubernetes.io/name=opentelemetry-operator"
...

# Verify otel resources
kubectl get all -n otel
NAME                                          READY   STATUS    RESTARTS   AGE
pod/opentelemetry-operator-6f4c546d56-p57fh   2/2     Running   0          41s

NAME                                     TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)             AGE
service/opentelemetry-operator           ClusterIP   10.100.193.96   <none>        8443/TCP,8080/TCP   41s
service/opentelemetry-operator-webhook   ClusterIP   10.100.102.81   <none>        443/TCP             41s

NAME                                     READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/opentelemetry-operator   1/1     1            1           41s

NAME                                                DESIRED   CURRENT   READY   AGE
replicaset.apps/opentelemetry-operator-6f4c546d56   1         1         1       41s

# Verify the otel CRDs
kubectl get crd | grep open
--
instrumentations.opentelemetry.io                          2025-02-19T07:21:54Z
opampbridges.opentelemetry.io                              2025-02-19T07:21:54Z
opentelemetrycollectors.opentelemetry.io                   2025-02-19T07:21:54Z
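For reference only (a sketch; this walkthrough installs the collector via the Helm chart below instead, and the example assumes the operator version here serves the v1beta1 API), the operator can also manage collectors declared as OpenTelemetryCollector resources:

# (Reference) minimal operator-managed OpenTelemetryCollector
cat <<EOF | kubectl apply -f -
apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
metadata:
  name: example
  namespace: otel
spec:
  mode: deployment
  config:
    receivers:
      otlp:
        protocols:
          grpc: {}
          http: {}
    exporters:
      debug: {}
    service:
      pipelines:
        traces:
          receivers: [otlp]
          exporters: [debug]
EOF

kubectl get opentelemetrycollectors -n otel
kubectl delete opentelemetrycollector example -n otel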

 

  • Install the OTel Kubernetes Collector
# Review otelcol-values.yaml
cd ~/values; cat otelcol-values.yaml | yh
--
mode: deployment
replicaCount: 3
clusterRole: 
  create: true
  rules: 
  - apiGroups: 
    - ""
    resources: 
    - pods
    - namespaces
    - nodes
    - nodes/proxy
    - services
    - endpoints
    verbs: 
    - get
    - watch
    - list
  - apiGroups: 
    - extensions
    resources: 
    - ingresses
    verbs: 
    - get
    - list
    - watch
  - nonResourceURLs: 
    - /metrics
    verbs: 
    - get
config: 
  exporters: 
    prometheusremotewrite: 
      endpoint: http://mimir-nginx.monitoring/api/v1/push
      tls: 
        insecure: true
      external_labels: 
        label_name: $KUBE_NODE_NAME
    loki: 
      endpoint: http://loki-loki-distributed-gateway.logging/loki/api/v1/push
    otlp: 
      endpoint: http://tempo-distributor-discovery.tracing:4317
      tls: 
        insecure: true
  receivers: 
    jaeger: null
    otlp: 
      protocols: 
        grpc: 
          endpoint: 0.0.0.0:4317
        http: 
          endpoint: 0.0.0.0:4318
    prometheus: 
      config: 
        global: 
          scrape_interval: 60s
          scrape_timeout: 30s
        scrape_configs: 
        - job_name: kubernetes-nodes-cadvisor
          bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
          kubernetes_sd_configs: 
          - role: node
          relabel_configs: 
          - action: labelmap
            regex: __meta_kubernetes_node_label_(.+)
          - replacement: kubernetes.default.svc:443
            target_label: __address__
          - regex: (.+)
            replacement: /api/v1/nodes/$$1/proxy/metrics/cadvisor
            source_labels: 
            - __meta_kubernetes_node_name
            target_label: __metrics_path__
          - action: keep
            regex: $KUBE_NODE_NAME
            source_labels: [__meta_kubernetes_node_name]
          scheme: https
          tls_config: 
            ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
            insecure_skip_verify: true
        - job_name: kubernetes-nodes
          bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
          kubernetes_sd_configs: 
          - role: node
          relabel_configs: 
          - action: labelmap
            regex: __meta_kubernetes_node_label_(.+)
          - replacement: kubernetes.default.svc:443
            target_label: __address__
          - regex: (.+)
            replacement: /api/v1/nodes/$$1/proxy/metrics
            source_labels: 
            - __meta_kubernetes_node_name
            target_label: __metrics_path__
          - action: keep
            regex: $KUBE_NODE_NAME
            source_labels: [__meta_kubernetes_node_name]
          scheme: https
          tls_config: 
            ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
            insecure_skip_verify: true
        - job_name: kubernetes-service-endpoints
          kubernetes_sd_configs: 
          - role: endpoints
          relabel_configs: 
          - action: keep
            regex: true
            source_labels: 
            - __meta_kubernetes_service_annotation_prometheus_io_scrape
          - action: replace
            regex: (https?)
            source_labels: 
            - __meta_kubernetes_service_annotation_prometheus_io_scheme
            target_label: __scheme__
          - action: replace
            regex: (.+)
            source_labels: 
            - __meta_kubernetes_service_annotation_prometheus_io_path
            target_label: __metrics_path__
          - action: replace
            regex: ([^:]+)(?::\d+)?;(\d+)
            replacement: $$1:$$2
            source_labels: 
            - __address__
            - __meta_kubernetes_service_annotation_prometheus_io_port
            target_label: __address__
          - action: labelmap
            regex: __meta_kubernetes_service_annotation_prometheus_io_param_(.+)
            replacement: __param_$$1
          - action: labelmap
            regex: __meta_kubernetes_service_label_(.+)
          - action: replace
            source_labels: 
            - __meta_kubernetes_namespace
            target_label: kubernetes_namespace
          - action: replace
            source_labels: 
            - __meta_kubernetes_service_name
            target_label: kubernetes_name
          - action: replace
            source_labels: 
            - __meta_kubernetes_pod_node_name
            target_label: kubernetes_node
          - action: keep
            regex: $KUBE_NODE_NAME
            source_labels: [__meta_kubernetes_endpoint_node_name]
        - job_name: kubernetes-pods
          kubernetes_sd_configs: 
          - role: pod
          relabel_configs: 
          - action: keep
            regex: true
            source_labels: 
            - __meta_kubernetes_pod_annotation_prometheus_io_scrape
          - action: replace
            regex: (https?)
            source_labels: 
            - __meta_kubernetes_pod_annotation_prometheus_io_scheme
            target_label: __scheme__
          - action: replace
            regex: (.+)
            source_labels: 
            - __meta_kubernetes_pod_annotation_prometheus_io_path
            target_label: __metrics_path__
          - action: replace
            regex: ([^:]+)(?::\d+)?;(\d+)
            replacement: $$1:$$2
            source_labels: 
            - __address__
            - __meta_kubernetes_pod_annotation_prometheus_io_port
            target_label: __address__
          - action: labelmap
            regex: __meta_kubernetes_pod_annotation_prometheus_io_param_(.+)
            replacement: __param_$$1
          - action: labelmap
            regex: __meta_kubernetes_pod_label_(.+)
          - action: replace
            source_labels: 
            - __meta_kubernetes_namespace
            target_label: kubernetes_namespace
          - action: replace
            source_labels: 
            - __meta_kubernetes_pod_name
            target_label: kubernetes_pod_name
          - action: drop
            regex: Pending|Succeeded|Failed|Completed
            source_labels: 
            - __meta_kubernetes_pod_phase
          - action: keep
            regex: $KUBE_NODE_NAME
            source_labels: [__meta_kubernetes_pod_node_name]
    filelog: 
      include: [ /var/log/pods/*/*/*.log ]
      start_at: beginning
      include_file_path: true
      include_file_name: false
      retry_on_failure: 
        enabled: true
      operators: 
        - type: router
          id: get-format
          routes: 
            - output: parser-containerd
              expr: 'body matches "^[^ Z]+Z"'
        - type: regex_parser
          id: parser-containerd
          regex: '^(?P<time>[^ ^Z]+Z) (?P<stream>stdout|stderr) (?P<logtag>[^ ]*) ?(?P<log>.*)$'
          output: extract_metadata_from_filepath
          timestamp: 
            parse_from: attributes.time
            layout: '%Y-%m-%dT%H:%M:%S.%LZ'
        - type: regex_parser
          id: extract_metadata_from_filepath
          regex: '^.*\/(?P<namespace>[^_]+)_(?P<pod_name>[^_]+)_(?P<uid>[a-f0-9\-]{36})\/(?P<container_name>[^\._]+)\/(?P<restart_count>\d+)\.log$'
          parse_from: attributes["log.file.path"]
          cache: 
            size: 128
        - type: move
          from: attributes.stream
          to: attributes["log.iostream"]
        - type: move
          from: attributes.container_name
          to: resource["k8s.container.name"]
        - type: move
          from: attributes.namespace
          to: resource["k8s.namespace.name"]
        - type: move
          from: attributes.pod_name
          to: resource["k8s.pod.name"]
        - type: move
          from: attributes.restart_count
          to: resource["k8s.container.restart_count"]
        - type: move
          from: attributes.uid
          to: resource["k8s.pod.uid"]
        - type: remove
          field: attributes.time
        - type: move
          from: attributes.log
          to: body
  processors: 
    batch: 
      send_batch_size: 10000
      timeout: 10s
    memory_limiter: 
      check_interval: 1s
      limit_percentage: 75
      spike_limit_percentage: 15
    attributes: 
      actions: 
        - action: insert
          key: loki.attribute.labels
          value: log.file.path, log.iostream, time, logtag
    resource: 
      attributes: 
      - action: insert
        key: loki.resource.labels
        value: k8s.pod.name, k8s.node.name, k8s.namespace.name, k8s.container.name, k8s.container.restart_count, k8s.pod.uid
  service: 
    extensions: 
      - health_check
      - memory_ballast
    pipelines: 
      metrics: 
        exporters: 
        - prometheusremotewrite
        processors: 
        - memory_limiter
        - batch
        receivers: 
        - prometheus
      logs: 
        exporters: 
        - loki
        processors: 
        - batch
        - resource
        - attributes
        receivers: 
        - filelog
      traces: 
        exporters: 
        - otlp
        processors: 
        - memory_limiter
        - batch
        receivers: 
        - otlp
presets: 
  logsCollection: 
    enabled: true
    includeCollectorLogs: true
  kubernetesAttributes: 
    enabled: true
  kubeletMetrics: 
    enabled: true
extraEnvs: 
- name: KUBE_NODE_NAME
  valueFrom: 
    fieldRef: 
      apiVersion: v1
      fieldPath: spec.nodeName
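One note on the values above: the collector expands environment variables inside its configuration, which is why the node name and the Prometheus relabel capture groups are written the way they are (standard otelcol config expansion behavior):

# Config env expansion in the collector
#   $KUBE_NODE_NAME  -> replaced with the node name injected via extraEnvs (spec.nodeName)
#   $$1, $$2         -> escaped to literal $1, $2 so relabel_configs keep their capture-group references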

# Install the OTel Collector
helm install opentelemetry open-telemetry/opentelemetry-collector -n otel -f otelcol-values.yaml --version 0.81.0

# Verify
kubectl get all -n otel
NAME                                                         READY   STATUS    RESTARTS   AGE
pod/opentelemetry-opentelemetry-collector-576cc64897-57xfl   1/1     Running   0          32s
pod/opentelemetry-opentelemetry-collector-576cc64897-qwgvj   1/1     Running   0          32s
pod/opentelemetry-opentelemetry-collector-576cc64897-rqv8f   1/1     Running   0          32s
pod/opentelemetry-operator-6f4c546d56-p57fh                  2/2     Running   0          109s

NAME                                            TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                                                   AGE
service/opentelemetry-opentelemetry-collector   ClusterIP   10.100.224.197   <none>        6831/UDP,14250/TCP,14268/TCP,4317/TCP,4318/TCP,9411/TCP   32s
service/opentelemetry-operator                  ClusterIP   10.100.193.96    <none>        8443/TCP,8080/TCP                                         109s
service/opentelemetry-operator-webhook          ClusterIP   10.100.102.81    <none>        443/TCP                                                   109s

NAME                                                    READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/opentelemetry-opentelemetry-collector   3/3     3            3           32s
deployment.apps/opentelemetry-operator                  1/1     1            1           109s

NAME                                                               DESIRED   CURRENT   READY   AGE
replicaset.apps/opentelemetry-opentelemetry-collector-576cc64897   3         3         3       32s
replicaset.apps/opentelemetry-operator-6f4c546d56                  1         1         1       109s
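As a quick end-to-end check (a sketch; the span/trace IDs and service name below are arbitrary test values, and the Mimir query assumes multi-tenancy is disabled, consistent with the remote-write endpoint used above), a single span can be POSTed to the collector's OTLP/HTTP receiver and the backends queried:

# Send one test span to the collector's OTLP/HTTP receiver (port 4318)
kubectl run otlp-test -n otel --rm -it --restart=Never --image=curlimages/curl --command -- \
  curl -s -X POST http://opentelemetry-opentelemetry-collector.otel:4318/v1/traces \
  -H "Content-Type: application/json" \
  -d '{"resourceSpans":[{"resource":{"attributes":[{"key":"service.name","value":{"stringValue":"otlp-test"}}]},"scopeSpans":[{"spans":[{"traceId":"5b8aa5a2d2c872e8321cf37308d69df2","spanId":"051581bf3cb55c13","name":"smoke-test","kind":1,"startTimeUnixNano":"1700000000000000000","endTimeUnixNano":"1700000001000000000"}]}]}]}'

# Collector logs should show no export errors
kubectl logs -n otel deploy/opentelemetry-opentelemetry-collector --tail=20

# Metrics pushed via prometheusremotewrite can be queried through the Mimir gateway
kubectl run mimir-test -n otel --rm -it --restart=Never --image=curlimages/curl --command -- \
  curl -s "http://mimir-nginx.monitoring/prometheus/api/v1/query?query=up"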


4. Grafana Installation

# [Monitoring 1] watch pod, svc, ep, ingress, pv, pvc in the grafana namespace
watch kubectl get pod,svc,ep,ingress,pv,pvc -n grafana

# [Monitoring 2] watch EBS volumes created by dynamic provisioning
while true; do aws ec2 describe-volumes \
  --filters Name=tag:ebs.csi.aws.com/cluster,Values=true \
  --query "Volumes[].{VolumeId: VolumeId, VolumeType: VolumeType, InstanceId: Attachments[0].InstanceId, State: Attachments[0].State}" \
  --output text; date; sleep 1; done
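The Grafana values below reference $CERT_ARN and $MyDomain, so both must already be exported in the shell; a hedged lookup sketch (the wildcard-domain match is an assumption about how the ACM certificate was issued):

# Assumption: an ACM certificate for *.$MyDomain already exists in this region
CERT_ARN=$(aws acm list-certificates \
  --query "CertificateSummaryList[?DomainName=='*.${MyDomain}'].CertificateArn" \
  --output text)
echo $CERT_ARN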
  
# Create grafana-values.yaml
# Option 1: render it from the template with envsubst
cd ~/values; envsubst < grafana-temp-values.yaml > grafana-values.yaml
cat grafana-values.yaml | yh

# Option 2: write grafana-values.yaml directly (CERT_ARN and MyDomain are expanded at write time)
cd ~/values
cat >grafana-values.yaml <<EOF
adminUser: admin
adminPassword: qwer1234

grafana.ini:
  server:
    root_url: "%(protocol)s://%(domain)s:%(http_port)s/grafana/"
    serve_from_sub_path: true
  users:
    default_theme: light
  dashboards:
    default_timezone: Asia/Seoul

persistence:
  enabled: true
  accessModes:
    - ReadWriteOnce
  size: 20Gi
  storageClassName: gp3

resources:
  requests:
    memory: 400Mi
    cpu: 200m
  limits:
    memory: 700Mi
    cpu: 300m

ingress:
  enabled: true
  ingressClassName: alb
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTPS":443}, {"HTTP":80}]'
    alb.ingress.kubernetes.io/certificate-arn: $CERT_ARN
    alb.ingress.kubernetes.io/success-codes: 200-399
    alb.ingress.kubernetes.io/load-balancer-name: myeks-ingress-alb
    alb.ingress.kubernetes.io/group.name: study
    alb.ingress.kubernetes.io/ssl-redirect: '443'
  hosts:
    - grafana.$MyDomain
  path: /
sidecar:
  dashboards:
    enabled: true
  datasources:
    enabled: true
EOF
cat grafana-values.yaml

# Install grafana with the helm chart using the grafana-values file (the grafana helm repo must already be added: helm repo add grafana https://grafana.github.io/helm-charts)
helm install grafana grafana/grafana -n grafana -f grafana-values.yaml
--
NAME: grafana
LAST DEPLOYED: Wed Feb 19 16:25:21 2025
NAMESPACE: grafana
STATUS: deployed
REVISION: 1
NOTES:
1. Get your 'admin' user password by running:

   kubectl get secret --namespace grafana grafana -o jsonpath="{.data.admin-password}" | base64 --decode ; echo


2. The Grafana server can be accessed via port 80 on the following DNS name from within your cluster:

   grafana.grafana.svc.cluster.local

   If you bind grafana to 80, please update values in values.yaml and reinstall:
   ```
   securityContext:
     runAsUser: 0
     runAsGroup: 0
     fsGroup: 0

   command:
   - "setcap"
   - "'cap_net_bind_service=+ep'"
   - "/usr/sbin/grafana-server &&"
   - "sh"
   - "/run.sh"
   ```
   Details refer to https://grafana.com/docs/installation/configuration/#http-port.
   Or grafana would always crash.

   From outside the cluster, the server URL(s) are:
     http://grafana.pjhtest.click

3. Login with the password from step 1 and the username: admin

⇒ Verify access at https://grafana.pjhtest.click/grafana (admin / qwer1234)
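Because sidecar.datasources is enabled in the values above, Mimir, Loki, and Tempo can be registered through a labeled ConfigMap; a sketch assuming the in-cluster endpoints used earlier in the collector config (the Tempo query-frontend service name and port are assumptions about the tempo-distributed release):

# (Sketch) datasource ConfigMap picked up by the grafana sidecar (default label key: grafana_datasource)
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-datasources
  namespace: grafana
  labels:
    grafana_datasource: "1"
data:
  datasources.yaml: |
    apiVersion: 1
    datasources:
      - name: Mimir
        type: prometheus
        url: http://mimir-nginx.monitoring/prometheus
        access: proxy
        isDefault: true
      - name: Loki
        type: loki
        url: http://loki-loki-distributed-gateway.logging
        access: proxy
      - name: Tempo
        type: tempo
        url: http://tempo-query-frontend.tracing:3100   # service name/port assumed
        access: proxy
EOF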


5. Observability Verification

💡 Summary of trace instrumentation approaches

  • Manual instrumentation: fine-grained control and customization, but requires code changes.
  • Programmatic instrumentation: instrumentation is configured in application code, offering flexibility at the cost of some code changes.
  • Automatic instrumentation: applied quickly without code changes, though fine-grained control is limited (see the sketch below).
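A sketch of automatic instrumentation using the operator's Instrumentation CRD installed earlier (the namespace, endpoint port, language annotation, and sampler values are illustrative assumptions):

# (Sketch) Instrumentation resource + pod annotation for operator-based auto-instrumentation
cat <<EOF | kubectl apply -f -
apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: default-instrumentation
  namespace: default
spec:
  exporter:
    endpoint: http://opentelemetry-opentelemetry-collector.otel:4317   # OTLP gRPC; use 4318 for http/protobuf SDKs
  propagators:
    - tracecontext
    - baggage
  sampler:
    type: parentbased_traceidratio
    argument: "1.0"
EOF

# Annotate the workload's pod template, e.g. for a Java app (<my-app> is a placeholder):
kubectl patch deployment <my-app> -n default --type merge -p \
  '{"spec":{"template":{"metadata":{"annotations":{"instrumentation.opentelemetry.io/inject-java":"true"}}}}}'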
