
Cloud Native PostgreSQL Operator

This post covers what I tested during Gisida's Database Operator In Kubernetes (DOIK) study.

Cloud Native PostgreSQL (CNPG) Operator Test
1. Installing CNPG (Cloud Native PostgreSQL)
2. Basic CNPG Usage
3. CNPG Failure Testing
4. CNPG Scale & Rolling Updates
5. CNPG Miscellaneous


  • What is Cloud Native PostgreSQL (CNPG)?
      CloudNativePG is an open-source operator designed to manage PostgreSQL workloads on any supported Kubernetes cluster running in private, public, hybrid, or multi-cloud environments.

  • Architecture 
    - Read-write workloads 

https://cloudnative-pg.io/documentation/1.15.1/architecture/

  Applications connect to the PostgreSQL instance that the Kubernetes operator has currently elected as the primary by using the service with the -rw suffix, as shown in the figure. If the primary instance becomes temporarily or permanently unavailable, Kubernetes moves the -rw service to another instance of the cluster to maintain high availability.
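
  For example, with the mycluster cluster deployed later in this post and the myclient1 pod used as a psql client, an application-side connection simply targets the -rw service name (a minimal sketch; the service and pod names are the ones created below):

$ kubectl exec -it myclient1 -- psql -U postgres -h mycluster-rw -p 5432 -c "select inet_server_addr();"   # always answered by the current primary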

 -  Read-Only workloads 

https://cloudnative-pg.io/documentation/1.15.1/architecture/

  Applications can access the hot standby replicas through the -ro service provided by the operator. This service lets applications offload read-only queries from the primary node.

 - Multi-cluster deployments

https://cloudnative-pg.io/documentation/1.15.1/architecture/

  PostgreSQL replicated across two different Kubernetes clusters: the primary cluster lives in the first Kubernetes cluster and the replica cluster in the second. The second Kubernetes cluster acts as the DR (disaster recovery) site.
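
  As a rough sketch of what the replica (DR) side could look like, based on the CNPG replica-cluster documentation for 1.15 (all names here are placeholders and the exact fields may differ between operator versions):

# Sketch only: a Cluster in the second Kubernetes cluster that follows an external primary
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: mycluster-dr
spec:
  instances: 3
  storage:
    size: 3Gi
  replica:
    enabled: true        # run permanently as a replica cluster
    source: mycluster    # must match an externalClusters entry
  bootstrap:
    pg_basebackup:
      source: mycluster  # take the initial copy from the primary cluster
  externalClusters:
    - name: mycluster
      connectionParameters:
        host: mycluster-primary.example.com   # placeholder: address reachable from the DR cluster
        user: streaming_replica
        sslmode: verify-ca
      # the TLS client certificate / CA secrets for streaming_replica must also be copied
      # into this namespace and referenced here (omitted in this sketch)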


1. Installing CNPG (Cloud Native PostgreSQL)

▶  Add the Helm repo

$ helm repo add cnpg https://cloudnative-pg.github.io/charts
"cnpg" has been added to your repositories

 

  Install the operator

$ helm install cnpg cnpg/cloudnative-pg -f ~/DOIK/5/values.yaml
NAME: cnpg
LAST DEPLOYED: Wed Jun 22 10:27:50 2022
NAMESPACE: default
STATUS: deployed
REVISION: 1
NOTES:
CloudNativePG operator should be installed in namespace "default".
You can now create a PostgreSQL cluster with 3 nodes in the current namespace as follows:

cat <<EOF | kubectl apply -f -
# Example of PostgreSQL cluster
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: cluster-example
spec:
  instances: 3
  storage:
    size: 1Gi
EOF

$ kubectl get cluster

 

  Check the CRDs

$ kubectl get crd
NAME                                  CREATED AT
backups.postgresql.cnpg.io            2022-06-22T01:27:52Z
clusters.postgresql.cnpg.io           2022-06-22T01:27:52Z
poolers.postgresql.cnpg.io            2022-06-22T01:27:52Z
scheduledbackups.postgresql.cnpg.io   2022-06-22T01:27:52Z
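
  The schema behind these CRDs can be browsed with kubectl explain, which helps when writing a Cluster manifest (this assumes the short name cluster resolves to clusters.postgresql.cnpg.io, the same way kubectl get cluster does above):

$ kubectl explain cluster.spec | head -30
$ kubectl explain cluster.spec.bootstrap.initdb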

 

  Deploy a PostgreSQL cluster

$ cat ~/DOIK/5/mycluster1.yaml

# Example of PostgreSQL cluster
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: mycluster
spec:
  imageName: ghcr.io/cloudnative-pg/postgresql:14.2
  instances: 3
  storage:
    size: 3Gi
  postgresql:
    parameters:
      max_worker_processes: "40"
      timezone: "Asia/Seoul"
    pg_hba:
      - host all postgres all trust
  primaryUpdateStrategy: unsupervised
  enableSuperuserAccess: true
  bootstrap:
    initdb:
      database: app
      encoding: UTF8
      localeCType: C
      localeCollate: C
      owner: app

$  kubectl apply -f ~/DOIK/5/mycluster1.yaml
cluster.postgresql.cnpg.io/mycluster created

$ kubectl get pod -w
NAME                                   READY   STATUS    RESTARTS   AGE
cnpg-cloudnative-pg-5f8cc75df5-jk2v4   1/1     Running   0          17m
mycluster-1                            1/1     Running   0          4m23s
mycluster-2                            1/1     Running   0          3m36s
mycluster-3                            1/1     Running   0          2m52s

 

  Check the cluster

$ kubectl get all
NAME                                       READY   STATUS    RESTARTS   AGE
pod/cnpg-cloudnative-pg-5f8cc75df5-jk2v4   1/1     Running   0          30m
pod/mycluster-1                            1/1     Running   0          16m
pod/mycluster-2                            1/1     Running   0          16m
pod/mycluster-3                            1/1     Running   0          15m

NAME                           TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
service/cnpg-webhook-service   ClusterIP   10.200.1.33    <none>        443/TCP    30m

$  kubectl get cluster
NAME        AGE   INSTANCES   READY   STATUS                     PRIMARY
mycluster   17m   3           3       Cluster in healthy state   mycluster-1

$ kubectl cnpg status mycluster --verbose ## or -v; verify that the configured parameters were applied
Cluster Summary
Name:               mycluster
Namespace:          default
System ID:          7111873327487692819
PostgreSQL Image:   ghcr.io/cloudnative-pg/postgresql:14.2
Primary instance:   mycluster-1
Status:             Cluster in healthy state
Instances:          3
Ready instances:    3
Current Write LSN:  0/6000060 (Timeline: 1 - WAL File: 000000010000000000000006)

PostgreSQL Configuration
archive_command = '/controller/manager wal-archive --log-destination /controller/log/postgres.json %p'
archive_mode = 'on'
archive_timeout = '5min'
cluster_name = 'mycluster'
dynamic_shared_memory_type = 'posix'
full_page_writes = 'on'
hot_standby = 'true'
listen_addresses = '*'
log_destination = 'csvlog'
log_directory = '/controller/log'
log_filename = 'postgres'
log_rotation_age = '0'
log_rotation_size = '0'
log_truncate_on_rotation = 'false'
logging_collector = 'on'
max_parallel_workers = '32'
max_replication_slots = '32'
max_worker_processes = '40'
port = '5432'
restart_after_crash = 'false'
shared_memory_type = 'mmap'
shared_preload_libraries = ''
ssl = 'on'
ssl_ca_file = '/controller/certificates/client-ca.crt'
ssl_cert_file = '/controller/certificates/server.crt'
ssl_key_file = '/controller/certificates/server.key'
timezone = 'Asia/Seoul'
unix_socket_directories = '/controller/run'
wal_keep_size = '512MB'
wal_level = 'logical'
wal_log_hints = 'on'
wal_receiver_timeout = '5s'
wal_sender_timeout = '5s'
cnpg.config_sha256 = '8a3f5126327a68ea817baaac7b4e96184e10c86208e93c126750f40c92a30747'

PostgreSQL HBA Rules

# Grant local access
local all all peer map=local

# Require client certificate authentication for the streaming_replica user
hostssl postgres streaming_replica all cert
hostssl replication streaming_replica all cert
hostssl all cnpg_pooler_pgbouncer all cert


host all postgres all trust


# Otherwise use the default authentication method
host all all all scram-sha-256


Certificates Status
Certificate Name       Expiration Date                Days Left Until Expiration
----------------       ---------------                --------------------------
mycluster-ca           2022-09-20 01:35:08 +0000 UTC  89.98
mycluster-replication  2022-09-20 01:35:08 +0000 UTC  89.98
mycluster-server       2022-09-20 01:35:08 +0000 UTC  89.98

Continuous Backup status
Not configured

Streaming Replication status
Name         Sent LSN   Write LSN  Flush LSN  Replay LSN  Write Lag  Flush Lag  Replay Lag  State      Sync State  Sync Priority
----         --------   ---------  ---------  ----------  ---------  ---------  ----------  -----      ----------  -------------
mycluster-2  0/6000060  0/6000060  0/6000060  0/6000060   00:00:00   00:00:00   00:00:00    streaming  async       0
mycluster-3  0/6000060  0/6000060  0/6000060  0/6000060   00:00:00   00:00:00   00:00:00    streaming  async       0

Instances status
Name         Database Size  Current LSN  Replication role  Status  QoS         Manager Version
----         -------------  -----------  ----------------  ------  ---         ---------------
mycluster-1  33 MB          0/6000060    Primary           OK      BestEffort  1.15.1
mycluster-2  33 MB          0/6000060    Standby (async)   OK      BestEffort  1.15.1
mycluster-3  33 MB          0/6000060    Standby (async)   OK      BestEffort  1.15.1

 

  Check the underlying resources created for the cluster

$ kubectl get pod,deploy
NAME                                       READY   STATUS    RESTARTS   AGE
pod/cnpg-cloudnative-pg-5f8cc75df5-jk2v4   1/1     Running   0          34m
pod/mycluster-1                            1/1     Running   0          21m
pod/mycluster-2                            1/1     Running   0          20m
pod/mycluster-3                            1/1     Running   0          19m

NAME                                  READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/cnpg-cloudnative-pg   1/1     1            1           34m


$ kubectl get svc,ep,endpointslices -l cnpg.io/cluster=mycluster
NAME                    TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
service/mycluster-any   ClusterIP   10.200.1.195   <none>        5432/TCP   22m
service/mycluster-r     ClusterIP   10.200.1.232   <none>        5432/TCP   22m
service/mycluster-ro    ClusterIP   10.200.1.69    <none>        5432/TCP   22m
service/mycluster-rw    ClusterIP   10.200.1.174   <none>        5432/TCP   22m

NAME                      ENDPOINTS                                         AGE
endpoints/mycluster-any   172.16.1.4:5432,172.16.2.6:5432,172.16.3.4:5432   22m
endpoints/mycluster-r     172.16.1.4:5432,172.16.2.6:5432,172.16.3.4:5432   22m
endpoints/mycluster-ro    172.16.1.4:5432,172.16.2.6:5432                   22m
endpoints/mycluster-rw    172.16.3.4:5432                                   22m

NAME                                                 ADDRESSTYPE   PORTS   ENDPOINTS                          AGE
endpointslice.discovery.k8s.io/mycluster-any-gjmlp   IPv4          5432    172.16.3.4,172.16.1.4,172.16.2.6   22m
endpointslice.discovery.k8s.io/mycluster-r-98r65     IPv4          5432    172.16.3.4,172.16.1.4,172.16.2.6   22m
endpointslice.discovery.k8s.io/mycluster-ro-kh6fm    IPv4          5432    172.16.1.4,172.16.2.6              22m
endpointslice.discovery.k8s.io/mycluster-rw-4gcpn    IPv4          5432    172.16.3.4                         22m


$ kubectl get cm,secret
NAME                                       DATA   AGE
configmap/cnpg-controller-manager-config   0      34m
configmap/cnpg-default-monitoring          1      34m
configmap/kube-root-ca.crt                 1      85m

NAME                                     TYPE                                  DATA   AGE
secret/cnpg-ca-secret                    Opaque                                2      34m
secret/cnpg-cloudnative-pg-token-c7sqs   kubernetes.io/service-account-token   3      34m
secret/cnpg-webhook-cert                 kubernetes.io/tls                     2      34m
secret/default-token-j8k8p               kubernetes.io/service-account-token   3      85m
secret/mycluster-app                     kubernetes.io/basic-auth              3      22m
secret/mycluster-ca                      Opaque                                2      22m
secret/mycluster-replication             kubernetes.io/tls                     2      22m
secret/mycluster-server                  kubernetes.io/tls                     2      22m
secret/mycluster-superuser               kubernetes.io/basic-auth              3      22m
secret/mycluster-token-jzdtw             kubernetes.io/service-account-token   3      22m
secret/sh.helm.release.v1.cnpg.v1        helm.sh/release.v1                    1      34m


$ kubectl get pdb
NAME                MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
mycluster           1               N/A               1                     22m
mycluster-primary   1               N/A               0                     22m

 

2. Basic CNPG Usage

   Check the two secrets that store credentials

$ kubectl get secret -l cnpg.io/cluster=mycluster
NAME                  TYPE                       DATA   AGE
mycluster-app         kubernetes.io/basic-auth   3      123m
mycluster-superuser   kubernetes.io/basic-auth   3      123m

 

   Check the DB account information

# Check the superuser account name
$ kubectl get secrets mycluster-superuser -o jsonpath={.data.username} | base64 -d ;echo
postgres

# app account name
$ kubectl get secrets mycluster-app -o jsonpath={.data.username} | base64 -d ;echo
app


# app account password
$ kubectl get secrets mycluster-app -o jsonpath={.data.password} | base64 -d ; echo
wgyiBXwnWHPoXF9plDaulXX34U8xQ0WvZOKdsoP5ozD95j9PMkkSRbHXW6AArRaq

 

   Store the app account password in a variable

$ AUSERPW=$(kubectl get secrets mycluster-app -o jsonpath={.data.password} | base64 -d)
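
  With the password in a variable you can also log in as the app user rather than the superuser; a quick check (it uses the myclient1 pod deployed in the next step and PGPASSWORD, the standard libpq environment variable):

# Connect to the app database as the app user
$ kubectl exec -it myclient1 -- env PGPASSWORD="$AUSERPW" psql -U app -h mycluster-rw -p 5432 -d app -c "\conninfo"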

 

  Deploy two myclient pods using envsubst

$ cat ~/DOIK/5/myclient.yaml
 apiVersion: v1
 kind: Pod
 metadata:
   name: ${PODNAME}
   labels:
     app: myclient
 spec:
   nodeName: k8s-m
   containers:
   - name: ${PODNAME}
     image: bitnami/postgresql:${VERSION}
     command: ["tail"]
     args: ["-f", "/dev/null"]
   terminationGracePeriodSeconds: 0
   
$ for ((i=1; i<=2; i++)); do PODNAME=myclient$i VERSION=14.3.0 envsubst < ~/DOIK/5/myclient.yaml | kubectl apply -f - ; done
pod/myclient1 created
pod/myclient2 created

 

  [myclient1] Connect to the mycluster-rw service with the superuser account

# Connect
$ kubectl exec -it myclient1 -- psql -U postgres -h mycluster-rw -p 5432
psql (14.3, server 14.2 (Debian 14.2-1.pgdg110+1))
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, bits: 256, compression: off)
postgres=#

# Check connection info
postgres=# \conninfo
You are connected to database "postgres" as user "postgres" on host "mycluster-rw" (address "10.200.1.174") at port "5432".
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, bits: 256, compression: off)

 

   Download sample data for testing (https://www.postgresqltutorial.com/postgresql-getting-started)

$ curl -LO https://www.postgresqltutorial.com/wp-content/uploads/2019/05/dvdrental.zip
$ apt install unzip -y && unzip dvdrental.zip

 

  Copy dvdrental.tar to the myclient1 pod

$ kubectl cp dvdrental.tar myclient1:/tmp

 

▶ [myclient1] Connect to the mycluster-rw service with the superuser account and create a database

$ kubectl exec -it myclient1 -- createdb -U postgres -h mycluster-rw -p 5432 dvdrental
$ kubectl exec -it myclient1 -- psql -U postgres -h mycluster-rw -p 5432 -l
                             List of databases
   Name    |  Owner   | Encoding | Collate | Ctype |   Access privileges
-----------+----------+----------+---------+-------+-----------------------
 app       | app      | UTF8     | C       | C     |
 dvdrental | postgres | UTF8     | C       | C     |
 postgres  | postgres | UTF8     | C       | C     |
 template0 | postgres | UTF8     | C       | C     | =c/postgres          +
           |          |          |         |       | postgres=CTc/postgres
 template1 | postgres | UTF8     | C       | C     | =c/postgres          +
           |          |          |         |       | postgres=CTc/postgres
(5 rows)

 

 DVD Rental Sample Database Import

$ kubectl exec -it myclient1 -- pg_restore -U postgres -d dvdrental /tmp/dvdrental.tar -h  mycluster-rw -p 5432

# Query a table in the DVD Rental database
$ kubectl exec -it myclient1 -- psql -U postgres -h mycluster-rw -p 5432 -d dvdrental -c "SELECT * FROM actor"
actor_id | first_name  |  last_name   |      last_update
----------+-------------+--------------+------------------------
        1 | Penelope    | Guiness      | 2013-05-26 14:47:57.62
        2 | Nick        | Wahlberg     | 2013-05-26 14:47:57.62
        3 | Ed          | Chase        | 2013-05-26 14:47:57.62
        4 | Jennifer    | Davis        | 2013-05-26 14:47:57.62
        5 | Johnny      | Lollobrigida | 2013-05-26 14:47:57.62
        6 | Bette       | Nicholson    | 2013-05-26 14:47:57.62
~ (omitted)

 

 Connect to each pod and verify that the DVD Rental database is replicated in sync

# Store pod IPs in variables
$ POD1=$(kubectl get pod mycluster-1 -o jsonpath={.status.podIP})
$ POD2=$(kubectl get pod mycluster-2 -o jsonpath={.status.podIP})
$ POD3=$(kubectl get pod mycluster-3 -o jsonpath={.status.podIP})

# Query the actor table row count on each pod
$ kubectl exec -it myclient1 -- psql -U postgres -h $POD1 -p 5432 -d dvdrental -c "SELECT COUNT(*) FROM actor"
 count
-------
   200
(1 row)

$ kubectl exec -it myclient1 -- psql -U postgres -h $POD2 -p 5432 -d dvdrental -c "SELECT COUNT(*) FROM actor"
 count
-------
   200
(1 row)

$ kubectl exec -it myclient1 -- psql -U postgres -h $POD3 -p 5432 -d dvdrental -c "SELECT COUNT(*) FROM actor"
 count
-------
   200
(1 row)

 

Check the difference between the rw, ro, r, and any services

  →   rw : connects only to the instance that accepts writes (the current primary)

$ kubectl exec -it myclient1 -- psql -U postgres -h mycluster-rw -p 5432 -c "select inet_server_addr();"
$ for i in {1..30}; do kubectl exec -it myclient1 -- psql -U postgres -h mycluster-rw -p 5432 -c "select inet_server_addr();"; done | sort | uniq -c | sort -nr | grep 172
==> 30  172.16.3.4

 

  →   ro : connects only to read-only instances (the standbys)

$ kubectl exec -it myclient1 -- psql -U postgres -h mycluster-ro -p 5432 -c "select inet_server_addr();"
$ for i in {1..30}; do kubectl exec -it myclient1 -- psql -U postgres -h mycluster-ro -p 5432 -c "select inet_server_addr();"; done | sort | uniq -c | sort -nr | grep 172
==>  18  172.16.1.4
==>  12  172.16.2.6
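
  Because the -ro service only reaches standbys, any write sent through it is rejected by hot standby mode, roughly like this (a quick negative test; the error text is PostgreSQL's standard read-only message):

$ kubectl exec -it myclient1 -- psql -U postgres -h mycluster-ro -p 5432 -d dvdrental -c "INSERT INTO actor (first_name, last_name) VALUES ('Test', 'User');"
ERROR:  cannot execute INSERT in a read-only transaction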

 

  →   r : connects to any instance regardless of role, including the primary (intended for read workloads)

$ kubectl exec -it myclient1 -- psql -U postgres -h mycluster-r -p 5432 -c "select inet_server_addr();"
$ for i in {1..30}; do kubectl exec -it myclient1 -- psql -U postgres -h mycluster-r -p 5432 -c "select inet_server_addr();"; done | sort | uniq -c | sort -nr | grep 172
==>  11  172.16.1.4
==>  10  172.16.2.6
==>   9  172.16.3.4

 

  →   any : connects to any instance at random

$ kubectl exec -it myclient1 -- psql -U postgres -h mycluster-any -p 5432 -c "select inet_server_addr();"
$ for i in {1..30}; do kubectl exec -it myclient1 -- psql -U postgres -h mycluster-any -p 5432 -c "select inet_server_addr();"; done | sort | uniq -c | sort -nr | grep 172
==>  12  172.16.1.4
==>   9  172.16.3.4
==>   9  172.16.2.6
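
  Another way to cross-check which role each service lands on is pg_is_in_recovery(), a built-in PostgreSQL function that returns false on the primary and true on standbys:

$ for svc in rw ro r any; do echo "== mycluster-$svc =="; kubectl exec -it myclient1 -- psql -U postgres -h mycluster-$svc -p 5432 -c "select inet_server_addr(), pg_is_in_recovery();"; done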

 

 

3. CNPG Failure Testing

Preparation for the failure tests

# Store pod IPs in variables
POD1=$(kubectl get pod mycluster-1 -o jsonpath={.status.podIP})
POD2=$(kubectl get pod mycluster-2 -o jsonpath={.status.podIP})
POD3=$(kubectl get pod mycluster-3 -o jsonpath={.status.podIP})

# query.sql
$ cat ~/DOIK/5/query.sql
CREATE DATABASE test;
\c test;
CREATE TABLE t1 (c1 INT PRIMARY KEY, c2 TEXT NOT NULL);
INSERT INTO t1 VALUES (1, 'Luis');

# Run the SQL file
$ kubectl cp ~/DOIK/5/query.sql myclient1:/tmp
$ kubectl exec -it myclient1 -- psql -U postgres -h mycluster-rw -p 5432 -f /tmp/query.sql
CREATE DATABASE
psql (14.3, server 14.2 (Debian 14.2-1.pgdg110+1))
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, bits: 256, compression: off)
You are now connected to database "test" as user "postgres".
CREATE TABLE
INSERT 0 1

# Verify
$ kubectl exec -it myclient1 -- psql -U postgres -h mycluster-ro -p 5432 -d test -c "SELECT * FROM t1"
 c1 |  c2
----+------
  1 | Luis
  (1 row)

# Insert a second row (c1 = 2) and check: the table now has 2 rows
 c1 |  c2
----+-------
  1 | Luis
  2 | Luis2
(2 rows)

# Insert 98 more rows into the test database (c1 = 3..100)
$ for ((i=3; i<=100; i++)); do kubectl exec -it myclient1 -- psql -U postgres -h mycluster-rw -p 5432 -d test -c "INSERT INTO t1 VALUES ($i, 'Luis$i');";echo; done
$ kubectl exec -it myclient1 -- psql -U postgres -h mycluster-ro -p 5432 -d test -c "SELECT COUNT(*) FROM t1"

 count
-------
   100
(1 row)


# [Terminal 2] Monitoring
$ while true; do kubectl exec -it myclient2 -- psql -U postgres -h mycluster-ro -p 5432 -d test -c "SELECT COUNT(*) FROM t1"; date;sleep 1; done

 

[Failure 1] Forcibly delete the primary pod (instance) and verify behavior

  →   Check the primary pod information

$ kubectl cnpg status mycluster

 

  →   [Terminal 1], [Terminal 2] Monitoring

# [Terminal 1] Monitoring
$ watch kubectl get pod -l cnpg.io/cluster=mycluster

# [Terminal 2] Monitoring
$ while true; do kubectl exec -it myclient2 -- psql -U postgres -h mycluster-ro -p 5432 -d test -c "SELECT COUNT(*) FROM t1"; date;sleep 1; done

 

  →   [Terminal 3] Repeatedly INSERT a large amount of data into the test database

$ for ((i=301; i<=10000; i++)); do kubectl exec -it myclient1 -- psql -U postgres -h mycluster-rw -p 5432 -d test -c "INSERT INTO t1 VALUES ($i, 'Luis$i');";echo; done
# $ for ((i=10001; i<=20000; i++)); do kubectl exec -it myclient1 -- psql -U postgres -h mycluster-rw -p 5432 -d test -c "INSERT INTO t1 VALUES ($i, 'Luis$i');";echo; done

 

  →   [Terminal 4] Delete the pod

$ kubectl delete pvc/mycluster-1 pod/mycluster-2
pod "mycluster-2" deleted

 

  →  Verify the deletion

$ kubectl cnpg status mycluster
Cluster Summary
Name:               mycluster
Namespace:          default
System ID:          7111873327487692819
PostgreSQL Image:   ghcr.io/cloudnative-pg/postgresql:14.2
Primary instance:   mycluster-3
Status:             Failing over Failing over from mycluster-2 to mycluster-3
Instances:          3
Ready instances:    2
Current Write LSN:  0/E007850 (Timeline: 3 - WAL File: 00000003000000000000000E)

Certificates Status
Certificate Name       Expiration Date                Days Left Until Expiration
----------------       ---------------                --------------------------
mycluster-ca           2022-09-20 01:35:08 +0000 UTC  89.88
mycluster-replication  2022-09-20 01:35:08 +0000 UTC  89.88
mycluster-server       2022-09-20 01:35:08 +0000 UTC  89.88

Continuous Backup status
Not configured

Streaming Replication status
Name         Sent LSN   Write LSN  Flush LSN  Replay LSN  Write Lag        Flush Lag        Replay Lag       State      Sync State  Sync Priority
----         --------   ---------  ---------  ----------  ---------        ---------        ----------       -----      ----------  -------------
mycluster-4  0/E007850  0/E007850  0/E007850  0/E007850   00:00:00.000721  00:00:00.001346  00:00:00.001411  streaming  async       0

Instances status
Name         Database Size  Current LSN  Replication role  Status             QoS         Manager Version
----         -------------  -----------  ----------------  ------             ---         ---------------
mycluster-2  -              -            -                 pod not available  BestEffort  -
mycluster-3  57 MB          0/E007850    Primary           OK                 BestEffort  1.15.1
mycluster-4  57 MB          0/E007908    Standby (async)   OK                 BestEffort  1.15.1

 

  →  INSERTs resume after a brief interruption of a few seconds

psql: error: connection to server at "mycluster-rw" (10.200.1.174), port 5432 failed: Connection refused
        Is the server running on that host and accepting TCP/IP connections?
command terminated with exit code 2

psql: error: connection to server at "mycluster-rw" (10.200.1.174), port 5432 failed: Connection refused
        Is the server running on that host and accepting TCP/IP connections?
command terminated with exit code 2

INSERT 0 1

INSERT 0 1

INSERT 0 1

 

  →  Check the pod information : a replacement pod has been created

$ kubectl get pod -l cnpg.io/cluster=mycluster
NAME          READY   STATUS    RESTARTS   AGE
mycluster-2   1/1     Running   0          157m
mycluster-3   1/1     Running   0          157m
mycluster-4   1/1     Running   0          49s
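
  To see what the operator actually did during the failover, the Kubernetes events are also worth a look (optional; the exact output differs per run):

$ kubectl get events --sort-by=.lastTimestamp | grep -i mycluster | tail -20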

 

[Failure 2] Drain the node where the primary pod (instance) is running and verify behavior

  →  (Optional) Check the operator logs

$ kubetail -l app.kubernetes.io/instance=cnpg -f

 

  →  Drain the worker node

$ kubectl drain k8s-w1 --delete-emptydir-data --force --ignore-daemonsets && kubectl get node -w
node/k8s-w1 cordoned
WARNING: ignoring DaemonSet-managed Pods: kube-system/kube-flannel-ds-gb76j, kube-system/kube-proxy-h22x6
evicting pod kube-system/coredns-64897985d-sbc7f
evicting pod default/mycluster-3
evicting pod kube-system/coredns-64897985d-mwx5s
error when evicting pods/"mycluster-3" -n "default" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
evicting pod default/mycluster-3


pod/coredns-64897985d-sbc7f evicted
pod/coredns-64897985d-mwx5s evicted
pod/mycluster-3 evicted
node/k8s-w1 drained
NAME     STATUS                     ROLES                  AGE     VERSION
k8s-m    Ready                      control-plane,master   3h47m   v1.23.6
k8s-w1   Ready,SchedulingDisabled   <none>                 3h47m   v1.23.6
k8s-w2   Ready                      <none>                 3h47m   v1.23.6
k8s-w3   Ready                      <none>                 3h47m   v1.23.6
k8s-w3   Ready                      <none>                 3h47m   v1.23.6
k8s-w2   Ready                      <none>                 3h47m   v1.23.6

 

  →  Check the cluster information

$ kubectl cnpg status mycluster

 

  →  After verifying the behavior, uncordon the node

$ kubectl uncordon k8s-w1

 

  →  Check the cluster information

$ kubectl cnpg status mycluster

Cluster Summary
Name:               mycluster
Namespace:          default
System ID:          7111873327487692819
PostgreSQL Image:   ghcr.io/cloudnative-pg/postgresql:14.2
Primary instance:   mycluster-2
Status:             Switchover in progress Switching over to mycluster-2, because primary instance was running on unschedulable node k8s-w1
Instances:          3
Ready instances:    2
Current Write LSN:  0/F003308 (Timeline: 4 - WAL File: 00000004000000000000000F)

Certificates Status
Certificate Name       Expiration Date                Days Left Until Expiration
----------------       ---------------                --------------------------
mycluster-ca           2022-09-20 01:35:08 +0000 UTC  89.88
mycluster-replication  2022-09-20 01:35:08 +0000 UTC  89.88
mycluster-server       2022-09-20 01:35:08 +0000 UTC  89.88

Continuous Backup status
Not configured

Streaming Replication status
Name         Sent LSN   Write LSN  Flush LSN  Replay LSN  Write Lag  Flush Lag  Replay Lag  State      Sync State  Sync Priority
----         --------   ---------  ---------  ----------  ---------  ---------  ----------  -----      ----------  -------------
mycluster-4  0/F003308  0/F003308  0/F003308  0/F003308   00:00:00   00:00:00   00:00:00    streaming  async       0

Instances status
Name         Database Size  Current LSN  Replication role  Status             QoS         Manager Version
----         -------------  -----------  ----------------  ------             ---         ---------------
mycluster-2  57 MB          0/F003308    Primary           OK                 BestEffort  1.15.1
mycluster-3  -              -            -                 pod not available  BestEffort  -
mycluster-4  57 MB          0/F003308    Standby (async)   OK                 BestEffort  1.15.1

$ kubectl cnpg status mycluster
Cluster Summary
Name:               mycluster
Namespace:          default
System ID:          7111873327487692819
PostgreSQL Image:   ghcr.io/cloudnative-pg/postgresql:14.2
Primary instance:   mycluster-2
Status:             Cluster in healthy state
Instances:          3
Ready instances:    3
Current Write LSN:  0/F003340 (Timeline: 4 - WAL File: 00000004000000000000000F)

Certificates Status
Certificate Name       Expiration Date                Days Left Until Expiration
----------------       ---------------                --------------------------
mycluster-ca           2022-09-20 01:35:08 +0000 UTC  89.88
mycluster-replication  2022-09-20 01:35:08 +0000 UTC  89.88
mycluster-server       2022-09-20 01:35:08 +0000 UTC  89.88

Continuous Backup status
Not configured

Streaming Replication status
Name         Sent LSN   Write LSN  Flush LSN  Replay LSN  Write Lag  Flush Lag  Replay Lag  State      Sync State  Sync Priority
----         --------   ---------  ---------  ----------  ---------  ---------  ----------  -----      ----------  -------------
mycluster-4  0/F003340  0/F003340  0/F003340  0/F003340   00:00:00   00:00:00   00:00:00    streaming  async       0
mycluster-3  0/F003340  0/F003340  0/F003340  0/F003340   00:00:00   00:00:00   00:00:00    streaming  async       0

Instances status
Name         Database Size  Current LSN  Replication role  Status  QoS         Manager Version
----         -------------  -----------  ----------------  ------  ---         ---------------
mycluster-2  57 MB          0/F003340    Primary           OK      BestEffort  1.15.1
mycluster-3  57 MB          0/F003340    Standby (async)   OK      BestEffort  1.15.1
mycluster-4  57 MB          0/F003340    Standby (async)   OK      BestEffort  1.15.1
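
  If you want to move the primary back to a specific instance once things are healthy again, the cnpg plugin provides a promote subcommand for a manual switchover (the instance name below is just an example):

$ kubectl cnpg promote mycluster mycluster-3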

 

4. CloudNativePG Scale & Rolling Updates

▶ CNPG Scale configuration

  →  Check the current state

$ kubectl cnpg status mycluster
Cluster Summary
Name:               mycluster
Namespace:          default
System ID:          7111873327487692819
PostgreSQL Image:   ghcr.io/cloudnative-pg/postgresql:14.2
Primary instance:   mycluster-2
Status:             Cluster in healthy state
Instances:          3
Ready instances:    3
Current Write LSN:  0/11000000 (Timeline: 4 - WAL File: 000000040000000000000010)

Certificates Status
Certificate Name       Expiration Date                Days Left Until Expiration
----------------       ---------------                --------------------------
mycluster-ca           2022-09-20 01:35:08 +0000 UTC  89.87
mycluster-replication  2022-09-20 01:35:08 +0000 UTC  89.87
mycluster-server       2022-09-20 01:35:08 +0000 UTC  89.87

Continuous Backup status
Not configured

Streaming Replication status
Name         Sent LSN    Write LSN   Flush LSN   Replay LSN  Write Lag  Flush Lag  Replay Lag  State      Sync State  Sync Priority
----         --------    ---------   ---------   ----------  ---------  ---------  ----------  -----      ----------  -------------
mycluster-4  0/11000000  0/11000000  0/11000000  0/11000000  00:00:00   00:00:00   00:00:00    streaming  async       0
mycluster-3  0/11000000  0/11000000  0/11000000  0/11000000  00:00:00   00:00:00   00:00:00    streaming  async       0

Instances status
Name         Database Size  Current LSN  Replication role  Status  QoS         Manager Version
----         -------------  -----------  ----------------  ------  ---         ---------------
mycluster-2  57 MB          0/11000000   Primary           OK      BestEffort  1.15.1
mycluster-3  57 MB          0/11000000   Standby (async)   OK      BestEffort  1.15.1
mycluster-4  57 MB          0/11000000   Standby (async)   OK      BestEffort  1.15.1

$ kubectl get cluster mycluster
NAME        AGE    INSTANCES   READY   STATUS                     PRIMARY
mycluster   176m   3           3       Cluster in healthy state   mycluster-2



 

  →  Monitoring

$ watch kubectl get pod -l postgresql=mycluster

 

  →  Scale out to 5 instances

$ kubectl patch cluster mycluster --type=merge -p '{"spec":{"instances":5}}'
cluster.postgresql.cnpg.io/mycluster patched

 

  →  Verify the scale-out

$ kubectl get cluster mycluster
NAME        AGE    INSTANCES   READY   STATUS                     PRIMARY
mycluster   178m   5           5       Cluster in healthy state   mycluster-2

$ kubectl cnpg status mycluster
Cluster Summary
Name:               mycluster
Namespace:          default
System ID:          7111873327487692819
PostgreSQL Image:   ghcr.io/cloudnative-pg/postgresql:14.2
Primary instance:   mycluster-2
Status:             Cluster in healthy state
Instances:          5
Ready instances:    5
Current Write LSN:  0/14000060 (Timeline: 4 - WAL File: 000000040000000000000014)

Certificates Status
Certificate Name       Expiration Date                Days Left Until Expiration
----------------       ---------------                --------------------------
mycluster-ca           2022-09-20 01:35:08 +0000 UTC  89.87
mycluster-replication  2022-09-20 01:35:08 +0000 UTC  89.87
mycluster-server       2022-09-20 01:35:08 +0000 UTC  89.87

Continuous Backup status
Not configured

Streaming Replication status
Name         Sent LSN    Write LSN   Flush LSN   Replay LSN  Write Lag  Flush Lag  Replay Lag  State      Sync State  Sync Priority
----         --------    ---------   ---------   ----------  ---------  ---------  ----------  -----      ----------  -------------
mycluster-4  0/14000060  0/14000060  0/14000060  0/14000060  00:00:00   00:00:00   00:00:00    streaming  async       0
mycluster-3  0/14000060  0/14000060  0/14000060  0/14000060  00:00:00   00:00:00   00:00:00    streaming  async       0
mycluster-5  0/14000060  0/14000060  0/14000060  0/14000060  00:00:00   00:00:00   00:00:00    streaming  async       0
mycluster-6  0/14000060  0/14000060  0/14000060  0/14000060  00:00:00   00:00:00   00:00:00    streaming  async       0

Instances status
Name         Database Size  Current LSN  Replication role  Status  QoS         Manager Version
----         -------------  -----------  ----------------  ------  ---         ---------------
mycluster-2  57 MB          0/14000060   Primary           OK      BestEffort  1.15.1
mycluster-3  57 MB          0/14000060   Standby (async)   OK      BestEffort  1.15.1
mycluster-4  57 MB          0/14000060   Standby (async)   OK      BestEffort  1.15.1
mycluster-5  57 MB          0/14000060   Standby (async)   OK      BestEffort  1.15.1
mycluster-6  57 MB          0/14000060   Standby (async)   OK      BestEffort  1.15.1

 

  →  Verify connections via the any service

$ for i in {1..30}; do kubectl exec -it myclient1 -- psql -U postgres -h mycluster-any -p 5432 -c "select inet_server_addr();"; done | sort | uniq -c | sort -nr | grep 172
      8  172.16.1.9
      7  172.16.2.7
      6  172.16.2.10
      5  172.16.3.8
      4  172.16.1.5

 

  →  Scale in to 3 instances

$ kubectl patch cluster mycluster --type=merge -p '{"spec":{"instances":3}}'
cluster.postgresql.cnpg.io/mycluster patched

 

  →  Verify the scale-in

$ kubectl get cluster mycluster
NAME        AGE   INSTANCES   READY   STATUS                     PRIMARY
mycluster   3h    3           3       Cluster in healthy state   mycluster-2

 

▶ Rolling update : standbys are updated first, then a switchover is performed before the primary is updated to minimize downtime

  →  Monitoring

# [Terminal 1] Monitoring
$ watch kubectl get pod -l cnpg.io/cluster=mycluster
NAME          READY   STATUS    RESTARTS   AGE
mycluster-2   1/1     Running   0          20m
mycluster-3   1/1     Running   0          16m
mycluster-4   1/1     Running   0          22m


# [Terminal 2] Monitoring
$ while true; do kubectl exec -it myclient2 -- psql -U postgres -h mycluster-ro -p 5432 -d test -c "SELECT COUNT(*) FROM t1"; date;sleep 1; done

# [Terminal 3] INSERT a large amount of data into the test database
$ for ((i=10000; i<=20000; i++)); do kubectl exec -it myclient1 -- psql -U postgres -h mycluster-rw -p 5432 -d test -c "INSERT INTO t1 VALUES ($i, 'Luis$i');";echo; done

 

  →   [Terminal 4] Update from postgresql:14.2 to postgresql:14.3

$ kubectl patch cluster mycluster --type=merge -p '{"spec":{"imageName":"ghcr.io/cloudnative-pg/postgresql:14.3"}}' && kubectl get pod -l postgresql=mycluster -w

 

  →  Check the cluster information

$ kubectl get cluster mycluster
NAME        AGE    INSTANCES   READY   STATUS              PRIMARY
mycluster   3h1m   2           2       Upgrading cluster   mycluster-2

$ kubectl cnpg status mycluster | grep Image
PostgreSQL Image:   ghcr.io/cloudnative-pg/postgresql:14.3
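
  The switchover happens automatically here because the cluster was created with primaryUpdateStrategy: unsupervised. If you want to decide yourself when the primary gets updated, a sketch of the supervised flow would look like this (the promote step reuses the cnpg plugin; the instance name is an example):

# Require a manual step before the primary is updated
$ kubectl patch cluster mycluster --type=merge -p '{"spec":{"primaryUpdateStrategy":"supervised"}}'

# After the standbys are upgraded, promote one of them to finish the rollout
$ kubectl cnpg promote mycluster mycluster-4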

 

  → INSERTs are briefly interrupted during the patch

psql: error: connection to server at "mycluster-rw" (10.200.1.174), port 5432 failed: Connection refused
        Is the server running on that host and accepting TCP/IP connections?
command terminated with exit code 2

psql: error: connection to server at "mycluster-rw" (10.200.1.174), port 5432 failed: Connection refused
        Is the server running on that host and accepting TCP/IP connections?
command terminated with exit code 2

psql: error: connection to server at "mycluster-rw" (10.200.1.174), port 5432 failed: Connection refused
        Is the server running on that host and accepting TCP/IP connections?
command terminated with exit code 2

INSERT 0 1

INSERT 0 1

INSERT 0 1

 

 

5. CNPG Miscellaneous

   Increasing the pod volume size : online volume expansion is not possible here because the local-path storage class does not support it!

  → Check pod and PVC information

$ watch kubectl get pod,pvc
NAME                                       READY   STATUS    RESTARTS   AGE
pod/cnpg-cloudnative-pg-5f8cc75df5-jk2v4   1/1     Running   0          3h19m
pod/myclient1                              1/1     Running   0          60m
pod/myclient2                              1/1     Running   0          60m
pod/mycluster-2                            1/1     Running   0          4m55s
pod/mycluster-3                            1/1     Running   0          5m25s
pod/mycluster-4                            1/1     Running   0          5m50s

NAME                                STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
persistentvolumeclaim/mycluster-2   Bound    pvc-c490f766-3a2b-4658-90d6-49dff5dc03e4   3Gi        RWO            local-path     3h6m
persistentvolumeclaim/mycluster-3   Bound    pvc-d370b23b-14dc-4b90-bfb4-83fa9aa51290   3Gi        RWO            local-path     3h5m
persistentvolumeclaim/mycluster-4   Bound    pvc-ca35067b-0291-4559-b919-639e91e8df67   3Gi        RWO            local-path     29m

 

  → Increase the PVC from 3Gi to 5Gi : the size can only be increased, not decreased

$ kubectl patch cluster mycluster --type=merge -p '{"spec":{"storage":{"resizeInUseVolumes":false}}}'
cluster.postgresql.cnpg.io/mycluster patched

$ kubectl patch cluster mycluster --type=merge -p '{"spec":{"storage":{"size":"5Gi"}}}'
cluster.postgresql.cnpg.io/mycluster patched

$ kubectl describe cluster mycluster
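
  The reason online expansion does not work here is the StorageClass itself: local-path does not declare allowVolumeExpansion. You can check this directly (prints true only when the provisioner supports expansion, nothing otherwise):

$ kubectl get storageclass local-path -o jsonpath='{.allowVolumeExpansion}{"\n"}'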

 

  → Delete the pods/PVCs so they are recreated with the new size

$ kubectl delete pvc/mycluster-4 pod/mycluster-4
persistentvolumeclaim "mycluster-4" deleted
pod "mycluster-4" deleted

$ kubectl delete pvc/mycluster-3 pod/mycluster-3
persistentvolumeclaim "mycluster-3" deleted
pod "mycluster-3" deleted

$ kubectl delete pvc/mycluster-2 pod/mycluster-2
persistentvolumeclaim "mycluster-2" deleted
pod "mycluster-2" deleted

 

  →  Verify the increased volume size

$ kubectl get pod,pvc
NAME                                       READY   STATUS    RESTARTS   AGE
pod/cnpg-cloudnative-pg-5f8cc75df5-jk2v4   1/1     Running   0          3h22m
pod/myclient1                              1/1     Running   0          62m
pod/myclient2                              1/1     Running   0          62m
pod/mycluster-7                            1/1     Running   0          47s
pod/mycluster-8                            1/1     Running   0          24s
pod/mycluster-9                            1/1     Running   0          9s

NAME                                STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
persistentvolumeclaim/mycluster-7   Bound    pvc-5c07d1ae-f6e1-4b05-829a-6512f6f33d30   5Gi        RWO            local-path     59s
persistentvolumeclaim/mycluster-8   Bound    pvc-121e78c4-5f59-4895-aa14-5844a4b3098d   5Gi        RWO            local-path     35s
persistentvolumeclaim/mycluster-9   Bound    pvc-75db182a-b42c-402a-8f23-119dfe9a848d   5Gi        RWO            local-path     20s