Repair Log: etcd Goes Haywire After a Node's Root Filesystem Fills Up
Background
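This happened on one of the Controller nodes in the UAT environment. The node's root filesystem filled up, and because the etcd data was not stored on a separate disk, etcd went haywire and the node became NotReady.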
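To check whether another node is exposed to the same failure, one can look at how full the root filesystem is and whether the etcd data directory shares it. The sketch below is only illustrative: the helper names `usage_percent` and `same_filesystem` are made up for this example, and `/var/lib/etcd` is assumed to be the data directory because that is the path used in the commands later in this post.

```shell
#!/usr/bin/env bash
# Illustrative helpers; the function names are hypothetical.

# Print the usage percentage (without the % sign) of the filesystem holding $1.
usage_percent() {
  df -P "$1" | awk 'NR==2 { gsub("%", "", $5); print $5 }'
}

# Succeed if both paths are on the same mounted filesystem.
same_filesystem() {
  local a b
  a=$(df -P "$1" | awk 'NR==2 { print $6 }')
  b=$(df -P "$2" | awk 'NR==2 { print $6 }')
  [ "$a" = "$b" ]
}

# Example usage
echo "root usage: $(usage_percent /)%"
if [ -d /var/lib/etcd ] && same_filesystem / /var/lib/etcd; then
  echo "warning: etcd data shares the root filesystem"
fi
```

If the warning prints, a full root filesystem will also leave etcd unable to write, which is the situation described above.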

Repair process
After consulting various online resources, I ended up with the following repair procedure:
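1. Remove the static pod YAML, which stops the broken etcd pod.
2. Remove the broken etcd instance with `etcdctl member remove`.
3. Back up the data directory, then clear it.
4. Add a new instance with `etcdctl member add` and record the configuration that etcdctl prints.
5. Start an etcd container by hand. Take the startup parameters from the static pod YAML and from the configuration printed in step 4.
6. Once started, the new instance syncs data from the leader. Check its status with `etcdctl endpoint status -w table`.
7. If the sync succeeds, stop the etcd container and move the static pod YAML back into its directory. The cluster is repaired.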
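The `etcdctl member remove` step needs the ID of the broken member, which `etcdctl member list` prints in its first column. A minimal sketch of looking it up by member name follows. `member_id_by_name` is a hypothetical helper; it assumes the default comma-separated output of etcdctl v3 (ID, status, name, peer URLs, client URLs, learner flag) and that etcdctl is already configured to reach the cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the ID of the etcd member named $1.
# Assumes the default "simple" output of `etcdctl member list`, for example:
#   8e9e05c52164694d, started, node-a, https://10.0.0.1:2380, https://10.0.0.1:2379, false
member_id_by_name() {
  etcdctl member list | awk -F', ' -v name="$1" '$3 == name { print $1 }'
}

# Example usage (member name taken from the commands in this post):
# issue_etcd_id=$(member_id_by_name wcn-gduvm-mwdcm1)
# etcdctl member remove "$issue_etcd_id"
```

The helper prints nothing when no member has that name, so it is worth checking that the variable is non-empty before calling `member remove`.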
The exact commands:
    # 1. Move the static pod manifest away so that the broken etcd pod stops
    mv /etc/kubernetes/manifests/etcd.yaml .

    # Configure etcdctl through an alias with explicit flags.
    # Note: these paths use pki/ while everything below uses ssl/;
    # adjust them to wherever your cluster keeps the etcd certificates.
    export endpoints="https://10.82.69.10:2379,https://10.82.69.11:2379,https://10.82.69.12:2379,https://10.82.69.19:2379,https://10.66.10.83:2379"
    export cacert="/etc/kubernetes/pki/etcd/ca.crt"
    export cert="/etc/kubernetes/pki/etcd/peer.crt"
    export key="/etc/kubernetes/pki/etcd/peer.key"

    alias e="etcdctl --endpoints $endpoints --cacert $cacert --cert $cert --key $key"

    # Alternatively, export ETCDCTL_* variables built from the node list
    # (the plain etcdctl commands below rely on these)
    eval "$(kubectl get nodes -owide | grep -E "etcd|control-plane" \
      | awk '{printf "https://"$6":2379,"}' \
      | awk '{gsub(",$","");print "export ETCDCTL_ENDPOINTS=\""$1"\""}')"
    export ETCDCTL_CACERT=/etc/kubernetes/ssl/etcd/ca.crt
    export ETCDCTL_CERT=/etc/kubernetes/ssl/etcd/peer.crt
    export ETCDCTL_KEY=/etc/kubernetes/ssl/etcd/peer.key

    # 2. Remove the broken member; set issue_etcd_id to its ID from `etcdctl member list`
    etcdctl member remove "$issue_etcd_id"

    # 3. Clear the data directory (back it up first, to a disk that has free space)
    rm -rf /var/lib/etcd/*

    # 4. Register the new member and keep the ETCD_* settings that etcdctl prints
    etcdctl member add wcn-gduvm-mwdcm1 --peer-urls=https://10.82.69.10:2380

    # 5. Start etcd by hand, using the static pod flags plus the settings from step 4
    nerdctl run -d --name restore_etcd \
      -v /etc/kubernetes/ssl/etcd:/etc/kubernetes/ssl/etcd \
      -v /var/lib/etcd:/var/lib/etcd \
      --network=host \
      -e ETCD_NAME="wcn-gduvm-mwdcm1" \
      -e ETCD_INITIAL_CLUSTER="wcn-gduvm-mwdcm2=https://10.82.69.11:2380,wcn-gduvm-mwdcm1=https://10.82.69.10:2380,wcn-gduvm-mwdcm3=https://10.82.69.12:2380" \
      -e ETCD_INITIAL_ADVERTISE_PEER_URLS="https://10.82.69.10:2380" \
      -e ETCD_INITIAL_CLUSTER_STATE="existing" \
      --entrypoint=etcd 10.82.49.238/quay.io/coreos/etcd:v3.5.6 \
      --advertise-client-urls=https://10.82.69.10:2379 \
      --auto-compaction-retention=8 \
      --cert-file=/etc/kubernetes/ssl/etcd/server.crt \
      --client-cert-auth=true \
      --data-dir=/var/lib/etcd \
      --election-timeout=5000 \
      --experimental-initial-corrupt-check=true \
      --experimental-watch-progress-notify-interval=5s \
      --heartbeat-interval=250 \
      --key-file=/etc/kubernetes/ssl/etcd/server.key \
      --listen-client-urls=https://127.0.0.1:2379,https://10.82.69.10:2379 \
      --listen-metrics-urls=http://127.0.0.1:2381 \
      --listen-peer-urls=https://10.82.69.10:2380 \
      --metrics=basic \
      --peer-cert-file=/etc/kubernetes/ssl/etcd/peer.crt \
      --peer-client-cert-auth=true \
      --peer-key-file=/etc/kubernetes/ssl/etcd/peer.key \
      --peer-trusted-ca-file=/etc/kubernetes/ssl/etcd/ca.crt \
      --snapshot-count=10000 \
      --trusted-ca-file=/etc/kubernetes/ssl/etcd/ca.crt

    # 7. Once the sync has succeeded, stop the temporary container
    nerdctl stop restore_etcd

    # ...and put the static pod manifest back
    mv ./etcd.yaml /etc/kubernetes/manifests/etcd.yaml
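The commands clear `/var/lib/etcd` but do not show the backup mentioned in the step list. One possible way to do it is sketched below. `backup_dir` is a hypothetical helper and the destination directory in the usage line is an assumption; it has to sit on a disk that still has free space, since the root filesystem was full.

```shell
#!/usr/bin/env bash
# Hypothetical helper: archive directory $1 into directory $2 as a
# timestamped tarball and print the path of the archive.
backup_dir() {
  local src=$1 dest=$2 archive
  archive="$dest/$(basename "$src")-$(date +%Y%m%d%H%M%S).tar.gz"
  mkdir -p "$dest" || return 1
  tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || return 1
  printf '%s\n' "$archive"
}

# Example usage (the destination path is an assumption):
# backup_dir /var/lib/etcd /data/backup && rm -rf /var/lib/etcd/*
```

Chaining the `rm` with `&&` means the data directory is only cleared when the archive was written successfully.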
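Before stopping the temporary container, it helps to confirm that the rebuilt member actually answers. A hedged sketch of a polling loop follows. `wait_healthy` is a hypothetical helper, the endpoint in the usage line is the one used in this post, and the loop assumes that `etcdctl endpoint health` exits non-zero while the member is not yet healthy. The endpoint is passed through `ETCDCTL_ENDPOINTS` rather than a flag so that it does not clash with variables that are already exported.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll `etcdctl endpoint health` for endpoint $1,
# up to $2 times (default 30), sleeping $3 seconds (default 2) between tries.
wait_healthy() {
  local endpoint=$1 tries=${2:-30} delay=${3:-2} i=1
  while [ "$i" -le "$tries" ]; do
    if ETCDCTL_ENDPOINTS="$endpoint" etcdctl endpoint health >/dev/null 2>&1; then
      echo "healthy after $i attempt(s)"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "still unhealthy after $tries attempts" >&2
  return 1
}

# Example usage:
# wait_healthy https://10.82.69.10:2379 && nerdctl stop restore_etcd
```

A healthy endpoint is not proof that the data has fully caught up, so `etcdctl endpoint status -w table` remains the place to compare the member against the leader.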