Repair Log: etcd Going Belly-Up After a Node's Root Filesystem Filled Up
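Background
This happened on one of the Controller nodes in our UAT environment. The node's root filesystem filled up, and because the etcd data had not been put on a dedicated disk, etcd fell over with it and the node went NotReady.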
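The root cause here is that the etcd data directory shared the root filesystem. As a rough sketch, the check below flags that situation on a node. It assumes GNU coreutils `stat` and the `/var/lib/etcd` data dir used later in this post, and `same_filesystem` is a hypothetical helper name. Comparing device IDs is only an approximation (a btrfs subvolume, for example, reports its own device ID).

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed (0) if both paths sit on the same filesystem,
# judged by the device ID that GNU `stat -c %d` reports; 1 if they differ,
# 2 if either path cannot be stat'ed.
same_filesystem() {
  local dev_a dev_b
  dev_a=$(stat -c %d -- "$1") || return 2
  dev_b=$(stat -c %d -- "$2") || return 2
  [ "$dev_a" = "$dev_b" ]
}

etcd_dir=/var/lib/etcd   # the etcd --data-dir used in this post
if [ -d "$etcd_dir" ] && same_filesystem "$etcd_dir" /; then
  echo "WARNING: $etcd_dir is on the root filesystem; a full / will take etcd down too"
  df -h /
fi
```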
Repair process
After going through a bunch of material online, I ended up with the following repair procedure:
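- Move the static pod YAML out of the manifests directory, which stops the broken etcd pod
- Remove the broken etcd member with `etcdctl member remove`
- Back up the data directory, then remove it
- Add a new member with `etcdctl member add`, and note down the configuration that etcdctl prints
- Start etcd as a bare container (outside kubelet); take the startup arguments from the static pod YAML and the configuration printed in step 4
- Once started, the new member syncs data from the leader; check its progress with `etcdctl endpoint status -w table`
- If the sync succeeds, stop the etcd container and move the static pod YAML back into its directory; the cluster is repaired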
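The ID that `etcdctl member remove` needs is listed by `etcdctl member list`. The helper below is a hypothetical sketch that pulls it out of that command's default comma-separated output, assuming the v3.5 column order (ID, status, name, peer URLs, client URLs, is learner). The member IDs in the sample are made up for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `etcdctl member list` output (default "simple"
# format, assumed v3.5 columns: ID, status, name, peer URLs, client URLs,
# is learner) on stdin and print the ID of the member named $1.
# Exits 1 if no member has that name.
member_id_by_name() {
  awk -F', ' -v name="$1" '
    $3 == name { print $1; found = 1 }
    END { if (!found) exit 1 }'
}

# Example with canned output (IDs are invented for illustration):
sample='8e9e05c52164694d, started, wcn-gduvm-mwdcm1, https://10.82.69.10:2380, https://10.82.69.10:2379, false
91bc3c398fb3c146, started, wcn-gduvm-mwdcm2, https://10.82.69.11:2380, https://10.82.69.11:2379, false'
printf '%s\n' "$sample" | member_id_by_name wcn-gduvm-mwdcm1

# On a real cluster, with the ETCDCTL_* variables exported:
#   issue_etcd_id=$(etcdctl member list | member_id_by_name wcn-gduvm-mwdcm1)
```

Matching on the exact name field (rather than grepping the whole line) avoids picking up a member whose name merely contains the target as a substring.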
The actual commands:
```bash
# Step 1: move the static pod manifest away; kubelet then stops the broken etcd pod
mv /etc/kubernetes/manifests/etcd.yaml .

# Point etcdctl at the cluster. Two alternatives; adjust IPs and cert paths to your cluster.
# Option A: explicit endpoint list plus a short alias `e` (kubeadm-style cert paths)
export endpoints="https://10.82.69.10:2379,https://10.82.69.11:2379,https://10.82.69.12:2379,https://10.82.69.19:2379,https://10.66.10.83:2379"
export cacert="/etc/kubernetes/pki/etcd/ca.crt"
export cert="/etc/kubernetes/pki/etcd/peer.crt"
export key="/etc/kubernetes/pki/etcd/peer.key"
alias e="etcdctl --endpoints $endpoints --cacert $cacert --cert $cert --key $key"

# Option B: build ETCDCTL_ENDPOINTS from the INTERNAL-IP column (6th field of
# `kubectl get nodes -owide`) of the etcd/control-plane nodes, then export the certs
eval $(kubectl get nodes -owide | grep -E "etcd|control-plane" | awk '{printf "https://"$6":2379,"}' | awk '{gsub(",$","");print "export ETCDCTL_ENDPOINTS=\""$1"\""}')
export ETCDCTL_CACERT=/etc/kubernetes/ssl/etcd/ca.crt
export ETCDCTL_CERT=/etc/kubernetes/ssl/etcd/peer.crt
export ETCDCTL_KEY=/etc/kubernetes/ssl/etcd/peer.key

# Step 2: remove the broken member; take its ID from `etcdctl member list -w table`
etcdctl member remove "$issue_etcd_id"

# Step 3: back up the data dir, then start from an empty one
# (the data dir sits on the root filesystem here, so mv is just a rename and needs no extra space)
mv /var/lib/etcd "/var/lib/etcd.bak.$(date +%Y%m%d%H%M%S)"
mkdir -m 700 /var/lib/etcd

# Step 4: re-add the member and keep the ETCD_* lines that etcdctl prints
etcdctl member add wcn-gduvm-mwdcm1 --peer-urls=https://10.82.69.10:2380

# Step 5: run etcd as a bare container; ETCD_* values come from step 4, flags from etcd.yaml
nerdctl run -d --name restore_etcd \
  -v /etc/kubernetes/ssl/etcd:/etc/kubernetes/ssl/etcd \
  -v /var/lib/etcd:/var/lib/etcd \
  --network=host \
  -e ETCD_NAME="wcn-gduvm-mwdcm1" \
  -e ETCD_INITIAL_CLUSTER="wcn-gduvm-mwdcm2=https://10.82.69.11:2380,wcn-gduvm-mwdcm1=https://10.82.69.10:2380,wcn-gduvm-mwdcm3=https://10.82.69.12:2380" \
  -e ETCD_INITIAL_ADVERTISE_PEER_URLS="https://10.82.69.10:2380" \
  -e ETCD_INITIAL_CLUSTER_STATE="existing" \
  --entrypoint=etcd \
  10.82.49.238/quay.io/coreos/etcd:v3.5.6 \
  --advertise-client-urls=https://10.82.69.10:2379 \
  --auto-compaction-retention=8 \
  --cert-file=/etc/kubernetes/ssl/etcd/server.crt \
  --client-cert-auth=true \
  --data-dir=/var/lib/etcd \
  --election-timeout=5000 \
  --experimental-initial-corrupt-check=true \
  --experimental-watch-progress-notify-interval=5s \
  --heartbeat-interval=250 \
  --key-file=/etc/kubernetes/ssl/etcd/server.key \
  --listen-client-urls=https://127.0.0.1:2379,https://10.82.69.10:2379 \
  --listen-metrics-urls=http://127.0.0.1:2381 \
  --listen-peer-urls=https://10.82.69.10:2380 \
  --metrics=basic \
  --peer-cert-file=/etc/kubernetes/ssl/etcd/peer.crt \
  --peer-client-cert-auth=true \
  --peer-key-file=/etc/kubernetes/ssl/etcd/peer.key \
  --peer-trusted-ca-file=/etc/kubernetes/ssl/etcd/ca.crt \
  --snapshot-count=10000 \
  --trusted-ca-file=/etc/kubernetes/ssl/etcd/ca.crt

# Step 6: watch the new member sync from the leader
etcdctl endpoint status -w table

# Step 7: once in sync, stop the container and put the static pod manifest back
nerdctl stop restore_etcd
mv ./etcd.yaml /etc/kubernetes/manifests/etcd.yaml
```
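Step 6 checks the sync by eye. To script the wait instead, here is a rough sketch that reads the default (non-table) output of `etcdctl endpoint status`, assuming the v3.5 column order, and treats the restored member as caught up once its raft applied index is within a tolerance of the leader's raft index. The helper name `member_caught_up`, the catch-up criterion, and the default tolerance of 100 entries are arbitrary choices for illustration, not anything etcd defines.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `etcdctl endpoint status` output (default "simple"
# format) on stdin. Assumed v3.5 column order:
#   1 endpoint, 2 ID, 3 version, 4 DB size, 5 is leader, 6 is learner,
#   7 raft term, 8 raft index, 9 raft applied index, 10 errors
# Exit 0 if endpoint $1's applied index is within $2 (default 100) entries
# of the leader's raft index, 1 if it lags further, 2 if either row is missing.
member_caught_up() {
  awk -F', ' -v ep="$1" -v tol="${2:-100}" '
    $5 == "true" { leader = $8 + 0; have_leader = 1 }
    $1 == ep     { applied = $9 + 0; have_ep = 1 }
    END {
      if (!have_leader || !have_ep) exit 2
      if (leader - applied <= tol + 0) exit 0
      exit 1
    }'
}

# Usage sketch: poll every 10 seconds until the restored member has caught up.
#   until etcdctl endpoint status | member_caught_up https://10.82.69.10:2379; do
#     sleep 10
#   done
```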