Modifying Kubernetes etcd data (without using the K8s API)
How it all started

Have you ever thought about a “low-level” way of changing the etcd data of your Kubernetes cluster? That is, altering etcd-stored values without using any common Kubernetes tooling, such as its native CLI utilities or even the API. We once had to perform exactly such a task, and here is our story: why and how we did it.

An increasing number of customers (mostly developers) ask us to provide access to the Kubernetes cluster so that they can interact with internal services. They want to be able to connect directly to a database or a service, to connect a local application to other applications within the cluster, and so on.
For example, you might need to connect to the memcached.staging.svc.cluster.local service from your local machine. We accomplish that via a VPN inside the cluster to which the client connects. To do this, we announce the pod and service subnets and push the cluster’s DNS to the client. As a result, when the client tries to connect to the memcached.staging.svc.cluster.local service, the request goes to the cluster’s DNS, which returns the address of this service from the cluster’s service network, or the address of the pod.
We configure K8s clusters using kubeadm, where the default service subnet is 192.168.0.0/16 and the pod subnet is 10.244.0.0/16. Generally, this approach works just fine. However, there are a couple of subtleties:

- The 192.168.*.* subnet is often used in our customers’ offices, and even more often in the home offices of developers. And that is a recipe for disaster: home routers use the same address space, so the VPN pushes these subnets from the cluster to the client, and the addresses collide.
- We have several clusters (production, stage, multiple dev clusters). By default, all of them get the same subnets for pods and services, which makes it very difficult to work with services in multiple clusters simultaneously.

We have been using different subnets for different services and pods within the same project for quite a while, so that every cluster has its own networks. At the same time, we maintain a large number of K8s clusters that we would prefer not to redeploy from scratch, since they run many services, stateful applications, and so on.

At some point, we asked ourselves: how do we change a subnet in an existing cluster?

Searching for a solution
The most common way is to recreate all services of the ClusterIP type. You can also find advice along these lines:

“The following process has a problem: after everything is configured, the pods come up with the old IP as a DNS nameserver in /etc/resolv.conf. Since I still did not find the solution, I had to reset the entire cluster with kubeadm reset and init it again.”

Unfortunately, that does not work for everyone… Here is a more detailed problem definition for our case: the goal is to replace the 192.168.0.0/16 service subnet with 172.24.0.0/16 in a cluster deployed using kubeadm.

As a matter of fact, we have long been tempted to investigate how Kubernetes stores its data in etcd and what can be done with this storage at all… So we thought: “Why don’t we update the data in etcd directly, replacing the old subnet IPs with new ones?”

We looked for ready-made tools for modifying data in etcd… and nothing met our needs. But it’s not all bad: etcdhelper by OpenShift was a good starting point (thanks to its creators!). This tool can connect to etcd using certificates and read etcd data using the ls, get, and dump commands.
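For example, assuming the same connection flags that we use later in this article, listing the keys of Service objects and dumping one of them looks roughly like this (/registry/services/specs/ is the standard etcd prefix under which Kubernetes keeps Service objects):

./etcdhelper -cacert /etc/kubernetes/pki/etcd/ca.crt -cert /etc/kubernetes/pki/etcd/server.crt -key /etc/kubernetes/pki/etcd/server.key -endpoint https://127.0.0.1:2379 ls /registry/services/specs/
./etcdhelper -cacert /etc/kubernetes/pki/etcd/ca.crt -cert /etc/kubernetes/pki/etcd/server.crt -key /etc/kubernetes/pki/etcd/server.key -endpoint https://127.0.0.1:2379 get /registry/services/specs/default/kubernetes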
By the way, do not hesitate to share links if you are aware of other tools for working with data in etcd directly!

Extending etcdhelper
Looking at etcdhelper, we thought: “Why don’t we extend this utility so that it can also write data to etcd?” Our efforts resulted in an updated version of etcdhelper with two new functions: changeServiceCIDR and changePodCIDR. Its source code is available here.

What do the new features do? Here is the gist of the changeServiceCIDR algorithm: it walks through all ClusterIP services stored in etcd, decodes each object, replaces a clusterIP belonging to the old subnet with the corresponding address from the new one, re-encodes the object, and writes it back under the same key.

The changePodCIDR function is essentially the same as changeServiceCIDR. The only difference is that instead of services, we edit the specification of nodes and replace the value of .spec.PodCIDR with the new subnet.
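To make the idea concrete, here is a minimal, self-contained sketch of that read-modify-write cycle. It is not the actual etcdhelper source (see the link above for that): the localhost endpoint, the omitted TLS setup, and the hard-coded 192.168. → 172.24. prefix swap are simplifications for illustration.

package main

import (
	"context"
	"fmt"
	"strings"
	"time"

	"go.etcd.io/etcd/clientv3"
	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/runtime/serializer/protobuf"
	"k8s.io/kubectl/pkg/scheme"
)

func main() {
	// Connect to etcd (TLS options are omitted here for brevity;
	// etcdhelper takes them via the -cacert/-cert/-key flags).
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"https://127.0.0.1:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		panic(err)
	}
	defer cli.Close()

	// Kubernetes stores objects in etcd as protobuf with a "k8s" envelope;
	// the kubectl scheme codecs understand this format.
	decoder := scheme.Codecs.UniversalDeserializer()
	encoder := protobuf.NewSerializer(scheme.Scheme, scheme.Scheme)

	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()

	// All Service objects live under this etcd prefix.
	resp, err := cli.Get(ctx, "/registry/services/specs/", clientv3.WithPrefix())
	if err != nil {
		panic(err)
	}

	for _, kv := range resp.Kvs {
		obj, _, err := decoder.Decode(kv.Value, nil, nil)
		if err != nil {
			continue // skip keys we cannot decode
		}
		svc, ok := obj.(*v1.Service)
		if !ok || !strings.HasPrefix(svc.Spec.ClusterIP, "192.168.") {
			continue
		}
		// Keep the host part of the address, swap the /16 prefix.
		svc.Spec.ClusterIP = strings.Replace(svc.Spec.ClusterIP, "192.168.", "172.24.", 1)

		// Make sure the "k8s" envelope gets the right apiVersion/kind,
		// then re-encode the object and write it back under the same key.
		svc.GetObjectKind().SetGroupVersionKind(v1.SchemeGroupVersion.WithKind("Service"))
		var buf strings.Builder
		if err := encoder.Encode(svc, &buf); err != nil {
			panic(err)
		}
		if _, err := cli.Put(ctx, string(kv.Key), buf.String()); err != nil {
			panic(err)
		}
		fmt.Printf("%s -> %s\n", kv.Key, svc.Spec.ClusterIP)
	}
}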
Usage

Replacing serviceCIDR

This task is straightforward to implement. However, it involves downtime while all the pods in the cluster are being recreated. First, we will describe the main steps; then we will share a few ideas on how to minimize that downtime.

Preparatory steps:

- install the necessary software and build the patched etcdhelper tool;
- back up etcd and /etc/kubernetes.

Below is a detailed description of the steps for changing serviceCIDR.

1. Install etcd-client for dumping the data:

apt install etcd-client

2. Build the etcdhelper tool.

Install golang:

GOPATH=/root/golang
mkdir -p $GOPATH/local
curl -sSL https://dl.google.com/go/go1.14.1.linux-amd64.tar.gz | tar -xzvC $GOPATH/local
echo "export GOPATH="$GOPATH"" >> ~/.bashrc
echo 'export GOROOT="$GOPATH/local/go"' >> ~/.bashrc
echo 'export PATH="$PATH:$GOPATH/local/go/bin"' >> ~/.bashrc

Copy etcdhelper.go, download the dependencies, and build the tool:
wget https://raw.githubusercontent.com/flant/examples/master/2020/04-etcdhelper/etcdhelper.go
go get go.etcd.io/etcd/clientv3 k8s.io/kubectl/pkg/scheme k8s.io/apimachinery/pkg/runtime
go build -o etcdhelper etcdhelper.go

3. Back up the etcd data:

backup_dir=/root/backup
mkdir ${backup_dir}
cp -rL /etc/kubernetes ${backup_dir}
ETCDCTL_API=3 etcdctl --cacert=/etc/kubernetes/pki/etcd/ca.crt --key=/etc/kubernetes/pki/etcd/server.key --cert=/etc/kubernetes/pki/etcd/server.crt --endpoints https://192.168.199.100:2379 snapshot save ${backup_dir}/etcd.snapshot
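If anything goes wrong, the snapshot can later be restored with etcdctl; the target data directory below is an assumed value, so point it at your actual etcd data path:

ETCDCTL_API=3 etcdctl snapshot restore ${backup_dir}/etcd.snapshot --data-dir /var/lib/etcd-restored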
4. Switch the service subnet in the manifests of the Kubernetes control plane. In /etc/kubernetes/manifests/kube-apiserver.yaml and /etc/kubernetes/manifests/kube-controller-manager.yaml, replace the value of the --service-cluster-ip-range parameter with the new subnet: 172.24.0.0/16 instead of 192.168.0.0/16.

5. Since we are changing the service subnet for which kubeadm issues the apiserver certificates (among others), you have to reissue them:

5.1. Check which domains and IP addresses the current certificate is issued for:

openssl x509 -noout -ext subjectAltName </etc/kubernetes/pki/apiserver.crt
X509v3 Subject Alternative Name:
DNS:dev-1-master, DNS:kubernetes, DNS:kubernetes.default, DNS:kubernetes.default.svc, DNS:kubernetes.default.svc.cluster.local, DNS:apiserver, IP Address:192.168.0.1, IP Address:10.0.0.163, IP Address:192.168.199.100

5.2. Prepare a basic config for kubeadm:

cat kubeadm-config.yaml
apiVersion: kubeadm.k8s.io/v1beta1
kind: ClusterConfiguration
networking:
  podSubnet: "10.244.0.0/16"
  serviceSubnet: "172.24.0.0/16"
apiServer:
  certSANs:
  - "192.168.199.100" # master node's IP address

5.3. Delete the old crt and key files for the apiserver (you have to remove them in order to issue the new certificate):

rm /etc/kubernetes/pki/apiserver.{key,crt}

5.4. Reissue the certificate for the API server:

kubeadm init phase certs apiserver --config=kubeadm-config.yaml
5.5. Check that the certificate is issued for the new subnet:

openssl x509 -noout -ext subjectAltName </etc/kubernetes/pki/apiserver.crt
X509v3 Subject Alternative Name:
    DNS:kube-2-master, DNS:kubernetes, DNS:kubernetes.default, DNS:kubernetes.default.svc, DNS:kubernetes.default.svc.cluster.local, IP Address:172.24.0.1, IP Address:10.0.0.163, IP Address:192.168.199.100

5.6. After the API server certificate is reissued, restart its container:

docker ps | grep k8s_kube-apiserver | awk '{print $1}' | xargs docker restart
5.7. Renew the certificate embedded in admin.conf:

kubeadm alpha certs renew admin.conf

5.8. Edit the data in etcd:

./etcdhelper -cacert /etc/kubernetes/pki/etcd/ca.crt -cert /etc/kubernetes/pki/etcd/server.crt -key /etc/kubernetes/pki/etcd/server.key -endpoint https://127.0.0.1:2379 change-service-cidr 172.24.0.0/16
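At this point you can confirm that the services have indeed received ClusterIPs from the new subnet:

kubectl get svc --all-namespaces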
Caution! At this point, DNS stops resolving domain names in the cluster: the existing pods still have the old CoreDNS (kube-dns) address in /etc/resolv.conf, while kube-proxy has already rewritten the iptables rules to use the new subnet instead of the old one. Below, we will discuss possible ways to minimize the downtime.

5.9. Edit the ConfigMaps in the kube-system namespace:

a) In this CM:

kubectl -n kube-system edit cm kubelet-config-1.16
— replace ClusterDNS with the new IP address of the kube-dns service (look it up with kubectl -n kube-system get svc kube-dns).
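After the edit, the relevant fragment of the kubelet configuration should look roughly like this (172.24.0.10 is an assumed value; use the IP that your kube-dns service actually received):

clusterDNS:
- 172.24.0.10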
b) In this CM:

kubectl -n kube-system edit cm kubeadm-config

— switch the data.ClusterConfiguration.networking.serviceSubnet parameter to the new subnet.

5.10. Since the kube-dns address has changed, you need to update the kubelet config on all nodes:

kubeadm upgrade node phase kubelet-config && systemctl restart kubelet

5.11. It is time to restart all pods in the cluster:

kubectl get pods --no-headers=true --all-namespaces | sed -r 's/(\S+)\s+(\S+).*/kubectl --namespace \1 delete pod \2/e'
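The sed expression above builds a kubectl delete pod command for every pod and executes it via GNU sed’s e flag. If you prefer something less exotic, a plainer equivalent is:

kubectl delete pods --all --all-namespaces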
Minimizing downtime

Here are a few ideas on how to minimize the downtime:

- After editing the control plane manifests, create a new kube-dns service with a new name (e.g., kube-dns-tmp) and a new address (172.24.0.10); a sketch of such a manifest follows this list.
- Then insert an if condition into etcdhelper so that it leaves the kube-dns service unmodified.
- Delete the kube-dns-tmp service and edit serviceSubnetCIDR for the kube-dns service.

This plan shortens the downtime to approximately one minute: the time required to delete the kube-dns-tmp service and switch the subnet of the kube-dns service.
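For illustration, such a temporary service might look roughly like this; the selector and ports assume a stock kubeadm CoreDNS deployment, so check them against your cluster (e.g., with kubectl -n kube-system get svc kube-dns -o yaml):

apiVersion: v1
kind: Service
metadata:
  name: kube-dns-tmp
  namespace: kube-system
spec:
  clusterIP: 172.24.0.10   # an address from the new service subnet
  selector:
    k8s-app: kube-dns
  ports:
  - name: dns
    port: 53
    protocol: UDP
    targetPort: 53
  - name: dns-tcp
    port: 53
    protocol: TCP
    targetPort: 53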
Modifying podNetwork

Along the way, we decided to modify podNetwork using our etcdhelper as well. Here is the required sequence of actions:

- edit the configurations in the kube-system namespace;
- edit the manifest of kube-controller-manager;
- replace podCIDR by editing etcd directly;
- restart all nodes of the cluster.

Below is a detailed description of the above actions.

1. Edit the ConfigMaps in the kube-system namespace:

a) Here:

kubectl -n kube-system edit cm kubeadm-config
— replace data.ClusterConfiguration.networking.podSubnet with the new subnet (10.55.0.0/16).

b) Here:

kubectl -n kube-system edit cm kube-proxy

— specify the new data.config.conf.clusterCIDR: 10.55.0.0/16.

2. Edit the manifest of the controller-manager:

vim /etc/kubernetes/manifests/kube-controller-manager.yaml

— specify: --cluster-cidr=10.55.0.0/16.
3. Verify the current values of .spec.podCIDR, .spec.podCIDRs, .InternalIP, and .status.addresses for all cluster nodes:

kubectl get no -o json | jq '[.items[] | {"name": .metadata.name, "podCIDR": .spec.podCIDR, "podCIDRs": .spec.podCIDRs, "InternalIP": (.status.addresses[] | select(.type == "InternalIP") | .address)}]'
[
{
"name": "kube-2-master",
"podCIDR": "10.244.0.0/24",
"podCIDRs": [
"10.244.0.0/24"
],
"InternalIP": "192.168.199.2"
},
{
"name": "kube-2-master",
"podCIDR": "10.244.0.0/24",
"podCIDRs": [
"10.244.0.0/24"
],
"InternalIP": "10.0.1.239"
},
{
"name": "kube-2-worker-01f438cf-579f9fd987-5l657",
"podCIDR": "10.244.1.0/24",
"podCIDRs": [
"10.244.1.0/24"
],
"InternalIP": "192.168.199.222"
},
{
"name": "kube-2-worker-01f438cf-579f9fd987-5l657",
"podCIDR": "10.244.1.0/24",
"podCIDRs": [
"10.244.1.0/24"
],
"InternalIP": "10.0.4.73"
}
]

4. Replace podCIDR by editing etcd directly:

./etcdhelper -cacert /etc/kubernetes/pki/etcd/ca.crt -cert /etc/kubernetes/pki/etcd/server.crt -key /etc/kubernetes/pki/etcd/server.key -endpoint https://127.0.0.1:2379 change-pod-cidr 10.55.0.0/16
5. Check that podCIDR has changed:

kubectl get no -o json | jq '[.items[] | {"name": .metadata.name, "podCIDR": .spec.podCIDR, "podCIDRs": .spec.podCIDRs, "InternalIP": (.status.addresses[] | select(.type == "InternalIP") | .address)}]'
[
{
"name": "kube-2-master",
"podCIDR": "10.55.0.0/24",
"podCIDRs": [
"10.55.0.0/24"
],
"InternalIP": "192.168.199.2"
},
{
"name": "kube-2-master",
"podCIDR": "10.55.0.0/24",
"podCIDRs": [
"10.55.0.0/24"
],
"InternalIP": "10.0.1.239"
},
{
"name": "kube-2-worker-01f438cf-579f9fd987-5l657",
"podCIDR": "10.55.1.0/24",
"podCIDRs": [
"10.55.1.0/24"
],
"InternalIP": "192.168.199.222"
},
{
"name": "kube-2-worker-01f438cf-579f9fd987-5l657",
"podCIDR": "10.55.1.0/24",
"podCIDRs": [
"10.55.1.0/24"
],
"InternalIP": "10.0.4.73"
}
]

6. Restart all nodes of the cluster one at a time.
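For example, each node can be drained, rebooted, and returned to the cluster roughly like this (the node name is a placeholder):

kubectl drain <node-name> --ignore-daemonsets --delete-local-data
ssh <node-name> reboot
kubectl uncordon <node-name>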
7. If at least one node still has the old podCIDR, kube-controller-manager will not start, and pods in the cluster will not be scheduled.

As a matter of fact, there are easier ways to change podCIDR (example). But still, we wanted to learn how to work with etcd directly, since there are cases when editing Kubernetes objects right in etcd is the only possible solution (for example, there is no way to avoid downtime when changing the spec.clusterIP field of a Service).

Summary

In this article, we have explored the possibility of working with the data in etcd directly (i.e., without using the Kubernetes API). At times, this approach allows you to do some “tricky things”. We have successfully tested all the steps above using our etcdhelper on real K8s clusters. However, the whole scenario is still a PoC (proof of concept) only. Please use it at your own risk.

This article has been written by our engineers, Vitaly Snurnitsyn and Andrey Sidorov. Follow our blog to get new excellent content from Flant!

Translated from: https://medium.com/flant-com/modifying-kubernetes-etcd-data-ed3d4bb42379