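Work has been busy for a while, and things have only just eased up a little. Today I went back to validating my K8S cluster environment and ran into quite a few problems. I found that the pods on the node could not be reached from my master, which led to a lot of searching and config changes.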

Then I suddenly noticed that the master itself had stopped working. Ugh.

[root@k8s-master ~]# systemctl status  kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/kubelet.service.d
           └─10-kubeadm.conf, 90-local-extras.conf
   Active: activating (auto-restart) (Result: exit-code) since Wed 2017-11-29 02:21:38 EST; 8s ago
     Docs: http://kubernetes.io/docs/
 Main PID: 6741 (code=exited, status=1/FAILURE)

Nov 29 02:21:38 k8s-master systemd[1]: kubelet.service: main process exited, code=exited, status=1/FAILURE
Nov 29 02:21:38 k8s-master systemd[1]: Unit kubelet.service entered failed state.
Nov 29 02:21:38 k8s-master systemd[1]: kubelet.service failed.


[root@k8s-master ~]# setenforce 0
setenforce: SELinux is disabled
[root@k8s-master ~]# docker info
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
[root@k8s-master ~]# systemctl start docker
Job for docker.service failed because the control process exited with error code. See "systemctl status docker.service" and "journalctl -xe" for details.
[root@k8s-master ~]# vi /etc/docker/daemon.json
[root@k8s-master ~]# systemctl start docker
Job for docker.service failed because start of the service was attempted too often. See "systemctl status docker.service" and "journalctl -xe" for details.
To force a start use "systemctl reset-failed docker.service" followed by "systemctl start docker.service" again.

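One problem after another, and I was worn out. If I had documented my setup properly earlier, I would not be fumbling around blindly like this. Next, look at the error above: systemd suggests running "systemctl reset-failed docker.service" and then starting the service again.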


[root@k8s-master ~]# systemctl reset-failed docker.service
[root@k8s-master ~]# docker info
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
[root@k8s-master ~]# systemctl start docker.service
[root@k8s-master ~]# docker info
Containers: 16
 Running: 0
 Paused: 0
 Stopped: 16
Images: 13
Server Version: 17.09.0-ce
Storage Driver: overlay
 Backing Filesystem: xfs
 Supports d_type: false
Logging Driver: json-file
Cgroup Driver: systemd
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 06b9cb35161009dcb7123345749fef02f7cea8e0
runc version: 3f2f8b84a77f73d38244dd690525642a72156c64
init version: 949e6fa
Security Options:
  Profile: default
Kernel Version: 3.10.0-327.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 3.451GiB
Name: k8s-master
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Http Proxy:
Https Proxy:
Username: ibmlei
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
Live Restore Enabled: false

WARNING: overlay: the backing xfs filesystem is formatted without d_type support, which leads to incorrect behavior.
         Reformat the filesystem with ftype=1 to enable d_type support.
         Running without d_type support will not be supported in future releases.
[root@k8s-master ~]# systemctl start kubelet
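
The recovery above follows the hint systemd printed: once a unit has hit its start limit, clear the failed state and then start it again. Below is a minimal sketch of that sequence as a reusable function. The name `restart_after_start_limit` and the `SYSTEMCTL` override (useful for dry runs) are my own illustrative choices, not part of systemd. Note that this only clears the rate limit; whatever made dockerd exit in the first place (here, presumably whatever was corrected in /etc/docker/daemon.json) still has to be fixed first.

```shell
#!/bin/sh
# Hypothetical helper: clear systemd's failed/start-limit state for a unit,
# then start it again. SYSTEMCTL can be overridden (e.g. SYSTEMCTL=echo)
# to print the commands instead of running them.
SYSTEMCTL="${SYSTEMCTL:-systemctl}"

restart_after_start_limit() {
    unit="$1"
    # Resets the failure counter behind "start ... attempted too often".
    $SYSTEMCTL reset-failed "$unit" || return 1
    $SYSTEMCTL start "$unit"
}

# Example (as root): restart_after_start_limit docker.service
```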


Perfect, Docker is finally up. Moving on.


[root@k8s-master ~]# kubectl get no

Unable to connect to the server: unexpected EOF


What is going on? Has the existing cluster died?! My God, this cannot be happening.


[root@k8s-master ~]# systemctl status docker
● docker.service - Docker Application Container Engine
   Loaded: loaded (/usr/lib/systemd/system/docker.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/docker.service.d
           └─http-proxy.conf, https-proxy.conf
   Active: active (running) since Wed 2017-11-29 02:46:45 EST; 1min 37s ago
     Docs: https://docs.docker.com
 Main PID: 2944 (dockerd)
   Memory: 75.0M
   CGroup: /system.slice/docker.service
           ├─2944 /usr/bin/dockerd
           ├─2947 docker-containerd -l unix:///var/run/docker/libcontainerd/docker-containerd.sock --metrics-interval=0 --start-timeout 2m --sta...
           ├─3169 docker-containerd-shim 2f95b9091c314e10a2a872e7b4c4166ba69ea46fa669e03afb5d118aa6700daf /var/run/docker/libcontainerd/2f95b909...
           ├─3183 docker-containerd-shim 575f3c40f549715f8d35f0731851867d0f9cd357597ab0186f07bfe1d46e5d05 /var/run/docker/libcontainerd/575f3c40...
           ├─3186 docker-containerd-shim c6ee9aaaac4fd48bd7ec4403dc7448db526c40b4c055171b37c6c9374b872ee6 /var/run/docker/libcontainerd/c6ee9aaa...
           ├─3205 docker-containerd-shim 3c18a8120ad3835c3cb4397bc8c1e5c2368f05fe937abd0189d2135472f013ef /var/run/docker/libcontainerd/3c18a812...
           ├─3305 docker-containerd-shim fd3a3dedec3fe9d3c81da355695b8f17d028a52e042ff944f297134f68b4d07d /var/run/docker/libcontainerd/fd3a3ded...
           └─3329 docker-containerd-shim ac39dcb95a5f16ed427487dbf106e91a3164725b64ee3b496ce92b05a9bf5abb /var/run/docker/libcontainerd/ac39dcb9...

Nov 29 02:46:45 k8s-master dockerd[2944]: time="2017-11-29T02:46:45.857210172-05:00" level=info msg="Docker daemon" commit=afdb6d4 graph....09.0-ce
Nov 29 02:46:45 k8s-master dockerd[2944]: time="2017-11-29T02:46:45.857294369-05:00" level=info msg="Daemon has completed initialization"
Nov 29 02:46:45 k8s-master dockerd[2944]: time="2017-11-29T02:46:45.870612645-05:00" level=info msg="API listen on /var/run/docker.sock"
Nov 29 02:46:45 k8s-master systemd[1]: Started Docker Application Container Engine.
Nov 29 02:46:52 k8s-master dockerd[2944]: time="2017-11-29T02:46:52.974240485-05:00" level=warning msg="Unknown healthcheck type 'NONE' ...8b4d07d"
Nov 29 02:46:53 k8s-master dockerd[2944]: time="2017-11-29T02:46:53.025851894-05:00" level=warning msg="Unknown healthcheck type 'NONE' ...c2e872d"
Nov 29 02:46:53 k8s-master dockerd[2944]: time="2017-11-29T02:46:53.042314432-05:00" level=warning msg="Unknown healthcheck type 'NONE' ...9bf5abb"
Nov 29 02:46:53 k8s-master dockerd[2944]: time="2017-11-29T02:46:53.128104304-05:00" level=warning msg="Unknown healthcheck type 'NONE' ...4d32a0c"
Nov 29 02:47:16 k8s-master dockerd[2944]: time="2017-11-29T02:47:16.810426068-05:00" level=warning msg="Unknown healthcheck type 'NONE' ...c9b372a"
Nov 29 02:47:39 k8s-master dockerd[2944]: time="2017-11-29T02:47:39.219843254-05:00" level=warning msg="Unknown healthcheck type 'NONE' ...04d0f35"
Hint: Some lines were ellipsized, use -l to show in full.

[root@k8s-master ~]# systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/kubelet.service.d
           └─10-kubeadm.conf, 90-local-extras.conf
   Active: active (running) since Wed 2017-11-29 02:46:46 EST; 1min 47s ago
     Docs: http://kubernetes.io/docs/
 Main PID: 3092 (kubelet)
   Memory: 35.9M
   CGroup: /system.slice/kubelet.service
           └─3092 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --pod...

Nov 29 02:48:28 k8s-master kubelet[3092]: E1129 02:48:28.428551    3092 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/config/apiserver... refused
Nov 29 02:48:28 k8s-master kubelet[3092]: E1129 02:48:28.429561    3092 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:422: ... refused
Nov 29 02:48:28 k8s-master kubelet[3092]: E1129 02:48:28.440378    3092 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:413: ... refused
Nov 29 02:48:29 k8s-master kubelet[3092]: E1129 02:48:29.431624    3092 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/config/apiserver... refused
Nov 29 02:48:29 k8s-master kubelet[3092]: E1129 02:48:29.431668    3092 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:422: ... refused
Nov 29 02:48:29 k8s-master kubelet[3092]: E1129 02:48:29.443368    3092 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:413: ... refused
Nov 29 02:48:31 k8s-master kubelet[3092]: I1129 02:48:31.414224    3092 kubelet_node_status.go:276] Setting node annotation to enable vo...h/detach
Nov 29 02:48:31 k8s-master kubelet[3092]: I1129 02:48:31.417611    3092 kubelet_node_status.go:83] Attempting to register node k8s-master
Nov 29 02:48:31 k8s-master kubelet[3092]: E1129 02:48:31.928092    3092 helpers.go:468] PercpuUsage had 0 cpus, but the actual number is...tra CPUs
Nov 29 02:48:33 k8s-master kubelet[3092]: I1129 02:48:33.009272    3092 kubelet_node_status.go:276] Setting node annotation to enable vo...h/detach
Hint: Some lines were ellipsized, use -l to show in full.


The key services (docker and kubelet) are both active, yet I still cannot get any cluster information.


[root@k8s-master ~]# kubectl cluster-info
Kubernetes master is running at

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
[root@k8s-master ~]# kubectl  get rs
Unable to connect to the server: Service Unavailable
[root@k8s-master ~]# kubectl cluster-info dump
Unable to connect to the server: unexpected EOF
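
The kubelet log lines ending in "refused" suggest that nothing was accepting connections on the API server port, even though docker and kubelet themselves were running. One quick way to check this is to look for a listening socket. The sketch below parses `ss -ltn` output; the helper name `port_listening` is hypothetical, and 6443 is kubeadm's default API server port, which I am assuming this cluster uses because the address is not shown in the transcript.

```shell
#!/bin/sh
# Hypothetical helper: read `ss -ltn` output on stdin and succeed if any
# socket's local address (4th column) ends in ":<port>".
port_listening() {
    awk -v suffix=":$1" '
        NR > 1 {
            start = length($4) - length(suffix) + 1
            if (start > 0 && substr($4, start) == suffix) found = 1
        }
        END { exit !found }
    '
}

# Example: ss -ltn | port_listening 6443 && echo "API server port is open"
```

The same helper can be pointed at any other port that needs checking, such as the kubelet's 10250.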


Fine. I will re-initialize the cluster and then add the node again.


[root@k8s-master ~]# kubeadm init --pod-network-cidr= --apiserver-advertise-address= --token-ttl 0
[kubeadm] WARNING: kubeadm is in beta, please do not use it for production clusters.
[init] Using Kubernetes version: v1.8.4
[init] Using Authorization modes: [Node RBAC]
[preflight] Running pre-flight checks
[preflight] WARNING: docker version is greater than the most recently validated version. Docker version: 17.09.0-ce. Max validated version: 17.03
[preflight] WARNING: Connection to "" uses proxy "". If that is not intended, adjust your proxy settings
[preflight] WARNING: Running with swap on is not supported. Please disable swap or set kubelet's --fail-swap-on flag to false.
[preflight] Some fatal errors occurred:
    Port 10250 is in use
    Port 10251 is in use
    Port 10252 is in use
    /etc/kubernetes/manifests is not empty
    /var/lib/etcd is not empty
[preflight] If you know what you are doing, you can skip pre-flight checks with `--skip-preflight-checks`
[root@k8s-master ~]# swapoff -a
[root@k8s-master ~]# kubeadm reset
[preflight] Running pre-flight checks
[reset] Stopping the kubelet service
[reset] Unmounting mounted directories in "/var/lib/kubelet"
[reset] Removing kubernetes-managed containers
[reset] Deleting contents of stateful directories: [/var/lib/kubelet /etc/cni/net.d /var/lib/dockershim /var/run/kubernetes /var/lib/etcd]
[reset] Deleting contents of config directories: [/etc/kubernetes/manifests /etc/kubernetes/pki]
[reset] Deleting files: [/etc/kubernetes/admin.conf /etc/kubernetes/kubelet.conf /etc/kubernetes/controller-manager.conf /etc/kubernetes/scheduler.conf]
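
The fatal pre-flight errors earlier in this session (ports in use, non-empty /etc/kubernetes/manifests and /var/lib/etcd) are leftovers from the previous cluster, and the reset output above shows kubeadm clearing them. Here is a small sketch for checking the directory part before running kubeadm init again; `dir_has_entries` and `report_leftovers` are hypothetical helper names.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the directory exists and contains at least
# one entry (dotfiles included).
dir_has_entries() {
    [ -d "$1" ] && [ -n "$(ls -A "$1" 2>/dev/null)" ]
}

# Print one line for each given directory that still has content.
report_leftovers() {
    for d in "$@"; do
        if dir_has_entries "$d"; then
            echo "not empty: $d"
        fi
    done
}

# Example: report_leftovers /etc/kubernetes/manifests /var/lib/etcd
```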
[root@k8s-master ~]# kubeadm init --pod-network-cidr= --apiserver-advertise-address= --token-ttl 0
[kubeadm] WARNING: kubeadm is in beta, please do not use it for production clusters.
[init] Using Kubernetes version: v1.8.4
[init] Using Authorization modes: [Node RBAC]
[preflight] Running pre-flight checks
[preflight] WARNING: docker version is greater than the most recently validated version. Docker version: 17.09.0-ce. Max validated version: 17.03
[preflight] WARNING: Connection to "" uses proxy "". If that is not intended, adjust your proxy settings
[preflight] Starting the kubelet service
[certificates] Generated ca certificate and key.
[certificates] Generated apiserver certificate and key.
[certificates] apiserver serving cert is signed for DNS names [k8s-master kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs []
[certificates] Generated apiserver-kubelet-client certificate and key.
[certificates] Generated sa key and public key.
[certificates] Generated front-proxy-ca certificate and key.
[certificates] Generated front-proxy-client certificate and key.
[certificates] Valid certificates and keys now exist in "/etc/kubernetes/pki"
[kubeconfig] Wrote KubeConfig file to disk: "admin.conf"
[kubeconfig] Wrote KubeConfig file to disk: "kubelet.conf"
[kubeconfig] Wrote KubeConfig file to disk: "controller-manager.conf"
[kubeconfig] Wrote KubeConfig file to disk: "scheduler.conf"
[controlplane] Wrote Static Pod manifest for component kube-apiserver to "/etc/kubernetes/manifests/kube-apiserver.yaml"
[controlplane] Wrote Static Pod manifest for component kube-controller-manager to "/etc/kubernetes/manifests/kube-controller-manager.yaml"
[controlplane] Wrote Static Pod manifest for component kube-scheduler to "/etc/kubernetes/manifests/kube-scheduler.yaml"
[etcd] Wrote Static Pod manifest for a local etcd instance to "/etc/kubernetes/manifests/etcd.yaml"
[init] Waiting for the kubelet to boot up the control plane as Static Pods from directory "/etc/kubernetes/manifests"
[init] This often takes around a minute; or longer if the control plane images have to be pulled.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp: lookup localhost on no such host.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp: lookup localhost on no such host.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp: lookup localhost on no such host.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz/syncloop' failed with error: Get http://localhost:10255/healthz/syncloop: dial tcp: lookup localhost on no such host.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz/syncloop' failed with error: Get http://localhost:10255/healthz/syncloop: dial tcp: lookup localhost on no such host.
[root@k8s-master ~]# ^C
[root@k8s-master ~]# vi /etc/hosts
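
The kubelet-check errors above fail with "lookup localhost ... no such host", which points at name resolution for localhost rather than at the kubelet itself, and is presumably why /etc/hosts is edited at this point. The transcript does not show what was changed. The sketch below only checks the usual precondition, namely that some line in a hosts file maps an address to the name localhost; `hosts_has_localhost` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given hosts file has a non-comment line
# that lists "localhost" as one of its names.
hosts_has_localhost() {
    awk '
        { sub(/#.*/, "") }   # strip comments; awk re-splits the fields
        { for (i = 2; i <= NF; i++) if ($i == "localhost") found = 1 }
        END { exit !found }
    ' "$1"
}

# Example: hosts_has_localhost /etc/hosts || echo "no localhost entry"
```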
[root@k8s-master ~]# kubeadm reset
[preflight] Running pre-flight checks
[reset] Stopping the kubelet service
[reset] Unmounting mounted directories in "/var/lib/kubelet"
[reset] Removing kubernetes-managed containers
[reset] Deleting contents of stateful directories: [/var/lib/kubelet /etc/cni/net.d /var/lib/dockershim /var/run/kubernetes /var/lib/etcd]
[reset] Deleting contents of config directories: [/etc/kubernetes/manifests /etc/kubernetes/pki]
[reset] Deleting files: [/etc/kubernetes/admin.conf /etc/kubernetes/kubelet.conf /etc/kubernetes/controller-manager.conf /etc/kubernetes/scheduler.conf]
[root@k8s-master ~]# kubeadm init --pod-network-cidr= --apiserver-advertise-address= --token-ttl 0
[kubeadm] WARNING: kubeadm is in beta, please do not use it for production clusters.
[init] Using Kubernetes version: v1.8.4
[init] Using Authorization modes: [Node RBAC]
[preflight] Running pre-flight checks
[preflight] WARNING: docker version is greater than the most recently validated version. Docker version: 17.09.0-ce. Max validated version: 17.03
[preflight] WARNING: Connection to "" uses proxy "". If that is not intended, adjust your proxy settings
[preflight] Starting the kubelet service
[certificates] Generated ca certificate and key.
[certificates] Generated apiserver certificate and key.
[certificates] apiserver serving cert is signed for DNS names [k8s-master kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs []
[certificates] Generated apiserver-kubelet-client certificate and key.
[certificates] Generated sa key and public key.
[certificates] Generated front-proxy-ca certificate and key.
[certificates] Generated front-proxy-client certificate and key.
[certificates] Valid certificates and keys now exist in "/etc/kubernetes/pki"
[kubeconfig] Wrote KubeConfig file to disk: "admin.conf"
[kubeconfig] Wrote KubeConfig file to disk: "kubelet.conf"
[kubeconfig] Wrote KubeConfig file to disk: "controller-manager.conf"
[kubeconfig] Wrote KubeConfig file to disk: "scheduler.conf"
[controlplane] Wrote Static Pod manifest for component kube-apiserver to "/etc/kubernetes/manifests/kube-apiserver.yaml"
[controlplane] Wrote Static Pod manifest for component kube-controller-manager to "/etc/kubernetes/manifests/kube-controller-manager.yaml"
[controlplane] Wrote Static Pod manifest for component kube-scheduler to "/etc/kubernetes/manifests/kube-scheduler.yaml"
[etcd] Wrote Static Pod manifest for a local etcd instance to "/etc/kubernetes/manifests/etcd.yaml"
[init] Waiting for the kubelet to boot up the control plane as Static Pods from directory "/etc/kubernetes/manifests"
[init] This often takes around a minute; or longer if the control plane images have to be pulled.
[apiclient] All control plane components are healthy after 28.643612 seconds
[uploadconfig] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[markmaster] Will mark node k8s-master as master by adding a label and a taint
[markmaster] Master k8s-master tainted and labelled with key/value: node-role.kubernetes.io/master=""
[bootstraptoken] Using token: 6db7fb.8277cbc6027e38f6
[bootstraptoken] Configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstraptoken] Configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstraptoken] Configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstraptoken] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[addons] Applied essential addon: kube-dns
[addons] Applied essential addon: kube-proxy

Your Kubernetes master has initialized successfully!

To start using your cluster, you need to run (as a regular user):

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:

You can now join any number of machines by running the following on each node
as root:

  kubeadm join --token 6db7fb.8277cbc6027e38f6 --discovery-token-ca-cert-hash sha256:b7e0b58d3141d1bf0a8522a258136882f8f4d61f8d5fc16e66ff8c0d40552f64

[root@k8s-master ~]# mkdir -p $HOME/.kube
[root@k8s-master ~]#  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
cp: overwrite ‘/root/.kube/config’? y
[root@k8s-master ~]# sudo chown $(id -u):$(id -g) $HOME/.kube/config
[root@k8s-master ~]# kubectl get nodes
NAME                             STATUS     ROLES     AGE       VERSION
huleib.eng.platformlab.ibm.com   NotReady   <none>    43s       v1.8.1
k8s-master                       NotReady   master    3m        v1.8.1
[root@k8s-master ~]# kubectl get nodes
NAME                             STATUS     ROLES     AGE       VERSION
huleib.eng.platformlab.ibm.com   NotReady   <none>    51s       v1.8.1
k8s-master                       NotReady   master    3m        v1.8.1
[root@k8s-master ~]# kubectl get nodes
NAME                             STATUS     ROLES     AGE       VERSION
huleib.eng.platformlab.ibm.com   NotReady   <none>    1m        v1.8.1
k8s-master                       NotReady   master    3m        v1.8.1


Great, now the node status is wrong again. Let's see what is going on this time:


[root@k8s-master ~]# kubectl describe nodes huleib.eng.platformlab.ibm.com
Name:               huleib.eng.platformlab.ibm.com
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
Annotations:        node.alpha.kubernetes.io/ttl=0
Taints:             <none>
CreationTimestamp:  Wed, 29 Nov 2017 03:01:17 -0500
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  OutOfDisk        False   Wed, 29 Nov 2017 03:19:12 -0500   Wed, 29 Nov 2017 03:03:21 -0500   KubeletHasSufficientDisk     kubelet has sufficient disk space available
  MemoryPressure   False   Wed, 29 Nov 2017 03:19:12 -0500   Wed, 29 Nov 2017 03:03:21 -0500   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Wed, 29 Nov 2017 03:19:12 -0500   Wed, 29 Nov 2017 03:03:21 -0500   KubeletHasNoDiskPressure     kubelet has no disk pressure
  Ready            False   Wed, 29 Nov 2017 03:19:12 -0500   Wed, 29 Nov 2017 03:03:21 -0500   KubeletNotReady              runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized

  Hostname:    huleib.eng.platformlab.ibm.com
 cpu:     1
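
The Ready condition above gives the actual reason: "cni config uninitialized", meaning no pod network add-on has been installed since the cluster was re-initialized. Until one is applied, the nodes stay NotReady. For spotting this state without reading the table by eye, here is a small sketch that filters `kubectl get nodes` output; `not_ready_nodes` is a hypothetical helper, and it treats any STATUS other than exactly "Ready" as not ready.

```shell
#!/bin/sh
# Hypothetical helper: read `kubectl get nodes` output on stdin and print the
# NAME of every node whose STATUS column is not exactly "Ready".
not_ready_nodes() {
    awk 'NR > 1 && $2 != "Ready" { print $1 }'
}

# Example: kubectl get nodes | not_ready_nodes
```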



I had already downloaded the flannel manifest earlier, which saved some trouble:

[root@k8s-master ~]# kubectl apply -f kube-flannel.yml.2
clusterrole "flannel" created
clusterrolebinding "flannel" created
serviceaccount "flannel" created
configmap "kube-flannel-cfg" created
daemonset "kube-flannel-ds" created




[root@k8s-master ~]# kubectl get nodes
NAME                             STATUS    ROLES     AGE       VERSION
huleib.eng.platformlab.ibm.com   Ready     <none>    18m       v1.8.1
k8s-master                       Ready     master    20m       v1.8.1


