Overview
Table of Contents
Environment:
Hardware specifications:
Cluster 10.253.24.4
Standalone node 10.253.15.56
Build and run:
Run logs:
Cluster serial execution:
Cluster parallel execution:
Collection of failed test cases:
Failed cases to be resolved:
1. [sig-instrumentation] Prometheus when installed on the cluster shouldn't report any alerts in firing state apart from Watchdog and AlertmanagerReceiversNotConfigured [Early] [Skipped:Disconnected] [Suite:openshift/conformance/parallel]
2. [sig-instrumentation][Late] Alerts shouldn't report any alerts in firing or pending state apart from Watchdog and AlertmanagerReceiversNotConfigured and have no gaps in Watchdog firing [Skipped:Disconnected] [Suite:openshift/conformance/parallel]
3. [sig-cli] oc observe works as expected [Suite:openshift/conformance/parallel]
Environment:
Cluster environment: 10.253.24.4 root/123456
Standalone environments: 10.253.15.55, 10.253.15.56, 10.253.15.59
Hardware specifications:
Cluster 10.253.24.4
Three nodes
[root@master0 ~]# kubectl get nodes -o wide
NAME      STATUS   ROLES           AGE    VERSION           INTERNAL-IP   EXTERNAL-IP   OS-IMAGE       KERNEL-VERSION         CONTAINER-RUNTIME
master0   Ready    master,worker   3d3h   v1.22.3+4dd1b5a   10.253.24.4   <none>        CCLinux 2203   5.15.13-0.el9.x86_64   cri-o://1.23.2
master1   Ready    master,worker   3d3h   v1.22.3+4dd1b5a   10.253.24.5   <none>        CCLinux 2203   5.15.13-0.el9.x86_64   cri-o://1.23.2
master2   Ready    master,worker   3d3h   v1.22.3+4dd1b5a   10.253.24.6   <none>        CCLinux 2203   5.15.13-0.el9.x86_64   cri-o://1.23.2
Capacity:
  cpu:                8
  ephemeral-storage:  184230Mi
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             32603308Ki
  pods:               250
Allocatable:
  cpu:                7500m
  ephemeral-storage:  173861240545
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             31452332Ki
  pods:               250
processor       : 7
vendor_id       : HygonGenuine
cpu family      : 24
model           : 1
model name      : Hygon C86 7285 32-core Processor
stepping        : 1
microcode       : 0x80901047
cpu MHz         : 1999.999
cache size      : 512 KB
physical id     : 0
siblings        : 1
core id         : 0
cpu cores       : 1
apicid          : 14
initial apicid  : 14
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl tsc_reliable nonstop_tsc cpuid extd_apicid tsc_known_freq pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw topoext cpb ssbd ibpb vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 xsaves clzero arat overflow_recov succor
bugs            : sysret_ss_attrs null_seg spectre_v1 spectre_v2 spec_store_bypass
bogomips        : 3999.99
TLB size        : 2560 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 45 bits physical, 48 bits virtual
power management:
Standalone node 10.253.15.56
Capacity:
  cpu:                  8
  ephemeral-storage:    194465Mi
  example.com/fakecpu:  1k
  hugepages-1Gi:        0
  hugepages-2Mi:        0
  memory:               32603860Ki
  pods:                 250
Allocatable:
  cpu:                  7500m
  ephemeral-storage:    183520198353
  example.com/fakecpu:  1k
  hugepages-1Gi:        0
  hugepages-2Mi:        0
  memory:               31452884Ki
  pods:                 250
processor       : 7
vendor_id       : HygonGenuine
cpu family      : 24
model           : 1
model name      : Hygon C86 7285 32-core Processor
stepping        : 1
microcode       : 0x80901047
cpu MHz         : 1999.999
cache size      : 512 KB
physical id     : 0
siblings        : 1
core id         : 0
cpu cores       : 1
apicid          : 14
initial apicid  : 14
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl tsc_reliable nonstop_tsc cpuid extd_apicid tsc_known_freq pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw topoext cpb ssbd ibpb vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 xsaves clzero arat overflow_recov succor
bugs            : sysret_ss_attrs null_seg spectre_v1 spectre_v2 spec_store_bypass
bogomips        : 3999.99
TLB size        : 2560 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 45 bits physical, 48 bits virtual
power management:
Build and run:
Release test execution guide
Run logs:
Cluster serial execution:
Cluster parallel execution:
Collection of failed test cases:
CCOS 0.0.0 single-node failed cases
CCOS 0.0.0 cluster failed cases
Failed cases to be resolved:
1. [sig-instrumentation] Prometheus when installed on the cluster shouldn't report any alerts in firing state apart from Watchdog and AlertmanagerReceiversNotConfigured [Early] [Skipped:Disconnected] [Suite:openshift/conformance/parallel]
Problem description:
The number of alerts in the firing state with severity other than info (alertstate="firing", severity!="info") is greater than 1. Alert details:
{ "metric": { "__name__": "ALERTS", "alertname": "CannotRetrieveUpdates", "alertstate": "firing", "endpoint": "metrics", "instance": "10.255.245.137:9099", "job": "cluster-version-operator", "namespace": "openshift-cluster-version", "pod": "cluster-version-operator-79fd7675bd-nz5hr", "prometheus": "openshift-monitoring/k8s", "service": "cluster-version-operator", "severity": "warning" }, "value": [ 1649754743.841, "1" ] },
{ "metric": { "__name__": "ALERTS", "alertname": "KubeContainerWaiting", "alertstate": "firing", "container": "machine-config-server", "namespace": "default", "pod": "bootstrap-machine-config-operator-qqmaster0", "prometheus": "openshift-monitoring/k8s", "severity": "warning" }, "value": [ 1649754743.841, "1" ] },
{ "metric": { "__name__": "ALERTS", "alertname": "KubePodNotReady", "alertstate": "firing", "namespace": "default", "pod": "bootstrap-machine-config-operator-qqmaster0", "prometheus": "openshift-monitoring/k8s", "severity": "warning" }, "value": [ 1649754743.841, "1" ] }
started: (0/4/333) "[sig-instrumentation] Prometheus when installed on the cluster shouldn't report any alerts in firing state apart from Watchdog and AlertmanagerReceiversNotConfigured [Early] [Skipped:Disconnected] [Suite:openshift/conformance/parallel]" passed: (500ms) 2022-04-18T07:08:20 "[sig-arch][Early] Managed cluster should start all core operators [Skipped:Disconnected] [Suite:openshift/conformance/parallel]" skip [github.com/openshift/origin/test/extended/machines/cluster.go:44]: cluster does not have machineset resources skipped: (500ms) 2022-04-18T07:08:20 "[sig-cluster-lifecycle][Feature:Machines][Early] Managed cluster should have same number of Machines and Nodes [Suite:openshift/conformance/parallel]" passed: (500ms) 2022-04-18T07:08:20 "[sig-auth][Feature:SCC][Early] should not have pod creation failures during install [Suite:openshift/conformance/parallel]" [BeforeEach] [Top Level] github.com/openshift/origin/test/extended/util/framework.go:1453 [BeforeEach] [Top Level] github.com/openshift/origin/test/extended/util/framework.go:1453 [BeforeEach] [Top Level] github.com/openshift/origin/test/extended/util/test.go:61 [BeforeEach] [sig-instrumentation] Prometheus github.com/openshift/origin/test/extended/util/client.go:142 STEP: Creating a kubernetes client [BeforeEach] [sig-instrumentation] Prometheus github.com/openshift/origin/test/extended/prometheus/prometheus.go:250 [It] shouldn't report any alerts in firing state apart from Watchdog and AlertmanagerReceiversNotConfigured [Early] [Skipped:Disconnected] [Suite:openshift/conformance/parallel] github.com/openshift/origin/test/extended/prometheus/prometheus.go:506 Apr 18 15:08:22.024: INFO: Creating namespace "e2e-test-prometheus-v8jrp" Apr 18 15:08:22.291: INFO: Waiting for ServiceAccount "default" to be provisioned... 
Apr 18 15:08:22.396: INFO: Creating new exec pod STEP: perform prometheus metric query ALERTS{alertname!~"Watchdog|AlertmanagerReceiversNotConfigured|PrometheusRemoteWriteDesiredShards",alertstate="firing",severity!="info"} >= 1 Apr 18 15:08:26.433: INFO: Running '/usr/bin/kubectl --server=https://api.ocp4e2e.samuele2e.cn:6443 --kubeconfig=/root/.kube/config --namespace=e2e-test-prometheus-v8jrp exec execpod -- /bin/sh -x -c curl --retry 15 --max-time 2 --retry-delay 1 -s -k -H 'Authorization: Bearer eyJhbGciOiJSUzI1NiIsImtpZCI6Ijc1T01GM2Qtc2NLQjZLb2ZENkFKcEcxLW5USVhkbGpVNUY1cGV5UTUtOVUifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJvcGVuc2hpZnQtbW9uaXRvcmluZyIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJwcm9tZXRoZXVzLWFkYXB0ZXItdG9rZW4tNnB4YngiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoicHJvbWV0aGV1cy1hZGFwdGVyIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQudWlkIjoiYmFlNzhkZTktZGE4NS00OTdmLTkyNzItOWY5ZjI4MWVlZGM3Iiwic3ViIjoic3lzdGVtOnNlcnZpY2VhY2NvdW50Om9wZW5zaGlmdC1tb25pdG9yaW5nOnByb21ldGhldXMtYWRhcHRlciJ9.5VmNcWMFRgyXdUZVcUSXQazo-hoA3Tzhyq3zJJn-Zbuqvdxs0c9lb2iobTriV-bwA4Ub6e9pw3dHIJOoaqBBD4nSEmXGRm6RfrHaKeU_t-d_BfHAyP-K4wUsyA6DV0Rpk3JhONz1vFX2OdEvu5aiZXJOyxdHKbOvn4y_caeUDPOj1TKFHkfE81zoG-mpZomYuEW7rudk2yHblTS_jfinSelC9Hdi62czl-omycez6XCyqHvCI4yFwRBQv3o409s5Xj2y5oMik2YVmo__TIgi0bO-VKzYT58KYRSW9uK4UIJMfyvSzDN-6j2eugyfWwfSPCKUh6NmLxq2NDSPMd4yFw' "https://thanos-querier.openshift-monitoring.svc:9091/api/v1/query?query=ALERTS%7Balertname%21~%22Watchdog%7CAlertmanagerReceiversNotConfigured%7CPrometheusRemoteWriteDesiredShards%22%2Calertstate%3D%22firing%22%2Cseverity%21%3D%22info%22%7D+%3E%3D+1"' Apr 18 15:08:26.633: INFO: stderr: "+ curl --retry 15 --max-time 2 --retry-delay 1 -s -k -H 'Authorization: Bearer eyJhbGciOiJSUzI1NiIsImtpZCI6Ijc1T01GM2Qtc2NLQjZLb2ZENkFKcEcxLW5USVhkbGpVNUY1cGV5UTUtOVUifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJvcGVuc2hpZnQtbW9uaXRvcmluZyIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJwcm9tZXRoZXVzLWFkYXB0ZXItdG9rZW4tNnB4YngiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoicHJvbWV0aGV1cy1hZGFwdGVyIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQudWlkIjoiYmFlNzhkZTktZGE4NS00OTdmLTkyNzItOWY5ZjI4MWVlZGM3Iiwic3ViIjoic3lzdGVtOnNlcnZpY2VhY2NvdW50Om9wZW5zaGlmdC1tb25pdG9yaW5nOnByb21ldGhldXMtYWRhcHRlciJ9.5VmNcWMFRgyXdUZVcUSXQazo-hoA3Tzhyq3zJJn-Zbuqvdxs0c9lb2iobTriV-bwA4Ub6e9pw3dHIJOoaqBBD4nSEmXGRm6RfrHaKeU_t-d_BfHAyP-K4wUsyA6DV0Rpk3JhONz1vFX2OdEvu5aiZXJOyxdHKbOvn4y_caeUDPOj1TKFHkfE81zoG-mpZomYuEW7rudk2yHblTS_jfinSelC9Hdi62czl-omycez6XCyqHvCI4yFwRBQv3o409s5Xj2y5oMik2YVmo__TIgi0bO-VKzYT58KYRSW9uK4UIJMfyvSzDN-6j2eugyfWwfSPCKUh6NmLxq2NDSPMd4yFw' 'https://thanos-querier.openshift-monitoring.svc:9091/api/v1/query?query=ALERTS%7Balertname%21~%22Watchdog%7CAlertmanagerReceiversNotConfigured%7CPrometheusRemoteWriteDesiredShards%22%2Calertstate%3D%22firing%22%2Cseverity%21%3D%22info%22%7D+%3E%3D+1'n" Apr 18 15:08:26.633: INFO: stdout: 
"{"status":"success","data":{"resultType":"vector","result":[{"metric":{"__name__":"ALERTS","alertname":"CannotRetrieveUpdates","alertstate":"firing","endpoint":"metrics","instance":"10.253.24.6:9099","job":"cluster-version-operator","namespace":"openshift-cluster-version","pod":"cluster-version-operator-57f968f56-mv9s8","prometheus":"openshift-monitoring/k8s","service":"cluster-version-operator","severity":"warning"},"value":[1650265706.616,"1"]},{"metric":{"__name__":"ALERTS","alertname":"SystemMemoryExceedsReservation","alertstate":"firing","node":"master1","prometheus":"openshift-monitoring/k8s","severity":"warning"},"value":[1650265706.616,"1"]}]}}n" STEP: perform prometheus metric query ALERTS{alertname!~"Watchdog|AlertmanagerReceiversNotConfigured|PrometheusRemoteWriteDesiredShards",alertstate="firing",severity!="info"} >= 1 Apr 18 15:08:36.633: INFO: Running '/usr/bin/kubectl --server=https://api.ocp4e2e.samuele2e.cn:6443 --kubeconfig=/root/.kube/config --namespace=e2e-test-prometheus-v8jrp exec execpod -- /bin/sh -x -c curl --retry 15 --max-time 2 --retry-delay 1 -s -k -H 'Authorization: Bearer eyJhbGciOiJSUzI1NiIsImtpZCI6Ijc1T01GM2Qtc2NLQjZLb2ZENkFKcEcxLW5USVhkbGpVNUY1cGV5UTUtOVUifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJvcGVuc2hpZnQtbW9uaXRvcmluZyIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJwcm9tZXRoZXVzLWFkYXB0ZXItdG9rZW4tNnB4YngiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoicHJvbWV0aGV1cy1hZGFwdGVyIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQudWlkIjoiYmFlNzhkZTktZGE4NS00OTdmLTkyNzItOWY5ZjI4MWVlZGM3Iiwic3ViIjoic3lzdGVtOnNlcnZpY2VhY2NvdW50Om9wZW5zaGlmdC1tb25pdG9yaW5nOnByb21ldGhldXMtYWRhcHRlciJ9.5VmNcWMFRgyXdUZVcUSXQazo-hoA3Tzhyq3zJJn-Zbuqvdxs0c9lb2iobTriV-bwA4Ub6e9pw3dHIJOoaqBBD4nSEmXGRm6RfrHaKeU_t-d_BfHAyP-K4wUsyA6DV0Rpk3JhONz1vFX2OdEvu5aiZXJOyxdHKbOvn4y_caeUDPOj1TKFHkfE81zoG-mpZomYuEW7rudk2yHblTS_jfinSelC9Hdi62czl-omycez6XCyqHvCI4yFwRBQv3o409s5Xj2y5oMik2YVmo__TIgi0bO-VKzYT58KYRSW9uK4UIJMfyvSzDN-6j2eugyfWwfSPCKUh6NmLxq2NDSPMd4yFw' "https://thanos-querier.openshift-monitoring.svc:9091/api/v1/query?query=ALERTS%7Balertname%21~%22Watchdog%7CAlertmanagerReceiversNotConfigured%7CPrometheusRemoteWriteDesiredShards%22%2Calertstate%3D%22firing%22%2Cseverity%21%3D%22info%22%7D+%3E%3D+1"' Apr 18 15:08:36.804: INFO: stderr: "+ curl --retry 15 --max-time 2 --retry-delay 1 -s -k -H 'Authorization: Bearer eyJhbGciOiJSUzI1NiIsImtpZCI6Ijc1T01GM2Qtc2NLQjZLb2ZENkFKcEcxLW5USVhkbGpVNUY1cGV5UTUtOVUifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJvcGVuc2hpZnQtbW9uaXRvcmluZyIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJwcm9tZXRoZXVzLWFkYXB0ZXItdG9rZW4tNnB4YngiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoicHJvbWV0aGV1cy1hZGFwdGVyIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQudWlkIjoiYmFlNzhkZTktZGE4NS00OTdmLTkyNzItOWY5ZjI4MWVlZGM3Iiwic3ViIjoic3lzdGVtOnNlcnZpY2VhY2NvdW50Om9wZW5zaGlmdC1tb25pdG9yaW5nOnByb21ldGhldXMtYWRhcHRlciJ9.5VmNcWMFRgyXdUZVcUSXQazo-hoA3Tzhyq3zJJn-Zbuqvdxs0c9lb2iobTriV-bwA4Ub6e9pw3dHIJOoaqBBD4nSEmXGRm6RfrHaKeU_t-d_BfHAyP-K4wUsyA6DV0Rpk3JhONz1vFX2OdEvu5aiZXJOyxdHKbOvn4y_caeUDPOj1TKFHkfE81zoG-mpZomYuEW7rudk2yHblTS_jfinSelC9Hdi62czl-omycez6XCyqHvCI4yFwRBQv3o409s5Xj2y5oMik2YVmo__TIgi0bO-VKzYT58KYRSW9uK4UIJMfyvSzDN-6j2eugyfWwfSPCKUh6NmLxq2NDSPMd4yFw' 
'https://thanos-querier.openshift-monitoring.svc:9091/api/v1/query?query=ALERTS%7Balertname%21~%22Watchdog%7CAlertmanagerReceiversNotConfigured%7CPrometheusRemoteWriteDesiredShards%22%2Calertstate%3D%22firing%22%2Cseverity%21%3D%22info%22%7D+%3E%3D+1'n" Apr 18 15:08:36.804: INFO: stdout: "{"status":"success","data":{"resultType":"vector","result":[{"metric":{"__name__":"ALERTS","alertname":"CannotRetrieveUpdates","alertstate":"firing","endpoint":"metrics","instance":"10.253.24.6:9099","job":"cluster-version-operator","namespace":"openshift-cluster-version","pod":"cluster-version-operator-57f968f56-mv9s8","prometheus":"openshift-monitoring/k8s","service":"cluster-version-operator","severity":"warning"},"value":[1650265716.792,"1"]},{"metric":{"__name__":"ALERTS","alertname":"SystemMemoryExceedsReservation","alertstate":"firing","node":"master1","prometheus":"openshift-monitoring/k8s","severity":"warning"},"value":[1650265716.792,"1"]}]}}n" Apr 18 15:08:36.804: INFO: promQL query returned unexpected results: ALERTS{alertname!~"Watchdog|AlertmanagerReceiversNotConfigured|PrometheusRemoteWriteDesiredShards",alertstate="firing",severity!="info"} >= 1 [ { "metric": { "__name__": "ALERTS", "alertname": "CannotRetrieveUpdates", "alertstate": "firing", "endpoint": "metrics", "instance": "10.253.24.6:9099", "job": "cluster-version-operator", "namespace": "openshift-cluster-version", "pod": "cluster-version-operator-57f968f56-mv9s8", "prometheus": "openshift-monitoring/k8s", "service": "cluster-version-operator", "severity": "warning" }, "value": [ 1650265706.616, "1" ] }, { "metric": { "__name__": "ALERTS", "alertname": "SystemMemoryExceedsReservation", "alertstate": "firing", "node": "master1", "prometheus": "openshift-monitoring/k8s", "severity": "warning" }, "value": [ 1650265706.616, "1" ] } ] STEP: perform prometheus metric query ALERTS{alertname!~"Watchdog|AlertmanagerReceiversNotConfigured|PrometheusRemoteWriteDesiredShards",alertstate="firing",severity!="info"} >= 1 Apr 18 15:08:46.805: INFO: Running '/usr/bin/kubectl --server=https://api.ocp4e2e.samuele2e.cn:6443 --kubeconfig=/root/.kube/config --namespace=e2e-test-prometheus-v8jrp exec execpod -- /bin/sh -x -c curl --retry 15 --max-time 2 --retry-delay 1 -s -k -H 'Authorization: Bearer eyJhbGciOiJSUzI1NiIsImtpZCI6Ijc1T01GM2Qtc2NLQjZLb2ZENkFKcEcxLW5USVhkbGpVNUY1cGV5UTUtOVUifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJvcGVuc2hpZnQtbW9uaXRvcmluZyIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJwcm9tZXRoZXVzLWFkYXB0ZXItdG9rZW4tNnB4YngiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoicHJvbWV0aGV1cy1hZGFwdGVyIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQudWlkIjoiYmFlNzhkZTktZGE4NS00OTdmLTkyNzItOWY5ZjI4MWVlZGM3Iiwic3ViIjoic3lzdGVtOnNlcnZpY2VhY2NvdW50Om9wZW5zaGlmdC1tb25pdG9yaW5nOnByb21ldGhldXMtYWRhcHRlciJ9.5VmNcWMFRgyXdUZVcUSXQazo-hoA3Tzhyq3zJJn-Zbuqvdxs0c9lb2iobTriV-bwA4Ub6e9pw3dHIJOoaqBBD4nSEmXGRm6RfrHaKeU_t-d_BfHAyP-K4wUsyA6DV0Rpk3JhONz1vFX2OdEvu5aiZXJOyxdHKbOvn4y_caeUDPOj1TKFHkfE81zoG-mpZomYuEW7rudk2yHblTS_jfinSelC9Hdi62czl-omycez6XCyqHvCI4yFwRBQv3o409s5Xj2y5oMik2YVmo__TIgi0bO-VKzYT58KYRSW9uK4UIJMfyvSzDN-6j2eugyfWwfSPCKUh6NmLxq2NDSPMd4yFw' "https://thanos-querier.openshift-monitoring.svc:9091/api/v1/query?query=ALERTS%7Balertname%21~%22Watchdog%7CAlertmanagerReceiversNotConfigured%7CPrometheusRemoteWriteDesiredShards%22%2Calertstate%3D%22firing%22%2Cseverity%21%3D%22info%22%7D+%3E%3D+1"' Apr 18 15:08:46.899: INFO: rc: 1 Apr 18 
15:08:46.899: INFO: promQL query returned unexpected results: ALERTS{alertname!~"Watchdog|AlertmanagerReceiversNotConfigured|PrometheusRemoteWriteDesiredShards",alertstate="firing",severity!="info"} >= 1 [ { "metric": { "__name__": "ALERTS", "alertname": "CannotRetrieveUpdates", "alertstate": "firing", "endpoint": "metrics", "instance": "10.253.24.6:9099", "job": "cluster-version-operator", "namespace": "openshift-cluster-version", "pod": "cluster-version-operator-57f968f56-mv9s8", "prometheus": "openshift-monitoring/k8s", "service": "cluster-version-operator", "severity": "warning" }, "value": [ 1650265716.792, "1" ] }, { "metric": { "__name__": "ALERTS", "alertname": "SystemMemoryExceedsReservation", "alertstate": "firing", "node": "master1", "prometheus": "openshift-monitoring/k8s", "severity": "warning" }, "value": [ 1650265716.792, "1" ] } ] STEP: perform prometheus metric query ALERTS{alertname!~"Watchdog|AlertmanagerReceiversNotConfigured|PrometheusRemoteWriteDesiredShards",alertstate="firing",severity!="info"} >= 1 Apr 18 15:08:56.900: INFO: Running '/usr/bin/kubectl --server=https://api.ocp4e2e.samuele2e.cn:6443 --kubeconfig=/root/.kube/config --namespace=e2e-test-prometheus-v8jrp exec execpod -- /bin/sh -x -c curl --retry 15 --max-time 2 --retry-delay 1 -s -k -H 'Authorization: Bearer eyJhbGciOiJSUzI1NiIsImtpZCI6Ijc1T01GM2Qtc2NLQjZLb2ZENkFKcEcxLW5USVhkbGpVNUY1cGV5UTUtOVUifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJvcGVuc2hpZnQtbW9uaXRvcmluZyIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJwcm9tZXRoZXVzLWFkYXB0ZXItdG9rZW4tNnB4YngiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoicHJvbWV0aGV1cy1hZGFwdGVyIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQudWlkIjoiYmFlNzhkZTktZGE4NS00OTdmLTkyNzItOWY5ZjI4MWVlZGM3Iiwic3ViIjoic3lzdGVtOnNlcnZpY2VhY2NvdW50Om9wZW5zaGlmdC1tb25pdG9yaW5nOnByb21ldGhldXMtYWRhcHRlciJ9.5VmNcWMFRgyXdUZVcUSXQazo-hoA3Tzhyq3zJJn-Zbuqvdxs0c9lb2iobTriV-bwA4Ub6e9pw3dHIJOoaqBBD4nSEmXGRm6RfrHaKeU_t-d_BfHAyP-K4wUsyA6DV0Rpk3JhONz1vFX2OdEvu5aiZXJOyxdHKbOvn4y_caeUDPOj1TKFHkfE81zoG-mpZomYuEW7rudk2yHblTS_jfinSelC9Hdi62czl-omycez6XCyqHvCI4yFwRBQv3o409s5Xj2y5oMik2YVmo__TIgi0bO-VKzYT58KYRSW9uK4UIJMfyvSzDN-6j2eugyfWwfSPCKUh6NmLxq2NDSPMd4yFw' "https://thanos-querier.openshift-monitoring.svc:9091/api/v1/query?query=ALERTS%7Balertname%21~%22Watchdog%7CAlertmanagerReceiversNotConfigured%7CPrometheusRemoteWriteDesiredShards%22%2Calertstate%3D%22firing%22%2Cseverity%21%3D%22info%22%7D+%3E%3D+1"' Apr 18 15:08:56.998: INFO: rc: 1 STEP: perform prometheus metric query ALERTS{alertname!~"Watchdog|AlertmanagerReceiversNotConfigured|PrometheusRemoteWriteDesiredShards",alertstate="firing",severity!="info"} >= 1 Apr 18 15:09:06.999: INFO: Running '/usr/bin/kubectl --server=https://api.ocp4e2e.samuele2e.cn:6443 --kubeconfig=/root/.kube/config --namespace=e2e-test-prometheus-v8jrp exec execpod -- /bin/sh -x -c curl --retry 15 --max-time 2 --retry-delay 1 -s -k -H 'Authorization: Bearer 
eyJhbGciOiJSUzI1NiIsImtpZCI6Ijc1T01GM2Qtc2NLQjZLb2ZENkFKcEcxLW5USVhkbGpVNUY1cGV5UTUtOVUifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJvcGVuc2hpZnQtbW9uaXRvcmluZyIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJwcm9tZXRoZXVzLWFkYXB0ZXItdG9rZW4tNnB4YngiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoicHJvbWV0aGV1cy1hZGFwdGVyIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQudWlkIjoiYmFlNzhkZTktZGE4NS00OTdmLTkyNzItOWY5ZjI4MWVlZGM3Iiwic3ViIjoic3lzdGVtOnNlcnZpY2VhY2NvdW50Om9wZW5zaGlmdC1tb25pdG9yaW5nOnByb21ldGhldXMtYWRhcHRlciJ9.5VmNcWMFRgyXdUZVcUSXQazo-hoA3Tzhyq3zJJn-Zbuqvdxs0c9lb2iobTriV-bwA4Ub6e9pw3dHIJOoaqBBD4nSEmXGRm6RfrHaKeU_t-d_BfHAyP-K4wUsyA6DV0Rpk3JhONz1vFX2OdEvu5aiZXJOyxdHKbOvn4y_caeUDPOj1TKFHkfE81zoG-mpZomYuEW7rudk2yHblTS_jfinSelC9Hdi62czl-omycez6XCyqHvCI4yFwRBQv3o409s5Xj2y5oMik2YVmo__TIgi0bO-VKzYT58KYRSW9uK4UIJMfyvSzDN-6j2eugyfWwfSPCKUh6NmLxq2NDSPMd4yFw' "https://thanos-querier.openshift-monitoring.svc:9091/api/v1/query?query=ALERTS%7Balertname%21~%22Watchdog%7CAlertmanagerReceiversNotConfigured%7CPrometheusRemoteWriteDesiredShards%22%2Calertstate%3D%22firing%22%2Cseverity%21%3D%22info%22%7D+%3E%3D+1"' Apr 18 15:09:07.078: INFO: rc: 1 [AfterEach] [sig-instrumentation] Prometheus github.com/openshift/origin/test/extended/util/client.go:140 STEP: Collecting events from namespace "e2e-test-prometheus-v8jrp". STEP: Found 7 events. Apr 18 15:09:17.092: INFO: At 0001-01-01 00:00:00 +0000 UTC - event for execpod: { } Scheduled: Successfully assigned e2e-test-prometheus-v8jrp/execpod to master1 Apr 18 15:09:17.092: INFO: At 2022-04-18 15:08:24 +0800 CST - event for execpod: {multus } AddedInterface: Add eth0 [21.100.0.244/23] from ovn-kubernetes Apr 18 15:09:17.092: INFO: At 2022-04-18 15:08:24 +0800 CST - event for execpod: {kubelet master1} Pulled: Container image "image-registry.openshift-image-registry.svc:5000/openshift/tools:latest" already present on machine Apr 18 15:09:17.092: INFO: At 2022-04-18 15:08:24 +0800 CST - event for execpod: {kubelet master1} Created: Created container agnhost-container Apr 18 15:09:17.092: INFO: At 2022-04-18 15:08:24 +0800 CST - event for execpod: {kubelet master1} Started: Started container agnhost-container Apr 18 15:09:17.092: INFO: At 2022-04-18 15:08:40 +0800 CST - event for execpod: {taint-controller } TaintManagerEviction: Marking for deletion Pod e2e-test-prometheus-v8jrp/execpod Apr 18 15:09:17.092: INFO: At 2022-04-18 15:08:40 +0800 CST - event for execpod: {kubelet master1} Killing: Stopping container agnhost-container Apr 18 15:09:17.094: INFO: POD NODE PHASE GRACE CONDITIONS Apr 18 15:09:17.094: INFO: Apr 18 15:09:17.097: INFO: skipping dumping cluster info - cluster too large [AfterEach] [sig-instrumentation] Prometheus github.com/openshift/origin/test/extended/util/client.go:141 STEP: Destroying namespace "e2e-test-prometheus-v8jrp" for this suite. 
fail [github.com/openshift/origin/test/extended/prometheus/prometheus.go:533]: Unexpected error: <errors.aggregate | len:1, cap:1>: [ { s: "unable to execute query ALERTS{alertname!~"Watchdog|AlertmanagerReceiversNotConfigured|PrometheusRemoteWriteDesiredShards",alertstate="firing",severity!="info"} >= 1: unable to execute query host command failed: error running /usr/bin/kubectl --server=https://api.ocp4e2e.samuele2e.cn:6443 --kubeconfig=/root/.kube/config --namespace=e2e-test-prometheus-v8jrp exec execpod -- /bin/sh -x -c curl --retry 15 --max-time 2 --retry-delay 1 -s -k -H 'Authorization: Bearer eyJhbGciOiJSUzI1NiIsImtpZCI6Ijc1T01GM2Qtc2NLQjZLb2ZENkFKcEcxLW5USVhkbGpVNUY1cGV5UTUtOVUifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJvcGVuc2hpZnQtbW9uaXRvcmluZyIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJwcm9tZXRoZXVzLWFkYXB0ZXItdG9rZW4tNnB4YngiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoicHJvbWV0aGV1cy1hZGFwdGVyIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQudWlkIjoiYmFlNzhkZTktZGE4NS00OTdmLTkyNzItOWY5ZjI4MWVlZGM3Iiwic3ViIjoic3lzdGVtOnNlcnZpY2VhY2NvdW50Om9wZW5zaGlmdC1tb25pdG9yaW5nOnByb21ldGhldXMtYWRhcHRlciJ9.5VmNcWMFRgyXdUZVcUSXQazo-hoA3Tzhyq3zJJn-Zbuqvdxs0c9lb2iobTriV-bwA4Ub6e9pw3dHIJOoaqBBD4nSEmXGRm6RfrHaKeU_t-d_BfHAyP-K4wUsyA6DV0Rpk3JhONz1vFX2OdEvu5aiZXJOyxdHKbOvn4y_caeUDPOj1TKFHkfE81zoG-mpZomYuEW7rudk2yHblTS_jfinSelC9Hdi62czl-omycez6XCyqHvCI4yFwRBQv3o409s5Xj2y5oMik2YVmo__TIgi0bO-VKzYT58KYRSW9uK4UIJMfyvSzDN-6j2eugyfWwfSPCKUh6NmLxq2NDSPMd4yFw' "https://thanos-querier.openshift-monitoring.svc:9091/api/v1/query?query=ALERTS%7Balertname%21~%22Watchdog%7CAlertmanagerReceiversNotConfigured%7CPrometheusRemoteWriteDesiredShards%22%2Calertstate%3D%22firing%22%2Cseverity%21%3D%22info%22%7D+%3E%3D+1":nCommand stdout:nnstderr:nError from server (NotFound): pods "execpod" not foundnnerror:nexit status 1n", }, ] unable to execute query ALERTS{alertname!~"Watchdog|AlertmanagerReceiversNotConfigured|PrometheusRemoteWriteDesiredShards",alertstate="firing",severity!="info"} >= 1: unable to execute query host command failed: error running /usr/bin/kubectl --server=https://api.ocp4e2e.samuele2e.cn:6443 --kubeconfig=/root/.kube/config --namespace=e2e-test-prometheus-v8jrp exec execpod -- /bin/sh -x -c curl --retry 15 --max-time 2 --retry-delay 1 -s -k -H 'Authorization: Bearer eyJhbGciOiJSUzI1NiIsImtpZCI6Ijc1T01GM2Qtc2NLQjZLb2ZENkFKcEcxLW5USVhkbGpVNUY1cGV5UTUtOVUifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJvcGVuc2hpZnQtbW9uaXRvcmluZyIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJwcm9tZXRoZXVzLWFkYXB0ZXItdG9rZW4tNnB4YngiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoicHJvbWV0aGV1cy1hZGFwdGVyIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQudWlkIjoiYmFlNzhkZTktZGE4NS00OTdmLTkyNzItOWY5ZjI4MWVlZGM3Iiwic3ViIjoic3lzdGVtOnNlcnZpY2VhY2NvdW50Om9wZW5zaGlmdC1tb25pdG9yaW5nOnByb21ldGhldXMtYWRhcHRlciJ9.5VmNcWMFRgyXdUZVcUSXQazo-hoA3Tzhyq3zJJn-Zbuqvdxs0c9lb2iobTriV-bwA4Ub6e9pw3dHIJOoaqBBD4nSEmXGRm6RfrHaKeU_t-d_BfHAyP-K4wUsyA6DV0Rpk3JhONz1vFX2OdEvu5aiZXJOyxdHKbOvn4y_caeUDPOj1TKFHkfE81zoG-mpZomYuEW7rudk2yHblTS_jfinSelC9Hdi62czl-omycez6XCyqHvCI4yFwRBQv3o409s5Xj2y5oMik2YVmo__TIgi0bO-VKzYT58KYRSW9uK4UIJMfyvSzDN-6j2eugyfWwfSPCKUh6NmLxq2NDSPMd4yFw' 
"https://thanos-querier.openshift-monitoring.svc:9091/api/v1/query?query=ALERTS%7Balertname%21~%22Watchdog%7CAlertmanagerReceiversNotConfigured%7CPrometheusRemoteWriteDesiredShards%22%2Calertstate%3D%22firing%22%2Cseverity%21%3D%22info%22%7D+%3E%3D+1": Command stdout: stderr: Error from server (NotFound): pods "execpod" not found error: exit status 1 occurred failed: (56.7s) 2022-04-18T07:09:17 "[sig-instrumentation] Prometheus when installed on the cluster shouldn't report any alerts in firing state apart from Watchdog and AlertmanagerReceiversNotConfigured [Early] [Skipped:Disconnected] [Suite:openshift/conformance/parallel]" |
Apr 18 15:08:36.804: INFO: promQL query returned unexpected results: ALERTS{alertname!~"Watchdog|AlertmanagerReceiversNotConfigured|PrometheusRemoteWriteDesiredShards",alertstate="firing",severity!="info"} >= 1 [ { "metric": { "__name__": "ALERTS", "alertname": "CannotRetrieveUpdates", "alertstate": "firing", "endpoint": "metrics", "instance": "10.253.24.6:9099", "job": "cluster-version-operator", "namespace": "openshift-cluster-version", "pod": "cluster-version-operator-57f968f56-mv9s8", "prometheus": "openshift-monitoring/k8s", "service": "cluster-version-operator", "severity": "warning" }, "value": [ 1650265706.616, "1" ] }, { "metric": { "__name__": "ALERTS", "alertname": "SystemMemoryExceedsReservation", "alertstate": "firing", "node": "master1", "prometheus": "openshift-monitoring/k8s", "severity": "warning" }, "value": [ 1650265706.616, "1" ] } ] STEP: perform prometheus metric query ALERTS{alertname!~"Watchdog|AlertmanagerReceiversNotConfigured|PrometheusRemoteWriteDesiredShards",alertstate="firing",severity!="info"} >= 1 |
Problem analysis:
"metric": { "__name__": "ALERTS", "alertname": "CannotRetrieveUpdates", "alertstate": "firing", "endpoint": "metrics", "instance": "10.253.24.6:9099", "job": "cluster-version-operator", "namespace": "openshift-cluster-version", "pod": "cluster-version-operator-57f968f56-mv9s8", "prometheus": "openshift-monitoring/k8s", "service": "cluster-version-operator", "severity": "warning" }, |
"metric": { "__name__": "ALERTS", "alertname": "SystemMemoryExceedsReservation", "alertstate": "firing", "node": "master1", "prometheus": "openshift-monitoring/k8s", "severity": "warning" }, |
From the above we can see:
- Two alerts are in the firing state:
- CannotRetrieveUpdates
- SystemMemoryExceedsReservation
1953846 – SystemMemoryExceedsReservation alert should consider hugepage reservation
SystemMemoryExceedsReservation alert which is added from OCP 4.6 should consider Hugepage reservation. The SystemMemoryExceedsReservation alert uses following Prometheus query: ~~~ sum by (node) (container_memory_rss{id="/system.slice"}) > ((sum by (node) (kube_node_status_capacity{resource="memory"} - kube_node_status_allocatable{resource="memory"})) * 0.9) ~~~ As per the above query, If hugepages were set on worker node, the right side of the check would contain hugepages that are supposed to be allocated by the applications. The left side indicates working memory allocated by system processes related to containers running inside the node. In this case, the right side would be added much more application memory size that is irrelevant to the system reserved memory, so the alert would become meaningless. For example, if a node has 30GiB of hugepages like below: ~~~ $ oc describe node <node-name> ... Capacity: cpu: 80 ephemeral-storage: 2096613Mi hugepages-1Gi: 30Gi hugepages-2Mi: 0 memory: 527977304Ki openshift.io/dpdk_ext0: 0 openshift.io/f1u: 10 openshift.io/sriov_ext0: 10 pods: 250 Allocatable: cpu: 79500m ephemeral-storage: 1977538520680 hugepages-1Gi: 30Gi hugepages-2Mi: 0 memory: 495369048Ki openshift.io/dpdk_ext0: 0 openshift.io/f1u: 10 openshift.io/sriov_ext0: 10 pods: 250 .. ~~~ The system-reserved contains the 30GiB of huge pages which will be allocated by the applications. SystemReserved = (kube_node_status_capacity{resource="memory"} - kube_node_status_allocatable{resource="memory"})) = 527977304Ki - 495369048Ki = 31GiB And (container_memory_rss {id = "/system.slice "}) is unlikely to be larger than the right side, as the underlying system process rarely uses huge pages as far as I know. I am not sure If my understanding is correct or not , if I am wrong please let me know. |
https://github.com/openshift/machine-config-operator/blob/f86955971533aacbb4bb66f5c7041057d3f33566/install/0000_90_machine-config-operator_01_prometheus-rules.yaml#L53-L60
- name: system-memory-exceeds-reservation
  rules:
  - alert: SystemMemoryExceedsReservation
    expr: |
      sum by (node) (container_memory_rss{id="/system.slice"}) > ((sum by (node) (kube_node_status_capacity{resource="memory"} - kube_node_status_allocatable{resource="memory"})) * 0.9)
    for: 15m
    labels:
      severity: warning
    annotations:
      message: "System memory usage of {{ $value | humanize }} on {{ $labels.node }} exceeds 90% of the reservation. Reserved memory ensures system processes can function even when the node is fully allocated and protects against workload out of memory events impacting the proper functioning of the node. The default reservation is expected to be sufficient for most configurations and should be increased (https://docs.openshift.com/container-platform/latest/nodes/nodes/nodes-nodes-managing.html) when running nodes with high numbers of pods (either due to rate of change or at steady state)."
Allocating resources for nodes - Working with nodes | Nodes | OpenShift Container Platform 4.6
Managing nodes - Working with nodes | Nodes | OpenShift Container Platform 4.10
Taking the standalone node 10.253.15.56 as an example for the calculation:
Capacity:
  cpu:                  8
  ephemeral-storage:    194465Mi
  example.com/fakecpu:  1k
  hugepages-1Gi:        0
  hugepages-2Mi:        0
  memory:               32603860Ki
  pods:                 250
Allocatable:
  cpu:                  7500m
  ephemeral-storage:    183520198353
  example.com/fakecpu:  1k
  hugepages-1Gi:        0
  hugepages-2Mi:        0
  memory:               31452884Ki
  pods:                 250
(32603860Ki - 31452884Ki) * 0.9
= 1,150,976 Ki * 0.9
= 1,035,878.4 Ki
≈ 1,011.6 MiB
≈ 0.99 GiB
Once the RSS of the /system.slice cgroup on this node (system services such as the kubelet and CRI-O, reported as container_memory_rss{id="/system.slice"}) exceeds roughly 0.99 GiB, Prometheus starts firing the SystemMemoryExceedsReservation alert.
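As a cross-check, the same threshold can be computed from the live node object instead of copying the numbers by hand. This is a minimal sketch, assuming oc is logged in to the cluster and that the node reports memory with a Ki suffix; the node name is only illustrative:

node=master1   # illustrative: use the node named in the alert, or the standalone node
cap=$(oc get node "$node" -o jsonpath='{.status.capacity.memory}')        # e.g. 32603860Ki
alloc=$(oc get node "$node" -o jsonpath='{.status.allocatable.memory}')   # e.g. 31452884Ki
# Strip the Ki suffix and take 90% of the reservation: this is the alert threshold in KiB.
threshold_kib=$(( (${cap%Ki} - ${alloc%Ki}) * 9 / 10 ))
echo "SystemMemoryExceedsReservation threshold on $node: ${threshold_kib} KiB (~$((threshold_kib / 1024)) MiB)"

On the capacity/allocatable values quoted above this prints roughly 1,035,878 KiB (~1011 MiB), matching the manual calculation.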
Problem resolution:
Option 1: add more physical memory to both the single-node and the cluster OpenShift environments.
The added memory must be large enough that, while the test suite is running, the following expression never evaluates to true:
https://github.com/openshift/machine-config-operator/blob/f86955971533aacbb4bb66f5c7041057d3f33566/install/0000_90_machine-config-operator_01_prometheus-rules.yaml#L53-L60
sum by (node) (container_memory_rss{id="/system.slice"}) > ((sum by (node) (kube_node_status_capacity{resource="memory"} - kube_node_status_allocatable{resource="memory"})) * 0.9) |
Option 2: modify the alerting rule shipped in OpenShift's machine-config-operator component,
https://github.com/openshift/machine-config-operator
https://github.com/openshift/machine-config-operator/blob/f86955971533aacbb4bb66f5c7041057d3f33566/install/0000_90_machine-config-operator_01_prometheus-rules.yaml#L53-L60
changing it so that this alert never fires:
- name: system-memory-exceeds-reservation
  rules:
  - alert: SystemMemoryExceedsReservation
    expr: |
      sum by (node) (container_memory_rss{id="/system.slice"}) > ((sum by (node) (kube_node_status_capacity{resource="memory"} - kube_node_status_allocatable{resource="memory"})) * 0.9)
    for: 15m
    labels:
      severity: warning
    annotations:
      message: "System memory usage of {{ $value | humanize }} on {{ $labels.node }} exceeds 90% of the reservation. Reserved memory ensures system processes can function even when the node is fully allocated and protects against workload out of memory events impacting the proper functioning of the node. The default reservation is expected to be sufficient for most configurations and should be increased (https://docs.openshift.com/container-platform/latest/nodes/nodes/nodes-nodes-managing.html) when running nodes with high numbers of pods (either due to rate of change or at steady state)."
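A hedged sketch of how option 2 might be carried out on a running cluster, assuming jq is available and without assuming the exact PrometheusRule object name (the placeholders in the edit command are illustrative and should be taken from the first command's output). Note that this manifest is reconciled by the cluster-version operator, so an in-place edit is only a stop-gap; rebuilding the machine-config-operator payload or silencing the alert in Alertmanager is the more durable route:

# Locate the PrometheusRule object that carries SystemMemoryExceedsReservation.
oc get prometheusrules -A -o json \
  | jq -r '.items[] | select([.spec.groups[].rules[]?.alert] | index("SystemMemoryExceedsReservation")) | "\(.metadata.namespace) \(.metadata.name)"'

# Edit the rule in place (raise the 0.9 factor or drop the alert); the CVO may revert this on a later sync.
oc -n <namespace-from-above> edit prometheusrule <name-from-above>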
2. [sig-instrumentation][Late] Alerts shouldn't report any alerts in firing or pending state apart from Watchdog and AlertmanagerReceiversNotConfigured and have no gaps in Watchdog firing [Skipped:Disconnected] [Suite:openshift/conformance/parallel]
Problem description:
Unexpected alerts fired or pending after the test run: alert CannotRetrieveUpdates fired for 2313 seconds with labels: {endpoint="metrics", instance="10.255.245.135:9099", job="cluster-version-operator", namespace="openshift-cluster-version", pod="cluster-version-operator-79fd7675bd-8vwqd", service="cluster-version-operator", severity="warning"} alert KubeContainerWaiting fired for 2313 seconds with labels: {container="machine-config-server", namespace="default", pod="bootstrap-machine-config-operator-master0", severity="warning"} alert KubePodNotReady fired for 2313 seconds with labels: {namespace="default", pod="bootstrap-machine-config-operator-master0", severity="warning"} alert etcdGRPCRequestsSlow fired for 60 seconds with labels: {endpoint="etcd-metrics", grpc_method="Status", grpc_service="etcdserverpb.Maintenance", instance="10.255.245.135:9979", job="etcd", namespace="openshift-etcd", pod="etcd-master0", service="etcd", severity="critical"} |
[root@master0 zsl]# cat e2e202204014-cluster-serial.log | grep "sig-instrumentation" started: (0/4/333) "[sig-instrumentation] Prometheus when installed on the cluster shouldn't report any alerts in firing state apart from Watchdog and AlertmanagerReceiversNotConfigured [Early] [Skipped:Disconnected] [Suite:openshift/conformance/parallel]" [BeforeEach] [sig-instrumentation] Prometheus [BeforeEach] [sig-instrumentation] Prometheus [AfterEach] [sig-instrumentation] Prometheus [AfterEach] [sig-instrumentation] Prometheus failed: (56.7s) 2022-04-18T07:09:17 "[sig-instrumentation] Prometheus when installed on the cluster shouldn't report any alerts in firing state apart from Watchdog and AlertmanagerReceiversNotConfigured [Early] [Skipped:Disconnected] [Suite:openshift/conformance/parallel]" started: (7/326/333) "[sig-instrumentation] Prometheus when installed on the cluster should report telemetry if a cloud.openshift.com token is present [Late] [Skipped:Disconnected] [Suite:openshift/conformance/parallel]" started: (7/327/333) "[sig-instrumentation][Late] Alerts shouldn't exceed the 500 series limit of total series sent via telemetry from each cluster [Skipped:Disconnected] [Suite:openshift/conformance/parallel]" started: (7/331/333) "[sig-instrumentation][Late] Alerts shouldn't report any alerts in firing or pending state apart from Watchdog and AlertmanagerReceiversNotConfigured and have no gaps in Watchdog firing [Skipped:Disconnected] [Suite:openshift/conformance/parallel]" passed: (7.6s) 2022-04-18T08:00:17 "[sig-instrumentation][Late] Alerts shouldn't exceed the 500 series limit of total series sent via telemetry from each cluster [Skipped:Disconnected] [Suite:openshift/conformance/parallel]" passed: (7.9s) 2022-04-18T08:00:18 "[sig-instrumentation] Prometheus when installed on the cluster should report telemetry if a cloud.openshift.com token is present [Late] [Skipped:Disconnected] [Suite:openshift/conformance/parallel]" [BeforeEach] [sig-instrumentation][Late] Alerts [BeforeEach] [sig-instrumentation][Late] Alerts [AfterEach] [sig-instrumentation][Late] Alerts [AfterEach] [sig-instrumentation][Late] Alerts failed: (7.7s) 2022-04-18T08:00:18 "[sig-instrumentation][Late] Alerts shouldn't report any alerts in firing or pending state apart from Watchdog and AlertmanagerReceiversNotConfigured and have no gaps in Watchdog firing [Skipped:Disconnected] [Suite:openshift/conformance/parallel]" Apr 18 07:08:20.489 I e2e-test/"[sig-instrumentation] Prometheus when installed on the cluster shouldn't report any alerts in firing state apart from Watchdog and AlertmanagerReceiversNotConfigured [Early] [Skipped:Disconnected] [Suite:openshift/conformance/parallel]" started Apr 18 07:08:20.489 - 56s E e2e-test/"[sig-instrumentation] Prometheus when installed on the cluster shouldn't report any alerts in firing state apart from Watchdog and AlertmanagerReceiversNotConfigured [Early] [Skipped:Disconnected] [Suite:openshift/conformance/parallel]" e2e test finished As "Failed" Apr 18 07:09:17.142 E e2e-test/"[sig-instrumentation] Prometheus when installed on the cluster shouldn't report any alerts in firing state apart from Watchdog and AlertmanagerReceiversNotConfigured [Early] [Skipped:Disconnected] [Suite:openshift/conformance/parallel]" finishedStatus/Failed Apr 18 08:00:10.359 I e2e-test/"[sig-instrumentation] Prometheus when installed on the cluster should report telemetry if a cloud.openshift.com token is present [Late] [Skipped:Disconnected] [Suite:openshift/conformance/parallel]" started Apr 18 
08:00:10.359 - 7s I e2e-test/"[sig-instrumentation] Prometheus when installed on the cluster should report telemetry if a cloud.openshift.com token is present [Late] [Skipped:Disconnected] [Suite:openshift/conformance/parallel]" e2e test finished As "Passed" Apr 18 08:00:10.359 I e2e-test/"[sig-instrumentation][Late] Alerts shouldn't exceed the 500 series limit of total series sent via telemetry from each cluster [Skipped:Disconnected] [Suite:openshift/conformance/parallel]" started Apr 18 08:00:10.359 - 7s I e2e-test/"[sig-instrumentation][Late] Alerts shouldn't exceed the 500 series limit of total series sent via telemetry from each cluster [Skipped:Disconnected] [Suite:openshift/conformance/parallel]" e2e test finished As "Passed" Apr 18 08:00:10.941 I e2e-test/"[sig-instrumentation][Late] Alerts shouldn't report any alerts in firing or pending state apart from Watchdog and AlertmanagerReceiversNotConfigured and have no gaps in Watchdog firing [Skipped:Disconnected] [Suite:openshift/conformance/parallel]" started Apr 18 08:00:10.941 - 7s E e2e-test/"[sig-instrumentation][Late] Alerts shouldn't report any alerts in firing or pending state apart from Watchdog and AlertmanagerReceiversNotConfigured and have no gaps in Watchdog firing [Skipped:Disconnected] [Suite:openshift/conformance/parallel]" e2e test finished As "Failed" Apr 18 08:00:17.979 I e2e-test/"[sig-instrumentation][Late] Alerts shouldn't exceed the 500 series limit of total series sent via telemetry from each cluster [Skipped:Disconnected] [Suite:openshift/conformance/parallel]" finishedStatus/Passed Apr 18 08:00:18.277 I e2e-test/"[sig-instrumentation] Prometheus when installed on the cluster should report telemetry if a cloud.openshift.com token is present [Late] [Skipped:Disconnected] [Suite:openshift/conformance/parallel]" finishedStatus/Passed Apr 18 08:00:18.666 E e2e-test/"[sig-instrumentation][Late] Alerts shouldn't report any alerts in firing or pending state apart from Watchdog and AlertmanagerReceiversNotConfigured and have no gaps in Watchdog firing [Skipped:Disconnected] [Suite:openshift/conformance/parallel]" finishedStatus/Failed [sig-instrumentation] Prometheus when installed on the cluster shouldn't report any alerts in firing state apart from Watchdog and AlertmanagerReceiversNotConfigured [Early] [Skipped:Disconnected] [Suite:openshift/conformance/parallel] [sig-instrumentation][Late] Alerts shouldn't report any alerts in firing or pending state apart from Watchdog and AlertmanagerReceiversNotConfigured and have no gaps in Watchdog firing [Skipped:Disconnected] [Suite:openshift/conformance/parallel] started: (0/2/333) "[sig-instrumentation] Prometheus when installed on the cluster shouldn't report any alerts in firing state apart from Watchdog and AlertmanagerReceiversNotConfigured [Early] [Skipped:Disconnected] [Suite:openshift/conformance/parallel]" skipped: (54.2s) 2022-04-18T08:24:07 "[sig-instrumentation] Prometheus when installed on the cluster shouldn't report any alerts in firing state apart from Watchdog and AlertmanagerReceiversNotConfigured [Early] [Skipped:Disconnected] [Suite:openshift/conformance/parallel]" Apr 18 08:23:12.818 I e2e-test/"[sig-instrumentation] Prometheus when installed on the cluster shouldn't report any alerts in firing state apart from Watchdog and AlertmanagerReceiversNotConfigured [Early] [Skipped:Disconnected] [Suite:openshift/conformance/parallel]" started Apr 18 08:23:12.818 - 54s I e2e-test/"[sig-instrumentation] Prometheus when installed on the cluster shouldn't report 
any alerts in firing state apart from Watchdog and AlertmanagerReceiversNotConfigured [Early] [Skipped:Disconnected] [Suite:openshift/conformance/parallel]" e2e test finished As "Skipped" Apr 18 08:24:07.055 I e2e-test/"[sig-instrumentation] Prometheus when installed on the cluster shouldn't report any alerts in firing state apart from Watchdog and AlertmanagerReceiversNotConfigured [Early] [Skipped:Disconnected] [Suite:openshift/conformance/parallel]" finishedStatus/Skipped |
alert HighOverallControlPlaneCPU fired for 180 seconds with labels: {namespace="openshift-kube-apiserver", severity="warning"} (allowed: high CPU utilization during e2e runs is normal) Apr 18 16:00:18.630: FAIL: Unexpected alerts fired or pending after the test run: alert CannotRetrieveUpdates fired for 3118 seconds with labels: {endpoint="metrics", instance="10.253.24.6:9099", job="cluster-version-operator", namespace="openshift-cluster-version", pod="cluster-version-operator-57f968f56-mv9s8", service="cluster-version-operator", severity="warning"} alert SystemMemoryExceedsReservation fired for 2526 seconds with labels: {node="master1", severity="warning"} alert etcdMemberCommunicationSlow fired for 30 seconds with labels: {To="cbe753567cf13352", endpoint="etcd-metrics", instance="10.253.24.6:9979", job="etcd", namespace="openshift-etcd", pod="etcd-master2", service="etcd", severity="warning"} Full Stack Trace github.com/onsi/ginkgo/internal/leafnodes.(*runner).runSync(0xc001983f20, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...) github.com/onsi/ginkgo@v4.7.0-origin.0+incompatible/internal/leafnodes/runner.go:113 +0xa3 github.com/onsi/ginkgo/internal/leafnodes.(*runner).run(0xc001983f20, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...) github.com/onsi/ginkgo@v4.7.0-origin.0+incompatible/internal/leafnodes/runner.go:64 +0x15c github.com/onsi/ginkgo/internal/leafnodes.(*ItNode).Run(0xc001986120, 0x8f27f00, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...) github.com/onsi/ginkgo@v4.7.0-origin.0+incompatible/internal/leafnodes/it_node.go:26 +0x87 github.com/onsi/ginkgo/internal/spec.(*Spec).runSample(0xc0028d6f00, 0x0, 0x8f27f00, 0xc00038a7c0) github.com/onsi/ginkgo@v4.7.0-origin.0+incompatible/internal/spec/spec.go:215 +0x72f github.com/onsi/ginkgo/internal/spec.(*Spec).Run(0xc0028d6f00, 0x8f27f00, 0xc00038a7c0) github.com/onsi/ginkgo@v4.7.0-origin.0+incompatible/internal/spec/spec.go:138 +0xf2 github.com/onsi/ginkgo/internal/specrunner.(*SpecRunner).runSpec(0xc001fbcc80, 0xc0028d6f00, 0x0) github.com/onsi/ginkgo@v4.7.0-origin.0+incompatible/internal/specrunner/spec_runner.go:200 +0x111 github.com/onsi/ginkgo/internal/specrunner.(*SpecRunner).runSpecs(0xc001fbcc80, 0x1) github.com/onsi/ginkgo@v4.7.0-origin.0+incompatible/internal/specrunner/spec_runner.go:170 +0x147 github.com/onsi/ginkgo/internal/specrunner.(*SpecRunner).Run(0xc001fbcc80, 0xc002c35398) github.com/onsi/ginkgo@v4.7.0-origin.0+incompatible/internal/specrunner/spec_runner.go:66 +0x117 github.com/onsi/ginkgo/internal/suite.(*Suite).Run(0xc000352870, 0x8f281c0, 0xc001cfee10, 0x0, 0x0, 0xc000796530, 0x1, 0x1, 0x90018d8, 0xc00038a7c0, ...) 
github.com/onsi/ginkgo@v4.7.0-origin.0+incompatible/internal/suite/suite.go:62 +0x426 github.com/openshift/origin/pkg/test/ginkgo.(*TestOptions).Run(0xc001dae7e0, 0xc001372c40, 0x1, 0x1, 0x83256a1, 0x4a53120) github.com/openshift/origin/pkg/test/ginkgo/cmd_runtest.go:61 +0x418 main.newRunTestCommand.func1.1() github.com/openshift/origin/cmd/openshift-tests/openshift-tests.go:426 +0x4e github.com/openshift/origin/test/extended/util.WithCleanup(0xc001abfc18) github.com/openshift/origin/test/extended/util/test.go:168 +0x5f main.newRunTestCommand.func1(0xc001d91180, 0xc001372c40, 0x1, 0x1, 0x0, 0x0) github.com/openshift/origin/cmd/openshift-tests/openshift-tests.go:426 +0x333 github.com/spf13/cobra.(*Command).execute(0xc001d91180, 0xc001372be0, 0x1, 0x1, 0xc001d91180, 0xc001372be0) github.com/spf13/cobra@v1.1.3/command.go:852 +0x472 github.com/spf13/cobra.(*Command).ExecuteC(0xc001d90780, 0x0, 0x8f30f20, 0xbfdc960) github.com/spf13/cobra@v1.1.3/command.go:960 +0x375 github.com/spf13/cobra.(*Command).Execute(...) github.com/spf13/cobra@v1.1.3/command.go:897 main.main.func1(0xc001d90780, 0x0, 0x0) github.com/openshift/origin/cmd/openshift-tests/openshift-tests.go:84 +0x94 main.main() github.com/openshift/origin/cmd/openshift-tests/openshift-tests.go:85 +0x42c [AfterEach] [sig-instrumentation][Late] Alerts github.com/openshift/origin/test/extended/util/client.go:140 STEP: Collecting events from namespace "e2e-test-prometheus-ptvn2". STEP: Found 5 events. Apr 18 16:00:18.641: INFO: At 0001-01-01 00:00:00 +0000 UTC - event for execpod: { } Scheduled: Successfully assigned e2e-test-prometheus-ptvn2/execpod to master1 Apr 18 16:00:18.641: INFO: At 2022-04-18 16:00:16 +0800 CST - event for execpod: {multus } AddedInterface: Add eth0 [21.100.1.147/23] from ovn-kubernetes Apr 18 16:00:18.641: INFO: At 2022-04-18 16:00:16 +0800 CST - event for execpod: {kubelet master1} Pulled: Container image "image-registry.openshift-image-registry.svc:5000/openshift/tools:latest" already present on machine Apr 18 16:00:18.641: INFO: At 2022-04-18 16:00:16 +0800 CST - event for execpod: {kubelet master1} Created: Created container agnhost-container Apr 18 16:00:18.641: INFO: At 2022-04-18 16:00:16 +0800 CST - event for execpod: {kubelet master1} Started: Started container agnhost-container Apr 18 16:00:18.643: INFO: POD NODE PHASE GRACE CONDITIONS Apr 18 16:00:18.643: INFO: execpod master1 Running 1s [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2022-04-18 16:00:13 +0800 CST } {Ready True 0001-01-01 00:00:00 +0000 UTC 2022-04-18 16:00:17 +0800 CST } {ContainersReady True 0001-01-01 00:00:00 +0000 UTC 2022-04-18 16:00:17 +0800 CST } {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2022-04-18 16:00:13 +0800 CST }] Apr 18 16:00:18.643: INFO: Apr 18 16:00:18.646: INFO: skipping dumping cluster info - cluster too large [AfterEach] [sig-instrumentation][Late] Alerts github.com/openshift/origin/test/extended/util/client.go:141 STEP: Destroying namespace "e2e-test-prometheus-ptvn2" for this suite. 
fail [github.com/onsi/ginkgo@v4.7.0-origin.0+incompatible/internal/leafnodes/runner.go:113]: Apr 18 16:00:18.630: Unexpected alerts fired or pending after the test run: alert CannotRetrieveUpdates fired for 3118 seconds with labels: {endpoint="metrics", instance="10.253.24.6:9099", job="cluster-version-operator", namespace="openshift-cluster-version", pod="cluster-version-operator-57f968f56-mv9s8", service="cluster-version-operator", severity="warning"} alert SystemMemoryExceedsReservation fired for 2526 seconds with labels: {node="master1", severity="warning"} alert etcdMemberCommunicationSlow fired for 30 seconds with labels: {To="cbe753567cf13352", endpoint="etcd-metrics", instance="10.253.24.6:9979", job="etcd", namespace="openshift-etcd", pod="etcd-master2", service="etcd", severity="warning"} failed: (7.7s) 2022-04-18T08:00:18 "[sig-instrumentation][Late] Alerts shouldn't report any alerts in firing or pending state apart from Watchdog and AlertmanagerReceiversNotConfigured and have no gaps in Watchdog firing [Skipped:Disconnected] [Suite:openshift/conformance/parallel]" |
Allowed alerts:
Apr 18 16:00:18.630: INFO: Alerts were detected during test run which are allowed: alert HighOverallControlPlaneCPU fired for 180 seconds with labels: {namespace="openshift-kube-apiserver", severity="warning"} (allowed: high CPU utilization during e2e runs is normal) |
Disallowed alerts:
Apr 18 16:00:18.630: FAIL: Unexpected alerts fired or pending after the test run: alert CannotRetrieveUpdates fired for 3118 seconds with labels: {endpoint="metrics", instance="10.253.24.6:9099", job="cluster-version-operator", namespace="openshift-cluster-version", pod="cluster-version-operator-57f968f56-mv9s8", service="cluster-version-operator", severity="warning"} alert SystemMemoryExceedsReservation fired for 2526 seconds with labels: {node="master1", severity="warning"} alert etcdMemberCommunicationSlow fired for 30 seconds with labels: {To="cbe753567cf13352", endpoint="etcd-metrics", instance="10.253.24.6:9979", job="etcd", namespace="openshift-etcd", pod="etcd-master2", service="etcd", severity="warning"} |
Problem analysis:
etcdMemberCommunicationSlow
[root@master0 zsl]# kubectl -n openshift-etcd get pods -o wide
NAME                                 READY   STATUS    RESTARTS   AGE    IP            NODE      NOMINATED NODE   READINESS GATES
etcd-master0                         4/4     Running   0          3d2h   10.253.24.4   master0   <none>           <none>
etcd-master1                         4/4     Running   0          3d2h   10.253.24.5   master1   <none>           <none>
etcd-master2                         4/4     Running   0          3d2h   10.253.24.6   master2   <none>           <none>
etcd-quorum-guard-6df5f57df7-cx8vf   1/1     Running   0          144m   10.253.24.5   master1   <none>           <none>
etcd-quorum-guard-6df5f57df7-rwsrk   1/1     Running   0          3d2h   10.253.24.4   master0   <none>           <none>
etcd-quorum-guard-6df5f57df7-vpgkv   1/1     Running   0          3d2h   10.253.24.6   master2   <none>           <none>
[root@master0 ~]# kubectl -n openshift-etcd get endpoints -o wide
NAME   ENDPOINTS                                                         AGE
etcd   10.253.24.4:2379,10.253.24.5:2379,10.253.24.6:2379 + 3 more...   3d3h
[root@master0 ~]# kubectl -n openshift-etcd describe endpoint etcd
error: the server doesn't have a resource type "endpoint"
[root@master0 ~]# kubectl -n openshift-etcd describe endpoints etcd
Name:         etcd
Namespace:    openshift-etcd
Labels:       k8s-app=etcd
Annotations:  <none>
Subsets:
  Addresses:          10.253.24.4,10.253.24.5,10.253.24.6
  NotReadyAddresses:  <none>
  Ports:
    Name          Port  Protocol
    ----          ----  --------
    etcd          2379  TCP
    etcd-metrics  9979  TCP
Events:  <none>
Prometheus Cluster Monitoring | Configuring Clusters | OpenShift Container Platform 3.11
Etcd cluster "Job": member communication with To is taking X_s on etcd instance _Instance.
https://github.com/openshift/cluster-etcd-operator/blob/master/manifests/0000_90_etcd-operator_03_prometheusrule.yaml
- alert: etcdMemberCommunicationSlow
  annotations:
    description: 'etcd cluster "{{ $labels.job }}": member communication with {{ $labels.To }} is taking {{ $value }}s on etcd instance {{ $labels.instance }}.'
    summary: etcd cluster member communication is slow.
  expr: |
    histogram_quantile(0.99, rate(etcd_network_peer_round_trip_time_seconds_bucket{job=~".*etcd.*"}[5m])) > 0.15
  for: 10m
  labels:
    severity: warning
Histograms and summaries | Prometheus
etcd handling:
- etcd is sensitive to disk I/O, so the hardware baseline has to be fixed before drawing conclusions from this alert; a quick manual check is sketched below.
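A hedged sketch of a quick etcd health and performance check, assuming the etcd pod names listed above and that etcdctl inside the pod is already configured with the cluster certificates (as it is on this cluster):

oc -n openshift-etcd rsh etcd-master0

# Inside the pod:
etcdctl endpoint status --cluster -w table   # leader, DB size and raft term per member
etcdctl endpoint health --cluster -w table   # round-trip health probe per member
etcdctl check perf                           # coarse disk/network performance check

If check perf reports slow fsync or high latency, the etcdMemberCommunicationSlow alert is consistent with the disk I/O sensitivity noted above rather than an etcd bug.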
Handling:
CannotRetrieveUpdates:
1927903 – "CannotRetrieveUpdates" - critical error in openshift web console
https://github.com/openshift/cluster-version-operator/blob/master/install/0000_90_cluster-version-operator_02_servicemonitor.yaml#L52-L59
- alert: CannotRetrieveUpdates
  annotations:
    summary: Cluster version operator has not retrieved updates in {{ "{{ $value | humanizeDuration }}" }}.
    description: Failure to retrieve updates means that cluster administrators will need to monitor for available updates on their own or risk falling behind on security or other bugfixes. If the failure is expected, you can clear spec.channel in the ClusterVersion object to tell the cluster-version operator to not retrieve updates. Failure reason {{ "{{ with $cluster_operator_conditions := "cluster_operator_conditions" | query}}{{range $value := .}}{{if and (eq (label "name" $value) "version") (eq (label "condition" $value) "RetrievedUpdates") (eq (label "endpoint" $value) "metrics") (eq (value $value) 0.0)}}{{label "reason" $value}} {{end}}{{end}}{{end}}" }}. {{ "{{ with $console_url := "console_url" | query }}{{ if ne (len (label "url" (first $console_url ) ) ) 0}} For more information refer to {{ label "url" (first $console_url ) }}/settings/cluster/.{{ end }}{{ end }}" }}
  expr: |
    (time()-cluster_version_operator_update_retrieval_timestamp_seconds) >= 3600 and ignoring(condition, name, reason) cluster_operator_conditions{name="version", condition="RetrievedUpdates", endpoint="metrics", reason!="NoChannel"}
  labels:
    severity: warning
https://github.com/openshift/okd/blob/master/KNOWN_ISSUES.md#cannotretrieveupdates-alert
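Per the rule's own annotation and the OKD known-issues entry above, on a cluster that intentionally has no reachable update service this alert should stop firing once spec.channel on the ClusterVersion object is cleared: the RetrievedUpdates condition then reports reason NoChannel, which the expr excludes. A minimal sketch, assuming cluster-admin access:

# Show the currently configured update channel.
oc get clusterversion version -o jsonpath='{.spec.channel}{"\n"}'

# Clear the channel so the cluster-version operator stops trying to retrieve updates.
oc patch clusterversion version --type merge -p '{"spec":{"channel":""}}'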
KubeContainerWaiting
1976940 – GCP RT CI failing on firing KubeContainerWaiting due to liveness and readiness probes timing out
- alert: KubeContainerWaiting
  annotations:
    description: pod/{{ $labels.pod }} in namespace {{ $labels.namespace }} on container {{ $labels.container}} has been in waiting state for longer than 1 hour.
    runbook_url: https://runbooks.prometheus-operator.dev/runbooks/kubernetes/kubecontainerwaiting
    summary: Pod container waiting longer than 1 hour
  expr: |
    sum by (namespace, pod, container) (kube_pod_container_status_waiting_reason{job="kube-state-metrics"}) > 0
  for: 1h
  labels:
    severity: warning
https://github.com/prometheus-operator/kube-prometheus/blob/main/manifests/kubernetesControlPlane-prometheusRule.yaml#L169-L180
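In the failure quoted in the problem description the alert labels point at pod bootstrap-machine-config-operator-master0 in the default namespace. A hedged sketch for listing which containers would currently be counted as waiting, and for inspecting the offending pod (assumes an active oc login; the pod name is copied from the alert labels):

# Namespace, pod and waiting reason for every container currently in the waiting state.
oc get pods -A -o jsonpath='{range .items[*]}{.metadata.namespace}{"\t"}{.metadata.name}{"\t"}{.status.containerStatuses[*].state.waiting.reason}{"\n"}{end}' \
  | awk -F'\t' '$3 != ""'

# Inspect the pod named in the alert.
oc -n default describe pod bootstrap-machine-config-operator-master0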
KubePodNotReady
- alert: KubePodNotReady
  annotations:
    description: Pod {{ $labels.namespace }}/{{ $labels.pod }} has been in a non-ready state for longer than 15 minutes.
    runbook_url: https://runbooks.prometheus-operator.dev/runbooks/kubernetes/kubepodnotready
    summary: Pod has been in a non-ready state for more than 15 minutes.
  expr: |
    sum by (namespace, pod) (
      max by(namespace, pod) (
        kube_pod_status_phase{job="kube-state-metrics", phase=~"Pending|Unknown"}
      ) * on(namespace, pod) group_left(owner_kind) topk by(namespace, pod) (
        1, max by(namespace, pod, owner_kind) (kube_pod_owner{owner_kind!="Job"})
      )
    ) > 0
  for: 15m
  labels:
    severity: warning
https://github.com/prometheus-operator/kube-prometheus/blob/main/manifests/kubernetesControlPlane-prometheusRule.yaml#L26-L42
https://github.com/openshift/cluster-monitoring-operator/issues/72
The alert fires when a pod has been in a non-ready state for more than 15 minutes.
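KubePodNotReady matches pods whose phase is Pending or Unknown (excluding Job-owned pods). A small sketch for listing candidates across the cluster, assuming an active oc login:

# Pods the expression would consider: phase Pending or Unknown, cluster-wide.
oc get pods -A --field-selector=status.phase=Pending
oc get pods -A --field-selector=status.phase=Unknown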
Problem resolution:
3. [sig-cli] oc observe works as expected [Suite:openshift/conformance/parallel]
Problem description:
error: command "/bin/sh" exited with status code 1 error: command "/bin/sh" exited with status code 2 |
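The errors above indicate that a /bin/sh command invoked during the test exited with a non-zero status. As a hedged way to check that the observe subcommand itself works outside the e2e harness (the resource and namespace below are only illustrative):

# Watch services in the default namespace and run a trivial command for each observed item; runs until interrupted.
oc observe services -n default -- echo observed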
Cluster serial execution, no errors:
Line 2075: started: (3/233/333) "[sig-cli] Kubectl client Kubectl taint [Serial] should update the taint on a node [Suite:openshift/conformance/serial] [Suite:k8s]" Line 2077: passed: (5.1s) 2022-04-18T07:30:58 "[sig-cli] Kubectl client Kubectl taint [Serial] should update the taint on a node [Suite:openshift/conformance/serial] [Suite:k8s]" Line 2125: started: (3/242/333) "[sig-cli] Kubectl client Kubectl taint [Serial] should remove all the taints with the same key off a node [Skipped:SingleReplicaTopology] [Suite:openshift/conformance/serial] [Suite:k8s]" Line 2127: passed: (5.9s) 2022-04-18T07:33:11 "[sig-cli] Kubectl client Kubectl taint [Serial] should remove all the taints with the same key off a node [Skipped:SingleReplicaTopology] [Suite:openshift/conformance/serial] [Suite:k8s]" Line 4097045: started: (7/318/333) "[sig-cli] oc adm cluster-role-reapers [Serial] [Suite:openshift/conformance/serial]" Line 4097047: passed: (12.4s) 2022-04-18T08:00:03 "[sig-cli] oc adm cluster-role-reapers [Serial] [Suite:openshift/conformance/serial]" Line 4102157: Apr 18 07:30:53.319 I e2e-test/"[sig-cli] Kubectl client Kubectl taint [Serial] should update the taint on a node [Suite:openshift/conformance/serial] [Suite:k8s]" started Line 4102158: Apr 18 07:30:53.319 - 5s I e2e-test/"[sig-cli] Kubectl client Kubectl taint [Serial] should update the taint on a node [Suite:openshift/conformance/serial] [Suite:k8s]" e2e test finished As "Passed" Line 4102170: Apr 18 07:30:58.468 I e2e-test/"[sig-cli] Kubectl client Kubectl taint [Serial] should update the taint on a node [Suite:openshift/conformance/serial] [Suite:k8s]" finishedStatus/Passed Line 4102501: Apr 18 07:33:05.954 I e2e-test/"[sig-cli] Kubectl client Kubectl taint [Serial] should remove all the taints with the same key off a node [Skipped:SingleReplicaTopology] [Suite:openshift/conformance/serial] [Suite:k8s]" started Line 4102502: Apr 18 07:33:05.954 - 5s I e2e-test/"[sig-cli] Kubectl client Kubectl taint [Serial] should remove all the taints with the same key off a node [Skipped:SingleReplicaTopology] [Suite:openshift/conformance/serial] [Suite:k8s]" e2e test finished As "Passed" Line 4102687: Apr 18 07:33:11.902 I e2e-test/"[sig-cli] Kubectl client Kubectl taint [Serial] should remove all the taints with the same key off a node [Skipped:SingleReplicaTopology] [Suite:openshift/conformance/serial] [Suite:k8s]" finishedStatus/Passed Line 4127678: Apr 18 07:59:51.543 I e2e-test/"[sig-cli] oc adm cluster-role-reapers [Serial] [Suite:openshift/conformance/serial]" started Line 4127679: Apr 18 07:59:51.543 - 12s I e2e-test/"[sig-cli] oc adm cluster-role-reapers [Serial] [Suite:openshift/conformance/serial]" e2e test finished As "Passed" Line 4127720: Apr 18 08:00:03.941 I e2e-test/"[sig-cli] oc adm cluster-role-reapers [Serial] [Suite:openshift/conformance/serial]" finishedStatus/Passed |
Cluster parallel execution, errors occur:
started: (18/2732/2748) "[sig-cli] oc observe works as expected [Suite:openshift/conformance/parallel]" passed: (11.9s) 2022-04-18T07:55:16 "[sig-imageregistry][Feature:ImageInfo] Image info should display information about images [Skipped:Disconnected] [Suite:openshift/conformance/parallel]" started: (18/2733/2748) "[sig-auth][Feature:OpenShiftAuthorization] RBAC proxy for openshift authz RunLegacyLocalRoleEndpoint should succeed [Suite:openshift/conformance/parallel]" passed: (13.7s) 2022-04-18T07:55:17 "[sig-apps][Feature:DeploymentConfig] deploymentconfigs with failing hook should get all logs from retried hooks [Skipped:Disconnected] [Suite:openshift/conformance/parallel]" started: (18/2734/2748) "[sig-arch] Managed cluster should ensure pods use downstream images from our release image with proper ImagePullPolicy [Suite:openshift/conformance/parallel]" [BeforeEach] [Top Level] github.com/openshift/origin/test/extended/util/framework.go:1453 [BeforeEach] [Top Level] github.com/openshift/origin/test/extended/util/framework.go:1453 [BeforeEach] [Top Level] github.com/openshift/origin/test/extended/util/test.go:61 [BeforeEach] [sig-auth][Feature:OAuthServer] [Token Expiration] github.com/openshift/origin/test/extended/util/client.go:142 STEP: Creating a kubernetes client [BeforeEach] [sig-auth][Feature:OAuthServer] [Token Expiration] github.com/openshift/origin/test/extended/util/client.go:116 Apr 18 15:54:02.819: INFO: configPath is now "/tmp/configfile055747163" Apr 18 15:54:02.820: INFO: The user is now "e2e-test-oauth-expiration-m5ws9-user" Apr 18 15:54:02.820: INFO: Creating project "e2e-test-oauth-expiration-m5ws9" Apr 18 15:54:02.998: INFO: Waiting on permissions in project "e2e-test-oauth-expiration-m5ws9" ... Apr 18 15:54:03.002: INFO: Waiting for ServiceAccount "default" to be provisioned... Apr 18 15:54:03.121: INFO: Waiting for service account "default" secrets (default-dockercfg-svzlh,default-dockercfg-svzlh) to include dockercfg/token ... Apr 18 15:54:03.261: INFO: Waiting for ServiceAccount "deployer" to be provisioned... Apr 18 15:54:03.387: INFO: Waiting for ServiceAccount "builder" to be provisioned... Apr 18 15:54:03.506: INFO: Waiting for RoleBinding "system:image-pullers" to be provisioned... Apr 18 15:54:03.520: INFO: Waiting for RoleBinding "system:image-builders" to be provisioned... Apr 18 15:54:03.780: INFO: Waiting for RoleBinding "system:deployers" to be provisioned... Apr 18 15:54:04.753: INFO: Project "e2e-test-oauth-expiration-m5ws9" has been fully provisioned. 
[BeforeEach] [sig-auth][Feature:OAuthServer] [Token Expiration] github.com/openshift/origin/test/extended/oauth/expiration.go:30 Apr 18 15:54:04.761: INFO: Running 'oc --namespace=e2e-test-oauth-expiration-m5ws9 --kubeconfig=/root/.kube/config create -f /tmp/fixture-testdata-dir977370604/test/extended/testdata/oauthserver/cabundle-cm.yaml' configmap/service-ca created Apr 18 15:54:05.025: INFO: Created resources defined in cabundle-cm.yaml Apr 18 15:54:05.025: INFO: Running 'oc --namespace=e2e-test-oauth-expiration-m5ws9 --kubeconfig=/root/.kube/config create -f /tmp/fixture-testdata-dir977370604/test/extended/testdata/oauthserver/oauth-sa.yaml' serviceaccount/e2e-oauth created Apr 18 15:54:05.205: INFO: Created resources defined in oauth-sa.yaml Apr 18 15:54:05.205: INFO: Running 'oc --namespace=e2e-test-oauth-expiration-m5ws9 --kubeconfig=/root/.kube/config create -f /tmp/fixture-testdata-dir977370604/test/extended/testdata/oauthserver/oauth-network.yaml' service/test-oauth-svc created route.route.openshift.io/test-oauth-route created Apr 18 15:54:05.497: INFO: Created resources defined in oauth-network.yaml Apr 18 15:54:05.522: INFO: Created: ClusterRoleBinding e2e-test-oauth-expiration-m5ws9 Apr 18 15:54:05.574: INFO: Created: /htpasswd Apr 18 15:54:05.583: INFO: Created: Secret e2e-test-oauth-expiration-m5ws9/session-secret Apr 18 15:54:05.685: INFO: Created: ConfigMap e2e-test-oauth-expiration-m5ws9/oauth-config Apr 18 15:54:05.799: INFO: Created: Pod e2e-test-oauth-expiration-m5ws9/test-oauth-server Apr 18 15:54:05.799: INFO: Waiting for user 'system:serviceaccount:e2e-test-oauth-expiration-m5ws9:e2e-oauth' to be authorized to * the * resource Apr 18 15:54:05.803: INFO: Waiting for the OAuth server pod to be ready Apr 18 15:54:05.826: INFO: OAuth server pod is not ready: Container statuses: ([]v1.ContainerStatus) <nil> Apr 18 15:54:06.889: INFO: OAuth server pod is not ready: Container statuses: ([]v1.ContainerStatus) (len=1 cap=1) { (v1.ContainerStatus) &ContainerStatus{Name:oauth-server,State:ContainerState{Waiting:&ContainerStateWaiting{Reason:ContainerCreating,Message:,},Running:nil,Terminated:nil,},LastTerminationState:ContainerState{Waiting:nil,Running:nil,Terminated:nil,},Ready:false,RestartCount:0,Image:image.cestc.cn/ccos-ceake/oauth-server@sha256:fca7bab88904f8309e75248f84d07a71769a81bcd9d79cf1b61096086a4c8596,ImageID:,ContainerID:,Started:*false,} } Apr 18 15:54:07.844: INFO: OAuth server pod is not ready: Container statuses: ([]v1.ContainerStatus) (len=1 cap=1) { (v1.ContainerStatus) &ContainerStatus{Name:oauth-server,State:ContainerState{Waiting:&ContainerStateWaiting{Reason:ContainerCreating,Message:,},Running:nil,Terminated:nil,},LastTerminationState:ContainerState{Waiting:nil,Running:nil,Terminated:nil,},Ready:false,RestartCount:0,Image:image.cestc.cn/ccos-ceake/oauth-server@sha256:fca7bab88904f8309e75248f84d07a71769a81bcd9d79cf1b61096086a4c8596,ImageID:,ContainerID:,Started:*false,} } Apr 18 15:54:08.849: INFO: OAuth server pod is not ready: Container statuses: ([]v1.ContainerStatus) (len=1 cap=1) { (v1.ContainerStatus) &ContainerStatus{Name:oauth-server,State:ContainerState{Waiting:&ContainerStateWaiting{Reason:ContainerCreating,Message:,},Running:nil,Terminated:nil,},LastTerminationState:ContainerState{Waiting:nil,Running:nil,Terminated:nil,},Ready:false,RestartCount:0,Image:image.cestc.cn/ccos-ceake/oauth-server@sha256:fca7bab88904f8309e75248f84d07a71769a81bcd9d79cf1b61096086a4c8596,ImageID:,ContainerID:,Started:*false,} } Apr 18 15:54:09.841: INFO: OAuth server 
pod is not ready: Container statuses: ([]v1.ContainerStatus) (len=1 cap=1) { (v1.ContainerStatus) &ContainerStatus{Name:oauth-server,State:ContainerState{Waiting:nil,Running:&ContainerStateRunning{StartedAt:2022-04-18 15:54:08 +0800 CST,},Terminated:nil,},LastTerminationState:ContainerState{Waiting:nil,Running:nil,Terminated:nil,},Ready:false,RestartCount:0,Image:image.cestc.cn/ccos-ceake/oauth-server@sha256:fca7bab88904f8309e75248f84d07a71769a81bcd9d79cf1b61096086a4c8596,ImageID:image.cestc.cn/ccos-ceake/oauth-server@sha256:fca7bab88904f8309e75248f84d07a71769a81bcd9d79cf1b61096086a4c8596,ContainerID:cri-o://71f41ac0b4072de72128c36923f95adfc0bb05a2647067cc3f97e1245b188b3d,Started:*true,} } Apr 18 15:54:10.863: INFO: OAuth server pod is not ready: Container statuses: ([]v1.ContainerStatus) (len=1 cap=1) { (v1.ContainerStatus) &ContainerStatus{Name:oauth-server,State:ContainerState{Waiting:nil,Running:&ContainerStateRunning{StartedAt:2022-04-18 15:54:08 +0800 CST,},Terminated:nil,},LastTerminationState:ContainerState{Waiting:nil,Running:nil,Terminated:nil,},Ready:false,RestartCount:0,Image:image.cestc.cn/ccos-ceake/oauth-server@sha256:fca7bab88904f8309e75248f84d07a71769a81bcd9d79cf1b61096086a4c8596,ImageID:image.cestc.cn/ccos-ceake/oauth-server@sha256:fca7bab88904f8309e75248f84d07a71769a81bcd9d79cf1b61096086a4c8596,ContainerID:cri-o://71f41ac0b4072de72128c36923f95adfc0bb05a2647067cc3f97e1245b188b3d,Started:*true,} } Apr 18 15:54:11.895: INFO: OAuth server pod is not ready: Container statuses: ([]v1.ContainerStatus) (len=1 cap=1) { (v1.ContainerStatus) &ContainerStatus{Name:oauth-server,State:ContainerState{Waiting:nil,Running:&ContainerStateRunning{StartedAt:2022-04-18 15:54:08 +0800 CST,},Terminated:nil,},LastTerminationState:ContainerState{Waiting:nil,Running:nil,Terminated:nil,},Ready:false,RestartCount:0,Image:image.cestc.cn/ccos-ceake/oauth-server@sha256:fca7bab88904f8309e75248f84d07a71769a81bcd9d79cf1b61096086a4c8596,ImageID:image.cestc.cn/ccos-ceake/oauth-server@sha256:fca7bab88904f8309e75248f84d07a71769a81bcd9d79cf1b61096086a4c8596,ContainerID:cri-o://71f41ac0b4072de72128c36923f95adfc0bb05a2647067cc3f97e1245b188b3d,Started:*true,} } Apr 18 15:54:12.833: INFO: OAuth server pod is not ready: Container statuses: ([]v1.ContainerStatus) (len=1 cap=1) { (v1.ContainerStatus) &ContainerStatus{Name:oauth-server,State:ContainerState{Waiting:nil,Running:&ContainerStateRunning{StartedAt:2022-04-18 15:54:08 +0800 CST,},Terminated:nil,},LastTerminationState:ContainerState{Waiting:nil,Running:nil,Terminated:nil,},Ready:false,RestartCount:0,Image:image.cestc.cn/ccos-ceake/oauth-server@sha256:fca7bab88904f8309e75248f84d07a71769a81bcd9d79cf1b61096086a4c8596,ImageID:image.cestc.cn/ccos-ceake/oauth-server@sha256:fca7bab88904f8309e75248f84d07a71769a81bcd9d79cf1b61096086a4c8596,ContainerID:cri-o://71f41ac0b4072de72128c36923f95adfc0bb05a2647067cc3f97e1245b188b3d,Started:*true,} } Apr 18 15:54:13.845: INFO: OAuth server pod is not ready: Container statuses: ([]v1.ContainerStatus) (len=1 cap=1) { (v1.ContainerStatus) &ContainerStatus{Name:oauth-server,State:ContainerState{Waiting:nil,Running:&ContainerStateRunning{StartedAt:2022-04-18 15:54:08 +0800 
CST,},Terminated:nil,},LastTerminationState:ContainerState{Waiting:nil,Running:nil,Terminated:nil,},Ready:false,RestartCount:0,Image:image.cestc.cn/ccos-ceake/oauth-server@sha256:fca7bab88904f8309e75248f84d07a71769a81bcd9d79cf1b61096086a4c8596,ImageID:image.cestc.cn/ccos-ceake/oauth-server@sha256:fca7bab88904f8309e75248f84d07a71769a81bcd9d79cf1b61096086a4c8596,ContainerID:cri-o://71f41ac0b4072de72128c36923f95adfc0bb05a2647067cc3f97e1245b188b3d,Started:*true,} } Apr 18 15:54:14.852: INFO: OAuth server pod is not ready: Container statuses: ([]v1.ContainerStatus) (len=1 cap=1) { (v1.ContainerStatus) &ContainerStatus{Name:oauth-server,State:ContainerState{Waiting:nil,Running:&ContainerStateRunning{StartedAt:2022-04-18 15:54:08 +0800 CST,},Terminated:nil,},LastTerminationState:ContainerState{Waiting:nil,Running:nil,Terminated:nil,},Ready:false,RestartCount:0,Image:image.cestc.cn/ccos-ceake/oauth-server@sha256:fca7bab88904f8309e75248f84d07a71769a81bcd9d79cf1b61096086a4c8596,ImageID:image.cestc.cn/ccos-ceake/oauth-server@sha256:fca7bab88904f8309e75248f84d07a71769a81bcd9d79cf1b61096086a4c8596,ContainerID:cri-o://71f41ac0b4072de72128c36923f95adfc0bb05a2647067cc3f97e1245b188b3d,Started:*true,} } Apr 18 15:54:15.875: INFO: OAuth server pod is not ready: Container statuses: ([]v1.ContainerStatus) (len=1 cap=1) { (v1.ContainerStatus) &ContainerStatus{Name:oauth-server,State:ContainerState{Waiting:nil,Running:&ContainerStateRunning{StartedAt:2022-04-18 15:54:08 +0800 CST,},Terminated:nil,},LastTerminationState:ContainerState{Waiting:nil,Running:nil,Terminated:nil,},Ready:false,RestartCount:0,Image:image.cestc.cn/ccos-ceake/oauth-server@sha256:fca7bab88904f8309e75248f84d07a71769a81bcd9d79cf1b61096086a4c8596,ImageID:image.cestc.cn/ccos-ceake/oauth-server@sha256:fca7bab88904f8309e75248f84d07a71769a81bcd9d79cf1b61096086a4c8596,ContainerID:cri-o://71f41ac0b4072de72128c36923f95adfc0bb05a2647067cc3f97e1245b188b3d,Started:*true,} } Apr 18 15:54:16.875: INFO: Waiting for the OAuth server route to be ready Apr 18 15:54:16.878: INFO: Waiting for the OAuth server route to be ready: EOF Apr 18 15:54:17.879: INFO: Waiting for the OAuth server route to be ready Apr 18 15:54:17.881: INFO: Waiting for the OAuth server route to be ready: EOF Apr 18 15:54:18.878: INFO: Waiting for the OAuth server route to be ready Apr 18 15:54:18.882: INFO: Waiting for the OAuth server route to be ready: EOF Apr 18 15:54:19.880: INFO: Waiting for the OAuth server route to be ready Apr 18 15:54:19.884: INFO: Waiting for the OAuth server route to be ready: EOF Apr 18 15:54:20.878: INFO: Waiting for the OAuth server route to be ready Apr 18 15:54:20.901: INFO: Waiting for the OAuth server route to be ready: x509: certificate signed by unknown authority Apr 18 15:54:21.878: INFO: Waiting for the OAuth server route to be ready Apr 18 15:54:21.884: INFO: Waiting for the OAuth server route to be ready: x509: certificate signed by unknown authority Apr 18 15:54:22.925: INFO: Waiting for the OAuth server route to be ready Apr 18 15:54:22.935: INFO: Waiting for the OAuth server route to be ready: x509: certificate signed by unknown authority Apr 18 15:54:23.878: INFO: Waiting for the OAuth server route to be ready Apr 18 15:54:23.900: INFO: Waiting for the OAuth server route to be ready: x509: certificate signed by unknown authority Apr 18 15:54:24.878: INFO: Waiting for the OAuth server route to be ready Apr 18 15:54:24.892: INFO: Waiting for the OAuth server route to be ready: x509: certificate signed by unknown authority Apr 18 15:54:25.886: 
INFO: Waiting for the OAuth server route to be ready Apr 18 15:54:25.902: INFO: Waiting for the OAuth server route to be ready: x509: certificate signed by unknown authority Apr 18 15:54:26.887: INFO: Waiting for the OAuth server route to be ready Apr 18 15:54:26.901: INFO: Waiting for the OAuth server route to be ready: x509: certificate signed by unknown authority Apr 18 15:54:27.879: INFO: Waiting for the OAuth server route to be ready Apr 18 15:54:27.910: INFO: Waiting for the OAuth server route to be ready: x509: certificate signed by unknown authority Apr 18 15:54:28.878: INFO: Waiting for the OAuth server route to be ready Apr 18 15:54:28.890: INFO: Waiting for the OAuth server route to be ready: x509: certificate signed by unknown authority Apr 18 15:54:29.878: INFO: Waiting for the OAuth server route to be ready Apr 18 15:54:29.893: INFO: Waiting for the OAuth server route to be ready: x509: certificate signed by unknown authority Apr 18 15:54:30.879: INFO: Waiting for the OAuth server route to be ready Apr 18 15:54:30.886: INFO: Waiting for the OAuth server route to be ready: x509: certificate signed by unknown authority Apr 18 15:54:31.878: INFO: Waiting for the OAuth server route to be ready Apr 18 15:54:31.891: INFO: Waiting for the OAuth server route to be ready: x509: certificate signed by unknown authority Apr 18 15:54:32.878: INFO: Waiting for the OAuth server route to be ready Apr 18 15:54:32.884: INFO: Waiting for the OAuth server route to be ready: x509: certificate signed by unknown authority Apr 18 15:54:33.883: INFO: Waiting for the OAuth server route to be ready Apr 18 15:54:33.906: INFO: Waiting for the OAuth server route to be ready: x509: certificate signed by unknown authority Apr 18 15:54:34.881: INFO: Waiting for the OAuth server route to be ready Apr 18 15:54:34.894: INFO: Waiting for the OAuth server route to be ready: x509: certificate signed by unknown authority Apr 18 15:54:35.878: INFO: Waiting for the OAuth server route to be ready Apr 18 15:54:35.884: INFO: Waiting for the OAuth server route to be ready: x509: certificate signed by unknown authority Apr 18 15:54:36.880: INFO: Waiting for the OAuth server route to be ready Apr 18 15:54:36.891: INFO: Waiting for the OAuth server route to be ready: x509: certificate signed by unknown authority Apr 18 15:54:37.878: INFO: Waiting for the OAuth server route to be ready Apr 18 15:54:37.887: INFO: Waiting for the OAuth server route to be ready: x509: certificate signed by unknown authority Apr 18 15:54:38.879: INFO: Waiting for the OAuth server route to be ready Apr 18 15:54:38.924: INFO: Waiting for the OAuth server route to be ready: x509: certificate signed by unknown authority Apr 18 15:54:39.878: INFO: Waiting for the OAuth server route to be ready Apr 18 15:54:39.888: INFO: Waiting for the OAuth server route to be ready: x509: certificate signed by unknown authority Apr 18 15:54:40.878: INFO: Waiting for the OAuth server route to be ready Apr 18 15:54:40.884: INFO: Waiting for the OAuth server route to be ready: x509: certificate signed by unknown authority Apr 18 15:54:41.880: INFO: Waiting for the OAuth server route to be ready Apr 18 15:54:41.887: INFO: Waiting for the OAuth server route to be ready: x509: certificate signed by unknown authority Apr 18 15:54:42.896: INFO: Waiting for the OAuth server route to be ready Apr 18 15:54:42.907: INFO: Waiting for the OAuth server route to be ready: x509: certificate signed by unknown authority Apr 18 15:54:43.880: INFO: Waiting for the OAuth server route to 
be ready Apr 18 15:54:43.889: INFO: Waiting for the OAuth server route to be ready: x509: certificate signed by unknown authority Apr 18 15:54:44.878: INFO: Waiting for the OAuth server route to be ready Apr 18 15:54:44.890: INFO: Waiting for the OAuth server route to be ready: x509: certificate signed by unknown authority Apr 18 15:54:45.884: INFO: Waiting for the OAuth server route to be ready Apr 18 15:54:45.898: INFO: Waiting for the OAuth server route to be ready: x509: certificate signed by unknown authority Apr 18 15:54:46.885: INFO: Waiting for the OAuth server route to be ready Apr 18 15:54:46.928: INFO: Waiting for the OAuth server route to be ready: x509: certificate signed by unknown authority Apr 18 15:54:47.883: INFO: Waiting for the OAuth server route to be ready Apr 18 15:54:47.910: INFO: Waiting for the OAuth server route to be ready: x509: certificate signed by unknown authority Apr 18 15:54:48.880: INFO: Waiting for the OAuth server route to be ready Apr 18 15:54:48.887: INFO: Waiting for the OAuth server route to be ready: x509: certificate signed by unknown authority Apr 18 15:54:49.879: INFO: Waiting for the OAuth server route to be ready Apr 18 15:54:49.885: INFO: Waiting for the OAuth server route to be ready: x509: certificate signed by unknown authority Apr 18 15:54:50.878: INFO: Waiting for the OAuth server route to be ready Apr 18 15:54:50.905: INFO: Waiting for the OAuth server route to be ready: x509: certificate signed by unknown authority Apr 18 15:54:51.880: INFO: Waiting for the OAuth server route to be ready Apr 18 15:54:51.899: INFO: Waiting for the OAuth server route to be ready: x509: certificate signed by unknown authority Apr 18 15:54:52.889: INFO: Waiting for the OAuth server route to be ready Apr 18 15:54:52.905: INFO: Waiting for the OAuth server route to be ready: x509: certificate signed by unknown authority Apr 18 15:54:53.881: INFO: Waiting for the OAuth server route to be ready Apr 18 15:54:53.891: INFO: Waiting for the OAuth server route to be ready: x509: certificate signed by unknown authority Apr 18 15:54:54.878: INFO: Waiting for the OAuth server route to be ready Apr 18 15:54:54.888: INFO: Waiting for the OAuth server route to be ready: x509: certificate signed by unknown authority Apr 18 15:54:55.878: INFO: Waiting for the OAuth server route to be ready Apr 18 15:54:55.883: INFO: Waiting for the OAuth server route to be ready: x509: certificate signed by unknown authority Apr 18 15:54:56.878: INFO: Waiting for the OAuth server route to be ready Apr 18 15:54:56.900: INFO: Waiting for the OAuth server route to be ready: x509: certificate signed by unknown authority Apr 18 15:54:57.878: INFO: Waiting for the OAuth server route to be ready Apr 18 15:54:57.898: INFO: Waiting for the OAuth server route to be ready: x509: certificate signed by unknown authority Apr 18 15:54:58.879: INFO: Waiting for the OAuth server route to be ready Apr 18 15:54:58.888: INFO: Waiting for the OAuth server route to be ready: x509: certificate signed by unknown authority Apr 18 15:54:59.878: INFO: Waiting for the OAuth server route to be ready Apr 18 15:54:59.902: INFO: Waiting for the OAuth server route to be ready: x509: certificate signed by unknown authority Apr 18 15:55:00.879: INFO: Waiting for the OAuth server route to be ready Apr 18 15:55:00.889: INFO: Waiting for the OAuth server route to be ready: x509: certificate signed by unknown authority Apr 18 15:55:01.878: INFO: Waiting for the OAuth server route to be ready Apr 18 15:55:01.888: INFO: Waiting 
for the OAuth server route to be ready: x509: certificate signed by unknown authority Apr 18 15:55:02.879: INFO: Waiting for the OAuth server route to be ready Apr 18 15:55:02.890: INFO: Waiting for the OAuth server route to be ready: x509: certificate signed by unknown authority Apr 18 15:55:03.878: INFO: Waiting for the OAuth server route to be ready Apr 18 15:55:03.883: INFO: Waiting for the OAuth server route to be ready: x509: certificate signed by unknown authority Apr 18 15:55:04.878: INFO: Waiting for the OAuth server route to be ready Apr 18 15:55:04.884: INFO: Waiting for the OAuth server route to be ready: x509: certificate signed by unknown authority Apr 18 15:55:05.878: INFO: Waiting for the OAuth server route to be ready Apr 18 15:55:05.886: INFO: Waiting for the OAuth server route to be ready: x509: certificate signed by unknown authority Apr 18 15:55:06.878: INFO: Waiting for the OAuth server route to be ready Apr 18 15:55:06.886: INFO: Waiting for the OAuth server route to be ready: x509: certificate signed by unknown authority Apr 18 15:55:07.881: INFO: Waiting for the OAuth server route to be ready Apr 18 15:55:07.888: INFO: Waiting for the OAuth server route to be ready: x509: certificate signed by unknown authority Apr 18 15:55:08.878: INFO: Waiting for the OAuth server route to be ready Apr 18 15:55:08.888: INFO: Waiting for the OAuth server route to be ready: x509: certificate signed by unknown authority Apr 18 15:55:09.879: INFO: Waiting for the OAuth server route to be ready Apr 18 15:55:09.887: INFO: Waiting for the OAuth server route to be ready: x509: certificate signed by unknown authority Apr 18 15:55:10.880: INFO: Waiting for the OAuth server route to be ready Apr 18 15:55:10.891: INFO: Waiting for the OAuth server route to be ready: x509: certificate signed by unknown authority Apr 18 15:55:11.878: INFO: Waiting for the OAuth server route to be ready Apr 18 15:55:11.893: INFO: Waiting for the OAuth server route to be ready: x509: certificate signed by unknown authority Apr 18 15:55:12.879: INFO: Waiting for the OAuth server route to be ready Apr 18 15:55:12.886: INFO: Waiting for the OAuth server route to be ready: x509: certificate signed by unknown authority Apr 18 15:55:13.878: INFO: Waiting for the OAuth server route to be ready Apr 18 15:55:13.886: INFO: Waiting for the OAuth server route to be ready: x509: certificate signed by unknown authority Apr 18 15:55:14.878: INFO: Waiting for the OAuth server route to be ready Apr 18 15:55:14.883: INFO: Waiting for the OAuth server route to be ready: x509: certificate signed by unknown authority Apr 18 15:55:15.879: INFO: Waiting for the OAuth server route to be ready Apr 18 15:55:15.896: INFO: Waiting for the OAuth server route to be ready: x509: certificate signed by unknown authority Apr 18 15:55:16.882: INFO: Waiting for the OAuth server route to be ready Apr 18 15:55:16.901: INFO: Waiting for the OAuth server route to be ready: x509: certificate signed by unknown authority Apr 18 15:55:16.901: INFO: Waiting for the OAuth server route to be ready Apr 18 15:55:16.917: INFO: Waiting for the OAuth server route to be ready: x509: certificate signed by unknown authority [AfterEach] [sig-auth][Feature:OAuthServer] [Token Expiration] github.com/openshift/origin/test/extended/util/client.go:140 STEP: Collecting events from namespace "e2e-test-oauth-expiration-m5ws9". STEP: Found 5 events. 
Apr 18 15:55:16.936: INFO: At 0001-01-01 00:00:00 +0000 UTC - event for test-oauth-server: { } Scheduled: Successfully assigned e2e-test-oauth-expiration-m5ws9/test-oauth-server to master2 Apr 18 15:55:16.936: INFO: At 2022-04-18 15:54:08 +0800 CST - event for test-oauth-server: {multus } AddedInterface: Add eth0 [21.100.3.47/23] from ovn-kubernetes Apr 18 15:55:16.936: INFO: At 2022-04-18 15:54:08 +0800 CST - event for test-oauth-server: {kubelet master2} Pulled: Container image "image.cestc.cn/ccos-ceake/oauth-server@sha256:fca7bab88904f8309e75248f84d07a71769a81bcd9d79cf1b61096086a4c8596" already present on machine Apr 18 15:55:16.937: INFO: At 2022-04-18 15:54:08 +0800 CST - event for test-oauth-server: {kubelet master2} Created: Created container oauth-server Apr 18 15:55:16.937: INFO: At 2022-04-18 15:54:08 +0800 CST - event for test-oauth-server: {kubelet master2} Started: Started container oauth-server Apr 18 15:55:16.968: INFO: POD NODE PHASE GRACE CONDITIONS Apr 18 15:55:16.968: INFO: test-oauth-server master2 Running [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2022-04-18 15:54:05 +0800 CST } {Ready True 0001-01-01 00:00:00 +0000 UTC 2022-04-18 15:54:15 +0800 CST } {ContainersReady True 0001-01-01 00:00:00 +0000 UTC 2022-04-18 15:54:15 +0800 CST } {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2022-04-18 15:54:05 +0800 CST }] Apr 18 15:55:16.968: INFO: Apr 18 15:55:16.985: INFO: skipping dumping cluster info - cluster too large Apr 18 15:55:17.005: INFO: Deleted {user.openshift.io/v1, Resource=users e2e-test-oauth-expiration-m5ws9-user}, err: <nil> Apr 18 15:55:17.041: INFO: Deleted {oauth.openshift.io/v1, Resource=oauthclients e2e-client-e2e-test-oauth-expiration-m5ws9}, err: <nil> Apr 18 15:55:17.064: INFO: Deleted {oauth.openshift.io/v1, Resource=oauthaccesstokens sha256~JlfYVET_sRJmauqiEQnJ9Yh9UxhOg9P4zI_7ffYAka8}, err: <nil> [AfterEach] [sig-auth][Feature:OAuthServer] [Token Expiration] github.com/openshift/origin/test/extended/util/client.go:141 STEP: Destroying namespace "e2e-test-oauth-expiration-m5ws9" for this suite. 
[AfterEach] [sig-auth][Feature:OAuthServer] [Token Expiration] github.com/openshift/origin/test/extended/oauth/expiration.go:36 Apr 18 15:55:17.082: INFO: Running 'oc --namespace= --kubeconfig=/root/.kube/config delete clusterrolebindings.rbac.authorization.k8s.io e2e-test-oauth-expiration-m5ws9' clusterrolebinding.rbac.authorization.k8s.io "e2e-test-oauth-expiration-m5ws9" deleted fail [github.com/openshift/origin/test/extended/oauth/expiration.go:33]: Unexpected error: <*errors.errorString | 0xc0002feb10>: { s: "timed out waiting for the condition", } timed out waiting for the condition occurred failed: (1m16s) 2022-04-18T07:55:17 "[sig-auth][Feature:OAuthServer] [Token Expiration] Using a OAuth client with a non-default token max age to generate tokens that expire shortly works as expected when using a token authorization flow [Suite:openshift/conformance/parallel]" passed: (2m14s) 2022-04-18T07:55:17 "[sig-builds][Feature:Builds][timing] capture build stages and durations should record build stages and durations for docker [Skipped:Disconnected] [Suite:openshift/conformance/parallel]" passed: (2.1s) 2022-04-18T07:55:18 "[sig-auth][Feature:OpenShiftAuthorization] RBAC proxy for openshift authz RunLegacyLocalRoleEndpoint should succeed [Suite:openshift/conformance/parallel]" passed: (2.1s) 2022-04-18T07:55:19 "[sig-arch] Managed cluster should ensure pods use downstream images from our release image with proper ImagePullPolicy [Suite:openshift/conformance/parallel]" passed: (1m52s) 2022-04-18T07:55:24 "[sig-builds][Feature:Builds] prune builds based on settings in the buildconfig should prune failed builds based on the failedBuildsHistoryLimit setting [Skipped:Disconnected] [Suite:openshift/conformance/parallel]" passed: (2m34s) 2022-04-18T07:55:24 "[sig-network-edge][Conformance][Area:Networking][Feature:Router] The HAProxy router should be able to connect to a service that is idled because a GET on the route will unidle it [Skipped:Disconnected] [Suite:openshift/conformance/parallel/minimal]" passed: (1m53s) 2022-04-18T07:55:25 "[sig-network][Feature:Router] The HAProxy router should serve the correct routes when scoped to a single namespace and label set [Skipped:Disconnected] [Suite:openshift/conformance/parallel]" passed: (2m4s) 2022-04-18T07:55:26 "[sig-apps][Feature:DeploymentConfig] deploymentconfigs keep the deployer pod invariant valid should deal with cancellation of running deployment [Suite:openshift/conformance/parallel]" passed: (1m45s) 2022-04-18T07:55:27 "[sig-network] services when using OpenshiftSDN in a mode that does not isolate namespaces by default should allow connections to pods in different namespaces on different nodes via service IPs [Suite:openshift/conformance/parallel]" [BeforeEach] [Top Level] github.com/openshift/origin/test/extended/util/framework.go:1453 [BeforeEach] [Top Level] github.com/openshift/origin/test/extended/util/framework.go:1453 [BeforeEach] [Top Level] github.com/openshift/origin/test/extended/util/test.go:61 [BeforeEach] [sig-cli] oc observe github.com/openshift/origin/test/extended/util/client.go:142 STEP: Creating a kubernetes client [It] works as expected [Suite:openshift/conformance/parallel] github.com/openshift/origin/test/extended/cli/observe.go:17 STEP: basic scenarios Apr 18 15:55:17.116: INFO: Running 'oc --kubeconfig=/root/.kube/config observe' Apr 18 15:55:17.219: INFO: Error running /usr/bin/oc --kubeconfig=/root/.kube/config observe: StdOut> error: you must specify at least one argument containing the resource to observe 
StdErr> error: you must specify at least one argument containing the resource to observe Apr 18 15:55:17.219: INFO: Running 'oc --kubeconfig=/root/.kube/config observe serviceaccounts --once' Apr 18 15:55:17.446: INFO: Running 'oc --kubeconfig=/root/.kube/config observe daemonsets --once' Apr 18 15:55:17.749: INFO: Running 'oc --kubeconfig=/root/.kube/config observe clusteroperators --once' Apr 18 15:55:17.992: INFO: Running 'oc --kubeconfig=/root/.kube/config observe services --once --all-namespaces' Apr 18 15:55:18.198: INFO: Running 'oc --kubeconfig=/root/.kube/config observe services --once --all-namespaces --print-metrics-on-exit' Apr 18 15:55:18.465: INFO: Running 'oc --kubeconfig=/root/.kube/config observe services --once --names echo' Apr 18 15:55:18.852: INFO: Error running /usr/bin/oc --kubeconfig=/root/.kube/config observe services --once --names echo: StdOut> error: --delete and --names must both be specified StdErr> error: --delete and --names must both be specified Apr 18 15:55:18.852: INFO: Running 'oc --kubeconfig=/root/.kube/config observe services --exit-after=1s' Apr 18 15:55:20.116: INFO: Running 'oc --kubeconfig=/root/.kube/config observe services --exit-after=3s --all-namespaces --print-metrics-on-exit' Apr 18 15:55:23.258: INFO: Running 'oc --kubeconfig=/root/.kube/config observe services --exit-after=3s --all-namespaces --names echo --names default/notfound --delete echo --delete remove' STEP: error counting Apr 18 15:55:26.597: INFO: Running 'oc --kubeconfig=/root/.kube/config observe services --exit-after=1m --all-namespaces --maximum-errors=1 -- /bin/sh -c exit 1' Apr 18 15:55:26.780: INFO: Error running /usr/bin/oc --kubeconfig=/root/.kube/config observe services --exit-after=1m --all-namespaces --maximum-errors=1 -- /bin/sh -c exit 1: StdOut> # 2022-04-18T15:55:26+08:00 Sync started # 2022-04-18T15:55:26+08:00 Sync 5965 /bin/sh -c "exit 1" openshift-kube-controller-manager kube-controller-manager "" error: command "/bin/sh" exited with status code 1 error: reached maximum error limit of 1, exiting StdErr> # 2022-04-18T15:55:26+08:00 Sync started # 2022-04-18T15:55:26+08:00 Sync 5965 /bin/sh -c "exit 1" openshift-kube-controller-manager kube-controller-manager "" error: command "/bin/sh" exited with status code 1 error: reached maximum error limit of 1, exiting Apr 18 15:55:26.780: INFO: Running 'oc --kubeconfig=/root/.kube/config observe services --exit-after=1m --all-namespaces --retry-on-exit-code=2 --maximum-errors=1 --loglevel=4 -- /bin/sh -c exit 2' Apr 18 15:55:27.063: INFO: Error running /usr/bin/oc --kubeconfig=/root/.kube/config observe services --exit-after=1m --all-namespaces --retry-on-exit-code=2 --maximum-errors=1 --loglevel=4 -- /bin/sh -c exit 2: StdOut> I0418 15:55:26.970635 2404130 observe.go:438] Listening on :11251 at /metrics and /healthz I0418 15:55:26.970715 2404130 reflector.go:255] Listing and watching <unspecified> from observer # 2022-04-18T15:55:27+08:00 Sync started I0418 15:55:27.030943 2404130 observe.go:648] Processing Sync []: &unstructured.Unstructured{Object:map[string]interface {}{"apiVersion":"v1", "kind":"Service", "metadata":map[string]interface {}{"annotations":map[string]interface {}{"service.alpha.openshift.io/serving-cert-signed-by":"openshift-service-serving-signer@1650006354", "service.beta.openshift.io/serving-cert-secret-name":"cluster-monitoring-operator-tls", "service.beta.openshift.io/serving-cert-signed-by":"openshift-service-serving-signer@1650006354"}, "creationTimestamp":"2022-04-15T07:06:18Z", 
"labels":map[string]interface {}{"app":"cluster-monitoring-operator"}, "managedFields":[]interface {}{map[string]interface {}{"apiVersion":"v1", "fieldsType":"FieldsV1", "fieldsV1":map[string]interface {}{"f:metadata":map[string]interface {}{"f:annotations":map[string]interface {}{".":map[string]interface {}{}, "f:service.beta.openshift.io/serving-cert-secret-name":map[string]interface {}{}}, "f:labels":map[string]interface {}{".":map[string]interface {}{}, "f:app":map[string]interface {}{}}}, "f:spec":map[string]interface {}{"f:clusterIP":map[string]interface {}{}, "f:internalTrafficPolicy":map[string]interface {}{}, "f:ports":map[string]interface {}{".":map[string]interface {}{}, "k:{"port":8443,"protocol":"TCP"}":map[string]interface {}{".":map[string]interface {}{}, "f:name":map[string]interface {}{}, "f:port":map[string]interface {}{}, "f:protocol":map[string]interface {}{}, "f:targetPort":map[string]interface {}{}}}, "f:selector":map[string]interface {}{}, "f:sessionAffinity":map[string]interface {}{}, "f:type":map[string]interface {}{}}}, "manager":"operator", "operation":"Update", "time":"2022-04-15T07:06:18Z"}, map[string]interface {}{"apiVersion":"v1", "fieldsType":"FieldsV1", "fieldsV1":map[string]interface {}{"f:metadata":map[string]interface {}{"f:annotations":map[string]interface {}{"f:service.alpha.openshift.io/serving-cert-signed-by":map[string]interface {}{}, "f:service.beta.openshift.io/serving-cert-signed-by":map[string]interface {}{}}}}, "manager":"service-ca-operator", "operation":"Update", "time":"2022-04-15T07:06:18Z"}}, "name":"cluster-monitoring-operator", "namespace":"openshift-monitoring", "resourceVersion":"6618", "uid":"f65e024e-baf4-4eb6-acce-13b913bcc13a"}, "spec":map[string]interface {}{"clusterIP":"None", "clusterIPs":[]interface {}{"None"}, "internalTrafficPolicy":"Cluster", "ipFamilies":[]interface {}{"IPv4"}, "ipFamilyPolicy":"SingleStack", "ports":[]interface {}{map[string]interface {}{"name":"https", "port":8443, "protocol":"TCP", "targetPort":"https"}}, "selector":map[string]interface {}{"app":"cluster-monitoring-operator"}, "sessionAffinity":"None", "type":"ClusterIP"}, "status":map[string]interface {}{"loadBalancer":map[string]interface {}{}}}} # 2022-04-18T15:55:27+08:00 Sync 6618 /bin/sh -c "exit 2" openshift-monitorer'ring cluster-monitoring-operator "" I0418 15:55:27.046210 2404130 metric.go:86] retrying command: exit status 2 I0418 15:55:27.048102 2404130 metric.go:86] retrying command: exit status 2 error: command "/bin/sh" exited with status code 2 error: reached maximum error limit of 1, exiting StdErr> I0418 15:55:26.970635 2404130 observe.go:438] Listening on :11251 at /metrics and /healthz I0418 15:55:26.970715 2404130 reflector.go:255] Listing and watching <unspecified> from observer # 2022-04-18T15:55:27+08:00 Sync started I0418 15:55:27.030943 2404130 observe.go:648] Processing Sync []: &unstructured.Unstructured{Object:map[string]interface {}{"apiVersion":"v1", "kind":"Service", "metadata":map[string]interface {}{"annotations":map[string]interface {}{"service.alpha.openshift.io/serving-cert-signed-by":"openshift-service-serving-signer@1650006354", "service.beta.openshift.io/serving-cert-secret-name":"cluster-monitoring-operator-tls", "service.beta.openshift.io/serving-cert-signed-by":"openshift-service-serving-signer@1650006354"}, "creationTimestamp":"2022-04-15T07:06:18Z", "labels":map[string]interface {}{"app":"cluster-monitoring-operator"}, "managedFields":[]interface {}{map[string]interface {}{"apiVersion":"v1", 
"fieldsType":"FieldsV1", "fieldsV1":map[string]interface {}{"f:metadata":map[string]interface {}{"f:annotations":map[string]interface {}{".":map[string]interface {}{}, "f:service.beta.openshift.io/serving-cert-secret-name":map[string]interface {}{}}, "f:labels":map[string]interface {}{".":map[string]interface {}{}, "f:app":map[string]interface {}{}}}, "f:spec":map[string]interface {}{"f:clusterIP":map[string]interface {}{}, "f:internalTrafficPolicy":map[string]interface {}{}, "f:ports":map[string]interface {}{".":map[string]interface {}{}, "k:{"port":8443,"protocol":"TCP"}":map[string]interface {}{".":map[string]interface {}{}, "f:name":map[string]interface {}{}, "f:port":map[string]interface {}{}, "f:protocol":map[string]interface {}{}, "f:targetPort":map[string]interface {}{}}}, "f:selector":map[string]interface {}{}, "f:sessionAffinity":map[string]interface {}{}, "f:type":map[string]interface {}{}}}, "manager":"operator", "operation":"Update", "time":"2022-04-15T07:06:18Z"}, map[string]interface {}{"apiVersion":"v1", "fieldsType":"FieldsV1", "fieldsV1":map[string]interface {}{"f:metadata":map[string]interface {}{"f:annotations":map[string]interface {}{"f:service.alpha.openshift.io/serving-cert-signed-by":map[string]interface {}{}, "f:service.beta.openshift.io/serving-cert-signed-by":map[string]interface {}{}}}}, "manager":"service-ca-operator", "operation":"Update", "time":"2022-04-15T07:06:18Z"}}, "name":"cluster-monitoring-operator", "namespace":"openshift-monitoring", "resourceVersion":"6618", "uid":"f65e024e-baf4-4eb6-acce-13b913bcc13a"}, "spec":map[string]interface {}{"clusterIP":"None", "clusterIPs":[]interface {}{"None"}, "internalTrafficPolicy":"Cluster", "ipFamilies":[]interface {}{"IPv4"}, "ipFamilyPolicy":"SingleStack", "ports":[]interface {}{map[string]interface {}{"name":"https", "port":8443, "protocol":"TCP", "targetPort":"https"}}, "selector":map[string]interface {}{"app":"cluster-monitoring-operator"}, "sessionAffinity":"None", "type":"ClusterIP"}, "status":map[string]interface {}{"loadBalancer":map[string]interface {}{}}}} # 2022-04-18T15:55:27+08:00 Sync 6618 /bin/sh -c "exit 2" openshift-monitoring cluster-monitoring-operator "" I0418 15:55:27.046210 2404130 metric.go:86] retrying command: exit status 2 I0418 15:55:27.048102 2404130 metric.go:86] retrying command: exit status 2 error: command "/bin/sh" exited with status code 2 error: reached maximum error limit of 1, exiting STEP: argument templates Apr 18 15:55:27.063: INFO: Running 'oc --kubeconfig=/root/.kube/config observe services --once --all-namespaces --template='{ .spec.clusterIP }'' [AfterEach] [sig-cli] oc observe github.com/openshift/origin/test/extended/util/client.go:140 [AfterEach] [sig-cli] oc observe github.com/openshift/origin/test/extended/util/client.go:141 fail [github.com/openshift/origin/test/extended/cli/observe.go:71]: Expected <string>: # 2022-04-18T15:55:27+08:00 Sync started # 2022-04-18T15:55:27+08:00 Sync 5552 "" openshift-cloud-credential-operator cco-metrics '21.101.210.238' # 2022-04-18T15:55:27+08:00 Sync 5990 "" openshift-cluster-node-tuning-operator node-tuning-operator 'None' # 2022-04-18T15:55:27+08:00 Sync 6689 "" openshift-monitoring telemeter-client 'None' # 2022-04-18T15:55:27+08:00 Sync 6335 "" openshift-controller-manager controller-manager '21.101.165.77' # 2022-04-18T15:55:27+08:00 Sync 5805 "" openshift-kube-storage-version-migrator-operator metrics '21.101.130.80' # 2022-04-18T15:55:27+08:00 Sync 8367 "" openshift-operator-lifecycle-manager packageserver-service 
'21.101.111.75' # 2022-04-18T15:55:27+08:00 Sync 1492699 "" e2e-test-router-metrics-qmgwc weightedendpoints1 '21.101.59.216' # 2022-04-18T15:55:27+08:00 Sync 17109 "" openshift-monitoring grafana '21.101.251.250' # 2022-04-18T15:55:27+08:00 Sync 2850 "" openshift-network-diagnostics network-check-target '21.101.31.159' # 2022-04-18T15:55:27+08:00 Sync 1494394 "" e2e-test-oauth-ldap-idp-d7xm7 openldap-server '21.101.254.6' # 2022-04-18T15:55:27+08:00 Sync 6898 "" openshift-dns dns-default '21.101.0.10' # 2022-04-18T15:55:27+08:00 Sync 5646 "" openshift-kube-apiserver apiserver '21.101.39.158' # 2022-04-18T15:55:27+08:00 Sync 7143 "" openshift-marketplace redhat-marketplace '21.101.43.120' # 2022-04-18T15:55:27+08:00 Sync 17111 "" openshift-monitoring prometheus-k8s '21.101.250.115' # 2022-04-18T15:55:27+08:00 Sync 10831 "" openshift-cluster-samples-operator metrics 'None' # 2022-04-18T15:55:27+08:00 Sync 5770 "" openshift-cluster-storage-operator csi-snapshot-webhook '21.101.160.86' # 2022-04-18T15:55:27+08:00 Sync 7410 "" openshift-ingress router-internal-default '21.101.79.225' # 2022-04-18T15:55:27+08:00 Sync 5896 "" openshift-machine-api machine-api-operator-webhook '21.101.7.142' # 2022-04-18T15:55:27+08:00 Sync 6583 "" openshift-monitoring node-exporter 'None' # 2022-04-18T15:55:27+08:00 Sync 1489298 "" e2e-test-weighted-router-m87kt weightedendpoints2 '21.101.32.147' # 2022-04-18T15:55:27+08:00 Sync 6101 "" openshift-kube-scheduler scheduler '21.101.80.179' # 2022-04-18T15:55:27+08:00 Sync 6353 "" openshift-apiserver api '21.101.180.217' # 2022-04-18T15:55:27+08:00 Sync 5921 "" openshift-ovn-kubernetes ovn-kubernetes-master 'None' # 2022-04-18T15:55:27+08:00 Sync 7179 "" openshift-marketplace certified-operators '21.101.85.127' # 2022-04-18T15:55:27+08:00 Sync 17107 "" openshift-monitoring alertmanager-main '21.101.130.77' # 2022-04-18T15:55:27+08:00 Sync 6369 "" openshift-authentication oauth-openshift '21.101.168.81' # 2022-04-18T15:55:27+08:00 Sync 5912 "" openshift-machine-api cluster-baremetal-webhook-service '21.101.183.123' # 2022-04-18T15:55:27+08:00 Sync 5832 "" openshift-etcd-operator metrics '21.101.215.223' # 2022-04-18T15:55:27+08:00 Sync 1492941 "" e2e-test-router-idling-4nfzg idle-test '21.101.53.198' # 2022-04-18T15:55:27+08:00 Sync 2827 "" openshift-network-diagnostics network-check-source 'None' # 2022-04-18T15:55:27+08:00 Sync 6872 "" default openshift '' # 2022-04-18T15:55:27+08:00 Sync 5111 "" openshift-apiserver check-endpoints '21.101.247.68' # 2022-04-18T15:55:27+08:00 Sync 7149 "" openshift-marketplace redhat-operators '21.101.88.107' # 2022-04-18T15:55:27+08:00 Sync 6223 "" openshift-config-operator metrics '21.101.67.58' # 2022-04-18T15:55:27+08:00 Sync 6609 "" openshift-monitoring openshift-state-metrics 'None' # 2022-04-18T15:55:27+08:00 Sync 1487733 "" e2e-test-build-service-xdfn4 hello-nodejs '21.101.86.134' # 2022-04-18T15:55:27+08:00 Sync 19815 "" openshift-console console '21.101.75.225' # 2022-04-18T15:55:27+08:00 Sync 7206 "" openshift-marketplace community-operators '21.101.74.27' # 2022-04-18T15:55:27+08:00 Sync 5517 "" openshift-multus network-metrics-service 'None' # 2022-04-18T15:55:27+08:00 Sync 57687 "" openshift-image-registry image-registry '21.101.207.123' # 2022-04-18T15:55:27+08:00 Sync 5904 "" openshift-image-registry image-registry-operator 'None' # 2022-04-18T15:55:27+08:00 Sync 1489292 "" e2e-test-weighted-router-m87kt weightedendpoints1 '21.101.157.148' # 2022-04-18T15:55:27+08:00 Sync 6324 "" openshift-multus 
multus-admission-controller '21.101.213.142' # 2022-04-18T15:55:27+08:00 Sync 1487005 "" e2e-test-router-scoped-drhjv endpoints '21.101.29.41' # 2022-04-18T15:55:27+08:00 Sync 6311 "" openshift-operator-lifecycle-manager olm-operator-metrics '21.101.89.234' # 2022-04-18T15:55:27+08:00 Sync 5705 "" openshift-apiserver-operator metrics '21.101.194.87' # 2022-04-18T15:55:27+08:00 Sync 6269 "" openshift-authentication-operator metrics '21.101.3.139' # 2022-04-18T15:55:27+08:00 Sync 6606 "" openshift-monitoring thanos-querier '21.101.253.65' # 2022-04-18T15:55:27+08:00 Sync 6242 "" openshift-cluster-storage-operator cluster-storage-operator-metrics '21.101.98.33' # 2022-04-18T15:55:27+08:00 Sync 5513 "" openshift-insights metrics '21.101.221.77' # 2022-04-18T15:55:27+08:00 Sync 6374 "" openshift-oauth-apiserver api '21.101.172.209' # 2022-04-18T15:55:27+08:00 Sync 5823 "" openshift-service-ca-operator metrics '21.101.210.7' # 2022-04-18T15:55:27+08:00 Sync 1493134 "" e2e-test-oauth-server-headers-psnwf test-oauth-svc '21.101.194.196' # 2022-04-18T15:55:27+08:00 Sync 5797 "" openshift-machine-api machine-api-operator '21.101.123.100' # 2022-04-18T15:55:27+08:00 Sync 18058 "" openshift-monitoring alertmanager-operated 'None' # 2022-04-18T15:55:27+08:00 Sync 5525 "" openshift-marketplace marketplace-operator-metrics '21.101.223.158' # 2022-04-18T15:55:27+08:00 Sync 5690 "" openshift-monitoring prometheus-operator 'None' # 2022-04-18T15:55:27+08:00 Sync 6086 "" openshift-operator-lifecycle-manager catalog-operator-metrics '21.101.175.1' # 2022-04-18T15:55:27+08:00 Sync 6111 "" openshift-cluster-version cluster-version-operator '21.101.214.70' # 2022-04-18T15:55:27+08:00 Sync 6304 "" openshift-controller-manager-operator metrics '21.101.130.66' # 2022-04-18T15:55:27+08:00 Sync 5983 "" openshift-etcd etcd '21.101.131.158' # 2022-04-18T15:55:27+08:00 Sync 6260 "" openshift-machine-api cluster-autoscaler-operator '21.101.90.14' # 2022-04-18T15:55:27+08:00 Sync 5569 "" openshift-kube-apiserver-operator metrics '21.101.69.58' # 2022-04-18T15:55:27+08:00 Sync 19724 "" openshift-console-operator metrics '21.101.236.94' # 2022-04-18T15:55:27+08:00 Sync 17104 "" openshift-monitoring prometheus-k8s-thanos-sidecar 'None' # 2022-04-18T15:55:27+08:00 Sync 5878 "" openshift-ingress-operator metrics '21.101.54.233' # 2022-04-18T15:55:27+08:00 Sync 6206 "" openshift-kube-controller-manager-operator metrics '21.101.95.175' # 2022-04-18T15:55:27+08:00 Sync 6566 "" openshift-monitoring kube-state-metrics 'None' # 2022-04-18T15:55:27+08:00 Sync 231 "" default kubernetes '21.101.0.1' # 2022-04-18T15:55:27+08:00 Sync 5522 "" openshift-machine-config-operator machine-config-daemon '21.101.114.149' # 2022-04-18T15:55:27+08:00 Sync 1492703 "" e2e-test-router-metrics-qmgwc weightedendpoints2 '21.101.119.199' # 2022-04-18T15:55:27+08:00 Sync 6016 "" openshift-dns-operator metrics '21.101.105.34' # 2022-04-18T15:55:27+08:00 Sync 6618 "" openshift-monitoring cluster-monitoring-operator 'None' # 2022-04-18T15:55:27+08:00 Sync 1487683 "" e2e-test-unprivileged-router-w5lst endpoints '21.101.132.106' # 2022-04-18T15:55:27+08:00 Sync 6076 "" openshift-machine-api cluster-baremetal-operator-service '21.101.167.114' # 2022-04-18T15:55:27+08:00 Sync 6001 "" openshift-kube-scheduler-operator metrics '21.101.154.246' # 2022-04-18T15:55:27+08:00 Sync 5965 "" openshift-kube-controller-manager kube-controller-manager '21.101.205.32' # 2022-04-18T15:55:27+08:00 Sync 5715 "" openshift-cluster-machine-approver machine-approver 'None' # 
2022-04-18T15:55:27+08:00 Sync 6680 "" openshift-monitoring prometheus-adapter '21.101.129.31' # 2022-04-18T15:55:27+08:00 Sync 6047 "" kube-system kubelet 'None' # 2022-04-18T15:55:27+08:00 Sync 18011 "" openshift-monitoring prometheus-operated 'None' # 2022-04-18T15:55:27+08:00 Sync 2696 "" openshift-ovn-kubernetes ovnkube-db 'None' # 2022-04-18T15:55:27+08:00 Sync 165547 "" e2e-statefulset-5018 test 'None' # 2022-04-18T15:55:27+08:00 Sync 15564 "" openshift-ingress-canary ingress-canary '21.101.240.32' # 2022-04-18T15:55:27+08:00 Sync 6095 "" openshift-ovn-kubernetes ovn-kubernetes-node 'None' # 2022-04-18T15:55:27+08:00 Sync 5681 "" openshift-cluster-storage-operator csi-snapshot-controller-operator-metrics '21.101.55.254' # 2022-04-18T15:55:27+08:00 Sync 5670 "" openshift-machine-api machine-api-controllers '21.101.33.96' # 2022-04-18T15:55:27+08:00 Sync 19835 "" openshift-console downloads '21.101.153.77' # 2022-04-18T15:55:27+08:00 Sync ended To satisfy at least one of these matchers: [%!s(*matchers.ContainSubstringMatcher=&{172.30.0.1 []}) %!s(*matchers.ContainSubstringMatcher=&{fd02::1 []})] failed: (11.3s) 2022-04-18T07:55:27 "[sig-cli] oc observe works as expected [Suite:openshift/conformance/parallel]" |
Apr 18 15:55:26.780: INFO: Running 'oc --kubeconfig=/root/.kube/config observe services --exit-after=1m --all-namespaces --retry-on-exit-code=2 --maximum-errors=1 --loglevel=4 -- /bin/sh -c exit 2' Apr 18 15:55:27.063: INFO: Error running /usr/bin/oc --kubeconfig=/root/.kube/config observe services --exit-after=1m --all-namespaces --retry-on-exit-code=2 --maximum-errors=1 --loglevel=4 -- /bin/sh -c exit 2: StdOut> I0418 15:55:26.970635 2404130 observe.go:438] Listening on :11251 at /metrics and /healthz I0418 15:55:26.970715 2404130 reflector.go:255] Listing and watching <unspecified> from observer # 2022-04-18T15:55:27+08:00 Sync started I0418 15:55:27.030943 2404130 observe.go:648] Processing Sync []: &unstructured.Unstructured{Object:map[string]interface {}{"apiVersion":"v1", "kind":"Service", "metadata":map[string]interface {}{"annotations":map[string]interface {}{"service.alpha.openshift.io/serving-cert-signed-by":"openshift-service-serving-signer@1650006354", "service.beta.openshift.io/serving-cert-secret-name":"cluster-monitoring-operator-tls", "service.beta.openshift.io/serving-cert-signed-by":"openshift-service-serving-signer@1650006354"}, "creationTimestamp":"2022-04-15T07:06:18Z", "labels":map[string]interface {}{"app":"cluster-monitoring-operator"}, "managedFields":[]interface {}{map[string]interface {}{"apiVersion":"v1", "fieldsType":"FieldsV1", "fieldsV1":map[string]interface {}{"f:metadata":map[string]interface {}{"f:annotations":map[string]interface {}{".":map[string]interface {}{}, "f:service.beta.openshift.io/serving-cert-secret-name":map[string]interface {}{}}, "f:labels":map[string]interface {}{".":map[string]interface {}{}, "f:app":map[string]interface {}{}}}, "f:spec":map[string]interface {}{"f:clusterIP":map[string]interface {}{}, "f:internalTrafficPolicy":map[string]interface {}{}, "f:ports":map[string]interface {}{".":map[string]interface {}{}, "k:{"port":8443,"protocol":"TCP"}":map[string]interface {}{".":map[string]interface {}{}, "f:name":map[string]interface {}{}, "f:port":map[string]interface {}{}, "f:protocol":map[string]interface {}{}, "f:targetPort":map[string]interface {}{}}}, "f:selector":map[string]interface {}{}, "f:sessionAffinity":map[string]interface {}{}, "f:type":map[string]interface {}{}}}, "manager":"operator", "operation":"Update", "time":"2022-04-15T07:06:18Z"}, map[string]interface {}{"apiVersion":"v1", "fieldsType":"FieldsV1", "fieldsV1":map[string]interface {}{"f:metadata":map[string]interface {}{"f:annotations":map[string]interface {}{"f:service.alpha.openshift.io/serving-cert-signed-by":map[string]interface {}{}, "f:service.beta.openshift.io/serving-cert-signed-by":map[string]interface {}{}}}}, "manager":"service-ca-operator", "operation":"Update", "time":"2022-04-15T07:06:18Z"}}, "name":"cluster-monitoring-operator", "namespace":"openshift-monitoring", "resourceVersion":"6618", "uid":"f65e024e-baf4-4eb6-acce-13b913bcc13a"}, "spec":map[string]interface {}{"clusterIP":"None", "clusterIPs":[]interface {}{"None"}, "internalTrafficPolicy":"Cluster", "ipFamilies":[]interface {}{"IPv4"}, "ipFamilyPolicy":"SingleStack", "ports":[]interface {}{map[string]interface {}{"name":"https", "port":8443, "protocol":"TCP", "targetPort":"https"}}, "selector":map[string]interface {}{"app":"cluster-monitoring-operator"}, "sessionAffinity":"None", "type":"ClusterIP"}, "status":map[string]interface {}{"loadBalancer":map[string]interface {}{}}}} # 2022-04-18T15:55:27+08:00 Sync 6618 /bin/sh -c "exit 2" openshift-monitoring cluster-monitoring-operator "" I0418 
15:55:27.046210 2404130 metric.go:86] retrying command: exit status 2 I0418 15:55:27.048102 2404130 metric.go:86] retrying command: exit status 2 error: command "/bin/sh" exited with status code 2 error: reached maximum error limit of 1, exiting StdErr> I0418 15:55:26.970635 2404130 observe.go:438] Listening on :11251 at /metrics and /healthz I0418 15:55:26.970715 2404130 reflector.go:255] Listing and watching <unspecified> from observer # 2022-04-18T15:55:27+08:00 Sync started I0418 15:55:27.030943 2404130 observe.go:648] Processing Sync []: &unstructured.Unstructured{Object:map[string]interface {}{"apiVersion":"v1", "kind":"Service", "metadata":map[string]interface {}{"annotations":map[string]interface {}{"service.alpha.openshift.io/serving-cert-signed-by":"openshift-service-serving-signer@1650006354", "service.beta.openshift.io/serving-cert-secret-name":"cluster-monitoring-operator-tls", "service.beta.openshift.io/serving-cert-signed-by":"openshift-service-serving-signer@1650006354"}, "creationTimestamp":"2022-04-15T07:06:18Z", "labels":map[string]interface {}{"app":"cluster-monitoring-operator"}, "managedFields":[]interface {}{map[string]interface {}{"apiVersion":"v1", "fieldsType":"FieldsV1", "fieldsV1":map[string]interface {}{"f:metadata":map[string]interface {}{"f:annotations":map[string]interface {}{".":map[string]interface {}{}, "f:service.beta.openshift.io/serving-cert-secret-name":map[string]interface {}{}}, "f:labels":map[string]interface {}{".":map[string]interface {}{}, "f:app":map[string]interface {}{}}}, "f:spec":map[string]interface {}{"f:clusterIP":map[string]interface {}{}, "f:internalTrafficPolicy":map[string]interface {}{}, "f:ports":map[string]interface {}{".":map[string]interface {}{}, "k:{"port":8443,"protocol":"TCP"}":map[string]interface {}{".":map[string]interface {}{}, "f:name":map[string]interface {}{}, "f:port":map[string]interface {}{}, "f:protocol":map[string]interface {}{}, "f:targetPort":map[string]interface {}{}}}, "f:selector":map[string]interface {}{}, "f:sessionAffinity":map[string]interface {}{}, "f:type":map[string]interface {}{}}}, "manager":"operator", "operation":"Update", "time":"2022-04-15T07:06:18Z"}, map[string]interface {}{"apiVersion":"v1", "fieldsType":"FieldsV1", "fieldsV1":map[string]interface {}{"f:metadata":map[string]interface {}{"f:annotations":map[string]interface {}{"f:service.alpha.openshift.io/serving-cert-signed-by":map[string]interface {}{}, "f:service.beta.openshift.io/serving-cert-signed-by":map[string]interface {}{}}}}, "manager":"service-ca-operator", "operation":"Update", "time":"2022-04-15T07:06:18Z"}}, "name":"cluster-monitoring-operator", "namespace":"openshift-monitoring", "resourceVersion":"6618", "uid":"f65e024e-baf4-4eb6-acce-13b913bcc13a"}, "spec":map[string]interface {}{"clusterIP":"None", "clusterIPs":[]interface {}{"None"}, "internalTrafficPolicy":"Cluster", "ipFamilies":[]interface {}{"IPv4"}, "ipFamilyPolicy":"SingleStack", "ports":[]interface {}{map[string]interface {}{"name":"https", "port":8443, "protocol":"TCP", "targetPort":"https"}}, "selector":map[string]interface {}{"app":"cluster-monitoring-operator"}, "sessionAffinity":"None", "type":"ClusterIP"}, "status":map[string]interface {}{"loadBalancer":map[string]interface {}{}}}} # 2022-04-18T15:55:27+08:00 Sync 6618 /bin/sh -c "exit 2" openshift-monitoring cluster-monitoring-operator "" I0418 15:55:27.046210 2404130 metric.go:86] retrying command: exit status 2 I0418 15:55:27.048102 2404130 metric.go:86] retrying command: exit status 2 error: command 
"/bin/sh" exited with status code 2 error: reached maximum error limit of 1, exiting |
[It] works as expected [Suite:openshift/conformance/parallel] github.com/openshift/origin/test/extended/cli/observe.go:17 STEP: basic scenarios Apr 18 15:55:17.116: INFO: Running 'oc --kubeconfig=/root/.kube/config observe' Apr 18 15:55:17.219: INFO: Error running /usr/bin/oc --kubeconfig=/root/.kube/config observe: StdOut> error: you must specify at least one argument containing the resource to observe StdErr> error: you must specify at least one argument containing the resource to observe |
Problem analysis:
github.com/openshift/origin/test/extended/cli/observe.go:17
var _ = g.Describe("[sig-cli] oc observe", func() {
	defer g.GinkgoRecover()

	oc := exutil.NewCLIWithoutNamespace("oc-observe").AsAdmin()

	g.It("works as expected", func() {
		g.By("Find out the clusterIP of the kubernetes.default service")
		kubernetesSVC, err := oc.AdminKubeClient().CoreV1().Services("default").Get(context.Background(), "kubernetes", metav1.GetOptions{})
		o.Expect(err).NotTo(o.HaveOccurred())

		g.By("basic scenarios")
		out, err := oc.Run("observe").Output()
		o.Expect(err).To(o.HaveOccurred())
		o.Expect(out).To(o.ContainSubstring("you must specify at least one argument containing the resource to observe"))

		out, err = oc.Run("observe").Args("serviceaccounts", "--once").Output()
		o.Expect(err).NotTo(o.HaveOccurred())
		o.Expect(out).To(o.Or(o.ContainSubstring("Sync ended"), o.ContainSubstring("Nothing to sync")))

		out, err = oc.Run("observe").Args("daemonsets", "--once").Output()
		o.Expect(err).NotTo(o.HaveOccurred())
		o.Expect(out).To(o.Or(o.ContainSubstring("Sync ended"), o.ContainSubstring("Nothing to sync, exiting immediately")))

		out, err = oc.Run("observe").Args("clusteroperators", "--once").Output()
		o.Expect(err).NotTo(o.HaveOccurred())
		o.Expect(out).To(o.ContainSubstring("kube-apiserver"))

		out, err = oc.Run("observe").Args("services", "--once", "--all-namespaces").Output()
		o.Expect(err).NotTo(o.HaveOccurred())
		o.Expect(out).To(o.ContainSubstring("default kubernetes"))

		out, err = oc.Run("observe").Args("services", "--once", "--all-namespaces", "--print-metrics-on-exit").Output()
		o.Expect(err).NotTo(o.HaveOccurred())
		o.Expect(out).To(o.ContainSubstring(`observe_counts{type="Sync"}`))

		out, err = oc.Run("observe").Args("services", "--once", "--names", "echo").Output()
		o.Expect(err).To(o.HaveOccurred())
		o.Expect(out).To(o.ContainSubstring("--delete and --names must both be specified"))

		out, err = oc.Run("observe").Args("services", "--exit-after=1s").Output()
		o.Expect(err).NotTo(o.HaveOccurred())
		o.Expect(out).To(o.ContainSubstring("Shutting down after 1s ..."))

		out, err = oc.Run("observe").Args("services", "--exit-after=3s", "--all-namespaces", "--print-metrics-on-exit").Output()
		o.Expect(err).NotTo(o.HaveOccurred())
		o.Expect(out).To(o.ContainSubstring(`observe_counts{type="Sync"}`))

		out, err = oc.Run("observe").Args("services", "--exit-after=3s", "--all-namespaces", "--names", "echo", "--names", "default/notfound", "--delete", "echo", "--delete", "remove").Output()
		o.Expect(err).NotTo(o.HaveOccurred())
		o.Expect(out).To(o.ContainSubstring("remove default notfound"))

		g.By("error counting")
		out, err = oc.Run("observe").Args("services", "--exit-after=1m", "--all-namespaces", "--maximum-errors=1", "--", "/bin/sh", "-c", "exit 1").Output()
		o.Expect(err).To(o.HaveOccurred())
		o.Expect(out).To(o.ContainSubstring("reached maximum error limit of 1, exiting"))

		out, err = oc.Run("observe").Args("services", "--exit-after=1m", "--all-namespaces", "--retry-on-exit-code=2", "--maximum-errors=1", "--loglevel=4", "--", "/bin/sh", "-c", "exit 2").Output()
		o.Expect(err).To(o.HaveOccurred())
		o.Expect(out).To(o.ContainSubstring("retrying command: exit status 2"))

		g.By("argument templates")
		out, err = oc.Run("observe").Args("services", "--once", "--all-namespaces", "--template='{ .spec.clusterIP }'").Output()
		o.Expect(err).NotTo(o.HaveOccurred())
		o.Expect(out).To(o.Or(o.ContainSubstring(kubernetesSVC.Spec.ClusterIP), o.ContainSubstring("fd02::1")))

		out, err = oc.Run("observe").Args("services", "--once", "--all-namespaces", "--template='{{ .spec.clusterIP }}'", "--output=go-template").Output()
		o.Expect(err).NotTo(o.HaveOccurred())
		o.Expect(out).To(o.Or(o.ContainSubstring(kubernetesSVC.Spec.ClusterIP), o.ContainSubstring("fd02::1")))

		out, err = oc.Run("observe").Args("services", "--once", "--all-namespaces", "--template='bad{ .missingkey }key'").Output()
		o.Expect(err).NotTo(o.HaveOccurred())
		o.Expect(out).To(o.ContainSubstring("badkey"))

		out, err = oc.Run("observe").Args("services", "--once", "--all-namespaces", "--template='bad{ .missingkey }key'", "--allow-missing-template-keys=false").Output()
		o.Expect(err).To(o.HaveOccurred())
		o.Expect(out).To(o.ContainSubstring("missingkey is not found"))

		out, err = oc.Run("observe").Args("services", "--once", "--all-namespaces", "--template='{{ .unknown }}'", "--output=go-template").Output()
		o.Expect(err).NotTo(o.HaveOccurred())
		o.Expect(out).To(o.ContainSubstring("default kubernetes"))

		out, err = oc.Run("observe").Args("services", "--once", "--all-namespaces", `--template='bad{{ or (.unknown) "" }}key'`, "--output=go-template").Output()
		o.Expect(err).NotTo(o.HaveOccurred())
		o.Expect(out).To(o.ContainSubstring("badkey"))

		out, err = oc.Run("observe").Args("services", "--once", "--all-namespaces", "--template='bad{{ .unknown }}key'", "--output=go-template", "--allow-missing-template-keys=false").Output()
		o.Expect(err).To(o.HaveOccurred())
		o.Expect(out).To(o.ContainSubstring("map has no entry for key"))

		g.By("event environment variables")
		o.Expect(os.Setenv("MYENV", "should_be_passed")).NotTo(o.HaveOccurred())
		out, err = oc.Run("observe").Args("services", "--once", "--all-namespaces", "--type-env-var=EVENT", "--", "/bin/sh", "-c", "echo $EVENT $MYENV").Output()
		o.Expect(err).NotTo(o.HaveOccurred())
		o.Expect(out).To(o.ContainSubstring("Sync should_be_passed"))
		o.Expect(os.Unsetenv("MYENV")).NotTo(o.HaveOccurred())
	})
})
The test builds its CLI helper with exutil.NewCLIWithoutNamespace, which is defined as follows:

// NewCLIWithoutNamespace initializes the CLI and Kube framework helpers
// without a namespace. Should be called outside of a Ginkgo .It()
// function. Use SetupProject() to create a project for this namespace.
func NewCLIWithoutNamespace(project string) *CLI {
	cli := &CLI{
		kubeFramework: &framework.Framework{
			SkipNamespaceCreation:    true,
			BaseName:                 project,
			AddonResourceConstraints: make(map[string]framework.ResourceConstraint),
			Options: framework.Options{
				ClientQPS:   20,
				ClientBurst: 50,
			},
			Timeouts: framework.NewTimeoutContextWithDefaults(),
		},
		username:         "admin",
		execPath:         "oc",
		adminConfigPath:  KubeConfigPath(),
		withoutNamespace: true,
	}
	g.AfterEach(cli.TeardownProject)
	g.AfterEach(cli.kubeFramework.AfterEach)
	g.BeforeEach(cli.kubeFramework.BeforeEach)
	return cli
}
The specific command that causes the problem:
/usr/bin/oc --kubeconfig=/root/.kube/config observe services --exit-after=1m --all-namespaces --maximum-errors=1 -- /bin/sh -c exit 1

For each observed service, oc observe appends the object's namespace and name (plus a trailing empty argument) to the supplied command, so each per-object invocation looks like:

/bin/sh -c "exit 1" openshift-kube-controller-manager kube-controller-manager ""
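To check this case outside the Ginkgo suite, the same command can be driven from a small Go program and compared against what the test expects: a non-zero exit status and output containing "reached maximum error limit of 1, exiting". This is a minimal sketch, assuming the oc binary and kubeconfig paths shown in the command above:

package main

import (
	"fmt"
	"os/exec"
	"strings"
)

func main() {
	// Same invocation as the failing test step; adjust paths to the local environment.
	cmd := exec.Command("/usr/bin/oc",
		"--kubeconfig=/root/.kube/config",
		"observe", "services",
		"--exit-after=1m", "--all-namespaces", "--maximum-errors=1",
		"--", "/bin/sh", "-c", "exit 1")
	out, err := cmd.CombinedOutput()

	// The test asserts both of these conditions.
	fmt.Println("exited with error:", err != nil)
	fmt.Println("hit error limit:", strings.Contains(string(out),
		"reached maximum error limit of 1, exiting"))
}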
Executing the following command takes a noticeable amount of time:
# 2022-04-18T19:49:23+08:00 Sync 10831 /bin/sh -c exit 1 openshift-cluster-samples-operator metrics "" |
[root@master0 zsl]# time /usr/bin/oc --kubeconfig=/root/.kube/config observe services --exit-after=1m --all-namespaces --maximum-errors=1 -- /bin/sh -c exit 1 # 2022-04-18T19:49:23+08:00 Sync started # 2022-04-18T19:49:23+08:00 Sync 6016 /bin/sh -c exit 1 openshift-dns-operator metrics "" # 2022-04-18T19:49:23+08:00 Sync 5522 /bin/sh -c exit 1 openshift-machine-config-operator machine-config-daemon "" # 2022-04-18T19:49:23+08:00 Sync 6374 /bin/sh -c exit 1 openshift-oauth-apiserver api "" # 2022-04-18T19:49:23+08:00 Sync 5111 /bin/sh -c exit 1 openshift-apiserver check-endpoints "" # 2022-04-18T19:49:23+08:00 Sync 1676541 /bin/sh -c exit 1 e2e-test-htpasswd-idp-fnfqp test-oauth-svc "" # 2022-04-18T19:49:23+08:00 Sync 15564 /bin/sh -c exit 1 openshift-ingress-canary ingress-canary "" # 2022-04-18T19:49:23+08:00 Sync 5805 /bin/sh -c exit 1 openshift-kube-storage-version-migrator-operator metrics "" # 2022-04-18T19:49:23+08:00 Sync 7206 /bin/sh -c exit 1 openshift-marketplace community-operators "" # 2022-04-18T19:49:23+08:00 Sync 2850 /bin/sh -c exit 1 openshift-network-diagnostics network-check-target "" # 2022-04-18T19:49:23+08:00 Sync 5670 /bin/sh -c exit 1 openshift-machine-api machine-api-controllers "" # 2022-04-18T19:49:23+08:00 Sync 5797 /bin/sh -c exit 1 openshift-machine-api machine-api-operator "" # 2022-04-18T19:49:23+08:00 Sync 6618 /bin/sh -c exit 1 openshift-monitoring cluster-monitoring-operator "" # 2022-04-18T19:49:23+08:00 Sync 5770 /bin/sh -c exit 1 openshift-cluster-storage-operator csi-snapshot-webhook "" # 2022-04-18T19:49:23+08:00 Sync 6680 /bin/sh -c exit 1 openshift-monitoring prometheus-adapter "" # 2022-04-18T19:49:23+08:00 Sync 6242 /bin/sh -c exit 1 openshift-cluster-storage-operator cluster-storage-operator-metrics "" # 2022-04-18T19:49:23+08:00 Sync 6206 /bin/sh -c exit 1 openshift-kube-controller-manager-operator metrics "" # 2022-04-18T19:49:23+08:00 Sync 6606 /bin/sh -c exit 1 openshift-monitoring thanos-querier "" # 2022-04-18T19:49:23+08:00 Sync 1668667 /bin/sh -c exit 1 e2e-test-build-service-6sk4h hello-nodejs "" # 2022-04-18T19:49:23+08:00 Sync 5705 /bin/sh -c exit 1 openshift-apiserver-operator metrics "" # 2022-04-18T19:49:23+08:00 Sync 6369 /bin/sh -c exit 1 openshift-authentication oauth-openshift "" # 2022-04-18T19:49:23+08:00 Sync 5904 /bin/sh -c exit 1 openshift-image-registry image-registry-operator "" # 2022-04-18T19:49:23+08:00 Sync 8367 /bin/sh -c exit 1 openshift-operator-lifecycle-manager packageserver-service "" # 2022-04-18T19:49:23+08:00 Sync 6311 /bin/sh -c exit 1 openshift-operator-lifecycle-manager olm-operator-metrics "" # 2022-04-18T19:49:23+08:00 Sync 6304 /bin/sh -c exit 1 openshift-controller-manager-operator metrics "" # 2022-04-18T19:49:23+08:00 Sync 6101 /bin/sh -c exit 1 openshift-kube-scheduler scheduler "" # 2022-04-18T19:49:23+08:00 Sync 6689 /bin/sh -c exit 1 openshift-monitoring telemeter-client "" # 2022-04-18T19:49:23+08:00 Sync 6324 /bin/sh -c exit 1 openshift-multus multus-admission-controller "" # 2022-04-18T19:49:23+08:00 Sync 6086 /bin/sh -c exit 1 openshift-operator-lifecycle-manager catalog-operator-metrics "" # 2022-04-18T19:49:23+08:00 Sync 165547 /bin/sh -c exit 1 e2e-statefulset-5018 test "" # 2022-04-18T19:49:23+08:00 Sync 6095 /bin/sh -c exit 1 openshift-ovn-kubernetes ovn-kubernetes-node "" # 2022-04-18T19:49:23+08:00 Sync 1672696 /bin/sh -c exit 1 e2e-test-oauth-server-headers-xvj6k test-oauth-svc "" # 2022-04-18T19:49:23+08:00 Sync 6223 /bin/sh -c exit 1 openshift-config-operator metrics "" # 
2022-04-18T19:49:23+08:00 Sync 1672264 /bin/sh -c exit 1 e2e-test-router-scoped-b67q2 endpoints "" # 2022-04-18T19:49:23+08:00 Sync 1673608 /bin/sh -c exit 1 e2e-test-router-scoped-ns2x9 endpoints "" # 2022-04-18T19:49:23+08:00 Sync 7149 /bin/sh -c exit 1 openshift-marketplace redhat-operators "" # 2022-04-18T19:49:23+08:00 Sync 17111 /bin/sh -c exit 1 openshift-monitoring prometheus-k8s "" # 2022-04-18T19:49:23+08:00 Sync 231 /bin/sh -c exit 1 default kubernetes "" # 2022-04-18T19:49:23+08:00 Sync 5990 /bin/sh -c exit 1 openshift-cluster-node-tuning-operator node-tuning-operator "" # 2022-04-18T19:49:23+08:00 Sync 18058 /bin/sh -c exit 1 openshift-monitoring alertmanager-operated "" # 2022-04-18T19:49:23+08:00 Sync 6111 /bin/sh -c exit 1 openshift-cluster-version cluster-version-operator "" # 2022-04-18T19:49:23+08:00 Sync 17104 /bin/sh -c exit 1 openshift-monitoring prometheus-k8s-thanos-sidecar "" # 2022-04-18T19:49:23+08:00 Sync 5681 /bin/sh -c exit 1 openshift-cluster-storage-operator csi-snapshot-controller-operator-metrics "" # 2022-04-18T19:49:23+08:00 Sync 6609 /bin/sh -c exit 1 openshift-monitoring openshift-state-metrics "" # 2022-04-18T19:49:23+08:00 Sync 18011 /bin/sh -c exit 1 openshift-monitoring prometheus-operated "" # 2022-04-18T19:49:23+08:00 Sync 1679606 /bin/sh -c exit 1 e2e-test-cli-idling-rwmgr idling-echo "" # 2022-04-18T19:49:23+08:00 Sync 5983 /bin/sh -c exit 1 openshift-etcd etcd "" # 2022-04-18T19:49:23+08:00 Sync 5690 /bin/sh -c exit 1 openshift-monitoring prometheus-operator "" # 2022-04-18T19:49:23+08:00 Sync 1675290 /bin/sh -c exit 1 e2e-net-services1-3818 service-m62z8 "" # 2022-04-18T19:49:23+08:00 Sync 1677803 /bin/sh -c exit 1 e2e-test-router-idling-97smp idle-test "" # 2022-04-18T19:49:23+08:00 Sync 5878 /bin/sh -c exit 1 openshift-ingress-operator metrics "" # 2022-04-18T19:49:23+08:00 Sync 5513 /bin/sh -c exit 1 openshift-insights metrics "" # 2022-04-18T19:49:23+08:00 Sync 1675143 /bin/sh -c exit 1 e2e-test-router-scoped-2rq7v endpoints "" # 2022-04-18T19:49:23+08:00 Sync 1678652 /bin/sh -c exit 1 e2e-test-oauth-server-headers-4lr8f test-oauth-svc "" # 2022-04-18T19:49:23+08:00 Sync 5896 /bin/sh -c exit 1 openshift-machine-api machine-api-operator-webhook "" # 2022-04-18T19:49:23+08:00 Sync 19724 /bin/sh -c exit 1 openshift-console-operator metrics "" # 2022-04-18T19:49:23+08:00 Sync 57687 /bin/sh -c exit 1 openshift-image-registry image-registry "" # 2022-04-18T19:49:23+08:00 Sync 5715 /bin/sh -c exit 1 openshift-cluster-machine-approver machine-approver "" # 2022-04-18T19:49:23+08:00 Sync 6898 /bin/sh -c exit 1 openshift-dns dns-default "" # 2022-04-18T19:49:23+08:00 Sync 5912 /bin/sh -c exit 1 openshift-machine-api cluster-baremetal-webhook-service "" # 2022-04-18T19:49:23+08:00 Sync 6566 /bin/sh -c exit 1 openshift-monitoring kube-state-metrics "" # 2022-04-18T19:49:23+08:00 Sync 5569 /bin/sh -c exit 1 openshift-kube-apiserver-operator metrics "" # 2022-04-18T19:49:23+08:00 Sync 6260 /bin/sh -c exit 1 openshift-machine-api cluster-autoscaler-operator "" # 2022-04-18T19:49:23+08:00 Sync 7143 /bin/sh -c exit 1 openshift-marketplace redhat-marketplace "" # 2022-04-18T19:49:23+08:00 Sync 6583 /bin/sh -c exit 1 openshift-monitoring node-exporter "" # 2022-04-18T19:49:23+08:00 Sync 19815 /bin/sh -c exit 1 openshift-console console "" # 2022-04-18T19:49:23+08:00 Sync 6335 /bin/sh -c exit 1 openshift-controller-manager controller-manager "" # 2022-04-18T19:49:23+08:00 Sync 5832 /bin/sh -c exit 1 openshift-etcd-operator metrics "" # 
2022-04-18T19:49:23+08:00 Sync 6001 /bin/sh -c exit 1 openshift-kube-scheduler-operator metrics "" # 2022-04-18T19:49:23+08:00 Sync 6353 /bin/sh -c exit 1 openshift-apiserver api "" # 2022-04-18T19:49:23+08:00 Sync 2696 /bin/sh -c exit 1 openshift-ovn-kubernetes ovnkube-db "" # 2022-04-18T19:49:23+08:00 Sync 5552 /bin/sh -c exit 1 openshift-cloud-credential-operator cco-metrics "" # 2022-04-18T19:49:23+08:00 Sync 5525 /bin/sh -c exit 1 openshift-marketplace marketplace-operator-metrics "" # 2022-04-18T19:49:23+08:00 Sync 5646 /bin/sh -c exit 1 openshift-kube-apiserver apiserver "" # 2022-04-18T19:49:23+08:00 Sync 6076 /bin/sh -c exit 1 openshift-machine-api cluster-baremetal-operator-service "" # 2022-04-18T19:49:23+08:00 Sync 5921 /bin/sh -c exit 1 openshift-ovn-kubernetes ovn-kubernetes-master "" # 2022-04-18T19:49:23+08:00 Sync 17107 /bin/sh -c exit 1 openshift-monitoring alertmanager-main "" # 2022-04-18T19:49:23+08:00 Sync 1679455 /bin/sh -c exit 1 e2e-test-oauth-server-headers-nvhcd test-oauth-svc "" # 2022-04-18T19:49:23+08:00 Sync 5823 /bin/sh -c exit 1 openshift-service-ca-operator metrics "" # 2022-04-18T19:49:23+08:00 Sync 1679526 /bin/sh -c exit 1 e2e-test-new-app-bqgn5 a234567890123456789012345678901234567890123456789012345678 "" # 2022-04-18T19:49:23+08:00 Sync 6047 /bin/sh -c exit 1 kube-system kubelet "" # 2022-04-18T19:49:23+08:00 Sync 19835 /bin/sh -c exit 1 openshift-console downloads "" # 2022-04-18T19:49:23+08:00 Sync 7410 /bin/sh -c exit 1 openshift-ingress router-internal-default "" # 2022-04-18T19:49:23+08:00 Sync 5965 /bin/sh -c exit 1 openshift-kube-controller-manager kube-controller-manager "" # 2022-04-18T19:49:23+08:00 Sync 1676035 /bin/sh -c exit 1 e2e-test-oauth-ldap-idp-pvm4z test-oauth-svc "" # 2022-04-18T19:49:23+08:00 Sync 1668918 /bin/sh -c exit 1 e2e-test-oauth-ldap-idp-pvm4z openldap-server "" # 2022-04-18T19:49:23+08:00 Sync 1678095 /bin/sh -c exit 1 e2e-test-oauth-expiration-vlms9 test-oauth-svc "" # 2022-04-18T19:49:23+08:00 Sync 6872 /bin/sh -c exit 1 default openshift "" # 2022-04-18T19:49:23+08:00 Sync 6269 /bin/sh -c exit 1 openshift-authentication-operator metrics "" # 2022-04-18T19:49:23+08:00 Sync 7179 /bin/sh -c exit 1 openshift-marketplace certified-operators "" # 2022-04-18T19:49:23+08:00 Sync 17109 /bin/sh -c exit 1 openshift-monitoring grafana "" # 2022-04-18T19:49:23+08:00 Sync 5517 /bin/sh -c exit 1 openshift-multus network-metrics-service "" # 2022-04-18T19:49:23+08:00 Sync 1678699 /bin/sh -c exit 1 e2e-test-oauth-server-headers-cb6ws test-oauth-svc "" # 2022-04-18T19:49:23+08:00 Sync 2827 /bin/sh -c exit 1 openshift-network-diagnostics network-check-source "" # 2022-04-18T19:49:23+08:00 Sync 10831 /bin/sh -c exit 1 openshift-cluster-samples-operator metrics "" Shutting down after 1m0s ... real 1m0.067s user 0m0.174s sys 0m0.099s |
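In this capture the command runs for the entire --exit-after=1m window and ends with "Shutting down after 1m0s ...", and the output never contains "reached maximum error limit of 1, exiting", which is presumably why the error-counting assertions quoted below do not pass. For reference, a simplified model of how --maximum-errors is expected to behave; this is an illustrative sketch only, not oc's actual implementation, and the exact threshold semantics are an assumption:

package main

import (
	"fmt"
	"os/exec"
)

// observeOnce mimics one pass over the observed services: it runs the
// user-supplied command once per object and counts non-zero exits.
func observeOnce(objects [][2]string, maxErrors int) error {
	errorCount := 0
	for _, obj := range objects {
		// The test passes "/bin/sh -c exit 1"; oc appends namespace and name.
		cmd := exec.Command("/bin/sh", "-c", "exit 1", obj[0], obj[1])
		if err := cmd.Run(); err != nil {
			errorCount++
			if errorCount >= maxErrors {
				return fmt.Errorf("reached maximum error limit of %d, exiting", maxErrors)
			}
		}
	}
	return nil
}

func main() {
	services := [][2]string{
		{"default", "kubernetes"},
		{"openshift-kube-controller-manager", "kube-controller-manager"},
	}
	// With --maximum-errors=1 and a command that always fails, the limit should be
	// hit on the first sync instead of the process running until --exit-after expires.
	fmt.Println(observeOnce(services, 1))
}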
For reference, the relevant excerpt of the test, ending at the error-counting assertion:

out, err = oc.Run("observe").Args("services", "--once", "--all-namespaces", "--print-metrics-on-exit").Output()
o.Expect(err).NotTo(o.HaveOccurred())
o.Expect(out).To(o.ContainSubstring(`observe_counts{type="Sync"}`))

out, err = oc.Run("observe").Args("services", "--once", "--names", "echo").Output()
o.Expect(err).To(o.HaveOccurred())
o.Expect(out).To(o.ContainSubstring("--delete and --names must both be specified"))

out, err = oc.Run("observe").Args("services", "--exit-after=1s").Output()
o.Expect(err).NotTo(o.HaveOccurred())
o.Expect(out).To(o.ContainSubstring("Shutting down after 1s ..."))

out, err = oc.Run("observe").Args("services", "--exit-after=3s", "--all-namespaces", "--print-metrics-on-exit").Output()
o.Expect(err).NotTo(o.HaveOccurred())
o.Expect(out).To(o.ContainSubstring(`observe_counts{type="Sync"}`))

out, err = oc.Run("observe").Args("services", "--exit-after=3s", "--all-namespaces", "--names", "echo", "--names", "default/notfound", "--delete", "echo", "--delete", "remove").Output()
o.Expect(err).NotTo(o.HaveOccurred())
o.Expect(out).To(o.ContainSubstring("remove default notfound"))

g.By("error counting")
out, err = oc.Run("observe").Args("services", "--exit-after=1m", "--all-namespaces", "--maximum-errors=1", "--", "/bin/sh", "-c", "exit 1").Output()
o.Expect(err).To(o.HaveOccurred())
o.Expect(out).To(o.ContainSubstring("reached maximum error limit of 1, exiting"))
Resolution: