[Nightingale Monitoring] Managing Kubernetes Component Metrics

Author: 喬克 2023-05-11 07:08:07

This article covers only the monitoring of Kubernetes itself, and only how to do it within the Nightingale ecosystem.

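Before you begin

Kubernetes is a system that is both simple and complex. It is simple in that its overall architecture is clean and clear: a standard Master-Slave model.

It is complex in that both the Master and the Slave are made up of several components:

Master components

apiserver: the API entry point, responsible for authentication, authorization, access control, and API registration and discovery.
scheduler: responsible for resource scheduling.
controller-manager: maintains cluster state.

Slave components

kubelet: maintains the container lifecycle and handles CSI and CNI management.
kube-proxy: responsible for service discovery and load balancing.
container runtime (docker, containerd, and so on): image management, running containers, CRI management, and so on.

Database component

Etcd: stores cluster state and communicates with the apiserver.

Keeping track of the internal state of such a system at all times is hard, because monitoring has to cover a lot of ground:

The operating system: Kubernetes is deployed on top of an operating system, so OS-level monitoring is very important.
Kubernetes itself: Kubernetes involves a large number of components, and their health determines the stability of the whole cluster.
Applications on top of Kubernetes: Kubernetes provides the runtime environment for applications, a company's application systems are deployed in the cluster, and their stability matters to the business.
Other underlying pillars, such as the network, the data center, and the racks.

There is a great deal to monitor and there are many SLIs. This article, however, covers only the monitoring of Kubernetes itself, and only how to do it within the Nightingale ecosystem.

For Kubernetes itself, the main task is to monitor its system components.

Note: this article does not explain how to install Nightingale. If you are unsure, see the article "[Nightingale Monitoring] Getting to Know Nightingale" (【夜鶯監控】初識夜鶯); this walkthrough uses the installation method described there.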

KubeApiServer

ApiServer is the core of the Kubernetes architecture. It is the entry point for all APIs and ties all system components together.

To make the ApiServer easier to monitor and manage, its designers expose a series of metrics. Once a cluster is deployed, a service named kubernetes is created by default in the default namespace; this is the ApiServer's address.

# kubectl get svc
NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
kubernetes   ClusterIP   10.96.0.1    <none>        443/TCP   309d

You can view the metrics with curl -s -k -H "Authorization: Bearer $token" https://10.96.0.1:443/metrics, where $token comes from a ServiceAccount created in the cluster and granted the necessary permissions.

So, to monitor the ApiServer and collect its metrics, authorization has to be set up first. We begin by preparing the credentials.
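
The /metrics endpoint returns plain text in the Prometheus exposition format. As a rough illustration of what that output looks like and how it breaks down into name, labels, and value, here is a minimal Python sketch. The sample text is a hypothetical excerpt, the function name parse_metrics is made up for this example, and the parser deliberately ignores optional timestamps and other corner cases of the format.

```python
import re

# Matches one sample line of the Prometheus text exposition format:
#   metric_name{label="value",...} 123.0
# Lines carrying an optional trailing timestamp are not handled here.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Parse exposition-format text into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


# Hypothetical excerpt of /metrics output, for illustration only.
SAMPLE_OUTPUT = """\
# HELP apiserver_request_total Counter of apiserver requests.
# TYPE apiserver_request_total counter
apiserver_request_total{code="200",resource="pods",verb="GET"} 1520
apiserver_request_total{code="404",resource="pods",verb="GET"} 3
process_cpu_seconds_total 12.5
"""

if __name__ == "__main__":
    for name, labels, value in parse_metrics(SAMPLE_OUTPUT):
        print(name, labels, value)
```

In practice Prometheus Agent does this parsing itself; the sketch is only meant to make the raw curl output easier to read.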

Create the namespace

kubectl create ns flashcat

Create the authentication and authorization resources

Create a file named 0-apiserver-auth.yaml with the following content:

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: categraf
rules:
  - apiGroups: [""]
    resources:
      - nodes
      - nodes/metrics
      - nodes/stats
      - nodes/proxy
      - services
      - endpoints
      - pods
    verbs: ["get", "list", "watch"]
  - apiGroups:
      - extensions
      - networking.k8s.io
    resources:
      - ingresses
    verbs: ["get", "list", "watch"]
  - nonResourceURLs: ["/metrics", "/metrics/cadvisor"]
    verbs: ["get"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: categraf
  namespace: flashcat
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: categraf
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: categraf
subjects:
  - kind: ServiceAccount
    name: categraf
    namespace: flashcat

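The manifest above grants categraf permission to query the relevant resources, so that it can fetch the metrics of these components.

Collecting metrics

There are many ways to collect metrics. Automatic discovery is recommended, because scaling or changing components then requires no further adjustment of the monitoring setup.

Nightingale can ingest metrics from Prometheus Agent, and Prometheus is very good at service discovery, so Prometheus Agent is used here to collect the ApiServer metrics.

(1) Create the Prometheus configuration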

apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-agent-conf
  labels:
    name: prometheus-agent-conf
  namespace: flashcat
data:
  prometheus.yml: |-
    global:
      scrape_interval: 15s
      evaluation_interval: 15s
    scrape_configs:
      - job_name: 'apiserver'
        kubernetes_sd_configs:
        - role: endpoints
        scheme: https
        tls_config:
          insecure_skip_verify: true
        authorization:
          credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        relabel_configs:
        - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
          action: keep
          regex: default;kubernetes;https
    remote_write:
    - url: 'http://192.168.205.143:17000/prometheus/v1/write'

The configuration above uses the endpoints role to discover the service named kubernetes in the default namespace whose port is named https, and then sends the collected metrics to the Nightingale server at http://192.168.205.143:17000/prometheus/v1/write (adjust this address for your environment).
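
To make the keep rule more concrete: Prometheus joins the values of source_labels with ";" (the default separator) and keeps a target only if the regex matches the whole joined string. The Python sketch below imitates that behavior under simplifying assumptions. The function keep_target and the two sample targets are hypothetical, and Python's re module stands in for the RE2 syntax Prometheus actually uses.

```python
import re

# The three labels the scrape job joins and matches on.
SOURCE_LABELS = [
    "__meta_kubernetes_namespace",
    "__meta_kubernetes_service_name",
    "__meta_kubernetes_endpoint_port_name",
]


def keep_target(labels, regex, source_labels=SOURCE_LABELS, separator=";"):
    """Return True if a `keep` relabel rule would retain this target.

    The label values are joined with the separator, and the regex has to
    match the entire joined string (it is anchored on both ends).
    """
    joined = separator.join(labels.get(name, "") for name in source_labels)
    return re.fullmatch(regex, joined) is not None


# Hypothetical discovered endpoints, for illustration only.
APISERVER = {
    "__meta_kubernetes_namespace": "default",
    "__meta_kubernetes_service_name": "kubernetes",
    "__meta_kubernetes_endpoint_port_name": "https",
}
OTHER = {
    "__meta_kubernetes_namespace": "kube-system",
    "__meta_kubernetes_service_name": "kube-dns",
    "__meta_kubernetes_endpoint_port_name": "metrics",
}

if __name__ == "__main__":
    rule = "default;kubernetes;https"
    print(keep_target(APISERVER, rule))
    print(keep_target(OTHER, rule))
```

Because the match is anchored, a regex such as default;kubernetes on its own would not keep the ApiServer target; all three parts have to be covered.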

(2) Deploy Prometheus Agent

apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus-agent
  namespace: flashcat
  labels:
    app: prometheus-agent
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus-agent
  template:
    metadata:
      labels:
        app: prometheus-agent
    spec:
      serviceAccountName: categraf
      containers:
        - name: prometheus
          image: prom/prometheus
          args:
            - "--config.file=/etc/prometheus/prometheus.yml"
            - "--web.enable-lifecycle"
            - "--enable-feature=agent"
          ports:
            - containerPort: 9090
          resources:
            requests:
              cpu: 500m
              memory: 500M
            limits:
              cpu: 1
              memory: 1Gi
          volumeMounts:
            - name: prometheus-config-volume
              mountPath: /etc/prometheus/
            - name: prometheus-storage-volume
              mountPath: /prometheus/
      volumes:
        - name: prometheus-config-volume
          configMap:
            defaultMode: 420
            name: prometheus-agent-conf
        - name: prometheus-storage-volume
          emptyDir: {}

The --enable-feature=agent argument starts Prometheus in agent mode.

Apply all of the YAML files above to Kubernetes, then check that Prometheus Agent is running.

# kubectl get po -n flashcat
NAME                                READY   STATUS    RESTARTS   AGE
prometheus-agent-78c8ccc4f5-g25st   1/1     Running   0          92s

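The corresponding metrics can then be viewed in the Nightingale UI.

With the metrics available, the next step is to put them to use, for example by building dashboards and setting up alerts.

For instance, Nightingale's Categraf provides an ApiServer dashboard (https://github.com/flashcatcloud/categraf/blob/main/k8s/apiserver-dash.json) that can be imported.

Whether you are building dashboards or alerts, though, you first need a clear understanding of the ApiServer metrics.

A brief summary follows.

Metric overview

The metrics below come from the official Alibaba Cloud ACK documentation. I find that list fairly complete and detailed, so part of it is reproduced here. See the official site for more.

Metric list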

Metric	Type	Description
apiserver_request_duration_seconds_bucket	Histogram	Latency of client requests to the APIServer, as a distribution across request kinds. Request dimensions: Verb, Group, Version, Resource, Subresource, Scope, Component, and Client. Histogram bucket thresholds: {0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.25, 1.5, 1.75, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 40, 50, 60}, in seconds.
apiserver_request_total	Counter	Count of requests to the APIServer by kind. Request dimensions: Verb, Group, Version, Resource, Scope, Component, HTTP contentType, HTTP code, and Client.
apiserver_request_no_resourceversion_list_total	Counter	Count of LIST requests to the APIServer that do not set ResourceVersion. Request dimensions: Group, Version, Resource, Scope, and Client. Used to assess quorum-read LIST requests: it shows whether there are too many of them and which clients send them, so that client request behavior can be optimized.
apiserver_current_inflight_requests	Gauge	Number of requests the APIServer is currently handling, split into ReadOnly and Mutating.
apiserver_dropped_requests_total	Counter	Number of requests dropped by rate limiting. The HTTP response is 429 'Try again later'.
apiserver_admission_controller_admission_duration_seconds_bucket	Histogram	Processing latency of admission controllers. Labels: admission controller name, operation (CREATE, UPDATE, CONNECT, and so on), API resource, operation type (validate or admit), and whether the request was rejected (true or false). Bucket thresholds: {0.005, 0.025, 0.1, 0.5, 2.5}, in seconds.
apiserver_admission_webhook_admission_duration_seconds_bucket	Histogram	Processing latency of admission webhooks. Labels: admission controller name, operation (CREATE, UPDATE, CONNECT, and so on), API resource, operation type (validate or admit), and whether the request was rejected (true or false). Bucket thresholds: {0.005, 0.025, 0.1, 0.5, 2.5}, in seconds.
apiserver_admission_webhook_admission_duration_seconds_count	Counter	Number of requests processed by admission webhooks. Labels: admission controller name, operation (CREATE, UPDATE, CONNECT, and so on), API resource, operation type (validate or admit), and whether the request was rejected (true or false).
cpu_utilization_core	Gauge	CPU usage, in cores.
cpu_utilization_ratio	Gauge	CPU utilization = CPU usage / CPU resource limit, as a percentage.
memory_utilization_byte	Gauge	Memory usage, in bytes.
memory_utilization_ratio	Gauge	Memory utilization = memory usage / memory resource limit, as a percentage.
up	Gauge	Service availability. 1: the service is available. 0: the service is unavailable.

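Key metrics

Name	PromQL	Description
API QPS	sum(irate(apiserver_request_total[$interval]))	Total APIServer QPS.
Read request success rate	sum(irate(apiserver_request_total{code=~"20.*",verb=~"GET|LIST"}[$interval]))/sum(irate(apiserver_request_total{verb=~"GET|LIST"}[$interval]))	Success rate of APIServer read requests.
Write request success rate	sum(irate(apiserver_request_total{code=~"20.*",verb!~"GET|LIST|WATCH|CONNECT"}[$interval]))/sum(irate(apiserver_request_total{verb!~"GET|LIST|WATCH|CONNECT"}[$interval]))	Success rate of APIServer write requests.
In-flight read requests	sum(apiserver_current_inflight_requests{requestKind="readOnly"})	Number of read requests the APIServer is currently handling.
In-flight write requests	sum(apiserver_current_inflight_requests{requestKind="mutating"})	Number of write requests the APIServer is currently handling.
Request throttling rate	sum(irate(apiserver_dropped_requests_total[$interval]))	Dropped request rate.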
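
Several of the expressions above rely on irate(), which derives a per-second rate from the last two samples of a counter inside the range window, and a success rate is the ratio of two such rates. The following Python sketch shows that arithmetic on made-up sample data. The function names and the numbers are assumptions for illustration, and the sketch leaves out details of the real implementation such as series selection and staleness handling.

```python
def irate(samples):
    """Per-second rate from the last two (timestamp, value) counter samples.

    Returns None when fewer than two usable samples are available.
    """
    if len(samples) < 2:
        return None
    (t_prev, v_prev), (t_last, v_last) = samples[-2], samples[-1]
    if t_last <= t_prev:
        return None
    # A drop in value means the counter was reset, so count from zero.
    increase = v_last if v_last < v_prev else v_last - v_prev
    return increase / (t_last - t_prev)


def success_ratio(success_samples, total_samples):
    """Ratio of the successful-request rate to the total-request rate."""
    ok, total = irate(success_samples), irate(total_samples)
    if ok is None or not total:
        return None
    return ok / total


# Hypothetical counter samples scraped 15 seconds apart.
TOTAL = [(0.0, 1000.0), (15.0, 1300.0)]
SUCCESS = [(0.0, 990.0), (15.0, 1287.0)]

if __name__ == "__main__":
    print("total QPS:", irate(TOTAL))
    print("success ratio:", success_ratio(SUCCESS, TOTAL))
```

Since irate() only looks at the last two points, it reacts quickly to spikes; rate() averages over the whole window and is usually the steadier choice for alerting.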

Resource metrics

Name	PromQL	Description
Memory usage	memory_utilization_byte{container="kube-apiserver"}	APIServer memory usage, in bytes.
CPU usage	cpu_utilization_core{container="kube-apiserver"}*1000	CPU usage, in millicores.
Memory utilization	memory_utilization_ratio{container="kube-apiserver"}	APIServer memory utilization, as a percentage.
CPU utilization	cpu_utilization_ratio{container="kube-apiserver"}	APIServer CPU utilization, as a percentage.
Resource object count	max by(resource)(apiserver_storage_objects) or max by(resource)(etcd_object_counts)	Number of resources managed by Kubernetes. The metric name may differ between versions.

QPS and latency

Name	PromQL	Description
QPS by Verb	sum(irate(apiserver_request_total{verb=~"$verb"}[$interval]))by(verb)	Requests per second, grouped by Verb.
QPS by Verb+Resource	sum(irate(apiserver_request_total{verb=~"$verb",resource=~"$resource"}[$interval]))by(verb,resource)	Requests per second, grouped by Verb and Resource.
Request latency by Verb	histogram_quantile($quantile, sum(irate(apiserver_request_duration_seconds_bucket{verb=~"$verb"}[$interval])) by (le,verb))	Request latency, grouped by Verb.
Request latency by Verb+Resource	histogram_quantile($quantile, sum(irate(apiserver_request_duration_seconds_bucket{verb=~"$verb",resource=~"$resource"}[$interval])) by (le,verb,resource))	Request latency, grouped by Verb and Resource.
Non-2xx read request QPS	sum(irate(apiserver_request_total{verb=~"GET|LIST",resource=~"$resource",code!~"2.*"}[$interval])) by (verb,resource,code)	QPS of read requests that return a non-2xx status.
Non-2xx write request QPS	sum(irate(apiserver_request_total{verb!~"GET|LIST|WATCH",verb=~"$verb",resource=~"$resource",code!~"2.*"}[$interval])) by (verb,resource,code)	QPS of write requests that return a non-2xx status.
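
The latency rows depend on histogram_quantile(), which estimates a quantile from cumulative bucket counts by interpolating linearly inside the bucket that contains the target rank. A simplified Python sketch of that idea follows. It takes plain cumulative counts instead of per-second rates, skips some edge cases of the real function, and the bucket data is invented for illustration.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from (upper_bound, cumulative_count) pairs.

    The bucket list must include the +Inf bucket, as the `le` label does.
    """
    if q < 0:
        return -math.inf
    if q > 1:
        return math.inf
    buckets = sorted(buckets)
    if len(buckets) < 2 or not math.isinf(buckets[-1][0]):
        return math.nan
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    # First bucket whose cumulative count reaches the target rank.
    index = next(i for i, (_, count) in enumerate(buckets) if count >= rank)
    if index == len(buckets) - 1:
        # The rank falls into +Inf: report the highest finite bound.
        return buckets[-2][0]
    upper, count = buckets[index]
    lower, below = (0.0, 0.0) if index == 0 else buckets[index - 1]
    if count == below:
        return upper
    # Linear interpolation inside the selected bucket.
    return lower + (upper - lower) * (rank - below) / (count - below)


# Hypothetical cumulative request counts per latency bucket (seconds).
BUCKETS = [(0.05, 50), (0.1, 80), (0.2, 100), (math.inf, 100)]

if __name__ == "__main__":
    print("p50:", histogram_quantile(0.5, BUCKETS))
    print("p90:", histogram_quantile(0.9, BUCKETS))
```

The interpolation explains why the result can never be more precise than the bucket layout: with the thresholds listed for apiserver_request_duration_seconds_bucket, anything slower than 60 seconds is reported as 60.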

KubeControllerManager

ControllerManager is another important Kubernetes component. It is responsible for resource control across the whole cluster and contains many controllers, such as NodeController and JobController.

The monitoring approach is the same as for the ApiServer: metrics are collected with Prometheus Agent.

Collecting metrics

ControllerManager exposes its metrics on the /metrics endpoint of port 10257. Accessing this endpoint also requires permissions, but the ones created for the ApiServer already cover it, so nothing new needs to be created.

(1) Add the Prometheus configuration

Add a new job for ControllerManager to the existing Prometheus scrape configuration:

apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-agent-conf
  labels:
    name: prometheus-agent-conf
  namespace: flashcat
data:
  prometheus.yml: |-
    global:
      scrape_interval: 15s
      evaluation_interval: 15s

    scrape_configs:
      - job_name: 'apiserver'
        kubernetes_sd_configs:
        - role: endpoints
        scheme: https
        tls_config:
          insecure_skip_verify: true
        authorization:
          credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        relabel_configs:
        - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
          action: keep
          regex: default;kubernetes;https

      - job_name: 'controller-manager'
        kubernetes_sd_configs:
        - role: endpoints
        scheme: https
        tls_config:
          insecure_skip_verify: true
        authorization:
          credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        relabel_configs:
        - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
          action: keep
          regex: kube-system;kube-controller-manager;https-metrics

    remote_write:
    - url: 'http://192.168.205.143:17000/prometheus/v1/write'

My cluster has no matching endpoints, so one has to be created:

apiVersion: v1
kind: Service
metadata:
  annotations:
  labels:
    k8s-app: kube-controller-manager
  name: kube-controller-manager
  namespace: kube-system
spec:
  clusterIP: None
  ports:
    - name: https-metrics
      port: 10257
      protocol: TCP
      targetPort: 10257
  selector:
    component: kube-controller-manager
  sessionAffinity: None
  type: ClusterIP

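Apply the YAML to Kubernetes, then reload Prometheus with curl -X POST "http://:9090/-/reload", putting the address of the Prometheus Agent Pod in front of the port.

At this point the ControllerManager metrics still cannot be scraped. The bind-address of ControllerManager has to be changed to 0.0.0.0.

After that, the metrics are visible in the Nightingale UI.

The dashboard at https://github.com/flashcatcloud/categraf/blob/main/k8s/cm-dash.json can then be imported.

Metric overview

Metric list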

Metric	Type	Description
workqueue_adds_total	Counter	Number of Adds events handled by the Workqueue.
workqueue_depth	Gauge	Current depth of the Workqueue.
workqueue_queue_duration_seconds_bucket	Histogram	How long tasks stay in the Workqueue.
memory_utilization_byte	Gauge	Memory usage, in bytes.
memory_utilization_ratio	Gauge	Memory utilization = memory usage / memory resource limit, as a percentage.
cpu_utilization_core	Gauge	CPU usage, in cores.
cpu_utilization_ratio	Gauge	CPU utilization = CPU usage / CPU resource limit, as a percentage.
rest_client_requests_total	Counter	Number of HTTP requests, broken down by status code, method, and host.
rest_client_request_duration_seconds_bucket	Histogram	HTTP request latency, broken down by verb and URL.

Queue metrics

Name	PromQL	Description
Workqueue enqueue rate	sum(rate(workqueue_adds_total{job="ack-kube-controller-manager"}[$interval])) by (name)	None
Workqueue depth	sum(rate(workqueue_depth{job="ack-kube-controller-manager"}[$interval])) by (name)	None
Workqueue processing latency	histogram_quantile($quantile, sum(rate(workqueue_queue_duration_seconds_bucket{job="ack-kube-controller-manager"}[5m])) by (name, le))	None

Resource metrics

Name	PromQL	Description
Memory usage	memory_utilization_byte{container="kube-controller-manager"}	Memory usage, in bytes.
CPU usage	cpu_utilization_core{container="kube-controller-manager"}*1000	CPU usage, in millicores.
Memory utilization	memory_utilization_ratio{container="kube-controller-manager"}	Memory utilization, as a percentage.
CPU utilization	cpu_utilization_ratio{container="kube-controller-manager"}	CPU utilization, as a percentage.

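QPS and latency

Name	PromQL	Description
Kube API request QPS (2xx)	sum(rate(rest_client_requests_total{job="ack-kube-controller-manager",code=~"2.."}[$interval])) by (method,code)	HTTP requests sent to kube-apiserver, broken down by method and status code.
Kube API request QPS (3xx)	sum(rate(rest_client_requests_total{job="ack-kube-controller-manager",code=~"3.."}[$interval])) by (method,code)	HTTP requests sent to kube-apiserver, broken down by method and status code.
Kube API request QPS (4xx)	sum(rate(rest_client_requests_total{job="ack-kube-controller-manager",code=~"4.."}[$interval])) by (method,code)	HTTP requests sent to kube-apiserver, broken down by method and status code.
Kube API request QPS (5xx)	sum(rate(rest_client_requests_total{job="ack-kube-controller-manager",code=~"5.."}[$interval])) by (method,code)	HTTP requests sent to kube-apiserver, broken down by method and status code.
Kube API request latency	histogram_quantile($quantile, sum(rate(rest_client_request_duration_seconds_bucket{job="ack-kube-controller-manager"}[$interval])) by (verb,url,le))	Latency of HTTP requests sent to kube-apiserver, broken down by verb and request URL.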

KubeScheduler

Scheduler listens on port 10259, and its metrics are again collected with Prometheus Agent.

Collecting metrics

(1) Edit the Prometheus configuration file

apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-agent-conf
  labels:
    name: prometheus-agent-conf
  namespace: flashcat
data:
  prometheus.yml: |-
    global:
      scrape_interval: 15s
      evaluation_interval: 15s

    scrape_configs:
      - job_name: 'apiserver'
        kubernetes_sd_configs:
        - role: endpoints
        scheme: https
        tls_config:
          insecure_skip_verify: true
        authorization:
          credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        relabel_configs:
        - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
          action: keep
          regex: default;kubernetes;https

      - job_name: 'controller-manager'
        kubernetes_sd_configs:
        - role: endpoints
        scheme: https
        tls_config:
          insecure_skip_verify: true
        authorization:
          credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        relabel_configs:
        - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
          action: keep
          regex: kube-system;kube-controller-manager;https-metrics
      - job_name: 'scheduler'
        kubernetes_sd_configs:
        - role: endpoints
        scheme: https
        tls_config:
          insecure_skip_verify: true
        authorization:
          credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        relabel_configs:
        - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
          action: keep
          regex: kube-system;kube-scheduler;https

    remote_write:
    - url: 'http://192.168.205.143:17000/prometheus/v1/write'

Then configure a Service for the Scheduler.

apiVersion: v1
kind: Service
metadata:
  labels:
    k8s-app: kube-scheduler
  name: kube-scheduler
  namespace: kube-system
spec:
  clusterIP: None
  ports:
    - name: https
      port: 10259
      protocol: TCP
      targetPort: 10259
  selector:
    component: kube-scheduler
  sessionAffinity: None
  type: ClusterIP

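Apply the YAML to Kubernetes, then reload Prometheus with curl -X POST "http://:9090/-/reload", putting the address of the Prometheus Agent Pod in front of the port.

At this point the Scheduler metrics still cannot be scraped. The bind-address of the Scheduler has to be changed to 0.0.0.0.

Once that change is made, the metrics are visible in the Nightingale UI.

Import the dashboard (https://github.com/flashcatcloud/categraf/blob/main/k8s/scheduler-dash.json).

Metric overview

Metric list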

Metric	Type	Description
scheduler_scheduler_cache_size	Gauge	Number of Nodes, Pods, and AssumedPods in the scheduler cache.
scheduler_pending_pods	Gauge	Number of pending Pods. Queue types: unschedulable (Pods that cannot be scheduled), backoff (Pods in backoffQ), active (Pods in activeQ).
scheduler_pod_scheduling_attempts_bucket	Histogram	Number of attempts the scheduler needed to schedule a Pod successfully. Bucket thresholds: 1, 2, 4, 8, 16.
memory_utilization_byte	Gauge	Memory usage, in bytes.
memory_utilization_ratio	Gauge	Memory utilization = memory usage / memory resource limit, as a percentage.
cpu_utilization_core	Gauge	CPU usage, in cores.
cpu_utilization_ratio	Gauge	CPU utilization = CPU usage / CPU resource limit, as a percentage.
rest_client_requests_total	Counter	Number of HTTP requests, broken down by status code, method, and host.
rest_client_request_duration_seconds_bucket	Histogram	HTTP request latency, broken down by verb and URL.

Basic metrics

Name	PromQL	Description
Scheduler cluster statistics (nodes)	scheduler_scheduler_cache_size{job="ack-scheduler",type="nodes"}	Number of Nodes in the scheduler cache.
Scheduler cluster statistics (pods)	scheduler_scheduler_cache_size{job="ack-scheduler",type="pods"}	Number of Pods in the scheduler cache.
Scheduler cluster statistics (assumed pods)	scheduler_scheduler_cache_size{job="ack-scheduler",type="assumed_pods"}	Number of AssumedPods in the scheduler cache.
Scheduler Pending Pods	scheduler_pending_pods{job="ack-scheduler"}	Number of pending Pods. Queue types: unschedulable (Pods that cannot be scheduled), backoff (Pods in backoffQ), active (Pods in activeQ).
Scheduling attempts per successfully scheduled Pod	histogram_quantile($quantile, sum(rate(scheduler_pod_scheduling_attempts_bucket{job="ack-scheduler"}[$interval])) by (pod, le))	Number of attempts the scheduler made to schedule a Pod. Bucket thresholds: 1, 2, 4, 8, 16.

Resource metrics

Name	PromQL	Description
Memory usage	memory_utilization_byte{container="kube-scheduler"}	Memory usage, in bytes.
CPU usage	cpu_utilization_core{container="kube-scheduler"}*1000	CPU usage, in millicores.
Memory utilization	memory_utilization_ratio{container="kube-scheduler"}	Memory utilization, as a percentage.
CPU utilization	cpu_utilization_ratio{container="kube-scheduler"}	CPU utilization, as a percentage.

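QPS and latency

Name	PromQL	Description
Kube API request QPS (2xx)	sum(rate(rest_client_requests_total{job="ack-scheduler",code=~"2.."}[$interval])) by (method,code)	HTTP requests sent by the scheduler to kube-apiserver, broken down by method and status code.
Kube API request QPS (3xx)	sum(rate(rest_client_requests_total{job="ack-scheduler",code=~"3.."}[$interval])) by (method,code)	HTTP requests sent by the scheduler to kube-apiserver, broken down by method and status code.
Kube API request QPS (4xx)	sum(rate(rest_client_requests_total{job="ack-scheduler",code=~"4.."}[$interval])) by (method,code)	HTTP requests sent by the scheduler to kube-apiserver, broken down by method and status code.
Kube API request QPS (5xx)	sum(rate(rest_client_requests_total{job="ack-scheduler",code=~"5.."}[$interval])) by (method,code)	HTTP requests sent by the scheduler to kube-apiserver, broken down by method and status code.
Kube API request latency	histogram_quantile($quantile, sum(rate(rest_client_request_duration_seconds_bucket{job="ack-scheduler"}[$interval])) by (verb,url,le))	Latency of HTTP requests sent by the scheduler to kube-apiserver, broken down by verb and request URL.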

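Etcd

Etcd is the storage center of Kubernetes; all resource information is stored in it. It exposes monitoring metrics on port 2381.

Collecting metrics

My Etcd runs as a static Pod in the Kubernetes cluster, so Prometheus Agent is again used to collect its metrics.

(1) Configure the Prometheus scrape configuration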

apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-agent-conf
  labels:
    name: prometheus-agent-conf
  namespace: flashcat
data:
  prometheus.yml: |-
    global:
      scrape_interval: 15s
      evaluation_interval: 15s

    scrape_configs:
      - job_name: 'apiserver'
        kubernetes_sd_configs:
        - role: endpoints
        scheme: https
        tls_config:
          insecure_skip_verify: true
        authorization:
          credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        relabel_configs:
        - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
          action: keep
          regex: default;kubernetes;https

      - job_name: 'controller-manager'
        kubernetes_sd_configs:
        - role: endpoints
        scheme: https
        tls_config:
          insecure_skip_verify: true
        authorization:
          credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        relabel_configs:
        - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
          action: keep
          regex: kube-system;kube-controller-manager;https-metrics
      - job_name: 'scheduler'
        kubernetes_sd_configs:
        - role: endpoints
        scheme: https
        tls_config:
          insecure_skip_verify: true
        authorization:
          credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        relabel_configs:
        - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
          action: keep
          regex: kube-system;kube-scheduler;https
      - job_name: 'etcd'
        kubernetes_sd_configs:
        - role: endpoints
        scheme: http
        relabel_configs:
        - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
          action: keep
          regex: kube-system;etcd;http
    remote_write:
    - url: 'http://192.168.205.143:17000/prometheus/v1/write'

Then add a Service for Etcd.

apiVersion: v1
kind: Service
metadata:
  namespace: kube-system
  name: etcd
  labels:
    k8s-app: etcd
spec:
  selector:
    component: etcd
  type: ClusterIP
  clusterIP: None
  ports:
    - name: http
      port: 2381
      targetPort: 2381
      protocol: TCP

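Apply the YAML files and restart Prometheus. If no metrics come through, change Etcd's listen-metrics-urls setting so that it listens on 0.0.0.0 (for example http://0.0.0.0:2381).

Import the dashboard (https://github.com/flashcatcloud/categraf/blob/main/k8s/etcd-dash.json).

Metric overview

Metric list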

Metric	Type	Description
cpu_utilization_core	Gauge	CPU usage, in cores.
cpu_utilization_ratio	Gauge	CPU utilization = CPU usage / CPU resource limit, as a percentage.
etcd_server_has_leader	Gauge	Whether the etcd member has a leader. 1: it has a leader. 0: it has no leader.
etcd_server_is_leader	Gauge	Whether the etcd member is the leader. 1: yes. 0: no.
etcd_server_leader_changes_seen_total	Counter	Number of leader changes the etcd member has seen over a period of time.
etcd_mvcc_db_total_size_in_bytes	Gauge	Total db size of the etcd member.
etcd_mvcc_db_total_size_in_use_in_bytes	Gauge	db size actually in use on the etcd member.
etcd_disk_backend_commit_duration_seconds_bucket	Histogram	etcd backend commit latency. Buckets: [0.001 0.002 0.004 0.008 0.016 0.032 0.064 0.128 0.256 0.512 1.024 2.048 4.096 8.192].
etcd_debugging_mvcc_keys_total	Gauge	Total number of etcd keys.
etcd_server_proposals_committed_total	Gauge	Total number of committed raft proposals.
etcd_server_proposals_applied_total	Gauge	Total number of applied raft proposals.
etcd_server_proposals_pending	Gauge	Number of raft proposals waiting in the queue.
etcd_server_proposals_failed_total	Counter	Number of failed raft proposals.
memory_utilization_byte	Gauge	Memory usage, in bytes.
memory_utilization_ratio	Gauge	Memory utilization = memory usage / memory resource limit, as a percentage.

Basic metrics

Name	PromQL	Description
etcd liveness (has leader)	etcd_server_has_leader	Whether the etcd members are alive; the normal value is 3.
etcd liveness (is leader)	etcd_server_is_leader == 1	Whether the etcd member is the leader. Under normal conditions exactly one member must be the leader.
Leader changes in the past day	changes(etcd_server_leader_changes_seen_total{job="etcd"}[1d])	Number of leader changes in the etcd cluster over the past day.
Memory usage	memory_utilization_byte{container="etcd"}	Memory usage, in bytes.
CPU usage	cpu_utilization_core{container="etcd"}*1000	CPU usage, in millicores.
Memory utilization	memory_utilization_ratio{container="etcd"}	Memory utilization, as a percentage.
CPU utilization	cpu_utilization_ratio{container="etcd"}	CPU utilization, as a percentage.
Disk size (total)	etcd_mvcc_db_total_size_in_bytes	Total size of the etcd backend db.
Disk size (in use)	etcd_mvcc_db_total_size_in_use_in_bytes	Size of the etcd backend db actually in use.
Total kv count	etcd_debugging_mvcc_keys_total	Total number of kv pairs in the etcd cluster.
Backend commit latency	histogram_quantile(0.99, sum(rate(etcd_disk_backend_commit_duration_seconds_bucket{job="etcd"}[5m])) by (instance, le))	db commit latency.
Raft proposals (failed)	rate(etcd_server_proposals_failed_total{job="etcd"}[1m])	Rate of failed raft proposals, over one minute.
Raft proposals (pending)	etcd_server_proposals_pending{job="etcd"}	Total number of pending raft proposals.
Raft proposals (commit-apply gap)	etcd_server_proposals_committed_total{job="etcd"} - etcd_server_proposals_applied_total{job="etcd"}	Difference between committed and applied proposals.

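kubelet

kubelet is the main component on worker nodes. It listens on two ports, 10248 and 10250. Port 10248 is the health check port. Port 10250 is the default serving port, and it exposes metrics through its /metrics endpoint.

Collecting metrics

The kubelet metrics are again collected with Prometheus Agent.

(1) Modify the Prometheus configuration file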

apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-agent-conf
  labels:
    name: prometheus-agent-conf
  namespace: flashcat
data:
  prometheus.yml: |-
    global:
      scrape_interval: 15s
      evaluation_interval: 15s

    scrape_configs:
      - job_name: 'apiserver'
        kubernetes_sd_configs:
        - role: endpoints
        scheme: https
        tls_config:
          insecure_skip_verify: true
        authorization:
          credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        relabel_configs:
        - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
          action: keep
          regex: default;kubernetes;https

      - job_name: 'controller-manager'
        kubernetes_sd_configs:
        - role: endpoints
        scheme: https
        tls_config:
          insecure_skip_verify: true
        authorization:
          credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        relabel_configs:
        - source_labels: [__meta_kubernetes_namespace, __m

Reposted from: http://www.dlmjj.cn/article/dpsopgi.html


                                            
                                                