Author: 清脆飞鸟, a blogger on 靠谱客 (Kaopuke). This article, collected during recent development work, covers monitoring Hadoop with Prometheus and is shared here for reference.

Overview

Notes:

  1. Hadoop is monitored using jmx_prometheus_javaagent

  2. Monitor the NameNode

  3. Monitor the DataNode

  4. Monitor the ResourceManager

  5. Monitor the NodeManager

1. Download the jmx_exporter jar

Download link:

  • Version used in this guide: 0.15.0

 https://repo1.maven.org/maven2/io/prometheus/jmx/jmx_prometheus_javaagent/
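
The download step can be scripted. The following is a minimal sketch, assuming the directory above follows the standard Maven Central layout (`<version>/<artifact>-<version>.jar`). The helper names `agent_jar_url` and `download_agent` are hypothetical and not part of jmx_exporter.

```shell
# Build the download URL for a given agent version.
# Assumption: standard Maven layout under the directory linked above.
agent_jar_url() {
  printf '%s/%s/jmx_prometheus_javaagent-%s.jar\n' \
    "https://repo1.maven.org/maven2/io/prometheus/jmx/jmx_prometheus_javaagent" \
    "$1" "$1"
}

# Download the jar for version $1 into directory $2 (hypothetical helper).
# -f fails on HTTP errors, -L follows redirects, -o sets the output file.
download_agent() {
  curl -fL -o "$2/jmx_prometheus_javaagent-$1.jar" "$(agent_jar_url "$1")"
}

# Usage: download_agent 0.15.0 /root
```

Placing the jar under /root matches the path used later in the -javaagent options.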

2. Create the JMX monitoring configuration files

  • namenode.yaml

  • datanode.yaml

  • The configuration files can be placed anywhere

  • The port must not already be in use

namenode.yaml

 [root@duanjl-hadoop ~]# cat /root/duanjl/namenode.yaml
 ---
 startDelaySeconds: 0
 #hostPort: 127.0.0.1:1234
 jmxUrl: service:jmx:rmi:///jndi/rmi://127.0.0.1:1234/jmxrmi
 ssl: false
 lowercaseOutputName: false
 lowercaseOutputLabelNames: false

datanode.yaml

 [root@duanjl-hadoop ~]# cat /root/duanjl/datanode.yaml
 ---
 startDelaySeconds: 0
 #hostPort: 127.0.0.1:1235
 jmxUrl: service:jmx:rmi:///jndi/rmi://127.0.0.1:1235/jmxrmi
 ssl: false
 lowercaseOutputName: false
 lowercaseOutputLabelNames: false

2.1 Create the YAML files for monitoring the YARN components

  • resourcemanager.yaml

  • nodemanager.yaml

  • The configuration files can be placed anywhere

  • The port must not already be in use

resourcemanager.yaml

 startDelaySeconds: 0
 #hostPort: 100.86.13.73:1236
 jmxUrl: service:jmx:rmi:///jndi/rmi://127.0.0.1:1236/jmxrmi
 ssl: false
 lowercaseOutputName: false
 lowercaseOutputLabelNames: false

nodemanager.yaml

 startDelaySeconds: 0
 #hostPort: 100.86.13.73:1237
 jmxUrl: service:jmx:rmi:///jndi/rmi://127.0.0.1:1237/jmxrmi
 ssl: false
 lowercaseOutputName: false
 lowercaseOutputLabelNames: false

Name | Description
--- | ---
startDelaySeconds | Start delay before serving requests. Any requests within the delay period will result in an empty metrics set.
hostPort | The host and port to connect to via remote JMX. If neither this nor jmxUrl is specified, will talk to the local JVM.
username | The username to be used in remote JMX password authentication.
password | The password to be used in remote JMX password authentication.
jmxUrl | A full JMX URL to connect to. Should not be specified if hostPort is.
ssl | Whether JMX connection should be done over SSL. To configure certificates you have to set following system properties: -Djavax.net.ssl.keyStore=/home/user/.keystore -Djavax.net.ssl.keyStorePassword=changeit -Djavax.net.ssl.trustStore=/home/user/.truststore -Djavax.net.ssl.trustStorePassword=changeit
lowercaseOutputName | Lowercase the output metric name. Applies to default format and name. Defaults to false.
lowercaseOutputLabelNames | Lowercase the output metric label names. Applies to default format and labels. Defaults to false.
whitelistObjectNames | A list of ObjectNames to query. Defaults to all mBeans.
blacklistObjectNames | A list of ObjectNames to not query. Takes precedence over whitelistObjectNames. Defaults to none.
rules | A list of rules to apply in order, processing stops at the first matching rule. Attributes that aren't matched aren't collected. If not specified, defaults to collecting everything in the default format.
pattern | Regex pattern to match against each bean attribute. The pattern is not anchored. Capture groups can be used in other options. Defaults to matching everything.
attrNameSnakeCase | Converts the attribute name to snake case. This is seen in the names matched by the pattern and the default format. For example, anAttrName to an_attr_name. Defaults to false.
name | The metric name to set. Capture groups from the pattern can be used. If not specified, the default format will be used. If it evaluates to empty, processing of this attribute stops with no output.
value | Value for the metric. Static values and capture groups from the pattern can be used. If not specified the scraped mBean value will be used.
valueFactor | Optional number that value (or the scraped mBean value if value is not specified) is multiplied by, mainly used to convert mBean values from milliseconds to seconds.
labels | A map of label name to label value pairs. Capture groups from pattern can be used in each. name must be set to use this. Empty names and values are ignored. If not specified and the default format is not being used, no labels are set.
help | Help text for the metric. Capture groups from pattern can be used. name must be set to use this. Defaults to the mBean attribute description and the full name of the attribute.
type | The type of the metric, can be GAUGE, COUNTER or UNTYPED. name must be set to use this. Defaults to UNTYPED.
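
To illustrate how rules, pattern, name and type fit together, here is a hedged sketch that writes a variant configuration with one explicit rule. It is not part of the original setup. The bean `Hadoop:service=NameNode,name=FSNamesystem` and its attribute `CapacityTotal` are assumed to exist on your NameNode, the metric name is made up for illustration, and `write_rules_example` is a hypothetical helper.

```shell
# Write an example config with explicit rules to the file given as $1.
# Neither hostPort nor jmxUrl is set here, so per the table above the
# exporter talks to the local JVM.
write_rules_example() {
  cat > "$1" <<'EOF'
---
startDelaySeconds: 0
rules:
  # Assumed bean and attribute; the trailing ':' keeps the unanchored
  # regex from also matching attributes that merely start with this name.
  - pattern: 'Hadoop<service=NameNode, name=FSNamesystem><>CapacityTotal:'
    name: hadoop_namenode_capacity_total
    type: GAUGE
  # Catch-all: without it, unmatched attributes would not be collected.
  - pattern: '.*'
EOF
}

# Usage: write_rules_example /tmp/namenode-rules.yaml
```

Because processing stops at the first matching rule, the specific rule has to come before the catch-all.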

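References:

踩坑记录(四) (Pitfall notes, part 4), k.o.b.e-24's blog on CSDN

【集群监控】JMX exporter+Prometheus+Grafana监控Hadoop集群 (Cluster monitoring: monitoring a Hadoop cluster with JMX exporter + Prometheus + Grafana), Java架构师必看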

3. Modify hadoop-env.sh

Path: $HADOOP_HOME/etc/hadoop/hadoop-env.sh

On the NameNode node, add:

  • Original configuration:

 export HADOOP_NAMENODE_OPTS="-Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,RFAS} -Dhdfs.audit.logger=${HDFS_AUDIT_LOGGER:-INFO,NullAppender} $HADOOP_NAMENODE_OPTS"
  • Modified configuration

  • The port must not already be in use

 export HADOOP_NAMENODE_OPTS="-Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,RFAS} -Dhdfs.audit.logger=${HDFS_AUDIT_LOGGER:-INFO,NullAppender} -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.local.only=false   -Dcom.sun.management.jmxremote.port=1234 -javaagent:/root/jmx_prometheus_javaagent-0.15.0.jar=19200:/root/duanjl/namenode.yaml"

On the DataNode nodes, add:

  • Original configuration

 export HADOOP_DATANODE_OPTS="-Dhadoop.security.logger=ERROR,RFAS $HADOOP_DATANODE_OPTS"
  • Modified configuration

  • The port must not already be in use

 export HADOOP_DATANODE_OPTS="-Dhadoop.security.logger=ERROR,RFAS -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.local.only=false   -Dcom.sun.management.jmxremote.port=1235 -javaagent:/root/jmx_prometheus_javaagent-0.15.0.jar=19300:/root/duanjl/datanode.yaml"
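
The added options follow one pattern: four JMX remote flags plus `-javaagent:<jar>=<http port>:<config file>`. The sketch below builds that string from its parts so the ports are easier to keep distinct per daemon. `jmx_agent_opts` is a hypothetical helper, and the default jar path is simply the one used in this article.

```shell
# jmx_agent_opts JMX_PORT HTTP_PORT CONFIG_FILE [AGENT_JAR]
# Prints the JVM options used in this article for one daemon.
jmx_agent_opts() {
  agent_jar="${4:-/root/jmx_prometheus_javaagent-0.15.0.jar}"
  # printf repeats the format for every argument, joining them with spaces.
  printf '%s ' \
    "-Dcom.sun.management.jmxremote.authenticate=false" \
    "-Dcom.sun.management.jmxremote.ssl=false" \
    "-Dcom.sun.management.jmxremote.local.only=false" \
    "-Dcom.sun.management.jmxremote.port=$1"
  # Agent argument format: <jar>=<HTTP port for metrics>:<config file>
  printf '%s\n' "-javaagent:${agent_jar}=$2:$3"
}

# Usage (NameNode values from above):
#   jmx_agent_opts 1234 19200 /root/duanjl/namenode.yaml
```

The first port is the JMX remote port referenced by jmxUrl in the YAML file; the second is the HTTP port that Prometheus scrapes.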

4. Modify yarn-env.sh

  • Path: $HADOOP_HOME/etc/hadoop/yarn-env.sh

  • Run on all nodes

  • Add the following configuration

  • The port must not already be in use

 export YARN_NODE_JMX_OPTS="-Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.local.only=false   -Dcom.sun.management.jmxremote.port=1237 -javaagent:/root/jmx_prometheus_javaagent-0.15.0.jar=19400:/root/duanjl/nodemanager.yaml"
 export YARN_RE_JMX_OPTS="-Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.local.only=false   -Dcom.sun.management.jmxremote.port=1236 -javaagent:/root/jmx_prometheus_javaagent-0.15.0.jar=19500:/root/duanjl/resourcemanager.yaml"

5. Modify $HADOOP_HOME/bin/yarn

  • Run on all nodes

  • Find the relevant locations and add the variable names

  • Search for $YARN_RESOURCEMANAGER_OPTS

 # Append $YARN_RE_JMX_OPTS after $YARN_RESOURCEMANAGER_OPTS
 # For example:
 elif [ "$COMMAND" = "resourcemanager" ] ; then
   CLASSPATH=${CLASSPATH}:$YARN_CONF_DIR/rm-config/log4j.properties
   CLASSPATH=${CLASSPATH}:"$HADOOP_YARN_HOME/$YARN_DIR/timelineservice/*"
   CLASSPATH=${CLASSPATH}:"$HADOOP_YARN_HOME/$YARN_DIR/timelineservice/lib/*"
   CLASS='org.apache.hadoop.yarn.server.resourcemanager.ResourceManager'
   YARN_OPTS="$YARN_OPTS $YARN_RESOURCEMANAGER_OPTS $YARN_RE_JMX_OPTS"
   if [ "$YARN_RESOURCEMANAGER_HEAPSIZE" != "" ]; then
     JAVA_HEAP_MAX="-Xmx""$YARN_RESOURCEMANAGER_HEAPSIZE""m"
   fi
  • Search for $YARN_NODEMANAGER_OPTS

 # Append $YARN_NODE_JMX_OPTS after $YARN_NODEMANAGER_OPTS
 # For example:
 elif [ "$COMMAND" = "nodemanager" ] ; then
   CLASSPATH=${CLASSPATH}:$YARN_CONF_DIR/nm-config/log4j.properties
   CLASSPATH=${CLASSPATH}:"$HADOOP_YARN_HOME/$YARN_DIR/timelineservice/*"
   CLASSPATH=${CLASSPATH}:"$HADOOP_YARN_HOME/$YARN_DIR/timelineservice/lib/*"
   CLASS='org.apache.hadoop.yarn.server.nodemanager.NodeManager'
   YARN_OPTS="$YARN_OPTS -server $YARN_NODEMANAGER_OPTS $YARN_NODE_JMX_OPTS"
   if [ "$YARN_NODEMANAGER_HEAPSIZE" != "" ]; then
     JAVA_HEAP_MAX="-Xmx""$YARN_NODEMANAGER_HEAPSIZE""m"
   fi

6. Restart Hadoop

  • Stop command

 stop-all.sh
  • Start command

 start-all.sh

7. Verify access

Hadoop_NameNode: http://100.86.13.73:19200/

Hadoop_DataNode: http://100.86.13.73:19300/

Hadoop_NodeManager: http://100.86.13.73:19400/

Hadoop_ResourceManager: http://100.86.13.73:19500/
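
The same check can be done from the command line. This is a rough sketch, assuming the agent also answers on the /metrics path (Prometheus' default scrape path) and that the output contains the jmx_exporter_build_info metric named in the error section below. `metrics_url` and `check_exporters` are hypothetical helpers.

```shell
# metrics_url HOST PORT: URL of one agent's metrics endpoint
metrics_url() {
  printf 'http://%s:%s/metrics\n' "$1" "$2"
}

# check_exporters HOST: probe the four HTTP ports used in this article
check_exporters() {
  for port in 19200 19300 19400 19500; do
    # -f fails on HTTP errors, -sS stays quiet except for errors
    if curl -fsS --max-time 5 "$(metrics_url "$1" "$port")" \
        | grep -q 'jmx_exporter_build_info'; then
      echo "port $port: OK"
    else
      echo "port $port: no metrics"
    fi
  done
}

# Usage: check_exporters 100.86.13.73
```

A port that reports "no metrics" usually means the daemon was not restarted with the new options or the port is taken by another process.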

8. Error summary

8.1 At most one of hostPort and jmxUrl must be provided

Solution: check the two YAML files below; each may set only one of hostPort and jmxUrl:

  • namenode.yaml

  • datanode.yaml
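
A quick way to spot this condition is to count the uncommented hostPort and jmxUrl keys in each file. The following is a minimal sketch; `count_jmx_targets` and `check_jmx_config` are hypothetical helpers, not part of jmx_exporter.

```shell
# count_jmx_targets FILE: number of uncommented hostPort/jmxUrl keys.
# Lines such as "#hostPort: ..." are skipped because '#' precedes the key.
count_jmx_targets() {
  grep -cE '^[[:space:]]*(hostPort|jmxUrl)[[:space:]]*:' "$1"
}

# check_jmx_config FILE: report whether the file would trigger the error.
check_jmx_config() {
  # grep -c prints 0 but exits non-zero when nothing matches; ignore that.
  n=$(count_jmx_targets "$1" || true)
  if [ "$n" -gt 1 ]; then
    echo "$1: both hostPort and jmxUrl are set; keep only one"
    return 1
  fi
  echo "$1: OK"
}

# Usage: check_jmx_config /root/duanjl/namenode.yaml
```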

8.2 Collector already registered that provides name: jmx_exporter_build_info

Solution: see "HBase with JMX exporter BindException" on Stack Overflow
