Presto Installation and Deployment: A Detailed Guide


I am 清新抽屉, a blogger on Kaopuke (靠谱客). This article explains in detail how to install and deploy Presto; I hope it serves as a useful reference.

I. Official Documentation

https://prestodb.io/docs/current/installation.html

II. Prerequisites

Presto requires Linux and a 64-bit Java 8 runtime (the official documentation asks for Java 8 Update 151 or later).

Install the JDK yourself; that step is not covered here.
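A quick way to confirm the Java prerequisite is to inspect the banner printed by java -version. The sketch below assumes that the banner contains version "1.8 and 64-Bit, as typical Oracle JDK and OpenJDK 8 builds print; other vendors may word it differently, the update number (8u151+) is not checked, and the function name is hypothetical.

```shell
#!/bin/sh
# Sketch: does a `java -version` banner describe a 64-bit Java 8 JVM?
# Assumption: the banner contains 'version "1.8' and '64-Bit' (typical for
# Oracle JDK / OpenJDK 8); the update number is NOT checked.
is_java8_64bit() {
  case $1 in
    *'version "1.8'*) ;;          # Java 8 reports itself as 1.8.x
    *) return 1 ;;
  esac
  case $1 in
    *64-Bit*) return 0 ;;         # e.g. "OpenJDK 64-Bit Server VM"
    *) return 1 ;;
  esac
}

# Usage (java -version writes to stderr):
#   is_java8_64bit "$(java -version 2>&1)" && echo "Java OK" || echo "need 64-bit Java 8"
```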

III. Single-Node Deployment

1. Download Presto

Download the packages from: https://prestodb.io/download.html

As of 2020-08-03, the latest version was 0.238.2, about 800 MB in total:

presto-server-0.238.2.tar.gz
presto-cli-0.238.2-executable.jar
presto-jdbc-0.238.2.jar

2. Upload and Extract

1. All applications are installed under /data (a personal convention). Create a presto directory there:

mkdir -p /data/presto

2. Upload the Presto files into that directory and extract the server package:

cd /data/presto
tar -xzvf presto-server-0.238.2.tar.gz

3. Configure Presto

Create an etc directory inside the installation directory and put the following configuration in it:

Node Properties: environment configuration specific to each node
JVM Config: command-line options for the JVM
Config Properties: configuration for the Presto server
Log Properties: log level configuration
Catalog Properties: connector configuration (data sources)

3.1 Node Properties

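The node properties file, etc/node.properties, contains configuration specific to each node. A node is a single installed instance of Presto on a machine. This file is typically created by the deployment system when Presto is first installed. At a minimum, etc/node.properties contains the following: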

node.environment=production
node.id=ffffffff-ffff-ffff-ffff-ffffffffffff
node.data-dir=/var/presto/data

The properties are described below:

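node.environment: the name of the environment (cluster). All Presto nodes in the same cluster must have the same environment name; this is also the environment shown in the upper-right corner of the web console.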

node.id: the unique identifier of this Presto node. Every node.id must be unique, and it must stay the same across restarts and upgrades of Presto. If several Presto instances are installed on one machine, each of them must have its own unique node.id.

node.data-dir: the location (a filesystem path) of the data directory. Presto stores logs and other data here.
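Since a node will not join the cluster correctly without these three keys, a small check before startup can save a restart cycle. A minimal sketch, assuming simple key=value lines without spaces around the equals sign; the function name is hypothetical and values are only checked for being non-empty.

```shell
#!/bin/sh
# Sketch: verify that a node.properties file sets the three required keys.
# Hypothetical helper; assumes "key=value" lines with no spaces around "=".
check_node_properties() {
  local file key status
  file=$1
  status=0
  for key in node.environment node.id node.data-dir; do
    # awk exits 0 only if a line "key=<non-empty value>" exists
    if ! awk -F= -v k="$key" '$1 == k && $2 != "" {found = 1} END {exit !found}' "$file"; then
      echo "missing or empty: $key" >&2
      status=1
    fi
  done
  return "$status"
}

# Usage: check_node_properties etc/node.properties && echo "node.properties OK"
```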

Actual configuration:

cd /data/presto/presto-server-0.238.2
mkdir -p etc
cd etc
vi node.properties

# enter the following configuration
node.environment=test_bob
node.id=bigdata_test1
node.data-dir=/data/presto/presto-server-0.238.2/data

3.2 JVM Config

The JVM config file, etc/jvm.config, contains the list of command-line options used when launching the JVM.

A typical etc/jvm.config looks like this:

-server
-Xmx16G
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:+UseGCOverheadLimit
-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
-XX:+ExitOnOutOfMemoryError
-XX:ReservedCodeCacheSize=256M

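Because an OutOfMemoryError typically leaves the JVM in an inconsistent state, the usual practice is to write a heap dump (for debugging) and then forcibly terminate the process. On older JDKs, add -XX:OnOutOfMemoryError=kill -9 %p; on newer ones (JDK 8u92 and later), use -XX:+ExitOnOutOfMemoryError.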

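Size the heap according to the machine's memory, which you can check with free -g; it should be neither too large nor too small. The test machine here has only 32 GB of RAM, hence the 16 GB heap (-Xmx16G) in the configuration above.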
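The sizing advice above can be turned into a rough starting point. This sketch suggests an -Xmx value as a percentage of physical memory; the 50% default is an assumption chosen only because it reproduces the example (32 GB machine, -Xmx16G), not an official Presto recommendation, and both function names are hypothetical.

```shell
#!/bin/sh
# Sketch: suggest -Xmx as a percentage of physical RAM.
# Assumption: 50% by default, mirroring the example above (32 GB -> -Xmx16G).
suggest_xmx() {
  local total_gb percent
  total_gb=$1
  percent=${2:-50}
  # printf with %s, because a format string starting with "-" may be read as an option
  printf '%s\n' "-Xmx$(( total_gb * percent / 100 ))G"
}

# Total physical memory in GB: the "total" column of the Mem: row of `free -g`
total_mem_gb() {
  free -g | awk '/^Mem:/ {print $2}'
}

# Usage: suggest_xmx 32                  # prints -Xmx16G
#        suggest_xmx "$(total_mem_gb)"   # based on this machine's RAM
```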

Actual configuration:

cd /data/presto/presto-server-0.238.2/etc
vi jvm.config

# enter the following configuration
-server
-Xmx16G
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:+UseGCOverheadLimit
-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
-XX:+ExitOnOutOfMemoryError
-XX:ReservedCodeCacheSize=256M

3.3 Config Properties

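The Presto config file, etc/config.properties, contains the configuration for the Presto server. Every Presto server can act as both a coordinator and a worker, but on large clusters it is recommended, for performance reasons, to dedicate a single machine as the coordinator (the scheduling node).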

At a minimum, a coordinator's configuration contains the following:

coordinator=true
node-scheduler.include-coordinator=false
http-server.http.port=8080
query.max-memory=50GB
query.max-memory-per-node=1GB
query.max-total-memory-per-node=2GB
discovery-server.enabled=true
discovery.uri=http://example.net:8080

At a minimum, a worker's configuration contains the following:

coordinator=false
http-server.http.port=8080
query.max-memory=50GB
query.max-memory-per-node=1GB
query.max-total-memory-per-node=2GB
discovery.uri=http://example.net:8080

For a single-node test, this machine acts as both the coordinator and a worker. The configuration is:

coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=8080
query.max-memory=5GB
query.max-memory-per-node=1GB
query.max-total-memory-per-node=2GB
discovery-server.enabled=true
discovery.uri=http://example.net:8080

What each property means:

coordinator: whether this Presto instance acts as a coordinator, i.e. accepts queries from clients and manages the execution of each query.

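node-scheduler.include-coordinator: whether work may be scheduled on the coordinator, i.e. whether the coordinator also acts as a worker. On large clusters, having one Presto server act as both coordinator and worker hurts query performance: if the server also works as a worker, most of its resources are taken up by worker tasks, leaving too little for the critical tasks of scheduling, managing, and monitoring query execution.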

http-server.http.port: the port of the HTTP server. Presto uses HTTP for all communication, both internal and external.

query.max-memory: the maximum amount of distributed (cluster-wide) memory a single query may use.

query.max-memory-per-node: the maximum amount of user memory a query may use on any one node.

query.max-total-memory-per-node: the maximum amount of user plus system memory a query may use on any one node.

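discovery-server.enabled: Presto uses the Discovery service to find all the nodes in the cluster; every Presto instance registers itself with the Discovery service on startup. To simplify deployment and avoid running an additional service, the Presto coordinator can run an embedded Discovery service, which shares the HTTP server with Presto and therefore uses the same port.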

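discovery.uri: the URI of the Discovery server. Because the coordinator's embedded Discovery service is enabled, this is the URI of the Presto coordinator. Replace example.net:8080 with the host and port of your environment. Note: this URI must not end with a slash (/).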
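The memory properties and the discovery.uri rule above are easy to get wrong, and the error only shows up at startup. A minimal sketch of two checks, under stated assumptions: memory.heap-headroom-per-node is left at its default of 30% of -Xmx, and (as we understand Presto's startup validation) max-total-memory-per-node plus that headroom must fit in the heap. The function names are hypothetical; verify the rules against the documentation for your version.

```shell
#!/bin/sh
# Sketch of two config.properties sanity checks (hypothetical helpers).

# check_memory_config XMX_GB MAX_MEMORY_PER_NODE_GB MAX_TOTAL_MEMORY_PER_NODE_GB
# Assumption: heap headroom = 30% of -Xmx (the default), and
# max-total-memory-per-node + headroom must not exceed the heap.
check_memory_config() {
  local xmx_mb per_node_mb total_mb headroom_mb
  xmx_mb=$(( $1 * 1024 )); per_node_mb=$(( $2 * 1024 )); total_mb=$(( $3 * 1024 ))
  headroom_mb=$(( xmx_mb * 30 / 100 ))
  # user memory is part of user+system memory, so it cannot logically be larger
  if [ "$per_node_mb" -gt "$total_mb" ]; then
    echo "max-memory-per-node exceeds max-total-memory-per-node" >&2; return 1
  fi
  if [ $(( total_mb + headroom_mb )) -gt "$xmx_mb" ]; then
    echo "per-node total ${total_mb}MB + headroom ${headroom_mb}MB > heap ${xmx_mb}MB" >&2; return 1
  fi
  echo "OK: $(( xmx_mb - headroom_mb - total_mb ))MB spare"
}

# check_discovery_uri CONFIG_FILE: discovery.uri must be set and must not end with "/"
check_discovery_uri() {
  local uri
  uri=$(awk -F= '$1 == "discovery.uri" {print $2}' "$1")
  if [ -z "$uri" ]; then echo "discovery.uri is not set" >&2; return 1; fi
  case $uri in
    */) echo "discovery.uri must not end with '/': $uri" >&2; return 1 ;;
  esac
  echo "$uri"
}

# Usage: check_memory_config 16 1 2 && check_discovery_uri etc/config.properties
```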

Actual configuration:

vi config.properties

# coordinator node configuration
coordinator=true
node-scheduler.include-coordinator=false
http-server.http.port=9666
query.max-memory=30GB
query.max-memory-per-node=3GB
query.max-cpu-time=4h
discovery-server.enabled=true
discovery.uri=http://yy-t-bigdata1.niwodai.com:9666
# optional settings
scheduler.http-client.request-timeout=10m
scheduler.http-client.idle-timeout=10m
node-manager.http-client.request-timeout=10m
node-manager.http-client.idle-timeout=10m
memoryManager.http-client.request-timeout=10m
memoryManager.http-client.idle-timeout=10m
task.max-worker-threads=32


# worker node configuration (goes in config.properties on each worker, not in the coordinator's file)
coordinator=false
http-server.http.port=9666
query.max-memory=10GB
query.max-memory-per-node=1GB
discovery.uri=http://yy-t-bigdata1.niwodai.com:9666

3.4 Log Properties

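The optional log configuration file, etc/log.properties, sets the minimum log level for named logger hierarchies (organized by dotted names, like Java packages). There are four levels: DEBUG, INFO, WARN, and ERROR. For example: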

com.facebook.presto=INFO

3.5 Catalog Properties

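Presto accesses data through connectors, which are mounted in catalogs. A connector provides all of the schemas and tables inside a catalog. For example, the Hive connector maps each Hive database to a schema, so if the Hive connector is mounted as the hive catalog and Hive contains a table clicks in the database web, that table is accessed in Presto as hive.web.clicks.
Catalogs are registered by creating catalog properties files in the etc/catalog directory.
For example:
To create a connector for the JMX data source, create etc/catalog/jmx.properties with the following content, which mounts the jmx connector as the jmx catalog: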
connector.name=jmx

To create a connector for a Hive data source, create etc/catalog/hive.properties with the following content, which mounts the hive-hadoop2 connector as the hive catalog:
connector.name=hive-hadoop2
hive.metastore.uri=thrift://example.net:9083
hive.config.resources=/etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml

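Parameter notes:
connector.name is the connector name; for Hive it includes the Hadoop version, e.g. hive-hadoop2.
hive.metastore.uri must match the host and port of the Hive metastore, which are usually configured (as hive.metastore.uris) in /etc/hive/conf/hive-site.xml.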
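Before starting Presto, it can help to confirm that the address in hive.metastore.uri is reachable from the Presto node. A sketch under assumptions: parse_thrift_uri splits a thrift://host:port URI, and metastore_reachable tries a TCP connection through bash's /dev/tcp pseudo-device, so it needs bash and the coreutils timeout command; both function names are hypothetical.

```shell
#!/bin/sh
# Sketch: split thrift://host:port and test whether the metastore port accepts connections.
parse_thrift_uri() {
  local rest
  rest=${1#thrift://}                 # strip the scheme
  [ "$rest" != "$1" ] || return 1     # the thrift:// scheme was missing
  case $rest in *:*) ;; *) return 1 ;; esac
  echo "${rest%:*} ${rest##*:}"       # prints "host port"
}

metastore_reachable() {
  local hp
  hp=$(parse_thrift_uri "$1") || return 1
  # bash's /dev/tcp opens a TCP connection; give up after 3 seconds
  timeout 3 bash -c "exec 3<>/dev/tcp/${hp% *}/${hp#* }" 2>/dev/null
}

# Usage: metastore_reachable thrift://yy-t-bigdata2.niwodai.com:9083 && echo "metastore reachable"
```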

Actual configuration:

mkdir catalog
cd catalog
vi hive.properties

# enter the following configuration
connector.name=hive-hadoop2
hive.metastore.uri=thrift://yy-t-bigdata2.niwodai.com:9083
hive.config.resources=/etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml
hive.allow-drop-table=true
hive.storage-format=ORC
hive.metastore-cache-ttl=1s
hive.metastore-refresh-interval=1s
hive.metastore-timeout=35m
hive.max-partitions-per-writers=1000

Once configuration is complete, etc contains: catalog/, config.properties, jvm.config, log.properties, node.properties.
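A missing file in etc is a common reason for a failed first start. A minimal sketch that checks the layout listed above, assuming the installation directory is passed as the only argument; the function name is hypothetical.

```shell
#!/bin/sh
# Sketch: confirm the etc/ layout listed above exists before starting Presto.
# Argument: the installation directory, e.g. /data/presto/presto-server-0.238.2
check_etc_layout() {
  local etc f status
  etc="$1/etc"
  status=0
  for f in node.properties jvm.config config.properties log.properties; do
    if [ ! -f "$etc/$f" ]; then echo "missing $etc/$f" >&2; status=1; fi
  done
  if [ ! -d "$etc/catalog" ]; then echo "missing $etc/catalog/" >&2; status=1; fi
  return "$status"
}

# Usage: check_etc_layout /data/presto/presto-server-0.238.2 && bin/launcher start
```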

3.6 Start Presto

bin/launcher in the installation directory is the startup script. Presto can be started as a background daemon with:

bin/launcher start

Alternatively, run it in the foreground, with logs and other output written to stdout/stderr:

bin/launcher run

Stop:

bin/launcher stop

Restart:

bin/launcher restart

Check whether the server is running:

bin/launcher status

Check the process:

ps aux | grep PrestoServer    (or use jps)

After startup, logs are written to var/log under the data directory (node.data-dir), which contains the following files:

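launcher.log: created by the launcher; the server's stdout and stderr are redirected to it. It contains only a few messages: those logged while the server's logging system is being initialized, plus any errors or diagnostics produced by the JVM.
server.log: the main log file used by Presto. If the server fails during initialization, this file typically contains the relevant information. It is rotated and compressed automatically.
http-request.log: the HTTP request log, which records every HTTP request received by the server. It is rotated and compressed automatically.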
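Instead of tailing server.log by hand after bin/launcher start, a script can wait for the startup marker. A sketch under two assumptions: PrestoServer writes the line ======== SERVER STARTED ======== to server.log on a successful start (check the log of your version), and the log lives under data/var/log as configured above. The function name and the timeout argument are hypothetical.

```shell
#!/bin/sh
# Sketch: poll server.log until the startup marker appears, or give up.
# Assumption: a successful start logs "======== SERVER STARTED ========".
wait_for_start() {
  local log limit waited
  log=$1
  limit=${2:-120}      # seconds to wait before giving up
  waited=0
  until grep -q '======== SERVER STARTED ========' "$log" 2>/dev/null; do
    if [ "$waited" -ge "$limit" ]; then
      echo "no start marker after ${limit}s" >&2; return 1
    fi
    sleep 1
    waited=$(( waited + 1 ))
  done
  echo "started"
}

# Usage: bin/launcher start && wait_for_start /data/presto/presto-server-0.238.2/data/var/log/server.log
```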

3.7 Verify the Hive Connection

Rename the CLI jar downloaded earlier, presto-cli-0.238.2-executable.jar, to presto and make it executable:

mv presto-cli-0.238.2-executable.jar presto

chmod +x presto

Connect to Hive with beeline and run show tables;.

Then open the Presto CLI with:

./presto --server http://yy-t-bigdata1.niwodai.com:9666 --catalog hive --schema default

show tables;

Check that the result matches what Hive returns.
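The comparison can also be scripted so that it is repeatable. A sketch: compare_table_lists compares two files with one table name per line, ignoring order. The collection commands in the comments use beeline's and the Presto CLI's batch options; the HiveServer2 host and port (hiveserver2-host:10000) are hypothetical placeholders, since this setup does not state them.

```shell
#!/bin/sh
# Sketch: compare two table lists (one name per line), ignoring order.
compare_table_lists() {
  if [ "$(sort "$1")" = "$(sort "$2")" ]; then
    echo "table lists match"
  else
    echo "table lists differ" >&2
    return 1
  fi
}

# Collecting the lists (hiveserver2-host:10000 is a placeholder):
#   beeline -u jdbc:hive2://hiveserver2-host:10000/default --silent=true \
#     --showHeader=false --outputformat=tsv2 -e 'show tables' > hive_tables.txt
#   ./presto --server http://yy-t-bigdata1.niwodai.com:9666 --catalog hive --schema default \
#     --execute 'show tables' --output-format TSV > presto_tables.txt
#   compare_table_lists hive_tables.txt presto_tables.txt
```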

IV. Cluster Configuration

1. Architecture

192.168.0.141: coordinator (scheduling node)
192.168.0.142: worker node
192.168.0.143: worker node
192.168.0.144: worker node
192.168.0.145: worker node
192.168.0.146: worker node

2. Distribute the Installation (SCP)

scp -r /data/presto user@remote_ip:/data

# If /data is owned by root, switch to root first

sudo su
scp -r /data/presto remote_ip:/data

# On the target machine, give the presto directory back to the service user
chown -R user:usergroup /data/presto
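To copy the installation to all five workers from the architecture list, the scp command can be wrapped in a loop. A sketch under assumptions: the worker IPs come from the list above, the remote user is passed as an argument, and DRY_RUN is a hypothetical switch that only prints the commands so they can be reviewed first.

```shell
#!/bin/sh
# Sketch: copy /data/presto from the coordinator to every worker.
# Set DRY_RUN=1 (hypothetical switch) to print the commands instead of running them.
WORKERS="192.168.0.142 192.168.0.143 192.168.0.144 192.168.0.145 192.168.0.146"

distribute() {
  local user ip
  user=$1
  for ip in $WORKERS; do
    # ${DRY_RUN:+echo} expands to "echo" only when DRY_RUN is non-empty
    ${DRY_RUN:+echo} scp -r /data/presto "$user@$ip:/data"
  done
}

# Usage: DRY_RUN=1 distribute presto    # preview
#        distribute presto              # copy for real
```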

3. Modify the Configuration

3.1 Modify node.properties

vi node.properties

# change node.id (it must be unique on every node)
node.environment=test_bob
node.id=bigdata_test2
node.data-dir=/data/presto/presto-server-0.238.2/data
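Editing node.id by hand on five workers is error-prone. A minimal sketch that rewrites the node.id line in place with sed; the function name is hypothetical, and the new id must not contain a slash because / is the sed delimiter here.

```shell
#!/bin/sh
# Sketch: replace node.id in a copied node.properties and print the result.
# The new id must not contain "/" (used as the sed delimiter).
set_node_id() {
  local file new_id
  file=$1
  new_id=$2
  # -i.bak works with both GNU and BSD sed; the backup is removed afterwards
  sed -i.bak "s/^node\.id=.*/node.id=${new_id}/" "$file" && rm -f "$file.bak"
  grep '^node\.id=' "$file"
}

# Usage on 192.168.0.142:
#   set_node_id /data/presto/presto-server-0.238.2/etc/node.properties bigdata_test2
```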

3.2 Modify config.properties

vi config.properties

# worker configuration
coordinator=false
http-server.http.port=9666
query.max-memory=10GB
query.max-memory-per-node=1GB
discovery.uri=http://yy-t-bigdata1.niwodai.com:9666

4. Start

Start each node the same way as in the single-node setup.

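Note: before starting a copied node, delete the etc and plugin entries under its data directory (node.data-dir). They were carried over from the first node (the launcher creates them there as links to the installation's etc and plugin directories), and leaving them in place causes conflicts at startup.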

5. One-Command Restart

First, set up passwordless SSH from 192.168.0.141 to each worker.

Then write a shell script that restarts the worker nodes remotely (shown here for 192.168.0.142):

# \$2 is escaped so that the remote awk sees it, instead of the local shell expanding it
ssh 192.168.0.142 "ps -ef | grep PrestoServer | grep -v grep | awk '{print \$2}' | xargs -r kill -9; /data/presto/presto-server-0.238.2/bin/launcher restart" &
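The single ssh line handles one worker. To restart all five workers from 192.168.0.141 at once, a loop like the following could be used. It is a sketch: it assumes passwordless SSH is already configured (the one-time setup is shown in comments), the same install path on every worker, and it relies on launcher restart alone rather than kill -9; keep the kill variant for a hung process. DRY_RUN is a hypothetical preview switch.

```shell
#!/bin/sh
# Sketch: restart Presto on all workers in parallel from the coordinator.
# One-time passwordless SSH setup on 192.168.0.141:
#   ssh-keygen -t rsa
#   for ip in $WORKERS; do ssh-copy-id "$ip"; done
LAUNCHER=/data/presto/presto-server-0.238.2/bin/launcher
WORKERS="192.168.0.142 192.168.0.143 192.168.0.144 192.168.0.145 192.168.0.146"

restart_workers() {
  local ip
  for ip in $WORKERS; do
    # each restart runs in the background so the workers restart in parallel
    ${DRY_RUN:+echo} ssh "$ip" "$LAUNCHER restart" &
  done
  wait   # return only after every ssh session has finished
}

# Usage: DRY_RUN=1 restart_workers   # preview
#        restart_workers             # restart for real
```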

V. Resource Groups

1. What Is a Resource Group

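Presto's resource groups control the overall query load of the cluster from the perspective of resource allocation. Presto divides the cluster's resources into several resource groups, and every submitted query is assigned to a specific group for execution. Before resource group A starts a new query B, it checks whether A's current load exceeds the resources the cluster has allocated to A; if it does, the resource group mechanism holds back query B, leaving it queued or even rejecting it outright.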

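Resources can be divided into dimensions such as CPU, memory, bandwidth, and disk; Presto resource groups mainly govern two of them, memory and CPU.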
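The admission check described above can be illustrated with a deliberately simplified model. This is not Presto's implementation: it looks at a single group only, using its hardConcurrencyLimit, maxQueued, and softMemoryLimit (expressed as a percentage), and ignores CPU limits and parent groups. The function name and argument order are hypothetical.

```shell
#!/bin/sh
# Simplified model (not Presto's code) of the decision for a newly arriving query.
# admit RUNNING QUEUED HARD_CONCURRENCY_LIMIT MAX_QUEUED MEMORY_USED_PCT SOFT_MEMORY_LIMIT_PCT
admit() {
  if [ "$1" -lt "$3" ] && [ "$5" -lt "$6" ]; then
    echo RUN        # below both the concurrency limit and the memory limit
  elif [ "$2" -lt "$4" ]; then
    echo QUEUE      # a limit is reached, but the queue still has room
  else
    echo REJECT     # maxQueued is reached
  fi
}

# Usage: admit 15 10 15 100 20 100   # concurrency limit reached, queue has room
```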

2. Configuration

2.1 resource-groups.properties

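On the Presto coordinator, create a file named resource-groups.properties under etc in the installation directory, and point resource-groups.config-file at the path of the resource group configuration file, for example: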

cd etc
vi resource-groups.properties

# enter the following configuration
resource-groups.configuration-manager=file
resource-groups.config-file=etc/resource_groups.json

2.2 resource_groups.json

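2.2.1 Main resource group properties
name (required): the name of the resource group;
maxQueued (required): the maximum number of queued queries; once this threshold is reached, new queries are rejected;
hardConcurrencyLimit (required): the maximum number of queries in the RUNNING state at any time;
softMemoryLimit (required): the maximum amount of memory the group may use; once it is reached, new queries are queued. It can be an absolute value (e.g. 100GB) or a percentage of total cluster memory (e.g. 60%);
softCpuLimit (optional): the CPU time the group may use per period (see cpuQuotaPeriod) before a penalty applies; hardCpuLimit must also be specified. Once this threshold is reached, the maximum number of queries the group may run concurrently is reduced;
hardCpuLimit (optional): the maximum CPU time the group may use per period; once it is reached, new queries are queued instead of being started;
schedulingPolicy (optional): the policy used to move queries from the queued state to the running state.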
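Different scheduling policies can produce different orders of resource allocation.
The main types are:
fair (default): when several sub-groups of a resource group all have queued queries, the sub-groups take turns obtaining resources in their defined order, and queries within the same sub-group are served first come, first served;
weighted_fair: each sub-group under this policy is configured with a schedulingWeight, and for each sub-group the ratio
(number of queries currently running in the sub-group) / schedulingWeight
is computed; the sub-group with the smallest ratio gets resources first;
weighted: sub-groups are chosen to start new queries in proportion to their schedulingWeight (default 1), so a sub-group with a larger schedulingWeight gets a larger share;
query_priority: all sub-groups must also be configured with query_priority, and queued queries obtain resources strictly in the order of their query_priority.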
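The weighted_fair rule amounts to picking the sub-group with the smallest running/schedulingWeight ratio. A small awk sketch of just that selection step, using a hypothetical input format of one "name running weight" line per sub-group that has queued queries; weights must be positive.

```shell
#!/bin/sh
# Sketch of the weighted_fair choice: smallest running/schedulingWeight wins.
# Input (hypothetical format): one "name running weight" line per eligible sub-group.
pick_weighted_fair() {
  awk '{ r = $2 / $3
         if (best == "" || r < min) { min = r; best = $1 } }   # ties keep the earlier line
       END { print best }'
}

# Usage: printf 'operation 4 8\ndefault 2 2\n' | pick_weighted_fair
```

With the weights from the example configuration below (operation 8, default 2), operation running 4 queries and default running 2 give ratios 0.5 and 1.0, so operation is picked next.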

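2.2.2 Resource group selectors
user (optional): matches the user name (a regular expression);
source (optional): matches the connection source (a regular expression), such as cli, jdbc, or pyhive;
queryType (optional): matches the query type;
clientTags (optional): a list of tags; each tag must be in the list of tags the user submitted with the query;
group (required): the group in which matching queries run.
With queryType, different kinds of queries, such as EXPLAIN, INSERT, SELECT, and DATA_DEFINITION, can be routed to different resource groups and thus given different resources. Selectors are evaluated in order, and the first one that matches determines the group.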
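The first-match rule can be traced by hand for the selectors in the example configuration that follows. This sketch hard-codes those three selectors; real selectors treat user and source as regular expressions, whereas the sketch uses exact string matches for simplicity, and the function name is hypothetical.

```shell
#!/bin/sh
# Sketch: first-match evaluation of the three example selectors, in order.
# Exact string matches stand in for Presto's regular-expression matching.
select_group() {
  local user source
  user=$1
  source=$2
  if [ "$source" = operation ]; then echo global.operation
  elif [ "$source" = default ]; then echo global.default
  elif [ "$user" = presto ]; then echo global.default
  else return 1          # no selector matched
  fi
}

# Usage: select_group zhang operation   # -> global.operation (source matches first)
```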

Actual configuration:


vi resource_groups.json

{
  "rootGroups": [
    {
      "name": "global",
      "softMemoryLimit": "100%",
      "hardConcurrencyLimit": 15,
      "maxQueued": 100,
      "schedulingPolicy": "weighted",
      "subGroups": [
        {
          "name": "operation",
          "softMemoryLimit": "30%",
          "softCpuLimit": "10h",
          "hardCpuLimit": "10h",
          "hardConcurrencyLimit": 8,
          "maxQueued": 20,
          "schedulingWeight": 8,
          "runningTimeLimit": "30m",
          "queuedTimeLimit" : "10m"
        },
        {
          "name": "default",
          "softMemoryLimit": "30%",
          "softCpuLimit": "10h",
          "hardCpuLimit": "10h",
          "hardConcurrencyLimit": 15,
          "maxQueued": 20,
          "schedulingWeight": 2,
          "runningTimeLimit": "30m",
          "queuedTimeLimit" : "10m"
        }
      ]
    },
    {
      "name": "admin",
      "softMemoryLimit": "100%",
      "softCpuLimit": "8h",
      "hardCpuLimit": "8h",
      "runningTimeLimit": "60m",
      "hardConcurrencyLimit": 15,
      "maxQueued": 20,
      "schedulingPolicy": "fair"
     }
  ],
  "selectors": [
    {
      "source": "operation",
      "group": "global.operation"
    },
    {
      "source": "default",
      "group": "global.default"
    },
    {
      "user": "presto",
      "group": "global.default"
    }
   ],
  "cpuQuotaPeriod": "1m"
}

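# Legacy alternative: the query-queue format (queues/rules) used by older Presto versions before resource groups
vi queue_config.json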

{
  "queues": {
    "user.${USER}": {
      "maxConcurrent": 25,
      "maxQueued": 25
    },
    "datamart": {
      "maxConcurrent": 25,
      "maxQueued": 50,
      "softMemoryLimit": "50%",
      "softCpuLimit":"3h",
      "hardCpuLimit": "2h"
    },
    "admin": {
      "maxConcurrent": 10,
      "maxQueued": 35
    },
    "global": {
      "maxConcurrent": 3,
      "maxQueued": 5,
      "softMemoryLimit": "10%",
      "softCpuLimit":"45m",
      "hardCpuLimit": "35m"
    }
  },
  "rules": [
    {
      "user": "presto",
      "queues": ["admin"]
    },
    {
      "user": "datamart",
      "queues": ["datamart"]
    },
    {
       "user": "zhang",
       "queues" : ["global"]
    },
    {
        "user": "lili",
        "queues" : ["global"]
    }
  ]
}
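A single missing comma makes a whole resource group file unparseable, so it is worth syntax-checking the JSON before restarting the coordinator. A sketch that uses Python's built-in json.tool module, assuming python3 is installed on the coordinator; the function name is hypothetical.

```shell
#!/bin/sh
# Sketch: syntax-check JSON files with python3's json.tool (assumes python3 is installed).
validate_json() {
  local f status
  status=0
  for f in "$@"; do
    if python3 -m json.tool "$f" > /dev/null; then
      echo "valid: $f"
    else
      echo "invalid: $f" >&2
      status=1
    fi
  done
  return "$status"
}

# Usage: validate_json etc/resource_groups.json
```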


VI. References

https://blog.csdn.net/zzq900503/article/details/79403949

https://zhuanlan.zhihu.com/p/99125164

Final Notes

That concludes 清新抽屉's notes on installing and deploying Presto. For more related content, search for other articles on Kaopuke (靠谱客).

This content was provided by users or collected from the internet for learning and reference; copyright belongs to the original author.

Category: # presto
Published: 2023-08-30 20:25:37
Link: https://www.kaopuke.com/article/k-p-k_14_uzo_14_f2_12__7_c4.html
