clickhouse简介和性能测试clickhouse特点：性能测试数据准备

83 阅读 0 评论 55 点赞

我是靠谱客的博主俊秀果汁，最近开发中收集的这篇文章主要介绍clickhouse简介和性能测试clickhouse特点：性能测试数据准备，觉得挺不错的，现在分享给大家，希望可以做个参考。

概述

clickhouse使用文档：相对简洁的介绍了clickhouse的使用，对各种引擎做了简单介绍

clickhouse官网：clickhouse的权威官方网站

准备：在分布式clickhouse集群搭建好之后，进行建库建表，并导入数据并做性能测试。

clickhouse特点：

数据通过小批量Batch存储
支持高强度的写操作（数千行写入/每秒）
读数据量非常小
读数据操作中Primary Key 的数量有限（<1百万）
每一行的数据量很小

优点：

多个服务器上的分布式处理：分布式查询：从分布式表查询-> 重写 ->负载均衡,发送给远程节点查询->接收结果、合并
非常快速的扫描，可用于实时查询
列存储非常适合使用“宽”/“非规范化”表（许多列）：计算类查询时，大大减少IO消耗
压缩性好：相对mysql压缩10倍
SQL支持（有限制）
良好的功能集，包括支持近似计算
不同的表引擎：MergeTree，ReplicatedMergeTree，Distributed等
非常适合结构日志/事件数据以及时间序列数据（引擎MergeTree需要日期字段）
索引支持（仅限主键，不是所有存储引擎）
漂亮的命令行界面，具有用户友好的进度条和格式

缺点：

没有真正的删除/更新支持，也没有事务（与Spark和大多数大数据系统相同），没有delete/update
没有二级密钥（与Spark和大多数大数据系统相同）
只支持自己的协议（没有MySQL协议支持）
有限的SQL支持，以及连接实现是不同的。如果要从MySQL或Spark迁移，则可能必须使用连接重新编写所有查询。
没有窗口功能

ClickHouse vs. Spark

Size / compression	Spark v. 2.0.2	ClickHouse
Data storage format	Parquet, compressed: snappy	Internal storage, compressed
Size (uncompressed: 1.2TB)	395G	212G

Test	Spark v. 2.0.2	ClickHouse	Diff
Query 1: count (warm)	7.37 sec (no disk IO)	6.61 sec	~same
Query 2: simple group (warm)	792.55 sec (no disk IO)	37.45 sec	21x better
Query 3: complex group by	2522.9 sec	398.55 sec	6.3x better

Clickhouse

clickhouse 引擎：clickhouse较核心的模块

4.1. TinyLog
4.2. Log
4.3. Memory
4.4. Merge
4.5. Distributed
4.6. Null
4.7. Buffer
4.8. Set
4.9. Join
4.10. MergeTree
4.11. ReplacingMergeTree
4.12. SummingMergeTree
4.13. AggregatingMergeTree
4.14. CollapsingMergeTree

性能测试数据准备

环境：

clickhouse 集群：节点为192.168.1.1 192.168.1.2 192.168.1.3 三个节点的 clickhouse集群。

创建数据库：

本文使用default默认的数据库

建表：

在三台机器上进入clickhouse-client 并分别执行以下创建表的语句

CREATE TABLE test_table  (city String,model String,module String,duration UInt32,network String,lib_version String,platform String,country String,osversion String,region String,appkey String,version String,channelid String,deviceid UInt32,time DateTime,action Stringdate Date)ENGINE = MergeTree(date,(date), 8192);

建分布式表：

分布表（Distributed）本身不存储数据，相当于路由，需要指定集群名、数据库名、数据表名、分片KEY，这里分片用rand()函数，表示随机分片。查询分布表，会根据集群配置信息，路由到具体的数据表，再把结果进行合并。

CREATE TABLE test_table_all  AS test_table ENGINE = Distributed(perftest_3shards_1replicas, default, test_table, rand());

数据导入：

将准备好的json数据导入action_data中

cat test.log | clickhouse-client --query="INSERT INTO default.test_table FORMAT JSONEachRow"

导入之后查询

select * from test_table_all limit 1000;

select count(*) from test_table_all ;

三个节点上均可查询，已经备份。

性能测试

集群机器配置

集群的三个节点均为虚机，配置如下

服务器	配置
1	Linux apm01 3.10.0-862.el7.x86_64 #1 SMP Fri Apr 20 16:44:24 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux cpu cores : 4 MemTotal: 16266780 kB
2	Linux apm02 3.10.0-862.el7.x86_64 #1 SMP Fri Apr 20 16:44:24 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux cpu cores : 4 MemTotal: 16266764 kB
3	Linux clickhouse02 3.10.0-862.el7.x86_64 #1 SMP Fri Apr 20 16:44:24 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux cpu cores : 4 MemTotal: 16266764 kB

服务器

配置

Linux apm01 3.10.0-862.el7.x86_64 #1 SMP Fri Apr 20 16:44:24 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

cpu cores : 4

MemTotal: 16266780 kB

Linux apm02 3.10.0-862.el7.x86_64 #1 SMP Fri Apr 20 16:44:24 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

cpu cores : 4

MemTotal: 16266764 kB

Linux clickhouse02 3.10.0-862.el7.x86_64 #1 SMP Fri Apr 20 16:44:24 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

cpu cores : 4

MemTotal: 16266764 kB

数据条数：

243597660

查询性能时长：

查询结果和查询语句	数据大小	处理的行数
耗时：6ms 执行（select count(*) as GROUP_BY_MODULE, module from test_table_all where appkey = '33645ea6eda746b3be4cfc5fa23ec6a6' and time >= '2018-09-03 00:00:00' and time <'2018-10-04 00:00:00' group by module）	14.51 GB	243.60 million rows
耗时：8ms 执行（ select round(avg(duration)) as ROUND_AVG from test_table_all where appkey = '33645ea6eda746b3be4cfc5fa23ec6a6' and time >= '2018-09-03 00:00:00' and time <'2018-10-04 00:00:00'）	11.94 GB	243.60 million rows
耗时：1375ms 执行（select count(*) as ACTIVE_USER_OF_DAY from (select distinct deviceid from test_table_all where appkey = '33645ea6eda746b3be4cfc5fa23ec6a6' and time >= '2018-09-03 00:00:00' and time <'2018-09-04 00:00:00');）	11.94 GB	243.60 million rows
耗时：1609ms 执行（select count(1) as ALL_USER_COUNT from (select distinct deviceid from test_table_all where appkey = '33645ea6eda746b3be4cfc5fa23ec6a6')）	10.96 GB	243.60 million rows
耗时：2117ms 执行（select count(*) as LAUNCH_COUNT from test_table_all where appkey = '33645ea6eda746b3be4cfc5fa23ec6a6' and action = '$launch' and time >= '2018-09-03 00:00:00' and time <'2018-09-04 00:00:00';）	15.17 GB	243.60 million rows
耗时：2883ms 执行（select count(1) as ALL_USER_COUNT_Nanjing from (select distinct deviceid from test_table_all where appkey = '33645ea6eda746b3be4cfc5fa23ec6a6' and country = '中国' and region = '江苏' and city = '南京')）	21.98 GB	243.60 million rows
耗时：2918ms 执行（select count(*) as LOGIN_COUNT from test_table_all where appkey = '33645ea6eda746b3be4cfc5fa23ec6a6' and action like 'login%' and time >= '2018-09-03 00:00:00' and time <'2018-09-04 00:00:00'）	15.17 GB	243.60 million rows
耗时：3924ms 执行（select count(*) as GROUP_BY_VERSION_CHANNELID_HAVING, version ,channelid from test_table_all where appkey = '33645ea6eda746b3be4cfc5fa23ec6a6' and action = '$launch' and platform = 'android' and time >= '2018-09-03 00:00:00' and time <'2018-09-04 00:00:00' group by version ,channelid having GROUP_BY_VERSION_CHANNELID_HAVING > 50 and channelid like '1000%'）	25.50 GB	243.60 million rows
耗时：3932ms 执行（select count(*) as GROUP_BY_VERSION_CHANNELID , version ,channelid from test_table_all where appkey = '33645ea6eda746b3be4cfc5fa23ec6a6' and action = '$launch' and platform = 'android' and time >= '2018-09-03 00:00:00' and time <'2018-09-04 00:00:00' group by version ,channelid）	25.50 GB	243.60 million rows

结论：

在count 方面，速度很快，消耗内存较大
在group by 方面，速度很快，消耗内存很大

实时处理方面，处理的数据量远大于 mysql，但是count 和 group by 消耗内存较多，并发较高的情况下，业务服务器难以承受；group by 查询时性能还是不能达到妙级响应的要求；并发较小，100 Queries / second；所以不适合做业务型高并发实时查询
在批处理方面，由于其列式存储的设计，减小IO的消耗等原因计算性能远超 spark，hive ；100 Queries / second远高于spark几十个并发任务的数量；所以可以替代hadoop集群作为批处理离线框架

提供一个性能测试文档供参考

最后

以上就是俊秀果汁为你收集整理的clickhouse简介和性能测试clickhouse特点：性能测试数据准备的全部内容，希望文章能够帮你解决clickhouse简介和性能测试clickhouse特点：性能测试数据准备所遇到的程序开发问题。

如果觉得靠谱客网站的内容还不错，欢迎将靠谱客网站推荐给程序员好友。

本图文内容来源于网友提供，作为学习参考使用，或来自网络收集整理，版权属于原作者所有。

本文分类：clickhouse
浏览次数：83 次浏览
发布日期：2024-01-16 03:46:13
本文链接：https://www.kaopuke.com/article/k-p-k_13_u_23_ogf4_14_z_10_w.html

clickhouse简介和性能测试clickhouse特点：性能测试数据准备

概述

clickhouse特点：

优点：

缺点：

ClickHouse vs. Spark

clickhouse 引擎：clickhouse较核心的模块

性能测试数据准备

环境：

创建数据库：

建表：

建分布式表：