概述
标签
PostgreSQL , 审计日志 , duration , 性能影响 , syslogger , BUFFER , log_statement , log_duration , track_io_timing , osq_lock
背景
开启审计日志,或者开启数据库的SQL耗时记录,会给数据库带来多大的性能开销?
1、审计日志通过log_statement设置,可以设置为all, ddl, mod, none,分别表示审计所有SQL,DDL SQL,DML SQL,不审计。
开启审计时,在执行SQL的时候(提交SQL请求时),数据库日志中打印类似这样的信息:
2018-06-16 14:48:23.760 CST,"postgres","postgres",19448,"[local]",5b24b270.4bf8,8,"idle",2018-06-16 14:47:12 CST,3/1484181,0,LOG,00000,"statement: select count(*) from pg_class;",,,,,,,,"exec_simple_query, postgres.c:940","psql"
2、SQL耗时记录,通过设置log_duration参数来指定,设置为ON 或OFF。
如果打开时间记录,那么在SQL执行结束时,打印SQL的耗时。(不包括将结果传输到客户端的时间)
2018-06-16 14:48:23.760 CST,"postgres","postgres",19448,"[local]",5b24b270.4bf8,9,"SELECT",2018-06-16 14:47:12 CST,3/0,0,LOG,00000,"duration: 0.609 ms",,,,,,,,"exec_simple_query, postgres.c:1170","psql"
开启这两个开关,有多大的性能开销?
1、开启审计,实际上性能开销较小,因为日志是异步管道输出,而且目前的BUFFER已经调得比较大,性能损耗小。
但是,请注意,如果并发非常高,大量写日志可能引发mutex lock冲突,导致性能骤降。PostgreSQL syslogger相关代码有改进优化的地方。
src/backend/postmaster/syslogger.c
/*
* Buffers for saving partial messages from different backends.
*
* Keep NBUFFER_LISTS lists of these, with the entry for a given source pid
* being in the list numbered (pid % NBUFFER_LISTS), so as to cut down on
* the number of entries we have to examine for any one incoming message.
* There must never be more than one entry for the same source pid.
*
* An inactive buffer is not removed from its list, just held for re-use.
* An inactive buffer has pid == 0 and undefined contents of data.
*/
typedef struct
{
int32
pid;
/* PID of source process */
StringInfoData data;
/* accumulated data, as a StringInfo */
} save_buffer;
#define NBUFFER_LISTS 256
static List *buffer_lists[NBUFFER_LISTS];
2、开启时间统计,这个开销主要取决于读取时钟的额外开销。
通过pg_test_timing可以测试获取系统时间带来的额外开销。
数据库中如果要统计IO TIMING(track_io_timing = on),以及auto_explain中的log timing,开销也与之类似。
pg_test_timing -d 10
Testing timing overhead for 10 seconds.
Per loop time including overhead: 42.38 ns
Histogram of timing durations:
< us
% of total
count
1
95.76853
225960491
2
4.22695
9973261
4
0.00430
10145
8
0.00005
109
16
0.00016
370
32
0.00001
32
64
0.00000
3
实测
测试环境为ECS主机,测出了一个虚拟机的性能缺陷。
1、创建测试数据
pgbench -i -s 300
2、测试小事务
pgbench -M prepared -v -r -P 1 -c 56 -j 56 -T 120 -S
3、关闭log_statement与log_duration
transaction type: <builtin: select only>
scaling factor: 300
query mode: prepared
number of clients: 56
number of threads: 56
duration: 120 s
number of transactions actually processed: 25058528
latency average = 0.268 ms
latency stddev = 0.217 ms
tps = 208818.253316 (including connections establishing)
tps = 208834.342644 (excluding connections establishing)
script statistics:
- statement latencies in milliseconds:
0.025
set aid random(1, 100000 * :scale)
0.252
SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
4、开启log_statement=all
transaction type: <builtin: select only>
scaling factor: 300
query mode: prepared
number of clients: 56
number of threads: 56
duration: 120 s
number of transactions actually processed: 23088368
latency average = 0.291 ms
latency stddev = 1.445 ms
tps = 192401.524926 (including connections establishing)
tps = 192413.921654 (excluding connections establishing)
script statistics:
- statement latencies in milliseconds:
0.018
set aid random(1, 100000 * :scale)
0.282
SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
5、关闭log_statement,开启log_duration=on
transaction type: <builtin: select only>
scaling factor: 300
query mode: prepared
number of clients: 56
number of threads: 56
duration: 120 s
number of transactions actually processed: 12842222
latency average = 0.523 ms
latency stddev = 1.516 ms
tps = 107017.184761 (including connections establishing)
tps = 107025.759343 (excluding connections establishing)
script statistics:
- statement latencies in milliseconds:
0.019
set aid random(1, 100000 * :scale)
0.520
SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
6、开启log_statement=all,同时开启log_duration=on
transaction type: <builtin: select only>
scaling factor: 300
query mode: prepared
number of clients: 56
number of threads: 56
duration: 120 s
number of transactions actually processed: 7484093
latency average = 0.898 ms
latency stddev = 1.386 ms
tps = 62366.260357 (including connections establishing)
tps = 62370.603530 (excluding connections establishing)
script statistics:
- statement latencies in milliseconds:
0.016
set aid random(1, 100000 * :scale)
0.897
SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
7、开启log_statement=all,同时开启log_duration=on,同时开启track_io_timing = on
transaction type: <builtin: select only>
scaling factor: 300
query mode: prepared
number of clients: 56
number of threads: 56
duration: 120 s
number of transactions actually processed: 7161693
latency average = 0.938 ms
latency stddev = 1.150 ms
tps = 59679.955252 (including connections establishing)
tps = 59684.540717 (excluding connections establishing)
script statistics:
- statement latencies in milliseconds:
0.015
set aid random(1, 100000 * :scale)
0.938
SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
诊断
测试结果与实际的时钟开销不相符,因为时钟开销实际上只有几十纳秒,但是测试结果很显然不止这么多。
profiling
Samples: 1M of event 'cpu-clock', Event count (approx.): 234709749074
Overhead
Shared Object
Symbol
41.15%
[kernel]
[k] osq_lock
19.00%
libc-2.17.so
[.] __mcount_internal
10.37%
[kernel]
[k] _raw_spin_unlock_irqrestore
perf
主要是锁冲突引起的性能下降。
# Children
Self
Command
Shared Object
Symbol
# ........
........
..............
.....................
...............................................
#
55.47%
0.00%
postgres
postgres
[.] PostmasterMain
|
---PostmasterMain
|
|--54.42%--ServerLoop
|
|
|
--54.20%--PostgresMain
|
|
|
|--40.68%--errfinish
|
|
|
|
|
--40.56%--EmitErrorReport
|
|
|
|
|
|--38.51%--__write_nocancel
|
|
|
|
|
|
|
--38.44%--system_call_fastpath
|
|
|
|
|
|
|
--38.43%--sys_write
|
|
|
|
|
|
|
--38.39%--vfs_write
|
|
|
|
|
|
|
--38.32%--do_sync_write
|
|
|
|
|
|
|
--38.27%--pipe_write
|
|
|
|
|
|
|
|--21.55%--pipe_wait
|
|
|
|
|
|
|
|
|
|--20.07%--mutex_lock
|
|
|
|
|
|
|
|
|
|
|
--20.02%--__mutex_lock_slowpath
|
|
|
|
|
|
|
|
|
|
|
|--17.89%--osq_lock
|
|
|
|
|
|
|
|
|
|
|
|--1.44%--apic_timer_interrupt
|
|
|
|
|
|
smp_apic_timer_interrupt
|
|
|
|
|
|
irq_exit
|
|
|
|
|
|
do_softirq
|
|
|
|
|
|
call_softirq
|
|
|
|
|
|
|
|
|
|
|
|
|
--1.25%--__do_softirq
|
|
|
|
|
|
|
|
|
|
|
--0.61%--mutex_spin_on_owner
|
|
|
|
|
|
|
|
|
--0.94%--prepare_to_wait
|
|
|
|
|
|
|
|
|
--0.91%--_raw_spin_unlock_irqrestore
|
|
|
|
|
|
|
|--11.44%--mutex_lock
|
|
|
|
|
|
|
|
|
--11.35%--__mutex_lock_slowpath
|
|
|
|
|
|
|
|
|
|--10.11%--osq_lock
|
|
|
|
|
|
|
|
|
--0.69%--apic_timer_interrupt
|
|
|
|
smp_apic_timer_interrupt
|
|
|
|
irq_exit
|
|
|
|
do_softirq
|
|
|
|
call_softirq
|
|
|
|
|
|
|
|
|
--0.60%--__do_softirq
|
|
|
|
|
|
|
--4.73%--__wake_up_sync_key
|
|
|
|
|
|
|
--4.68%--_raw_spin_unlock_irqrestore
|
|
|
|
|
|--0.72%--log_line_prefix.isra.1
|
|
|
|
|
--0.52%--_mcount
|
|
|
|--4.05%--PortalRun
|
|
|
|
|
--4.01%--PortalRunSelect
|
|
|
|
|
--3.91%--standard_ExecutorRun
|
|
|
|
|
--3.27%--ExecScan
|
|
|
|
|
--3.18%--IndexNext
|
|
|
|
|
--2.95%--index_getnext
|
|
|
|
|
|--2.19%--index_getnext_tid
|
|
|
|
|
|
|
--2.16%--btgettuple
|
|
|
|
|
|
|
--2.08%--_bt_first
|
|
|
|
|
|
|
--1.65%--_bt_search
|
|
|
|
|
|
|
--0.80%--_bt_relandgetbuf
|
|
|
|
|
--0.73%--index_fetch_heap
|
|
|
|--2.14%--PortalStart
|
|
|
|
|
--1.90%--standard_ExecutorStart
|
|
|
|
|
--1.64%--ExecInitNode
|
|
|
|
|
--1.59%--ExecInitIndexScan
|
|
|
|--1.65%--finish_xact_command.part.4
|
|
|
|
|
--1.64%--CommitTransactionCommand
|
|
|
|
|
--1.49%--CommitTransaction
|
|
|
|
|
--0.72%--PreCommit_Portals
|
|
|
|
|
--0.63%--PortalDrop
|
|
|
|
|
--0.53%--PortalCleanup
|
|
|
|--0.96%--pq_getbyte
|
|
|
|
|
--0.95%--pq_recvbuf
|
|
|
|
|
--0.62%--secure_read
|
|
|
--0.90%--ReadyForQuery
|
|
|
--0.83%--socket_flush
|
|
|
--0.82%--internal_flush
|
|
|
--0.78%--__libc_send
|
|
|
--0.76%--system_call_fastpath
|
|
|
--0.75%--sys_sendto
|
|
|
--0.75%--SYSC_sendto
|
|
|
--0.73%--sock_sendmsg
|
|
|
--0.69%--unix_stream_sendmsg
|
|
|
--0.57%--sock_def_readable
|
|
|
--0.57%--__wake_up_sync_key
|
|
|
--0.54%--_raw_spin_unlock_irqrestore
|
--1.04%--SysLogger_Start
|
--0.84%--__read_nocancel
|
--0.84%--system_call_fastpath
|
--0.84%--sys_read
|
--0.84%--vfs_read
|
--0.84%--do_sync_read
|
--0.82%--pipe_read
为什么只开log_statement,没有触发这个锁冲突呢?实际上只是并发没有够,或者调用elog的频率还不够。(开启log_duration后,调用elog翻倍,所以同样并发更容易触发这个锁冲突)
1、将并发调低,锁冲突问题减少,性能损耗相对来说就很小。
Samples: 672K of event 'cpu-clock', Event count (approx.): 146760374621
Overhead
Shared Object
Symbol
29.73%
libc-2.17.so
[.] __mcount_internal
6.37%
[kernel]
[k] _raw_spin_unlock_irqrestore
5.93%
libc-2.17.so
[.] _mcount
4.18%
libc-2.17.so
[.] vfprintf
2.96%
[kernel]
[k] osq_lock
2.92%
[kernel]
[k] __do_softirq
1.80%
[kernel]
[k] finish_task_switch
1.34%
postgres
[.] hash_search_with_hash_value
1.20%
libc-2.17.so
[.] _IO_default_xsputn
1、关闭log_statement与log_duration
transaction type: <builtin: select only>
scaling factor: 300
query mode: prepared
number of clients: 28
number of threads: 28
duration: 120 s
number of transactions actually processed: 17770447
latency average = 0.189 ms
latency stddev = 0.077 ms
tps = 148086.294302 (including connections establishing)
tps = 148093.564454 (excluding connections establishing)
script statistics:
- statement latencies in milliseconds:
0.013
set aid random(1, 100000 * :scale)
0.176
SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
2、开启log_statement=all
transaction type: <builtin: select only>
scaling factor: 300
query mode: prepared
number of clients: 28
number of threads: 28
duration: 120 s
number of transactions actually processed: 17236212
latency average = 0.195 ms
latency stddev = 0.321 ms
tps = 143634.319683 (including connections establishing)
tps = 143642.034125 (excluding connections establishing)
script statistics:
- statement latencies in milliseconds:
0.010
set aid random(1, 100000 * :scale)
0.185
SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
3、关闭log_statement,开启log_duration=on
transaction type: <builtin: select only>
scaling factor: 300
query mode: prepared
number of clients: 28
number of threads: 28
duration: 120 s
number of transactions actually processed: 16774800
latency average = 0.200 ms
latency stddev = 0.977 ms
tps = 139789.393297 (including connections establishing)
tps = 139795.661393 (excluding connections establishing)
script statistics:
- statement latencies in milliseconds:
0.008
set aid random(1, 100000 * :scale)
0.193
SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
4、开启log_statement=all,同时开启log_duration=on
transaction type: <builtin: select only>
scaling factor: 300
query mode: prepared
number of clients: 28
number of threads: 28
duration: 120 s
number of transactions actually processed: 14696887
latency average = 0.229 ms
latency stddev = 0.277 ms
tps = 122473.319859 (including connections establishing)
tps = 122479.381454 (excluding connections establishing)
script statistics:
- statement latencies in milliseconds:
0.007
set aid random(1, 100000 * :scale)
0.224
SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
5、开启log_statement=all,同时开启log_duration=on,同时开启track_io_timing = on
transaction type: <builtin: select only>
scaling factor: 300
query mode: prepared
number of clients: 28
number of threads: 28
duration: 120 s
number of transactions actually processed: 14606445
latency average = 0.230 ms
latency stddev = 0.726 ms
tps = 121719.147924 (including connections establishing)
tps = 121725.278733 (excluding connections establishing)
script statistics:
- statement latencies in milliseconds:
0.007
set aid random(1, 100000 * :scale)
0.226
SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
2、使用高并发,即使仅开启log_statement也会触发锁问题。
只开log_statement,调用elog是同时开启log_statement, log_duraiton的一半,所以仅开启log_statement时把并发开到一倍(112),就可以观测到问题。
transaction type: <builtin: select only>
scaling factor: 300
query mode: prepared
number of clients: 112
number of threads: 112
duration: 120 s
number of transactions actually processed: 11832722
latency average = 1.135 ms
latency stddev = 2.437 ms
tps = 98604.216501 (including connections establishing)
tps = 98621.113426 (excluding connections establishing)
script statistics:
- statement latencies in milliseconds:
0.021
set aid random(1, 100000 * :scale)
1.149
SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
perf 如下
Samples: 557K of event 'cpu-clock', Event count (approx.): 94808854918
Children
Self
Shared Object
Symbol
+
67.35%
0.22%
postgres
[.] PostgresMain
+
45.50%
0.01%
[kernel]
[k] system_call_fastpath
+
41.27%
0.02%
postgres
[.] errfinish
+
41.14%
0.27%
postgres
[.] EmitErrorReport
+
39.62%
0.01%
[kernel]
[k] sys_write
+
39.56%
0.03%
[kernel]
[k] vfs_write
+
39.48%
0.02%
[kernel]
[k] do_sync_write
+
39.31%
0.01%
libpthread-2.17.so
[.] __write_nocancel
+
39.06%
0.08%
[kernel]
[k] pipe_write
+
35.00%
0.09%
[kernel]
[k] mutex_lock
+
34.88%
0.12%
[kernel]
[k] __mutex_lock_slowpath
+
33.71%
0.00%
libc-2.17.so
[.] __libc_start_main
+
33.61%
33.61%
[kernel]
[k] osq_lock
+
32.61%
0.00%
postgres
[.] PostmasterMain
+
32.61%
0.00%
postgres
[.] main
+
32.33%
0.00%
postgres
[.] ServerLoop
+
31.63%
2.75%
libc-2.17.so
[.] _mcount
+
29.31%
29.14%
libc-2.17.so
[.] __mcount_internal
+
27.19%
0.02%
[kernel]
[k] pipe_wait
+
21.76%
0.27%
pgbench
[.] threadRun
+
11.30%
0.00%
libpthread-2.17.so
[.] start_thread
syslogger相关代码
src/backend/tcop/postgres.c
934
/* Log immediately if dictated by log_statement */
935
if (check_log_statement(parsetree_list))
936
{
937
ereport(LOG,
938
(errmsg("statement: %s", query_string),
939
errhidestmt(true),
940
errdetail_execute(parsetree_list)));
941
was_logged = true;
942
}
1162
/*
1163
* Emit duration logging if appropriate.
1164
*/
1165
switch (check_log_duration(msec_str, was_logged))
1166
{
1167
case 1:
1168
ereport(LOG,
1169
(errmsg("duration: %s ms", msec_str),
1170
errhidestmt(true)));
1171
break;
1172
case 2:
1173
ereport(LOG,
1174
(errmsg("duration: %s ms
statement: %s",
1175
msec_str, query_string),
1176
errhidestmt(true),
1177
errdetail_execute(parsetree_list)));
1178
break;
1179
}
1946
/* Log immediately if dictated by log_statement */
1947
if (check_log_statement(portal->stmts))
1948
{
1949
ereport(LOG,
1950
(errmsg("%s %s%s%s: %s",
1951
execute_is_fetch ?
1952
_("execute fetch from") :
1953
_("execute"),
1954
prepStmtName,
1955
*portal_name ? "/" : "",
1956
*portal_name ? portal_name : "",
1957
sourceText),
1958
errhidestmt(true),
1959
errdetail_params(portalParams)));
1960
was_logged = true;
1961
}
src/backend/utils/error/elog.c
/*
* Constructs the error message, depending on the Errordata it gets, in a CSV
* format which is described in doc/src/sgml/config.sgml.
*/
static void
write_csvlog(ErrorData *edata)
{
............
/* If in the syslogger process, try to write messages direct to file */
if (am_syslogger)
write_syslogger_file(buf.data, buf.len, LOG_DESTINATION_CSVLOG);
else
write_pipe_chunks(buf.data, buf.len, LOG_DESTINATION_CSVLOG);
pfree(buf.data);
}
/*
* Send data to the syslogger using the chunked protocol
*
* Note: when there are multiple backends writing into the syslogger pipe,
* it's critical that each write go into the pipe indivisibly, and not
* get interleaved with data from other processes.
Fortunately, the POSIX
* spec requires that writes to pipes be atomic so long as they are not
* more than PIPE_BUF bytes long.
So we divide long messages into chunks
* that are no more than that length, and send one chunk per write() call.
* The collector process knows how to reassemble the chunks.
*
* Because of the atomic write requirement, there are only two possible
* results from write() here: -1 for failure, or the requested number of
* bytes.
There is not really anything we can do about a failure; retry would
* probably be an infinite loop, and we can't even report the error usefully.
* (There is noplace else we could send it!)
So we might as well just ignore
* the result from write().
However, on some platforms you get a compiler
* warning from ignoring write()'s result, so do a little dance with casting
* rc to void to shut up the compiler.
*/
static void
write_pipe_chunks(char *data, int len, int dest)
{
PipeProtoChunk p;
int
fd = fileno(stderr);
int
rc;
Assert(len > 0);
p.proto.nuls[0] = p.proto.nuls[1] = '