Oracle ASH空间过小导致的故障问题隐含参数_ash_size
今天现场数据库又说出了问题,表现就是hang住,无法提供服务。打开alert日志如下:
Fri Aug 17 10:00:18 2018
Archived Log entry 15155 added for thread 1 sequence 16322 ID 0x7f5f5e6d dest 1:
Fri Aug 17 10:07:53 2018
Thread 1 advanced to log sequence 16324 (LGWR switch)
Current log# 2 seq# 16324 mem# 0: /data/cc/cc_redo2_1
Current log# 2 seq# 16324 mem# 1: /data/cc/cc_redo2_2
Fri Aug 17 10:08:03 2018
Archived Log entry 15156 added for thread 1 sequence 16323 ID 0x7f5f5e6d dest 1:
Fri Aug 17 10:14:30 2018
Thread 1 advanced to log sequence 16325 (LGWR switch)
Current log# 3 seq# 16325 mem# 0: /data/cc/cc_redo3_1
Current log# 3 seq# 16325 mem# 1: /data/cc/cc_redo3_2
Fri Aug 17 10:14:34 2018
Archived Log entry 15157 added for thread 1 sequence 16324 ID 0x7f5f5e6d dest 1:
Fri Aug 17 10:24:55 2018
Active Session History (ASH) performed an emergency flush. This may mean that ASH is undersized. If emergency flushes are a recurring issue, you may consider increasing ASH size by setting the value of _ASH_SIZE to a sufficiently large value. Currently, ASH size is 33554432 bytes. Both ASH size and the total number of emergency flushes since instance startup can be monitored by running the following query:
select total_size,awr_flush_emergency_count from v$ash_info;
Fri Aug 17 10:30:38 2018
WARNING: aiowait timed out 1 times
Fri Aug 17 10:37:09 2018
Shutting down instance (abort)
License high water mark = 1891
Fri Aug 17 10:37:09 2018
USER (ospid: 24501): terminating the instance
Fri Aug 17 10:37:09 2018
opiodr aborting process unknown ospid (23063) as a result of ORA-1092
Fri Aug 17 10:37:09 2018
opiodr aborting process unknown ospid (1804) as a result of ORA-1092
Fri Aug 17 10:37:09 2018
ORA-1092 : opitsk aborting process
Fri Aug 17 10:37:09 2018
ORA-1092 : opitsk aborting process
Fri Aug 17 10:37:09 2018
opiodr aborting process unknown ospid (19864) as a result of ORA-1092
Fri Aug 17 10:37:09 2018
opiodr aborting process unknown ospid (21857) as a result of ORA-1092
Fri Aug 17 10:37:09 2018
opiodr aborting process unknown ospid (17236) as a result of ORA-1092
Fri Aug 17 10:37:09 2018
opiodr aborting process unknown ospid (2830) as a result of ORA-1092
Fri Aug 17 10:37:09 2018
ORA-1092 : opitsk aborting processFri Aug 17 10:37:09 2018
ORA-1092 : opitsk aborting process
Fri Aug 17 10:37:09 2018
ORA-1092 : opitsk aborting process
Fri Aug 17 10:37:09 2018
ORA-1092 : opitsk aborting process
Fri Aug 17 10:37:09 2018
opiodr aborting process unknown ospid (1778) as a result of ORA-1092
Fri Aug 17 10:37:09 2018
ORA-1092 : opitsk aborting process
Fri Aug 17 10:37:09 2018
opiodr aborting process unknown ospid (1936) as a result of ORA-1092
Fri Aug 17 10:37:09 2018
ORA-1092 : opitsk aborting process
Fri Aug 17 10:37:09 2018
opiodr aborting process unknown ospid (4751) as a result of ORA-1092
Fri Aug 17 10:37:09 2018
ORA-1092 : opitsk aborting process
Fri Aug 17 10:37:09 2018
opiodr aborting process unknown ospid (3382) as a result of ORA-1092
Fri Aug 17 10:37:09 2018
ORA-1092 : opitsk aborting process
Fri Aug 17 10:37:09 2018
opiodr aborting process unknown ospid (3986) as a result of ORA-1092
Fri Aug 17 10:37:09 2018
opiodr aborting process unknown ospid (1658) as a result of ORA-1092
Fri Aug 17 10:37:09 2018
opiodr aborting process unknown ospid (2696) as a result of ORA-1092
Fri Aug 17 10:37:09 2018
ORA-1092 : opitsk aborting processFri Aug 17 10:37:09 2018
ORA-1092 : opitsk aborting process
Fri Aug 17 10:37:09 2018
ORA-1092 : opitsk aborting process
Fri Aug 17 10:37:09 2018
opiodr aborting process unknown ospid (2850) as a result of ORA-1092
Fri Aug 17 10:37:09 2018
opiodr aborting process unknown ospid (3580) as a result of ORA-1092
Fri Aug 17 10:37:09 2018
ORA-1092 : opitsk aborting process
Fri Aug 17 10:37:09 2018
ORA-1092 : opitsk aborting process
Fri Aug 17 10:37:09 2018
opiodr aborting process unknown ospid (1514) as a result of ORA-1092
Fri Aug 17 10:37:10 2018
ORA-1092 : opitsk aborting process
Fri Aug 17 10:37:10 2018
opiodr aborting process unknown ospid (2738) as a result of ORA-1092
Fri Aug 17 10:37:10 2018
ORA-1092 : opitsk aborting process
Fri Aug 17 10:37:10 2018
opiodr aborting process unknown ospid (2602) as a result of ORA-1092
Fri Aug 17 10:37:10 2018
ORA-1092 : opitsk aborting process
Fri Aug 17 10:37:10 2018
opiodr aborting process unknown ospid (1634) as a result of ORA-1092
Fri Aug 17 10:37:10 2018
一堆的ora-1092然后就被这个错误带偏了。
opiodr aborting process unknown ospid (1634) as a result of ORA-1092
这种语句标示操作系统进程号为1634的进程被杀掉了,杀掉的原因是因为ORA-1092的错误。
然后就一直在找为啥会出现ora-1092,是不是资源不够用,被操作系统层面的特权用户干掉了,何种资源不够用,以及如何解决等等。
后来没找到具体原因的情况下,又回去扒拉了下alert日志,无意间发现了如下信息:
想到是不是认为的执行了shutdown abort导致的,就去问现场的人,果然,在这个时间点,他们确实执行了shutdown abort,因为当时数据库执行任何sql语句都会hang住。
然后才明白,这个ora-1092不是原因,而是shutdown abort导致的结果!
后来在重新梳理了alert日志下,又发现了下面的信息:
从告警日志情况看,应该是Oracle内部自动调节机制的作用。进入11g之后,Oracle alert log的告警提示作用愈加明显。对于一些自动诊断过程中出现的问题,都会作为提醒出现在日志中。比如swap转换,ash变化等。今天的ash emergency flush就是比较常见的一个。
从分析角度,Oracle在收集ASH过程中,频度是很高的,通常为分钟级别。如果收集之后就立即存储入数据库文件,在性能上损耗是不容易被接受的。一种方法是构建在内存共享存储中的专门buffer。定期或者确定激发条件将数据从内存中写回到数据库中。
从提示信息中看,Oracle在负载比较大的情况下,会出现ASH信息超过系统限制,进行了一次强制的紧急清空动作。Oracle建议,如果反复出现这样的情况,就建议调整_ash_size参数大小。
Oracle内部的确是存在参数_ash_size,作为隐含参数可以使用SQL进行查看。
SQL> select x.ksppinm name,
2 y.ksppstvl value,
3 y.ksppstdf isdefault,
4 decode(bitand(y.ksppstvf, 7),
5 1,
6 'MODIFIED',
7 4,
8 'SYSTEM_MOD',
9 'FALSE') ismod,
10 decode(bitand(y.ksppstvf, 2), 2, 'TRUE', 'FALSE') isadj
11 from sys.x$ksppi x, sys.x$ksppcv y
12 where x.inst_id = userenv('Instance')
13 and y.inst_id = userenv('Instance')
14 and x.indx = y.indx
15 and x.ksppinm = '_ash_size'
16 order by translate(x.ksppinm, ' _', ' ');
NAME VALUE ISDEFAULT ISMOD ISADJ
-------------------- -------------------- -------------------- -------------------- --------------------
_ash_size 536870912 TRUE SYSTEM_MOD FALSE
SQL>
此处536870912(512M)是我调整后的值,本来的值是1048618,也就是1M。
Ash size大小用于指定ash buffer(shared pool)。默认给定的是1048618 bytes,也就是1M。ASH工作采样是以Active Session为中心的。如果系统处理操作过于频繁,活跃用户会话数量很多,这样每次采样的数据量就会超过系统空闲状态。
随之而来的就是内存中ash buffer的填满,进而引发数据库强制回写数据,启动DBWR进程读写动作。DBWR在写入的时候,会占用一部分系统资源,从整体看是性能瓶颈点。
解决:
alter system set "_ash_size"=512m;
执行完后上面就变成512M了,但是:
查询是254M,这个值目前是oracle允许的最大值,如果你的_ash_size本身小于254M,那么你可以调整这个值填的很大,比如512M,不会报错,但是oracle会自动取最大254M。如果现在_ash_size已经到达最大值254M以后,你在此执行:
alter system set "_ash_size"=任意大于254M的值;
就会报值不合法。
Active Session History (ASH) Performed An Emergency Flush Messages In The Alert Log (文档 ID 1385872.1)
APPLIES TO:
Oracle Database - Enterprise Edition - Version 11.1.0.6 and later
Information in this document applies to any platform.
SYMPTOMS
The following message is dumped in the alert log:
Active Session History (ASH) performed an emergency flush. This may mean that ASH is undersized. If emergency flushes are a recurring issue, you may consider increasing ASH size by setting the value of _ASH_SIZE to a sufficiently large value. Currently, ASH size is 19922944 bytes. Both ASH size and the total number of emergency flushes since instance startup can be monitored by running the following query:
select total_size,awr_flush_emergency_count from v$ash_info;
CAUSE
Typically some activity on system causes more active sessions. Therefore filling the ASH buffers faster than usual causing this message to be displayed. It is not a problem per se, just indicates the buffers might need to be increased to support peak activity on the database.
SOLUTION
The current ASH size is displayed in the message in the alert log, or can be found using the following SQL statement.
select total_size from v$ash_info;
Then increase the value for _ash_size by some value, like 50% more than what is currently allocated. For example if total_size = 16MB, then an increase of 50% more would be (16MB + (16MB * 50%)) = 24MB.
sqlplus / as sysdba
alter system set "_ash_size"=25165824;
You can verify the change using the following select:
select total_size from v$ash_info;
NOTE: The max setting for "_ash_size" is 254Mb, i.e. 127 chunks of 2 Mb chunksize (however, for 12c _ash_size can be set to a value greater than 254Mb).
You can submit a higher value but internally it is adjusted to 254Mb. Once 254 has been reached, and you try to set higher size, error
ORA-2097 will be dumped:
1. alter system set "_ash_size"=100m; verified it was set to 100 MB
2. alter system set "_ash_size"=300m; verified it was set to 250 MB
3. alter system set "_ash_size"=400m;
ORA-2097: parameter cannot be modified because specified value is invalid
最后
以上就是老实大白最近收集整理的关于oracle ASH之隐含参数_ash_size过小故障问题的全部内容,更多相关oracle内容请搜索靠谱客的其他文章。
发表评论 取消回复