Problems caused by "SMON: Parallel transaction recovery tried"

I am 迷你牛排, a blogger on 靠谱客. This post covers the problems caused by "SMON: Parallel transaction recovery tried". I am sharing it in the hope that it serves as a useful reference.

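The message "SMON: Parallel transaction recovery tried" usually appears after a process running a transaction that changed a large amount of data is killed. SMON then has to clean up, or roll back, that transaction through the rollback (undo) segments.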

If this happens during peak business hours, SMON may use 100% of the CPU and leave the system hanging.

Even with SHUTDOWN IMMEDIATE, Oracle waits for SMON to finish its cleanup before the instance shuts down, and that wait can be long.

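If you use SHUTDOWN ABORT instead, Oracle shuts down immediately. The next STARTUP may be slow, however, because SMON continues cleaning up undo where it left off, and that wait can also be very long: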

——————————————————————————————————
Completed: ALTER DATABASE MOUNT
Thu Aug 26 22:43:57 2010
ALTER DATABASE OPEN
Thu Aug 26 22:43:57 2010
Beginning crash recovery of 1 threads
Thu Aug 26 22:43:57 2010
Started first pass scan
Thu Aug 26 22:43:57 2010
Completed first pass scan
402218 redo blocks read, 126103 data blocks need recovery
Thu Aug 26 22:45:05 2010
Restarting dead background process QMN0
QMN0 started with pid=16
Thu Aug 26 22:45:19 2010
Started recovery at
Thread 1: logseq 13392, block 381202, scn 0.0
Recovery of Online Redo Log: Thread 1 Group 3 Seq 13392 Reading mem 0
Mem# 0 errs 0: /zxindata/oracle/redolog/redo03.dbf
Recovery of Online Redo Log: Thread 1 Group 1 Seq 13393 Reading mem 0
Mem# 0 errs 0: /zxindata/oracle/redolog/redo01.dbf
Thu Aug 26 22:45:21 2010
Completed redo application
Thu Aug 26 22:48:35 2010
Ended recovery at
Thread 1: logseq 13393, block 271434, scn 2623.1377219707
126103 data blocks read, 115641 data blocks written, 402218 redo blocks read
Crash recovery completed successfully
________________________________________________
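Note the timestamps from "Completed redo application" (22:45:21) to "Ended recovery" (22:48:35): recovery took about 3 more minutes to finish.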
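The roughly 3-minute figure can be checked by computing the gaps between alert log timestamps directly. The following is a minimal Python sketch. It assumes the English-locale timestamp format seen in the excerpt (Thu Aug 26 22:45:21 2010). The helper names elapsed_seconds and as_min_sec are hypothetical.

```python
from datetime import datetime

# Alert log timestamp format, e.g. "Thu Aug 26 22:45:21 2010".
# Parsing %a/%b assumes the default C/English locale.
ALERT_TS_FORMAT = "%a %b %d %H:%M:%S %Y"


def elapsed_seconds(start: str, end: str) -> int:
    """Whole seconds between two alert log timestamps."""
    t0 = datetime.strptime(start, ALERT_TS_FORMAT)
    t1 = datetime.strptime(end, ALERT_TS_FORMAT)
    return int((t1 - t0).total_seconds())


def as_min_sec(seconds: int) -> str:
    """Format a duration as 'M min S s'."""
    minutes, secs = divmod(seconds, 60)
    return f"{minutes} min {secs} s"


# Timestamps copied from the alert log excerpt above.
OPEN_ISSUED = "Thu Aug 26 22:43:57 2010"     # ALTER DATABASE OPEN
REDO_APPLIED = "Thu Aug 26 22:45:21 2010"    # Completed redo application
RECOVERY_ENDED = "Thu Aug 26 22:48:35 2010"  # Ended recovery

print("redo applied -> recovery ended:", as_min_sec(elapsed_seconds(REDO_APPLIED, RECOVERY_ENDED)))  # 3 min 14 s
print("OPEN issued  -> recovery ended:", as_min_sec(elapsed_seconds(OPEN_ISSUED, RECOVERY_ENDED)))   # 4 min 38 s
```

So the whole crash recovery, from ALTER DATABASE OPEN to Ended recovery, took 4 min 38 s. Of that, 3 min 14 s came after the redo had already been applied.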

From Metalink Note 238507.1:
---------------------------------------------------------------------------------------------

1. Find SMON's Oracle PID:

Example:

SQL> select pid, program from v$process where program like '%SMON%';

PID PROGRAM
---------- ------------------------------------------------
6 oracle@stsun7 (SMON)

2. Disable SMON transaction cleanup:

SVRMGR> oradebug setorapid <SMON's Oracle PID>
SVRMGR> oradebug event 10513 trace name context forever, level 2

3. Kill the PQ slaves that are doing parallel transaction recovery.
You can check V$FAST_START_SERVERS to find these.

4. Turn off fast_start_parallel_rollback:

alter system set fast_start_parallel_rollback=false;

If SMON is recovering, this command might hang. If it does, just Control-C out of it. You may need to try this many times before it completes (between SMON cycles).

5. Re-enable SMON txn recovery:

SVRMGR> oradebug setorapid <SMON's Oracle PID>
SVRMGR> oradebug event 10513 trace name context off

——————————————————————————————————
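Steps 2 and 5 must target the same process, and the event syntax is easy to mistype. It can therefore help to generate the command list once from SMON's PID and then paste the lines one at a time. The following is a hypothetical Python helper; smon_recovery_plan is not part of any Oracle tool, and it only prints the commands quoted in the note. It assumes you run them from SQL*Plus connected AS SYSDBA, because Server Manager (SVRMGR) was removed in Oracle 9i. Step 3 stays manual, and step 4 may hang while SMON is busy, so do not feed the output into a script unattended.

```python
def smon_recovery_plan(smon_pid: int) -> list[str]:
    """Return the note's steps 2, 4 and 5 as SQL*Plus lines, in order.

    Step 3 (killing the PQ slaves listed in V$FAST_START_SERVERS) is manual
    and appears only as a SQL*Plus comment line.
    """
    if smon_pid <= 0:
        raise ValueError("smon_pid must be the positive Oracle PID from v$process")
    return [
        # Step 2: disable SMON transaction cleanup.
        f"oradebug setorapid {smon_pid}",
        "oradebug event 10513 trace name context forever, level 2",
        # Step 3: manual, see V$FAST_START_SERVERS.
        "-- kill the PQ slaves doing parallel transaction recovery",
        # Step 4: switch to serial rollback; may hang while SMON is busy.
        "alter system set fast_start_parallel_rollback=false;",
        # Step 5: re-enable SMON transaction recovery.
        f"oradebug setorapid {smon_pid}",
        "oradebug event 10513 trace name context off",
    ]


if __name__ == "__main__":
    # PID 6 is SMON's Oracle PID in the note's example v$process output.
    for line in smon_recovery_plan(6):
        print(line)
```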
The idea behind the note is to switch SMON's transaction recovery from parallel to serial, which is what the fast_start_parallel_rollback parameter controls. There are cases where parallel transaction recovery is not as fast as serial transaction recovery, because the PQ slaves interfere with each other. This depends mainly on the type of changes that need to be undone during rollback. It usually happens when index updates are rolled back in parallel.

We can also check the UNDOBLOCKSTOTAL column of V$FAST_START_TRANSACTIONS to see how much undo still has to be recovered.
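UNDOBLOCKSTOTAL alone shows how much work there is. V$FAST_START_TRANSACTIONS also has an UNDOBLOCKSDONE column. Comparing two readings of the same transaction row taken some time apart gives a rough estimate of how long the rollback will still take. The following is a minimal Python sketch of that arithmetic. It assumes the rollback rate stays roughly constant between readings, which may not hold. The Sample class, the remaining_seconds function and the sample numbers are hypothetical, not taken from the article.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sample:
    """One reading of a single row in V$FAST_START_TRANSACTIONS (hypothetical)."""
    elapsed_s: float        # seconds since the first reading
    undo_blocks_done: int   # UNDOBLOCKSDONE
    undo_blocks_total: int  # UNDOBLOCKSTOTAL


def remaining_seconds(first: Sample, second: Sample) -> Optional[float]:
    """Linear estimate of the rollback time left, or None if no progress was seen."""
    dt = second.elapsed_s - first.elapsed_s
    progressed = second.undo_blocks_done - first.undo_blocks_done
    if dt <= 0 or progressed <= 0:
        return None  # no measurable progress between the two readings
    rate = progressed / dt  # undo blocks rolled back per second
    return (second.undo_blocks_total - second.undo_blocks_done) / rate


# Made-up readings taken 60 seconds apart.
a = Sample(elapsed_s=0, undo_blocks_done=20_000, undo_blocks_total=500_000)
b = Sample(elapsed_s=60, undo_blocks_done=26_000, undo_blocks_total=500_000)
print(remaining_seconds(a, b))  # 4740.0 (about 79 minutes)
```

If the two readings show no progress, the function returns None rather than a misleading estimate. This can happen, for example, while SMON recovery is disabled with event 10513.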