日期:2014-05-16  浏览次数:20526 次

SMON: Parallel transaction recovery tried 引发的问题

?SMON: Parallel transaction recovery tried 这个一般是在具有在跑大数据量的 transaction的时候kill掉了进程而导致 smon 去清理 回滚段时导致的。
这个在业务高峰期的时候,如果发现这个,有可能导致 SMON 占用了 100% cpu 而导致 系统 hang 在那边。
即使你shutdown immediate ,oracle 也会等待 smon 清理完毕才能关机,而这个等待过程也许是漫长的。
如果你 shutdown abort,那么oracle会马上shutdown ,但是,当你startup的时候,有可能就会很慢,因为 smon 会接着清理 undo,这个等待过程也许是很漫长的:
——————————————————————————————————
Completed: ALTER DATABASE?? MOUNT
Thu Aug 26 22:43:57 2010
ALTER DATABASE OPEN
Thu Aug 26 22:43:57 2010
Beginning crash recovery of 1 threads
Thu Aug 26 22:43:57 2010

Started first pass scan
Thu Aug 26 22:43:57 2010
Completed first pass scan
?402218 redo blocks read, 126103 data blocks need recovery
Thu Aug 26 22:45:05 2010
Restarting dead background process QMN0
QMN0 started with pid=16
Thu Aug 26 22:45:19 2010
Started recovery at
?Thread 1: logseq 13392, block 381202, scn 0.0
Recovery of Online Redo Log: Thread 1 Group 3 Seq 13392 Reading mem 0
? Mem# 0 errs 0: /zxindata/oracle/redolog/redo03.dbf
Recovery of Online Redo Log: Thread 1 Group 1 Seq 13393 Reading mem 0
? Mem# 0 errs 0: /zxindata/oracle/redolog/redo01.dbf
Thu Aug 26 22:45:21 2010
Completed redo application

Thu Aug 26 22:48:35 2010
Ended recovery at
?Thread 1: logseq 13393, block 271434, scn 2623.1377219707
?126103 data blocks read, 115641 data blocks written, 402218 redo blocks read
Crash recovery completed successfully

________________________________________________
看红色标注的那个,等待了 3 分钟才做完 recovery。
那如何才能让它快呢,metalink(238507.1 ) 有给出一些做法:
---------------------------------------------------------
1. Find SMON's Oracle PID:
Example:
SQL> select pid, program from v$process where program like '%SMON%';
?????? PID PROGRAM
---------- ------------------------------------------------
???????? 6 oracle@stsun7 (SMON)

2. Disable SMON transaction cleanup:
SVRMGR> oradebug setorapid <SMON's Oracle PID>
SVRMGR> oradebug event 10513 trace name context forever, level 2

3. Kill the PQ slaves that are doing parallel transaction recovery.
You can check V$FAST_START_SERVERS to find these.
4. Turn off fast_start_parallel_rollback:
alter system set fast_start_parallel_rollback=false;
If SMON is recovering, this command might hang, if it does just control-C out of it.? You may need to try this many times to get this to complete (between SMON cycles).
5. Re-enable SMON txn recovery:
SVRMGR> oradebug setorapid <SMON's Oracle PID>
SVRMGR> oradebug event 10513 trace name context off

——————————————————————————————————
以上的思路主要是要把 SMON 并行 recovery 的功能给改成非并行,主要
是 fast_start_parallel_rollback 这个参数的作用。
There are cases where parallel transaction recovery is not as fast as serial transaction recovery, because the pq slaves are interfering with each