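I got to Zhongjin (中金) at 8 this morning to verify data and found that the archive area of the recent-data database (4-node 11gR2 RAC on AIX 6.1) was full and the database had hung. I asked the middleware guy who had arrived before me, and all I got back was "didn't notice anything unusual"……
My heart sank. Damn, if I had shown up any later something would definitely have gone wrong, and the taxpayers would have been frantic…… Without another word I went to free up some space first: I switched to the grid user and, through asmcmd, used OS-style commands to delete two directories in a row, which produced: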
ORA-15032: not all alterations performed
ORA-15028: ASM file '+FRA/bjschxcx/……' not dropped; currently being accessed (DBD ERROR: OCIStmtExecute)
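Checking with ls showed that only one file had not been deleted, and the database had already recovered from the hang. I tried deleting that file with RMAN, but it still reported the following error: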
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of delete command on default channel at 06/08/2012 13:20:35
ORA-15028: ASM file '+FRA/bjschxcx/……' not dropped; currently being accessed
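The archive log I wanted to delete was several days old, so by rights nothing should be using it any more, even though GoldenGate instances from several vendors are configured on this recent-data database. The database did come back up once a little archive space had been freed, but leaving this problem unresolved was not an option. I checked the GoldenGate instances of each vendor: none of them was using the archive log I wanted to delete, and none of the processes had any lag.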
I looked this up on Metalink, and two or three articles describe this behavior:
The issue can be caused by any replication process running or hanging, holding this file.
For example a Golden Gate replication or shareplex replication process.
Stop the replication process and try deleting the file using rman or ASMCMD.
Cause: An attempt was made to drop an ASM file, but the file was being
accessed by one or more database instances and therefore could not
be dropped.
Action: Shut down all database instances that might be accessing this
file and then retry the drop command.
Use the following to quickly find out which database instance holds the lock and to identify for restart:
ASMCMD [+] > lsof -G DG_ARCH
DB_Name Instance_Name Path
myprod myprod1 +dg_arch/myprod/archivelog/2012_06_04/thread_1_seq_72711.5178.785032231
myprod myprod1 +dg_arch/myprod/archivelog/2012_06_04/thread_1
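When the lsof listing is long, picking out the one file by eye gets tedious. Below is a minimal sketch, assuming the listing has been saved as text in the three-column layout shown above (DB_Name, Instance_Name, Path). The function names parse_lsof and holders_of are made up for illustration and are not part of any Oracle tool; the sample row is the one from the listing above.

```python
def parse_lsof(text):
    """Parse saved `asmcmd lsof` output into (db_name, instance_name, path) tuples."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Skip blank lines, the header row, and anything that is not three columns.
        if len(parts) != 3 or parts[0] == "DB_Name":
            continue
        rows.append(tuple(parts))
    return rows


def holders_of(rows, fragment):
    """Return sorted (db, instance) pairs holding a file whose path contains `fragment`.

    The match is case-insensitive, since the listing shows the disk group in lower case.
    """
    frag = fragment.lower()
    return sorted({(db, inst) for db, inst, path in rows if frag in path.lower()})


# Sample text: header plus the first row from the listing above.
SAMPLE = """\
DB_Name  Instance_Name  Path
myprod   myprod1        +dg_arch/myprod/archivelog/2012_06_04/thread_1_seq_72711.5178.785032231
"""

if __name__ == "__main__":
    rows = parse_lsof(SAMPLE)
    for db, inst in holders_of(rows, "thread_1_seq_72711"):
        print(db, inst)
```

The idea is only to narrow down which instance to look at before deciding whether to stop a replication process or restart that instance; it does not touch ASM itself.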