Date: 2014-05-16

Oracle real-world case: fixing OEM consuming large amounts of CPU and memory

I. Introduction

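Our DBA left the company, so for now I am looking after all of the company's Oracle database servers on top of my own work. Today I logged in to the database server for one province and noticed that the ssh login took about 30 seconds to complete. I then checked the load and memory; the details were as follows:
Load: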

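I had never seen a load this high. The highest I had seen before was a load of a little over 1000, and that was caused by a Java problem.
Memory: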

Even the swap space was completely used up, and only 71 MB of physical memory was left, which is far too dangerous.
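The load and memory figures above can also be read as plain text rather than from screenshots. The following is a minimal sketch, assuming a Linux host where /proc/loadavg and /proc/meminfo are available; the function names load_summary and mem_summary are made up for illustration and are not from the original post.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, Linux /proc layout is assumed.

# Print the 1-, 5- and 15-minute load averages from a loadavg-format line on stdin.
load_summary() {
    awk '{ printf "load1=%s load5=%s load15=%s\n", $1, $2, $3 }'
}

# Print free physical memory and free swap, in MB, from meminfo-format text on stdin.
# /proc/meminfo reports these values in kB, so divide by 1024.
mem_summary() {
    awk '
        $1 == "MemFree:"  { mem  = int($2 / 1024) }
        $1 == "SwapFree:" { swap = int($2 / 1024) }
        END { printf "mem_free_mb=%d swap_free_mb=%d\n", mem, swap }
    '
}

# Typical use on the server:
#   load_summary < /proc/loadavg
#   mem_summary  < /proc/meminfo
```

On most systems `uptime` and `free -m` report the same information interactively; the functions above are only convenient when the numbers need to go into a script or a log.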
top:

top showed 6 zombie processes and a large number of perl processes.
First, take a look at the zombie processes.
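One way to list the zombies is to filter ps output on the process state. This is a sketch, not necessarily the command used in the original post; it assumes a procps-style ps, and list_zombies is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: list_zombies is a hypothetical helper.

# Read `ps -eo pid,ppid,stat,comm` output on stdin and print
# "PID PPID COMMAND" for every process whose state starts with Z (zombie).
# The PPID column identifies the parent that has not yet reaped the child.
list_zombies() {
    awk 'NR > 1 && $3 ~ /^Z/ { print $1, $2, $4 }'
}

# Typical use:
#   ps -eo pid,ppid,stat,comm | list_zombies
```

The parent PID is usually the more useful column, because a zombie cannot be killed directly; it disappears only when its parent reaps it or exits.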

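They were all [sh] <defunct> processes. I had run into this before: in my experience it happens when a script is started from cron without its error output being redirected to the null device. The fix is to append >>/dev/null 2>&1 after the script in the cron entry. I checked the crontab to see whether it matched my guess.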

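Sure enough, the cron entries did not redirect their error output. After adding >>/dev/null 2>&1 and restarting the cron service, the problem was solved.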
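The redirection used above can be tried outside cron to see exactly what it discards. The sketch below is illustrative only: noisy_job stands in for a real cron script, run_quietly is a hypothetical wrapper, and the crontab line in the comment uses a placeholder path and schedule.

```shell
#!/bin/sh
# Sketch only: noisy_job and run_quietly are hypothetical names.

# Stand-in for a cron script that writes to both stdout and stderr.
noisy_job() {
    echo "normal output"
    echo "error output" >&2
}

# Run a command with the redirection from the article.
# Order matters: stdout is pointed at /dev/null first, then stderr (fd 2)
# is duplicated onto stdout (fd 1), so both streams are discarded.
run_quietly() {
    "$@" >>/dev/null 2>&1
}

# Equivalent crontab entry (path and schedule are placeholders):
#   */5 * * * * /path/to/script.sh >>/dev/null 2>&1
```

Calling `run_quietly noisy_job` prints nothing, while `noisy_job` alone prints both lines. Discarding everything also hides real failures, so redirecting to a log file instead of /dev/null may be preferable for jobs that matter.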
Next, look at the perl processes.

There were 2726 of them, consuming a large amount of CPU and memory.
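With this many perl processes it helps to group them by command line and see which script accounts for most of them. This is a minimal sketch assuming a procps-style ps; count_perl_scripts is a hypothetical helper name, not something from the original post.

```shell
#!/bin/sh
# Sketch only: count_perl_scripts is a hypothetical helper.

# Read `ps -eo args` output on stdin, keep lines whose executable is perl
# (bare name or any path ending in /perl), and print "COUNT COMMAND-LINE"
# sorted with the most frequent command line first.
count_perl_scripts() {
    awk '
        NR > 1 && ($1 == "perl" || $1 ~ /\/perl$/) { n[$0]++ }
        END { for (c in n) print n[c], c }
    ' | sort -rn
}

# Typical use:
#   ps -eo args | count_perl_scripts | head
```

If one script dominates the list, its name is a good search term for the vendor's knowledge base.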
A search on Metalink showed that this problem is caused by a fault in OEM. Oracle's description of the problem and its solution are as follows:
Server Has 100% Of Cpu Because Of Dbresp.pl [ID 764140.1]
________________________________________
Modified: 07-Feb-2012  Type: PROBLEM  Status: MODERATED  Priority: 3

In this Document
  Symptoms
  Cause
  Solution
  References
________________________________________
This document is being delivered to you via Oracle Support's Rapid Visibility (RaV) process and therefore has not been subject to an independent technical review.

Applies to:
Enterprise Manager Base Platform - Version: 10.2.0.1 and later [Release: 10.2 and later]
Information in this document applies to any platform.
***Checked for relevance on 07-Feb-2012***

Symptoms
Server has 100% of CPU because of dbresp.pl. There are more than 50 process from this script

emagent.trc shows:
2009-01-21 10:19:50 Thread-4099931040 WARN engine: Missing Properties : [limitSwitch]
2009-01-21 10:19:50 Thread-4099931040 ERROR engine: [oracle_database,orcl, alertLog] : nmeegd_GetMetricData failed : Missing Properties : [limitSwitch]
2009-01-22 06:54:33 Thread-4105165728 ERROR fetchlets.oslinetok: Metric execution timed out in 600 seconds
2009-01-22 06:54:33 Thread-4105165728 ERROR command: failed to kill process 4793 running perl: ( errno = 3 : No such process)
2009-01-22 06:54:33 Thread-4105165728 ERROR engine: [oracle_database,orlc, Response] : nmeegd_GetMetricData failed : Metric execution timed out in 600 seconds

Cause
The Response metric is making a timed out then the Agent starts other process to take the Response metric. The process to kill the PID taking the Response metric is failing increasing the process running dbresp.pl