Date: 2014-05-16

A Formula for Measuring Linux CPU Load

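Isn't a single top command enough, perhaps with a few pipes to filter the output? That is what I thought at first too. But there is more to understand.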

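The first question: do we measure CPU load at a point in time, or over a period of time?

For a line chart in a report, the horizontal axis is usually a series of time points, so we would like the CPU load at each instant, but that is very hard to obtain. The easier approach is to measure the load between two time points, that is, over an interval. For a benchmark, make the interval very short, 1 second or even less. For routine monitoring, the interval can be widened to 1 minute or more.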
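The interval-based approach can be sketched as taking two snapshots of the aggregate cpu line in /proc/stat (the file discussed below), separated by a sleep. This is a minimal sketch, not a definitive implementation: the function names `read_cpu_line` and `sample_pair` are made up for illustration, and the `reader` and `sleep` parameters exist only so the sampling logic can be exercised without a real /proc/stat.

```python
import time


def read_cpu_line(path="/proc/stat"):
    """Return the counters on the aggregate 'cpu' line as a list of ints."""
    with open(path) as f:
        for line in f:
            # The aggregate line starts with "cpu " (per-core lines are cpu0, cpu1, ...).
            if line.startswith("cpu "):
                return [int(value) for value in line.split()[1:]]
    raise RuntimeError("no aggregate cpu line found in " + path)


def sample_pair(interval=1.0, reader=read_cpu_line, sleep=time.sleep):
    """Take two snapshots `interval` seconds apart and return (before, after)."""
    before = reader()
    sleep(interval)
    after = reader()
    return before, after
```

For a benchmark one might call `sample_pair(1.0)` or shorter; for routine monitoring, `sample_pair(60.0)`.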


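The second question: what do we use to judge CPU load over an interval?

CPU time is counted in a basic unit called the jiffy. It is a very short interval whose exact length depends on the kernel configuration (the counters in /proc/stat are reported in units of USER_HZ, 1/100 of a second on most architectures). The exact length does not matter much here: since I only need the load as a percentage, ratios of jiffies are enough.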

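The article at http://www.linuxhowtos.org/System/procstat.htm describes the /proc/stat file. The points worth noting are:

1. The first line, cpu, is the total of the per-CPU lines (cpu0, cpu1, ...) below it.

2. Each line carries 7 numbers, explained as follows: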

The meanings of the columns are as follows, from left to right:

user: normal processes executing in user mode
nice: niced processes executing in user mode
system: processes executing in kernel mode
idle: twiddling thumbs
iowait: waiting for I/O to complete
irq: servicing interrupts
softirq: servicing softirqs
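To make the seven columns above easier to work with, one option is to parse a line into named fields. The following is a sketch under assumptions: `CpuTimes` and `parse_cpu_line` are illustrative names, and any columns beyond the seventh are simply ignored here.

```python
from collections import namedtuple

# Field names follow the seven columns listed above, left to right.
CpuTimes = namedtuple("CpuTimes", "user nice system idle iowait irq softirq")


def parse_cpu_line(line):
    """Split a 'cpu ...' line into its label and the first seven counters."""
    parts = line.split()
    label = parts[0]
    values = [int(v) for v in parts[1:8]]
    # Very old kernels report fewer columns; pad missing ones with zero.
    values += [0] * (7 - len(values))
    return label, CpuTimes(*values)
```

For example, `parse_cpu_line("cpu 4698 591 262 8953 916 449 531")` gives a label of `"cpu"` and a tuple whose `idle` field is 8953.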

This discussion thread then gives the formula: http://stackoverflow.com/questions/3017162/how-to-get-total-cpu-usage-in-linux-c

e.g. Suppose at 14:00:00 you have

cpu 4698 591 262 8953 916 449 531

total_jiffies_1 = (sum of all values) = 16400

work_jiffies_1 = (sum of user,nice,system = the first 3 values) = 5551

and at 14:00:05 you have

cpu 4739 591 289 9961 936 449 541

total_jiffies_2 = 17506

work_jiffies_2 = 5619

So the %cpu usage over this period is:

work_over_period = work_jiffies_2 - work_jiffies_1 = 68

total_over_period = total_jiffies_2 - total_jiffies_1 = 1106

%cpu = work_over_period / total_over_period * 100 = 6.1%

This is easy to follow: the final ratio, multiplied by 100, is the percentage.
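The arithmetic in the quoted example can be written as a small function. This is only a sketch of that formula: `cpu_percent` is an illustrative name, and, like the quoted answer, it counts only user, nice, and system as work, so time spent in irq and softirq is treated as non-work.

```python
def cpu_percent(first, second):
    """CPU usage (percent) between two snapshots of the 'cpu' counters.

    `first` and `second` are lists of ints taken from the same cpu line
    at two different times, oldest first.
    """
    work_1, total_1 = sum(first[:3]), sum(first)
    work_2, total_2 = sum(second[:3]), sum(second)
    total_over_period = total_2 - total_1
    if total_over_period == 0:
        # No jiffies elapsed between the snapshots; avoid dividing by zero.
        return 0.0
    return 100.0 * (work_2 - work_1) / total_over_period


# The two snapshots from the example (14:00:00 and 14:00:05).
at_start = [4698, 591, 262, 8953, 916, 449, 531]
at_end = [4739, 591, 289, 9961, 936, 449, 541]
print(round(cpu_percent(at_start, at_end), 1))  # 6.1
```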

On my machine, there are 10 columns in total.

 cat /proc/stat 
cpu  2065552 1692 636745 10842974 59979 16 6860 0 0 0
cpu0 524690 552 158305 2701823 8912 7 4808 0 0 0
cpu1 511203 670 157274 2703792 31404 1 1179 0 0 0
cpu2 519169 441 155591 2720326 11179 0 438 0 0 0
cpu3 510489 27 165574 2717032 8482 7 435 0 0 0
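Output like the above can be split into one entry per line, which makes per-core comparisons easy. This is a sketch; `parse_stat` is an illustrative name, and the text is passed in as a string so the example does not depend on a live /proc/stat. Note that in this particular sample the per-core user values add up to 2065551 while the aggregate line reports 2065552, so the "sum" relationship is best treated as approximate.

```python
def parse_stat(text):
    """Map each cpu label ('cpu', 'cpu0', ...) to its list of counters."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip blank lines and the non-cpu entries (intr, ctxt, btime, ...).
        if parts and parts[0].startswith("cpu"):
            stats[parts[0]] = [int(v) for v in parts[1:]]
    return stats


sample = """\
cpu  2065552 1692 636745 10842974 59979 16 6860 0 0 0
cpu0 524690 552 158305 2701823 8912 7 4808 0 0 0
cpu1 511203 670 157274 2703792 31404 1 1179 0 0 0
cpu2 519169 441 155591 2720326 11179 0 438 0 0 0
cpu3 510489 27 165574 2717032 8482 7 435 0 0 0
"""

stats = parse_stat(sample)
cores = [name for name in stats if name != "cpu"]
per_core_user = sum(stats[name][0] for name in cores)
print(len(stats["cpu"]), per_core_user, stats["cpu"][0])
```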
Running man 5 proc, then typing /proc/stat and pressing Enter to search, shows:

       /proc/stat
              kernel/system statistics.  Varies with architecture.  Common entries include:

              cpu  3357 0 4313 1362393
                     The amount of time, measured in units of USER_HZ (1/100ths of a second on most architectures, use sysconf(_SC_CLK_TCK) to obtain the right value), that the system  spent  in  user  mode,
                     user mode with low priority (nice), system mode, and the idle task, respectively.  The last value should be USER_HZ times the second entry in the uptime pseudo-file.

                     In  Linux  2.6  this line includes three additional columns: iowait - time waiting for I/O to complete (since 2.5.41); irq - time servicing interrupts (since 2.6.0-test4); softirq - time
                     servicing softirqs (since 2.6.0-test4).

                     Since Linux 2.6.11, there is an eighth column, steal - stolen time, which is the time spent in other operating systems when running in a virtualized environment

                     Since Linux 2.6.24, there is a ninth column, guest, which is the time spent running a virtual CPU for guest operating systems under the control of the Linux kernel.

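This explains that:

The 8th column is the time stolen by other operating systems when running in a virtualized environment.

The 9th column is the time spent running guest VMs when the machine is a host.

This information is also quite useful, since many servers these days are really just VMs.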
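As a rough illustration of how the extra columns and the USER_HZ remark in the man page fit together, the sketch below names all ten columns and converts the counters to seconds. Some assumptions to flag: `to_seconds` and `FIELDS` are illustrative names; the tenth column name, guest_nice, comes from newer versions of the man page and is not in the excerpt quoted above; and `os.sysconf("SC_CLK_TCK")` is the Python counterpart of the `sysconf(_SC_CLK_TCK)` call the man page mentions.

```python
import os

# Column names, left to right. The first nine follow the man page excerpt;
# "guest_nice" is assumed from newer man pages for the tenth column.
FIELDS = ("user", "nice", "system", "idle", "iowait",
          "irq", "softirq", "steal", "guest", "guest_nice")


def to_seconds(line, user_hz=None):
    """Convert the counters on one cpu line from USER_HZ ticks to seconds."""
    if user_hz is None:
        # Ticks per second as reported by the system (100 on most setups).
        user_hz = os.sysconf("SC_CLK_TCK")
    values = [int(v) for v in line.split()[1:]]
    # zip stops at the shorter sequence, so older kernels with fewer
    # columns simply yield fewer keys.
    return {name: value / user_hz for name, value in zip(FIELDS, values)}
```

On a VM, a steadily growing `steal` value in successive snapshots would suggest the hypervisor is giving CPU time to other guests.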