Date: 2014-05-16

Tuning Linux I/O behavior

http://www.westnet.com/~gsmith/content/linux-pdflush.htm


The Linux Page Cache and pdflush:
Theory of Operation and Tuning for Write-Heavy Loads

As you write out data ultimately intended for disk, Linux caches this information in an area of memory called the page cache. You can find basic information about the page cache using tools like free, vmstat or top. See http://gentoo-wiki.com/FAQ_Linux_Memory_Management to learn how to interpret top's memory information, or use atop for an improved version.

Full information about the page cache only shows up by looking at /proc/meminfo. Here is a sample from a system with 4GB of RAM:

MemTotal:      3950112 kB
MemFree:        622560 kB
Buffers:         78048 kB
Cached:        2901484 kB
SwapCached:          0 kB
Active:        3108012 kB
Inactive:        55296 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:      3950112 kB
LowFree:        622560 kB
SwapTotal:     4198272 kB
SwapFree:      4198244 kB
Dirty:             416 kB
Writeback:           0 kB
Mapped:         999852 kB
Slab:            57104 kB
Committed_AS:  3340368 kB
PageTables:       6672 kB
VmallocTotal: 536870911 kB
VmallocUsed:     35300 kB
VmallocChunk: 536835611 kB
HugePages_Total:     0
HugePages_Free:      0
Hugepagesize:     2048 kB

The size of the page cache itself is the "Cached" figure; in this example it's about 2.9GB. As pages are written, the "Dirty" figure increases. Once writes to disk have begun, you'll see the "Writeback" figure go up until the write is finished. It can be very hard to actually catch the Writeback value going high, as it is very transient and only increases during the brief period when I/O is queued but not yet written.
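To watch these three figures without scanning the whole file, a short script can pull them out. The following is a minimal sketch, not a standard tool: the function names parse_meminfo and page_cache_summary are made up for illustration, and the script falls back to a few lines of the sample above when /proc/meminfo cannot be read.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field_name: integer_value}.

    Most values are in kB; a few (such as HugePages_Total) are plain counts.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value"
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def page_cache_summary(fields):
    """Return only the page-cache figures discussed in the text (kB)."""
    return {key: fields.get(key, 0) for key in ("Cached", "Dirty", "Writeback")}


# A few lines copied from the sample listing, used as a fallback.
SAMPLE = """\
MemTotal:      3950112 kB
Cached:        2901484 kB
Dirty:             416 kB
Writeback:           0 kB
"""

if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as f:
            text = f.read()
    except OSError:
        text = SAMPLE  # not on Linux, or /proc is unavailable
    print(page_cache_summary(parse_meminfo(text)))
```

Running it repeatedly during a large write should show Dirty climbing and then falling, with Writeback briefly non-zero if you are lucky enough to catch it.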

Linux usually writes data out of the page cache using a process called pdflush. At any moment, between 2 and 8 pdflush threads are running on the system. You can monitor how many are active by looking at /proc/sys/vm/nr_pdflush_threads. Whenever all existing pdflush threads have been busy for at least one second, an additional pdflush daemon is spawned. The new ones try to write back data to device queues that are not congested, aiming to have each active device get its own thread flushing data to it. Each time a second has passed without any pdflush activity, one of the threads is removed. There are tunables for adjusting the minimum and maximum number of pdflush processes, but they very rarely need to be adjusted.
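The spawn and retire rules can be pictured with a toy model. This is only a sketch of the behavior as described in the paragraph above, not the kernel's code. The function names and the per-second "demand" input (how many threads' worth of writeback work is pending) are assumptions made for illustration.

```python
MIN_THREADS = 2  # lower bound given in the text
MAX_THREADS = 8  # upper bound given in the text


def next_thread_count(current, busy):
    """Advance the toy model by one second.

    current: number of pdflush threads at the start of the second
    busy:    how many of them were busy for that whole second
    """
    if busy >= current:  # every thread was busy: spawn one more
        return min(current + 1, MAX_THREADS)
    if busy == 0:        # a full second with no activity: retire one
        return max(current - 1, MIN_THREADS)
    return current       # some activity, but spare capacity: no change


def simulate(demand_per_second, start=MIN_THREADS):
    """Return the thread count after each second of the given demand."""
    counts = []
    current = start
    for demand in demand_per_second:
        busy = min(demand, current)  # cannot have more busy threads than exist
        current = next_thread_count(current, busy)
        counts.append(current)
    return counts


if __name__ == "__main__":
    # Four seconds of heavy demand, then four idle seconds.
    print(simulate([5, 5, 5, 5, 0, 0, 0, 0]))  # [3, 4, 5, 6, 5, 4, 3, 2]
```

In this model the pool grows by one thread per saturated second and shrinks by one per idle second, always staying within the 2 to 8 range.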

pdflush tunables

Exactly what each pdflush thread does is controlled by a series of parameters in /proc/sys/vm:

/proc/sys/vm/dirty_writeback_centisecs (default 500): In hundredths of a second, this is how often pdflush wakes up to write data to disk. The default wakes up the two (or more) active threads every five seconds.
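Because the value is in hundredths of a second, it is easy to misread. The following is a minimal sketch that reads the tunable and converts it to seconds. The helper names read_vm_tunable and centisecs_to_seconds are made up for illustration, and the read simply returns None on systems where the file is missing or unreadable.

```python
from pathlib import Path


def centisecs_to_seconds(value):
    """Convert a value in hundredths of a second to seconds."""
    return int(value) / 100


def read_vm_tunable(name, root="/proc/sys/vm"):
    """Return the integer value of a vm tunable, or None if it cannot be read."""
    try:
        return int((Path(root) / name).read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    raw = read_vm_tunable("dirty_writeback_centisecs")
    if raw is None:
        print("dirty_writeback_centisecs is not readable on this system")
    else:
        print(f"pdflush wakes up every {centisecs_to_seconds(raw)} seconds")
```

With the default of 500, the conversion gives 5.0 seconds, matching the five-second wakeup interval described above.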

There can be undocumented behavior that thwarts attempts to make pdflush more aggressive by decreasing dirty_writeback_centisecs. For example, in early 2.6 kernels, the Linux mm/page-writeback.c code includes logic that's described as "if a writeback event takes longer than a dirty_writeback_centisecs interval, then leave a one-second gap". In general, this "congestion" logic in the kernel is documented only by the kernel source itself, and how it operates can vary considerably depending on which kernel you are running. Because of all this, it's unlikely you'll gain much benefit from lowering the writeback time; the thread spawning code ensures that the threads will automatically run as often as is practical to try to meet the other requirements.

The first thing pdflush works on is writing pages that have been dirty for longer than it deems acceptable. This is controlled by:

/proc/sys/vm/dirty_expire_centisecs (default 3000):