Commit 3efaf0fa authored by Wu Fengguang

writeback: skip balance_dirty_pages() for in-memory fs

This avoids unnecessary checks and dirty throttling on tmpfs/ramfs.

Notes about the tmpfs/ramfs behavior changes:

As of 2.6.36 and older kernels, tmpfs writes will sleep inside
balance_dirty_pages() as long as we are over the (dirty+background)/2
global throttle threshold.  This is because both the per-bdi dirty
pages and the per-bdi threshold are 0 for tmpfs/ramfs, hence this test
always evaluates to TRUE:

                dirty_exceeded =
                        (bdi_nr_reclaimable + bdi_nr_writeback >= bdi_thresh)
                        || (nr_reclaimable + nr_writeback >= dirty_thresh);

For 2.6.37, someone complained that the current logic does not allow
users to set vm.dirty_ratio=0.  So commit 4cbec4c8 changed the test to

                dirty_exceeded =
                        (bdi_nr_reclaimable + bdi_nr_writeback > bdi_thresh)
                        || (nr_reclaimable + nr_writeback > dirty_thresh);

So 2.6.37 will behave differently for tmpfs/ramfs: it will never get
throttled unless the global dirty threshold is exceeded (which is very
unlikely to happen; when it does happen, it will block many tasks).

I'd say that the 2.6.36 behavior is very bad for tmpfs/ramfs: on a busy
writing server, tmpfs write()s may get livelocked! The "inadvertent"
throttling can hardly help any workload because of its "either no
throttling, or throttled to death" property.

So, relative to 2.6.37, this patch brings no further noticeable changes.

CC: Hugh Dickins <>
Acked-by: Rik van Riel <>
Acked-by: Peter Zijlstra <>
Reviewed-by: Minchan Kim <>
Signed-off-by: Wu Fengguang <>
parent 6f718656
@@ -244,13 +244,8 @@ void task_dirty_inc(struct task_struct *tsk)
 static void bdi_writeout_fraction(struct backing_dev_info *bdi,
 		long *numerator, long *denominator)
 {
-	if (bdi_cap_writeback_dirty(bdi)) {
-		prop_fraction_percpu(&vm_completions, &bdi->completions,
-				numerator, denominator);
-	} else {
-		*numerator = 0;
-		*denominator = 1;
-	}
+	prop_fraction_percpu(&vm_completions, &bdi->completions,
+			numerator, denominator);
 }
 
 static inline void task_dirties_fraction(struct task_struct *tsk,
@@ -495,6 +490,9 @@ static void balance_dirty_pages(struct address_space *mapping,
 	bool dirty_exceeded = false;
 	struct backing_dev_info *bdi = mapping->backing_dev_info;
 
+	if (!bdi_cap_account_dirty(bdi))
+		return;
+
 	for (;;) {
 		struct writeback_control wbc = {
 			.sync_mode = WB_SYNC_NONE,