writeback: stabilize bdi->dirty_ratelimit

There are some imperfections in balanced_dirty_ratelimit.

1) large fluctuations

The dirty_rate used for computing balanced_dirty_ratelimit is merely
averaged over the past 200ms (very small compared to the 3s estimation
period for write_bw), which makes for a rather dispersed distribution
of balanced_dirty_ratelimit.

It's pretty hard to average out the singular points by increasing the
estimation period. Considering that the averaging technique will
introduce very undesirable time lags, I give it up totally. (btw, the
3s write_bw averaging time lag is much more acceptable because its
impact is one-way and therefore won't lead to oscillations.)

The more practical way is filtering -- most singular
balanced_dirty_ratelimit points can be filtered out by remembering some
prev_balanced_rate and prev_prev_balanced_rate. However the more
reliable way is to guard balanced_dirty_ratelimit with task_ratelimit.

2) due to truncates and fs redirties, the (write_bw <=> dirty_rate)
match could become unbalanced, which may lead to large systematic
errors in balanced_dirty_ratelimit. The truncates, due to their
possibly bumpy nature, can hardly be compensated smoothly. So let's
face it. When some over-estimated balanced_dirty_ratelimit brings
dirty_ratelimit high, dirty pages will go higher than the setpoint.
task_ratelimit will in turn become lower than dirty_ratelimit. So if we
consider both balanced_dirty_ratelimit and task_ratelimit and update
dirty_ratelimit only when they are on the same side of dirty_ratelimit,
the systematic errors in balanced_dirty_ratelimit won't be able to
bring dirty_ratelimit far away.

The balanced_dirty_ratelimit estimation may also be inaccurate near
@limit or @freerun, however that is less of an issue.

3) since we ultimately want to

- keep the fluctuations of task ratelimit as small as possible
- keep the dirty pages around the setpoint for as long as possible

the update policy used for (2) also serves the above goals nicely: if
for some reason the dirty pages are high (task_ratelimit <
dirty_ratelimit), and dirty_ratelimit is low (dirty_ratelimit <
balanced_dirty_ratelimit), there is no point in bringing up
dirty_ratelimit in a hurry only to hurt both of the above goals.

So, we make use of task_ratelimit to limit the update of
dirty_ratelimit in two ways:

1) avoid changing dirty rate when it's against the position control
   target (the adjusted rate will slow down the progress of dirty pages
   going back to setpoint).

2) limit the step size. task_ratelimit changes values step by step,
   leaving a consistent trace compared to the randomly jumping
   balanced_dirty_ratelimit. task_ratelimit also has the nice property
   of smaller errors in stable state and typically larger errors when
   there are big errors in rate. So it's a pretty good limiting factor
   for the step size of dirty_ratelimit.

Note that bdi->dirty_ratelimit is always tracking
balanced_dirty_ratelimit. task_ratelimit is merely used as a limiting
factor.

Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
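
For illustration only, the update policy described in this message
(move dirty_ratelimit only when balanced_dirty_ratelimit and
task_ratelimit are on the same side of it, and let the nearer of the
two bound the step) could be sketched in userspace C roughly as below.
This is not the code touched by the patch: the function name
sketch_update_ratelimit() is hypothetical, and the divide-by-8 damping
of the step is an assumption made for the example.

```c
/*
 * Hypothetical userspace sketch of the "same side" update policy.
 * cur      - current dirty_ratelimit
 * balanced - freshly estimated balanced_dirty_ratelimit (noisy)
 * task     - task_ratelimit (moves step by step, used as a guard)
 * Returns the new dirty_ratelimit.
 */
static unsigned long sketch_update_ratelimit(unsigned long cur,
					     unsigned long balanced,
					     unsigned long task)
{
	unsigned long target, step;

	if (balanced > cur && task > cur) {
		/* Both rates say "go up": the nearer one bounds the step. */
		target = balanced < task ? balanced : task;
		step = target - cur;
		/* Assumed damping: take about 1/8 of the gap, rounded up. */
		return cur + (step + 7) / 8;
	}

	if (balanced < cur && task < cur) {
		/* Both rates say "go down": the nearer one bounds the step. */
		target = balanced > task ? balanced : task;
		step = cur - target;
		return cur - (step + 7) / 8;
	}

	/*
	 * The two rates disagree (or one of them equals cur): moving would
	 * work against the position control target, so keep the old value.
	 */
	return cur;
}
```

With these assumptions, an over-estimated balanced rate such as
sketch_update_ratelimit(100, 200, 80) leaves the limit untouched,
because task_ratelimit is on the other side of the current value, while
sketch_update_ratelimit(100, 200, 120) only creeps upward, bounded by
the gap to task_ratelimit instead of the gap to the noisy balanced rate.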
author: Wu Fengguang <fengguang.wu@intel.com> 2011-08-26 15:53:24 -0600
committer: Wu Fengguang <fengguang.wu@intel.com> 2011-10-03 21:08:57 +0800
commit: 7381131cbcf7e15d201a0ffd782a4698efe4e740 (patch)
tree: 83f00c40d0a3fcd41ff2e6681a5da70dd155628a /include
parent: be3ffa276446e1b691a2bf84e7621e5a6fb49db9 (diff)
download: op-kernel-dev-7381131cbcf7e15d201a0ffd782a4698efe4e740.zip
op-kernel-dev-7381131cbcf7e15d201a0ffd782a4698efe4e740.tar.gz
1 files changed, 3 insertions, 0 deletions
diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
index dff0ff7..c3b9201 100644
--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -83,8 +83,11 @@ struct backing_dev_info {
 	/*
 	 * The base dirty throttle rate, re-calculated on every 200ms.
 	 * All the bdi tasks' dirty rate will be curbed under it.
+	 * @dirty_ratelimit tracks the estimated @balanced_dirty_ratelimit
+	 * in small steps and is much more smooth/stable than the latter.
 	 */
 	unsigned long dirty_ratelimit;
+	unsigned long balanced_dirty_ratelimit;
 
 	struct prop_local_percpu completions;
 	int dirty_exceeded;