From 8b6490e5faafb3a16ea45654fb55f9ff086f1495 Mon Sep 17 00:00:00 2001 From: Dipankar Sarma Date: Fri, 9 Sep 2005 13:04:07 -0700 Subject: [PATCH] files: fix rcu initializers First of a number of files_lock scaability patches. Here are the x86 numbers - tiobench on a 4(8)-way (HT) P4 system on ramdisk : (lockfree) Test 2.6.10-vanilla Stdev 2.6.10-fd Stdev ------------------------------------------------------------- Seqread 1400.8 11.52 1465.4 34.27 Randread 1594 8.86 2397.2 29.21 Seqwrite 242.72 3.47 238.46 6.53 Randwrite 445.74 9.15 446.4 9.75 The performance improvement is very significant. We are getting killed by the cacheline bouncing of the files_struct lock here. Writes on ramdisk (ext2) seems to vary just too much to get any meaningful number. Also, With Tridge's thread_perf test on a 4(8)-way (HT) P4 xeon system : 2.6.12-rc5-vanilla : Running test 'readwrite' with 8 tasks Threads 0.34 +/- 0.01 seconds Processes 0.16 +/- 0.00 seconds 2.6.12-rc5-fd : Running test 'readwrite' with 8 tasks Threads 0.17 +/- 0.02 seconds Processes 0.17 +/- 0.02 seconds I repeated the measurements on ramfs (as opposed to ext2 on ramdisk in the earlier measurement) and I got more consistent results from tiobench : 4(8) way xeon P4 ----------------- (lock-free) Test 2.6.12-rc5 Stdev 2.6.12-rc5-fd Stdev ------------------------------------------------------------- Seqread 1282 18.59 1343.6 26.37 Randread 1517 7 2415 34.27 Seqwrite 702.2 5.27 709.46 5.9 Randwrite 846.86 15.15 919.68 21.4 4-way ppc64 ------------ (lock-free) Test 2.6.12-rc5 Stdev 2.6.12-rc5-fd Stdev ------------------------------------------------------------- Seqread 1549 91.16 1569.6 47.2 Randread 1473.6 25.11 1585.4 69.99 Seqwrite 1096.8 20.03 1136 29.61 Randwrite 1189.6 4.04 1275.2 32.96 Also running Tridge's thread_perf test on ppc64 : 2.6.12-rc5-vanilla -------------------- Running test 'readwrite' with 4 tasks Threads 0.20 +/- 0.02 seconds Processes 0.16 +/- 0.01 seconds 2.6.12-rc5-fd -------------------- Running test 'readwrite' with 4 tasks Threads 0.18 +/- 0.04 seconds Processes 0.16 +/- 0.01 seconds The benefits are huge (upto ~60%) in some cases on x86 primarily due to the atomic operations during acquisition of ->file_lock and cache line bouncing in fast path. ppc64 benefits are modest due to LL/SC based locking, but still statistically significant. This patch: RCU head initilizer no longer needs the head varible name since we don't use list.h lists anymore. Signed-off-by: Dipankar Sarma Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/rcupdate.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include') diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h index fd276ad..4e65eb4 100644 --- a/include/linux/rcupdate.h +++ b/include/linux/rcupdate.h @@ -52,8 +52,8 @@ struct rcu_head { void (*func)(struct rcu_head *head); }; -#define RCU_HEAD_INIT(head) { .next = NULL, .func = NULL } -#define RCU_HEAD(head) struct rcu_head head = RCU_HEAD_INIT(head) +#define RCU_HEAD_INIT { .next = NULL, .func = NULL } +#define RCU_HEAD(head) struct rcu_head head = RCU_HEAD_INIT #define INIT_RCU_HEAD(ptr) do { \ (ptr)->next = NULL; (ptr)->func = NULL; \ } while (0) -- cgit v1.1