    [PATCH] files: fix rcu initializers · 8b6490e5
    Dipankar Sarma authored
    
    
    First of a number of files_lock scalability patches.
    
     Here are the x86 numbers -
    
     tiobench on a 4(8)-way (HT) P4 system on ramdisk :
    
                                             (lock-free)
     Test            2.6.10-vanilla  Stdev   2.6.10-fd       Stdev
     -------------------------------------------------------------
     Seqread         1400.8          11.52   1465.4          34.27
     Randread        1594            8.86    2397.2          29.21
     Seqwrite        242.72          3.47    238.46          6.53
     Randwrite       445.74          9.15    446.4           9.75
    
     The performance improvement is very significant, especially for
     Randread (about 50%, from 1594 to 2397.2). We are getting killed by
     cacheline bouncing of the files_struct lock here. Writes on ramdisk
     (ext2) seem to vary too much to give any meaningful numbers.
    
     Also, with Tridge's thread_perf test on a 4(8)-way (HT) P4 xeon system :
    
     2.6.12-rc5-vanilla :
    
     Running test 'readwrite' with 8 tasks
     Threads     0.34 +/- 0.01 seconds
     Processes   0.16 +/- 0.00 seconds
    
     2.6.12-rc5-fd :
    
     Running test 'readwrite' with 8 tasks
     Threads     0.17 +/- 0.02 seconds
     Processes   0.17 +/- 0.02 seconds
    
     I repeated the measurements on ramfs (as opposed to ext2 on ramdisk in
     the earlier measurement) and I got more consistent results from tiobench :
    
     4(8) way xeon P4
     -----------------
                                             (lock-free)
     Test            2.6.12-rc5      Stdev   2.6.12-rc5-fd   Stdev
     -------------------------------------------------------------
     Seqread         1282            18.59   1343.6          26.37
     Randread        1517            7       2415            34.27
     Seqwrite        702.2           5.27    709.46           5.9
     Randwrite       846.86          15.15   919.68          21.4
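For reference, the largest relative gain in the x86 ramfs table above is Randread. This worked calculation uses only the numbers in that table:

```latex
\[
\frac{\text{Randread}_{\text{fd}} - \text{Randread}_{\text{vanilla}}}{\text{Randread}_{\text{vanilla}}}
= \frac{2415 - 1517}{1517}
= \frac{898}{1517}
\approx 0.592 \quad (\approx 59\%)
\]
```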
    
     4-way ppc64
     ------------
                                             (lock-free)
     Test            2.6.12-rc5      Stdev   2.6.12-rc5-fd   Stdev
     -------------------------------------------------------------
     Seqread         1549            91.16   1569.6          47.2
     Randread        1473.6          25.11   1585.4          69.99
     Seqwrite        1096.8          20.03   1136            29.61
     Randwrite       1189.6           4.04   1275.2          32.96
    
     Also running Tridge's thread_perf test on ppc64 :
    
     2.6.12-rc5-vanilla
     --------------------
     Running test 'readwrite' with 4 tasks
     Threads     0.20 +/- 0.02 seconds
     Processes   0.16 +/- 0.01 seconds
    
     2.6.12-rc5-fd
     --------------------
     Running test 'readwrite' with 4 tasks
     Threads     0.18 +/- 0.04 seconds
     Processes   0.16 +/- 0.01 seconds
    
     The benefits are huge (up to ~60%) in some cases on x86, primarily
     because the lock-free fast path avoids the atomic operations needed to
     acquire ->file_lock and the cache line bouncing they cause. ppc64
     benefits are modest due to LL/SC based locking, but still statistically
     significant.
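A rough userspace analogue of why a lock-free fast path is cheaper: a reader does a single acquire load of a published table pointer instead of an atomic read-modify-write on a shared lock word, so readers on different CPUs stop bouncing that cache line. This is a minimal sketch under stated assumptions, not the code from this patch series. The names struct fdtable_sketch, lookup_fd_sketch() and install_fd_sketch() are hypothetical. Writers are assumed to publish a fresh copy of the table rather than modify it in place. Freeing old tables is left to the caller; in the kernel, RCU provides the required wait until no reader can still hold the old table.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified descriptor table (not the kernel's struct). */
struct fdtable_sketch {
    size_t max_fds;
    void **fd;                /* slot i holds the "file" for descriptor i */
};

/* Table pointer that readers load without taking any lock. */
static _Atomic(struct fdtable_sketch *) cur_fdt;

/* Writers still serialize among themselves on a simple spinlock. */
static atomic_flag fdt_write_lock = ATOMIC_FLAG_INIT;

/* Lock-free lookup: one acquire load, no atomic read-modify-write. */
static void *lookup_fd_sketch(size_t fd)
{
    struct fdtable_sketch *fdt =
        atomic_load_explicit(&cur_fdt, memory_order_acquire);
    if (fdt == NULL || fd >= fdt->max_fds)
        return NULL;
    return fdt->fd[fd];       /* a published table is never modified in place */
}

/*
 * Copy-on-write update: build a new table, then publish it with a release
 * store. Returns the old table; the caller may free it only once no reader
 * can still be using it (the wait that RCU provides in the kernel).
 */
static struct fdtable_sketch *install_fd_sketch(size_t fd, void *file)
{
    while (atomic_flag_test_and_set_explicit(&fdt_write_lock,
                                             memory_order_acquire))
        ;                     /* spin until no other writer holds the lock */

    struct fdtable_sketch *old =
        atomic_load_explicit(&cur_fdt, memory_order_relaxed);
    size_t old_max = old ? old->max_fds : 0;
    size_t new_max = fd < old_max ? old_max : fd + 1;

    struct fdtable_sketch *nfdt = malloc(sizeof *nfdt);
    if (nfdt == NULL || (nfdt->fd = calloc(new_max, sizeof *nfdt->fd)) == NULL)
        abort();              /* keep the sketch short: no error recovery */
    if (old != NULL)
        memcpy(nfdt->fd, old->fd, old_max * sizeof *old->fd);
    nfdt->max_fds = new_max;
    nfdt->fd[fd] = file;

    /* Release store: a reader that sees nfdt also sees its contents. */
    atomic_store_explicit(&cur_fdt, nfdt, memory_order_release);
    atomic_flag_clear_explicit(&fdt_write_lock, memory_order_release);
    return old;
}

static void free_fdtable_sketch(struct fdtable_sketch *fdt)
{
    if (fdt != NULL) {
        free(fdt->fd);
        free(fdt);
    }
}
```

The design choice here is that readers never write shared memory. All of the cost moves to the writers, which copy the table, and those writers are assumed to be much rarer than lookups.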
    
    This patch:
    
    The RCU head initializer no longer needs the head variable name, since we
    don't use list.h lists anymore.
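The following example illustrates why the initializer can drop its argument. A list.h-style empty list head points at itself, so its initializer must name the variable being initialized. An idle RCU head, by contrast, is just a pair of NULL pointers. The types and macros below are hypothetical, simplified stand-ins, not the kernel's actual struct rcu_head or RCU_HEAD_INIT definitions.

```c
#include <stddef.h>

/* Simplified doubly linked list head in the style of list.h. */
struct list_head_sketch {
    struct list_head_sketch *next, *prev;
};

/* An empty list points at itself, so the macro needs the variable's name. */
#define LIST_HEAD_INIT_SKETCH(name) { &(name), &(name) }

/* Simplified callback head: an unqueued head is just two NULLs. */
struct rcu_head_sketch {
    struct rcu_head_sketch *next;
    void (*func)(struct rcu_head_sketch *head);
};

/* Nothing self-referential here, so no name argument is needed. */
#define RCU_HEAD_INIT_SKETCH { .next = NULL, .func = NULL }

static struct list_head_sketch pending = LIST_HEAD_INIT_SKETCH(pending);
static struct rcu_head_sketch idle_head = RCU_HEAD_INIT_SKETCH;
```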
    
    Signed-off-by: Dipankar Sarma <dipankar@in.ibm.com>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>