1. 14 Sep, 2009 26 commits
  2. 11 Sep, 2009 6 commits
    • Jens Axboe's avatar
      writeback: check for registered bdi in flusher add and inode dirty · 500b067c
      Jens Axboe authored
      
      
      Also a debugging aid. We want to catch dirty inodes being added to
      backing devices that don't do writeback.
      Signed-off-by: default avatarJens Axboe <jens.axboe@oracle.com>
      500b067c
    • Jens Axboe's avatar
      writeback: add name to backing_dev_info · d993831f
      Jens Axboe authored
      
      
      This enables us to track who does what and print info. Its main use
      is catching dirty inodes on the default_backing_dev_info, so we can
      fix that up.
      Signed-off-by: default avatarJens Axboe <jens.axboe@oracle.com>
      d993831f
    • Jens Axboe's avatar
      writeback: get rid of pdflush completely · d0bceac7
      Jens Axboe authored
      
      
      It is now unused, so kill it off.
      Signed-off-by: default avatarJens Axboe <jens.axboe@oracle.com>
      d0bceac7
    • Jens Axboe's avatar
      writeback: switch to per-bdi threads for flushing data · 03ba3782
      Jens Axboe authored
      
      
      This gets rid of pdflush for bdi writeout and kupdated style cleaning.
      pdflush writeout suffers from lack of locality and also requires more
      threads to handle the same workload, since it has to work in a
      non-blocking fashion against each queue. This also introduces lumpy
      behaviour and potential request starvation, since pdflush can be starved
      for queue access if others are accessing it. A sample ffsb workload that
      does random writes to files is about 8% faster here on a simple SATA drive
      during the benchmark phase. File layout also seems a LOT more smooth in
      vmstat:
      
       r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
       0  1      0 608848   2652 375372    0    0     0 71024  604    24  1 10 48 42
       0  1      0 549644   2712 433736    0    0     0 60692  505    27  1  8 48 44
       1  0      0 476928   2784 505192    0    0     4 29540  553    24  0  9 53 37
       0  1      0 457972   2808 524008    0    0     0 54876  331    16  0  4 38 58
       0  1      0 366128   2928 614284    0    0     4 92168  710    58  0 13 53 34
       0  1      0 295092   3000 684140    0    0     0 62924  572    23  0  9 53 37
       0  1      0 236592   3064 741704    0    0     4 58256  523    17  0  8 48 44
       0  1      0 165608   3132 811464    0    0     0 57460  560    21  0  8 54 38
       0  1      0 102952   3200 873164    0    0     4 74748  540    29  1 10 48 41
       0  1      0  48604   3252 926472    0    0     0 53248  469    29  0  7 47 45
      
      where vanilla tends to fluctuate a lot in the creation phase:
      
       r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
       1  1      0 678716   5792 303380    0    0     0 74064  565    50  1 11 52 36
       1  0      0 662488   5864 319396    0    0     4   352  302   329  0  2 47 51
       0  1      0 599312   5924 381468    0    0     0 78164  516    55  0  9 51 40
       0  1      0 519952   6008 459516    0    0     4 78156  622    56  1 11 52 37
       1  1      0 436640   6092 541632    0    0     0 82244  622    54  0 11 48 41
       0  1      0 436640   6092 541660    0    0     0     8  152    39  0  0 51 49
       0  1      0 332224   6200 644252    0    0     4 102800  728    46  1 13 49 36
       1  0      0 274492   6260 701056    0    0     4 12328  459    49  0  7 50 43
       0  1      0 211220   6324 763356    0    0     0 106940  515    37  1 10 51 39
       1  0      0 160412   6376 813468    0    0     0  8224  415    43  0  6 49 45
       1  1      0  85980   6452 886556    0    0     4 113516  575    39  1 11 54 34
       0  2      0  85968   6452 886620    0    0     0  1640  158   211  0  0 46 54
      
      A 10 disk test with btrfs performs 26% faster with per-bdi flushing. A
      SSD based writeback test on XFS performs over 20% better as well, with
      the throughput being very stable around 1GB/sec, where pdflush only
      manages 750MB/sec and fluctuates wildly while doing so. Random buffered
      writes to many files behave a lot better as well, as does random mmap'ed
      writes.
      
      A separate thread is added to sync the super blocks. In the long term,
      adding sync_supers_bdi() functionality could get rid of this thread again.
      Signed-off-by: default avatarJens Axboe <jens.axboe@oracle.com>
      03ba3782
    • Jens Axboe's avatar
      writeback: move dirty inodes from super_block to backing_dev_info · 66f3b8e2
      Jens Axboe authored
      
      
      This is a first step at introducing per-bdi flusher threads. We should
      have no change in behaviour, although sb_has_dirty_inodes() is now
      ridiculously expensive, as there's no easy way to answer that question.
      Not a huge problem, since it'll be deleted in subsequent patches.
      Signed-off-by: default avatarJens Axboe <jens.axboe@oracle.com>
      66f3b8e2
    • Jens Axboe's avatar
      writeback: get rid of generic_sync_sb_inodes() export · d8a8559c
      Jens Axboe authored
      
      
      This adds two new exported functions:
      
      - writeback_inodes_sb(), which only attempts to writeback dirty inodes on
        this super_block, for WB_SYNC_NONE writeout.
      - sync_inodes_sb(), which writes out all dirty inodes on this super_block
        and also waits for the IO to complete.
      Acked-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarJens Axboe <jens.axboe@oracle.com>
      d8a8559c
  3. 10 Sep, 2009 1 commit
    • Roland McGrath's avatar
      binfmt_elf: fix PT_INTERP bss handling · 9f0ab4a3
      Roland McGrath authored
      
      
      In fs/binfmt_elf.c, load_elf_interp() calls padzero() for .bss even if
      the PT_LOAD has no PROT_WRITE and no .bss.  This generates EFAULT.
      
      Here is a small test case.  (Yes, there are other, useful PT_INTERP
      which have only .text and no .data/.bss.)
      
      	----- ptinterp.S
      	_start: .globl _start
      		 nop
      		 int3
      	-----
      	$ gcc -m32 -nostartfiles -nostdlib -o ptinterp ptinterp.S
      	$ gcc -m32 -Wl,--dynamic-linker=ptinterp -o hello hello.c
      	$ ./hello
      	Segmentation fault  # during execve() itself
      
      	After applying the patch:
      	$ ./hello
      	Trace trap  # user-mode execution after execve() finishes
      
      If the ELF headers are actually self-inconsistent, then dying is fine.
      But having no PROT_WRITE segment is perfectly normal and correct if
      there is no segment with p_memsz > p_filesz (i.e. bss).  John Reiser
      suggested checking for PROT_WRITE in the bss logic.  I think it makes
      most sense to simply apply the bss logic only when there is bss.
      
      This patch looks less trivial than it is due to some reindentation.
      It just moves the "if (last_bss > elf_bss) {" test up to include the
      partial-page bss logic as well as the more-pages bss logic.
      Reported-by: default avatarJohn Reiser <jreiser@bitwagon.com>
      Signed-off-by: default avatarRoland McGrath <roland@redhat.com>
      Signed-off-by: default avatarJames Morris <jmorris@namei.org>
      9f0ab4a3
  4. 09 Sep, 2009 4 commits
    • Roland McGrath's avatar
      binfmt_elf: fix PT_INTERP bss handling · 752015d1
      Roland McGrath authored
      
      
      In fs/binfmt_elf.c, load_elf_interp() calls padzero() for .bss even if
      the PT_LOAD has no PROT_WRITE and no .bss.  This generates EFAULT.
      
      Here is a small test case.  (Yes, there are other, useful PT_INTERP
      which have only .text and no .data/.bss.)
      
      	----- ptinterp.S
      	_start: .globl _start
      		 nop
      		 int3
      	-----
      	$ gcc -m32 -nostartfiles -nostdlib -o ptinterp ptinterp.S
      	$ gcc -m32 -Wl,--dynamic-linker=ptinterp -o hello hello.c
      	$ ./hello
      	Segmentation fault  # during execve() itself
      
      	After applying the patch:
      	$ ./hello
      	Trace trap  # user-mode execution after execve() finishes
      
      If the ELF headers are actually self-inconsistent, then dying is fine.
      But having no PROT_WRITE segment is perfectly normal and correct if
      there is no segment with p_memsz > p_filesz (i.e. bss).  John Reiser
      suggested checking for PROT_WRITE in the bss logic.  I think it makes
      most sense to simply apply the bss logic only when there is bss.
      
      This patch looks less trivial than it is due to some reindentation.
      It just moves the "if (last_bss > elf_bss) {" test up to include the
      partial-page bss logic as well as the more-pages bss logic.
      Reported-by: default avatarJohn Reiser <jreiser@bitwagon.com>
      Signed-off-by: default avatarRoland McGrath <roland@redhat.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      752015d1
    • David P. Quigley's avatar
      sysfs: Add labeling support for sysfs · ddd29ec6
      David P. Quigley authored
      
      
      This patch adds a setxattr handler to the file, directory, and symlink
      inode_operations structures for sysfs. The patch uses hooks introduced in the
      previous patch to handle the getting and setting of security information for
      the sysfs inodes. As was suggested by Eric Biederman the struct iattr in the
      sysfs_dirent structure has been replaced by a structure which contains the
      iattr, secdata and secdata length to allow the changes to persist in the event
      that the inode representing the sysfs_dirent is evicted. Because sysfs only
      stores this information when a change is made all the optional data is moved
      into one dynamically allocated field.
      
      This patch addresses an issue where SELinux was denying virtd access to the PCI
      configuration entries in sysfs. The lack of setxattr handlers for sysfs
      required that a single label be assigned to all entries in sysfs. Granting virtd
      access to every entry in sysfs is not an acceptable solution so fine grained
      labeling of sysfs is required such that individual entries can be labeled
      appropriately.
      
      [sds:  Fixed compile-time warnings, coding style, and setting of inode security init flags.]
      Signed-off-by: default avatarDavid P. Quigley <dpquigl@tycho.nsa.gov>
      Signed-off-by: default avatarStephen D. Smalley <sds@tycho.nsa.gov>
      Signed-off-by: default avatarJames Morris <jmorris@namei.org>
      ddd29ec6
    • David P. Quigley's avatar
      VFS: Factor out part of vfs_setxattr so it can be called from the SELinux hook for inode_setsecctx. · b1ab7e4b
      David P. Quigley authored
      
      
      This factors out the part of the vfs_setxattr function that performs the
      setting of the xattr and its notification. This is needed so the SELinux
      implementation of inode_setsecctx can handle the setting of the xattr while
      maintaining the proper separation of layers.
      Signed-off-by: default avatarDavid P. Quigley <dpquigl@tycho.nsa.gov>
      Acked-by: default avatarSerge Hallyn <serue@us.ibm.com>
      Signed-off-by: default avatarJames Morris <jmorris@namei.org>
      b1ab7e4b
    • Steven Whitehouse's avatar
      GFS2: Remove unused sysfs file · 2b88f7c5
      Steven Whitehouse authored
      
      
      The /sys/fs/gfs2/<fsname>/lock_module/id file has been unused for
      some time now, so we can remove it. We still accept the mount option
      though, as userspace still sends that.
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
      2b88f7c5
  5. 08 Sep, 2009 3 commits