1. 07 Aug, 2016 1 commit
  2. 21 Jul, 2016 1 commit
    • GFS2: Fix gfs2_replay_incr_blk for multiple journal sizes · e1cb6be9
      Bob Peterson authored
      Before this patch, if you used gfs2_jadd to add new journals of a
      size smaller than the existing journals, replaying those new journals
      would withdraw. That's because function gfs2_replay_incr_blk was
      using the number of journal blocks (jd_blocks) from the superblock's
      journal pointer: in other words, "my journal's max size" rather than
      "the size of the journal we're replaying." This patch changes the
      function to use the size of the journal being replayed rather than
      the size of the journal we happen to be using.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
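
The wrap-around described in this commit can be sketched in user space as follows. This is a minimal model, not the kernel code: `struct jdesc_model` and `replay_incr_blk_model` are hypothetical stand-ins, and the only field modelled is the per-journal block count.

```c
/* Hypothetical, simplified stand-in for a journal descriptor. */
struct jdesc_model {
    unsigned int jd_blocks; /* number of blocks in this journal */
};

/*
 * Advance *blk by one block, wrapping to 0 at the end of the journal
 * that is being replayed (jd), not the journal the node itself uses.
 */
static void replay_incr_blk_model(const struct jdesc_model *jd,
                                  unsigned int *blk)
{
    if (++*blk == jd->jd_blocks)
        *blk = 0;
}
```

With a 4-block journal, incrementing block 3 wraps to 0. If the limit were taken from a larger 8-block journal instead, the same increment would yield block 4, which lies past the end of the smaller journal.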
  3. 20 Jul, 2016 1 commit
  4. 12 Jul, 2016 1 commit
    • GFS2: Check rs_free with rd_rsspin protection · 44f52122
      Bob Peterson authored
      For the last process to close a file opened for write, function
      gfs2_rsqa_delete was deleting the file's inode's block reservation
      out of the rgrp reservations tree. Then it was checking to make sure
      rs_free was 0, but it was performing the check outside the protection
      of the rd_rsspin spin_lock. That protection is needed to prevent a
      race between the process freeing the reservation and another process
      that is allocating a new set of blocks inside the same rgrp for the
      same inode, thus changing the value of rs_free.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
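
The idea of checking rs_free inside the same critical section that removes the reservation can be sketched in user space. This is a minimal model under stated assumptions: the spinlock is a C11 `atomic_flag`, and `struct rgrp_model`, `struct blkreserv_model` and `rs_deltree_model` are hypothetical names, not the kernel's structures or functions.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the resource group and a block reservation. */
struct rgrp_model {
    atomic_flag rd_rsspin;     /* models the rd_rsspin spinlock */
    unsigned int rd_reserved;  /* blocks currently reserved in this rgrp */
};

struct blkreserv_model {
    struct rgrp_model *rgd;
    unsigned int rs_free;      /* blocks still held by this reservation */
    bool in_tree;              /* whether it is in the reservations tree */
};

static void spin_lock_model(atomic_flag *lock)
{
    while (atomic_flag_test_and_set_explicit(lock, memory_order_acquire))
        ; /* busy-wait until the flag is released */
}

static void spin_unlock_model(atomic_flag *lock)
{
    atomic_flag_clear_explicit(lock, memory_order_release);
}

/*
 * Remove the reservation and verify rs_free while still holding the lock,
 * so that no concurrent allocation can change rs_free between the two steps.
 * Returns true if rs_free was 0 at the time of the check.
 */
static bool rs_deltree_model(struct blkreserv_model *rs)
{
    bool ok;

    spin_lock_model(&rs->rgd->rd_rsspin);
    rs->rgd->rd_reserved -= rs->rs_free; /* give the blocks back */
    rs->rs_free = 0;
    rs->in_tree = false;
    ok = (rs->rs_free == 0);             /* checked under the lock */
    spin_unlock_model(&rs->rgd->rd_rsspin);
    return ok;
}
```

The point of the sketch is only the ordering: the check happens before the unlock, not after it.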
  5. 05 Jul, 2016 1 commit
  6. 27 Jun, 2016 5 commits
    • gfs2: writeout truncated pages · fd4c5748
      Benjamin Marzinski authored
      When gfs2 attempts to write a page to a file that is being truncated,
      and notices that the page is completely outside of the file size, it
      tries to invalidate it.  However, this may require a transaction for
      journaled data files to revoke any buffers from the page on the active
      items list. Unfortunately, this can happen inside a log flush, where a
      transaction cannot be started. Also, gfs2 may need to be able to remove
      the buffer from the ail1 list before it can finish the log flush.
      
      To deal with this, when writing a page of a file with data journalling
      enabled, gfs2 now skips the check to see if the write is outside the
      file size, and simply writes it anyway. This situation can only occur
      when the truncate code still has the file locked exclusively, and hasn't
      marked this block as free in the metadata (which happens later in
      trunc_dealloc).  After gfs2 writes this page out, the truncation code
      will shortly invalidate it and write out any revokes if necessary.
      
      To do this, gfs2 now implements its own version of block_write_full_page
      without the check, and calls the newly exported __block_write_full_page.
      It also no longer calls gfs2_writepage_common from gfs2_jdata_writepage.
      Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
    • gfs2: Lock holder cleanup · 6df9f9a2
      Andreas Gruenbacher authored
      Make the code more readable by cleaning up the different ways of
      initializing lock holders and checking for initialized lock holders:
      mark lock holders as uninitialized by setting the holder's glock to NULL
      (gfs2_holder_mark_uninitialized) instead of zeroing out the entire
      object or using a separate flag.  Recognize initialized holders by their
      non-NULL glock (gfs2_holder_initialized).  Don't zero out holder objects
      which are immediately initialized via gfs2_holder_init or
      gfs2_glock_nq_init.
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
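
The NULL-glock convention described in this commit can be illustrated with a small user-space model. The struct layouts and the `_model` helper names are assumptions made for illustration; only the idea (a holder is uninitialized if and only if its glock pointer is NULL) comes from the commit message.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for a glock and a lock holder. */
struct glock_model {
    int unused;
};

struct holder_model {
    struct glock_model *gh_gl; /* NULL means "not initialized" */
    unsigned int gh_flags;
};

/* Mark a holder as uninitialized without zeroing the whole object. */
static void holder_mark_uninitialized_model(struct holder_model *gh)
{
    gh->gh_gl = NULL;
}

/* A holder counts as initialized once it points at a glock. */
static bool holder_initialized_model(const struct holder_model *gh)
{
    return gh->gh_gl != NULL;
}

/* Initialize a holder; no prior memset is needed. */
static void holder_init_model(struct glock_model *gl, unsigned int flags,
                              struct holder_model *gh)
{
    gh->gh_gl = gl;
    gh->gh_flags = flags;
}
```

A single pointer doubles as the state flag, so error paths can test the holder directly before releasing it.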
    • gfs2: Large-filesystem fix for 32-bit systems · cda9dd42
      Andreas Gruenbacher authored
      Commit ff34245d switched from iget5_locked to iget_locked among other
      things, but iget_locked doesn't work for filesystems larger than 2^32
      blocks on 32-bit systems.  Switch back to iget5_locked.  Filesystems
      larger than 2^32 blocks are unlikely to work well on 32-bit systems,
      so this is mostly a code cleanliness fix.
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
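
The 32-bit limitation can be illustrated with fixed-width integers. This is a sketch under the assumption that the inode number passed to iget_locked is an `unsigned long` derived from the 64-bit block address; the function names below are hypothetical and only model the arithmetic.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Model of deriving an inode number on a system where 'unsigned long'
 * is 32 bits wide: the upper 32 bits of the block address are lost.
 */
static uint32_t ino_from_block_32bit(uint64_t no_addr)
{
    return (uint32_t)no_addr;
}

/*
 * Model of an iget5_locked-style test callback: the comparison uses the
 * full 64-bit block address, so truncated hash values may collide safely.
 */
static bool same_inode_by_addr(uint64_t cached_no_addr,
                               uint64_t wanted_no_addr)
{
    return cached_no_addr == wanted_no_addr;
}
```

Two blocks that differ only above bit 31 map to the same 32-bit number, so a lookup keyed on that number alone cannot tell them apart, while a comparison on the full address can.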
    • gfs2: Get rid of gfs2_ilookup · ec5ec66b
      Andreas Gruenbacher authored
      Now that gfs2_lookup_by_inum only takes the inode glock for new inodes
      (and not for cached inodes anymore), there no longer is a need to
      optimize the cached-inode case in gfs2_get_dentry or delete_work_func,
      and gfs2_ilookup can be removed.
      
      In addition, gfs2_get_dentry wasn't checking the GFS2_DIF_SYSTEM flag in
      i_diskflags in the gfs2_ilookup case (see gfs2_lookup_by_inum); this
      inconsistency goes away as well.
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
    • gfs2: Fix gfs2_lookup_by_inum lock inversion · 3ce37b2c
      Andreas Gruenbacher authored
      The current gfs2_lookup_by_inum takes the glock of a presumed inode
      identified by block number, verifies that the block is indeed an inode,
      and then instantiates and reads the new inode via gfs2_inode_lookup.
      
      However, instantiating a new inode may block on freeing a previous
      instance of that inode (__wait_on_freeing_inode), and freeing an inode
      requires taking the glock that is already held, leading to lock
      inversion and deadlock.
      
      Fix this by first instantiating the new inode, then verifying that the
      block is an inode (if required), and then reading in the new inode, all
      in gfs2_inode_lookup.
      
      If the block we are looking for is not an inode, we discard the new
      inode via iget_failed, which marks inodes as bad and unhashes them.
      Other tasks waiting on that inode will get a bad inode back from
      ilookup or iget_locked; in that case, retry the lookup.
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
  7. 17 Jun, 2016 1 commit
  8. 10 Jun, 2016 1 commit
    • GFS2: don't set rgrp gl_object until it's inserted into rgrp tree · 36e4ad03
      Bob Peterson authored
      Before this patch, function read_rindex_entry would set a rgrp
      glock's gl_object pointer to itself before inserting the rgrp into
      the rgrp rbtree. The problem is: if another process was also reading
      the rgrp in, and had already inserted its newly created rgrp, then
      the second call to read_rindex_entry would overwrite that value,
      then return a bad return code to the caller. Later, other functions
      would reference the now-freed rgrp memory by way of gl_object.
      In some cases, that could result in gfs2_rgrp_brelse being called
      twice for the same rgrp: once for the failed attempt and once for
      the "real" rgrp release. Eventually the kernel would panic.
      There are also a number of other things that could go wrong when
      a kernel module is accessing freed storage. For example, this could
      result in rgrp corruption because the fake rgrp would point to a
      fake bitmap in memory too, causing gfs2_inplace_reserve to search
      some random memory for free blocks, and find some, since we were
      never setting rgd->rd_bits to NULL before freeing it.
      
      This patch fixes the problem by not setting gl_object until we
      have successfully inserted the rgrp into the rbtree. Also, it sets
      rd_bits to NULL as it frees them, which will ensure any accidental
      access to the wrong rgrp will result in a kernel panic rather than
      file system corruption, which is preferred.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
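
The "publish only after a successful insert" ordering from this commit can be sketched in user space. This is a simplified model, not the kernel function: a linked list stands in for the rgrp rbtree, and every `_model` name and struct layout is an assumption made for illustration.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

struct rgrp_model;

/* Hypothetical stand-in for a glock with a back-pointer to its object. */
struct glock_model {
    struct rgrp_model *gl_object;
};

struct rgrp_model {
    unsigned long rd_addr;   /* key: address of the resource group */
    int *rd_bits;            /* stand-in for the bitmap descriptors */
    struct rgrp_model *next;
};

static struct rgrp_model *rgrp_list; /* stand-in for the rgrp rbtree */

/* Insert rgd unless an entry with the same address already exists. */
static int rgd_insert_model(struct rgrp_model *rgd)
{
    for (struct rgrp_model *p = rgrp_list; p; p = p->next)
        if (p->rd_addr == rgd->rd_addr)
            return -EEXIST;
    rgd->next = rgrp_list;
    rgrp_list = rgd;
    return 0;
}

static int read_rindex_entry_model(unsigned long addr,
                                   struct glock_model *gl)
{
    struct rgrp_model *rgd = calloc(1, sizeof(*rgd));
    int error;

    if (!rgd)
        return -ENOMEM;
    rgd->rd_addr = addr;
    rgd->rd_bits = calloc(4, sizeof(*rgd->rd_bits));
    if (!rgd->rd_bits) {
        free(rgd);
        return -ENOMEM;
    }

    error = rgd_insert_model(rgd);
    if (!error) {
        gl->gl_object = rgd; /* publish only after a successful insert */
        return 0;
    }

    /* Lost the race: gl_object still points at the winner's rgrp. */
    free(rgd->rd_bits);
    rgd->rd_bits = NULL;     /* mirror the patch: clear the pointer on free */
    free(rgd);
    return error;
}
```

When a second caller loses the race, the glock's back-pointer is left untouched, so nothing ends up referring to the freed duplicate.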
  9. 07 Jun, 2016 4 commits
  10. 27 May, 2016 2 commits
    • remove lots of IS_ERR_VALUE abuses · 287980e4
      Arnd Bergmann authored
      Most users of IS_ERR_VALUE() in the kernel are wrong, as they
      pass an 'int' into a function that takes an 'unsigned long'
      argument. This happens to work because the type is sign-extended
      on 64-bit architectures before it gets converted into an
      unsigned type.
      
      However, anything that passes an 'unsigned short' or 'unsigned int'
      argument into IS_ERR_VALUE() is guaranteed to be broken, as are
      8-bit integers and types that are wider than 'unsigned long'.
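
The difference between sign extension and zero extension can be shown with a small user-space model. It assumes a 64-bit `unsigned long`, represented here as `uint64_t` so the behaviour does not depend on the host; `is_err_value_model` is a hypothetical function, not the kernel macro.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/*
 * Model of an IS_ERR_VALUE()-style check whose parameter is a 64-bit
 * 'unsigned long': true for the top MAX_ERRNO values of the range.
 */
static bool is_err_value_model(uint64_t x)
{
    return x >= (uint64_t)-MAX_ERRNO;
}
```

Passing an `int` holding -22 converts to a value near the top of the 64-bit range, so the check succeeds. Passing an `unsigned int` holding the same bit pattern is zero-extended to 0x00000000ffffffea, which is far below the threshold, so the error goes undetected.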
      
      Andrzej Hajda has already fixed a lot of the worst abusers that
      were causing actual bugs, but it would be nice to prevent any
      users that are not passing 'unsigned long' arguments.
      
      This patch changes all users of IS_ERR_VALUE() that I could find
      on 32-bit ARM randconfig builds and x86 allmodconfig. For the
      moment, this doesn't change the definition of IS_ERR_VALUE()
      because there are probably still architecture specific users
      elsewhere.
      
      Almost all the warnings I got are for files that are better off
      using 'if (err)' or 'if (err < 0)'.
      The only legitimate user I could find that we get a warning for
      is the (32-bit only) freescale fman driver, so I did not remove
      the IS_ERR_VALUE() there but changed the type to 'unsigned long'.
      For 9pfs, I just worked around one user whose calling conventions
      are so obscure that I did not dare change the behavior.
      
      I was using this definition for testing:
      
       #define IS_ERR_VALUE(x) ((unsigned long*)NULL == (typeof (x)*)NULL && \
             unlikely((unsigned long long)(x) >= (unsigned long long)(typeof(x))-MAX_ERRNO))
      
      which ends up making all 16-bit or wider types work correctly with
      the most plausible interpretation of what IS_ERR_VALUE() was supposed
      to return according to its users, but also causes a compile-time
      warning for any users that do not pass an 'unsigned long' argument.
      
      I suggested this approach earlier this year, but back then we ended
      up deciding to just fix the users that are obviously broken. After
      the initial warning that caused me to get involved in the discussion
      (fs/gfs2/dir.c) showed up again in the mainline kernel, Linus
      asked me to send the whole thing again.
      
      [ Updated the 9p parts as per Al Viro  - Linus ]
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Cc: Andrzej Hajda <a.hajda@samsung.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Link: https://lkml.org/lkml/2016/1/7/363
      Link: https://lkml.org/lkml/2016/5/27/486
      Acked-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> # For nvmem part
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • switch xattr_handler->set() to passing dentry and inode separately · 59301226
      Al Viro authored
      Preparation for a similar switch in ->setxattr() (see the next commit
      for rationale).
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  11. 12 May, 2016 2 commits
  12. 06 May, 2016 1 commit
  13. 02 May, 2016 3 commits
  14. 01 May, 2016 2 commits
  15. 19 Apr, 2016 1 commit
  16. 14 Apr, 2016 1 commit
  17. 12 Apr, 2016 1 commit
  18. 10 Apr, 2016 3 commits
  19. 05 Apr, 2016 3 commits
  20. 04 Apr, 2016 1 commit
    • mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros · 09cbfeaf
      Kirill A. Shutemov authored
      PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced a *long* time
      ago with the promise that one day it would be possible to implement the
      page cache with bigger chunks than PAGE_SIZE.

      This promise never materialized, and it is unlikely it ever will.

      We have many places where PAGE_CACHE_SIZE is assumed to be equal to
      PAGE_SIZE.  And it's a constant source of confusion whether a
      PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
      especially on the border between fs and mm.

      Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause too much
      breakage to be doable.
      
      Let's stop pretending that pages in page cache are special.  They are
      not.
      
      The changes are pretty straightforward:
      
       - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
      
       - page_cache_get() -> get_page();
      
       - page_cache_release() -> put_page();
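
The first two rewrite rules rely on the two shifts being equal, which makes the shift amount zero. A minimal sketch, using `DEMO_`-prefixed stand-ins for the kernel macros and an assumed 4 KiB page size:

```c
/* Stand-ins for the kernel macros; 4 KiB pages are assumed here. */
#define DEMO_PAGE_SHIFT        12
#define DEMO_PAGE_SIZE         (1UL << DEMO_PAGE_SHIFT)
#define DEMO_PAGE_CACHE_SHIFT  DEMO_PAGE_SHIFT
#define DEMO_PAGE_CACHE_SIZE   (1UL << DEMO_PAGE_CACHE_SHIFT)

/* Old style: convert a page-cache index to a page index by shifting. */
static unsigned long old_style_index(unsigned long index)
{
    return index << (DEMO_PAGE_CACHE_SHIFT - DEMO_PAGE_SHIFT);
}

/* New style: the shift amount is always 0, so the expression is dropped. */
static unsigned long new_style_index(unsigned long index)
{
    return index;
}
```

Because both functions return the same value for every input, replacing the shifted expression with the bare operand does not change behaviour.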
      
      This patch contains automated changes generated with coccinelle using
      the script below.  For some reason, coccinelle doesn't patch header
      files.  I've called spatch for them manually.

      The only adjustment after coccinelle is a revert of the changes to the
      PAGE_CACHE_ALIGN definition: we are going to drop it later.

      There are a few places in the code that coccinelle didn't reach.  I'll
      fix them manually in a separate patch.  Comments and documentation will
      also be addressed in a separate patch.
      
      virtual patch
      
      @@
      expression E;
      @@
      - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      expression E;
      @@
      - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      @@
      - PAGE_CACHE_SHIFT
      + PAGE_SHIFT
      
      @@
      @@
      - PAGE_CACHE_SIZE
      + PAGE_SIZE
      
      @@
      @@
      - PAGE_CACHE_MASK
      + PAGE_MASK
      
      @@
      expression E;
      @@
      - PAGE_CACHE_ALIGN(E)
      + PAGE_ALIGN(E)
      
      @@
      expression E;
      @@
      - page_cache_get(E)
      + get_page(E)
      
      @@
      expression E;
      @@
      - page_cache_release(E)
      + put_page(E)
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  21. 24 Mar, 2016 1 commit
  22. 15 Mar, 2016 3 commits