1. 23 Apr, 2016 1 commit
    • Theodore Ts'o's avatar
      ext4: allow readdir()'s of large empty directories to be interrupted · 1f60fbe7
      Theodore Ts'o authored
      If a directory has a large number of empty blocks, iterating over all
      of them can take a long time, leading to scheduler warnings and users
      getting irritated when they can't kill a process in the middle of one
      of these long-running readdir operations.  Fix this by adding checks to
      ext4_readdir() and ext4_htree_fill_tree().
      
      This was reverted earlier due to a typo in the original commit where I
      experimented with using signal_pending() instead of
      fatal_signal_pending().  The test was in the wrong place if we were
      going to return signal_pending() since we would end up returning
      duplicant entries.  See 9f2394c9
      
       for a more detailed explanation.
      
      Added fix as suggested by Linus to check for signal_pending() in
      in the filldir() functions.
      Reported-by: default avatarBenjamin LaHaise <bcrl@kvack.org>
      Google-Bug-Id: 27880676
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      1f60fbe7
  2. 10 Apr, 2016 1 commit
    • Linus Torvalds's avatar
      Revert "ext4: allow readdir()'s of large empty directories to be interrupted" · 9f2394c9
      Linus Torvalds authored
      This reverts commit 1028b55b
      
      .
      
      It's broken: it makes ext4 return an error at an invalid point, causing
      the readdir wrappers to write the the position of the last successful
      directory entry into the position field, which means that the next
      readdir will now return that last successful entry _again_.
      
      You can only return fatal errors (that terminate the readdir directory
      walk) from within the filesystem readdir functions, the "normal" errors
      (that happen when the readdir buffer fills up, for example) happen in
      the iterorator where we know the position of the actual failing entry.
      
      I do have a very different patch that does the "signal_pending()"
      handling inside the iterator function where it is allowable, but while
      that one passes all the sanity checks, I screwed up something like four
      times while emailing it out, so I'm not going to commit it today.
      
      So my track record is not good enough, and the stars will have to align
      better before that one gets committed.  And it would be good to get some
      review too, of course, since celestial alignments are always an iffy
      debugging model.
      
      IOW, let's just revert the commit that caused the problem for now.
      Reported-by: default avatarGreg Thelen <gthelen@google.com>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      9f2394c9
  3. 04 Apr, 2016 1 commit
    • Kirill A. Shutemov's avatar
      mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros · 09cbfeaf
      Kirill A. Shutemov authored
      
      
      PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
      ago with promise that one day it will be possible to implement page
      cache with bigger chunks than PAGE_SIZE.
      
      This promise never materialized.  And unlikely will.
      
      We have many places where PAGE_CACHE_SIZE assumed to be equal to
      PAGE_SIZE.  And it's constant source of confusion on whether
      PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
      especially on the border between fs and mm.
      
      Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
      breakage to be doable.
      
      Let's stop pretending that pages in page cache are special.  They are
      not.
      
      The changes are pretty straight-forward:
      
       - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
      
       - page_cache_get() -> get_page();
      
       - page_cache_release() -> put_page();
      
      This patch contains automated changes generated with coccinelle using
      script below.  For some reason, coccinelle doesn't patch header files.
      I've called spatch for them manually.
      
      The only adjustment after coccinelle is revert of changes to
      PAGE_CAHCE_ALIGN definition: we are going to drop it later.
      
      There are few places in the code where coccinelle didn't reach.  I'll
      fix them manually in a separate patch.  Comments and documentation also
      will be addressed with the separate patch.
      
      virtual patch
      
      @@
      expression E;
      @@
      - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      expression E;
      @@
      - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      @@
      - PAGE_CACHE_SHIFT
      + PAGE_SHIFT
      
      @@
      @@
      - PAGE_CACHE_SIZE
      + PAGE_SIZE
      
      @@
      @@
      - PAGE_CACHE_MASK
      + PAGE_MASK
      
      @@
      expression E;
      @@
      - PAGE_CACHE_ALIGN(E)
      + PAGE_ALIGN(E)
      
      @@
      expression E;
      @@
      - page_cache_get(E)
      + get_page(E)
      
      @@
      expression E;
      @@
      - page_cache_release(E)
      + put_page(E)
      Signed-off-by: default avatarKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      09cbfeaf
  4. 30 Mar, 2016 1 commit
  5. 22 Mar, 2016 1 commit
  6. 15 Feb, 2016 1 commit
  7. 07 Feb, 2016 1 commit
  8. 17 Oct, 2015 2 commits
  9. 31 May, 2015 2 commits
  10. 18 May, 2015 2 commits
    • Theodore Ts'o's avatar
      ext4 crypto: reorganize how we store keys in the inode · b7236e21
      Theodore Ts'o authored
      
      
      This is a pretty massive patch which does a number of different things:
      
      1) The per-inode encryption information is now stored in an allocated
         data structure, ext4_crypt_info, instead of directly in the node.
         This reduces the size usage of an in-memory inode when it is not
         using encryption.
      
      2) We drop the ext4_fname_crypto_ctx entirely, and use the per-inode
         encryption structure instead.  This remove an unnecessary memory
         allocation and free for the fname_crypto_ctx as well as allowing us
         to reuse the ctfm in a directory for multiple lookups and file
         creations.
      
      3) We also cache the inode's policy information in the ext4_crypt_info
         structure so we don't have to continually read it out of the
         extended attributes.
      
      4) We now keep the keyring key in the inode's encryption structure
         instead of releasing it after we are done using it to derive the
         per-inode key.  This allows us to test to see if the key has been
         revoked; if it has, we prevent the use of the derived key and free
         it.
      
      5) When an inode is released (or when the derived key is freed), we
         will use memset_explicit() to zero out the derived key, so it's not
         left hanging around in memory.  This implies that when a user logs
         out, it is important to first revoke the key, and then unlink it,
         and then finally, to use "echo 3 > /proc/sys/vm/drop_caches" to
         release any decrypted pages and dcache entries from the system
         caches.
      
      6) All this, and we also shrink the number of lines of code by around
         100.  :-)
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      b7236e21
    • Theodore Ts'o's avatar
  11. 01 May, 2015 1 commit
  12. 11 Apr, 2015 2 commits
  13. 02 Apr, 2015 1 commit
  14. 29 Aug, 2014 1 commit
  15. 28 Jul, 2014 1 commit
  16. 27 May, 2014 1 commit
  17. 23 Jan, 2014 1 commit
  18. 28 Aug, 2013 1 commit
  19. 29 Jun, 2013 1 commit
  20. 19 Apr, 2013 1 commit
    • Tao Ma's avatar
      ext4: fix readdir error in the case of inline_data+dir_index · 8af0f082
      Tao Ma authored
      
      
      Zach reported a problem that if inline data is enabled, we don't
      tell the difference between the offset of '.' and '..'. And a
      getdents will fail if the user only want to get '.' and what's worse,
      if there is a conversion happens when the user calls getdents
      many times, he/she may get the same entry twice.
      
      In theory, a dir block would also fail if it is converted to a
      hashed-index based dir since f_pos will become a hash value, not the
      real one, but it doesn't happen.  And a deep investigation shows that
      we uses a hash based solution even for a normal dir if the dir_index
      feature is enabled.
      
      So this patch just adds a new htree_inlinedir_to_tree for inline dir,
      and if we find that the hash index is supported, we will do like what
      we do for a dir block.
      Reported-by: default avatarZach Brown <zab@redhat.com>
      Signed-off-by: default avatarTao Ma <boyu.mt@taobao.com>
      Signed-off-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      8af0f082
  21. 02 Mar, 2013 1 commit
  22. 22 Feb, 2013 1 commit
  23. 28 Jan, 2013 1 commit
  24. 17 Dec, 2012 1 commit
  25. 10 Dec, 2012 2 commits
  26. 22 Jul, 2012 1 commit
  27. 29 Apr, 2012 1 commit
  28. 19 Mar, 2012 1 commit
  29. 18 Mar, 2012 1 commit
  30. 20 Feb, 2012 1 commit
  31. 10 Jan, 2011 1 commit
  32. 19 Dec, 2010 1 commit
  33. 27 Oct, 2010 1 commit
    • Toshiyuki Okajima's avatar
      ext4: improve llseek error handling for overly large seek offsets · e0d10bfa
      Toshiyuki Okajima authored
      
      
      The llseek system call should return EINVAL if passed a seek offset
      which results in a write error.  What this maximum offset should be
      depends on whether or not the huge_file file system feature is set,
      and whether or not the file is extent based or not.
      
      
      If the file has no "EXT4_EXTENTS_FL" flag, the maximum size which can be 
      written (write systemcall) is different from the maximum size which can be 
      sought (lseek systemcall).
      
      For example, the following 2 cases demonstrates the differences
      between the maximum size which can be written, versus the seek offset
      allowed by the llseek system call:
      
      #1: mkfs.ext3 <dev>; mount -t ext4 <dev>
      #2: mkfs.ext3 <dev>; tune2fs -Oextent,huge_file <dev>; mount -t ext4 <dev>
      
      Table. the max file size which we can write or seek
             at each filesystem feature tuning and file flag setting
      +============+===============================+===============================+
      | \ File flag|                               |                               |
      |      \     |     !EXT4_EXTENTS_FL          |        EXT4_EXTETNS_FL        |
      |case       \|                               |                               |
      +------------+-------------------------------+-------------------------------+
      | #1         |   write:      2194719883264   | write:       --------------   |
      |            |   seek:       2199023251456   | seek:        --------------   |
      +------------+-------------------------------+-------------------------------+
      | #2         |   write:      4402345721856   | write:       17592186044415   |
      |            |   seek:      17592186044415   | seek:        17592186044415   |
      +------------+-------------------------------+-------------------------------+
      
      The differences exist because ext4 has 2 maxbytes which are sb->s_maxbytes
      (= extent-mapped maxbytes) and EXT4_SB(sb)->s_bitmap_maxbytes (= block-mapped 
      maxbytes).  Although generic_file_llseek uses only extent-mapped maxbytes.
      (llseek of ext4_file_operations is generic_file_llseek which uses
      sb->s_maxbytes.)
      
      Therefore we create ext4 llseek function which uses 2 maxbytes.
      
      The new own function originates from generic_file_llseek().
      If the file flag, "EXT4_EXTENTS_FL" is not set, the function alters 
      inode->i_sb->s_maxbytes into EXT4_SB(inode->i_sb)->s_bitmap_maxbytes.
      Signed-off-by: default avatarToshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
      Signed-off-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      e0d10bfa
  34. 27 Jul, 2010 2 commits