Skip to content
  • Hugh Dickins's avatar
    shmem: fix splicing from a hole while it's punched · b1a36650
    Hugh Dickins authored
    shmem_fault() is the actual culprit in trinity's hole-punch starvation,
    and the most significant cause of such problems: since a page faulted is
    one that then appears page_mapped(), needing unmap_mapping_range() and
    i_mmap_mutex to be unmapped again.
    
    But it is not the only way in which a page can be brought into a hole in
    the radix_tree while that hole is being punched; and Vlastimil's testing
    implies that if enough other processors are busy filling in the hole,
    then shmem_undo_range() can be kept from completing indefinitely.
    
    shmem_file_splice_read() is the main other user of SGP_CACHE, which can
    instantiate shmem pagecache pages in the read-only case (without holding
    i_mutex, so perhaps concurrently with a hole-punch).  Probably it's
    silly not to use SGP_READ already (using the ZERO_PAGE for holes): which
    ought to be safe, but might bring surprises - not a change to be rushed.
    
    shmem_read_mapping_page_gfp() is an internal interface used by
    drivers/gpu/drm GEM (and next by uprobes): it should be okay.  And
    shmem_file_read_iter() uses the SGP_DIRTY variant of SGP_CACHE, when
    called internally by the kernel (perhaps for a stacking filesystem,
    which might rely on holes to be reserved): it's unclear whether it could
    be provoked to keep hole-punch busy or not.
    
    We could apply the same umbrella as now used in shmem_fault() to
    shmem_file_splice_read() and the others; but it looks ugly, and use over
    a range raises questions - should it actually be per page? can these get
    starved themselves?
    
    The origin of this part of the problem is my v3.1 commit d0823576
    
    
    ("mm: pincer in truncate_inode_pages_range"), once it was duplicated
    into shmem.c.  It seemed like a nice idea at the time, to ensure
    (barring RCU lookup fuzziness) that there's an instant when the entire
    hole is empty; but the indefinitely repeated scans to ensure that make
    it vulnerable.
    
    Revert that "enhancement" to hole-punch from shmem_undo_range(), but
    retain the unproblematic rescanning when it's truncating; add a couple
    of comments there.
    
    Remove the "indices[0] >= end" test: that is now handled satisfactorily
    by the inner loop, and mem_cgroup_uncharge_start()/end() are too light
    to be worth avoiding here.
    
    But if we do not always loop indefinitely, we do need to handle the case
    of swap swizzled back to page before shmem_free_swap() gets it: add a
    retry for that case, as suggested by Konstantin Khlebnikov; and for the
    case of page swizzled back to swap, as suggested by Johannes Weiner.
    
    Signed-off-by: default avatarHugh Dickins <hughd@google.com>
    Reported-by: default avatarSasha Levin <sasha.levin@oracle.com>
    Suggested-by: default avatarVlastimil Babka <vbabka@suse.cz>
    Cc: Konstantin Khlebnikov <koct9i@gmail.com>
    Cc: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Lukas Czerner <lczerner@redhat.com>
    Cc: Dave Jones <davej@redhat.com>
    Cc: <stable@vger.kernel.org>	[3.1+]
    Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
    Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
    b1a36650