• Nick Piggin's avatar
    mm lockless pagecache barrier fix · e8c82c2e
    Nick Piggin authored
    An XFS workload showed up a bug in the lockless pagecache patch. Basically it
    would go into an "infinite" loop, although it would sometimes be able to break
    out of the loop! The reason is a missing compiler barrier in the "increment
    reference count unless it was zero" case of the lockless pagecache protocol in
    the gang lookup functions.
    
    This would cause the compiler to use a cached value of struct page pointer to
    retry the operation with, rather than reload it. So the page might have been
    removed from pagecache and freed (refcount==0) but the lookup would not correctly
    notice the page is no longer in pagecache, and keep attempting to increment the
    refcount and failing, until the page gets reallocated for something else. This
    isn't a data corruption because the condition will be detected if the page has
    been reallocated. However it can result in a lockup.
    
    Linus points out that ACCESS_ONCE is also required in that pointer load, even
    if it's absence is not causing a bug on our particular build. The most general
    way to solve this is just to put an rcu_dereference in radix_tree_deref_slot.
    
    Assembly of find_get_pages,
    before:
    .L220:
            movq    (%rbx), %rax    #* ivtmp.1162, tmp82
            movq    (%rax), %rdi    #, prephitmp.1149
    .L218:
            testb   $1, %dil        #, prephitmp.1149
            jne     .L217   #,
            testq   %rdi, %rdi      # prephitmp.1149
            je      .L203   #,
            cmpq    $-1, %rdi       #, prephitmp.1149
            je      .L217   #,
            movl    8(%rdi), %esi   # <variable>._count.counter, c
            testl   %esi, %esi      # c
            je      .L218   #,
    
    after:
    .L212:
            movq    (%rbx), %rax    #* ivtmp.1109, tmp81
            movq    (%rax), %rdi    #, ret
            testb   $1, %dil        #, ret
            jne     .L211   #,
            testq   %rdi, %rdi      # ret
            je      .L197   #,
            cmpq    $-1, %rdi       #, ret
            je      .L211   #,
            movl    8(%rdi), %esi   # <variable>._count.counter, c
            testl   %esi, %esi      # c
            je      .L212   #,
    
    (notice the obvious infinite loop in the first example, if page->count remains 0)
    Signed-off-by: default avatarNick Piggin <npiggin@suse.de>
    Reviewed-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
    Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
    e8c82c2e
radix-tree.h 6.74 KB