    cfq: fix starvation of asynchronous writes · 9362516c
    Glauber Costa authored
    commit 3932a86b4b9d1f0b049d64d4591ce58ad18b44ec upstream.
    
    While debugging timeouts in my application workload (ScyllaDB), I observed
    calls to open() taking a long time, anywhere from 2 seconds (already enough
    to time out my application) to more than 30 seconds.
    
    The problem seems to happen because XFS may block on pending metadata updates
    under certain circumstances, as confirmed by the following backtrace taken
    with the offcputime tool (iovisor/bcc):
    
        ffffffffb90c57b1 finish_task_switch
        ffffffffb97dffb5 schedule
        ffffffffb97e310c schedule_timeout
        ffffffffb97e1f12 __down
        ffffffffb90ea821 down
        ffffffffc046a9dc xfs_buf_lock
        ffffffffc046abfb _xfs_buf_find
        ffffffffc046ae4a xfs_buf_get_map
        ffffffffc046babd xfs_buf_read_map
        ffffffffc0499931 xfs_trans_read_buf_map
        ffffffffc044a561 xfs_da_read_buf
        ffffffffc0451390 xfs_dir3_leaf_read.constprop.16
        ffffffffc0452b90 xfs_di...