    staging: lustre: osc: Track and limit "unstable" pages · ac5b1481
    Prakash Surya authored
    
    This change adds a global counter to track the number of "unstable"
    pages held by a given client, along with per-file-system counters. An
    "unstable" page is defined as a page which has been sent to the server
    as part of a bulk request, but has not yet been committed to stable
    storage.
    
    In addition to simply tracking the unstable pages, they now also count
    towards the maximum number of "pinned" pages on the system at any given
    time. Thus, a client is now bounded in the number of dirty and
    unstable pages it can pin in memory. Previously, only dirty pages were
    accounted for in this limit.
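
The accounting described above can be pictured with a small userspace
sketch. It is not the Lustre implementation: every name in it
(fs_counters, pin_dirty_page, page_sent, page_committed, max_pinned) is
hypothetical, and C11 atomics stand in for whatever primitives the
kernel code actually uses. It only illustrates the idea that a page stays
counted against the pinned limit from the moment it is dirtied until the
server commits it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical names throughout; this is not the Lustre code. */
struct fs_counters {
	atomic_long dirty;    /* dirty pages held for this file system */
	atomic_long unstable; /* sent to the server, not yet committed */
};

static atomic_long global_unstable; /* client-wide unstable page count */
static atomic_long global_pinned;   /* client-wide dirty + unstable pages */
static long max_pinned = 1024;      /* assumed limit, in pages */

/* Try to pin one more dirty page; fail if the limit is already reached. */
static bool pin_dirty_page(struct fs_counters *fs)
{
	long old = atomic_load(&global_pinned);

	do {
		if (old >= max_pinned)
			return false;
	} while (!atomic_compare_exchange_weak(&global_pinned, &old, old + 1));

	atomic_fetch_add(&fs->dirty, 1);
	return true;
}

/* Page sent in a bulk request: dirty becomes unstable, still pinned. */
static void page_sent(struct fs_counters *fs)
{
	long was = atomic_fetch_sub(&fs->dirty, 1);

	assert(was > 0);
	atomic_fetch_add(&fs->unstable, 1);
	atomic_fetch_add(&global_unstable, 1);
}

/* Server committed the page: it no longer counts against the limit. */
static void page_committed(struct fs_counters *fs)
{
	long was = atomic_fetch_sub(&fs->unstable, 1);

	assert(was > 0);
	atomic_fetch_sub(&global_unstable, 1);
	atomic_fetch_sub(&global_pinned, 1);
}
```

In this sketch, sending a page does not free any room under max_pinned;
only the commit does, which is the behavioral difference from counting
dirty pages alone.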
    
    The NR_UNSTABLE_NFS zone page-state counter is also incremented and
    decremented alongside the Lustre counters, so unstable pages can be
    monitored through the "NFS_Unstable:" field in /proc/meminfo. The
    kernel also uses this counter internally to limit the total number of
    unstable pages on the system.
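
For monitoring from a program rather than by eye, the field can be
pulled out of /proc/meminfo with a few lines of C. The helper names
below (nfs_unstable_kb, read_nfs_unstable_kb) are hypothetical and the
sketch assumes the usual "Name:   value kB" line layout of that file; it
is a convenience illustration, not part of this change.

```c
#include <stdio.h>
#include <string.h>

/* Return the "NFS_Unstable:" value in kB from meminfo-style text, or -1. */
static long nfs_unstable_kb(const char *meminfo)
{
	static const char key[] = "NFS_Unstable:";
	const char *line = meminfo;

	while (line && *line) {
		if (strncmp(line, key, sizeof(key) - 1) == 0) {
			long kb;

			/* %ld skips the padding between the key and the number. */
			if (sscanf(line + sizeof(key) - 1, "%ld", &kb) == 1)
				return kb;
			return -1;
		}
		/* Advance to the start of the next line, if there is one. */
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return -1;
}

/* Read /proc/meminfo and report the field; -1 on any failure. */
static long read_nfs_unstable_kb(void)
{
	char buf[8192];
	FILE *f = fopen("/proc/meminfo", "r");
	size_t n;

	if (!f)
		return -1;
	n = fread(buf, 1, sizeof(buf) - 1, f);
	fclose(f);
	buf[n] = '\0';
	return nfs_unstable_kb(buf);
}
```

Keeping the parser separate from the file read makes it easy to check
against a captured copy of the file.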
    
    The motivation for this change is twofold. First, the client must not
    allow itself to disconnect from an OST while still holding unstable
    pages. Otherwise, an OST failure can lose those pages, and replay is
    impossible because the unmount has already disconnected the client.
    
    Second, the client needs a mechanism to keep it from devoting too much
    of its available RAM to unreclaimable pages pinned by the ptlrpc
    layer. If that happens, out-of-memory events can be triggered as a
    side effect, which must be avoided.
    
    The current number of unstable pages, tracked per file system, is
    exported by the unstable_stats proc file under each file system's
    llite namespace. An example of retrieving this information is below:
    
    	$ lctl get_param llite.*.unstable_stats
    
    Signed-off-by: Prakash Surya <surya1@llnl.gov>
    Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-2139
    Reviewed-on: http://review.whamcloud.com/6284
    Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
    Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
    Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
    Signed-off-by: James Simmons <jsimmons@infradead.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>