    cgroup: don't hold css_set_rwsem across css task iteration · ed27b9f7
    Tejun Heo authored
    
    
    css_sets are synchronized through css_set_rwsem but the locking scheme
    is rather bizarre.  The hot paths, fork and exit, have to write-lock
    the rwsem, which makes the rw part pointless; furthermore, many readers
    already hold cgroup_mutex.
    
    One of the readers is css task iteration.  It read-locks the rwsem
    over the entire duration of iteration, which leads to silly locking
    behavior.  When cpuset tries to migrate processes of a cgroup to a
    different NUMA node, css_set_rwsem is held across the entire migration
    attempt, which can take a long time, locking out forking, exiting and
    other cgroup operations.
    
    This patch updates css task iteration so that it locks css_set_rwsem
    only while the iterator is being advanced.  css task iteration
    involves two levels - css_set and task iteration.  As css_sets in use
    are practically immutable, simply pinning the current one is enough
    for resuming iteration afterwards.  Task iteration is tricky as tasks
    may leave their css_set while iteration is in progress.  This is
    solved by keeping track of active iterators and advancing them if
    their next task leaves its css_set.
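
The two-level scheme above has three parts. The lock is taken only while an iterator advances, the current position is kept between calls, and a departing task pushes forward any iterator whose next task it is. All three can be sketched as a small userspace model.

This is a hypothetical sketch, not the kernel code. The names `cset`, `task_iter`, `task_join()`, `task_leave()`, `iter_start()`, `iter_next()` and `iter_end()` are made up. A pthread mutex stands in for css_set_rwsem. Only a single set is modelled, and task refcounting is omitted.

```c
#include <pthread.h>
#include <stddef.h>

/* Minimal circular doubly linked list in the style of the kernel's list_head. */
struct list_head { struct list_head *next, *prev; };

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void list_init(struct list_head *h) { h->next = h->prev = h; }

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev; n->next = h; h->prev->next = n; h->prev = n;
}

static void list_del_init(struct list_head *n)
{
	n->prev->next = n->next; n->next->prev = n->prev; list_init(n);
}

/* Stand-in for css_set_rwsem, simplified to a plain mutex. */
static pthread_mutex_t set_lock = PTHREAD_MUTEX_INITIALIZER;

struct cset {
	struct list_head tasks;      /* member tasks */
	struct list_head task_iters; /* iterators currently walking this set */
};

struct task {
	struct cset *cset;
	struct list_head node;       /* link in cset->tasks */
};

struct task_iter {
	struct cset *cset;
	struct list_head *next;      /* next task to return; &cset->tasks at end */
	struct list_head iters_node; /* link in cset->task_iters */
};

static void cset_init(struct cset *cs) { list_init(&cs->tasks); list_init(&cs->task_iters); }

static void task_join(struct task *t, struct cset *cs)
{
	pthread_mutex_lock(&set_lock);
	t->cset = cs;
	list_add_tail(&t->node, &cs->tasks);
	pthread_mutex_unlock(&set_lock);
}

/* Before unlinking t, push forward every iterator whose next task is t. */
static void task_leave(struct task *t)
{
	struct list_head *head, *pos;

	pthread_mutex_lock(&set_lock);
	head = &t->cset->task_iters;
	for (pos = head->next; pos != head; pos = pos->next) {
		struct task_iter *it = container_of(pos, struct task_iter, iters_node);
		if (it->next == &t->node)
			it->next = t->node.next;
	}
	list_del_init(&t->node);
	t->cset = NULL;
	pthread_mutex_unlock(&set_lock);
}

/* Register the iterator on the set so task_leave() can find it. */
static void iter_start(struct task_iter *it, struct cset *cs)
{
	pthread_mutex_lock(&set_lock);
	it->cset = cs;
	it->next = cs->tasks.next;
	list_add_tail(&it->iters_node, &cs->task_iters);
	pthread_mutex_unlock(&set_lock);
}

/* The lock is held only while the iterator advances, not across the walk. */
static struct task *iter_next(struct task_iter *it)
{
	struct task *t = NULL;

	pthread_mutex_lock(&set_lock);
	if (it->next != &it->cset->tasks) {
		t = container_of(it->next, struct task, node);
		it->next = it->next->next;
	}
	pthread_mutex_unlock(&set_lock);
	return t; /* the real iterator pins the task; this model does not */
}

static void iter_end(struct task_iter *it)
{
	pthread_mutex_lock(&set_lock);
	list_del_init(&it->iters_node);
	pthread_mutex_unlock(&set_lock);
}
```

Without the adjustment in `task_leave()`, the iterator would be left pointing at the removed node. That node's `next` now points to itself, so `iter_next()` would keep returning the departed task.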
    
    v2: put_task_struct() in css_task_iter_next() moved outside
        css_set_rwsem.  A later patch will add cgroup operations to the
        task_struct free path, which may grab the same lock; dropping the
        reference outside the lock avoids possible deadlocks.
    
        css_set_move_task() updated to use list_for_each_entry_safe() when
        walking task_iters and advancing them.  This is necessary as
        advancing an iter may remove it from the list.
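
The ordering constraint in the v2 note can be illustrated with a small userspace model: pin the next task under css_set_rwsem, but drop the reference on the previous one only after unlocking. The model assumes a hypothetical refcounted `struct obj` whose release function takes the same lock. That release function stands in for a task_struct free path that performs cgroup operations. A pthread mutex replaces the rwsem, and `iter_next()` is an illustrative name, not the kernel's css_task_iter_next().

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Stand-in for css_set_rwsem (simplified to a non-recursive mutex). */
static pthread_mutex_t set_lock = PTHREAD_MUTEX_INITIALIZER;
static int released_count; /* number of objects freed so far */

struct obj { atomic_int refcnt; };

/* Free path that needs set_lock. Reaching it while already holding
 * set_lock would self-deadlock on a non-recursive mutex. */
static void obj_release(struct obj *o)
{
	pthread_mutex_lock(&set_lock);
	released_count++;
	pthread_mutex_unlock(&set_lock);
	free(o);
}

static void obj_get(struct obj *o) { atomic_fetch_add(&o->refcnt, 1); }

static void obj_put(struct obj *o)
{
	if (atomic_fetch_sub(&o->refcnt, 1) == 1)
		obj_release(o);
}

struct iter {
	struct obj **items;
	int n;
	int pos;
	struct obj *cur; /* currently pinned object, or NULL */
};

static struct obj *iter_next(struct iter *it)
{
	struct obj *prev, *next = NULL;

	pthread_mutex_lock(&set_lock);
	prev = it->cur;
	if (it->pos < it->n) {
		next = it->items[it->pos++];
		obj_get(next); /* pin the new current object */
	}
	it->cur = next;
	pthread_mutex_unlock(&set_lock);

	/* Dropping the old pin may run obj_release(), which takes set_lock,
	 * so it must happen after the unlock above. */
	if (prev)
		obj_put(prev);
	return next;
}
```

Moving `obj_put(prev)` above the unlock would make the final put re-acquire `set_lock` while it is still held.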
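
The need for list_for_each_entry_safe() when walking task_iters can be shown with a small model in which advancing an iterator may unlink it from the list. This loosely stands in for an iterator that moves on to another css_set. The `list_head` helpers are a minimal reimplementation of the kernel's. `struct iter`, `iter_advance()` and `advance_all()` are hypothetical names, and the caller is assumed to hold the set lock.

```c
#include <stddef.h>

/* Minimal circular doubly linked list in the style of the kernel's list_head. */
struct list_head { struct list_head *next, *prev; };

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void list_init(struct list_head *h) { h->next = h->prev = h; }

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

static void list_del_init(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_init(n);
}

struct iter {
	int remaining;         /* steps left before this iterator leaves the list */
	struct list_head node; /* link in the list of active iterators */
};

/* Advancing may unlink the iterator, loosely modelling an iterator that
 * moves on to another css_set and therefore onto another list. */
static void iter_advance(struct iter *it)
{
	if (--it->remaining == 0)
		list_del_init(&it->node);
}

/* Advance every active iterator and return how many were visited.
 * n is saved before the body runs, as list_for_each_entry_safe() does,
 * because iter_advance() may unlink pos. */
static int advance_all(struct list_head *iters)
{
	struct list_head *pos, *n;
	int visited = 0;

	for (pos = iters->next, n = pos->next; pos != iters; pos = n, n = pos->next) {
		iter_advance(container_of(pos, struct iter, node));
		visited++;
	}
	return visited;
}
```

A plain walk would read `pos->next` after `list_del_init()` had pointed the node at itself. The loop would then spin on the removed iterator instead of reaching the next one.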
    
    Signed-off-by: Tejun Heo <tj@kernel.org>