1. 29 Jan, 2009 2 commits
  2. 15 Jan, 2009 6 commits
    • Li Zefan's avatar
      memcg: fix a race when setting memory.swappiness · 068b38c1
      Li Zefan authored
      
      
      (suppose: memcg->use_hierarchy == 0 and memcg->swappiness == 60)
      
      echo 10 > /memcg/0/swappiness   |
        mem_cgroup_swappiness_write() |
          ...                         | echo 1 > /memcg/0/use_hierarchy
                                      | mkdir /mnt/0/1
                                      |   sub_memcg->swappiness = 60;
          memcg->swappiness = 10;     |
      
      In the above scenario, we end up having 2 different swappiness
      values in a single hierarchy.
      
      We should hold cgroup_lock() when cheking cgrp->children list.
      Signed-off-by: default avatarLi Zefan <lizf@cn.fujitsu.com>
      Acked-by: default avatarKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Paul Menage <menage@google.com>
      Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      068b38c1
    • Li Zefan's avatar
      memcg: fix section mismatch · 0eb253e2
      Li Zefan authored
      
      
      At system boot when creating the top cgroup, mem_cgroup_create() calls
      enable_swap_cgroup() which is marked as __init, so mark
      mem_cgroup_create() as __ref to avoid false section mismatch warning.
      Reported-by: default avatarRakib Mullick <rakib.mullick@gmail.com>
      Signed-off-by: default avatarLi Zefan <lizf@cn.fujitsu.com>
      Acked-by; KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      0eb253e2
    • Daisuke Nishimura's avatar
      memcg: make oom less frequently · 4d1c6273
      Daisuke Nishimura authored
      
      
      In previous implementation, mem_cgroup_try_charge checked the return
      value of mem_cgroup_try_to_free_pages, and just retried if some pages
      had been reclaimed.
      But now, try_charge(and mem_cgroup_hierarchical_reclaim called from it)
      only checks whether the usage is less than the limit.
      
      This patch tries to change the behavior as before to cause oom less
      frequently.
      Signed-off-by: default avatarDaisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Acked-by: default avatarBalbir Singh <balbir@linux.vnet.ibm.com>
      Acked-by: default avatarKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Paul Menage <menage@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      4d1c6273
    • Daisuke Nishimura's avatar
      memcg: fix hierarchical reclaim · c268e994
      Daisuke Nishimura authored
      
      
      If root_mem has no children, last_scaned_child is set to root_mem itself.
      But after some children added to root_mem, mem_cgroup_get_next_node can
      mem_cgroup_put the root_mem although root_mem has not been mem_cgroup_get.
      
      This patch fixes this behavior by:
      
      - Set last_scanned_child to NULL if root_mem has no children or DFS
        search has returned to root_mem itself(root_mem is not a "child" of
        root_mem).  Make mem_cgroup_get_first_node return root_mem in this case.
         There are no mem_cgroup_get/put for root_mem.
      
      - Rename mem_cgroup_get_next_node to __mem_cgroup_get_next_node, and
        mem_cgroup_get_first_node to mem_cgroup_get_next_node.  Make
        mem_cgroup_hierarchical_reclaim call only new mem_cgroup_get_next_node.
      Signed-off-by: default avatarDaisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Paul Menage <menage@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      c268e994
    • Daisuke Nishimura's avatar
      memcg: fix error path of mem_cgroup_move_parent · 40d58138
      Daisuke Nishimura authored
      
      
      There is a bug in error path of mem_cgroup_move_parent.
      
      Extra refcnt got from try_charge should be dropped, and usages incremented
      by try_charge should be decremented in both error paths:
      
          A: failure at get_page_unless_zero
          B: failure at isolate_lru_page
      
      This bug makes this parent directory unremovable.
      
      In case of A, rmdir doesn't return, because res.usage doesn't go down to 0
      at mem_cgroup_force_empty even after all the pc in lru are removed.
      
      In case of B, rmdir fails and returns -EBUSY, because it has extra ref
      counts even after res.usage goes down to 0.
      Signed-off-by: default avatarDaisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Acked-by: default avatarKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Acked-by: default avatarBalbir Singh <balbir@linux.vnet.ibm.com>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Paul Menage <menage@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      40d58138
    • Daisuke Nishimura's avatar
      memcg: fix mem_cgroup_get_reclaim_stat_from_page · bd112db8
      Daisuke Nishimura authored
      
      
      In case of swapin, a new page is added to lru before it is charged,
      so page->pc->mem_cgroup points to NULL or last mem_cgroup the page
      was charged before.
      
      In the latter case, if the mem_cgroup has already freed by rmdir,
      the area pointed to by page->pc->mem_cgroup may have invalid data.
      
      Actually, I saw general protection fault.
      
          general protection fault: 0000 [#1] SMP
          last sysfs file: /sys/devices/system/cpu/cpu15/cache/index1/shared_cpu_map
          CPU 4
          Modules linked in: ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge stp ipv6 autofs4 hidp rfcomm l2cap bluetooth sunrpc dm_mirror dm_region_hash dm_log dm_multipath dm_mod rfkill input_polldev sbs sbshc battery ac lp sg ide_cd_mod cdrom button serio_raw acpi_memhotplug parport_pc e1000 rtc_cmos parport rtc_core rtc_lib i2c_i801 i2c_core shpchp pcspkr ata_piix libata megaraid_mbox megaraid_mm sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd [last unloaded: microcode]
          Pid: 26038, comm: page01 Tainted: G        W  2.6.28-rc9-mm1-mmotm-2008-12-22-16-14-f2ab3dea #1
          RIP: 0010:[<ffffffff8028e710>]  [<ffffffff8028e710>] update_page_reclaim_stat+0x2f/0x42
          RSP: 0000:ffff8801ee457da8  EFLAGS: 00010002
          RAX: 32353438312021c8 RBX: 0000000000000000 RCX: 32353438312021c8
          RDX: 0000000000000000 RSI: ffff8800cb0b1000 RDI: ffff8801164d1d28
          RBP: ffff880110002cb8 R08: ffff88010f2eae23 R09: 0000000000000001
          R10: ffff8800bc514b00 R11: ffff880110002c00 R12: 0000000000000000
          R13: ffff88000f484100 R14: 0000000000000003 R15: 00000000001200d2
          FS:  00007f8a261726f0(0000) GS:ffff88010f2eaa80(0000) knlGS:0000000000000000
          CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
          CR2: 00007f8a25d22000 CR3: 00000001ef18c000 CR4: 00000000000006e0
          DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
          DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
          Process page01 (pid: 26038, threadinfo ffff8801ee456000, task ffff8800b585b960)
          Stack:
           ffffe200071ee568 ffff880110001f00 0000000000000000 ffffffff8028ea17
           ffff88000f484100 0000000000000000 0000000000000020 00007f8a25d22000
           ffff8800bc514b00 ffffffff8028ec34 0000000000000000 0000000000016fd8
          Call Trace:
           [<ffffffff8028ea17>] ? ____pagevec_lru_add+0xc1/0x13c
           [<ffffffff8028ec34>] ? drain_cpu_pagevecs+0x36/0x89
           [<ffffffff802a4f8c>] ? swapin_readahead+0x78/0x98
           [<ffffffff8029a37a>] ? handle_mm_fault+0x3d9/0x741
           [<ffffffff804da654>] ? do_page_fault+0x3ce/0x78c
           [<ffffffff804d7a42>] ? trace_hardirqs_off_thunk+0x3a/0x3c
           [<ffffffff804d860f>] ? page_fault+0x1f/0x30
          Code: cc 55 48 8d af b8 0d 00 00 48 89 f7 53 89 d3 e8 39 85 02 00 48 63 d3 48 ff 44 d5 10 45 85 e4 74 05 48 ff 44 d5 00 48 85 c0 74 0e <48> ff 44 d0 10 45 85 e4 74 04 48 ff 04 d0 5b 5d 41 5c c3 41 54
          RIP  [<ffffffff8028e710>] update_page_reclaim_stat+0x2f/0x42
           RSP <ffff8801ee457da8>
      Signed-off-by: default avatarDaisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Acked-by: default avatarKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Paul Menage <menage@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      bd112db8
  3. 08 Jan, 2009 32 commits