1. 07 Feb, 2006 2 commits
  2. 05 Feb, 2006 1 commit
  3. 08 Jan, 2006 1 commit
  4. 06 Jan, 2006 7 commits
  5. 22 Nov, 2005 1 commit
    • [PATCH] hugetlb: fix race in set_max_huge_pages for multiple updaters of nr_huge_pages · 0bd0f9fb
      Eric Paris authored
      
      
      If there are multiple updaters to /proc/sys/vm/nr_hugepages simultaneously
      it is possible for the nr_huge_pages variable to become incorrect.  There
      is no locking in the set_max_huge_pages function around
      alloc_fresh_huge_page which is able to update nr_huge_pages.  Two callers
      to alloc_fresh_huge_page could race against each other as could a call to
      alloc_fresh_huge_page and a call to update_and_free_page.  This patch just
      expands the area covered by the hugetlb_lock to cover the call into
      alloc_fresh_huge_page.  I'm not sure how we could say that a sysctl section
      is performance critical where more specific locking would be needed.
      
      My reproducer was to run a couple copies of the following script
      simultaneously
      
      while [ true ]; do
      	echo 1000 > /proc/sys/vm/nr_hugepages
      	echo 500 > /proc/sys/vm/nr_hugepages
      	echo 750 > /proc/sys/vm/nr_hugepages
      	echo 100 > /proc/sys/vm/nr_hugepages
      	echo 0 > /proc/sys/vm/nr_hugepages
      done
      
      and then watch /proc/meminfo and eventually you will see things like
      
      HugePages_Total:     100
      HugePages_Free:      109
      
      After applying the patch all seemed well.
      
      Signed-off-by: Eric Paris <eparis@redhat.com>
      Acked-by: William Irwin <wli@holomorphy.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
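
      The locking idea in this message can be sketched in user space.  This is
      a minimal illustration that follows the description above, not the
      actual kernel diff: struct pool, set_max_pages() and the two helpers are
      hypothetical stand-ins for hugetlb_lock, set_max_huge_pages(),
      alloc_fresh_huge_page() and update_and_free_page(), and a pthread mutex
      replaces the kernel spinlock.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical user-space stand-in for the hugepage pool counters. */
struct pool {
    pthread_mutex_t lock;      /* plays the role of hugetlb_lock */
    unsigned long nr_pages;    /* plays the role of nr_huge_pages */
    unsigned long free_pages;  /* pages currently unused in the pool */
};

/* Caller must hold p->lock: both counters change together. */
static void alloc_fresh_page(struct pool *p)
{
    p->nr_pages++;
    p->free_pages++;
}

/* Caller must hold p->lock and guarantee there is a free page. */
static void update_and_free_page(struct pool *p)
{
    assert(p->free_pages > 0 && p->nr_pages > 0);
    p->nr_pages--;
    p->free_pages--;
}

/*
 * Grow or shrink the pool toward `count` and return the resulting total.
 * The lock covers the grow loop as well as the shrink loop, so two
 * concurrent updaters cannot interleave their counter updates.
 */
static unsigned long set_max_pages(struct pool *p, unsigned long count)
{
    unsigned long total;

    pthread_mutex_lock(&p->lock);
    while (p->nr_pages < count)
        alloc_fresh_page(p);
    while (p->nr_pages > count && p->free_pages > 0)
        update_and_free_page(p);
    total = p->nr_pages;
    pthread_mutex_unlock(&p->lock);
    return total;
}
```

      With the lock held across both loops, several threads calling
      set_max_pages() with different targets should always leave free_pages no
      larger than nr_pages, unlike the HugePages_Free > HugePages_Total
      reading shown above.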
  6. 07 Nov, 2005 1 commit
  7. 06 Nov, 2005 1 commit
    • [PATCH] ppc64: support 64k pages · 3c726f8d
      Benjamin Herrenschmidt authored
      
      
      Adds a new CONFIG_PPC_64K_PAGES which, when enabled, changes the kernel
      base page size to 64K.  The resulting kernel still boots on any
      hardware.  On current machines with 4K pages support only, the kernel
      will maintain 16 "subpages" for each 64K page transparently.
      
      Note that while real 64K capable HW has been tested, the current patch
      will not enable it yet as such hardware is not released yet, and I'm
      still verifying with the firmware architects the proper way to get the
      information from the newer hypervisors.
      
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
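
      The "16 subpages" figure is just 64K / 4K.  A small sketch of the
      address arithmetic, assuming a 16-bit shift for the base page and a
      12-bit shift for the hardware page; the macro and function names here
      are hypothetical and are not the ones used by the patch.

```c
#include <assert.h>
#include <stdint.h>

#define BASE_PAGE_SHIFT   16u  /* 64K kernel base page */
#define HW_PAGE_SHIFT     12u  /* 4K hardware page */
#define SUBPAGES_PER_PAGE (1u << (BASE_PAGE_SHIFT - HW_PAGE_SHIFT))  /* 16 */

/* Index (0..15) of the 4K subpage that holds addr within its 64K page. */
static unsigned int subpage_index(uint64_t addr)
{
    uint64_t hw_page = addr >> HW_PAGE_SHIFT;
    unsigned int idx = (unsigned int)(hw_page & (SUBPAGES_PER_PAGE - 1));

    assert(idx < SUBPAGES_PER_PAGE);
    return idx;
}

/* Start address of the 64K base page that holds addr. */
static uint64_t base_page_start(uint64_t addr)
{
    return addr & ~((UINT64_C(1) << BASE_PAGE_SHIFT) - 1);
}
```

      For example, address 0x12345 lies in the base page starting at 0x10000
      and falls in subpage 2 of that page.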
  8. 29 Oct, 2005 5 commits
    • [PATCH] hugetlb: demand fault handler · 4c887265
      Adam Litke authored
      
      
      Below is a patch to implement demand faulting for huge pages.  The main
      motivation for changing from prefaulting to demand faulting is so that huge
      page memory areas can be allocated according to NUMA policy.
      
      Thanks to consolidated hugetlb code, switching the behavior requires changing
      only one fault handler.  The bulk of the patch just moves the logic from
      hugetlb_prefault() to hugetlb_pte_fault() and find_get_huge_page().
      
      Signed-off-by: Adam Litke <agl@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] mm: unmap_vmas with inner ptlock · 508034a3
      Hugh Dickins authored
      
      
      Remove the page_table_lock from around the calls to unmap_vmas, and replace
      the pte_offset_map in zap_pte_range by pte_offset_map_lock: all callers are
      now safe to descend without page_table_lock.
      
      Don't attempt fancy locking for hugepages, just take page_table_lock in
      unmap_hugepage_range.  Which makes zap_hugepage_range, and the hugetlb test in
      zap_page_range, redundant: unmap_vmas calls unmap_hugepage_range anyway.  Nor
      does unmap_vmas have much use for its mm arg now.
      
      The tlb_start_vma and tlb_end_vma in unmap_page_range are now called without
      page_table_lock: if they're implemented at all, they typically come down to
      flush_cache_range (usually done outside page_table_lock) and flush_tlb_range
      (which we already audited for the mprotect case).
      
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] mm: ptd_alloc take ptlock · c74df32c
      Hugh Dickins authored
      
      
      Second step in pushing down the page_table_lock.  Remove the temporary
      bridging hack from __pud_alloc, __pmd_alloc, __pte_alloc: expect callers not
      to hold page_table_lock, whether it's on init_mm or a user mm; take
      page_table_lock internally to check if a racing task already allocated.
      
      Convert their callers from common code.  But avoid coming back to change them
      again later: instead of moving the spin_lock(&mm->page_table_lock) down,
      switch over to new macros pte_alloc_map_lock and pte_unmap_unlock, which
      encapsulate the mapping+locking and unlocking+unmapping together, and in the
      end may use alternatives to the mm page_table_lock itself.
      
      These callers all hold mmap_sem (some exclusively, some not), so at no level
      can a page table be whipped away from beneath them; and pte_alloc uses the
      "atomic" pmd_present to test whether it needs to allocate.  It appears that on
      all arches we can safely descend without page_table_lock.
      
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
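
      The "take page_table_lock internally to check if a racing task already
      allocated" step follows a common pattern: allocate with no lock held,
      then lock, re-check, and discard the new object if someone else got
      there first.  A user-space sketch under that assumption; struct
      dir_entry and table_alloc() are hypothetical names, and a pthread mutex
      stands in for mm->page_table_lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for an upper-level page table slot. */
struct dir_entry {
    pthread_mutex_t lock;  /* plays the role of mm->page_table_lock */
    void *table;           /* NULL until a lower-level table is installed */
};

/*
 * Make sure e->table is populated.  The caller does not hold e->lock.
 * Returns 0 on success, -1 if the allocation failed.
 */
static int table_alloc(struct dir_entry *e, size_t size)
{
    void *new_table;

    assert(e != NULL && size > 0);

    new_table = calloc(1, size);      /* allocate with no lock held */
    if (!new_table)
        return -1;

    pthread_mutex_lock(&e->lock);
    if (e->table)                     /* a racing task already installed one */
        free(new_table);
    else
        e->table = new_table;
    pthread_mutex_unlock(&e->lock);
    return 0;
}
```

      Calling table_alloc() twice on the same entry leaves the first table in
      place and frees the second allocation, which is the behaviour a caller
      that lost the race would see.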
    • [PATCH] mm: update_hiwaters just in time · 365e9c87
      Hugh Dickins authored
      
      
      update_mem_hiwater has attracted various criticisms, in particular from those
      concerned with mm scalability.  Originally it was called whenever rss or
      total_vm got raised.  Then many of those callsites were replaced by a timer
      tick call from account_system_time.  Now Frank van Maarseveen reports that to
      be found inadequate.  How about this?  Works for Frank.
      
      Replace update_mem_hiwater, a poor combination of two unrelated ops, by macros
      update_hiwater_rss and update_hiwater_vm.  Don't attempt to keep
      mm->hiwater_rss up to date at timer tick, nor every time we raise rss (usually
      by 1): those are hot paths.  Do the opposite, update only when about to lower
      rss (usually by many), or just before final accounting in do_exit.  Handle
      mm->hiwater_vm in the same way, though it's much less of an issue.  Demand
      that whoever collects these hiwater statistics do the work of taking the
      maximum with rss or total_vm.
      
      And there has been no collector of these hiwater statistics in the tree.  The
      new convention needs an example, so match Frank's usage by adding a VmPeak
      line above VmSize to /proc/<pid>/status, and also a VmHWM line above VmRSS
      (High-Water-Mark or High-Water-Memory).
      
      There was a particular anomaly during mremap move, that hiwater_vm might be
      captured too high.  A fleeting such anomaly remains, but it's quickly
      corrected now, whereas before it would stick.
      
      What locking?  None: if the app is racy then these statistics will be racy,
      it's not worth any overhead to make them exact.  But whenever it suits,
      hiwater_vm is updated under exclusive mmap_sem, and hiwater_rss under
      page_table_lock (for now) or with preemption disabled (later on): without
      going to any trouble, minimize the time between reading current values and
      updating, to minimize those occasions when a racing thread bumps a count up
      and back down in between.
      
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
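
      The "update only when about to lower" convention can be illustrated
      with plain counters.  This is a sketch of the idea, not the kernel
      macros: struct mm_counters, add_rss(), sub_rss() and peak_rss() are
      hypothetical, and update_hiwater_rss() here is an ordinary function
      that borrows the macro's name.

```c
#include <assert.h>

/* Hypothetical stand-in for the relevant mm_struct fields. */
struct mm_counters {
    unsigned long rss;
    unsigned long hiwater_rss;
};

/* Record the current rss as the peak if it exceeds the stored peak. */
static void update_hiwater_rss(struct mm_counters *mm)
{
    if (mm->hiwater_rss < mm->rss)
        mm->hiwater_rss = mm->rss;
}

/* Hot path: raising rss does not touch hiwater_rss. */
static void add_rss(struct mm_counters *mm, unsigned long pages)
{
    mm->rss += pages;
}

/* Colder path: capture the peak just before rss drops. */
static void sub_rss(struct mm_counters *mm, unsigned long pages)
{
    assert(pages <= mm->rss);
    update_hiwater_rss(mm);
    mm->rss -= pages;
}

/* Collector: the stored peak may be stale, so take the max with rss. */
static unsigned long peak_rss(const struct mm_counters *mm)
{
    return mm->hiwater_rss > mm->rss ? mm->hiwater_rss : mm->rss;
}
```

      After raising rss to 10 the stored hiwater_rss is still 0, yet
      peak_rss() reports 10; only the later sub_rss() call writes 10 into
      hiwater_rss.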
    • [PATCH] mm: rss = file_rss + anon_rss · 4294621f
      Hugh Dickins authored
      
      
      I was lazy when we added anon_rss, and chose to change as few places as
      possible.  So currently each anonymous page has to be counted twice, in rss
      and in anon_rss.  Which won't be so good if those are atomic counts in some
      configurations.
      
      Change that around: keep file_rss and anon_rss separately, and add them
      together (with get_mm_rss macro) when the total is needed - reading two
      atomics is much cheaper than updating two atomics.  And update anon_rss
      upfront, typically in memory.c, not tucked away in page_add_anon_rmap.
      
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
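
      A sketch of the split-counter scheme, using C11 atomics in place of the
      kernel's atomic types.  struct mm_rss and the inc_* helpers are
      hypothetical; get_mm_rss() borrows the macro's name but is written here
      as a plain function.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the two per-mm counters. */
struct mm_rss {
    atomic_long file_rss;
    atomic_long anon_rss;
};

/* Each page is counted once, in exactly one of the two counters. */
static void inc_file_rss(struct mm_rss *mm)
{
    atomic_fetch_add(&mm->file_rss, 1);
}

static void inc_anon_rss(struct mm_rss *mm)
{
    atomic_fetch_add(&mm->anon_rss, 1);
}

/* The total is derived on demand from two atomic reads. */
static long get_mm_rss(struct mm_rss *mm)
{
    long total = atomic_load(&mm->file_rss) + atomic_load(&mm->anon_rss);

    assert(total >= 0);
    return total;
}
```

      The two loads are not a single atomic snapshot, so a concurrent update
      can make the sum momentarily inexact; the trade-off is one atomic
      update per page instead of two.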
  9. 20 Oct, 2005 2 commits
  10. 05 Sep, 2005 1 commit
    • [PATCH] hugetlb: move stale pte check into huge_pte_alloc() · 7bf07f3d
      Adam Litke authored
      Initial Post (Wed, 17 Aug 2005)
      
      This patch moves the
      	if (! pte_none(*pte))
      		hugetlb_clean_stale_pgtable(pte);
      logic into huge_pte_alloc() so all of its callers can be immune to the bug
      described by Kenneth Chen at http://lkml.org/lkml/2004/6/16/246
      
      
      
      > It turns out there is a bug in hugetlb_prefault(): with 3 level page table,
      > huge_pte_alloc() might return a pmd that points to a PTE page. It happens
      > if the virtual address for hugetlb mmap is recycled from previously used
      > normal page mmap. free_pgtables() might not scrub the pmd entry on
      > munmap and hugetlb_prefault skips on any pmd presence regardless what type
      > it is.
      
      Unless I am missing something, it seems more correct to place the check inside
      huge_pte_alloc() to prevent the same bug wherever a huge pte is allocated.
      It also allows checking for this condition when lazily faulting huge pages
      later in the series.
      
      Signed-off-by: Adam Litke <agl@us.ibm.com>
      Cc: <linux-mm@kvack.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  11. 05 Aug, 2005 1 commit
  12. 21 Jun, 2005 1 commit
    • [PATCH] Hugepage consolidation · 63551ae0
      David Gibson authored
      
      
      A lot of the code in arch/*/mm/hugetlbpage.c is quite similar.  This patch
      attempts to consolidate a lot of the code across the arches, putting the
      combined version in mm/hugetlb.c.  There are a couple of uglyish hacks in
      order to convert all the hugepage arches, but the result is a very large
      reduction in the total amount of code.  It also means things like hugepage
      lazy allocation could be implemented in one place, instead of six.
      
      Tested, at least a little, on ppc64, i386 and x86_64.
      
      Notes:
      	- this patch changes the meaning of set_huge_pte() to be more
      	  analogous to set_pte()
      	- does SH4 need a special huge_ptep_get_and_clear()?
      
      Acked-by: William Lee Irwin <wli@holomorphy.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  13. 16 Apr, 2005 1 commit
    • Linux-2.6.12-rc2 · 1da177e4
      Linus Torvalds authored
      Initial git repository build. I'm not bothering with the full history,
      even though we have it. We can create a separate "historical" git
      archive of that later if we want to, and in the meantime it's about
      3.2GB when imported into git - space that would just make the early
      git days unnecessarily complicated, when we don't have a lot of good
      infrastructure for it.
      
      Let it rip!