1. 20 Oct, 2006 1 commit
    • Andrew Morton's avatar
      [PATCH] separate bdi congestion functions from queue congestion functions · 3fcfab16
      Andrew Morton authored
      
      
      Separate out the concept of "queue congestion" from "backing-dev congestion".
      Congestion is a backing-dev concept, not a queue concept.
      
      The blk_* congestion functions are retained, as wrappers around the core
      backing-dev congestion functions.
      
      This proper layering is needed so that NFS can cleanly use the congestion
      functions, and so that CONFIG_BLOCK=n actually links.
      
      Cc: "Thomas Maier" <balagi@justmail.de>
      Cc: "Jens Axboe" <jens.axboe@oracle.com>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Peter Osterlund <petero2@telia.com>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
      3fcfab16
  2. 30 Sep, 2006 2 commits
    • David Howells's avatar
      [PATCH] BLOCK: Make it possible to disable the block layer [try #6] · 9361401e
      David Howells authored
      
      
      Make it possible to disable the block layer.  Not all embedded devices require
      it, some can make do with just JFFS2, NFS, ramfs, etc - none of which require
      the block layer to be present.
      
      This patch does the following:
      
       (*) Introduces CONFIG_BLOCK to disable the block layer, buffering and blockdev
           support.
      
       (*) Adds dependencies on CONFIG_BLOCK to any configuration item that controls
           an item that uses the block layer.  This includes:
      
           (*) Block I/O tracing.
      
           (*) Disk partition code.
      
           (*) All filesystems that are block based, eg: Ext3, ReiserFS, ISOFS.
      
           (*) The SCSI layer.  As far as I can tell, even SCSI chardevs use the
           	 block layer to do scheduling.  Some drivers that use SCSI facilities -
           	 such as USB storage - end up disabled indirectly from this.
      
           (*) Various block-based device drivers, such as IDE and the old CDROM
           	 drivers.
      
           (*) MTD blockdev handling and FTL.
      
           (*) JFFS - which uses set_bdev_super(), something it could avoid doing by
           	 taking a leaf out of JFFS2's book.
      
       (*) Makes most of the contents of linux/blkdev.h, linux/buffer_head.h and
           linux/elevator.h contingent on CONFIG_BLOCK being set.  sector_div() is,
           however, still used in places, and so is still available.
      
       (*) Also made contingent are the contents of linux/mpage.h, linux/genhd.h and
           parts of linux/fs.h.
      
       (*) Makes a number of files in fs/ contingent on CONFIG_BLOCK.
      
       (*) Makes mm/bounce.c (bounce buffering) contingent on CONFIG_BLOCK.
      
       (*) set_page_dirty() doesn't call __set_page_dirty_buffers() if CONFIG_BLOCK
           is not enabled.
      
       (*) fs/no-block.c is created to hold out-of-line stubs and things that are
           required when CONFIG_BLOCK is not set:
      
           (*) Default blockdev file operations (to give error ENODEV on opening).
      
       (*) Makes some /proc changes:
      
           (*) /proc/devices does not list any blockdevs.
      
           (*) /proc/diskstats and /proc/partitions are contingent on CONFIG_BLOCK.
      
       (*) Makes some compat ioctl handling contingent on CONFIG_BLOCK.
      
       (*) If CONFIG_BLOCK is not defined, makes sys_quotactl() return -ENODEV if
           given command other than Q_SYNC or if a special device is specified.
      
       (*) In init/do_mounts.c, no reference is made to the blockdev routines if
           CONFIG_BLOCK is not defined.  This does not prohibit NFS roots or JFFS2.
      
       (*) The bdflush, ioprio_set and ioprio_get syscalls can now be absent (return
           error ENOSYS by way of cond_syscall if so).
      
       (*) The seclvl_bd_claim() and seclvl_bd_release() security calls do nothing if
           CONFIG_BLOCK is not set, since they can't then happen.
      Signed-Off-By: default avatarDavid Howells <dhowells@redhat.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      9361401e
    • David Howells's avatar
      [PATCH] BLOCK: Separate the bounce buffering code from the highmem code [try #6] · 831058de
      David Howells authored
      
      
      Move the bounce buffer code from mm/highmem.c to mm/bounce.c so that it can be
      more easily disabled when the block layer is disabled.
      
      !!!NOTE!!! There may be a bug in this code: Should init_emergency_pool() be
      	   contingent on CONFIG_HIGHMEM?
      Signed-Off-By: default avatarDavid Howells <dhowells@redhat.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      831058de
  3. 29 Sep, 2006 1 commit
  4. 26 Sep, 2006 1 commit
  5. 30 Jun, 2006 1 commit
    • Christoph Lameter's avatar
      [PATCH] zoned vm counters: create vmstat.c/.h from page_alloc.c/.h · f6ac2354
      Christoph Lameter authored
      NOTE: ZVC are *not* the lightweight event counters.  ZVCs are reliable whereas
      event counters do not need to be.
      
      Zone based VM statistics are necessary to be able to determine what the state
      of memory in one zone is.  In a NUMA system this can be helpful for local
      reclaim and other memory optimizations that may be able to shift VM load in
      order to get more balanced memory use.
      
      It is also useful to know how the computing load affects the memory
      allocations on various zones.  This patchset allows the retrieval of that data
      from userspace.
      
      The patchset introduces a framework for counters that is a cross between the
      existing page_stats --which are simply global counters split per cpu-- and the
      approach of deferred incremental updates implemented for nr_pagecache.
      
      Small per cpu 8 bit counters are added to struct zone.  If the counter exceeds
      certain thresholds then the counters are accumulated in an array of
      atomic_long in the zone and in a global array that sums up all zone values.
      The small 8 bit counters are next to the per cpu page pointers and so they
      will be in high in the cpu cache when pages are allocated and freed.
      
      Access to VM counter information for a zone and for the whole machine is then
      possible by simply indexing an array (Thanks to Nick Piggin for pointing out
      that approach).  The access to the total number of pages of various types does
      no longer require the summing up of all per cpu counters.
      
      Benefits of this patchset right now:
      
      - Ability for UP and SMP configuration to determine how memory
        is balanced between the DMA, NORMAL and HIGHMEM zones.
      
      - loops over all processors are avoided in writeback and
        reclaim paths. We can avoid caching the writeback information
        because the needed information is directly accessible.
      
      - Special handling for nr_pagecache removed.
      
      - zone_reclaim_interval vanishes since VM stats can now determine
        when it is worth to do local reclaim.
      
      - Fast inline per node page state determination.
      
      - Accurate counters in /sys/devices/system/node/node*/meminfo. Current
        counters are counting simply which processor allocated a page somewhere
        and guestimate based on that. So the counters were not useful to show
        the actual distribution of page use on a specific zone.
      
      - The swap_prefetch patch requires per node statistics in order to
        figure out when processors of a node can prefetch. This patch provides
        some of the needed numbers.
      
      - Detailed VM counters available in more /proc and /sys status files.
      
      References to earlier discussions:
      V1 http://marc.theaimsgroup.com/?l=linux-kernel&m=113511649910826&w=2
      V2 http://marc.theaimsgroup.com/?l=linux-kernel&m=114980851924230&w=2
      V3 http://marc.theaimsgroup.com/?l=linux-kernel&m=115014697910351&w=2
      V4 http://marc.theaimsgroup.com/?l=linux-kernel&m=115024767318740&w=2
      
      
      
      Performance tests with AIM7 did not show any regressions.  Seems to be a tad
      faster even.  Tested on ia64/NUMA.  Builds fine on i386, SMP / UP.  Includes
      fixes for s390/arm/uml arch code.
      
      This patch:
      
      Move counter code from page_alloc.c/page-flags.h to vmstat.c/h.
      
      Create vmstat.c/vmstat.h by separating the counter code and the proc
      functions.
      
      Move the vm_stat_text array before zoneinfo_show.
      
      [akpm@osdl.org: s390 build fix]
      [akpm@osdl.org: HOTPLUG_CPU build fix]
      Signed-off-by: default avatarChristoph Lameter <clameter@sgi.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
      f6ac2354
  6. 27 Mar, 2006 1 commit
  7. 22 Mar, 2006 1 commit
    • Christoph Lameter's avatar
      [PATCH] page migration reorg · b20a3503
      Christoph Lameter authored
      
      
      Centralize the page migration functions in anticipation of additional
      tinkering.  Creates a new file mm/migrate.c
      
      1. Extract buffer_migrate_page() from fs/buffer.c
      
      2. Extract central migration code from vmscan.c
      
      3. Extract some components from mempolicy.c
      
      4. Export pageout() and remove_from_swap() from vmscan.c
      
      5. Make it possible to configure NUMA systems without page migration
         and non-NUMA systems with page migration.
      
      I had to so some #ifdeffing in mempolicy.c that may need a cleanup.
      Signed-off-by: default avatarChristoph Lameter <clameter@sgi.com>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
      b20a3503
  8. 08 Jan, 2006 2 commits
    • Matt Mackall's avatar
      [PATCH] slob: introduce the SLOB allocator · 10cef602
      Matt Mackall authored
      
      
      configurable replacement for slab allocator
      
      This adds a CONFIG_SLAB option under CONFIG_EMBEDDED.  When CONFIG_SLAB is
      disabled, the kernel falls back to using the 'SLOB' allocator.
      
      SLOB is a traditional K&R/UNIX allocator with a SLAB emulation layer,
      similar to the original Linux kmalloc allocator that SLAB replaced.  It's
      signicantly smaller code and is more memory efficient.  But like all
      similar allocators, it scales poorly and suffers from fragmentation more
      than SLAB, so it's only appropriate for small systems.
      
      It's been tested extensively in the Linux-tiny tree.  I've also
      stress-tested it with make -j 8 compiles on a 3G SMP+PREEMPT box (not
      recommended).
      
      Here's a comparison for otherwise identical builds, showing SLOB saving
      nearly half a megabyte of RAM:
      
      $ size vmlinux*
         text    data     bss     dec     hex filename
      3336372  529360  190812 4056544  3de5e0 vmlinux-slab
      3323208  527948  190684 4041840  3dac70 vmlinux-slob
      
      $ size mm/{slab,slob}.o
         text    data     bss     dec     hex filename
        13221     752      48   14021    36c5 mm/slab.o
         1896      52       8    1956     7a4 mm/slob.o
      
      /proc/meminfo:
                        SLAB          SLOB      delta
      MemTotal:        27964 kB      27980 kB     +16 kB
      MemFree:         24596 kB      25092 kB    +496 kB
      Buffers:            36 kB         36 kB       0 kB
      Cached:           1188 kB       1188 kB       0 kB
      SwapCached:          0 kB          0 kB       0 kB
      Active:            608 kB        600 kB      -8 kB
      Inactive:          808 kB        812 kB      +4 kB
      HighTotal:           0 kB          0 kB       0 kB
      HighFree:            0 kB          0 kB       0 kB
      LowTotal:        27964 kB      27980 kB     +16 kB
      LowFree:         24596 kB      25092 kB    +496 kB
      SwapTotal:           0 kB          0 kB       0 kB
      SwapFree:            0 kB          0 kB       0 kB
      Dirty:               4 kB         12 kB      +8 kB
      Writeback:           0 kB          0 kB       0 kB
      Mapped:            560 kB        556 kB      -4 kB
      Slab:             1756 kB          0 kB   -1756 kB
      CommitLimit:     13980 kB      13988 kB      +8 kB
      Committed_AS:     4208 kB       4208 kB       0 kB
      PageTables:         28 kB         28 kB       0 kB
      VmallocTotal:  1007312 kB    1007312 kB       0 kB
      VmallocUsed:        48 kB         48 kB       0 kB
      VmallocChunk:  1007264 kB    1007264 kB       0 kB
      
      (this work has been sponsored in part by CELF)
      
      From: Ingo Molnar <mingo@elte.hu>
      
         Fix 32-bitness bugs in mm/slob.c.
      Signed-off-by: default avatarMatt Mackall <mpm@selenic.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
      10cef602
    • Matt Mackall's avatar
      [PATCH] slob: introduce mm/util.c for shared functions · 30992c97
      Matt Mackall authored
      
      
      Add mm/util.c for functions common between SLAB and SLOB.
      Signed-off-by: default avatarMatt Mackall <mpm@selenic.com>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
      30992c97
  9. 29 Oct, 2005 1 commit
  10. 24 Jun, 2005 1 commit
    • Carsten Otte's avatar
      [PATCH] xip: fs/mm: execute in place · ceffc078
      Carsten Otte authored
      
      
      - generic_file* file operations do no longer have a xip/non-xip split
      - filemap_xip.c implements a new set of fops that require get_xip_page
        aop to work proper. all new fops are exported GPL-only (don't like to
        see whatever code use those except GPL modules)
      - __xip_unmap now uses page_check_address, which is no longer static
        in rmap.c, and defined in linux/rmap.h
      - mm/filemap.h is now much more clean, plainly having just Linus'
        inline funcs moved here from filemap.c
      - fix includes in filemap_xip to make it build cleanly on i386
      Signed-off-by: default avatarCarsten Otte <cotte@de.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
      ceffc078
  11. 23 Jun, 2005 1 commit
    • Andy Whitcroft's avatar
      [PATCH] sparsemem memory model · d41dee36
      Andy Whitcroft authored
      
      
      Sparsemem abstracts the use of discontiguous mem_maps[].  This kind of
      mem_map[] is needed by discontiguous memory machines (like in the old
      CONFIG_DISCONTIGMEM case) as well as memory hotplug systems.  Sparsemem
      replaces DISCONTIGMEM when enabled, and it is hoped that it can eventually
      become a complete replacement.
      
      A significant advantage over DISCONTIGMEM is that it's completely separated
      from CONFIG_NUMA.  When producing this patch, it became apparent in that NUMA
      and DISCONTIG are often confused.
      
      Another advantage is that sparse doesn't require each NUMA node's ranges to be
      contiguous.  It can handle overlapping ranges between nodes with no problems,
      where DISCONTIGMEM currently throws away that memory.
      
      Sparsemem uses an array to provide different pfn_to_page() translations for
      each SECTION_SIZE area of physical memory.  This is what allows the mem_map[]
      to be chopped up.
      
      In order to do quick pfn_to_page() operations, the section number of the page
      is encoded in page->flags.  Part of the sparsemem infrastructure enables
      sharing of these bits more dynamically (at compile-time) between the
      page_zone() and sparsemem operations.  However, on 32-bit architectures, the
      number of bits is quite limited, and may require growing the size of the
      page->flags type in certain conditions.  Several things might force this to
      occur: a decrease in the SECTION_SIZE (if you want to hotplug smaller areas of
      memory), an increase in the physical address space, or an increase in the
      number of used page->flags.
      
      One thing to note is that, once sparsemem is present, the NUMA node
      information no longer needs to be stored in the page->flags.  It might provide
      speed increases on certain platforms and will be stored there if there is
      room.  But, if out of room, an alternate (theoretically slower) mechanism is
      used.
      
      This patch introduces CONFIG_FLATMEM.  It is used in almost all cases where
      there used to be an #ifndef DISCONTIG, because SPARSEMEM and DISCONTIGMEM
      often have to compile out the same areas of code.
      Signed-off-by: default avatarAndy Whitcroft <apw@shadowen.org>
      Signed-off-by: default avatarDave Hansen <haveblue@us.ibm.com>
      Signed-off-by: default avatarMartin Bligh <mbligh@aracnet.com>
      Signed-off-by: default avatarAdrian Bunk <bunk@stusta.de>
      Signed-off-by: default avatarYasunori Goto <y-goto@jp.fujitsu.com>
      Signed-off-by: default avatarBob Picco <bob.picco@hp.com>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
      d41dee36
  12. 16 Apr, 2005 1 commit
    • Linus Torvalds's avatar
      Linux-2.6.12-rc2 · 1da177e4
      Linus Torvalds authored
      Initial git repository build. I'm not bothering with the full history,
      even though we have it. We can create a separate "historical" git
      archive of that later if we want to, and in the meantime it's about
      3.2GB when imported into git - space that would just make the early
      git days unnecessarily complicated, when we don't have a lot of good
      infrastructure for it.
      
      Let it rip!
      1da177e4