1. 13 Jul, 2013 3 commits
  2. 08 Jul, 2013 2 commits
  3. 05 Jul, 2013 1 commit
  4. 03 Jul, 2013 34 commits
    • Paul Clements's avatar
      nbd: correct disconnect behavior · c378f70a
      Paul Clements authored
      
      
      Currently, when a disconnect is requested by the user (via NBD_DISCONNECT
      ioctl) the return from NBD_DO_IT is undefined (it is usually one of
      several error codes).  This means that nbd-client does not know if a
      manual disconnect was performed or whether a network error occurred.
      Because of this, nbd-client's persist mode (which tries to reconnect after
      error, but not after manual disconnect) does not always work correctly.
      
      This change fixes this by causing NBD_DO_IT to always return 0 if a user
      requests a disconnect.  This means that nbd-client can correctly either
      persist the connection (if an error occurred) or disconnect (if the user
      requested it).
      Signed-off-by: default avatarPaul Clements <paul.clements@steeleye.com>
      Acked-by: default avatarRob Landley <rob@landley.net>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      c378f70a
    • Alexandre Bounine's avatar
      rapidio: change endpoint device name format · 6ca40c25
      Alexandre Bounine authored
      
      
      Change endpoint device name format to use a component tag value instead of
      device destination ID.
      
      RapidIO specification defines a component tag to be a unique identifier
      for devices in a network.  RapidIO switches already use component tag as
      part of their device name and also use it for device identification when
      processing error management event notifications.
      
      Forming an endpoint's device name using its component tag instead of
      destination ID allows to keep sysfs device directories unchanged in case
      if a routing process dynamically changes endpoint's destination ID as a
      result of route optimization.
      
      This change should not affect any existing users because a valid device
      destination ID always should be obtained by reading "destid" attribute and
      not by parsing device name.
      
      This patch also removes switchid member from struct rio_switch because it
      simply duplicates the component tag and does not have other use than in
      device name generation.
      Signed-off-by: default avatarAlexandre Bounine <alexandre.bounine@idt.com>
      Cc: Matt Porter <mporter@kernel.crashing.org>
      Cc: Li Yang <leoli@freescale.com>
      Cc: Kumar Gala <galak@kernel.crashing.org>
      Cc: Andre van Herk <andre.van.herk@Prodrive.nl>
      Cc: Micha Nelissen <micha.nelissen@Prodrive.nl>
      Cc: Stef van Os <stef.van.os@Prodrive.nl>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      6ca40c25
    • Alexandre Bounine's avatar
      rapidio: add udev notification · 3bdbb62f
      Alexandre Bounine authored
      
      
      Add RapidIO-specific modalias generation to enable udev notifications
      about RapidIO-specific events.
      
      The RapidIO modalias string format is shown below:
      
      "rapidio:vNNNNdNNNNavNNNNadNNNN"
      
      Where:
      v  - Device Vendor ID (16 bit),
      d  - Device ID (16 bit),
      av - Assembly Vendor ID (16 bit),
      ad - Assembly ID (16 bit),
      
      as they are reported in corresponding Capability Registers (CARs)
      of each RapidIO device.
      Signed-off-by: default avatarAlexandre Bounine <alexandre.bounine@idt.com>
      Cc: Matt Porter <mporter@kernel.crashing.org>
      Cc: Li Yang <leoli@freescale.com>
      Cc: Kumar Gala <galak@kernel.crashing.org>
      Cc: Andre van Herk <andre.van.herk@Prodrive.nl>
      Cc: Micha Nelissen <micha.nelissen@Prodrive.nl>
      Cc: Stef van Os <stef.van.os@Prodrive.nl>
      Cc: Jean Delvare <jdelvare@suse.de>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      3bdbb62f
    • Alexandre Bounine's avatar
      rapidio: update enumerator registration mechanism · 9edbc30b
      Alexandre Bounine authored
      
      
      Update enumeration/discovery method registration mechanism to allow
      loading enumeration/discovery methods before all mports are registered.
      
      Existing statically linked RapidIO subsystem expects that all available
      RapidIO mport devices are initialized and registered before the
      enumeration/discovery method is registered.  Switching to loadable mport
      device drivers creates situation when mport device driver can be loaded
      after enumeration/discovery method is attached (e.g., loadable mport
      driver in a system with statically linked RapidIO core and enumerator).
      This also will happen in a system with hot-pluggable RapidIO controllers.
      
      To remove the dependency on the initialization/registration order this
      patch introduces enumeration/discovery registration mechanism that
      supports arbitrary registration order of mports and enumerator/discovery
      methods.
      
      The following registration rules are implemented:
      - only one enumeration/discovery method can be registered for given mport ID
        (including RIO_MPORT_ANY);
      - when new enumeration/discovery methods tries to attach to the registered mport
        device, method with matching mport ID will replace a default method previously
        registered for given mport (if any);
      - enumeration/discovery method with target ID=RIO_MPORT_ANY will be attached
        only to mports that do not have another enumerator attached to them;
      - when new mport device is registered with RapidIO subsystem, registration
        routine searches for the enumeration/discovery method with the best matching
        mport ID;
      Signed-off-by: default avatarAlexandre Bounine <alexandre.bounine@idt.com>
      Cc: Matt Porter <mporter@kernel.crashing.org>
      Cc: Li Yang <leoli@freescale.com>
      Cc: Kumar Gala <galak@kernel.crashing.org>
      Cc: Andre van Herk <andre.van.herk@Prodrive.nl>
      Cc: Micha Nelissen <micha.nelissen@Prodrive.nl>
      Cc: Stef van Os <stef.van.os@Prodrive.nl>
      Cc: Jean Delvare <jdelvare@suse.de>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      9edbc30b
    • Alexandre Bounine's avatar
      rapidio: convert switch drivers to modules · 2ec3ba69
      Alexandre Bounine authored
      
      
      Rework RapidIO switch drivers to add an option to build them as loadable
      kernel modules.
      
      This patch removes RapidIO-specific vmlinux section and converts switch
      drivers to be compatible with LDM driver registration method.  To simplify
      registration of device-specific callback routines this patch introduces
      rio_switch_ops data structure.  The sw_sysfs() callback is removed from
      the list of device-specific operations because under the new structure its
      functions can be handled by switch driver's probe() and remove() routines.
      
      If a specific switch device driver is not loaded the RapidIO subsystem
      core will use default standard-based operations to configure a switch.
      Because the current implementation of RapidIO enumeration/discovery method
      relies on availability of device-specific operations for error management,
      switch device drivers must be loaded before the RapidIO
      enumeration/discovery starts.
      
      This patch also moves several common routines from enumeration/discovery
      module into the RapidIO core code to make switch-specific operations
      accessible to all components of RapidIO subsystem.
      Signed-off-by: default avatarAlexandre Bounine <alexandre.bounine@idt.com>
      Cc: Matt Porter <mporter@kernel.crashing.org>
      Cc: Li Yang <leoli@freescale.com>
      Cc: Kumar Gala <galak@kernel.crashing.org>
      Cc: Andre van Herk <andre.van.herk@Prodrive.nl>
      Cc: Micha Nelissen <micha.nelissen@Prodrive.nl>
      Cc: Stef van Os <stef.van.os@Prodrive.nl>
      Cc: Jean Delvare <jdelvare@suse.de>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      2ec3ba69
    • Oleg Nesterov's avatar
      kernel/fork.c:copy_process(): don't add the uninitialized child to thread/task/pid lists · 81907739
      Oleg Nesterov authored
      
      
      copy_process() adds the new child to thread_group/init_task.tasks list and
      then does attach_pid(child, PIDTYPE_PID).  This means that the lockless
      next_thread() or next_task() can see this thread with the wrong pid.  Say,
      "ls /proc/pid/task" can list the same inode twice.
      
      We could move attach_pid(child, PIDTYPE_PID) up, but in this case
      find_task_by_vpid() can find the new thread before it was fully
      initialized.
      
      And this is already true for PIDTYPE_PGID/PIDTYPE_SID, With this patch
      copy_process() initializes child->pids[*].pid first, then calls
      attach_pid() to insert the task into the pid->tasks list.
      
      attach_pid() no longer need the "struct pid*" argument, it is always
      called after pid_link->pid was already set.
      Signed-off-by: default avatarOleg Nesterov <oleg@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Cc: Sergey Dyasly <dserrg@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      81907739
    • Oleg Nesterov's avatar
      exit.c: unexport __set_special_pids() · 81dabb46
      Oleg Nesterov authored
      
      
      Move __set_special_pids() from exit.c to sys.c close to its single caller
      and make it static.
      
      And rename it to set_special_pids(), another helper with this name has
      gone away.
      Signed-off-by: default avatarOleg Nesterov <oleg@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      81dabb46
    • Jingoo Han's avatar
      lcd: add devm_lcd_device_{register,unregister}() · 1d0c48e6
      Jingoo Han authored
      
      
      These functions allow the driver core to automatically clean up any
      allocation made by lcd drivers.  Thus it simplifies the error paths.
      Signed-off-by: default avatarJingoo Han <jg1.han@samsung.com>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      1d0c48e6
    • Jingoo Han's avatar
      backlight: add devm_backlight_device_{register,unregister}() · 8318fde4
      Jingoo Han authored
      
      
      These functions allow the driver core to automatically clean up any
      allocation made by backlight drivers.  Thus it simplifies the error
      paths.
      Signed-off-by: default avatarJingoo Han <jg1.han@samsung.com>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      8318fde4
    • Bartlomiej Zolnierkiewicz's avatar
      drivers/dma: remove unused support for MEMSET operations · 48a9db46
      Bartlomiej Zolnierkiewicz authored
      There have never been any real users of MEMSET operations since they
      have been introduced in January 2007 by commit 7405f74b
      
       ("dmaengine:
      refactor dmaengine around dma_async_tx_descriptor").  Therefore remove
      support for them for now, it can be always brought back when needed.
      
      [sebastian.hesselbarth@gmail.com: fix drivers/dma/mv_xor]
      Signed-off-by: default avatarBartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
      Signed-off-by: default avatarKyungmin Park <kyungmin.park@samsung.com>
      Signed-off-by: default avatarSebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
      Cc: Vinod Koul <vinod.koul@intel.com>
      Acked-by: default avatarDan Williams <djbw@fb.com>
      Cc: Tomasz Figa <t.figa@samsung.com>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Olof Johansson <olof@lixom.net>
      Cc: Kevin Hilman <khilman@linaro.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      48a9db46
    • Jani Nikula's avatar
      dmi: add support for exact DMI matches in addition to substring matching · 5017b285
      Jani Nikula authored
      
      
      dmi_match() considers a substring match to be a successful match.  This is
      not always sufficient to distinguish between DMI data for different
      systems.  Add support for exact string matching using strcmp() in addition
      to the substring matching using strstr().
      
      The specific use case in the i915 driver is to allow us to use an exact
      match for D510MO, without also incorrectly matching D510MOV:
      
        {
      	.ident = "Intel D510MO",
      	.matches = {
      		DMI_MATCH(DMI_BOARD_VENDOR, "Intel"),
      		DMI_EXACT_MATCH(DMI_BOARD_NAME, "D510MO"),
      	},
        }
      Signed-off-by: default avatarJani Nikula <jani.nikula@intel.com>
      Cc: <annndddrr@gmail.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Cornel Panceac <cpanceac@gmail.com>
      Acked-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
      Cc: Greg KH <greg@kroah.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      5017b285
    • Kees Cook's avatar
      drivers: avoid format strings in names passed to alloc_workqueue() · d8537548
      Kees Cook authored
      
      
      For the workqueue creation interfaces that do not expect format strings,
      make sure they cannot accidently be parsed that way.  Additionally, clean
      up calls made with a single parameter that would be handled as a format
      string.  Many callers are passing potentially dynamic string content, so
      use "%s" in those cases to avoid any potential accidents.
      Signed-off-by: default avatarKees Cook <keescook@chromium.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      d8537548
    • Dan Carpenter's avatar
      err.h: IS_ERR() can accept __user pointers · e7152b97
      Dan Carpenter authored
      
      
      Sparse generates a false positive when you pass a __user or __iomem
      pointer to the IS_ERR() functions.
      
        drivers/rtc/rtc-ds1286.c:344:36: sparse: incorrect type in argument 1 (different address spaces)
        drivers/rtc/rtc-ds1286.c:344:36:    expected void const *ptr
        drivers/rtc/rtc-ds1286.c:344:36:    got unsigned int [noderef] [usertype] <asn:2>*rtcregs
      
      We can silence these by adding a __force here and upgrading to Sparse
      v0.4.5-rc1 or later.
      
      This change has no effect when using current Sparse releases.
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Acked-by: default avatarChristopher Li <sparse@chrisli.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      e7152b97
    • Cody P Schafer's avatar
      sparsemem: add BUILD_BUG_ON when sizeof mem_section is non-power-of-2 · 55878e88
      Cody P Schafer authored
      Instead of leaving a hidden trap for the next person who comes along and
      wants to add something to mem_section, add a big fat warning about it
      needing to be a power-of-2, and insert a BUILD_BUG_ON() in sparse_init()
      to catch mistakes.
      
      Right now non-power-of-2 mem_sections cause a number of WARNs at boot
      (which don't clearly point to the size of mem_section as an issue), but
      the system limps on (temporarily, at least).
      
      This is based upon Dave Hansen's earlier RFC where he ran into the same
      issue:
      	"sparsemem: fix boot when SECTIONS_PER_ROOT is not power-of-2"
      	http://lkml.indiana.edu/hypermail/linux/kernel/1205.2/03077.html
      
      Signed-off-by: default avatarCody P Schafer <cody@linux.vnet.ibm.com>
      Acked-by: default avatarDave Hansen <dave.hansen@linux.intel.com>
      Cc: Jiang Liu <liuj97@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      55878e88
    • Jiang Liu's avatar
      mm: kill free_all_bootmem_node() · e1280be0
      Jiang Liu authored
      
      
      Now nobody makes use of free_all_bootmem_node(), kill it.
      Signed-off-by: default avatarJiang Liu <jiang.liu@huawei.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Acked-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      e1280be0
    • Jiang Liu's avatar
      mm: introduce helper function set_max_mapnr() · fccc9987
      Jiang Liu authored
      
      
      Introduce a helper function set_max_mapnr() to set global variable
      max_mapnr.
      
      Also unify condition compilation for max_mapnr with
      CONFIG_NEED_MULTIPLE_NODES instead of CONFIG_DISCONTIGMEM.
      Signed-off-by: default avatarJiang Liu <jiang.liu@huawei.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mauro Carvalho Chehab <mchehab@redhat.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Mark Brown <broonie@opensource.wolfsonmicro.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      fccc9987
    • Jiang Liu's avatar
      mm: kill global variable num_physpages · 18954181
      Jiang Liu authored
      
      
      Now all references to num_physpages have been removed, so kill it.
      Signed-off-by: default avatarJiang Liu <jiang.liu@huawei.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      18954181
    • Jiang Liu's avatar
      mm: introduce helper function mem_init_print_info() to simplify mem_init() · 7ee3d4e8
      Jiang Liu authored
      
      
      Introduce helper function mem_init_print_info() to simplify mem_init()
      across different architectures, which also unifies the format and
      information printed.
      
      Function mem_init_print_info() calculates memory statistics information
      without walking each page, so it should be a little faster on some
      architectures.
      
      Also introduce another helper get_num_physpages() to kill the global
      variable num_physpages.
      Signed-off-by: default avatarJiang Liu <jiang.liu@huawei.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      7ee3d4e8
    • Jiang Liu's avatar
      mm: use a dedicated lock to protect totalram_pages and zone->managed_pages · c3d5f5f0
      Jiang Liu authored
      
      
      Currently lock_memory_hotplug()/unlock_memory_hotplug() are used to
      protect totalram_pages and zone->managed_pages.  Other than the memory
      hotplug driver, totalram_pages and zone->managed_pages may also be
      modified at runtime by other drivers, such as Xen balloon,
      virtio_balloon etc.  For those cases, memory hotplug lock is a little
      too heavy, so introduce a dedicated lock to protect totalram_pages and
      zone->managed_pages.
      
      Now we have a simplified locking rules totalram_pages and
      zone->managed_pages as:
      
      1) no locking for read accesses because they are unsigned long.
      2) no locking for write accesses at boot time in single-threaded context.
      3) serialize write accesses at runtime by acquiring the dedicated
         managed_page_count_lock.
      
      Also adjust zone->managed_pages when freeing reserved pages into the
      buddy system, to keep totalram_pages and zone->managed_pages in
      consistence.
      
      [akpm@linux-foundation.org: don't export adjust_managed_page_count to modules (for now)]
      Signed-off-by: default avatarJiang Liu <jiang.liu@huawei.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: <sworddragon2@aol.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Jianguo Wu <wujianguo@huawei.com>
      Cc: Joonsoo Kim <js1304@gmail.com>
      Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Tang Chen <tangchen@cn.fujitsu.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wen Congyang <wency@cn.fujitsu.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      c3d5f5f0
    • Jiang Liu's avatar
      mm: accurately calculate zone->managed_pages for highmem zones · 7b4b2a0d
      Jiang Liu authored
      
      
      Commit "mm: introduce new field 'managed_pages' to struct zone" assumes
      that all highmem pages will be freed into the buddy system by function
      mem_init().  But that's not always true, some architectures may reserve
      some highmem pages during boot.  For example PPC may allocate highmem
      pages for giagant HugeTLB pages, and several architectures have code to
      check PageReserved flag to exclude highmem pages allocated during boot
      when freeing highmem pages into the buddy system.
      
      So treat highmem pages in the same way as normal pages, that is to:
      1) reset zone->managed_pages to zero in mem_init().
      2) recalculate managed_pages when freeing pages into the buddy system.
      Signed-off-by: default avatarJiang Liu <jiang.liu@huawei.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Joonsoo Kim <js1304@gmail.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: <sworddragon2@aol.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Jianguo Wu <wujianguo@huawei.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Tang Chen <tangchen@cn.fujitsu.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wen Congyang <wency@cn.fujitsu.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      7b4b2a0d
    • Jiang Liu's avatar
      mm: enhance free_reserved_area() to support poisoning memory with zero · dbe67df4
      Jiang Liu authored
      
      
      Address more review comments from last round of code review.
      1) Enhance free_reserved_area() to support poisoning freed memory with
         pattern '0'. This could be used to get rid of poison_init_mem()
         on ARM64.
      2) A previous patch has disabled memory poison for initmem on s390
         by mistake, so restore to the original behavior.
      3) Remove redundant PAGE_ALIGN() when calling free_reserved_area().
      Signed-off-by: default avatarJiang Liu <jiang.liu@huawei.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: <sworddragon2@aol.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Jianguo Wu <wujianguo@huawei.com>
      Cc: Joonsoo Kim <js1304@gmail.com>
      Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Tang Chen <tangchen@cn.fujitsu.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wen Congyang <wency@cn.fujitsu.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      dbe67df4
    • Jiang Liu's avatar
      mm: change signature of free_reserved_area() to fix building warnings · 11199692
      Jiang Liu authored
      
      
      Change signature of free_reserved_area() according to Russell King's
      suggestion to fix following build warnings:
      
        arch/arm/mm/init.c: In function 'mem_init':
        arch/arm/mm/init.c:603:2: warning: passing argument 1 of 'free_reserved_area' makes integer from pointer without a cast [enabled by default]
          free_reserved_area(__va(PHYS_PFN_OFFSET), swapper_pg_dir, 0, NULL);
          ^
        In file included from include/linux/mman.h:4:0,
                         from arch/arm/mm/init.c:15:
        include/linux/mm.h:1301:22: note: expected 'long unsigned int' but argument is of type 'void *'
         extern unsigned long free_reserved_area(unsigned long start, unsigned long end,
      
         mm/page_alloc.c: In function 'free_reserved_area':
      >> mm/page_alloc.c:5134:3: warning: passing argument 1 of 'virt_to_phys' makes pointer from integer without a cast [enabled by default]
         In file included from arch/mips/include/asm/page.h:49:0,
                          from include/linux/mmzone.h:20,
                          from include/linux/gfp.h:4,
                          from include/linux/mm.h:8,
                          from mm/page_alloc.c:18:
         arch/mips/include/asm/io.h:119:29: note: expected 'const volatile void *' but argument is of type 'long unsigned int'
         mm/page_alloc.c: In function 'free_area_init_nodes':
         mm/page_alloc.c:5030:34: warning: array subscript is below array bounds [-Warray-bounds]
      
      Also address some minor code review comments.
      Signed-off-by: default avatarJiang Liu <jiang.liu@huawei.com>
      Reported-by: default avatarArnd Bergmann <arnd@arndb.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: <sworddragon2@aol.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Jianguo Wu <wujianguo@huawei.com>
      Cc: Joonsoo Kim <js1304@gmail.com>
      Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Tang Chen <tangchen@cn.fujitsu.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wen Congyang <wency@cn.fujitsu.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      11199692
    • Rafael Aquini's avatar
      swap: discard while swapping only if SWAP_FLAG_DISCARD_PAGES · dcf6b7dd
      Rafael Aquini authored
      
      
      Considering the use cases where the swap device supports discard:
      a) and can do it quickly;
      b) but it's slow to do in small granularities (or concurrent with other
         I/O);
      c) but the implementation is so horrendous that you don't even want to
         send one down;
      
      And assuming that the sysadmin considers it useful to send the discards down
      at all, we would (probably) want the following solutions:
      
        i. do the fine-grained discards for freed swap pages, if device is
           capable of doing so optimally;
       ii. do single-time (batched) swap area discards, either at swapon
           or via something like fstrim (not implemented yet);
      iii. allow doing both single-time and fine-grained discards; or
       iv. turn it off completely (default behavior)
      
      As implemented today, one can only enable/disable discards for swap, but
      one cannot select, for instance, solution (ii) on a swap device like (b)
      even though the single-time discard is regarded to be interesting, or
      necessary to the workload because it would imply (1), and the device is
      not capable of performing it optimally.
      
      This patch addresses the scenario depicted above by introducing a way to
      ensure the (probably) wanted solutions (i, ii, iii and iv) can be flexibly
      flagged through swapon(8) to allow a sysadmin to select the best suitable
      swap discard policy accordingly to system constraints.
      
      This patch introduces SWAP_FLAG_DISCARD_PAGES and SWAP_FLAG_DISCARD_ONCE
      new flags to allow more flexibe swap discard policies being flagged
      through swapon(8).  The default behavior is to keep both single-time, or
      batched, area discards (SWAP_FLAG_DISCARD_ONCE) and fine-grained discards
      for page-clusters (SWAP_FLAG_DISCARD_PAGES) enabled, in order to keep
      consistentcy with older kernel behavior, as well as maintain compatibility
      with older swapon(8).  However, through the new introduced flags the best
      suitable discard policy can be selected accordingly to any given swap
      device constraint.
      
      [akpm@linux-foundation.org: tweak comments]
      Signed-off-by: default avatarRafael Aquini <aquini@redhat.com>
      Acked-by: default avatarKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Shaohua Li <shli@kernel.org>
      Cc: Karel Zak <kzak@redhat.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Larry Woodman <lwoodman@redhat.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      dcf6b7dd
    • Tim Chen's avatar
      mm: tune vm_committed_as percpu_counter batching size · 917d9290
      Tim Chen authored
      
      
      Currently the per cpu counter's batch size for memory accounting is
      configured as twice the number of cpus in the system.  However, for
      system with very large memory, it is more appropriate to make it
      proportional to the memory size per cpu in the system.
      
      For example, for a x86_64 system with 64 cpus and 128 GB of memory, the
      batch size is only 2*64 pages (0.5 MB).  So any memory accounting
      changes of more than 0.5MB will overflow the per cpu counter into the
      global counter.  Instead, for the new scheme, the batch size is
      configured to be 0.4% of the memory/cpu = 8MB (128 GB/64 /256), which is
      more inline with the memory size.
      
      I've done a repeated brk test of 800KB (from will-it-scale test suite)
      with 80 concurrent processes on a 4 socket Westmere machine with a total
      of 40 cores.  Without the patch, about 80% of cpu is spent on spin-lock
      contention within the vm_committed_as counter.  With the patch, there's
      a 73x speedup on the benchmark and the lock contention drops off almost
      entirely.
      
      [akpm@linux-foundation.org: fix section mismatch]
      Signed-off-by: default avatarTim Chen <tim.c.chen@linux.intel.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      917d9290
    • Wanpeng Li's avatar
      mm/hugetlb: remove hugetlb_prefault · 5f1e31d2
      Wanpeng Li authored
      
      
      hugetlb_prefault() is not used any more, this patch removes it.
      Signed-off-by: default avatarWanpeng Li <liwanp@linux.vnet.ibm.com>
      Reviewed-by: default avatarMichal Hocko <mhocko@suse.cz>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      5f1e31d2
    • Wanpeng Li's avatar
      mm/pageblock: remove get/set_pageblock_flags · 4c42efa2
      Wanpeng Li authored
      
      
      get_pageblock_flags and set_pageblock_flags are not used any more, this
      patch removes them.
      Signed-off-by: default avatarWanpeng Li <liwanp@linux.vnet.ibm.com>
      Reviewed-by: default avatarMichal Hocko <mhocko@suse.cz>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      4c42efa2
    • Mel Gorman's avatar
      mm: remove lru parameter from __lru_cache_add and lru_cache_add_lru · c53954a0
      Mel Gorman authored
      
      
      Similar to __pagevec_lru_add, this patch removes the LRU parameter from
      __lru_cache_add and lru_cache_add_lru as the caller does not control the
      exact LRU the page gets added to.  lru_cache_add_lru gets renamed to
      lru_cache_add the name is silly without the lru parameter.  With the
      parameter removed, it is required that the caller indicate if they want
      the page added to the active or inactive list by setting or clearing
      PageActive respectively.
      
      [akpm@linux-foundation.org: Suggested the patch]
      [gang.chen@asianux.com: fix used-unintialized warning]
      Signed-off-by: default avatarMel Gorman <mgorman@suse.de>
      Signed-off-by: default avatarChen Gang <gang.chen@asianux.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Rik van Riel <riel@redhat.com>
      Acked-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Cc: Alexey Lyahkov <alexey.lyashkov@gmail.com>
      Cc: Andrew Perepechko <anserper@ya.ru>
      Cc: Robin Dong <sanbai@taobao.com>
      Cc: Theodore Tso <tytso@mit.edu>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Bernd Schubert <bernd.schubert@fastmail.fm>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Rik van Riel <riel@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      c53954a0
    • Mel Gorman's avatar
      mm: remove lru parameter from __pagevec_lru_add and remove parts of pagevec API · a0b8cab3
      Mel Gorman authored
      
      
      Now that the LRU to add a page to is decided at LRU-add time, remove the
      misleading lru parameter from __pagevec_lru_add.  A consequence of this
      is that the pagevec_lru_add_file, pagevec_lru_add_anon and similar
      helpers are misleading as the caller no longer has direct control over
      what LRU the page is added to.  Unused helpers are removed by this patch
      and existing users of pagevec_lru_add_file() are converted to use
      lru_cache_add_file() directly and use the per-cpu pagevecs instead of
      creating their own pagevec.
      Signed-off-by: default avatarMel Gorman <mgorman@suse.de>
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      Reviewed-by: default avatarRik van Riel <riel@redhat.com>
      Acked-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Cc: Alexey Lyahkov <alexey.lyashkov@gmail.com>
      Cc: Andrew Perepechko <anserper@ya.ru>
      Cc: Robin Dong <sanbai@taobao.com>
      Cc: Theodore Tso <tytso@mit.edu>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Bernd Schubert <bernd.schubert@fastmail.fm>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      a0b8cab3
    • HATAYAMA Daisuke's avatar
      vmalloc: introduce remap_vmalloc_range_partial · e69e9d4a
      HATAYAMA Daisuke authored
      
      
      We want to allocate ELF note segment buffer on the 2nd kernel in vmalloc
      space and remap it to user-space in order to reduce the risk that memory
      allocation fails on system with huge number of CPUs and so with huge ELF
      note segment that exceeds 11-order block size.
      
      Although there's already remap_vmalloc_range for the purpose of
      remapping vmalloc memory to user-space, we need to specify user-space
      range via vma.
       Mmap on /proc/vmcore needs to remap range across multiple objects, so
      the interface that requires vma to cover full range is problematic.
      
      This patch introduces remap_vmalloc_range_partial that receives user-space
      range as a pair of base address and size and can be used for mmap on
      /proc/vmcore case.
      
      remap_vmalloc_range is rewritten using remap_vmalloc_range_partial.
      
      [akpm@linux-foundation.org: use PAGE_ALIGNED()]
      Signed-off-by: default avatarHATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
      Cc: Lisa Mitchell <lisa.mitchell@hp.com>
      Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      e69e9d4a
    • Andrew Morton's avatar
      include/linux/mm.h: add PAGE_ALIGNED() helper · 0fa73b86
      Andrew Morton authored
      
      
      To test whether an address is aligned to PAGE_SIZE.
      
      Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>,
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      0fa73b86
    • Cody P Schafer's avatar
      114d4b79
    • Cody P Schafer's avatar
    • Mel Gorman's avatar
      mm: vmscan: take page buffers dirty and locked state into account · b4597226
      Mel Gorman authored
      
      
      Page reclaim keeps track of dirty and under writeback pages and uses it
      to determine if wait_iff_congested() should stall or if kswapd should
      begin writing back pages.  This fails to account for buffer pages that
      can be under writeback but not PageWriteback which is the case for
      filesystems like ext3 ordered mode.  Furthermore, PageDirty buffer pages
      can have all the buffers clean and writepage does no IO so it should not
      be accounted as congested.
      
      This patch adds an address_space operation that filesystems may
      optionally use to check if a page is really dirty or really under
      writeback.  An implementation is provided for for buffer_heads is added
      and used for block operations and ext3 in ordered mode.  By default the
      page flags are obeyed.
      
      Credit goes to Jan Kara for identifying that the page flags alone are
      not sufficient for ext3 and sanity checking a number of ideas on how the
      problem could be addressed.
      Signed-off-by: default avatarMel Gorman <mgorman@suse.de>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
      Cc: Zlatko Calusic <zcalusic@bitsync.net>
      Cc: dormando <dormando@rydia.net>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      b4597226
    • Mel Gorman's avatar
      mm: vmscan: block kswapd if it is encountering pages under writeback · 283aba9f
      Mel Gorman authored
      Historically, kswapd used to congestion_wait() at higher priorities if
      it was not making forward progress.  This made no sense as the failure
      to make progress could be completely independent of IO.  It was later
      replaced by wait_iff_congested() and removed entirely by commit 258401a6
      
      
      (mm: don't wait on congested zones in balance_pgdat()) as it was
      duplicating logic in shrink_inactive_list().
      
      This is problematic.  If kswapd encounters many pages under writeback
      and it continues to scan until it reaches the high watermark then it
      will quickly skip over the pages under writeback and reclaim clean young
      pages or push applications out to swap.
      
      The use of wait_iff_congested() is not suited to kswapd as it will only
      stall if the underlying BDI is really congested or a direct reclaimer
      was unable to write to the underlying BDI.  kswapd bypasses the BDI
      congestion as it sets PF_SWAPWRITE but even if this was taken into
      account then it would cause direct reclaimers to stall on writeback
      which is not desirable.
      
      This patch sets a ZONE_WRITEBACK flag if direct reclaim or kswapd is
      encountering too many pages under writeback.  If this flag is set and
      kswapd encounters a PageReclaim page under writeback then it'll assume
      that the LRU lists are being recycled too quickly before IO can complete
      and block waiting for some IO to complete.
      Signed-off-by: default avatarMel Gorman <mgorman@suse.de>
      Reviewed-by: default avatarMichal Hocko <mhocko@suse.cz>
      Acked-by: default avatarRik van Riel <riel@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
      Tested-by: default avatarZlatko Calusic <zcalusic@bitsync.net>
      Cc: dormando <dormando@rydia.net>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      283aba9f