      Merge tag 'irqdomain-for-linus' of git://git.secretlab.ca/git/linux-2.6
      Pull irqdomain changes from Grant Likely:
       "Round of refactoring and enhancements to irq_domain infrastructure.
        This series starts the process of simplifying irqdomain.  The ultimate
        goal is to merge LEGACY, LINEAR and TREE mappings into a single
        system, but had to back off from that after some last minute bugs.
        Instead it mainly reorganizes the code and ensures that the reverse
        map gets populated when the irq is mapped instead of the first time it
        is looked up.
        Merging of the irq_domain types is deferred to v3.7
        In other news, this series adds helpers for creating static mappings
        on a linear or tree mapping."
      Merge branch 'akpm' (Andrew's patch-bomb)
      Merge Andrew's second set of patches:
       - MM
       - a few random fixes
       - a couple of RTC leftovers
        rtc/rtc-88pm80x: remove unneed devm_kfree
        rtc/rtc-88pm80x: assign ret only when rtc_register_driver fails
        mm: hugetlbfs: close race during teardown of hugetlbfs shared page tables
        tmpfs: distribute interleave better across nodes
        mm: remove redundant initialization
        mm: warn if pg_data_t isn't initialized with zero
        mips: zero out pg_data_t when it's allocated
        memcg: gix memory accounting scalability in shrink_page_list
        mm/sparse: remove index_init_lock
        mm/sparse: more checks on mem_section number
        mm/sparse: optimize sparse_index_alloc
        memcg: add mem_cgroup_from_css() helper
        memcg: further prevent OOM with too many dirty pages
        memcg: prevent OOM with too many dirty pages
        mm: mmu_notifier: fix freed page still mapped in secondary MMU
        mm: memcg: only check anon swapin page charges for swap cache
        mm: memcg: only check swap cache pages for repeated charging
        mm: memcg: split swapin charge function into private and public part
        mm: memcg: remove needless !mm fixup to init_mm when charging
        mm: memcg: remove unneeded shmem charge type
      Merge tag 'vfio-for-v3.6' of git://github.com/awilliam/linux-vfio
      Pull VFIO core from Alex Williamson:
       "This series includes the VFIO userspace driver interface for the 3.6
        kernel merge window.  This driver is intended to provide a secure
        interface for device access using IOMMU protection for applications
        like assignment of physical devices to virtual machines.
        Qemu will be the first user of this interface, enabling assignment of
        PCI devices to Qemu guests.  This interface is intended to eventually
        replace the x86-specific assignment mechanism currently available in
        This interface has the advantage of being more secure, by working with
        IOMMU groups to ensure device isolation and providing its own
        filtered resource access mechanism, and also more flexible, in not
        being x86 or KVM specific (extensions to enable POWER are already
        This driver is originally the work of Tom Lyon, but has since been
        handed over to me and gone through a complete overhaul thanks to the
        input from David Gibson, Ben Herrenschmidt, Chris Wright, Joerg
        Roedel, and others.  This driver has been available in linux-next for
        the last month."
      Paul Mackerras says:
       "I would be glad to see it go in since we want to use it with KVM on
        PowerPC.  If possible we'd like the PowerPC bits for it to go in as
        vfio: Add PCI device driver
        vfio: Type1 IOMMU implementation
        vfio: Add documentation
        vfio: VFIO core
      Merge tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random
      Pull random subsystem patches from Ted Ts'o:
       "This patch series contains a major revamp of how we collect entropy
        from interrupts for /dev/random and /dev/urandom.
        The goal is to address weaknesses discussed in the paper "Mining
        your Ps and Qs: Detection of Widespread Weak Keys in Network Devices",
        by Nadia Heninger, Zakir Durumeric, Eric Wustrow, J.  Alex Halderman,
        which will be published in the Proceedings of the 21st Usenix Security
        Symposium, August 2012.  (See https://factorable.net for more
        information and an extended version of the paper.)"
      Fix up trivial conflicts due to nearby changes in
      drivers/{mfd/ab3100-core.c, usb/gadget/omap_udc.c}
        random: mix in architectural randomness in extract_buf()
        dmi: Feed DMI table to /dev/random driver
        random: Add comment to random_initialize()
        random: final removal of IRQF_SAMPLE_RANDOM
        um: remove IRQF_SAMPLE_RANDOM which is now a no-op
        sparc/ldc: remove IRQF_SAMPLE_RANDOM which is now a no-op
        [ARM] pxa: remove IRQF_SAMPLE_RANDOM which is now a no-op
        board-palmz71: remove IRQF_SAMPLE_RANDOM which is now a no-op
        isp1301_omap: remove IRQF_SAMPLE_RANDOM which is now a no-op
        pxa25x_udc: remove IRQF_SAMPLE_RANDOM which is now a no-op
        omap_udc: remove IRQF_SAMPLE_RANDOM which is now a no-op
        goku_udc: remove IRQF_SAMPLE_RANDOM which was commented out
        uartlite: remove IRQF_SAMPLE_RANDOM which is now a no-op
        drivers: hv: remove IRQF_SAMPLE_RANDOM which is now a no-op
        xen-blkfront: remove IRQF_SAMPLE_RANDOM which is now a no-op
        n2_crypto: remove IRQF_SAMPLE_RANDOM which is now a no-op
        pda_power: remove IRQF_SAMPLE_RANDOM which is now a no-op
        i2c-pmcmsp: remove IRQF_SAMPLE_RANDOM which is now a no-op
        input/serio/hp_sdc.c: remove IRQF_SAMPLE_RANDOM which is now a no-op
        mfd: remove IRQF_SAMPLE_RANDOM which is now a no-op
      Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
      Pull final RDMA changes from Roland Dreier:
       - Fix IPoIB to stop using unsafe linkage between networking neighbour
         layer and private path database.
       - Small fixes for bugs found by Fengguang Wu's automated builds.
      Merge branch 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
      Pull second set of media updates from Mauro Carvalho Chehab:
       - radio API: add support to work with radio frequency bands
       - new AM/FM radio drivers: radio-shark, radio-shark2
       - new Remote Controller USB driver: iguanair
       - conversion of several drivers to the v4l2 core control framework
       - new board additions at existing drivers
       - the remaining (and vast majority of the patches) are due to
         drivers/DocBook fixes/cleanups.
        [media] radio-tea5777: use library for 64bits div
        [media] tlg2300: Declare MODULE_FIRMWARE usage
        [media] lgs8gxx: Declare MODULE_FIRMWARE usage
        [media] xc5000: Add MODULE_FIRMWARE statements
        [media] s2255drv: Add MODULE_FIRMWARE statement
        [media] dib8000: move dereference after check for NULL
        [media] Documentation: Update cardlists
        [media] bttv: add support for Aposonic W-DVR
        [media] cx25821: Remove bad strcpy to read-only char*
        [media] pms.c: remove duplicated include
        [media] smiapp-core.c: remove duplicated include
        [media] via-camera: pass correct format settings to sensor
        [media] rtl2832.c: minor cleanup
        [media] Add support for the IguanaWorks USB IR Transceiver
        [media] Minor cleanups for MCE USB
        [media] drivers/media/dvb/siano/smscoreapi.c: use list_for_each_entry
        [media] Use a named union in struct v4l2_ioctl_info
        [media] mceusb: Add Twisted Melon USB IDs
        [media] staging/media/solo6x10: use module_pci_driver macro
        [media] staging/media/dt3155v4l: use module_pci_driver macro
      Merge tag 'nfs-for-3.6-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
      Pull second wave of NFS client updates from Trond Myklebust:
       - Patches from Bryan to allow splitting of the NFSv2/v3/v4 code into
         separate modules.
       - Fix Oopses in the NFSv4 idmapper
       - Fix a deadlock whereby rpciod tries to allocate a new socket and ends
         up recursing into the NFS code due to memory reclaim.
       - Increase the number of permitted callback connections.
        nfs: explicitly reject LOCK_MAND flock() requests
        nfs: increase number of permitted callback connections.
        SUNRPC: return negative value in case rpcbind client creation error
        NFS: Convert v4 into a module
        NFS: Convert v3 into a module
        NFS: Convert v2 into a module
        NFS: Keep module parameters in the generic NFS client
        NFS: Split out remaining NFS v4 inode functions
        NFS: Pass super operations and xattr handlers in the nfs_subversion
        NFS: Only initialize the ACL client in the v3 case
        NFS: Create a try_mount rpc op
        NFS: Remove the NFS v4 xdev mount function
        NFS: Add version registering framework
        NFS: Fix a number of bugs in the idmapper
        nfs: skip commit in releasepage if we're freeing memory for fs-related reasons
        sunrpc: clarify comments on rpc_make_runnable
        pnfsblock: bail out partial page IO
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
      Pull networking update from David S. Miller:
       "I think Eric Dumazet and I have dealt with all of the known routing
        cache removal fallout.  Some other minor fixes all around.
        1) Fix RCU of cached routes, particularly of output routes which require
           liberation via call_rcu() instead of call_rcu_bh().  From Eric
        2) Make sure we purge net device references in cached routes properly.
        3) TG3 driver bug fixes from Michael Chan.
        4) Fix reported 'expires' value in ipv6 routes, from Li Wei.
        5) TUN driver ioctl leaks kernel bytes to userspace, from Mathias
        ipv4: Properly purge netdev references on uncached routes.
        ipv4: Cache routes in nexthop exception entries.
        ipv4: percpu nh_rth_output cache
        ipv4: Restore old dst_free() behavior.
        bridge: make port attributes const
        ipv4: remove rt_cache_rebuild_count
        net: ipv4: fix RCU races on dst refcounts
        net: TCP early demux cleanup
        tun: Fix formatting.
        net/tun: fix ioctl() based info leaks
        tg3: Update version to 3.124
        tg3: Fix race condition in tg3_get_stats64()
        tg3: Add New 5719 Read DMA workaround
        tg3: Fix Read DMA workaround for 5719 A0.
        tg3: Request APE_LOCK_PHY before PHY access
        ipv6: fix incorrect route 'expires' value passed to userspace
        mISDN: Bugfix only few bytes are transfered on a connection
        seeq: use PTR_RET at init_module of driver
        bnx2x: remove cast around the kmalloc in bnx2x_prev_mark_path
        ipv4: clean up put_child
      rtc/rtc-88pm80x: remove unneed devm_kfree
      devm_kzalloc() doesn't need a matching devm_kfree(), the freeing mechanism
      will trigger when driver unloads.
      Signed-off-by: Devendra Naga <devendra.aaru@gmail.com>
      Cc: Alessandro Zummo <a.zummo@towertech.it>
      Cc: Ashish Jangam <ashish.jangam@kpitcummins.com>
      Cc: David Dajun Chen <dchen@diasemi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
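      The point of the rtc-88pm80x devm_kfree change above can be sketched
      with a small userspace model of device-managed allocation.  This is not
      the kernel's devres implementation: model_device, model_devm_kzalloc()
      and model_device_release() are hypothetical names used only to show why
      no explicit free is needed in the driver's paths.

```c
#include <stdlib.h>

/* Hypothetical bookkeeping node placed in front of each managed allocation. */
struct model_devres {
	struct model_devres *next;
};

/* Hypothetical device: owns a list of managed allocations. */
struct model_device {
	struct model_devres *resources;
	int live;			/* number of allocations still owned */
};

/* Allocate zeroed memory whose lifetime is tied to the device. */
static void *model_devm_kzalloc(struct model_device *dev, size_t size)
{
	struct model_devres *node = calloc(1, sizeof(*node) + size);

	if (!node)
		return NULL;
	node->next = dev->resources;
	dev->resources = node;
	dev->live++;
	return node + 1;		/* payload follows the bookkeeping node */
}

/* Called once when the driver detaches: frees everything the device owns. */
static void model_device_release(struct model_device *dev)
{
	while (dev->resources) {
		struct model_devres *node = dev->resources;

		dev->resources = node->next;
		free(node);
		dev->live--;
	}
}
```

      In this model a probe routine only calls model_devm_kzalloc(); the
      matching free happens in model_device_release(), so a separate free on
      the error or remove path would be redundant.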
      rtc/rtc-88pm80x: assign ret only when rtc_register_driver fails
      At the probe we are assigning ret to the return value of PTR_ERR right
      after rtc_register_driver(), when we could have done it inside the if
      (IS_ERR(ptr)) check, since the function fails and goes inside that case.
      Signed-off-by: Devendra Naga <devendra.aaru@gmail.com>
      Cc: Alessandro Zummo <a.zummo@towertech.it>
      Cc: Ashish Jangam <ashish.jangam@kpitcummins.com>
      Cc: David Dajun Chen <dchen@diasemi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
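      The pattern described in the rtc-88pm80x "assign ret" commit above,
      deriving the error code from the pointer only inside the IS_ERR() branch,
      can be sketched in userspace.  The ERR_PTR/PTR_ERR/IS_ERR helpers below
      are simplified stand-ins for the kernel's <linux/err.h>, and fake_rtc,
      fake_rtc_register() and fake_probe() are hypothetical names, not the
      driver's code.

```c
#include <errno.h>
#include <stdint.h>
#include <stddef.h>

/* Simplified userspace stand-ins for the kernel's <linux/err.h> helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	/* Error pointers occupy the top MAX_ERRNO values of the address space. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct fake_rtc {
	int id;
};

/* Hypothetical registration helper: returns an error pointer on request. */
static struct fake_rtc *fake_rtc_register(int should_fail)
{
	static struct fake_rtc rtc = { .id = 1 };

	return should_fail ? ERR_PTR(-ENODEV) : &rtc;
}

/* Probe-style function: the error code is extracted only on failure. */
static int fake_probe(int should_fail)
{
	struct fake_rtc *rtc = fake_rtc_register(should_fail);

	if (IS_ERR(rtc))
		return (int)PTR_ERR(rtc);
	return 0;
}
```

      Here fake_probe(0) succeeds and fake_probe(1) returns -ENODEV; PTR_ERR()
      is never applied to a valid pointer.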
      mm: hugetlbfs: close race during teardown of hugetlbfs shared page tables
      If a process creates a large hugetlbfs mapping that is eligible for page
      table sharing and forks heavily with children some of whom fault and
      others which destroy the mapping then it is possible for page tables to
      get corrupted.  Some teardowns of the mapping encounter a "bad pmd" and
      output a message to the kernel log.  The final teardown will trigger a
      BUG_ON in mm/filemap.c.
      This was reproduced in 3.4 but is known to have existed for a long time
      and goes back at least as far as 2.6.37.  It was probably introduced
      in 2.6.20 by [39dde65c: shared page table for hugetlb page].  The messages
      look like this:
      [  ..........] Lots of bad pmd messages followed by this
      [  127.164256] mm/memory.c:391: bad pmd ffff880412e04fe8(80000003de4000e7).
      [  127.164257] mm/memory.c:391: bad pmd ffff880412e04ff0(80000003de6000e7).
      [  127.164258] mm/memory.c:391: bad pmd ffff880412e04ff8(80000003de0000e7).
      [  127.186778] ------------[ cut here ]------------
      [  127.186781] kernel BUG at mm/filemap.c:134!
      [  127.186782] invalid opcode: 0000 [#1] SMP
      [  127.186783] CPU 7
      [  127.186784] Modules linked in: af_packet cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq mperf ext3 jbd dm_mod coretemp crc32c_intel usb_storage ghash_clmulni_intel aesni_intel i2c_i801 r8169 mii uas sr_mod cdrom sg iTCO_wdt iTCO_vendor_support shpchp serio_raw cryptd aes_x86_64 e1000e pci_hotplug dcdbas aes_generic container microcode ext4 mbcache jbd2 crc16 sd_mod crc_t10dif i915 drm_kms_helper drm i2c_algo_bit ehci_hcd ahci libahci usbcore rtc_cmos usb_common button i2c_core intel_agp video intel_gtt fan processor thermal thermal_sys hwmon ata_generic pata_atiixp libata scsi_mod
      [  127.186801]
      [  127.186802] Pid: 9017, comm: hugetlbfs-test Not tainted 3.4.0-autobuild #53 Dell Inc. OptiPlex 990/06D7TR
      [  127.186804] RIP: 0010:[<ffffffff810ed6ce>]  [<ffffffff810ed6ce>] __delete_from_page_cache+0x15e/0x160
      [  127.186809] RSP: 0000:ffff8804144b5c08  EFLAGS: 00010002
      [  127.186810] RAX: 0000000000000001 RBX: ffffea000a5c9000 RCX: 00000000ffffffc0
      [  127.186811] RDX: 0000000000000000 RSI: 0000000000000009 RDI: ffff88042dfdad00
      [  127.186812] RBP: ffff8804144b5c18 R08: 0000000000000009 R09: 0000000000000003
      [  127.186813] R10: 0000000000000000 R11: 000000000000002d R12: ffff880412ff83d8
      [  127.186814] R13: ffff880412ff83d8 R14: 0000000000000000 R15: ffff880412ff83d8
      [  127.186815] FS:  00007fe18ed2c700(0000) GS:ffff88042dce0000(0000) knlGS:0000000000000000
      [  127.186816] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [  127.186817] CR2: 00007fe340000503 CR3: 0000000417a14000 CR4: 00000000000407e0
      [  127.186818] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  127.186819] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      [  127.186820] Process hugetlbfs-test (pid: 9017, threadinfo ffff8804144b4000, task ffff880417f803c0)
      [  127.186821] Stack:
      [  127.186822]  ffffea000a5c9000 0000000000000000 ffff8804144b5c48 ffffffff810ed83b
      [  127.186824]  ffff8804144b5c48 000000000000138a 0000000000001387 ffff8804144b5c98
      [  127.186825]  ffff8804144b5d48 ffffffff811bc925 ffff8804144b5cb8 0000000000000000
      [  127.186827] Call Trace:
      [  127.186829]  [<ffffffff810ed83b>] delete_from_page_cache+0x3b/0x80
      [  127.186832]  [<ffffffff811bc925>] truncate_hugepages+0x115/0x220
      [  127.186834]  [<ffffffff811bca43>] hugetlbfs_evict_inode+0x13/0x30
      [  127.186837]  [<ffffffff811655c7>] evict+0xa7/0x1b0
      [  127.186839]  [<ffffffff811657a3>] iput_final+0xd3/0x1f0
      [  127.186840]  [<ffffffff811658f9>] iput+0x39/0x50
      [  127.186842]  [<ffffffff81162708>] d_kill+0xf8/0x130
      [  127.186843]  [<ffffffff81162812>] dput+0xd2/0x1a0
      [  127.186845]  [<ffffffff8114e2d0>] __fput+0x170/0x230
      [  127.186848]  [<ffffffff81236e0e>] ? rb_erase+0xce/0x150
      [  127.186849]  [<ffffffff8114e3ad>] fput+0x1d/0x30
      [  127.186851]  [<ffffffff81117db7>] remove_vma+0x37/0x80
      [  127.186853]  [<ffffffff81119182>] do_munmap+0x2d2/0x360
      [  127.186855]  [<ffffffff811cc639>] sys_shmdt+0xc9/0x170
      [  127.186857]  [<ffffffff81410a39>] system_call_fastpath+0x16/0x1b
      [  127.186858] Code: 0f 1f 44 00 00 48 8b 43 08 48 8b 00 48 8b 40 28 8b b0 40 03 00 00 85 f6 0f 88 df fe ff ff 48 89 df e8 e7 cb 05 00 e9 d2 fe ff ff <0f> 0b 55 83 e2 fd 48 89 e5 48 83 ec 30 48 89 5d d8 4c 89 65 e0
      [  127.186868] RIP  [<ffffffff810ed6ce>] __delete_from_page_cache+0x15e/0x160
      [  127.186870]  RSP <ffff8804144b5c08>
      [  127.186871] ---[ end trace 7cbac5d1db69f426 ]---
      The bug is a race and not always easy to reproduce.  To reproduce it I was
      doing the following on a single socket I7-based machine with 16G of RAM.
      $ hugeadm --pool-pages-max DEFAULT:13G
      $ echo $((18*1048576*1024)) > /proc/sys/kernel/shmmax
      $ echo $((18*1048576*1024)) > /proc/sys/kernel/shmall
      $ for i in `seq 1 9000`; do ./hugetlbfs-test; done
      On my particular machine, it usually triggers within 10 minutes but
      enabling debug options can change the timing such that it never hits.
      Once the bug is triggered, the machine is in trouble and needs to be
      rebooted.  The machine will respond but processes accessing proc like "ps
      aux" will hang due to the BUG_ON.  shutdown will also hang and needs a
      hard reset or a sysrq-b.
      The basic problem is a race between page table sharing and teardown.  For
      the most part page table sharing depends on i_mmap_mutex.  In some cases,
      it is also taking the mm->page_table_lock for the PTE updates but with
      shared page tables, it is the i_mmap_mutex that is more important.
      Unfortunately it also appears to be insufficient.  Consider the following
      sequence:
      Process A					Process B
      ---------					---------
      hugetlb_fault					shmdt
      							    huge_pmd_unshare/unmap tables <--- (1)
        huge_pte_alloc				      ...
          Lock(i_mmap_mutex)				      ...
          vma_prio_walk, find svma, spte		      ...
          Lock(mm->page_table_lock)			      ...
          share spte					      ...
          Unlock(mm->page_table_lock)			      ...
          Unlock(i_mmap_mutex)			      ...
        hugetlb_no_page									  <--- (2)
      In this scenario, it is possible for Process A to share page tables with
      Process B that is trying to tear them down.  The i_mmap_mutex on its own
      does not prevent Process A walking Process B's page tables.  At (1) above,
      the page tables are not shared yet so it unmaps the PMDs.  Process A sets
      up page table sharing and at (2) faults a new entry.  Process B then trips
      up on it in free_pgtables.
      This patch fixes the problem by adding a new function
      __unmap_hugepage_range_final that is only called when the VMA is about to
      be destroyed.  This function clears VM_MAYSHARE during
      unmap_hugepage_range() under the i_mmap_mutex.  This makes the VMA
      ineligible for sharing and avoids the race.  Superficially this looks like
      it would then be vulnerable to truncate and madvise issues but hugetlbfs
      has its own truncate handlers so does not use unmap_mapping_range() and
      does not support madvise(DONTNEED).
      This should be treated as a -stable candidate if it is merged.
      Test program is as follows. The test case was mostly written by Michal
      Hocko with a few minor changes to reproduce this bug.
      ==== CUT HERE ====
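      #include <stdio.h>
      #include <stdlib.h>
      #include <unistd.h>
      #include <sys/types.h>
      #include <sys/ipc.h>
      #include <sys/shm.h>
      #include <sys/time.h>
      #include <sys/wait.h>

      static size_t huge_page_size = (2UL << 20);
      static size_t nr_huge_page_A = 512;
      static size_t nr_huge_page_B = 5632;

      /* Return a pseudo-random value in [0, max), reseeded on every call. */
      unsigned int get_random(unsigned int max)
      {
      	struct timeval tv;

      	gettimeofday(&tv, NULL);
      	srandom(tv.tv_usec);
      	return random() % max;
      }

      /* Touch one byte per 4K from a random offset in the first half to the end. */
      static void play(void *addr, size_t size)
      {
      	unsigned char *start = addr,
      		      *end = start + size,
      		      *a;
      	start += get_random(size/2);

      	/* we could iterate on huge pages but let's give it more time. */
      	for (a = start; a < end; a += 4096)
      		*a = 0;
      }

      int main(int argc, char **argv)
      {
      	key_t key = IPC_PRIVATE;
      	size_t sizeA = nr_huge_page_A * huge_page_size;
      	size_t sizeB = nr_huge_page_B * huge_page_size;
      	int shmidA, shmidB;
      	void *addrA = NULL, *addrB = NULL;
      	int nr_children = 300, n = 0;

      	if ((shmidA = shmget(key, sizeA, IPC_CREAT|SHM_HUGETLB|0660)) == -1) {
      		perror("shmget:");
      		return 1;
      	}
      	if ((addrA = shmat(shmidA, addrA, SHM_R|SHM_W)) == (void *)-1UL) {
      		perror("shmat");
      		return 1;
      	}
      	if ((shmidB = shmget(key, sizeB, IPC_CREAT|SHM_HUGETLB|0660)) == -1) {
      		perror("shmget:");
      		return 1;
      	}
      	if ((addrB = shmat(shmidB, addrB, SHM_R|SHM_W)) == (void *)-1UL) {
      		perror("shmat");
      		return 1;
      	}

      fork_child:
      	switch(fork()) {
      		case 0:
      			/* Children fault segment A, fault segment B, or just exit. */
      			switch (n%3) {
      			case 0:
      				play(addrA, sizeA);
      				break;
      			case 1:
      				play(addrB, sizeB);
      				break;
      			case 2:
      				break;
      			}
      			break;
      		case -1:
      			perror("fork:");
      			break;
      		default:
      			if (++n < nr_children)
      				goto fork_child;
      			play(addrA, sizeA);
      			break;
      	}
      	/* Every process detaches both segments; this races with the faults. */
      	shmdt(addrA);
      	shmdt(addrB);
      	do {
      		wait(NULL);
      	} while (--n > 0);
      	shmctl(shmidA, IPC_RMID, NULL);
      	shmctl(shmidB, IPC_RMID, NULL);
      	return 0;
      }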
      [akpm@linux-foundation.org: name the declaration's args, fix CONFIG_HUGETLBFS=n build]
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Reviewed-by: Michal Hocko <mhocko@suse.cz>
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
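      The idea behind the hugetlbfs fix above, clearing the sharing
      eligibility flag under the same lock that the fault path holds while it
      looks for a VMA to share with, can be modelled in userspace.  This is a
      sketch only: model_vma, MODEL_VM_MAYSHARE, model_try_share() and
      model_unmap_final() are hypothetical stand-ins, and a pthread mutex
      plays the role of i_mmap_mutex.

```c
#include <pthread.h>
#include <stdbool.h>

#define MODEL_VM_MAYSHARE 0x1u	/* stand-in for the kernel's VM_MAYSHARE */

struct model_vma {
	unsigned int flags;
	bool tables_shared;	/* true once another process shares the tables */
};

/* Stand-in for i_mmap_mutex: serialises sharing decisions and teardown. */
static pthread_mutex_t model_i_mmap_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Fault path: share page tables with svma only if it is still eligible. */
static bool model_try_share(struct model_vma *svma)
{
	bool shared = false;

	pthread_mutex_lock(&model_i_mmap_mutex);
	if (svma->flags & MODEL_VM_MAYSHARE) {
		svma->tables_shared = true;
		shared = true;
	}
	pthread_mutex_unlock(&model_i_mmap_mutex);
	return shared;
}

/* Final teardown: unmap and make the VMA ineligible under the same lock. */
static void model_unmap_final(struct model_vma *vma)
{
	pthread_mutex_lock(&model_i_mmap_mutex);
	vma->tables_shared = false;		/* page tables are unmapped here */
	vma->flags &= ~MODEL_VM_MAYSHARE;	/* no new sharers from now on */
	pthread_mutex_unlock(&model_i_mmap_mutex);
}
```

      Because the flag is cleared while the lock is held, a fault that takes
      the lock after the final unmap can no longer pick this VMA as a sharing
      candidate, which is the window marked (1) to (2) in the diagram.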
      tmpfs: distribute interleave better across nodes
      When tmpfs has the interleave memory policy, it always starts allocating
      for each file from node 0 at offset 0.  When there are many small files,
      the lower nodes fill up disproportionately.
      This patch spreads out node usage by starting files at nodes other than 0,
      by using the inode number to bias the starting node for interleave.
      Signed-off-by: Nathan Zimmer <nzimmer@sgi.com>
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Nick Piggin <npiggin@gmail.com>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
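      The effect of the tmpfs interleave change above can be illustrated with
      a small model.  It assumes the bias is applied by adding the inode
      number to the page index before taking the modulus over the node count;
      the commit message does not spell out the kernel's exact arithmetic, and
      the function names here are hypothetical.

```c
#include <stddef.h>

/* Old behaviour: every file starts interleaving from node 0. */
static unsigned int node_unbiased(unsigned long index, unsigned int nr_nodes)
{
	return index % nr_nodes;
}

/* Assumed new behaviour: the inode number shifts the starting node. */
static unsigned int node_biased(unsigned long ino, unsigned long index,
				unsigned int nr_nodes)
{
	return (ino + index) % nr_nodes;
}

/* Count where the first page of nr_files single-page files is placed. */
static void fill_single_page_files(unsigned int nr_files,
				   unsigned int nr_nodes, int biased,
				   unsigned int *per_node)
{
	unsigned int n, ino;

	for (n = 0; n < nr_nodes; n++)
		per_node[n] = 0;

	for (ino = 0; ino < nr_files; ino++) {
		unsigned int node = biased ?
			node_biased(ino, 0, nr_nodes) :
			node_unbiased(0, nr_nodes);

		per_node[node]++;
	}
}
```

      With 8 single-page files on 4 nodes, the unbiased variant puts all 8
      pages on node 0, while the biased variant puts 2 pages on each node.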
      mm: remove redundant initialization
      pg_data_t is zeroed before reaching free_area_init_core(), so remove the
      now unnecessary initializations.
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>