  1. 03 Jun, 2016 10 commits
    • Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · d29e4723
      Linus Torvalds authored
      Pull arm64 fixes from Will Deacon:
       "The main thing here is reviving hugetlb support using contiguous ptes,
        which we ended up reverting at the last minute in 4.5 pending a fix
        which went into the core mm/ code during the recent merge window.
      
         - Revert a previous revert and get hugetlb going with contiguous hints
         - Wire up missing compat syscalls
         - Enable CONFIG_SET_MODULE_RONX by default
         - Add missing line to our compat /proc/cpuinfo output
         - Clarify levels in our page table dumps
         - Fix booting with RANDOMIZE_TEXT_OFFSET enabled
         - Misc fixes to the ARM CPU PMU driver (refcounting, probe failure)
         - Remove some dead code and update a comment"
      
      * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
        arm64: fix alignment when RANDOMIZE_TEXT_OFFSET is enabled
        arm64: move {PAGE,CONT}_SHIFT into Kconfig
        arm64: mm: dump: log span level
        arm64: update stale PAGE_OFFSET comment
        drivers/perf: arm_pmu: Avoid leaking pmu->irq_affinity on error
        drivers/perf: arm_pmu: Defer the setting of __oprofile_cpu_pmu
        drivers/perf: arm_pmu: Fix reference count of a device_node in of_pmu_irq_cfg
        arm64: report CPU number in bad_mode
        arm64: unistd32.h: wire up missing syscalls for compat tasks
        arm64: Provide "model name" in /proc/cpuinfo for PER_LINUX32 tasks
        arm64: enable CONFIG_SET_MODULE_RONX by default
        arm64: Remove orphaned __addr_ok() definition
        Revert "arm64: hugetlb: partial revert of 66b3923a"
    • Merge tag 'powerpc-4.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · 5306d766
      Linus Torvalds authored
      Pull powerpc fixes from Michael Ellerman:
       - Handle RTAS delay requests in configure_bridge from Russell Currey
       - Refactor the configure_bridge RTAS tokens from Russell Currey
       - Fix definition of SIAR and SDAR registers from Thomas Huth
       - Use privileged SPR number for MMCR2 from Thomas Huth
       - Update LPCR only if it is powernv from Aneesh Kumar K.V
       - Fix the reference bit update when handling hash fault from Aneesh
         Kumar K.V
       - Add missing tlb flush from Aneesh Kumar K.V
       - Add POWER8NVL support to ibm,client-architecture-support call from
         Thomas Huth
      
      * tag 'powerpc-4.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
        powerpc/pseries: Add POWER8NVL support to ibm,client-architecture-support call
        powerpc/mm/radix: Add missing tlb flush
        powerpc/mm/hash: Fix the reference bit update when handling hash fault
        powerpc/mm/radix: Update LPCR only if it is powernv
        powerpc: Use privileged SPR number for MMCR2
        powerpc: Fix definition of SIAR and SDAR registers
        powerpc/pseries/eeh: Refactor the configure_bridge RTAS tokens
        powerpc/pseries/eeh: Handle RTAS delay requests in configure_bridge
    • arm64: fix alignment when RANDOMIZE_TEXT_OFFSET is enabled · aed7eb83
      Mark Rutland authored
      With ARM64_64K_PAGES and RANDOMIZE_TEXT_OFFSET enabled, we hit the
      following issue on boot:
      
      kernel BUG at arch/arm64/mm/mmu.c:480!
      Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
      Modules linked in:
      CPU: 0 PID: 0 Comm: swapper Not tainted 4.6.0 #310
      Hardware name: ARM Juno development board (r2) (DT)
      task: ffff000008d58a80 ti: ffff000008d30000 task.ti: ffff000008d30000
      PC is at map_kernel_segment+0x44/0xb0
      LR is at paging_init+0x84/0x5b0
      pc : [<ffff000008c450b4>] lr : [<ffff000008c451a4>] pstate: 600002c5
      
      Call trace:
      [<ffff000008c450b4>] map_kernel_segment+0x44/0xb0
      [<ffff000008c451a4>] paging_init+0x84/0x5b0
      [<ffff000008c42728>] setup_arch+0x198/0x534
      [<ffff000008c40848>] start_kernel+0x70/0x388
      [<ffff000008c401bc>] __primary_switched+0x30/0x74
      
      Commit 7eb90f2f ("arm64: cover the .head.text section in the .text
      segment mapping") removed the alignment between the .head.text and .text
      sections, and used the _text rather than the _stext interval for mapping
      the .text segment.
      
      Prior to this commit _stext was always section aligned and didn't cause
      any issue even when RANDOMIZE_TEXT_OFFSET was enabled. Since that
      alignment has been removed and _text is used to map the .text segment,
      we need to ensure _text is always page aligned when RANDOMIZE_TEXT_OFFSET
      is enabled.
      
      This patch adds logic to TEXT_OFFSET fuzzing to ensure that the offset
      is always aligned to the kernel page size. To ensure this, we rely on
      the PAGE_SHIFT being available via Kconfig.
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Reported-by: Sudeep Holla <sudeep.holla@arm.com>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Fixes: 7eb90f2f ("arm64: cover the .head.text section in the .text segment mapping")
      Signed-off-by: Will Deacon <will.deacon@arm.com>
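The alignment requirement described in this commit can be illustrated with a minimal sketch. It is not the kernel's build-time logic: `fuzz_text_offset` is a hypothetical helper, `page_shift` stands in for the value exported through Kconfig, and the 2 MiB window is an assumption made for the example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: turn a raw random value into a page-aligned
 * offset inside an assumed 2 MiB window. page_shift stands in for the
 * page shift made available via Kconfig (12 for 4K, 16 for 64K pages).
 */
static uint64_t fuzz_text_offset(uint64_t raw, unsigned page_shift)
{
    const uint64_t window = UINT64_C(2) << 20;   /* assumed 2 MiB range */

    assert(page_shift >= 12 && page_shift <= 16);

    /* Pick a page index first, then scale it back up to bytes. */
    uint64_t pages = window >> page_shift;
    return (raw % pages) << page_shift;
}
```

Choosing a page index and shifting it, rather than masking a byte offset afterwards, keeps the result a multiple of the page size for any supported page shift.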
    • arm64: move {PAGE,CONT}_SHIFT into Kconfig · 030c4d24
      Mark Rutland authored
      In some cases (e.g. the awk for CONFIG_RANDOMIZE_TEXT_OFFSET) we would
      like to make use of PAGE_SHIFT outside of code that can include the
      usual header files.
      
      Add a new CONFIG_ARM64_PAGE_SHIFT for this, likewise with
      ARM64_CONT_SHIFT for consistency.
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Sudeep Holla <sudeep.holla@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
    • arm64: mm: dump: log span level · 48dd73c5
      Mark Rutland authored
      The page table dump code logs spans of entries at the same level
      (pgd/pud/pmd/pte) which have the same attributes. While we log the
      (decoded) attributes, we don't log the level, which leaves the output
      ambiguous and/or confusing in some cases.
      
      For example:
      
      0xffff800800000000-0xffff800980000000           6G       RW NX SHD AF        BLK UXN MEM/NORMAL
      
      If using 4K pages, this may describe a span of 6 1G block entries at the
      PGD/PUD level, or 3072 2M block entries at the PMD level.
      
      This patch adds the page table level to each output line, removing this
      ambiguity. For the example above, this will produce:
      
      0xffffffc800000000-0xffffffc980000000           6G PUD       RW NX SHD AF        BLK UXN MEM/NORMAL
      
      When 3 level tables are in use, and we use the asm-generic/nopud.h
      definitions, the dump code treats each entry in the PGD as a 1 element
      table at the PUD level, and logs spans as being PUDs, which can be
      confusing. To counteract this, the "PUD" mnemonic is replaced with "PGD"
      when CONFIG_PGTABLE_LEVELS <= 3. Likewise for "PMD" when
      CONFIG_PGTABLE_LEVELS <= 2.
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Huang Shijie <shijie.huang@arm.com>
      Cc: Laura Abbott <labbott@fedoraproject.org>
      Cc: Steve Capper <steve.capper@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
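The ambiguity and the naming rule described in this commit can be sketched as follows. The helper names are hypothetical and this is not the dump code itself; `pgtable_levels` stands in for CONFIG_PGTABLE_LEVELS.

```c
#include <assert.h>
#include <stdint.h>

/* Number of block entries of size (1 << block_shift) covering a span. */
static uint64_t entries_in_span(uint64_t start, uint64_t end,
                                unsigned block_shift)
{
    assert(end >= start);
    return (end - start) >> block_shift;
}

/* Hypothetical: mnemonic printed for a span found at the PUD level. */
static const char *pud_level_name(int pgtable_levels)
{
    return pgtable_levels > 3 ? "PUD" : "PGD";
}

/* Hypothetical: mnemonic printed for a span found at the PMD level. */
static const char *pmd_level_name(int pgtable_levels)
{
    return pgtable_levels > 2 ? "PMD" : "PGD";
}
```

For the 6G span quoted above, `entries_in_span` gives 6 with a 1G block (shift 30) and 3072 with a 2M block (shift 21), which is why the level has to be printed to tell the two cases apart.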
    • arm64: update stale PAGE_OFFSET comment · a13e3a5b
      Mark Rutland authored
      Commit ab893fb9 ("arm64: introduce KIMAGE_VADDR as the virtual
      base of the kernel region") logically split KIMAGE_VADDR from
      PAGE_OFFSET, and since commit f9040773 ("arm64: move kernel
      image to base of vmalloc area") the two have been distinct values.
      
      Unfortunately, neither commit updated the comment above these
      definitions, which now erroneously states that PAGE_OFFSET is the start
      of the kernel image rather than the start of the linear mapping.
      
      This patch fixes said comment, and introduces an explanation of
      KIMAGE_VADDR.
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
    • drivers/perf: arm_pmu: Avoid leaking pmu->irq_affinity on error · 5988a363
      Julien Grall authored
      pmu->irq_affinity is not freed if an error occurs within
      arm_pmu_device_probe after of_pmu_irq_cfg has been called.
      
      Note that if of_pmu_irq_cfg returns an error, pmu->irq_affinity
      will not have been set, but it will be NULL because pmu was
      kzalloc'd. The resulting kfree(NULL) is therefore benign.
      Signed-off-by: Julien Grall <julien.grall@arm.com>
      Acked-by: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
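A userspace analogue of this cleanup pattern, as a sketch only: `struct pmu`, `pmu_probe` and the `fail_step` parameter are hypothetical stand-ins, with calloc/free playing the role of kzalloc/kfree.

```c
#include <assert.h>
#include <stdlib.h>

struct pmu {
    int *irq_affinity;
};

/*
 * Hypothetical probe routine. fail_step selects which stage fails:
 * 0 = none, 1 = before irq_affinity is allocated, 2 = after.
 */
static int pmu_probe(int fail_step, struct pmu **out)
{
    assert(out != NULL);

    struct pmu *pmu = calloc(1, sizeof(*pmu));   /* zeroed, like kzalloc */
    if (!pmu)
        return -1;

    if (fail_step == 1)
        goto out_free;          /* irq_affinity is still NULL here */

    pmu->irq_affinity = calloc(4, sizeof(int));
    if (!pmu->irq_affinity || fail_step == 2)
        goto out_free;

    *out = pmu;
    return 0;

out_free:
    free(pmu->irq_affinity);    /* free(NULL) is a no-op */
    free(pmu);
    return -1;
}
```

Because the structure starts out zeroed, a single error label can free the member unconditionally, whichever stage failed.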
    • drivers/perf: arm_pmu: Defer the setting of __oprofile_cpu_pmu · 0f254c76
      Julien Grall authored
      The global variable __oprofile_cpu_pmu is set before the PMU is fully
      initialized. If an error occurs before the end of the initialization,
      the PMU will be freed and the variable will contain an invalid pointer.
      
      This results in a kernel crash when perf is used.
      
      Fix it by deferring the setting of __oprofile_cpu_pmu until the PMU is
      fully initialized (i.e. when it is no longer possible to fail).
      
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Julien Grall <julien.grall@arm.com>
      Acked-by: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
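The ordering this commit describes, publishing a global pointer only once nothing can fail, can be sketched in plain C. `published_pmu` stands in for __oprofile_cpu_pmu and `pmu_register` is a hypothetical function, not the driver's probe routine.

```c
#include <assert.h>
#include <stdlib.h>

struct pmu {
    int ready;
};

/* Stand-in for the global that other code dereferences later. */
static struct pmu *published_pmu;

/* Hypothetical registration: 'fail' simulates an initialization error. */
static int pmu_register(int fail)
{
    assert(fail == 0 || fail == 1);

    struct pmu *pmu = calloc(1, sizeof(*pmu));
    if (!pmu)
        return -1;

    if (fail) {
        free(pmu);              /* the global never saw this pointer */
        return -1;
    }

    pmu->ready = 1;
    published_pmu = pmu;        /* publish after the last failure point */
    return 0;
}
```

With the assignment placed last, a failed registration leaves the global untouched, so no later user can follow a dangling pointer.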
    • drivers/perf: arm_pmu: Fix reference count of a device_node in of_pmu_irq_cfg · 121323ae
      Julien Grall authored
      The only function called by of_pmu_irq_cfg that will increment the
      reference count on dn is of_parse_phandle.
      
      Each time we successfully parse a possible CPU from an
      interrupt-affinity property, we increment the refcount of that CPU node
      once via of_parse_phandle. After validating the CPU is possible, we
      decrement the refcount once. Subsequently, we decrement the refcount
      again, either as part of an early break if we don't have a matching SPI,
      or as part of the end of the loop body.
      
      This leads to the refcount being decremented twice.
      Remove the second pair of calls to of_node_put, as nobody uses dn
      between the first and second call to of_node_put.
      Signed-off-by: Julien Grall <julien.grall@arm.com>
      Acked-by: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
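The get/put balance at issue can be shown with a toy reference counter. This is a sketch: `struct node`, `node_get`, `node_put` and `check_cpu` are hypothetical stand-ins and do not model the real device tree API.

```c
#include <assert.h>

struct node {
    int refcount;
};

/* Take a reference, as a lookup helper would on success. */
static struct node *node_get(struct node *n)
{
    n->refcount++;
    return n;
}

/* Drop a reference; dropping more than were taken is a bug. */
static void node_put(struct node *n)
{
    assert(n->refcount > 0);
    n->refcount--;
}

/* One loop iteration: exactly one get paired with exactly one put. */
static int check_cpu(struct node *cpu_node, int possible)
{
    struct node *dn = node_get(cpu_node);   /* reference taken by lookup */
    int ok = possible;                      /* validation uses dn here */

    node_put(dn);                           /* the single matching put */
    return ok ? 0 : -1;                     /* no further put on any path */
}
```

Each path through the iteration leaves the count where it started, which is the invariant the double put violated.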
    • arm64: report CPU number in bad_mode · 8051f4d1
      Mark Rutland authored
      If we take an exception we don't expect (e.g. SError), we report this in
      the bad_mode handler with pr_crit. Depending on the configured log
      level, we may or may not log additional information in functions called
      subsequently. Notably, the messages in dump_stack (including the CPU
      number) are printed with KERN_DEFAULT and may not appear.
      
      Some exceptions have an IMPLEMENTATION DEFINED ESR_ELx.ISS encoding, and
      knowing the CPU number is crucial to correctly decode them. To ensure
      that this is always possible, we should log the CPU number along with
      the ESR_ELx value, so we are not reliant on subsequent logs or
      additional printk configuration options.
      
      This patch logs the CPU number in bad_mode such that it is possible for
      a developer to decode these exceptions, provided access to sufficient
      documentation.
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Reported-by: Al Grant <Al.Grant@arm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Dave Martin <dave.martin@arm.com>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
  2. 02 Jun, 2016 11 commits
    • Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 4340fa55
      Linus Torvalds authored
      Pull KVM fixes from Radim Krčmář:
       "ARM:
         - two fixes for 4.6 vgic [Christoffer] (cc stable)
      
         - six fixes for 4.7 vgic [Marc]
      
        x86:
         - six fixes from syzkaller reports [Paolo] (two of them cc stable)
      
         - allow OS X to boot [Dmitry]
      
         - don't trust compilers [Nadav]"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
        KVM: x86: fix OOPS after invalid KVM_SET_DEBUGREGS
        KVM: x86: avoid vmalloc(0) in the KVM_SET_CPUID
        KVM: irqfd: fix NULL pointer dereference in kvm_irq_map_gsi
        KVM: fail KVM_SET_VCPU_EVENTS with invalid exception number
        KVM: x86: avoid vmalloc(0) in the KVM_SET_CPUID
        kvm: x86: avoid warning on repeated KVM_SET_TSS_ADDR
        KVM: Handle MSR_IA32_PERF_CTL
        KVM: x86: avoid write-tearing of TDP
        KVM: arm/arm64: vgic-new: Removel harmful BUG_ON
        arm64: KVM: vgic-v3: Relax synchronization when SRE==1
        arm64: KVM: vgic-v3: Prevent the guest from messing with ICC_SRE_EL1
        arm64: KVM: Make ICC_SRE_EL1 access return the configured SRE value
        KVM: arm/arm64: vgic-v3: Always resample level interrupts
        KVM: arm/arm64: vgic-v2: Always resample level interrupts
        KVM: arm/arm64: vgic-v3: Clear all dirty LRs
        KVM: arm/arm64: vgic-v2: Clear all dirty LRs
    • KVM: x86: fix OOPS after invalid KVM_SET_DEBUGREGS · d14bdb55
      Paolo Bonzini authored
      MOV to DR6 or DR7 causes a #GP if an attempt is made to write a 1 to
      any of bits 63:32.  However, this is not detected at KVM_SET_DEBUGREGS
      time, and the next KVM_RUN oopses:
      
         general protection fault: 0000 [#1] SMP
         CPU: 2 PID: 14987 Comm: a.out Not tainted 4.4.9-300.fc23.x86_64 #1
         Hardware name: LENOVO 2325F51/2325F51, BIOS G2ET32WW (1.12 ) 05/30/2012
         [...]
         Call Trace:
          [<ffffffffa072c93d>] kvm_arch_vcpu_ioctl_run+0x141d/0x14e0 [kvm]
          [<ffffffffa071405d>] kvm_vcpu_ioctl+0x33d/0x620 [kvm]
          [<ffffffff81241648>] do_vfs_ioctl+0x298/0x480
          [<ffffffff812418a9>] SyS_ioctl+0x79/0x90
          [<ffffffff817a0f2e>] entry_SYSCALL_64_fastpath+0x12/0x71
         Code: 55 83 ff 07 48 89 e5 77 27 89 ff ff 24 fd 90 87 80 81 0f 23 fe 5d c3 0f 23 c6 5d c3 0f 23 ce 5d c3 0f 23 d6 5d c3 0f 23 de 5d c3 <0f> 23 f6 5d c3 0f 0b 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00
         RIP  [<ffffffff810639eb>] native_set_debugreg+0x2b/0x40
          RSP <ffff88005836bd50>
      
      Testcase (beautified/reduced from syzkaller output):
      
          #include <unistd.h>
          #include <sys/syscall.h>
          #include <string.h>
          #include <stdint.h>
          #include <linux/kvm.h>
          #include <fcntl.h>
          #include <sys/ioctl.h>
      
          long r[8];
      
          int main()
          {
              struct kvm_debugregs dr = { 0 };
      
              r[2] = open("/dev/kvm", O_RDONLY);
              r[3] = ioctl(r[2], KVM_CREATE_VM, 0);
              r[4] = ioctl(r[3], KVM_CREATE_VCPU, 7);
      
              memcpy(&dr,
                     "\x5d\x6a\x6b\xe8\x57\x3b\x4b\x7e\xcf\x0d\xa1\x72"
                     "\xa3\x4a\x29\x0c\xfc\x6d\x44\x00\xa7\x52\xc7\xd8"
                     "\x00\xdb\x89\x9d\x78\xb5\x54\x6b\x6b\x13\x1c\xe9"
                     "\x5e\xd3\x0e\x40\x6f\xb4\x66\xf7\x5b\xe3\x36\xcb",
                     48);
              r[7] = ioctl(r[4], KVM_SET_DEBUGREGS, &dr);
              r[6] = ioctl(r[4], KVM_RUN, 0);
          }
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
    • KVM: x86: avoid vmalloc(0) in the KVM_SET_CPUID · f8c1b85b
      Paolo Bonzini authored
      This causes an ugly dmesg splat.  Beautified syzkaller testcase:
      
          #include <unistd.h>
          #include <sys/syscall.h>
          #include <sys/ioctl.h>
          #include <fcntl.h>
          #include <linux/kvm.h>
      
          long r[8];
      
          int main()
          {
              struct kvm_irq_routing ir = { 0 };
              r[2] = open("/dev/kvm", O_RDWR);
              r[3] = ioctl(r[2], KVM_CREATE_VM, 0);
              r[4] = ioctl(r[3], KVM_SET_GSI_ROUTING, &ir);
              return 0;
          }
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
    • KVM: irqfd: fix NULL pointer dereference in kvm_irq_map_gsi · c622a3c2
      Paolo Bonzini authored
      Found by syzkaller:
      
          BUG: unable to handle kernel NULL pointer dereference at 0000000000000120
          IP: [<ffffffffa0797202>] kvm_irq_map_gsi+0x12/0x90 [kvm]
          PGD 6f80b067 PUD b6535067 PMD 0
          Oops: 0000 [#1] SMP
          CPU: 3 PID: 4988 Comm: a.out Not tainted 4.4.9-300.fc23.x86_64 #1
          [...]
          Call Trace:
           [<ffffffffa0795f62>] irqfd_update+0x32/0xc0 [kvm]
           [<ffffffffa0796c7c>] kvm_irqfd+0x3dc/0x5b0 [kvm]
           [<ffffffffa07943f4>] kvm_vm_ioctl+0x164/0x6f0 [kvm]
           [<ffffffff81241648>] do_vfs_ioctl+0x298/0x480
           [<ffffffff812418a9>] SyS_ioctl+0x79/0x90
           [<ffffffff817a1062>] tracesys_phase2+0x84/0x89
          Code: b5 71 a7 e0 5b 41 5c 41 5d 5d f3 c3 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 8b 8f 10 2e 00 00 31 c0 48 89 e5 <39> 91 20 01 00 00 76 6a 48 63 d2 48 8b 94 d1 28 01 00 00 48 85
          RIP  [<ffffffffa0797202>] kvm_irq_map_gsi+0x12/0x90 [kvm]
           RSP <ffff8800926cbca8>
          CR2: 0000000000000120
      
      Testcase:
      
          #include <unistd.h>
          #include <sys/syscall.h>
          #include <string.h>
          #include <stdint.h>
          #include <linux/kvm.h>
          #include <fcntl.h>
          #include <sys/ioctl.h>
      
          long r[26];
      
          int main()
          {
              memset(r, -1, sizeof(r));
              r[2] = open("/dev/kvm", 0);
              r[3] = ioctl(r[2], KVM_CREATE_VM, 0);
      
              struct kvm_irqfd ifd;
              ifd.fd = syscall(SYS_eventfd2, 5, 0);
              ifd.gsi = 3;
              ifd.flags = 2;
              ifd.resamplefd = ifd.fd;
              r[25] = ioctl(r[3], KVM_IRQFD, &ifd);
              return 0;
          }
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
    • KVM: fail KVM_SET_VCPU_EVENTS with invalid exception number · 78e546c8
      Paolo Bonzini authored
      This cannot be returned by KVM_GET_VCPU_EVENTS, so it is okay to return
      EINVAL.  It causes a WARN from exception_type:
      
          WARNING: CPU: 3 PID: 16732 at arch/x86/kvm/x86.c:345 exception_type+0x49/0x50 [kvm]()
          CPU: 3 PID: 16732 Comm: a.out Tainted: G        W       4.4.6-300.fc23.x86_64 #1
          Hardware name: LENOVO 2325F51/2325F51, BIOS G2ET32WW (1.12 ) 05/30/2012
           0000000000000286 000000006308a48b ffff8800bec7fcf8 ffffffff813b542e
           0000000000000000 ffffffffa0966496 ffff8800bec7fd30 ffffffff810a40f2
           ffff8800552a8000 0000000000000000 00000000002c267c 0000000000000001
          Call Trace:
           [<ffffffff813b542e>] dump_stack+0x63/0x85
           [<ffffffff810a40f2>] warn_slowpath_common+0x82/0xc0
           [<ffffffff810a423a>] warn_slowpath_null+0x1a/0x20
           [<ffffffffa0924809>] exception_type+0x49/0x50 [kvm]
           [<ffffffffa0934622>] kvm_arch_vcpu_ioctl_run+0x10a2/0x14e0 [kvm]
           [<ffffffffa091c04d>] kvm_vcpu_ioctl+0x33d/0x620 [kvm]
           [<ffffffff81241248>] do_vfs_ioctl+0x298/0x480
           [<ffffffff812414a9>] SyS_ioctl+0x79/0x90
           [<ffffffff817a04ee>] entry_SYSCALL_64_fastpath+0x12/0x71
          ---[ end trace b1a0391266848f50 ]---
      
      Testcase (beautified/reduced from syzkaller output):
      
          #include <unistd.h>
          #include <sys/syscall.h>
          #include <string.h>
          #include <stdint.h>
          #include <fcntl.h>
          #include <sys/ioctl.h>
          #include <linux/kvm.h>
      
          long r[31];
      
          int main()
          {
              memset(r, -1, sizeof(r));
              r[2] = open("/dev/kvm", O_RDONLY);
              r[3] = ioctl(r[2], KVM_CREATE_VM, 0);
              r[7] = ioctl(r[3], KVM_CREATE_VCPU, 0);
      
              struct kvm_vcpu_events ve = {
                      .exception.injected = 1,
                      .exception.nr = 0xd4
              };
              r[27] = ioctl(r[7], KVM_SET_VCPU_EVENTS, &ve);
              r[30] = ioctl(r[7], KVM_RUN, 0);
              return 0;
          }
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
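A check of the kind this commit describes could be sketched as below. The exact condition is an assumption made for illustration (x86 exception vectors are taken to be 0..31, and vector 2, the NMI, is treated as not injectable as an exception); `check_injected_exception` is a hypothetical helper, not the KVM code.

```c
#include <assert.h>
#include <errno.h>

#define MAX_EXCEPTION_VECTOR 31u   /* assumed: exceptions use vectors 0..31 */
#define NMI_VECTOR_NR        2u    /* assumed: NMI is not a regular exception */

/* Hypothetical validation of a userspace-supplied injected exception. */
static int check_injected_exception(int injected, unsigned nr)
{
    assert(injected == 0 || injected == 1);

    if (injected && (nr > MAX_EXCEPTION_VECTOR || nr == NMI_VECTOR_NR))
        return -EINVAL;
    return 0;
}
```

Under these assumptions the reproducer's `.exception.nr = 0xd4` would be refused up front instead of reaching the WARN in exception_type.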
    • KVM: x86: avoid vmalloc(0) in the KVM_SET_CPUID · 83676e92
      Paolo Bonzini authored
      This causes an ugly dmesg splat.  Beautified syzkaller testcase:
      
          #include <unistd.h>
          #include <sys/syscall.h>
          #include <sys/ioctl.h>
          #include <fcntl.h>
          #include <linux/kvm.h>
      
          long r[8];
      
          int main()
          {
              struct kvm_cpuid2 c = { 0 };
              r[2] = open("/dev/kvm", O_RDWR);
              r[3] = ioctl(r[2], KVM_CREATE_VM, 0);
              r[4] = ioctl(r[3], KVM_CREATE_VCPU, 0x8);
              r[7] = ioctl(r[4], KVM_SET_CPUID, &c);
              return 0;
          }
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
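One common way to avoid a zero-sized allocation is to skip the allocator when the caller passes zero entries. The sketch below shows that pattern in userspace C; `copy_entries` is a hypothetical helper and the actual patch may be structured differently.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: duplicate nent entries from src into *out. */
static int copy_entries(const int *src, size_t nent, int **out)
{
    assert(out != NULL);

    *out = NULL;
    if (nent == 0)
        return 0;               /* nothing to copy: never call the allocator */

    int *buf = malloc(nent * sizeof(*buf));
    if (!buf)
        return -1;

    memcpy(buf, src, nent * sizeof(*buf));
    *out = buf;
    return 0;
}
```

Callers then treat a NULL buffer with a zero count as a valid empty table rather than as an error.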
    • kvm: x86: avoid warning on repeated KVM_SET_TSS_ADDR · b21629da
      Paolo Bonzini authored
      Found by syzkaller:
      
          WARNING: CPU: 3 PID: 15175 at arch/x86/kvm/x86.c:7705 __x86_set_memory_region+0x1dc/0x1f0 [kvm]()
          CPU: 3 PID: 15175 Comm: a.out Tainted: G        W       4.4.6-300.fc23.x86_64 #1
          Hardware name: LENOVO 2325F51/2325F51, BIOS G2ET32WW (1.12 ) 05/30/2012
           0000000000000286 00000000950899a7 ffff88011ab3fbf0 ffffffff813b542e
           0000000000000000 ffffffffa0966496 ffff88011ab3fc28 ffffffff810a40f2
           00000000000001fd 0000000000003000 ffff88014fc50000 0000000000000000
          Call Trace:
           [<ffffffff813b542e>] dump_stack+0x63/0x85
           [<ffffffff810a40f2>] warn_slowpath_common+0x82/0xc0
           [<ffffffff810a423a>] warn_slowpath_null+0x1a/0x20
           [<ffffffffa09251cc>] __x86_set_memory_region+0x1dc/0x1f0 [kvm]
           [<ffffffffa092521b>] x86_set_memory_region+0x3b/0x60 [kvm]
           [<ffffffffa09bb61c>] vmx_set_tss_addr+0x3c/0x150 [kvm_intel]
           [<ffffffffa092f4d4>] kvm_arch_vm_ioctl+0x654/0xbc0 [kvm]
           [<ffffffffa091d31a>] kvm_vm_ioctl+0x9a/0x6f0 [kvm]
           [<ffffffff81241248>] do_vfs_ioctl+0x298/0x480
           [<ffffffff812414a9>] SyS_ioctl+0x79/0x90
           [<ffffffff817a04ee>] entry_SYSCALL_64_fastpath+0x12/0x71
      
      Testcase:
      
          #include <unistd.h>
          #include <sys/ioctl.h>
          #include <fcntl.h>
          #include <string.h>
          #include <linux/kvm.h>
      
          long r[8];
      
          int main()
          {
              memset(r, -1, sizeof(r));
              r[2] = open("/dev/kvm", O_RDONLY|O_TRUNC);
              r[3] = ioctl(r[2], KVM_CREATE_VM, 0x0ul);
              r[5] = ioctl(r[3], KVM_SET_TSS_ADDR, 0x20000000ul);
              r[7] = ioctl(r[3], KVM_SET_TSS_ADDR, 0x20000000ul);
              return 0;
          }
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
    • KVM: Handle MSR_IA32_PERF_CTL · 0c2df2a1
      Dmitry Bilunov authored
      Intel CPUs with the Turbo Boost feature implement an MSR that provides a
      control interface via the rdmsr/wrmsr instructions. One can detect the
      presence of this feature by issuing one of these instructions and
      handling the #GP exception that is generated if the referenced MSR
      is not implemented by the CPU.
      
      KVM's vCPU model behaves exactly like a real CPU in this case by injecting
      a fault when MSR_IA32_PERF_CTL (which KVM does not support) is accessed.
      However, some operating systems use this register during an early boot
      stage in which their kernel is not capable of handling #GP correctly,
      causing #DF and finally a triple fault, effectively resetting the vCPU.
      
      This patch implements a dummy handler for MSR_IA32_PERF_CTL to avoid the
      crashes.
      Signed-off-by: Dmitry Bilunov <kmeaw@yandex-team.ru>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
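The idea of a dummy handler can be sketched as a dispatch that accepts and ignores writes to one MSR while still faulting on unknown ones. `handle_wrmsr` and its return convention (0 = handled, 1 = inject #GP) are assumptions made for this example, not the KVM implementation; 0x199 is the architectural index of IA32_PERF_CTL.

```c
#include <assert.h>
#include <stdint.h>

#define MSR_IA32_PERF_CTL 0x199u

/*
 * Hypothetical write handler.
 * Returns 0 if the write was accepted, 1 if the guest should get a #GP.
 */
static int handle_wrmsr(uint32_t index, uint64_t data)
{
    (void)data;                 /* the dummy handler discards the value */

    switch (index) {
    case MSR_IA32_PERF_CTL:
        return 0;               /* accept silently so early boot code survives */
    default:
        return 1;               /* unknown MSR: fault, as on real hardware */
    }
}
```

A guest that writes the register during early boot then proceeds normally, while accesses to genuinely unsupported MSRs keep their faulting behaviour.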
    • KVM: x86: avoid write-tearing of TDP · b19ee2ff
      Nadav Amit authored
      In theory, nothing prevents the compiler from write-tearing PTEs, or
      splitting PTE writes. These partially-modified PTEs can be fetched by other
      cores and cause mayhem. I have not really encountered such a case in
      real life, but it does seem possible.
      
      For example, the compiler may try to do something creative for
      kvm_set_pte_rmapp() and perform multiple writes to the PTE.
      Signed-off-by: Nadav Amit <nadav.amit@gmail.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
    • Merge tag 'kvm-arm-for-v4.7-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm · 13e98fd1
      Radim Krčmář authored
      KVM/ARM Fixes for v4.7-rc2
      
      Fixes for the vgic: two of the patches address a bug introduced in v4.6,
      while the rest are for the new vgic.
    • KVM: arm/arm64: vgic-new: Removel harmful BUG_ON · 05fb05a6
      Marc Zyngier authored
      When changing the active bit from an MMIO trap, we decide to
      explode if the intid is that of a private interrupt.
      
      This flawed logic comes from the fact that we were assuming that
      kvm_vcpu_kick() as called by kvm_arm_halt_vcpu() would not return before
      the called vcpu responded, but this is not the case, so we need to
      perform this wait even for private interrupts.
      
      Dropping the BUG_ON seems like the right thing to do.
      
       [ Commit message tweaked by Christoffer ]
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
  3. 01 Jun, 2016 3 commits
  4. 31 May, 2016 16 commits
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 6b15d665
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Fix negative error code usage in ATM layer, from Stefan Hajnoczi.
      
       2) If CONFIG_SYSCTL is disabled, the default TTL is not initialized
          properly.  From Ezequiel Garcia.
      
       3) Missing spinlock init in mvneta driver, from Gregory CLEMENT.
      
       4) Missing unlocks in hwmb error paths, also from Gregory CLEMENT.
      
       5) Fix deadlock on team->lock when propagating features, from Ivan
          Vecera.
      
       6) Work around buffer offset hw bug in alx chips, from Feng Tang.
      
       7) Fix double listing of SCTP entries in sctp_diag dumps, from Xin
          Long.
      
       8) Various statistics bug fixes in mlx4 from Eric Dumazet.
      
       9) Fix some randconfig build errors wrt fou ipv6 from Arnd Bergmann.
      
      10) All of l2tp was namespace aware, but the ipv6 support code was not
          doing so.  From Shmulik Ladkani.
      
      11) Handle on-stack hrtimers properly in pktgen, from Guenter Roeck.
      
      12) Propagate MAC changes properly through VLAN devices, from Mike
          Manning.
      
      13) Fix memory leak in bnx2x_init_one(), from Vitaly Kuznetsov.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (62 commits)
        sfc: Track RPS flow IDs per channel instead of per function
        usbnet: smsc95xx: fix link detection for disabled autonegotiation
        virtio_net: fix virtnet_open and virtnet_probe competing for try_fill_recv
        bnx2x: avoid leaking memory on bnx2x_init_one() failures
        fou: fix IPv6 Kconfig options
        openvswitch: update checksum in {push,pop}_mpls
        sctp: sctp_diag should dump sctp socket type
        net: fec: update dirty_tx even if no skb
        vlan: Propagate MAC address to VLANs
        atm: iphase: off by one in rx_pkt()
        atm: firestream: add more reserved strings
        vxlan: Accept user specified MTU value when create new vxlan link
        net: pktgen: Call destroy_hrtimer_on_stack()
        timer: Export destroy_hrtimer_on_stack()
        net: l2tp: Make l2tp_ip6 namespace aware
        Documentation: ip-sysctl.txt: clarify secure_redirects
        sfc: use flow dissector helpers for aRFS
        ieee802154: fix logic error in ieee802154_llsec_parse_dev_addr
        net: nps_enet: Disable interrupts before napi reschedule
        net/lapb: tuse %*ph to dump buffers
        ...
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc · 58c1f995
      Linus Torvalds authored
      Pull sparc fixes from David Miller:
       "sparc64 mmu context allocation and trap return bug fixes"
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
        sparc64: Fix return from trap window fill crashes.
        sparc: Harden signal return frame checks.
        sparc64: Take ctx_alloc_lock properly in hugetlb_setup().
    • Thomas Huth's avatar
      powerpc/pseries: Add POWER8NVL support to ibm,client-architecture-support call · 7cc85103
      Thomas Huth authored
      If we do not provide the PVR for POWER8NVL, a guest on this system
      currently ends up in PowerISA 2.06 compatibility mode on KVM, since QEMU
      does not provide a generic PowerISA 2.07 mode yet. So some new
      instructions from POWER8 (like "mtvsrd") get disabled for the guest,
      resulting in crashes when using code compiled explicitly for
      POWER8 (e.g. with the "-mcpu=power8" option of GCC).
      
      Fixes: ddee09c0 ("powerpc: Add PVR for POWER8NVL processor")
      Cc: stable@vger.kernel.org # v4.0+
      Signed-off-by: Thomas Huth <thuth@redhat.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      7cc85103
    • Aneesh Kumar K.V's avatar
      powerpc/mm/radix: Add missing tlb flush · 157d4d06
      Aneesh Kumar K.V authored
      This should not have any impact on hash, because hash does tlb
      invalidate with every pte update and we don't implement
      flush_tlb_* functions for hash. With radix we should make an explicit
      call to flush tlb outside pte update.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      157d4d06
    • Aneesh Kumar K.V's avatar
      powerpc/mm/hash: Fix the reference bit update when handling hash fault · dc47c0c1
      Aneesh Kumar K.V authored
      When we converted the asm routines to C functions, we missed updating
      HPTE_R_R based on _PAGE_ACCESSED. ASM code used to copy over the lower
      bits from the pte via:
      
      andi.	r3,r30,0x1fe		/* Get basic set of flags */
      
      We also update the code such that we won't update the Change bit ('C'
      bit) always. This was added by commit c5cf0e30 ("powerpc: Fix
      buglet with MMU hash management").
      
      With hash64, we need to make sure that hardware doesn't do a pte update
      directly. This is because we do end up with entries in TLB with no hash
      page table entry. This happens because when we find a hash bucket full,
      we "evict" a more/less random entry from it. When we do that we don't
      invalidate the TLB (hpte_remove) because we assume the old translation
      is still technically "valid". For more info look at commit
      0608d692 ("powerpc/mm: Always invalidate tlb on hpte invalidate and
      update").
      
      Thus it's critical that valid hash PTEs always have reference bit set
      and writeable ones have change bit set. We do this by hashing a
      non-dirty linux PTE as read-only and always setting _PAGE_ACCESSED (and
      thus R) when hashing anything else in. Any attempt by Linux at clearing
      those bits also removes the corresponding hash entry.
      
      Commit c5cf0e30bf3d8 did that for 'C' bit by enabling 'C' bit always.
      We don't really need to do that because we never map a RW pte entry
      without setting 'C' bit. On READ fault on a RW pte entry, we still map
      it READ only, hence a store update in the page will still cause a hash
      pte fault.
      
      This patch reverts part of commit c5cf0e30 ("[PATCH] powerpc:
      Fix buglet with MMU hash management") and retains the updatepp part.
      
      - If we hit the updatepp path on native, the old code without that
        commit would fail to set C, because native_hpte_updatepp()
        was implemented to filter the same bits as H_PROTECT and not let C
        through. Thus we would "upgrade" a RO HPTE to RW without setting C,
        causing the bug. So the real fix in that commit was the change
        to native_hpte_updatepp().
      
      Fixes: 89ff7250 ("powerpc/mm: Convert __hash_page_64K to C")
      Cc: stable@vger.kernel.org # v4.5+
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      dc47c0c1
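The rule described in the commit above (every valid hash PTE carries the reference bit, and only writable mappings carry the change bit) can be sketched in plain C. This is an illustrative sketch only: the `DEMO_*` constants and `demo_hpte_flags()` are hypothetical names with made-up values, not the kernel's `_PAGE_*` or `HPTE_R_*` definitions.

```c
#include <stdint.h>

/* Hypothetical flag values chosen for this sketch only; the kernel's
 * _PAGE_* and HPTE_R_* constants are different. */
#define DEMO_PAGE_RW        0x1u
#define DEMO_PAGE_DIRTY     0x2u

#define DEMO_HPTE_R         0x1u   /* reference bit */
#define DEMO_HPTE_C         0x2u   /* change bit */
#define DEMO_HPTE_WRITABLE  0x4u

/* Derive hash PTE flags from Linux PTE flags. */
static uint32_t demo_hpte_flags(uint32_t pte)
{
    /* Every hash PTE we insert has R set, so hardware never needs to
     * update it behind our back. */
    uint32_t hpte = DEMO_HPTE_R;

    /* A non-dirty PTE is hashed read-only; a later store faults and
     * gives software the chance to set dirty first. Only a dirty RW
     * PTE is mapped writable, and then C is set together with it. */
    if ((pte & DEMO_PAGE_RW) && (pte & DEMO_PAGE_DIRTY))
        hpte |= DEMO_HPTE_WRITABLE | DEMO_HPTE_C;

    return hpte;
}
```

With this shape, a writable hash entry without C cannot be produced, which is why C does not have to be forced on unconditionally.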
    • Aneesh Kumar K.V's avatar
      powerpc/mm/radix: Update LPCR only if it is powernv · d6c88600
      Aneesh Kumar K.V authored
      LPCR cannot be updated when running in guest mode.
      
      Fixes: 2bfd65e4 ("powerpc/mm/radix: Add radix callbacks for early init routines")
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      d6c88600
    • Jon Cooper's avatar
      sfc: Track RPS flow IDs per channel instead of per function · faf8dcc1
      Jon Cooper authored
      Otherwise we get confused when two flows on different channels get the
      same flow ID.
      Signed-off-by: Edward Cree <ecree@solarflare.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      faf8dcc1
    • Christoph Fritz's avatar
      usbnet: smsc95xx: fix link detection for disabled autonegotiation · d69d1694
      Christoph Fritz authored
      When autonegotiation is explicitly disabled, we don't get an irq on
      link changes, so we need to poll the status register to detect
      link up/down.
      This patch adds a workqueue to poll for link status.
      Signed-off-by: Christoph Fritz <chf.fritz@googlemail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d69d1694
    • wangyunjian's avatar
      virtio_net: fix virtnet_open and virtnet_probe competing for try_fill_recv · f00e35e2
      wangyunjian authored
      In virtnet_open() and virtnet_probe(), try_fill_recv() may be
      executed at the same time. The VQ in virtqueue_add() is not properly
      protected, and BUG_ON will be triggered when virtio_net.ko is being removed.
      Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
      Acked-by: Jason Wang <jasowang@redhat.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f00e35e2
    • Vitaly Kuznetsov's avatar
      bnx2x: avoid leaking memory on bnx2x_init_one() failures · bae5499c
      Vitaly Kuznetsov authored
      bnx2x_init_bp() allocates memory with bnx2x_alloc_mem_bp() so if we
      fail later in bnx2x_init_one() we need to free this memory
      with bnx2x_free_mem_bp() to avoid leakages. E.g. I'm observing memory
      leaks reported by kmemleak when a failure (unrelated) happens in
      bnx2x_vfpf_acquire().
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Acked-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bae5499c
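The leak pattern in the bnx2x commit above is the usual one: memory allocated early in an init function must be released on every later failure path. A minimal userspace sketch of that unwinding style follows. All `demo_*` names are hypothetical stand-ins, not the driver's functions, and the `fail_late` parameter merely simulates a later step failing.

```c
#include <stdlib.h>

static int demo_live_allocs;        /* counts outstanding allocations */

struct demo_bp {
    void *mem;
};

/* Stand-in for the early allocation step. */
static int demo_alloc_mem(struct demo_bp *bp)
{
    bp->mem = malloc(64);
    if (!bp->mem)
        return -1;
    demo_live_allocs++;
    return 0;
}

/* Stand-in for the matching release step. */
static void demo_free_mem(struct demo_bp *bp)
{
    free(bp->mem);
    bp->mem = NULL;
    demo_live_allocs--;
}

/* Init routine: any failure after the allocation jumps to a label
 * that undoes it, so no error path leaks. */
static int demo_init_one(struct demo_bp *bp, int fail_late)
{
    int rc = demo_alloc_mem(bp);

    if (rc)
        return rc;

    if (fail_late) {                /* simulated later failure */
        rc = -1;
        goto err_free_mem;
    }
    return 0;

err_free_mem:
    demo_free_mem(bp);
    return rc;
}
```

Calling `demo_init_one(&bp, 1)` returns an error and leaves `demo_live_allocs` at zero, which is the property a leak checker such as kmemleak looks for.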
    • Arnd Bergmann's avatar
      fou: fix IPv6 Kconfig options · 95e4daa8
      Arnd Bergmann authored
      The Kconfig options I added to work around broken compilation ended
      up screwing up things more, as I used the wrong symbol to control
      compilation of the file, resulting in IPv6 fou support never being
      built into the kernel.
      
      Changing CONFIG_NET_FOU_IPV6_TUNNELS to CONFIG_IPV6_FOU fixes that
      problem. I had renamed the symbol in one location but not the other,
      and as the file is never used by other kernel code, this did not
      lead to a build failure that I would have caught.
      
      After that fix, another issue with the same patch becomes obvious, as we
      'select INET6_TUNNEL', which is related to IPV6_TUNNEL, but not the same,
      and this can still cause the original build failure when IPV6_TUNNEL is
      not built-in but IPV6_FOU is. The fix is equally trivial, we just need
      to select the right symbol.
      
      I have successfully built 350 randconfig kernels with this patch
      and verified that the driver is now being built.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Reported-by: Valentin Rothberg <valentinrothberg@gmail.com>
      Fixes: fabb13db ("fou: add Kconfig options for IPv6 support")
      Signed-off-by: David S. Miller <davem@davemloft.net>
      95e4daa8
    • Simon Horman's avatar
      openvswitch: update checksum in {push,pop}_mpls · bc7cc599
      Simon Horman authored
      In the case of CHECKSUM_COMPLETE the skb checksum should be updated in
      {push,pop}_mpls() as they change the type in the Ethernet header.
      
      As suggested by Pravin Shelar.
      
      Cc: Pravin Shelar <pshelar@nicira.com>
      Fixes: 25cd9ba0 ("openvswitch: Add basic MPLS support to kernel")
      Signed-off-by: Simon Horman <simon.horman@netronome.com>
      Acked-by: Pravin B Shelar <pshelar@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bc7cc599
    • Xin Long's avatar
      sctp: sctp_diag should dump sctp socket type · 40eb90e9
      Xin Long authored
      Currently we cannot tell whether a sk is udp style or sctp style when
      we use ss to dump sctp_info, so it's necessary to dump the type as well.
      
      For sctp_diag, ss support is not officially available, thus there
      are no official users of this yet, so we can add this field in the
      middle of sctp_info without breaking user API.
      
      v1->v2:
        - move 'sctpi_s_type' field to the end of struct sctp_info, so
          that it won't cause incompatibility with applications already
          built.
        - add __reserved3 in sctp_info to make sure sctp_info is 8-byte
          aligned.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      40eb90e9
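The v2 notes in the sctp_diag commit above rely on two layout facts: appending a field at the end of a struct leaves the offsets of all existing fields unchanged, and an explicit reserved field can keep the total size a multiple of 8 bytes. The sketch below checks both with `offsetof` and `sizeof`. The `demo_info_v1` and `demo_info_v2` structs are hypothetical and much smaller than the real `struct sctp_info`.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "old" layout seen by already-built applications. */
struct demo_info_v1 {
    uint64_t counter;
    uint32_t state;
    uint32_t flags;
};

/* Hypothetical "new" layout: the extra field goes at the end, and a
 * reserved field pads the size back to a multiple of 8 bytes. */
struct demo_info_v2 {
    uint64_t counter;
    uint32_t state;
    uint32_t flags;
    uint32_t s_type;      /* newly appended field */
    uint32_t reserved;    /* explicit padding, not implicit */
};

/* Existing fields keep their offsets, so old binaries still read
 * the leading part of the struct correctly. */
_Static_assert(offsetof(struct demo_info_v2, state) ==
               offsetof(struct demo_info_v1, state), "state moved");
_Static_assert(offsetof(struct demo_info_v2, flags) ==
               offsetof(struct demo_info_v1, flags), "flags moved");

/* The explicit reserved field keeps the size 8-byte aligned. */
_Static_assert(sizeof(struct demo_info_v2) % 8 == 0, "size not 8-byte multiple");
```

Had the new field been inserted in the middle, the first two assertions would fail for every field that follows it.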
    • Troy Kisky's avatar
      net: fec: update dirty_tx even if no skb · 7fafe803
      Troy Kisky authored
      If dirty_tx isn't updated, then dma_unmap_single
      can be called twice.
      
      This fixes a
      [   58.420980] ------------[ cut here ]------------
      [   58.425667] WARNING: CPU: 0 PID: 377 at /home/schurig/d/mkarm/linux-4.5/lib/dma-debug.c:1096 check_unmap+0x9d0/0xab8()
      [   58.436405] fec 2188000.ethernet: DMA-API: device driver tries to free DMA memory it has not allocated [device address=0x0000000000000000] [size=66 bytes]
      
      encountered by Holger
      Signed-off-by: Troy Kisky <troy.kisky@boundarydevices.com>
      Tested-by: <holgerschurig@gmail.com>
      Acked-by: Fugang Duan <fugang.duan@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7fafe803
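The double unmap described in the fec commit above follows from a reclaim index that is not advanced on every path. The sketch below models a tiny TX ring in userspace and advances the index unconditionally. The `demo_*` names are hypothetical and the ring is far simpler than the driver's descriptor handling; `unmap_count` merely stands in for calls to `dma_unmap_single()`.

```c
#include <stddef.h>

#define DEMO_RING_SIZE 4

struct demo_tx_ring {
    size_t dirty_tx;                  /* next descriptor to reclaim */
    size_t cur_tx;                    /* next descriptor to fill */
    int has_skb[DEMO_RING_SIZE];      /* 1 if an skb is attached */
    int unmap_count[DEMO_RING_SIZE];  /* times each buffer was unmapped */
};

/* Reclaim all completed descriptors between dirty_tx and cur_tx. */
static void demo_tx_clean(struct demo_tx_ring *r)
{
    while (r->dirty_tx != r->cur_tx) {
        size_t i = r->dirty_tx;

        r->unmap_count[i]++;          /* stands in for the DMA unmap */

        if (r->has_skb[i])
            r->has_skb[i] = 0;        /* stands in for freeing the skb */

        /* Advance even when no skb was attached; otherwise the next
         * call would revisit slot i and unmap it a second time. */
        r->dirty_tx = (i + 1) % DEMO_RING_SIZE;
    }
}
```

Running the clean routine twice over the same ring leaves every `unmap_count` entry at most 1, which is the invariant the DMA debug check in the quoted warning enforces.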
    • Mike Manning's avatar
      vlan: Propagate MAC address to VLANs · 308453aa
      Mike Manning authored
      The MAC address of the physical interface is only copied to the VLAN
      when it is first created. As a result, after the MAC address of the
      physical interface changes, only newly created VLANs have an
      up-to-date MAC.
      
      The VLANs should continue inheriting the MAC address of the physical
      interface until the VLAN MAC address is explicitly set to any value.
      This allows IPv6 EUI64 addresses for the VLAN to reflect any changes
      to the MAC of the physical interface and thus for DAD to behave as
      expected.
      Signed-off-by: Mike Manning <mmanning@brocade.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      308453aa
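The VLAN commit above mentions IPv6 EUI64 addresses because the interface identifier of such an address is computed directly from the MAC, so a stale MAC yields a stale address. A minimal sketch of the modified EUI-64 derivation from RFC 4291, Appendix A follows. The function name `demo_mac_to_eui64()` is hypothetical and this is not the kernel's implementation.

```c
#include <stdint.h>
#include <string.h>

/* Build the modified EUI-64 interface identifier from a 48-bit MAC:
 * insert ff:fe between the two halves of the MAC and flip the
 * universal/local bit (0x02) of the first octet. */
static void demo_mac_to_eui64(const uint8_t mac[6], uint8_t eui[8])
{
    memcpy(eui, mac, 3);          /* first three octets of the MAC */
    eui[3] = 0xff;
    eui[4] = 0xfe;
    memcpy(eui + 5, mac + 3, 3);  /* last three octets of the MAC */
    eui[0] ^= 0x02;               /* invert the universal/local bit */
}
```

For example, the MAC 00:11:22:33:44:55 maps to the identifier 02:11:22:ff:fe:33:44:55, so any change to the MAC changes the derived address as well.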
    • Dan Carpenter's avatar
      atm: iphase: off by one in rx_pkt() · f2633d2e
      Dan Carpenter authored
      The iadev->rx_open[] array holds "iadev->num_vc" pointers (this code
      assumes that pointers are 32 bits).  So the > here should be >= or else
      we could end up reading a garbage pointer from one element beyond the
      end of the array.
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f2633d2e
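The off-by-one in the iphase commit above comes down to which comparison guards an array of `num_vc` entries: valid indices run from 0 to `num_vc - 1`, so the rejecting test must be `>=`. A small sketch follows, with a hypothetical `demo_dev` struct that models only the two fields named in the commit message.

```c
#include <stddef.h>

/* Hypothetical stand-in for the driver state. */
struct demo_dev {
    size_t num_vc;     /* number of entries in rx_open */
    void **rx_open;    /* valid indices are 0 .. num_vc - 1 */
};

/* Return the entry for vci, or NULL if vci is out of range. */
static void *demo_lookup_vc(const struct demo_dev *dev, size_t vci)
{
    /* Using ">" here would accept vci == num_vc and read one
     * element past the end of the array. */
    if (vci >= dev->num_vc)
        return NULL;

    return dev->rx_open[vci];
}
```

With two entries, index 1 is the last valid slot and index 2 is rejected rather than dereferenced.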