1. 01 Jun, 2016 1 commit
  2. 31 May, 2016 2 commits
  3. 20 May, 2016 3 commits
  4. 19 May, 2016 1 commit
    • Hugh Dickins's avatar
      arch: fix has_transparent_hugepage() · fd8cfd30
      Hugh Dickins authored
      I've just discovered that the useful-sounding has_transparent_hugepage()
      is actually an architecture-dependent minefield: on some arches it only
      builds if CONFIG_TRANSPARENT_HUGEPAGE=y, on others it's also there when
      not, but on some of those (arm and arm64) it then gives the wrong
      answer; and on mips alone it's marked __init, which would crash if
      called later (but so far it has not been called later).
      Straighten this out: make it available to all configs, with a sensible
      default in asm-generic/pgtable.h, removing its definitions from those
      arches (arc, arm, arm64, sparc, tile) which are served by the default,
      adding #define has_transparent_hugepage has_transparent_hugepage to
      those (mips, powerpc, s390, x86) which need to override the default at
      runtime, and removing the __init from mips (but maybe that kind of code
      should be avoided after init: set a static variable the first time it's
      Signed-off-by: default avatarHugh Dickins <hughd@google.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andres Lagar-Cavilla <andreslc@google.com>
      Cc: Yang Shi <yang.shi@linaro.org>
      Cc: Ning Qu <quning@gmail.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Konstantin Khlebnikov <koct9i@gmail.com>
      Acked-by: default avatarDavid S. Miller <davem@davemloft.net>
      Acked-by: Vineet Gupta <vgupta@synopsys.com>		[arch/arc]
      Acked-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>	[arch/s390]
      Acked-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
  5. 13 May, 2016 1 commit
    • Christian Borntraeger's avatar
      KVM: halt_polling: provide a way to qualify wakeups during poll · 3491caf2
      Christian Borntraeger authored
      Some wakeups should not be considered a sucessful poll. For example on
      s390 I/O interrupts are usually floating, which means that _ALL_ CPUs
      would be considered runnable - letting all vCPUs poll all the time for
      transactional like workload, even if one vCPU would be enough.
      This can result in huge CPU usage for large guests.
      This patch lets architectures provide a way to qualify wakeups if they
      should be considered a good/bad wakeups in regard to polls.
      For s390 the implementation will fence of halt polling for anything but
      known good, single vCPU events. The s390 implementation for floating
      interrupts does a wakeup for one vCPU, but the interrupt will be delivered
      by whatever CPU checks first for a pending interrupt. We prefer the
      woken up CPU by marking the poll of this CPU as "good" poll.
      This code will also mark several other wakeup reasons like IPI or
      expired timers as "good". This will of course also mark some events as
      not sucessful. As  KVM on z runs always as a 2nd level hypervisor,
      we prefer to not poll, unless we are really sure, though.
      This patch successfully limits the CPU usage for cases like uperf 1byte
      transactional ping pong workload or wakeup heavy workload like OLTP
      while still providing a proper speedup.
      This also introduced a new vcpu stat "halt_poll_no_tuning" that marks
      wakeups that are considered not good for polling.
      Signed-off-by: default avatarChristian Borntraeger <borntraeger@de.ibm.com>
      Acked-by: Radim Krčmář <rkrcmar@redhat.com> (for an earlier version)
      Cc: David Matlack <dmatlack@google.com>
      Cc: Wanpeng Li <kernellwp@gmail.com>
      [Rename config symbol. - Paolo]
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
  6. 09 May, 2016 2 commits
    • Catalin Marinas's avatar
      kvm: arm64: Enable hardware updates of the Access Flag for Stage 2 page tables · 06485053
      Catalin Marinas authored
      The ARMv8.1 architecture extensions introduce support for hardware
      updates of the access and dirty information in page table entries. With
      VTCR_EL2.HA enabled (bit 21), when the CPU accesses an IPA with the
      PTE_AF bit cleared in the stage 2 page table, instead of raising an
      Access Flag fault to EL2 the CPU sets the actual page table entry bit
      (10). To ensure that kernel modifications to the page table do not
      inadvertently revert a bit set by hardware updates, certain Stage 2
      software pte/pmd operations must be performed atomically.
      The main user of the AF bit is the kvm_age_hva() mechanism. The
      kvm_age_hva_handler() function performs a "test and clear young" action
      on the pte/pmd. This needs to be atomic in respect of automatic hardware
      updates of the AF bit. Since the AF bit is in the same position for both
      Stage 1 and Stage 2, the patch reuses the existing
      ptep_test_and_clear_young() functionality if
      __HAVE_ARCH_PTEP_TEST_AND_CLEAR_YOUNG is defined. Otherwise, the
      existing pte_young/pte_mkold mechanism is preserved.
      The kvm_set_s2pte_readonly() (and the corresponding pmd equivalent) have
      to perform atomic modifications in order to avoid a race with updates of
      the AF bit. The arm64 implementation has been re-written using
      Currently, kvm_set_s2pte_writable() (and pmd equivalent) take a pointer
      argument and modify the pte/pmd in place. However, these functions are
      only used on local variables rather than actual page table entries, so
      it makes more sense to follow the pte_mkwrite() approach for stage 1
      attributes. The change to kvm_s2pte_mkwrite() makes it clear that these
      functions do not modify the actual page table entries.
      The (pte|pmd)_mkyoung() uses on Stage 2 entries (setting the AF bit
      explicitly) do not need to be modified since hardware updates of the
      dirty status are not supported by KVM, so there is no possibility of
      losing such information.
      Signed-off-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Acked-by: default avatarMarc Zyngier <marc.zyngier@arm.com>
      Reviewed-by: default avatarChristoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: default avatarChristoffer Dall <christoffer.dall@linaro.org>
    • Robin Murphy's avatar
      iommu: of: enforce const-ness of struct iommu_ops · 53c92d79
      Robin Murphy authored
      As a set of driver-provided callbacks and static data, there is no
      compelling reason for struct iommu_ops to be mutable in core code, so
      enforce const-ness throughout.
      Acked-by: default avatarThierry Reding <treding@nvidia.com>
      Signed-off-by: default avatarRobin Murphy <robin.murphy@arm.com>
      Acked-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarJoerg Roedel <jroedel@suse.de>
  7. 06 May, 2016 4 commits
  8. 03 May, 2016 2 commits
    • Yang Shi's avatar
      arm64: always use STRICT_MM_TYPECHECKS · 2326df55
      Yang Shi authored
      Inspired by the counterpart of powerpc [1], which shows there is no negative
      effect on code generation from enabling STRICT_MM_TYPECHECKS with a modern
      And, Arnd's comment [2] about that patch says STRICT_MM_TYPECHECKS could
      be default as long as the architecture can pass structures in registers as
      function arguments. ARM64 can do it as long as the size of structure <= 16
      bytes. All the page table value types are u64 on ARM64.
      The below disassembly demonstrates it, entry is pte_t type:
                  entry = arch_make_huge_pte(entry, vma, page, writable);
         0xffff00000826fc38 <+80>:    and     x0, x0, #0xfffffffffffffffd
         0xffff00000826fc3c <+84>:    mov     w3, w21
         0xffff00000826fc40 <+88>:    mov     x2, x20
         0xffff00000826fc44 <+92>:    mov     x1, x19
         0xffff00000826fc48 <+96>:    orr     x0, x0, #0x400
         0xffff00000826fc4c <+100>:   bl      0xffff00000809bcc0 <arch_make_huge_pte>
      [1] http://www.spinics.net/lists/linux-mm/msg105951.html
      [2] http://www.spinics.net/lists/linux-mm/msg105969.html
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Acked-by: default avatarArnd Bergmann <arnd@arndb.de>
      Acked-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: default avatarYang Shi <yang.shi@linaro.org>
      Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
    • James Morse's avatar
      arm64: kvm: Fix kvm teardown for systems using the extended idmap · c612505f
      James Morse authored
      If memory is located above 1<<VA_BITS, kvm adds an extra level to its page
      tables, merging the runtime tables and boot tables that contain the idmap.
      This lets us avoid the trampoline dance during initialisation.
      This also means there is no trampoline page mapped, so
      __cpu_reset_hyp_mode() can't call __kvm_hyp_reset() in this page. The good
      news is the idmap is still mapped, so we don't need the trampoline page.
      The bad news is we can't call it directly as the idmap is above
      HYP_PAGE_OFFSET, so its address is masked by kvm_call_hyp.
      Add a function __extended_idmap_trampoline which will branch into
      __kvm_hyp_reset in the idmap, change kvm_hyp_reset_entry() to return
      this address if __kvm_cpu_uses_extended_idmap(). In this case
      __kvm_hyp_reset() will still switch to the boot tables (which are the
      merged tables that were already in use), and branch into the idmap (where
      it already was).
      This fixes boot failures on these systems, where we fail to execute the
      missing trampoline page when tearing down kvm in init_subsystems():
      [    2.508922] kvm [1]: 8-bit VMID
      [    2.512057] kvm [1]: Hyp mode initialized successfully
      [    2.517242] kvm [1]: interrupt-controller@e1140000 IRQ13
      [    2.522622] kvm [1]: timer IRQ3
      [    2.525783] Kernel panic - not syncing: HYP panic:
      [    2.525783] PS:200003c9 PC:0000007ffffff820 ESR:86000005
      [    2.525783] FAR:0000007ffffff820 HPFAR:00000000003ffff0 PAR:0000000000000000
      [    2.525783] VCPU:          (null)
      [    2.525783]
      [    2.547667] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G        W       4.6.0-rc5+ #1
      [    2.555137] Hardware name: Default string Default string/Default string, BIOS ROD0084E 09/03/2015
      [    2.563994] Call trace:
      [    2.566432] [<ffffff80080888d0>] dump_backtrace+0x0/0x240
      [    2.571818] [<ffffff8008088b24>] show_stack+0x14/0x20
      [    2.576858] [<ffffff80083423ac>] dump_stack+0x94/0xb8
      [    2.581899] [<ffffff8008152130>] panic+0x10c/0x250
      [    2.586677] [<ffffff8008152024>] panic+0x0/0x250
      [    2.591281] SMP: stopping secondary CPUs
      [    3.649692] SMP: failed to stop secondary CPUs 0-2,4-7
      [    3.654818] Kernel Offset: disabled
      [    3.658293] Memory Limit: none
      [    3.661337] ---[ end Kernel panic - not syncing: HYP panic:
      [    3.661337] PS:200003c9 PC:0000007ffffff820 ESR:86000005
      [    3.661337] FAR:0000007ffffff820 HPFAR:00000000003ffff0 PAR:0000000000000000
      [    3.661337] VCPU:          (null)
      [    3.661337]
      Reported-by: default avatarWill Deacon <will.deacon@arm.com>
      Reviewed-by: default avatarMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: default avatarJames Morse <james.morse@arm.com>
      Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
  9. 28 Apr, 2016 17 commits
  10. 26 Apr, 2016 1 commit
  11. 25 Apr, 2016 3 commits
  12. 21 Apr, 2016 3 commits
    • Suzuki K Poulose's avatar
      arm64: kvm: Add support for 16K pages · 02e0b760
      Suzuki K Poulose authored
      Now that we can handle stage-2 page tables independent
      of the host page table levels, wire up the 16K page
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Reviewed-by: default avatarChristoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: default avatarSuzuki K Poulose <suzuki.poulose@arm.com>
    • Suzuki K Poulose's avatar
      kvm-arm: Cleanup stage2 pgd handling · 9163ee23
      Suzuki K Poulose authored
      Now that we don't have any fake page table levels for arm64,
      cleanup the common code to get rid of the dead code.
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Acked-by: default avatarChristoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: default avatarSuzuki K Poulose <suzuki.poulose@arm.com>
    • Suzuki K Poulose's avatar
      kvm: arm64: Get rid of fake page table levels · da04fa04
      Suzuki K Poulose authored
      On arm64, the hardware supports concatenation of upto 16 tables,
      at entry level for stage2 translations and we make use that whenever
      possible. This could lead to reduced number of translation levels than
      the normal (stage1 table) table. Also, since the IPA(40bit) is smaller
      than the some of the supported VA_BITS (e.g, 48bit), there could be
      different number of levels in stage-1 vs stage-2 tables. To reuse the
      kernel host page table walker for stage2 we have been using a fake
      software page table level, not known to the hardware. But with 16K
      translations, there could be upto 2 fake software levels (with 48bit VA
      and 40bit IPA), which complicates the code. Hence, we want to get rid of
      the hack.
      Now that we have explicit accessors for hyp vs stage2 page tables,
      define the stage2 walker helpers accordingly based on the actual
      table used by the hardware.
      Once we know the number of translation levels used by the hardware,
      it is merely a job of defining the helpers based on whether a
      particular level is folded or not, looking at the number of levels.
      Some facts before we calculate the translation levels:
      1) Smallest page size supported by arm64 is 4K.
      2) The minimum number of bits resolved at any page table level
         is (PAGE_SHIFT - 3) at intermediate levels.
      Both of them implies, minimum number of bits required for a level
      change is 9.
      Since we can concatenate upto 16 tables at stage2 entry, the total
      number of page table levels used by the hardware for resolving N bits
      is same as that for (N - 4) bits (with concatenation), as there cannot
      be a level in between (N, N-4) as per the above rules.
      Hence, we have
      With the current IPA limit (40bit), for all supported translations
      and VA_BITS, we have the following condition (even for 36bit VA with
      16K page size):
      So, for e.g,  if PUD is present in stage2, it is present in the hyp(host).
      Hence, we fall back to the host definition if we find that a level is not
      folded. Otherwise we redefine it accordingly. A build time check is added
      to make sure the above condition holds. If this condition breaks in future,
      we can rearrange the host level helpers and fix our code easily.
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Cc: Christoffer Dall <christoffer.dall@linaro.org>
      Reviewed-by: default avatarChristoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: default avatarSuzuki K Poulose <suzuki.poulose@arm.com>