1. 26 Apr, 2015 1 commit
    • Andy Lutomirski's avatar
      x86_64, asm: Work around AMD SYSRET SS descriptor attribute issue · 61f01dd9
      Andy Lutomirski authored
      AMD CPUs don't reinitialize the SS descriptor on SYSRET, so SYSRET with
      SS == 0 results in an invalid usermode state in which SS is apparently
      equal to __USER_DS but causes #SS if used.
      
      Work around the issue by setting SS to __KERNEL_DS __switch_to, thus
      ensuring that SYSRET never happens with SS set to NULL.
      
      This was exposed by a recent vDSO cleanup.
      
      Fixes: e7d6eefa
      
       x86/vdso32/syscall.S: Do not load __USER32_DS to %ss
      Signed-off-by: default avatarAndy Lutomirski <luto@kernel.org>
      Cc: Peter Anvin <hpa@zytor.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Denys Vlasenko <vda.linux@googlemail.com>
      Cc: Brian Gerst <brgerst@gmail.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      61f01dd9
  2. 02 Apr, 2015 2 commits
    • Ross Zwisler's avatar
      x86/asm: Add support for the CLWB instruction · d9dc64f3
      Ross Zwisler authored
      Add support for the new CLWB (cache line write back)
      instruction.  This instruction was announced in the document
      "Intel Architecture Instruction Set Extensions Programming
      Reference" with reference number 319433-022.
      
        https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf
      
      
      
      The CLWB instruction is used to write back the contents of
      dirtied cache lines to memory without evicting the cache lines
      from the processor's cache hierarchy.  This should be used in
      favor of clflushopt or clflush in cases where you require the
      cache line to be written to memory but plan to access the data
      again in the near future.
      
      One of the main use cases for this is with persistent memory
      where CLWB can be used with PCOMMIT to ensure that data has been
      accepted to memory and is durable on the DIMM.
      
      This function shows how to properly use CLWB/CLFLUSHOPT/CLFLUSH
      and PCOMMIT with appropriate fencing:
      
      void flush_and_commit_buffer(void *vaddr, unsigned int size)
      {
      	void *vend = vaddr + size - 1;
      
      	for (; vaddr < vend; vaddr += boot_cpu_data.x86_clflush_size)
      		clwb(vaddr);
      
      	/* Flush any possible final partial cacheline */
      	clwb(vend);
      
      	/*
      	 * Use SFENCE to order CLWB/CLFLUSHOPT/CLFLUSH cache flushes.
      	 * (MFENCE via mb() also works)
      	 */
      	wmb();
      
      	/* PCOMMIT and the required SFENCE for ordering */
      	pcommit_sfence();
      }
      
      After this function completes the data pointed to by vaddr is
      has been accepted to memory and will be durable if the vaddr
      points to persistent memory.
      
      Regarding the details of how the alternatives assembly is set
      up, we need one additional byte at the beginning of the CLFLUSH
      so that we can flip it into a CLFLUSHOPT by changing that byte
      into a 0x66 prefix.  Two options are to either insert a 1 byte
      ASM_NOP1, or to add a 1 byte NOP_DS_PREFIX.  Both have no
      functional effect with the plain CLFLUSH, but I've been told
      that executing a CLFLUSH + prefix should be faster than
      executing a CLFLUSH + NOP.
      
      We had to hard code the assembly for CLWB because, lacking the
      ability to assemble the CLWB instruction itself, the next
      closest thing is to have an xsaveopt instruction with a 0x66
      prefix.  Unfortunately XSAVEOPT itself is also relatively new,
      and isn't included by all the GCC versions that the kernel needs
      to support.
      Signed-off-by: default avatarRoss Zwisler <ross.zwisler@linux.intel.com>
      Acked-by: default avatarBorislav Petkov <bp@suse.de>
      Acked-by: default avatarH. Peter Anvin <hpa@linux.intel.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1422377631-8986-3-git-send-email-ross.zwisler@linux.intel.com
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      d9dc64f3
    • Alexander Shishkin's avatar
      x86: Add Intel Processor Trace (INTEL_PT) cpu feature detection · ed69628b
      Alexander Shishkin authored
      
      
      Intel Processor Trace is an architecture extension that allows for program
      flow tracing.
      Signed-off-by: default avatarAlexander Shishkin <alexander.shishkin@linux.intel.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kaixu Xia <kaixu.xia@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Robert Richter <rric@kernel.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: acme@infradead.org
      Cc: adrian.hunter@intel.com
      Cc: kan.liang@intel.com
      Cc: markus.t.metzger@intel.com
      Cc: mathieu.poirier@linaro.org
      Link: http://lkml.kernel.org/r/1421237903-181015-11-git-send-email-alexander.shishkin@linux.intel.com
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      ed69628b
  3. 25 Feb, 2015 1 commit
    • Peter P Waskiewicz Jr's avatar
      x86: Add support for Intel Cache QoS Monitoring (CQM) detection · cbc82b17
      Peter P Waskiewicz Jr authored
      
      
      This patch adds support for the new Cache QoS Monitoring (CQM)
      feature found in future Intel Xeon processors.  It includes the
      new values to track CQM resources to the cpuinfo_x86 structure,
      plus the CPUID detection routines for CQM.
      
      CQM allows a process, or set of processes, to be tracked by the CPU
      to determine the cache usage of that task group.  Using this data
      from the CPU, software can be written to extract this data and
      report cache usage and occupancy for a particular process, or
      group of processes.
      
      More information about Cache QoS Monitoring can be found in the
      Intel (R) x86 Architecture Software Developer Manual, section 17.14.
      Signed-off-by: default avatarPeter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
      Signed-off-by: default avatarMatt Fleming <matt.fleming@intel.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Chris Webb <chris@arachsys.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Igor Mammedov <imammedo@redhat.com>
      Cc: Jacob Shin <jacob.w.shin@gmail.com>
      Cc: Jan Beulich <JBeulich@suse.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kanaka Juvva <kanaka.d.juvva@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Steven Honeyman <stevenhoneyman@gmail.com>
      Cc: Steven Rostedt <srostedt@redhat.com>
      Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com>
      Link: http://lkml.kernel.org/r/1422038748-21397-5-git-send-email-matt@codeblueprint.co.uk
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      cbc82b17
  4. 23 Feb, 2015 2 commits
    • Borislav Petkov's avatar
      x86/alternatives: Make JMPs more robust · 48c7a250
      Borislav Petkov authored
      
      
      Up until now we had to pay attention to relative JMPs in alternatives
      about how their relative offset gets computed so that the jump target
      is still correct. Or, as it is the case for near CALLs (opcode e8), we
      still have to go and readjust the offset at patching time.
      
      What is more, the static_cpu_has_safe() facility had to forcefully
      generate 5-byte JMPs since we couldn't rely on the compiler to generate
      properly sized ones so we had to force the longest ones. Worse than
      that, sometimes it would generate a replacement JMP which is longer than
      the original one, thus overwriting the beginning of the next instruction
      at patching time.
      
      So, in order to alleviate all that and make using JMPs more
      straight-forward we go and pad the original instruction in an
      alternative block with NOPs at build time, should the replacement(s) be
      longer. This way, alternatives users shouldn't pay special attention
      so that original and replacement instruction sizes are fine but the
      assembler would simply add padding where needed and not do anything
      otherwise.
      
      As a second aspect, we go and recompute JMPs at patching time so that we
      can try to make 5-byte JMPs into two-byte ones if possible. If not, we
      still have to recompute the offsets as the replacement JMP gets put far
      away in the .altinstr_replacement section leading to a wrong offset if
      copied verbatim.
      
      For example, on a locally generated kernel image
      
        old insn VA: 0xffffffff810014bd, CPU feat: X86_FEATURE_ALWAYS, size: 2
        __switch_to:
         ffffffff810014bd:      eb 21                   jmp ffffffff810014e0
        repl insn: size: 5
        ffffffff81d0b23c:       e9 b1 62 2f ff          jmpq ffffffff810014f2
      
      gets corrected to a 2-byte JMP:
      
        apply_alternatives: feat: 3*32+21, old: (ffffffff810014bd, len: 2), repl: (ffffffff81d0b23c, len: 5)
        alt_insn: e9 b1 62 2f ff
        recompute_jumps: next_rip: ffffffff81d0b241, tgt_rip: ffffffff810014f2, new_displ: 0x00000033, ret len: 2
        converted to: eb 33 90 90 90
      
      and a 5-byte JMP:
      
        old insn VA: 0xffffffff81001516, CPU feat: X86_FEATURE_ALWAYS, size: 2
        __switch_to:
         ffffffff81001516:      eb 30                   jmp ffffffff81001548
        repl insn: size: 5
         ffffffff81d0b241:      e9 10 63 2f ff          jmpq ffffffff81001556
      
      gets shortened into a two-byte one:
      
        apply_alternatives: feat: 3*32+21, old: (ffffffff81001516, len: 2), repl: (ffffffff81d0b241, len: 5)
        alt_insn: e9 10 63 2f ff
        recompute_jumps: next_rip: ffffffff81d0b246, tgt_rip: ffffffff81001556, new_displ: 0x0000003e, ret len: 2
        converted to: eb 3e 90 90 90
      
      ... and so on.
      
      This leads to a net win of around
      
      40ish replacements * 3 bytes savings =~ 120 bytes of I$
      
      on an AMD guest which means some savings of precious instruction cache
      bandwidth. The padding to the shorter 2-byte JMPs are single-byte NOPs
      which on smart microarchitectures means discarding NOPs at decode time
      and thus freeing up execution bandwidth.
      Signed-off-by: default avatarBorislav Petkov <bp@suse.de>
      48c7a250
    • Borislav Petkov's avatar
      x86/alternatives: Add instruction padding · 4332195c
      Borislav Petkov authored
      
      
      Up until now we have always paid attention to make sure the length of
      the new instruction replacing the old one is at least less or equal to
      the length of the old instruction. If the new instruction is longer, at
      the time it replaces the old instruction it will overwrite the beginning
      of the next instruction in the kernel image and cause your pants to
      catch fire.
      
      So instead of having to pay attention, teach the alternatives framework
      to pad shorter old instructions with NOPs at buildtime - but only in the
      case when
      
        len(old instruction(s)) < len(new instruction(s))
      
      and add nothing in the >= case. (In that case we do add_nops() when
      patching).
      
      This way the alternatives user shouldn't have to care about instruction
      sizes and simply use the macros.
      
      Add asm ALTERNATIVE* flavor macros too, while at it.
      
      Also, we need to save the pad length in a separate struct alt_instr
      member for NOP optimization and the way to do that reliably is to carry
      the pad length instead of trying to detect whether we're looking at
      single-byte NOPs or at pathological instruction offsets like e9 90 90 90
      90, for example, which is a valid instruction.
      
      Thanks to Michael Matz for the great help with toolchain questions.
      Signed-off-by: default avatarBorislav Petkov <bp@suse.de>
      4332195c
  5. 20 Feb, 2015 1 commit
    • Ross Zwisler's avatar
      x86/asm: Add support for the pcommit instruction · 719d359d
      Ross Zwisler authored
      Add support for the new pcommit (persistent commit) instruction.
      This instruction was announced in the document "Intel
      Architecture Instruction Set Extensions Programming Reference"
      with reference number 319433-022:
      
        https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf
      
      
      
      The pcommit instruction ensures that data that has been flushed
      from the processor's cache hierarchy with clwb, clflushopt or
      clflush is accepted to memory and is durable on the DIMM.  The
      primary use case for this is persistent memory.
      
      This function shows how to properly use clwb/clflushopt/clflush
      and pcommit with appropriate fencing:
      
      void flush_and_commit_buffer(void *vaddr, unsigned int size)
      {
      	void *vend = vaddr + size - 1;
      
      	for (; vaddr < vend; vaddr += boot_cpu_data.x86_clflush_size)
      		clwb(vaddr);
      
      	/* Flush any possible final partial cacheline */
      	clwb(vend);
      
      	/*
      	 * sfence to order clwb/clflushopt/clflush cache flushes
      	 * mfence via mb() also works
      	 */
      	wmb();
      
      	/* pcommit and the required sfence for ordering */
      	pcommit_sfence();
      }
      
      After this function completes the data pointed to by vaddr is
      has been accepted to memory and will be durable if the vaddr
      points to persistent memory.
      
      Pcommit must always be ordered by an mfence or sfence, so to
      help simplify things we include both the pcommit and the
      required sfence in the alternatives generated by
      pcommit_sfence().  The other option is to keep them separated,
      but on platforms that don't support pcommit this would then turn
      into:
      
      void flush_and_commit_buffer(void *vaddr, unsigned int size)
      {
              void *vend = vaddr + size - 1;
      
              for (; vaddr < vend; vaddr += boot_cpu_data.x86_clflush_size)
                      clwb(vaddr);
      
              /* Flush any possible final partial cacheline */
              clwb(vend);
      
              /*
               * sfence to order clwb/clflushopt/clflush cache flushes
               * mfence via mb() also works
               */
              wmb();
      
              nop(); /* from pcommit(), via alternatives */
      
              /*
               * sfence to order pcommit
               * mfence via mb() also works
               */
              wmb();
      }
      
      This is still correct, but now you've got two fences separated
      by only a nop.  With the commit and the fence together in
      pcommit_sfence() you avoid the final unneeded fence.
      Signed-off-by: default avatarRoss Zwisler <ross.zwisler@linux.intel.com>
      Acked-by: default avatarBorislav Petkov <bp@suse.de>
      Acked-by: default avatarH. Peter Anvin <hpa@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1424367448-24254-1-git-send-email-ross.zwisler@linux.intel.com
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      719d359d
  6. 03 Dec, 2014 1 commit
  7. 11 Nov, 2014 1 commit
  8. 24 Sep, 2014 1 commit
  9. 11 Sep, 2014 3 commits
    • Dave Hansen's avatar
      x86: Add more disabled features · 9298b815
      Dave Hansen authored
      
      
      The original motivation for these patches was for an Intel CPU
      feature called MPX.  The patch to add a disabled feature for it
      will go in with the other parts of the support.
      
      But, in the meantime, there are a few other features than MPX
      that we can make assumptions about at compile-time based on
      compile options.  Add them to disabled-features.h and check them
      with cpu_feature_enabled().
      
      Note that this gets rid of the last things that needed an #ifdef
      CONFIG_X86_64 in cpufeature.h.  Yay!
      Signed-off-by: default avatarDave Hansen <dave.hansen@linux.intel.com>
      Link: http://lkml.kernel.org/r/20140911211524.C0EC332A@viggo.jf.intel.com
      
      Acked-by: default avatarBorislav Petkov <bp@suse.de>
      Signed-off-by: default avatarH. Peter Anvin <hpa@linux.intel.com>
      9298b815
    • Dave Hansen's avatar
      x86: Introduce disabled-features · 381aa07a
      Dave Hansen authored
      
      
      I believe the REQUIRED_MASK aproach was taken so that it was
      easier to consult in assembly (arch/x86/kernel/verify_cpu.S).
      DISABLED_MASK does not have the same restriction, but I
      implemented it the same way for consistency.
      
      We have a REQUIRED_MASK... which does two things:
      1. Keeps a list of cpuid bits to check in very early boot and
         refuse to boot if those are not present.
      2. Consulted during cpu_has() checks, which allows us to
         optimize out things at compile-time.  In other words, if we
         *KNOW* we will not boot with the feature off, then we can
         safely assume that it will be present forever.
      
      But, we don't have a similar mechanism for CPU features which
      may be present but that we know we will not use.  We simply
      use our existing mechanisms to repeatedly check the status of
      the bit at runtime (well, the alternatives patching helps here
      but it does not provide compile-time optimization).
      
      Adding a feature to disabled-features.h allows the bit to be
      checked via a new macro: cpu_feature_enabled().  Note that
      for features in DISABLED_MASK, checks with this macro have
      all of the benefits of an #ifdef.  Before, we would have done
      this in a header:
      
      #ifdef CONFIG_X86_INTEL_MPX
      #define cpu_has_mpx cpu_has(X86_FEATURE_MPX)
      #else
      #define cpu_has_mpx 0
      #endif
      
      and this in the code:
      
      	if (cpu_has_mpx)
      		do_some_mpx_thing();
      
      Now, just add your feature to DISABLED_MASK and you can do this
      everywhere, and get the same benefits you would have from
      #ifdefs:
      
      	if (cpu_feature_enabled(X86_FEATURE_MPX))
      		do_some_mpx_thing();
      
      We need a new function and *not* a modification to cpu_has()
      because there are cases where we actually need to check the CPU
      itself, despite what features the kernel supports.  The best
      example of this is a hypervisor which has no control over what
      features its guests are using and where the guest does not depend
      on the host for support.
      Signed-off-by: default avatarDave Hansen <dave.hansen@linux.intel.com>
      Link: http://lkml.kernel.org/r/20140911211513.9E35E931@viggo.jf.intel.com
      
      Acked-by: default avatarBorislav Petkov <bp@suse.de>
      Signed-off-by: default avatarH. Peter Anvin <hpa@linux.intel.com>
      381aa07a
    • Dave Hansen's avatar
      x86: Axe the lightly-used cpu_has_pae · c8128cce
      Dave Hansen authored
      
      
      cpu_has_pae is only referenced in one place: the X86_32 kexec
      code (in a file not even built on 64-bit).  It hardly warrants
      its own macro, or the trouble we go to ensuring that it can't
      be called in X86_64 code.
      
      Axe the macro and replace it with a direct cpu feature check.
      Signed-off-by: default avatarDave Hansen <dave.hansen@linux.intel.com>
      Link: http://lkml.kernel.org/r/20140911211511.AD76E774@viggo.jf.intel.com
      
      Acked-by: default avatarBorislav Petkov <bp@suse.de>
      Signed-off-by: default avatarH. Peter Anvin <hpa@linux.intel.com>
      c8128cce
  10. 17 Aug, 2014 1 commit
    • Josh Triplett's avatar
      x86: Support compiling out human-friendly processor feature names · 9def39be
      Josh Triplett authored
      
      
      The table mapping CPUID bits to human-readable strings takes up a
      non-trivial amount of space, and only exists to support /proc/cpuinfo
      and a couple of kernel messages.  Since programs depend on the format of
      /proc/cpuinfo, force inclusion of the table when building with /proc
      support; otherwise, support omitting that table to save space, in which
      case the kernel messages will print features numerically instead.
      
      In addition to saving 1408 bytes out of vmlinux, this also saves 1373
      bytes out of the uncompressed setup code, which contributes directly to
      the size of bzImage.
      Signed-off-by: default avatarJosh Triplett <josh@joshtriplett.org>
      9def39be
  11. 14 Jul, 2014 2 commits
  12. 18 Jun, 2014 1 commit
  13. 29 May, 2014 2 commits
  14. 27 Feb, 2014 2 commits
  15. 20 Feb, 2014 1 commit
  16. 18 Feb, 2014 1 commit
  17. 06 Dec, 2013 1 commit
  18. 10 Oct, 2013 1 commit
  19. 28 Jun, 2013 1 commit
  20. 20 Jun, 2013 3 commits
  21. 25 Apr, 2013 1 commit
  22. 21 Apr, 2013 1 commit
  23. 10 Apr, 2013 1 commit
  24. 02 Apr, 2013 6 commits
  25. 15 Mar, 2013 1 commit
  26. 16 Feb, 2013 1 commit