Commit d70b3ef5 authored by Linus Torvalds's avatar Linus Torvalds
Browse files

Merge branch 'x86-core-for-linus' of git://

Pull x86 core updates from Ingo Molnar:
 "There were so many changes in the x86/asm, x86/apic and x86/mm topics
  in this cycle that the topical separation of -tip broke down somewhat -
  so the result is a more traditional architecture pull request,
  collected into the 'x86/core' topic.

  The topics were still maintained separately as far as possible, so
  bisectability and conceptual separation should still be pretty good -
  but there were a handful of merge points to avoid excessive
  dependencies (and conflicts) that would have been poorly tested in the

  The next cycle will hopefully be much more quiet (or at least will
  have fewer dependencies).

  The main changes in this cycle were:

   * x86/apic changes, with related IRQ core changes: (Jiang Liu, Thomas

     - This is the second and most intrusive part of changes to the x86
       interrupt handling - full conversion to hierarchical interrupt

          [IOAPIC domain]   -----
          [MSI domain]      --------[Remapping domain] ----- [ Vector domain ]
                                 |   (optional)          |
          [HPET MSI domain] -----                        |
          [DMAR domain]     -----------------------------
          [Legacy domain]   -----------------------------

       This now reflects the actual hardware and allowed us to distangle
       the domain specific code from the underlying parent domain, which
       can be optional in the case of interrupt remapping.  It's a clear
       separation of functionality and removes quite some duct tape
       constructs which plugged the remap code between ioapic/msi/hpet
       and the vector management.

     - Intel IOMMU IRQ remapping enhancements, to allow direct interrupt
       injection into guests (Feng Wu)

   * x86/asm changes:

     - Tons of cleanups and small speedups, micro-optimizations.  This
       is in preparation to move a good chunk of the low level entry
       code from assembly to C code (Denys Vlasenko, Andy Lutomirski,
       Brian Gerst)

     - Moved all system entry related code to a new home under
       arch/x86/entry/ (Ingo Molnar)

     - Removal of the fragile and ugly CFI dwarf debuginfo annotations.
       Conversion to C will reintroduce many of them - but meanwhile
       they are only getting in the way, and the upstream kernel does
       not rely on them (Ingo Molnar)

     - NOP handling refinements. (Borislav Petkov)

   * x86/mm changes:

     - Big PAT and MTRR rework: making the code more robust and
       preparing to phase out exposing direct MTRR interfaces to drivers -
       in favor of using PAT driven interfaces (Toshi Kani, Luis R
       Rodriguez, Borislav Petkov)

     - New ioremap_wt()/set_memory_wt() interfaces to support
       Write-Through cached memory mappings.  This is especially
       important for good performance on NVDIMM hardware (Toshi Kani)

   * x86/ras changes:

     - Add support for deferred errors on AMD (Aravind Gopalakrishnan)

       This is an important RAS feature which adds hardware support for
       poisoned data.  That means roughly that the hardware marks data
       which it has detected as corrupted but wasn't able to correct, as
       poisoned data and raises an APIC interrupt to signal that in the
       form of a deferred error.  It is the OS's responsibility then to
       take proper recovery action and thus prolonge system lifetime as
       far as possible.

     - Add support for Intel "Local MCE"s: upcoming CPUs will support
       CPU-local MCE interrupts, as opposed to the traditional system-
       wide broadcasted MCE interrupts (Ashok Raj)

     - Misc cleanups (Borislav Petkov)

   * x86/platform changes:

     - Intel Atom SoC updates

  ... and lots of other cleanups, fixlets and other changes - see the
  shortlog and the Git log for details"

* 'x86-core-for-linus' of git:// (222 commits)
  x86/hpet: Use proper hpet device number for MSI allocation
  x86/hpet: Check for irq==0 when allocating hpet MSI interrupts
  x86/mm/pat, drivers/infiniband/ipath: Use arch_phys_wc_add() and require PAT disabled
  x86/mm/pat, drivers/media/ivtv: Use arch_phys_wc_add() and require PAT disabled
  x86/platform/intel/baytrail: Add comments about why we disabled HPET on Baytrail
  genirq: Prevent crash in irq_move_irq()
  genirq: Enhance irq_data_to_desc() to support hierarchy irqdomain
  iommu, x86: Properly handle posted interrupts for IOMMU hotplug
  iommu, x86: Provide irq_remapping_cap() interface
  iommu, x86: Setup Posted-Interrupts capability for Intel iommu
  iommu, x86: Add cap_pi_support() to detect VT-d PI capability
  iommu, x86: Avoid migrating VT-d posted interrupts
  iommu, x86: Save the mode (posted or remapped) of an IRTE
  iommu, x86: Implement irq_set_vcpu_affinity for intel_ir_chip
  iommu: dmar: Provide helper to copy shared irte fields
  iommu: dmar: Extend struct irte for VT-d Posted-Interrupts
  iommu: Add new member capability to struct irq_remap_ops
  x86/asm/entry/64: Disentangle error_entry/exit gsbase/ebx/usermode code
  x86/asm/entry/32: Shorten __audit_syscall_entry() args preparation
  x86/asm/entry/32: Explain reloading of registers after __audit_syscall_entry()
parents 650ec5a6 7ef3d7d5
......@@ -746,6 +746,12 @@ bytes respectively. Such letter suffixes can also be entirely omitted. [CPU_IDLE]
disable the cpuidle sub-system
[X86] Delay for N microsec between assert and de-assert
of APIC INIT to start processors. This delay occurs
on every CPU online, such as boot, and resume from suspend.
Default: 10000
cpcihp_generic= [HW,PCI] Generic port I/O CompactPCI driver
......@@ -18,10 +18,10 @@ Some of these entries are:
- system_call: syscall instruction from 64-bit code.
- ia32_syscall: int 0x80 from 32-bit or 64-bit code; compat syscall
- entry_INT80_compat: int 0x80 from 32-bit or 64-bit code; compat syscall
either way.
- ia32_syscall, ia32_sysenter: syscall and sysenter from 32-bit
- entry_INT80_compat, ia32_sysenter: syscall and sysenter from 32-bit
- interrupt: An array of entries. Every IDT vector that doesn't
MTRR (Memory Type Range Register) control
3 Jun 1999
Richard Gooch
Richard Gooch <> - 3 Jun 1999
Luis R. Rodriguez <> - April 9, 2015
Phasing out MTRR use
MTRR use is replaced on modern x86 hardware with PAT. Over time the only type
of effective MTRR that is expected to be supported will be for write-combining.
As MTRR use is phased out device drivers should use arch_phys_wc_add() to make
MTRR effective on non-PAT systems while a no-op on PAT enabled systems.
For details refer to Documentation/x86/pat.txt.
On Intel P6 family processors (Pentium Pro, Pentium II and later)
the Memory Type Range Registers (MTRRs) may be used to control
......@@ -12,7 +12,7 @@ virtual addresses.
PAT allows for different types of memory attributes. The most commonly used
ones that will be supported at this time are Write-back, Uncached,
Write-combined and Uncached Minus.
Write-combined, Write-through and Uncached Minus.
......@@ -34,16 +34,23 @@ ioremap | -- | UC- | UC- |
| | | |
ioremap_cache | -- | WB | WB |
| | | |
ioremap_uc | -- | UC | UC |
| | | |
ioremap_nocache | -- | UC- | UC- |
| | | |
ioremap_wc | -- | -- | WC |
| | | |
ioremap_wt | -- | -- | WT |
| | | |
set_memory_uc | UC- | -- | -- |
set_memory_wb | | | |
| | | |
set_memory_wc | WC | -- | -- |
set_memory_wb | | | |
| | | |
set_memory_wt | WT | -- | -- |
set_memory_wb | | | |
| | | |
pci sysfs resource | -- | -- | UC- |
| | | |
pci sysfs resource_wc | -- | -- | WC |
......@@ -102,7 +109,38 @@ wants to export a RAM region, it has to do set_memory_uc() or set_memory_wc()
as step 0 above and also track the usage of those pages and use set_memory_wb()
before the page is freed to free pool.
MTRR effects on PAT / non-PAT systems
The following table provides the effects of using write-combining MTRRs when
using ioremap*() calls on x86 for both non-PAT and PAT systems. Ideally
mtrr_add() usage will be phased out in favor of arch_phys_wc_add() which will
be a no-op on PAT enabled systems. The region over which a arch_phys_wc_add()
is made, should already have been ioremapped with WC attributes or PAT entries,
this can be done by using ioremap_wc() / set_memory_wc(). Devices which
combine areas of IO memory desired to remain uncacheable with areas where
write-combining is desirable should consider use of ioremap_uc() followed by
set_memory_wc() to white-list effective write-combined areas. Such use is
nevertheless discouraged as the effective memory type is considered
implementation defined, yet this strategy can be used as last resort on devices
with size-constrained regions where otherwise MTRR write-combining would
otherwise not be effective.
MTRR Non-PAT PAT Linux ioremap value Effective memory type
(*) denotes implementation defined and is discouraged
......@@ -115,8 +153,8 @@ can be more restrictive, in case of any existing aliasing for that address.
For example: If there is an existing uncached mapping, a new ioremap_wc can
return uncached mapping in place of write-combine requested.
set_memory_[uc|wc] and set_memory_wb should be used in pairs, where driver will
first make a region uc or wc and switch it back to wb after use.
set_memory_[uc|wc|wt] and set_memory_wb should be used in pairs, where driver
will first make a region uc, wc or wt and switch it back to wb after use.
Over time writes to /proc/mtrr will be deprecated in favor of using PAT based
interfaces. Users writing to /proc/mtrr are suggested to use above interfaces.
......@@ -124,7 +162,7 @@ interfaces. Users writing to /proc/mtrr are suggested to use above interfaces.
Drivers should use ioremap_[uc|wc] to access PCI BARs with [uc|wc] access
Drivers should use set_memory_[uc|wc] to set access type for RAM ranges.
Drivers should use set_memory_[uc|wc|wt] to set access type for RAM ranges.
PAT debugging
......@@ -31,6 +31,9 @@ Machine check
(e.g. BIOS or hardware monitoring applications), conflicting
with OS's error handling, and you cannot deactivate the agent,
then this option will be a help.
Do not opt-in to Local MCE delivery. Use legacy method
to broadcast MCEs.
Enable logging of machine checks left over from booting.
Disabled by default on AMD because some BIOS leave bogus ones.
......@@ -10894,7 +10894,7 @@ M: Andy Lutomirski <>
T: git git:// x86/vdso
S: Maintained
F: arch/x86/vdso/
F: arch/x86/entry/vdso/
M: Mauro Carvalho Chehab <>
......@@ -20,6 +20,7 @@ extern void iounmap(const void __iomem *addr);
#define ioremap_nocache(phy, sz) ioremap(phy, sz)
#define ioremap_wc(phy, sz) ioremap(phy, sz)
#define ioremap_wt(phy, sz) ioremap(phy, sz)
/* Change struct page to physical address */
#define page_to_phys(page) (page_to_pfn(page) << PAGE_SHIFT)
......@@ -336,6 +336,7 @@ extern void _memset_io(volatile void __iomem *, int, size_t);
#define ioremap_nocache(cookie,size) __arm_ioremap((cookie), (size), MT_DEVICE)
#define ioremap_cache(cookie,size) __arm_ioremap((cookie), (size), MT_DEVICE_CACHED)
#define ioremap_wc(cookie,size) __arm_ioremap((cookie), (size), MT_DEVICE_WC)
#define ioremap_wt(cookie,size) __arm_ioremap((cookie), (size), MT_DEVICE)
#define iounmap __arm_iounmap
......@@ -170,6 +170,7 @@ extern void __iomem *ioremap_cache(phys_addr_t phys_addr, size_t size);
#define ioremap(addr, size) __ioremap((addr), (size), __pgprot(PROT_DEVICE_nGnRE))
#define ioremap_nocache(addr, size) __ioremap((addr), (size), __pgprot(PROT_DEVICE_nGnRE))
#define ioremap_wc(addr, size) __ioremap((addr), (size), __pgprot(PROT_NORMAL_NC))
#define ioremap_wt(addr, size) __ioremap((addr), (size), __pgprot(PROT_DEVICE_nGnRE))
#define iounmap __iounmap
......@@ -296,6 +296,7 @@ extern void __iounmap(void __iomem *addr);
#define ioremap_wc ioremap_nocache
#define ioremap_wt ioremap_nocache
#define cached(addr) P1SEGADDR(addr)
#define uncached(addr) P2SEGADDR(addr)
......@@ -17,6 +17,8 @@
#ifdef __KERNEL__
#include <linux/types.h>
#include <asm/virtconvert.h>
#include <asm/string.h>
......@@ -265,7 +267,7 @@ static inline void __iomem *ioremap_nocache(unsigned long physaddr, unsigned lon
return __ioremap(physaddr, size, IOMAP_NOCACHE_SER);
static inline void __iomem *ioremap_writethrough(unsigned long physaddr, unsigned long size)
static inline void __iomem *ioremap_wt(unsigned long physaddr, unsigned long size)
return __ioremap(physaddr, size, IOMAP_WRITETHROUGH);
#define irq_remapping_enabled 0
#define dmar_alloc_hwirq create_irq
#define dmar_free_hwirq destroy_irq
......@@ -165,7 +165,7 @@ static struct irq_chip dmar_msi_type = {
.irq_retrigger = ia64_msi_retrigger_irq,
static int
static void
msi_compose_msg(struct pci_dev *pdev, unsigned int irq, struct msi_msg *msg)
struct irq_cfg *cfg = irq_cfg + irq;
......@@ -186,21 +186,29 @@ msi_compose_msg(struct pci_dev *pdev, unsigned int irq, struct msi_msg *msg)
return 0;
int arch_setup_dmar_msi(unsigned int irq)
int dmar_alloc_hwirq(int id, int node, void *arg)
int ret;
int irq;
struct msi_msg msg;
ret = msi_compose_msg(NULL, irq, &msg);
if (ret < 0)
return ret;
dmar_msi_write(irq, &msg);
irq_set_chip_and_handler_name(irq, &dmar_msi_type, handle_edge_irq,
return 0;
irq = create_irq();
if (irq > 0) {
irq_set_handler_data(irq, arg);
irq_set_chip_and_handler_name(irq, &dmar_msi_type,
handle_edge_irq, "edge");
msi_compose_msg(NULL, irq, &msg);
dmar_msi_write(irq, &msg);
return irq;
void dmar_free_hwirq(int irq)
irq_set_handler_data(irq, NULL);
......@@ -68,6 +68,7 @@ static inline void __iomem *ioremap(unsigned long offset, unsigned long size)
extern void iounmap(volatile void __iomem *addr);
#define ioremap_nocache(off,size) ioremap(off,size)
#define ioremap_wc ioremap_nocache
#define ioremap_wt ioremap_nocache
* IO bus memory addresses are also 1:1 with the physical address
......@@ -20,6 +20,8 @@
#ifdef __KERNEL__
#include <linux/compiler.h>
#include <asm/raw_io.h>
#include <asm/virtconvert.h>
......@@ -465,7 +467,7 @@ static inline void __iomem *ioremap_nocache(unsigned long physaddr, unsigned lon
return __ioremap(physaddr, size, IOMAP_NOCACHE_SER);
static inline void __iomem *ioremap_writethrough(unsigned long physaddr,
static inline void __iomem *ioremap_wt(unsigned long physaddr,
unsigned long size)
return __ioremap(physaddr, size, IOMAP_WRITETHROUGH);
......@@ -3,6 +3,8 @@
#ifdef __KERNEL__
#include <asm/virtconvert.h>
#include <asm-generic/iomap.h>
......@@ -153,7 +155,7 @@ static inline void *ioremap_nocache(unsigned long physaddr, unsigned long size)
return __ioremap(physaddr, size, IOMAP_NOCACHE_SER);
static inline void *ioremap_writethrough(unsigned long physaddr, unsigned long size)
static inline void *ioremap_wt(unsigned long physaddr, unsigned long size)
return __ioremap(physaddr, size, IOMAP_WRITETHROUGH);
......@@ -160,6 +160,9 @@ extern void __iounmap(void __iomem *addr);
#define ioremap_wc(offset, size) \
__ioremap((offset), (size), _PAGE_WR_COMBINE)
#define ioremap_wt(offset, size) \
__ioremap((offset), (size), 0)
#define iounmap(addr) \
......@@ -39,10 +39,10 @@ extern resource_size_t isa_mem_base;
extern void iounmap(void __iomem *addr);
extern void __iomem *ioremap(phys_addr_t address, unsigned long size);
#define ioremap_writethrough(addr, size) ioremap((addr), (size))
#define ioremap_nocache(addr, size) ioremap((addr), (size))
#define ioremap_fullcache(addr, size) ioremap((addr), (size))
#define ioremap_wc(addr, size) ioremap((addr), (size))
#define ioremap_wt(addr, size) ioremap((addr), (size))
#endif /* CONFIG_MMU */
......@@ -282,6 +282,7 @@ static inline void __iomem *ioremap_nocache(unsigned long offset, unsigned long
#define ioremap_wc ioremap_nocache
#define ioremap_wt ioremap_nocache
static inline void iounmap(void __iomem *addr)
......@@ -46,6 +46,7 @@ static inline void iounmap(void __iomem *addr)
#define ioremap_wc ioremap_nocache
#define ioremap_wt ioremap_nocache
/* Pages to physical address... */
#define page_to_phys(page) virt_to_phys(page_to_virt(page))
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment