1. 03 Nov, 2013 1 commit
    • Paolo Bonzini's avatar
      KVM: x86: fix emulation of "movzbl %bpl, %eax" · daf72722
      Paolo Bonzini authored
      When I was looking at RHEL5.9's failure to start with
      unrestricted_guest=0/emulate_invalid_guest_state=1, I got it working with a
      slightly older tree than kvm.git.  I now debugged the remaining failure,
      which was introduced by commit 660696d1 (KVM: X86 emulator: fix
      source operand decoding for 8bit mov[zs]x instructions, 2013-04-24)
      introduced a similar mis-emulation to the one in commit 8acb4207
      
       (KVM:
      fix sil/dil/bpl/spl in the mod/rm fields, 2013-05-30).  The incorrect
      decoding occurs in 8-bit movzx/movsx instructions whose 8-bit operand
      is sil/dil/bpl/spl.
      
      Needless to say, "movzbl %bpl, %eax" does occur in RHEL5.9's decompression
      prolog, just a handful of instructions before finally giving control to
      the decompressed vmlinux and getting out of the invalid guest state.
      
      Because OpMem8 bypasses decode_modrm, the same handling of the REX prefix
      must be applied to OpMem8.
      Reported-by: default avatarMichele Baldessari <michele@redhat.com>
      Cc: stable@vger.kernel.org
      Cc: Gleb Natapov <gleb@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGleb Natapov <gleb@redhat.com>
      daf72722
  2. 31 Oct, 2013 4 commits
  3. 30 Oct, 2013 8 commits
  4. 28 Oct, 2013 3 commits
  5. 15 Oct, 2013 2 commits
  6. 14 Oct, 2013 1 commit
  7. 10 Oct, 2013 4 commits
    • Ingo Molnar's avatar
      compiler/gcc4: Add quirk for 'asm goto' miscompilation bug · 3f0116c3
      Ingo Molnar authored
      Fengguang Wu, Oleg Nesterov and Peter Zijlstra tracked down
      a kernel crash to a GCC bug: GCC miscompiles certain 'asm goto'
      constructs, as outlined here:
      
        http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58670
      
      
      
      Implement a workaround suggested by Jakub Jelinek.
      Reported-and-tested-by: default avatarFengguang Wu <fengguang.wu@intel.com>
      Reported-by: default avatarOleg Nesterov <oleg@redhat.com>
      Reported-by: default avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
      Suggested-by: default avatarJakub Jelinek <jakub@redhat.com>
      Reviewed-by: default avatarRichard Henderson <rth@twiddle.net>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: <stable@kernel.org>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      3f0116c3
    • Arthur Chunqi Li's avatar
      KVM: nVMX: Fully support nested VMX preemption timer · 7854cbca
      Arthur Chunqi Li authored
      
      
      This patch contains the following two changes:
      1. Fix the bug in nested preemption timer support. If vmexit L2->L0
      with some reasons not emulated by L1, preemption timer value should
      be save in such exits.
      2. Add support of "Save VMX-preemption timer value" VM-Exit controls
      to nVMX.
      
      With this patch, nested VMX preemption timer features are fully
      supported.
      Signed-off-by: default avatarArthur Chunqi Li <yzt356@gmail.com>
      Reviewed-by: default avatarGleb Natapov <gleb@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      7854cbca
    • Frediano Ziglio's avatar
      xen: Fix possible user space selector corruption · 7cde9b27
      Frediano Ziglio authored
      Due to the way kernel is initialized under Xen is possible that the
      ring1 selector used by the kernel for the boot cpu end up to be copied
      to userspace leading to segmentation fault in the userspace.
      
      Xen code in the kernel initialize no-boot cpus with correct selectors (ds
      and es set to __USER_DS) but the boot one keep the ring1 (passed by Xen).
      On task context switch (switch_to) we assume that ds, es and cs already
      point to __USER_DS and __KERNEL_CSso these selector are not changed.
      
      If processor is an Intel that support sysenter instruction sysenter/sysexit
      is used so ds and es are not restored switching back from kernel to
      userspace. In the case the selectors point to a ring1 instead of __USER_DS
      the userspace code will crash on first memory access attempt (to be
      precise Xen on the emulated iret used to do sysexit will detect and set ds
      and es to zero which lead to GPF anyway).
      
      Now if an userspace process call kernel using sysenter and get rescheduled
      (for me it happen on a specific init calling wait4) could happen that the
      ring1 selector is set to ds and es.
      
      This is quite hard to detect cause after a while these selectors are fixed
      (__USER_DS seems sticky).
      
      Bisecting the code commit 7076aada
      
       appears
      to be the first one that have this issue.
      Signed-off-by: default avatarFrediano Ziglio <frediano.ziglio@citrix.com>
      Signed-off-by: default avatarStefano Stabellini <stefano.stabellini@eu.citrix.com>
      Reviewed-by: default avatarAndrew Cooper <andrew.cooper3@citrix.com>
      7cde9b27
    • Gleb Natapov's avatar
      KVM: nVMX: fix shadow on EPT · d0d538b9
      Gleb Natapov authored
      72f85795
      
       broke shadow on EPT. This patch reverts it and fixes PAE
      on nEPT (which reverted commit fixed) in other way.
      
      Shadow on EPT is now broken because while L1 builds shadow page table
      for L2 (which is PAE while L2 is in real mode) it never loads L2's
      GUEST_PDPTR[0-3].  They do not need to be loaded because without nested
      virtualization HW does this during guest entry if EPT is disabled,
      but in our case L0 emulates L2's vmentry while EPT is enables, so we
      cannot rely on vmcs12->guest_pdptr[0-3] to contain up-to-date values
      and need to re-read PDPTEs from L2 memory. This is what kvm_set_cr3()
      is doing, but by clearing cache bits during L2 vmentry we drop values
      that kvm_set_cr3() read from memory.
      
      So why the same code does not work for PAE on nEPT? kvm_set_cr3()
      reads pdptes into vcpu->arch.walk_mmu->pdptrs[]. walk_mmu points to
      vcpu->arch.nested_mmu while nested guest is running, but ept_load_pdptrs()
      uses vcpu->arch.mmu which contain incorrect values. Fix that by using
      walk_mmu in ept_(load|save)_pdptrs.
      Signed-off-by: default avatarGleb Natapov <gleb@redhat.com>
      Reviewed-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Tested-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      d0d538b9
  8. 06 Oct, 2013 1 commit
  9. 04 Oct, 2013 3 commits
  10. 03 Oct, 2013 7 commits
  11. 02 Oct, 2013 1 commit
    • David Herrmann's avatar
      x86/simplefb: Mark framebuffer mem-resources as IORESOURCE_BUSY to avoid bootup warning · 29d274b8
      David Herrmann authored
      
      
      IORESOURCE_BUSY is used to mark temporary driver mem-resources
      instead of global regions. This suppresses warnings if regions
      overlap with a region marked as BUSY.
      
      This was always the case for VESA/VGA/EFI framebuffer regions so
      do the same for simplefb regions. The reason we do this is to
      allow device handover to real GPU drivers like
      i915/radeon/nouveau which get the same regions via PCI BARs.
      
      Maybe at some point we will be able to unregister platform
      devices properly during the handover. In this case the simplefb
      region would get removed before the new region is created.
      However, this is currently not the case and would require rather
      huge changes in remove_conflicting_framebuffers(). Add the BUSY
      marker now and try to eventually rewrite the handover for a next release.
      
      Also see kernel/resource.c for more information:
      
        /*
         * if a resource is "BUSY", it's not a hardware resource
         * but a driver mapping of such a resource; we don't want
         * to warn for those; some drivers legitimately map only
         * partial hardware resources. (example: vesafb)
         */
      
      This suppresses warnings like:
      
        ------------[ cut here ]------------
        WARNING: CPU: 2 PID: 199 at arch/x86/mm/ioremap.c:171 __ioremap_caller+0x2e3/0x390()
        Info: mapping multiple BARs. Your kernel is fine.
        Call Trace:
          dump_stack+0x54/0x8d
          warn_slowpath_common+0x7d/0xa0
          warn_slowpath_fmt+0x4c/0x50
          iomem_map_sanity_check+0xac/0xe0
          __ioremap_caller+0x2e3/0x390
          ioremap_wc+0x32/0x40
          i915_driver_load+0x670/0xf50 [i915]
          ...
      Reported-by: default avatarTom Gundersen <teg@jklm.no>
      Tested-by: default avatarTom Gundersen <teg@jklm.no>
      Tested-by: default avatarPavel Roskin <proski@gnu.org>
      Signed-off-by: default avatarDavid Herrmann <dh.herrmann@gmail.com>
      Link: http://lkml.kernel.org/r/1380724864-1757-1-git-send-email-dh.herrmann@gmail.com
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      29d274b8
  12. 01 Oct, 2013 1 commit
  13. 30 Sep, 2013 4 commits
    • Paolo Bonzini's avatar
      KVM: Convert kvm_lock back to non-raw spinlock · 2f303b74
      Paolo Bonzini authored
      In commit e935b837
      
       ("KVM: Convert kvm_lock to raw_spinlock"),
      the kvm_lock was made a raw lock.  However, the kvm mmu_shrink()
      function tries to grab the (non-raw) mmu_lock within the scope of
      the raw locked kvm_lock being held.  This leads to the following:
      
      BUG: sleeping function called from invalid context at kernel/rtmutex.c:659
      in_atomic(): 1, irqs_disabled(): 0, pid: 55, name: kswapd0
      Preemption disabled at:[<ffffffffa0376eac>] mmu_shrink+0x5c/0x1b0 [kvm]
      
      Pid: 55, comm: kswapd0 Not tainted 3.4.34_preempt-rt
      Call Trace:
       [<ffffffff8106f2ad>] __might_sleep+0xfd/0x160
       [<ffffffff817d8d64>] rt_spin_lock+0x24/0x50
       [<ffffffffa0376f3c>] mmu_shrink+0xec/0x1b0 [kvm]
       [<ffffffff8111455d>] shrink_slab+0x17d/0x3a0
       [<ffffffff81151f00>] ? mem_cgroup_iter+0x130/0x260
       [<ffffffff8111824a>] balance_pgdat+0x54a/0x730
       [<ffffffff8111fe47>] ? set_pgdat_percpu_threshold+0xa7/0xd0
       [<ffffffff811185bf>] kswapd+0x18f/0x490
       [<ffffffff81070961>] ? get_parent_ip+0x11/0x50
       [<ffffffff81061970>] ? __init_waitqueue_head+0x50/0x50
       [<ffffffff81118430>] ? balance_pgdat+0x730/0x730
       [<ffffffff81060d2b>] kthread+0xdb/0xe0
       [<ffffffff8106e122>] ? finish_task_switch+0x52/0x100
       [<ffffffff817e1e94>] kernel_thread_helper+0x4/0x10
       [<ffffffff81060c50>] ? __init_kthread_worker+0x
      
      After the previous patch, kvm_lock need not be a raw spinlock anymore,
      so change it back.
      Reported-by: default avatarPaul Gortmaker <paul.gortmaker@windriver.com>
      Cc: kvm@vger.kernel.org
      Cc: gleb@redhat.com
      Cc: jan.kiszka@siemens.com
      Reviewed-by: default avatarGleb Natapov <gleb@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      2f303b74
    • Gleb Natapov's avatar
      KVM: nVMX: Do not generate #DF if #PF happens during exception delivery into L2 · feaf0c7d
      Gleb Natapov authored
      
      
      If #PF happens during delivery of an exception into L2 and L1 also do
      not have the page mapped in its shadow page table then L0 needs to
      generate vmexit to L2 with original event in IDT_VECTORING_INFO, but
      current code combines both exception and generates #DF instead. Fix that
      by providing nVMX specific function to handle page faults during page
      table walk that handles this case correctly.
      Signed-off-by: default avatarGleb Natapov <gleb@redhat.com>
      Reviewed-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      feaf0c7d
    • Gleb Natapov's avatar
      KVM: nVMX: Check all exceptions for intercept during delivery to L2 · e011c663
      Gleb Natapov authored
      
      
      All exceptions should be checked for intercept during delivery to L2,
      but we check only #PF currently. Drop nested_run_pending while we are
      at it since exception cannot be injected during vmentry anyway.
      Signed-off-by: default avatarGleb Natapov <gleb@redhat.com>
      [Renamed the nested_vmx_check_exception function. - Paolo]
      Reviewed-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      e011c663
    • Gleb Natapov's avatar
      KVM: nVMX: Do not put exception that caused vmexit to IDT_VECTORING_INFO · 851eb667
      Gleb Natapov authored
      
      
      If an exception causes vmexit directly it should not be reported in
      IDT_VECTORING_INFO during the exit. For that we need to be able to
      distinguish between exception that is injected into nested VM and one that
      is reinjected because its delivery failed. Fortunately we already have
      mechanism to do so for nested SVM, so here we just use correct function
      to requeue exceptions and make sure that reinjected exception is not
      moved to IDT_VECTORING_INFO during vmexit emulation and not re-checked
      for interception during delivery.
      Signed-off-by: default avatarGleb Natapov <gleb@redhat.com>
      Reviewed-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      851eb667