- 03 Nov, 2013 1 commit
-
-
Paolo Bonzini authored
When I was looking at RHEL5.9's failure to start with unrestricted_guest=0/emulate_invalid_guest_state=1, I got it working with a slightly older tree than kvm.git. I have now debugged the remaining failure, which was introduced by commit 660696d1 (KVM: X86 emulator: fix source operand decoding for 8bit mov[zs]x instructions, 2013-04-24). That commit introduced a mis-emulation similar to the one fixed by commit 8acb4207 (KVM: fix sil/dil/bpl/spl in the mod/rm fields, 2013-05-30). The incorrect decoding occurs in 8-bit movzx/movsx instructions whose 8-bit operand is sil/dil/bpl/spl. Needless to say, "movzbl %bpl, %eax" does occur in RHEL5.9's decompression prolog, just a handful of instructions before finally giving control to the decompressed vmlinux and getting out of the invalid guest state. Because OpMem8 bypasses decode_modrm, the same handling of the REX prefix must be applied to OpMem8. Reported-by:
Michele Baldessari <michele@redhat.com> Cc: stable@vger.kernel.org Cc: Gleb Natapov <gleb@redhat.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com> Signed-off-by:
Gleb Natapov <gleb@redhat.com>
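A minimal, self-contained sketch of the decoding rule at issue (illustrative only, not the emulator's actual code): in 64-bit mode, 8-bit register numbers 4-7 select ah/ch/dh/bh when no REX prefix is present, but spl/bpl/sil/dil once any REX prefix is seen. An OpMem8 operand that bypasses decode_modrm() must apply the same rule, otherwise "movzbl %bpl, %eax" picks the wrong source register.

    /* Illustrative sketch: how a REX prefix changes the meaning of an
     * 8-bit register number. Not the kernel's emulator code. */
    #include <stdbool.h>
    #include <stdio.h>

    static const char *reg8_name(unsigned int reg, bool has_rex)
    {
        static const char * const legacy[8] =
            { "al", "cl", "dl", "bl", "ah",  "ch",  "dh",  "bh"  };
        static const char * const rex[8] =
            { "al", "cl", "dl", "bl", "spl", "bpl", "sil", "dil" };

        return has_rex ? rex[reg & 7] : legacy[reg & 7];
    }

    int main(void)
    {
        /* movzbl %bpl,%eax encodes register number 5 with a REX prefix. */
        printf("reg 5 without REX: %s\n", reg8_name(5, false)); /* ch  */
        printf("reg 5 with REX:    %s\n", reg8_name(5, true));  /* bpl */
        return 0;
    }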
-
- 31 Oct, 2013 4 commits
-
-
Paolo Bonzini authored
Yet another instruction that we fail to emulate, this time found in Windows 2008R2 32-bit. Reviewed-by:
Gleb Natapov <gleb@redhat.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Michael S. Tsirkin authored
mst can't be blamed for lack of switch entries: the issue is with msrs actually. Signed-off-by:
Michael S. Tsirkin <mst@redhat.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
The loop was always using 0 as the index. This means that any rubbish after the first element of the array went undetected. It seems reasonable to assume that no KVM userspace did that. Reviewed-by:
Gleb Natapov <gleb@redhat.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
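A small, self-contained illustration of the bug class described above (hypothetical names, not the actual KVM code): the validation loop indexes element 0 on every pass, so anything bogus after the first array element slips through.

    #include <errno.h>
    #include <stdio.h>

    struct entry { unsigned int flags; };

    static int entry_is_valid(const struct entry *e)
    {
        return e->flags == 0;   /* in this toy model, only zero flags are valid */
    }

    static int validate_entries(const struct entry *e, int n, int buggy)
    {
        for (int i = 0; i < n; i++) {
            /* the bug: always checking &e[0] instead of &e[i] */
            const struct entry *cur = buggy ? &e[0] : &e[i];
            if (!entry_is_valid(cur))
                return -EINVAL;
        }
        return 0;
    }

    int main(void)
    {
        struct entry arr[2] = { { .flags = 0 }, { .flags = 0xdead } };

        printf("buggy loop: %d (bad element missed)\n",   validate_entries(arr, 2, 1));
        printf("fixed loop: %d (bad element rejected)\n", validate_entries(arr, 2, 0));
        return 0;
    }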
-
Paolo Bonzini authored
The KVM_SET_XCRS ioctl must accept anything that KVM_GET_XCRS could return. XCR0's bit 0 is always 1 in real processors with XSAVE, and KVM_GET_XCRS will always leave bit 0 set even if the emulated processor does not have XSAVE. So, KVM_SET_XCRS must ignore that bit when checking for attempts to enable unsupported save states. Reviewed-by:
Gleb Natapov <gleb@redhat.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
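A hedged sketch of the acceptance rule described above (simplified, with made-up helper names; the real check lives in the x86 KVM code): treat XCR0 bit 0 as implicitly supported, because KVM_GET_XCRS always reports it set, and only reject bits for save states the emulated CPU genuinely lacks.

    #include <stdbool.h>
    #include <stdint.h>

    #define XSTATE_FP (1ull << 0)  /* x87/FP state; always 1 on real XSAVE hardware */

    static bool xcr0_acceptable(uint64_t xcr0, uint64_t supported_xcr0)
    {
        /* Accept anything KVM_GET_XCRS could have returned: ignore bit 0,
         * refuse attempts to enable any other unsupported save state. */
        return (xcr0 & ~(supported_xcr0 | XSTATE_FP)) == 0;
    }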
-
- 30 Oct, 2013 8 commits
-
-
Alex Williamson authored
We currently use some ad-hoc arch variables tied to legacy KVM device assignment to manage emulation of instructions that depend on whether non-coherent DMA is present. Create an interface for this, adapting legacy KVM device assignment and adding VFIO via the KVM-VFIO device. For now we assume that non-coherent DMA is possible any time we have a VFIO group. Eventually an interface can be developed as part of the VFIO external user interface to query the coherency of a group. Signed-off-by:
Alex Williamson <alex.williamson@redhat.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
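A hedged sketch of the shape of such an interface (names and layout are illustrative; see the patch for the real API): a simple reference count of non-coherent DMA users that the emulation paths can query, instead of peeking at device-assignment state directly.

    #include <stdatomic.h>
    #include <stdbool.h>

    struct noncoherent_dma_state {
        atomic_int count;   /* VFIO groups / assigned devices that may do non-coherent DMA */
    };

    static void register_noncoherent_dma(struct noncoherent_dma_state *s)
    {
        atomic_fetch_add(&s->count, 1);
    }

    static void unregister_noncoherent_dma(struct noncoherent_dma_state *s)
    {
        atomic_fetch_sub(&s->count, 1);
    }

    static bool has_noncoherent_dma(struct noncoherent_dma_state *s)
    {
        /* e.g. WBINVD emulation only needs to flush caches when this is true */
        return atomic_load(&s->count) > 0;
    }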
-
Alex Williamson authored
Default to operating in coherent mode. This simplifies the logic when we switch to a model of registering and unregistering noncoherent I/O with KVM. Signed-off-by:
Alex Williamson <alex.williamson@redhat.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Alex Williamson authored
So far we've succeeded at making KVM and VFIO mostly unaware of each other, but areas are cropping up where a connection beyond eventfds and irqfds needs to be made. This patch introduces a KVM-VFIO device that is meant to be a gateway for such interaction. The user creates the device and can add and remove VFIO groups to it via file descriptors. When a group is added, KVM verifies the group is valid and gets a reference to it via the VFIO external user interface. Signed-off-by:
Alex Williamson <alex.williamson@redhat.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
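A hedged userspace sketch of how the new device is meant to be driven (error handling trimmed; the ioctl and attribute names are the ones this series proposes, so treat the final ABI documentation as authoritative): create the device on the VM file descriptor, then hand it VFIO group file descriptors.

    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    static int kvm_vfio_add_group(int vm_fd, int vfio_group_fd)
    {
        struct kvm_create_device cd = { .type = KVM_DEV_TYPE_VFIO };

        if (ioctl(vm_fd, KVM_CREATE_DEVICE, &cd) < 0)
            return -1;

        struct kvm_device_attr attr = {
            .group = KVM_DEV_VFIO_GROUP,
            .attr  = KVM_DEV_VFIO_GROUP_ADD,
            .addr  = (__u64)(unsigned long)&vfio_group_fd,  /* pointer to the group fd */
        };

        return ioctl(cd.fd, KVM_SET_DEVICE_ATTR, &attr);
    }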
-
Borislav Petkov authored
This basically came from the need to be able to boot 32-bit Atom SMP guests on an AMD host, i.e. a host which doesn't support MOVBE. As a matter of fact, qemu has recently received MOVBE support, but we cannot share that with kvm emulation and thus we have to do this in the host. We're way faster in kvm anyway. :-) So, we piggyback on the #UD path and emulate the MOVBE functionality. With it, an 8-core SMP guest boots in under 6 seconds. Also, requesting MOVBE emulation needs to happen explicitly to work, i.e. qemu -cpu n270,+movbe... Just FYI, a fairly straightforward boot of a MOVBE-enabled 3.9-rc6+ kernel in kvm executes MOVBE ~60K times. Signed-off-by:
Andre Przywara <andre@andrep.de> Signed-off-by:
Borislav Petkov <bp@suse.de> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
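For reference, a hedged sketch of the semantics being emulated (plain C, not the emulator's implementation): MOVBE is simply a load or store combined with a byte swap.

    #include <stdint.h>
    #include <stdio.h>

    /* MOVBE r32, m32: load 4 bytes and reverse their order. */
    static uint32_t movbe_load32(const uint8_t *mem)
    {
        return ((uint32_t)mem[0] << 24) | ((uint32_t)mem[1] << 16) |
               ((uint32_t)mem[2] << 8)  |  (uint32_t)mem[3];
    }

    int main(void)
    {
        uint8_t buf[4] = { 0x12, 0x34, 0x56, 0x78 };

        /* A plain little-endian load would yield 0x78563412; MOVBE yields: */
        printf("movbe loads 0x%08x\n", movbe_load32(buf));   /* 0x12345678 */
        return 0;
    }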
-
Borislav Petkov authored
Add initial support for handling three-byte instructions in the emulator. Signed-off-by:
Borislav Petkov <bp@suse.de> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Borislav Petkov authored
Call it EmulateOnUD which is exactly what we're trying to do with vendor-specific instructions. Rename ->only_vendor_specific_insn to something shorter, while at it. Signed-off-by:
Borislav Petkov <bp@suse.de> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Borislav Petkov authored
Add a field to the current emulation context which contains the instruction opcode length. This will streamline handling of opcodes of different length. Signed-off-by:
Borislav Petkov <bp@suse.de> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Borislav Petkov authored
Add a kvm ioctl which states which system functionality kvm emulates. The format used is that of CPUID and we return the corresponding CPUID bits set for which we do emulate functionality. Make sure ->padding is being passed on clean from userspace so that we can use it for something in the future, after the ioctl gets cast in stone. s/kvm_dev_ioctl_get_supported_cpuid/kvm_dev_ioctl_get_cpuid/ while at it. Signed-off-by:
Borislav Petkov <bp@suse.de> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
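A hedged userspace sketch of querying the emulated feature set (this assumes the ioctl name that eventually landed upstream, KVM_GET_EMULATED_CPUID, and uses a fixed entry count for brevity; real callers should retry on E2BIG):

    #include <stdlib.h>
    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    /* kvm_fd is the fd returned by open("/dev/kvm", O_RDWR). */
    static struct kvm_cpuid2 *get_emulated_cpuid(int kvm_fd)
    {
        int nent = 64;   /* arbitrary guess for this sketch */
        struct kvm_cpuid2 *cpuid =
            calloc(1, sizeof(*cpuid) + nent * sizeof(struct kvm_cpuid_entry2));

        if (!cpuid)
            return NULL;

        cpuid->nent = nent;
        if (ioctl(kvm_fd, KVM_GET_EMULATED_CPUID, cpuid) < 0) {
            free(cpuid);
            return NULL;
        }
        return cpuid;   /* entries carry the CPUID bits KVM can emulate */
    }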
-
- 28 Oct, 2013 3 commits
-
-
Jan Kiszka authored
If the host supports it, we can and should expose it to the guest as well, just like we already do with PIN_BASED_VIRTUAL_NMIS. Signed-off-by:
Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Jan Kiszka authored
__vmx_complete_interrupts stored uninjected NMIs in arch.nmi_injected, not arch.nmi_pending. So we actually need to check the former field in vmcs12_save_pending_event. This fixes the eventinj unit test when run in nested KVM. Signed-off-by:
Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Jan Kiszka authored
As long as the hardware provides us 2MB EPT pages, we can also expose them to the guest because our shadow EPT code already supports this feature. Signed-off-by:
Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
- 17 Oct, 2013 4 commits
-
-
Christoffer Dall authored
Support transparent huge pages in KVM/ARM and KVM/ARM64. The transparent_hugepage_adjust code is not very pretty, but this is also how it is solved on x86 and seems to be simply an artifact of how THPs behave. This should eventually be shared across architectures if possible, but that can always be changed down the road. Acked-by:
Marc Zyngier <marc.zyngier@arm.com> Signed-off-by:
Christoffer Dall <christoffer.dall@linaro.org>
-
Christoffer Dall authored
Support huge pages in KVM/ARM and KVM/ARM64. The pud_huge checking on the unmap path may feel a bit silly as the pud_huge check is always defined to false, but the compiler should be smart about this. Note: This deals only with VMAs marked as huge which are allocated by users through hugetlbfs only. Transparent huge pages can only be detected by looking at the underlying pages (or the page tables themselves) and this patch so far simply maps these on a page-by-page level in the Stage-2 page tables. Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Russell King <rmk+kernel@arm.linux.org.uk> Acked-by:
Catalin Marinas <catalin.marinas@arm.com> Acked-by:
Marc Zyngier <marc.zyngier@arm.com> Signed-off-by:
Christoffer Dall <christoffer.dall@linaro.org>
-
Christoffer Dall authored
Update comments to reflect what is really going on and add the TWE bit to the comments in kvm_arm.h. Also rename the function to kvm_handle_wfx, as is done on arm64, for consistency and uber-correctness. Signed-off-by:
Christoffer Dall <christoffer.dall@linaro.org>
-
Marc Zyngier authored
On an (even slightly) oversubscribed system, spinlocks quickly become a bottleneck, as some vcpus are spinning, waiting for a lock to be released, while the vcpu holding the lock may not be running at all. This creates contention, and the observed slowdown is 40x for hackbench. No, this isn't a typo. The solution is to trap blocking WFEs and tell KVM that we're now spinning. This ensures that other vcpus will get a scheduling boost, allowing the lock to be released more quickly. Also, using CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT slightly improves the performance when the VM is severely overcommitted. Quick test to estimate the performance: hackbench 1 process 1000

2xA15 host (baseline):  1.843s
2xA15 guest w/o patch:  2.083s
4xA15 guest w/o patch: 80.212s
8xA15 guest w/o patch: Could not be bothered to find out
2xA15 guest w/ patch:   2.102s
4xA15 guest w/ patch:   3.205s
8xA15 guest w/ patch:   6.887s

So we go from a 40x degradation to 1.5x in the 2x overcommit case, which is vaguely more acceptable. Signed-off-by:
Marc Zyngier <marc.zyngier@arm.com> Signed-off-by:
Christoffer Dall <christoffer.dall@linaro.org>
-
- 15 Oct, 2013 3 commits
-
-
Raghavendra K T authored
We use a jump label to enable pv-spinlocks. With the changes in commit 442e0973 (Merge branch 'x86/jumplabel'), the jump label behaviour has changed in a way that would result in an eventual hang of the VM, since we would end up in a situation where slow-path locks halt the vcpus but we are unable to wake a vcpu up via the lock releaser's unlock kick. A similar problem exists in Xen; a more detailed description is available in commit a945928e (xen: Do not enable spinlocks before jump_label_init() has executed). This patch splits kvm_spinlock_init to separate the jump label changes from the pvops patching, and also enables the jump label only after jump_label_init() has run. Signed-off-by:
Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Reviewed-by:
Paolo Bonzini <pbonzini@redhat.com> Reviewed-by:
Steven Rostedt <rostedt@goodmis.org> Signed-off-by:
Gleb Natapov <gleb@redhat.com>
-
chai wen authored
Page pinning is not mandatory in KVM async page fault processing, since after the async page fault event is delivered to a guest, the guest accesses the page once again and does its own GUP. Drop the FOLL_GET flag from GUP in the async_pf code, and simplify the check/clear processing. Suggested-by:
Gleb Natapov <gleb@redhat.com> Signed-off-by:
Gu zheng <guz.fnst@cn.fujitsu.com> Signed-off-by:
chai wen <chaiw.fnst@cn.fujitsu.com> Signed-off-by:
Gleb Natapov <gleb@redhat.com>
-
Marek Szyprowski authored
This reverts commit 10bcdfb8. There is no consensus on the bindings for the reserved memory, so the code for handling it will be reverted. Signed-off-by:
Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by:
Grant Likely <grant.likely@linaro.org>
-
- 14 Oct, 2013 7 commits
-
-
Christoffer Dall authored
Now that the main KVM code relying on these defines has been moved to the x86-specific part of the world, we can get rid of them. Signed-off-by:
Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by:
Gleb Natapov <gleb@redhat.com>
-
Christoffer Dall authored
Now that the main KVM code relying on these defines has been moved to the x86-specific part of the world, we can get rid of them. Signed-off-by:
Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by:
Gleb Natapov <gleb@redhat.com>
-
Christoffer Dall authored
Now that the main KVM code relying on these defines has been moved to the x86-specific part of the world, we can get rid of them. Signed-off-by:
Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by:
Gleb Natapov <gleb@redhat.com>
-
Christoffer Dall authored
Now that the main KVM code relying on these defines has been moved to the x86-specific part of the world, we can get rid of them. Signed-off-by:
Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by:
Gleb Natapov <gleb@redhat.com>
-
Christoffer Dall authored
Now that the main KVM code relying on these defines has been moved to the x86-specific part of the world, we can get rid of them. Signed-off-by:
Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by:
Gleb Natapov <gleb@redhat.com>
-
Christoffer Dall authored
The KVM_HPAGE_* defines are a little artificial on ARM, since the huge page size is statically defined at compile time and there is only a single huge page size. Now that the main KVM code relying on these defines has been moved to the x86-specific part of the world, we can get rid of them. Signed-off-by:
Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by:
Gleb Natapov <gleb@redhat.com>
-
Christoffer Dall authored
The gfn_to_index function relies on huge page defines which either may not make sense on systems that don't support huge pages or are defined in an inconvenient way for other architectures. Since this is x86-specific, move the function to arch/x86/include/asm/kvm_host.h. Signed-off-by:
Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by:
Gleb Natapov <gleb@redhat.com>
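For context, a simplified sketch of the helper being moved (the real version in arch/x86/include/asm/kvm_host.h is written in terms of KVM_HPAGE_GFN_SHIFT; the shift computation below is an illustrative stand-in): it yields the index of a guest frame within its memslot at a given huge-page level.

    #include <stdint.h>

    typedef uint64_t gfn_t;

    /* x86 page-table levels shift by 9 bits each: 4K, 2M, 1G, ... */
    #define HPAGE_GFN_SHIFT(level)  (((level) - 1) * 9)

    static inline gfn_t gfn_to_index(gfn_t gfn, gfn_t base_gfn, int level)
    {
        return (gfn >> HPAGE_GFN_SHIFT(level)) -
               (base_gfn >> HPAGE_GFN_SHIFT(level));
    }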
-
- 13 Oct, 2013 9 commits
-
-
AKASHI Takahiro authored
In ftrace_syscall_enter(),

    syscall_get_arguments(..., 0, n, ...)
        if (i == 0) { <handle ORIG_r0> ...; n--; }
        memcpy(..., n * sizeof(args[0]));

If the number of arguments (n) is zero and the argument index (i) is also zero in syscall_get_arguments(), none of the arguments should be copied by memcpy(). Otherwise 'n--' becomes a huge positive number and an unexpected amount of data is copied. Tracing system calls which take no arguments, say sync(void), may hit this case and eventually corrupt the system. This patch fixes the issue in both syscall_get_arguments() and syscall_set_arguments(). Cc: <stable@vger.kernel.org> Acked-by:
Will Deacon <will.deacon@arm.com> Signed-off-by:
AKASHI Takahiro <takahiro.akashi@linaro.org> Signed-off-by:
Will Deacon <will.deacon@arm.com> Signed-off-by:
Russell King <rmk+kernel@arm.linux.org.uk>
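A hedged, self-contained sketch of the fix described above (the register layout and names are simplified stand-ins for the real ARM implementation): bail out when no arguments are requested, before the ORIG_r0 special case decrements n.

    #include <string.h>

    static void get_syscall_args(unsigned long *args, const unsigned long *regs,
                                 unsigned int i, unsigned int n)
    {
        if (n == 0)
            return;          /* the added guard: nothing to copy for sync(void) etc. */

        if (i == 0) {        /* argument 0 is taken from ORIG_r0 */
            args[0] = regs[0];   /* illustrative ORIG_r0 slot */
            args++;
            i++;
            n--;             /* without the guard above, 0 - 1 wraps to a huge count */
        }

        memcpy(args, &regs[i], n * sizeof(args[0]));
    }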
-
Yuvaraj Kumar C D authored
Without the "clock-frequency" property in arch timer node, could able to see the below crash dump. [<c0014e28>] (unwind_backtrace+0x0/0xf4) from [<c0011808>] (show_stack+0x10/0x14) [<c0011808>] (show_stack+0x10/0x14) from [<c036ac1c>] (dump_stack+0x7c/0xb0) [<c036ac1c>] (dump_stack+0x7c/0xb0) from [<c01ab760>] (Ldiv0_64+0x8/0x18) [<c01ab760>] (Ldiv0_64+0x8/0x18) from [<c0062f60>] (clockevents_config.part.2+0x1c/0x74) [<c0062f60>] (clockevents_config.part.2+0x1c/0x74) from [<c0062fd8>] (clockevents_config_and_register+0x20/0x2c) [<c0062fd8>] (clockevents_config_and_register+0x20/0x2c) from [<c02b8e8c>] (arch_timer_setup+0xa8/0x134) [<c02b8e8c>] (arch_timer_setup+0xa8/0x134) from [<c04b47b4>] (arch_timer_init+0x1f4/0x24c) [<c04b47b4>] (arch_timer_init+0x1f4/0x24c) from [<c04b40d8>] (clocksource_of_init+0x34/0x58) [<c04b40d8>] (clocksource_of_init+0x34/0x58) from [<c049ed8c>] (time_init+0x20/0x2c) [<c049ed8c>] (time_init+0x20/0x2c) from [<c049b95c>] (start_kernel+0x1e0/0x39c) THis is because the Exynos u-boot, for example on the Chromebooks, doesn't set up the CNTFRQ register as expected by arch_timer. Instead, we have to specify the frequency in the device tree like this. Signed-off-by:
Yuvaraj Kumar C D <yuvaraj.cd@samsung.com> [olof: Changed subject, added comment, elaborated on commit message] Signed-off-by:
Olof Johansson <olof@lixom.net>
-
Helge Deller authored
Signed-off-by:
Helge Deller <deller@gmx.de>
-
John David Anglin authored
The attached change defers the initialization of the variables tsk, mm and flags until they are needed. As a result, the code won't crash if a kernel probe is done with a corrupt context and the code will be better optimized. Signed-off-by:
John David Anglin <dave.anglin@bell.net> Signed-off-by:
Helge Deller <deller@gmx.de>
-
Helge Deller authored
Running an "echo t > /proc/sysrq-trigger" crashes the parisc kernel. The problem is, that in print_worker_info() we try to read the workqueue info via the probe_kernel_read() functions which use pagefault_disable() to avoid crashes like this: probe_kernel_read(&pwq, &worker->current_pwq, sizeof(pwq)); probe_kernel_read(&wq, &pwq->wq, sizeof(wq)); probe_kernel_read(name, wq->name, sizeof(name) - 1); The problem here is, that the first probe_kernel_read(&pwq) might return zero in pwq and as such the following probe_kernel_reads() try to access contents of the page zero which is read protected and generate a kernel segfault. With this patch we fix the interruption handler to call parisc_terminate() directly only if pagefault_disable() was not called (in which case preempt_count()==0). Otherwise we hand over to the pagefault handler which will try to look up the faulting address in the fixup tables. Signed-off-by:
Helge Deller <deller@gmx.de> Cc: <stable@vger.kernel.org> # v3.0+ Signed-off-by:
John David Anglin <dave.anglin@bell.net> Signed-off-by:
Helge Deller <deller@gmx.de>
-
Helge Deller authored
Signed-off-by:
Helge Deller <deller@gmx.de>
-
Helge Deller authored
Signed-off-by:
Helge Deller <deller@gmx.de>
-
Jiang Liu authored
Commit 9a46ad6d "smp: make smp_call_function_many() use logic similar to smp_call_function_single()" has unified the way to handle single and multiple cross-CPU function calls. Now only one interrupt is needed for architecture specific code to support generic SMP function call interfaces, so kill the redundant single function call interrupt. Signed-off-by:
Jiang Liu <jiang.liu@huawei.com> Cc: Jiang Liu <liuj97@gmail.com> Signed-off-by:
Helge Deller <deller@gmx.de>
-
Geert Uytterhoeven authored
ERROR: "flush_cache_page" [drivers/staging/lustre/lustre/libcfs/libcfs.ko] undefined! Signed-off-by:
Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by:
Helge Deller <deller@gmx.de>
-
- 12 Oct, 2013 1 commit
-
-
Jonathan Austin authored
This patch adds support for running Cortex-A7 guests on Cortex-A7 hosts. As the Cortex-A7 is architecturally compatible with the A15, this patch is largely just generalising existing code. Areas where 'implementation defined' behaviour is identical for the A7 and A15 are moved so the code can be used by both cores. The check to ensure that coprocessor register tables are sorted correctly is also moved into 'common' code, to avoid each new cpu doing its own check (and possibly forgetting to do so!). Signed-off-by:
Jonathan Austin <jonathan.austin@arm.com> Acked-by:
Marc Zyngier <marc.zyngier@arm.com> Signed-off-by:
Christoffer Dall <christoffer.dall@linaro.org>
-