- May 22, 2011
-
-
OGAWA Hirofumi authored
The mmu_notifier can be called immediately after registration, as in the trace below, so KVM has to initialize kvm->mmu_lock before registering it.

BUG: spinlock bad magic on CPU#0, kswapd0/342
 lock: ffff8800af8c4000, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0
Pid: 342, comm: kswapd0 Not tainted 2.6.39-rc5+ #1
Call Trace:
 [<ffffffff8118ce61>] spin_bug+0x9c/0xa3
 [<ffffffff8118ce91>] do_raw_spin_lock+0x29/0x13c
 [<ffffffff81024923>] ? flush_tlb_others_ipi+0xaf/0xfd
 [<ffffffff812e22f3>] _raw_spin_lock+0x9/0xb
 [<ffffffffa0582325>] kvm_mmu_notifier_clear_flush_young+0x2c/0x66 [kvm]
 [<ffffffff810d3ff3>] __mmu_notifier_clear_flush_young+0x2b/0x57
 [<ffffffff810c8761>] page_referenced_one+0x88/0xea
 [<ffffffff810c89bf>] page_referenced+0x1fc/0x256
 [<ffffffff810b2771>] shrink_page_list+0x187/0x53a
 [<ffffffff810b2ed7>] shrink_inactive_list+0x1e0/0x33d
 [<ffffffff810acf95>] ? determine_dirtyable_memory+0x15/0x27
 [<ffffffff812e90ee>] ? call_function_single_interrupt+0xe/0x20
 [<ffffffff810b3356>] shrink_zone+0x322/0x3de
 [<ffffffff810a9587>] ? zone_watermark_ok_safe+0xe2/0xf1
 [<ffffffff810b3928>] kswapd+0x516/0x818
 [<ffffffff810b3412>] ? shrink_zone+0x3de/0x3de
 [<ffffffff81053d17>] kthread+0x7d/0x85
 [<ffffffff812e9394>] kernel_thread_helper+0x4/0x10
 [<ffffffff81053c9a>] ? __init_kthread_worker+0x37/0x37
 [<ffffffff812e9390>] ? gs_change+0xb/0xb

Signed-off-by:
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by:
Avi Kivity <avi@redhat.com>
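A minimal sketch of the ordering fix in kvm_create_vm() (surrounding code elided; names as in kvm_main.c):

    static struct kvm *kvm_create_vm(void)
    {
            /* ... allocation and other setup ... */
            spin_lock_init(&kvm->mmu_lock);
            /* registering the notifier can immediately trigger callbacks
             * that take mmu_lock, so the lock must already be initialized */
            r = kvm_init_mmu_notifier(kvm);
            /* ... */
    }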
-
Takuya Yoshikawa authored
This way, we can avoid checking the userspace address many times when we read guest memory. Although we could do the same for writes if we checked which slots are writable, we do not care about writes for now: reading guest memory happens more often than writing it. [avi: change VERIFY_READ to VERIFY_WRITE] Signed-off-by:
Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by:
Avi Kivity <avi@redhat.com>
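A hedged sketch of the scheme (the exact call sites differ; VERIFY_WRITE also implies read access):

    /* at KVM_SET_USER_MEMORY_REGION time: validate once */
    if (!access_ok(VERIFY_WRITE,
                   (void __user *)(unsigned long)mem->userspace_addr,
                   mem->memory_size))
            goto out;

    /* on the guest-memory read path: the no-check variant is now safe */
    r = __copy_from_user(data, (void __user *)addr + offset, len);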
-
Liu Yuan authored
The ioapic_debug() call in ioapic_deliver() misnames one field it references. This patch corrects it. Signed-off-by:
Liu Yuan <tailai.ly@taobao.com> Signed-off-by:
Avi Kivity <avi@redhat.com>
-
- May 11, 2011
-
-
Xiao Guangrong authored
We can get the memslot id directly from memslot->id. Signed-off-by:
Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by:
Avi Kivity <avi@redhat.com>
-
- Apr 06, 2011
-
-
Gleb Natapov authored
If asynchronous hva_to_pfn() is requested, call GUP with FOLL_NOWAIT to avoid sleeping on I/O. The hwpoison check is done at the same time; otherwise check_user_page_hwpoison() would call GUP again and put the vcpu to sleep. Signed-off-by:
Gleb Natapov <gleb@redhat.com> Signed-off-by:
Avi Kivity <avi@redhat.com>
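A sketch of the fast-fail GUP helper, assuming the 2.6.38-era __get_user_pages() signature (trailing pages/vmas/nonblocking arguments):

    static int get_user_page_nowait(struct task_struct *tsk, struct mm_struct *mm,
                                    unsigned long start, int write,
                                    struct page **page)
    {
            int flags = FOLL_TOUCH | FOLL_NOWAIT | FOLL_HWPOISON | FOLL_GET;

            if (write)
                    flags |= FOLL_WRITE;

            /* FOLL_NOWAIT: return instead of sleeping on I/O;
             * FOLL_HWPOISON: report -EHWPOISON in the same call, so no
             * second GUP is needed for the poison check */
            return __get_user_pages(tsk, mm, start, 1, flags, page, NULL, NULL);
    }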
-
Michael S. Tsirkin authored
irqfd in kvm used flush_work incorrectly: it assumed that work scheduled previously can't run after flush_work, but since kvm uses a non-reentrant workqueue (by means of schedule_work) we need flush_work_sync to get that guarantee. Signed-off-by:
Michael S. Tsirkin <mst@redhat.com> Reported-by:
Jean-Philippe Menil <jean-philippe.menil@univ-nantes.fr> Tested-by:
Jean-Philippe Menil <jean-philippe.menil@univ-nantes.fr> Signed-off-by:
Avi Kivity <avi@redhat.com>
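The difference, sketched on the irqfd shutdown work (a sketch, not the full diff):

    schedule_work(&irqfd->shutdown);
    /* ... */
    /* per the commit message: with a non-reentrant workqueue, flush_work()
     * does not guarantee a previously scheduled instance has finished;
     * flush_work_sync() does */
    flush_work_sync(&irqfd->shutdown);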
-
- Mar 31, 2011
-
-
Lucas De Marchi authored
Fixes generated by 'codespell' and manually reviewed. Signed-off-by:
Lucas De Marchi <lucas.demarchi@profusion.mobi>
-
- Mar 23, 2011
-
-
Akinobu Mita authored
As a preparation for removing ext2 non-atomic bit operations from asm/bitops.h, this converts ext2 non-atomic bit operations to little-endian bit operations. Signed-off-by:
Akinobu Mita <akinobu.mita@gmail.com> Cc: Avi Kivity <avi@redhat.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
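The conversion pattern applied at kvm's call sites (sketched on a little-endian bitmap such as a dirty log):

    unsigned long *bitmap;  /* fixed little-endian layout */

    /* before: ext2 non-atomic helpers from asm/bitops.h */
    ext2_set_bit(nr, bitmap);
    /* after: explicit little-endian helpers */
    __set_bit_le(nr, bitmap);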
-
Akinobu Mita authored
asm-generic/bitops/le.h is only intended to be included directly from asm-generic/bitops/ext2-non-atomic.h or asm-generic/bitops/minix-le.h, which implement generic ext2 or minix bit operations. This stops including asm-generic/bitops/le.h directly and uses ext2 non-atomic bit operations instead. It seems odd to use ext2_set_bit() in kvm, but it will be replaced with __set_bit_le() after little-endian bit operations are introduced for all architectures. This indirect step is necessary to maintain bisectability for some architectures which have their own little-endian bit operations. Signed-off-by:
Akinobu Mita <akinobu.mita@gmail.com> Cc: Avi Kivity <avi@redhat.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Rafael J. Wysocki authored
KVM uses a sysdev class and a sysdev for executing kvm_suspend() after interrupts have been turned off on the boot CPU (during system suspend) and for executing kvm_resume() before turning on interrupts on the boot CPU (during system resume). However, since both of these functions ignore their arguments, the entire mechanism may be replaced with a struct syscore_ops object which is simpler. Signed-off-by:
Rafael J. Wysocki <rjw@sisk.pl> Acked-by:
Avi Kivity <avi@redhat.com>
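The replacement, sketched with the syscore API (callback shapes per struct syscore_ops: int suspend(void), void resume(void)):

    static int kvm_suspend(void)
    {
            if (kvm_usage_count)
                    hardware_disable_nolock(NULL);
            return 0;
    }

    static void kvm_resume(void)
    {
            if (kvm_usage_count)
                    hardware_enable_nolock(NULL);
    }

    static struct syscore_ops kvm_syscore_ops = {
            .suspend = kvm_suspend,
            .resume  = kvm_resume,
    };

    /* at module init, replacing the sysdev class/device pair */
    register_syscore_ops(&kvm_syscore_ops);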
-
- Mar 17, 2011
-
-
Michael S. Tsirkin authored
The RCU use in kvm_irqfd_deassign is tricky: we have rcu_assign_pointer but no synchronize_rcu: synchronize_rcu is done by kvm_irq_routing_update which we share a spinlock with. Fix up a comment in an attempt to make this clearer. Signed-off-by:
Michael S. Tsirkin <mst@redhat.com> Signed-off-by:
Avi Kivity <avi@redhat.com>
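The documented pattern, sketched:

    /* deassign path: unpublish the routing entry... */
    rcu_assign_pointer(irqfd->irq_entry, NULL);
    /* ...but issue no synchronize_rcu() here: kvm_irq_routing_update(),
     * with which we share a spinlock, performs it on our behalf */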
-
Jan Kiszka authored
Code under this lock requires non-preemptibility. Ensure this also holds on -rt kernels by converting it to a raw spinlock. Signed-off-by:
Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by:
Avi Kivity <avi@redhat.com>
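The conversion shape (kvm_lock keeps its name):

    /* before: an ordinary spinlock becomes sleepable on -rt */
    static DEFINE_SPINLOCK(kvm_lock);
    /* after: a raw spinlock keeps spinning, preserving non-preemptibility */
    static DEFINE_RAW_SPINLOCK(kvm_lock);

    raw_spin_lock(&kvm_lock);
    /* ... code that must not be preempted ... */
    raw_spin_unlock(&kvm_lock);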
-
Rik van Riel authored
Instead of sleeping in kvm_vcpu_on_spin, which can cause gigantic slowdowns of certain workloads, use yield_to to get another VCPU in the same KVM guest to run sooner. This seems to give a 10-15% speedup in certain workloads. Signed-off-by:
Rik van Riel <riel@redhat.com> Signed-off-by:
Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by:
Avi Kivity <avi@redhat.com>
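A simplified sketch of the directed yield, assuming the vcpu->pid tracking introduced by the commit directly below (the real code also round-robins via a last-boosted index):

    void kvm_vcpu_on_spin(struct kvm_vcpu *me)
    {
            struct kvm *kvm = me->kvm;
            struct kvm_vcpu *vcpu;
            int i;

            kvm_for_each_vcpu(i, vcpu, kvm) {
                    struct task_struct *task;

                    if (vcpu == me)
                            continue;
                    rcu_read_lock();
                    task = pid_task(rcu_dereference(vcpu->pid), PIDTYPE_PID);
                    if (task)
                            get_task_struct(task);
                    rcu_read_unlock();
                    if (!task)
                            continue;
                    if (yield_to(task, 1)) {
                            /* boosted one runnable vcpu task; done */
                            put_task_struct(task);
                            break;
                    }
                    put_task_struct(task);
            }
    }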
-
Rik van Riel authored
Keep track of which task is running a KVM vcpu. This helps us figure out later what task to wake up if we want to boost a vcpu that got preempted. Unfortunately there are no guarantees that the same task always keeps the same vcpu, so we can only track the task across a single "run" of the vcpu. Signed-off-by:
Rik van Riel <riel@redhat.com> Signed-off-by:
Avi Kivity <avi@redhat.com>
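The bookkeeping, sketched close to the description (a struct pid is stored rather than a task pointer, so a task that exits cannot be dereferenced):

    /* in the KVM_RUN path: the thread driving this vcpu may have changed */
    if (vcpu->pid != task_pid(current)) {
            struct pid *oldpid = vcpu->pid;
            struct pid *newpid = get_task_pid(current, PIDTYPE_PID);

            rcu_assign_pointer(vcpu->pid, newpid);
            synchronize_rcu();
            put_pid(oldpid);
    }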
-
Huang Ying authored
is_hwpoison_address only checks whether the page table entry is hwpoisoned, regardless of the memory page mapped, while __get_user_pages will check both. QEMU will clear the poisoned page table entry (via unmap/map) to make it possible to allocate a new memory page for the virtual address across guest rebooting. But it is also possible that the underlying memory page is kept poisoned even after the corresponding page table entry is cleared, that is, a new memory page cannot be allocated. __get_user_pages can catch these situations. Signed-off-by:
Huang Ying <ying.huang@intel.com> Signed-off-by:
Marcelo Tosatti <mtosatti@redhat.com>
-
Xiao Guangrong authored
Now that we have 'vcpu->mode' to judge whether we need to send an IPI to other cpus, the check is exact, so testing the request bit is needless; as a collateral benefit, we can drop the spinlock. Signed-off-by:
Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by:
Avi Kivity <avi@redhat.com>
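The check, sketched as a fragment of make_all_cpus_request() (simplified; cpus, me, and req come from the surrounding function):

    kvm_for_each_vcpu(i, vcpu, kvm) {
            kvm_make_request(req, vcpu);
            cpu = vcpu->cpu;

            /* pairs with the barrier executed when ->mode is set on entry */
            smp_mb();
            if (cpu != -1 && cpu != me && vcpu->mode == IN_GUEST_MODE)
                    cpumask_set_cpu(cpu, cpus);
    }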
-
Xiao Guangrong authored
Currently we keep track of only two states: guest mode and host mode. This patch adds an "exiting guest mode" state that tells us that an IPI will happen soon, so unless we need to wait for the IPI, we can avoid it completely. Also:

1. There is no need to atomically read/write ->mode in the vcpu's own thread.
2. struct kvm_vcpu is reorganized to explicitly put ->mode and ->requests in the same cache line.

Signed-off-by:
Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by:
Avi Kivity <avi@redhat.com>
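The state set and transition helper, sketched from the description:

    enum {
            OUTSIDE_GUEST_MODE,
            IN_GUEST_MODE,
            EXITING_GUEST_MODE,
    };

    /* Atomically flip IN_GUEST_MODE -> EXITING_GUEST_MODE and return the
     * old state; a kicker that sees EXITING_GUEST_MODE knows an IPI is
     * already on its way and can skip sending another one. */
    static inline int kvm_vcpu_exiting_guest_mode(struct kvm_vcpu *vcpu)
    {
            return cmpxchg(&vcpu->mode, IN_GUEST_MODE, EXITING_GUEST_MODE);
    }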
-
Heiko Carstens authored
Get rid of this warning:

  CC      arch/s390/kvm/../../../virt/kvm/kvm_main.o
  arch/s390/kvm/../../../virt/kvm/kvm_main.c:596:12: warning: 'kvm_create_dirty_bitmap' defined but not used

The only caller of the function is within a !CONFIG_S390 section, so add the same ifdef around kvm_create_dirty_bitmap() as well. Signed-off-by:
Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by:
Marcelo Tosatti <mtosatti@redhat.com>
-
Avi Kivity authored
Instead, drop large mappings, which were the reason we dropped shadow. Signed-off-by:
Avi Kivity <avi@redhat.com> Signed-off-by:
Marcelo Tosatti <mtosatti@redhat.com>
-
- Jan 13, 2011
-
-
Andrea Arcangeli authored
Clean up some code with the common compound_trans_head helper. Signed-off-by:
Andrea Arcangeli <aarcange@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <jweiner@redhat.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Avi Kivity <avi@redhat.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
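The shape of the helper (a sketch; page->first_page is the 2011-era tail-to-head link):

    static inline struct page *compound_trans_head(struct page *page)
    {
            if (PageTail(page)) {
                    struct page *head = page->first_page;

                    smp_rmb();
                    /* the compound page could have been split under us,
                     * so only trust 'head' if the page is still a tail */
                    if (PageTail(page))
                            return head;
            }
            return page;
    }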
-
Andrea Arcangeli authored
For GRU and EPT, we need gup-fast to set the referenced bit too (this is why it is correct to return 0 when shadow_access_mask is zero: it requires gup-fast to set the referenced bit). A qemu-kvm access already sets the young bit in the pte if it isn't zero-copy; if it is zero-copy, or a shadow-paging EPT minor fault, we rely on gup-fast to signal that the page is in use. We also need to check the young bits on the secondary pagetables for NPT, and not for nested shadow mmu, as the data may never get accessed again by the primary pte. Without this closer accuracy, we'd have to remove the heuristic that avoids collapsing hugepages in hugepage virtual regions that have not even a single subpage in use. ->test_young is fully backwards compatible with GRU and other usages that don't have young bits in pagetables set by the hardware and that should nuke the secondary mmu mappings when ->clear_flush_young runs, just like EPT does. Removing the heuristic that checks the young bit in khugepaged/collapse_huge_page completely probably isn't so bad either, but I thought it was worth it, and this makes it reliable. Signed-off-by:
Andrea Arcangeli <aarcange@redhat.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
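How kvm wires up the new hook, sketched (kvm_test_age_hva checks young bits in the EPT/NPT tables without clearing or flushing them):

    static int kvm_mmu_notifier_test_young(struct mmu_notifier *mn,
                                           struct mm_struct *mm,
                                           unsigned long address)
    {
            struct kvm *kvm = container_of(mn, struct kvm, mmu_notifier);

            return kvm_test_age_hva(kvm, address);
    }

    static const struct mmu_notifier_ops kvm_mmu_notifier_ops = {
            /* ... existing clear_flush_young etc. ... */
            .test_young = kvm_mmu_notifier_test_young,
    };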
-
Andrea Arcangeli authored
This should work for both hugetlbfs and transparent hugepages. [akpm@linux-foundation.org: bring forward PageTransCompound() addition for bisectability] Signed-off-by:
Andrea Arcangeli <aarcange@redhat.com> Cc: Avi Kivity <avi@redhat.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- Jan 12, 2011
-
-
Avi Kivity authored
Since vmx blocks INIT signals, we disable virtualization extensions during reboot. This leads to virtualization instructions faulting; we trap these faults and spin while the reboot continues. Unfortunately spinning on a non-preemptible kernel may block a task that reboot depends on; this causes the reboot to hang. Fix by skipping over the instruction and hoping for the best. Signed-off-by:
Avi Kivity <avi@redhat.com>
-
Xiao Guangrong authored
Quote from Avi:

| I don't think we need to flush immediately; set a "tlb dirty" bit somewhere
| that is cleared when we flush the tlb. kvm_mmu_notifier_invalidate_page()
| can consult the bit and force a flush if set.

Signed-off-by:
Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by:
Marcelo Tosatti <mtosatti@redhat.com>
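A sketch of the deferred-flush scheme from the quote (field and function names as in the eventual patch):

    /* instead of flushing immediately, record that remote TLBs are stale */
    vcpu->kvm->tlbs_dirty++;

    /* in kvm_flush_remote_tlbs(): flush, then retire only the dirtiness
     * observed before the flush */
    dirty_count = kvm->tlbs_dirty;
    smp_mb();
    if (make_all_cpus_request(kvm, KVM_REQ_TLB_FLUSH))
            ++kvm->stat.remote_tlb_flush;
    cmpxchg(&kvm->tlbs_dirty, dirty_count, 0);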
-
Michael S. Tsirkin authored
Store the irq routing table pointer in the irqfd object, and use that to inject MSIs directly without bouncing out to a kernel thread. While we touch this structure, rearrange the irqfd fields so the fastpath is better packed for better cache utilization. This also adds some comments about locking rules and RCU usage in the code. Some notes on the design:

- Use a pointer into the rt instead of copying an entry, to make it possible to use RCU, thus side-stepping locking complexities. We also save some memory this way.
- The old workqueue code is still used for level irqs. I don't think we DTRT with level anyway; however, it seems easier to keep the code around, as it has been thought through and debugged, and to fix level later rather than ripping it out and re-instating it later.

Signed-off-by:
Michael S. Tsirkin <mst@redhat.com> Acked-by:
Marcelo Tosatti <mtosatti@redhat.com> Acked-by:
Gregory Haskins <ghaskins@novell.com> Signed-off-by:
Avi Kivity <avi@redhat.com>
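The MSI fastpath, sketched (level irqs still bounce to the work item):

    struct kvm_kernel_irq_routing_entry *irq;

    /* in the eventfd POLLIN wakeup handler */
    rcu_read_lock();
    irq = rcu_dereference(irqfd->irq_entry);
    if (irq)
            /* cached MSI route: inject directly from wakeup context */
            kvm_set_msi(irq, irqfd->kvm, KVM_USERSPACE_IRQ_SOURCE_ID, 1);
    else
            /* no cached route (e.g. level irq): fall back to the workqueue */
            schedule_work(&irqfd->inject);
    rcu_read_unlock();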
-
Takuya Yoshikawa authored
The naming convention of the hardware_[dis|en]able family is a little bit confusing because only hardware_[dis|en]able_all use the _nolock suffix. Renaming the current hardware_[dis|en]able() to *_nolock() and using hardware_[dis|en]able() as wrapper functions which take kvm_lock for them reduces the extra confusion. Signed-off-by:
Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by:
Marcelo Tosatti <mtosatti@redhat.com>
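The wrapper shape after the rename (a sketch; at this point kvm_lock was still an ordinary spinlock):

    static void hardware_enable_nolock(void *junk)
    {
            /* the actual enable work; caller must hold kvm_lock */
    }

    static void hardware_enable(void *junk)
    {
            spin_lock(&kvm_lock);
            hardware_enable_nolock(junk);
            spin_unlock(&kvm_lock);
    }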
-
Takuya Yoshikawa authored
In kvm_cpu_hotplug(), only the CPU_STARTING case is protected by kvm_lock. This patch adds the missing protection for the CPU_DYING case. Signed-off-by:
Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by:
Marcelo Tosatti <mtosatti@redhat.com>
-
Jan Kiszka authored
Any arch not supporting device assignment will also not build assigned-dev.c, so testing for KVM_CAP_DEVICE_DEASSIGNMENT there is pointless, and KVM_CAP_ASSIGN_DEV_IRQ is unconditionally set. Moreover, add a default case for dispatching the ioctl. Acked-by:
Alex Williamson <alex.williamson@redhat.com> Acked-by:
Michael S. Tsirkin <mst@redhat.com> Signed-off-by:
Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by:
Marcelo Tosatti <mtosatti@redhat.com>
-
Jan Kiszka authored
The guest may change states that pci_reset_function does not touch, so we had better save/restore the assigned device across guest usage. Acked-by:
Alex Williamson <alex.williamson@redhat.com> Acked-by:
Michael S. Tsirkin <mst@redhat.com> Signed-off-by:
Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by:
Marcelo Tosatti <mtosatti@redhat.com>
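The save/restore bracket, sketched with the standard PCI API:

    /* when the device is handed to the guest */
    pci_reset_function(dev);
    pci_save_state(dev);    /* snapshot config space the guest may modify */

    /* when the device is returned to the host */
    pci_restore_state(dev); /* undo guest-visible config-space changes */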
-
Jan Kiszka authored
Cosmetic change, but it helps to correlate IRQs with PCI devices. Acked-by:
Alex Williamson <alex.williamson@redhat.com> Acked-by:
Michael S. Tsirkin <mst@redhat.com> Signed-off-by:
Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by:
Marcelo Tosatti <mtosatti@redhat.com>
-
Jan Kiszka authored
This improves IRQ forwarding for assigned devices: by using the kernel's threaded IRQ scheme, we can get rid of the latency-prone work queue and simplify the code in the same run. Moreover, we no longer have to hold assigned_dev_lock while raising the guest IRQ, which can be a lengthy operation as we may have to iterate over all VCPUs. The lock is now only used for synchronizing masking vs. unmasking of INTx-type IRQs, and is thus renamed to intx_lock. Acked-by:
Alex Williamson <alex.williamson@redhat.com> Acked-by:
Michael S. Tsirkin <mst@redhat.com> Signed-off-by:
Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by:
Marcelo Tosatti <mtosatti@redhat.com>
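The core of the conversion (a sketch; the thread function name is taken from the assigned-dev code):

    /* before: request_irq() plus a work item to raise the guest IRQ;
     * after: the kernel's IRQ thread raises it directly and may sleep */
    r = request_threaded_irq(dev->host_irq,
                             NULL,                     /* no hard handler */
                             kvm_assigned_dev_thread,  /* runs in a kthread */
                             IRQF_ONESHOT,             /* line stays masked
                                                          until the thread
                                                          completes */
                             dev->irq_name, dev);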
-
Jan Kiszka authored
When we deassign a guest IRQ, clear the potentially asserted guest line. There might be no chance for the guest to do this, specifically if we switch from INTx to MSI mode. Acked-by:
Alex Williamson <alex.williamson@redhat.com> Acked-by:
Michael S. Tsirkin <mst@redhat.com> Signed-off-by:
Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by:
Marcelo Tosatti <mtosatti@redhat.com>
-
Jan Kiszka authored
IA64 support forces us to abstract the allocation of the kvm structure. But instead of mixing this up with arch-specific initialization and doing the same on destruction, split both steps. This allows moving the generic destruction calls into generic code. It also fixes error clean-up on failures of kvm_create_vm for IA64. Signed-off-by:
Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by:
Avi Kivity <avi@redhat.com>
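The split, sketched: generic fallbacks wrap plain allocation, and an arch like IA64 can override them:

    /* generic defaults, used unless the arch provides its own */
    static inline struct kvm *kvm_arch_alloc_vm(void)
    {
            return kzalloc(sizeof(struct kvm), GFP_KERNEL);
    }

    static inline void kvm_arch_free_vm(struct kvm *kvm)
    {
            kfree(kvm);
    }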
-
Jan Kiszka authored
Signed-off-by:
Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by:
Avi Kivity <avi@redhat.com>
-
Xiao Guangrong authored
In kvm_async_pf_wakeup_all(), we add a dummy apf to vcpu->async_pf.done without holding vcpu->async_pf.lock; this will break if we are handling apfs at that time. Also use 'list_empty_careful()' instead of 'list_empty()'. Signed-off-by:
Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Acked-by:
Gleb Natapov <gleb@redhat.com> Signed-off-by:
Marcelo Tosatti <mtosatti@redhat.com>
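The fix, sketched (names as in async_pf.c):

    /* add the dummy apf under the lock protecting the done list */
    spin_lock(&vcpu->async_pf.lock);
    list_add_tail(&work->link, &vcpu->async_pf.done);
    spin_unlock(&vcpu->async_pf.lock);

    /* lockless peeks at the list must tolerate concurrent adds */
    if (list_empty_careful(&vcpu->async_pf.done))
            return;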
-
Xiao Guangrong authored
If there is no need to inject an async #PF into the PV guest, we can handle more completed apfs at one time, so we can retry the guest #PF as early as possible. Signed-off-by:
Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Acked-by:
Gleb Natapov <gleb@redhat.com> Signed-off-by:
Marcelo Tosatti <mtosatti@redhat.com>
-
Takuya Yoshikawa authored
Let's use the newly introduced vzalloc(). Signed-off-by:
Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by:
Jesper Juhl <jj@chaosbits.net> Signed-off-by:
Marcelo Tosatti <mtosatti@redhat.com>
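The substitution is mechanical:

    /* before */
    bitmap = vmalloc(size);
    if (!bitmap)
            return -ENOMEM;
    memset(bitmap, 0, size);

    /* after: one call, same semantics */
    bitmap = vzalloc(size);
    if (!bitmap)
            return -ENOMEM;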
-
Heiko Carstens authored
Fixes this:

  CC      arch/s390/kvm/../../../virt/kvm/kvm_main.o
  arch/s390/kvm/../../../virt/kvm/kvm_main.c: In function 'kvm_dev_ioctl_create_vm':
  arch/s390/kvm/../../../virt/kvm/kvm_main.c:1828:10: warning: unused variable 'r'

Signed-off-by:
Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by:
Marcelo Tosatti <mtosatti@redhat.com>
-
Heiko Carstens authored
Fixes this:

  CC      arch/s390/kvm/../../../virt/kvm/kvm_main.o
  arch/s390/kvm/../../../virt/kvm/kvm_main.c: In function 'kvm_clear_guest_page':
  arch/s390/kvm/../../../virt/kvm/kvm_main.c:1224:2: warning: passing argument 3 of 'kvm_write_guest_page' makes pointer from integer without a cast
  arch/s390/kvm/../../../virt/kvm/kvm_main.c:1185:5: note: expected 'const void *' but argument is of type 'long unsigned int'

Signed-off-by:
Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by:
Marcelo Tosatti <mtosatti@redhat.com>
-
Takuya Yoshikawa authored
Currently we are using vmalloc() for all dirty bitmaps even if they are small enough (say, less than K bytes). Use kmalloc() if the dirty bitmap size is less than or equal to PAGE_SIZE so that we can avoid vmalloc area usage for VGA. This will also make logging start/stop faster. Signed-off-by:
Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by:
Marcelo Tosatti <mtosatti@redhat.com>
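The size-based choice, sketched (the free path must then match the allocator):

    if (dirty_bytes > PAGE_SIZE)
            memslot->dirty_bitmap = vzalloc(dirty_bytes);
    else
            memslot->dirty_bitmap = kzalloc(dirty_bytes, GFP_KERNEL);

    /* ... and on teardown ... */

    if (dirty_bytes > PAGE_SIZE)
            vfree(memslot->dirty_bitmap);
    else
            kfree(memslot->dirty_bitmap);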
-