1. 27 Jan, 2013 1 commit
    • Frederic Weisbecker's avatar
      cputime: Generic on-demand virtual cputime accounting · abf917cd
      Frederic Weisbecker authored
      
      
      If we want to stop the tick further idle, we need to be
      able to account the cputime without using the tick.
      
      Virtual based cputime accounting solves that problem by
      hooking into kernel/user boundaries.
      
      However implementing CONFIG_VIRT_CPU_ACCOUNTING require
      low level hooks and involves more overhead. But we already
      have a generic context tracking subsystem that is required
      for RCU needs by archs which plan to shut down the tick
      outside idle.
      
      This patch implements a generic virtual based cputime
      accounting that relies on these generic kernel/user hooks.
      
      There are some upsides of doing this:
      
      - This requires no arch code to implement CONFIG_VIRT_CPU_ACCOUNTING
      if context tracking is already built (already necessary for RCU in full
      tickless mode).
      
      - We can rely on the generic context tracking subsystem to dynamically
      (de)activate the hooks, so that we can switch anytime between virtual
      and tick based accounting. This way we don't have the overhead
      of the virtual accounting when the tick is running periodically.
      
      And one downside:
      
      - There is probably more overhead than a native virtual based cputime
      accounting. But this relies on hooks that are already set anyway.
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Namhyung Kim <namhyung.kim@lge.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      abf917cd
  2. 06 Jan, 2013 1 commit
  3. 03 Jan, 2013 1 commit
    • Greg Kroah-Hartman's avatar
      POWERPC: drivers: remove __dev* attributes. · cad5cef6
      Greg Kroah-Hartman authored
      
      
      CONFIG_HOTPLUG is going away as an option.  As a result, the __dev*
      markings need to be removed.
      
      This change removes the use of __devinit, __devexit_p, __devinitdata,
      __devinitconst, and __devexit from these drivers.
      
      Based on patches originally written by Bill Pemberton, but redone by me
      in order to handle some of the coding style issues better, by hand.
      
      Cc: Bill Pemberton <wfp5p@virginia.edu>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      cad5cef6
  4. 02 Jan, 2013 2 commits
    • Gabor Juhos's avatar
      powerpc: Add missing NULL terminator to avoid boot panic on PPC40x · e6449c9b
      Gabor Juhos authored
      The missing NULL terminator can cause a panic on
      PPC405 boards during boot:
      
        Linux/PowerPC load: console=ttyS0,115200 root=/dev/mtdblock1 rootfstype=squashfs,jffs2 noinitrd init=/etc/preinit
        Finalizing device tree... flat tree at 0x6a5160
        bootconsole [udbg0] enabled
        Page fault in user mode with in_atomic() = 1 mm = (null)
        NIP = c0275f50  MSR = fffffffe
        Oops: Weird page fault, sig: 11 [#1]
        PowerPC 40x Platform
        Modules linked in:
        NIP: c0275f50 LR: c0275f60 CTR: c0280000
        REGS: c0275eb0 TRAP: 636f7265   Not tainted  (3.7.1)
        MSR: fffffffe <VEC,VSX,EE,PR,FP,ME,SE,BE,IR,DR,PMM,RI> CR: c06a6190  XER: 00000001
        TASK = c02662a8[0] 'swapper' THREAD: c0274000
        GPR00: c0275ec0 c000c658 c027c4bf 00000000 c0275ee0 c000a0ec c020a1a8 c020a1f0
        GPR08: c020f631 c020f404 c025f078 c025f080 c0275f10
         Call Trace:
         ---[ end trace 31fd0ba7d8756001 ]---
      
        Kernel panic - not syncing: Attempted to kill the idle task!
      
      The panic happens since commit 9597abe0
      (sections: fix section conflicts in arch/powerpc), however the root
      cause of this is that the NULL terminator were not added in commit
      a4f740cf
      
       (of/flattree: Add of_flat_dt_match()
      helper function).
      
      Cc: Grant Likely <grant.likely@secretlab.ca>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarGabor Juhos <juhosg@openwrt.org>
      Signed-off-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
      e6449c9b
    • Shan Hai's avatar
      powerpc/vdso: Remove redundant locking in update_vsyscall_tz() · ce73ec6d
      Shan Hai authored
      The locking in update_vsyscall_tz() is not only unnecessary because the vdso
      code copies the data unproteced in __kernel_gettimeofday() but also
      introduces a hard to reproduce race condition between update_vsyscall()
      and update_vsyscall_tz(), which causes user space process to loop
      forever in vdso code.
      
      The following patch removes the locking from update_vsyscall_tz().
      
      Locking is not only unnecessary because the vdso code copies the data
      unprotected in __kernel_gettimeofday() but also erroneous because updating
      the tb_update_count is not atomic and introduces a hard to reproduce race
      condition between update_vsyscall() and update_vsyscall_tz(), which further
      causes user space process to loop forever in vdso code.
      
      The below scenario describes the race condition,
      x==0	Boot CPU			other CPU
      	proc_P: x==0
      	    timer interrupt
      		update_vsyscall
      x==1		    x++;sync		settimeofday
      					    update_vsyscall_tz
      x==2						x++;sync
      x==3		    sync;x++
      						sync;x++
      	proc_P: x==3 (loops until x becomes even)
      
      Because the ++ operator would be implemented as three instructions and not
      atomic on powerpc.
      
      A similar change was made for x86 in commit 6c260d58
      
      
      ("x86: vdso: Remove bogus locking in update_vsyscall_tz")
      Signed-off-by: default avatarShan Hai <shan.hai@windriver.com>
      CC: <stable@vger.kernel.org>
      Signed-off-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
      ce73ec6d
  5. 20 Dec, 2012 2 commits
  6. 19 Dec, 2012 2 commits
  7. 17 Dec, 2012 2 commits
  8. 13 Dec, 2012 1 commit
  9. 12 Dec, 2012 1 commit
  10. 11 Dec, 2012 2 commits
  11. 06 Dec, 2012 1 commit
  12. 05 Dec, 2012 24 commits
    • Mihai Caraman's avatar
      KVM: PPC: booke: Get/set guest EPCR register using ONE_REG interface · 352df1de
      Mihai Caraman authored
      
      
      Implement ONE_REG interface for EPCR register adding KVM_REG_PPC_EPCR to
      the list of ONE_REG PPC supported registers.
      Signed-off-by: default avatarMihai Caraman <mihai.caraman@freescale.com>
      [agraf: remove HV dependency, use get/put_user]
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      352df1de
    • Mihai Caraman's avatar
      KVM: PPC: bookehv: Add EPCR support in mtspr/mfspr emulation · 38f98824
      Mihai Caraman authored
      
      
      Add EPCR support in booke mtspr/mfspr emulation. EPCR register is defined only
      for 64-bit and HV categories, we will expose it at this point only to 64-bit
      virtual processors running on 64-bit HV hosts.
      Define a reusable setter function for vcpu's EPCR.
      Signed-off-by: default avatarMihai Caraman <mihai.caraman@freescale.com>
      [agraf: move HV dependency in the code]
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      38f98824
    • Mihai Caraman's avatar
      KVM: PPC: bookehv: Add guest computation mode for irq delivery · 95e90b43
      Mihai Caraman authored
      
      
      When delivering guest IRQs, update MSR computation mode according to guest
      interrupt computation mode found in EPCR.
      Signed-off-by: default avatarMihai Caraman <mihai.caraman@freescale.com>
      [agraf: remove HV dependency in the code]
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      95e90b43
    • Alexander Graf's avatar
      KVM: PPC: Make EPCR a valid field for booke64 and bookehv · 62b4db00
      Alexander Graf authored
      
      
      In BookE, EPCR is defined and valid when either the HV or the 64bit
      category are implemented. Reflect this in the field definition.
      
      Today the only KVM target on 64bit is HV enabled, so there is no
      change in actual source code, but this keeps the code closer to the
      spec and doesn't build up artificial road blocks for a PR KVM
      on 64bit.
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      62b4db00
    • Mihai Caraman's avatar
      KVM: PPC: booke: Extend MAS2 EPN mask for 64-bit · e9666ea1
      Mihai Caraman authored
      
      
      Extend MAS2 EPN mask to retain most significant bits on 64-bit hosts.
      Use this mask in tlb effective address accessor.
      Signed-off-by: default avatarMihai Caraman <mihai.caraman@freescale.com>
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      e9666ea1
    • Mihai Caraman's avatar
      KVM: PPC: e500: Mask MAS2 EPN high 32-bits in 32/64 tlbwe emulation · 9e2fa646
      Mihai Caraman authored
      
      
      Mask high 32 bits of MAS2's effective page number in tlbwe emulation for guests
      running in 32-bit mode.
      Signed-off-by: default avatarMihai Caraman <mihai.caraman@freescale.com>
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      9e2fa646
    • Mihai Caraman's avatar
      KVM: PPC: Mask ea's high 32-bits in 32/64 instr emulation · 8823a8fd
      Mihai Caraman authored
      
      
      Mask high 32 bits of effective address in emulation layer for guests running
      in 32-bit mode.
      Signed-off-by: default avatarMihai Caraman <mihai.caraman@freescale.com>
      [agraf: fix indent]
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      8823a8fd
    • Mihai Caraman's avatar
      KVM: PPC: e500: Add emulation helper for getting instruction ea · 7cdd7a95
      Mihai Caraman authored
      
      
      Add emulation helper for getting instruction ea and refactor tlb instruction
      emulation to use it.
      Signed-off-by: default avatarMihai Caraman <mihai.caraman@freescale.com>
      [agraf: keep rt variable around]
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      7cdd7a95
    • Mihai Caraman's avatar
      KVM: PPC: bookehv64: Add support for interrupt handling · e51f8f32
      Mihai Caraman authored
      
      
      Add interrupt handling support for 64-bit bookehv hosts. Unify 32 and 64 bit
      implementations using a common stack layout and a common execution flow starting
      from kvm_handler_common macro. Update documentation for 64-bit input register
      values. This patch only address the bolted TLB miss exception handlers version.
      Signed-off-by: default avatarMihai Caraman <mihai.caraman@freescale.com>
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      e51f8f32
    • Mihai Caraman's avatar
      KVM: PPC: bookehv: Remove GET_VCPU macro from exception handler · ff594746
      Mihai Caraman authored
      
      
      GET_VCPU define will not be implemented for 64-bit for performance reasons
      so get rid of it also on 32-bit.
      Signed-off-by: default avatarMihai Caraman <mihai.caraman@freescale.com>
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      ff594746
    • Mihai Caraman's avatar
      KVM: PPC: booke: Fix get_tb() compile error on 64-bit · b50df19c
      Mihai Caraman authored
      
      
      Include header file for get_tb() declaration.
      Signed-off-by: default avatarMihai Caraman <mihai.caraman@freescale.com>
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      b50df19c
    • Mihai Caraman's avatar
      KVM: PPC: e500: Silence bogus GCC warning in tlb code · 910040b8
      Mihai Caraman authored
      
      
      64-bit GCC 4.5.1 warns about an uninitialized variable which was guarded
      by a flag. Initialize the variable to make it happy.
      Signed-off-by: default avatarMihai Caraman <mihai.caraman@freescale.com>
      [agraf: reword comment]
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      910040b8
    • Paul Mackerras's avatar
      KVM: PPC: Book3S HV: Handle guest-caused machine checks on POWER7 without panicking · b4072df4
      Paul Mackerras authored
      
      
      Currently, if a machine check interrupt happens while we are in the
      guest, we exit the guest and call the host's machine check handler,
      which tends to cause the host to panic.  Some machine checks can be
      triggered by the guest; for example, if the guest creates two entries
      in the SLB that map the same effective address, and then accesses that
      effective address, the CPU will take a machine check interrupt.
      
      To handle this better, when a machine check happens inside the guest,
      we call a new function, kvmppc_realmode_machine_check(), while still in
      real mode before exiting the guest.  On POWER7, it handles the cases
      that the guest can trigger, either by flushing and reloading the SLB,
      or by flushing the TLB, and then it delivers the machine check interrupt
      directly to the guest without going back to the host.  On POWER7, the
      OPAL firmware patches the machine check interrupt vector so that it
      gets control first, and it leaves behind its analysis of the situation
      in a structure pointed to by the opal_mc_evt field of the paca.  The
      kvmppc_realmode_machine_check() function looks at this, and if OPAL
      reports that there was no error, or that it has handled the error, we
      also go straight back to the guest with a machine check.  We have to
      deliver a machine check to the guest since the machine check interrupt
      might have trashed valid values in SRR0/1.
      
      If the machine check is one we can't handle in real mode, and one that
      OPAL hasn't already handled, or on PPC970, we exit the guest and call
      the host's machine check handler.  We do this by jumping to the
      machine_check_fwnmi label, rather than absolute address 0x200, because
      we don't want to re-execute OPAL's handler on POWER7.  On PPC970, the
      two are equivalent because address 0x200 just contains a branch.
      
      Then, if the host machine check handler decides that the system can
      continue executing, kvmppc_handle_exit() delivers a machine check
      interrupt to the guest -- once again to let the guest know that SRR0/1
      have been modified.
      Signed-off-by: default avatarPaul Mackerras <paulus@samba.org>
      [agraf: fix checkpatch warnings]
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      b4072df4
    • Paul Mackerras's avatar
      KVM: PPC: Book3S HV: Improve handling of local vs. global TLB invalidations · 1b400ba0
      Paul Mackerras authored
      
      
      When we change or remove a HPT (hashed page table) entry, we can do
      either a global TLB invalidation (tlbie) that works across the whole
      machine, or a local invalidation (tlbiel) that only affects this core.
      Currently we do local invalidations if the VM has only one vcpu or if
      the guest requests it with the H_LOCAL flag, though the guest Linux
      kernel currently doesn't ever use H_LOCAL.  Then, to cope with the
      possibility that vcpus moving around to different physical cores might
      expose stale TLB entries, there is some code in kvmppc_hv_entry to
      flush the whole TLB of entries for this VM if either this vcpu is now
      running on a different physical core from where it last ran, or if this
      physical core last ran a different vcpu.
      
      There are a number of problems on POWER7 with this as it stands:
      
      - The TLB invalidation is done per thread, whereas it only needs to be
        done per core, since the TLB is shared between the threads.
      - With the possibility of the host paging out guest pages, the use of
        H_LOCAL by an SMP guest is dangerous since the guest could possibly
        retain and use a stale TLB entry pointing to a page that had been
        removed from the guest.
      - The TLB invalidations that we do when a vcpu moves from one physical
        core to another are unnecessary in the case of an SMP guest that isn't
        using H_LOCAL.
      - The optimization of using local invalidations rather than global should
        apply to guests with one virtual core, not just one vcpu.
      
      (None of this applies on PPC970, since there we always have to
      invalidate the whole TLB when entering and leaving the guest, and we
      can't support paging out guest memory.)
      
      To fix these problems and simplify the code, we now maintain a simple
      cpumask of which cpus need to flush the TLB on entry to the guest.
      (This is indexed by cpu, though we only ever use the bits for thread
      0 of each core.)  Whenever we do a local TLB invalidation, we set the
      bits for every cpu except the bit for thread 0 of the core that we're
      currently running on.  Whenever we enter a guest, we test and clear the
      bit for our core, and flush the TLB if it was set.
      
      On initial startup of the VM, and when resetting the HPT, we set all the
      bits in the need_tlb_flush cpumask, since any core could potentially have
      stale TLB entries from the previous VM to use the same LPID, or the
      previous contents of the HPT.
      
      Then, we maintain a count of the number of online virtual cores, and use
      that when deciding whether to use a local invalidation rather than the
      number of online vcpus.  The code to make that decision is extracted out
      into a new function, global_invalidates().  For multi-core guests on
      POWER7 (i.e. when we are using mmu notifiers), we now never do local
      invalidations regardless of the H_LOCAL flag.
      Signed-off-by: default avatarPaul Mackerras <paulus@samba.org>
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      1b400ba0
    • Paul Mackerras's avatar
      KVM: PPC: Book3S PR: MSR_DE doesn't exist on Book 3S · 3a2e7b0d
      Paul Mackerras authored
      
      
      The mask of MSR bits that get transferred from the guest MSR to the
      shadow MSR included MSR_DE.  In fact that bit only exists on Book 3E
      processors, and it is assigned the same bit used for MSR_BE on Book 3S
      processors.  Since we already had MSR_BE in the mask, this just removes
      MSR_DE.
      Signed-off-by: default avatarPaul Mackerras <paulus@samba.org>
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      3a2e7b0d
    • Paul Mackerras's avatar
      KVM: PPC: Book3S PR: Fix VSX handling · 28c483b6
      Paul Mackerras authored
      
      
      This fixes various issues in how we were handling the VSX registers
      that exist on POWER7 machines.  First, we were running off the end
      of the current->thread.fpr[] array.  Ultimately this was because the
      vcpu->arch.vsr[] array is sized to be able to store both the FP
      registers and the extra VSX registers (i.e. 64 entries), but PR KVM
      only uses it for the extra VSX registers (i.e. 32 entries).
      
      Secondly, calling load_up_vsx() from C code is a really bad idea,
      because it jumps to fast_exception_return at the end, rather than
      returning with a blr instruction.  This was causing it to jump off
      to a random location with random register contents, since it was using
      the largely uninitialized stack frame created by kvmppc_load_up_vsx.
      
      In fact, it isn't necessary to call either __giveup_vsx or load_up_vsx,
      since giveup_fpu and load_up_fpu handle the extra VSX registers as well
      as the standard FP registers on machines with VSX.  Also, since VSX
      instructions can access the VMX registers and the FP registers as well
      as the extra VSX registers, we have to load up the FP and VMX registers
      before we can turn on the MSR_VSX bit for the guest.  Conversely, if
      we save away any of the VSX or FP registers, we have to turn off MSR_VSX
      for the guest.
      
      To handle all this, it is more convenient for a single call to
      kvmppc_giveup_ext() to handle all the state saving that needs to be done,
      so we make it take a set of MSR bits rather than just one, and the switch
      statement becomes a series of if statements.  Similarly kvmppc_handle_ext
      needs to be able to load up more than one set of registers.
      Signed-off-by: default avatarPaul Mackerras <paulus@samba.org>
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      28c483b6
    • Paul Mackerras's avatar
      KVM: PPC: Book3S PR: Emulate PURR, SPURR and DSCR registers · b0a94d4e
      Paul Mackerras authored
      
      
      This adds basic emulation of the PURR and SPURR registers.  We assume
      we are emulating a single-threaded core, so these advance at the same
      rate as the timebase.  A Linux kernel running on a POWER7 expects to
      be able to access these registers and is not prepared to handle a
      program interrupt on accessing them.
      
      This also adds a very minimal emulation of the DSCR (data stream
      control register).  Writes are ignored and reads return zero.
      Signed-off-by: default avatarPaul Mackerras <paulus@samba.org>
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      b0a94d4e
    • Paul Mackerras's avatar
      KVM: PPC: Book3S HV: Don't give the guest RW access to RO pages · 1cc8ed0b
      Paul Mackerras authored
      
      
      Currently, if the guest does an H_PROTECT hcall requesting that the
      permissions on a HPT entry be changed to allow writing, we make the
      requested change even if the page is marked read-only in the host
      Linux page tables.  This is a problem since it would for instance
      allow a guest to modify a page that KSM has decided can be shared
      between multiple guests.
      
      To fix this, if the new permissions for the page allow writing, we need
      to look up the memslot for the page, work out the host virtual address,
      and look up the Linux page tables to get the PTE for the page.  If that
      PTE is read-only, we reduce the HPTE permissions to read-only.
      Signed-off-by: default avatarPaul Mackerras <paulus@samba.org>
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      1cc8ed0b
    • Paul Mackerras's avatar
      KVM: PPC: Book3S HV: Report correct HPT entry index when reading HPT · 05dd85f7
      Paul Mackerras authored
      
      
      This fixes a bug in the code which allows userspace to read out the
      contents of the guest's hashed page table (HPT).  On the second and
      subsequent passes through the HPT, when we are reporting only those
      entries that have changed, we were incorrectly initializing the index
      field of the header with the index of the first entry we skipped
      rather than the first changed entry.  This fixes it.
      Signed-off-by: default avatarPaul Mackerras <paulus@samba.org>
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      05dd85f7
    • Paul Mackerras's avatar
      KVM: PPC: Book3S HV: Reset reverse-map chains when resetting the HPT · a64fd707
      Paul Mackerras authored
      
      
      With HV-style KVM, we maintain reverse-mapping lists that enable us to
      find all the HPT (hashed page table) entries that reference each guest
      physical page, with the heads of the lists in the memslot->arch.rmap
      arrays.  When we reset the HPT (i.e. when we reboot the VM), we clear
      out all the HPT entries but we were not clearing out the reverse
      mapping lists.  The result is that as we create new HPT entries, the
      lists get corrupted, which can easily lead to loops, resulting in the
      host kernel hanging when it tries to traverse those lists.
      
      This fixes the problem by zeroing out all the reverse mapping lists
      when we zero out the HPT.  This incidentally means that we are also
      zeroing our record of the referenced and changed bits (not the bits
      in the Linux PTEs, used by the Linux MM subsystem, but the bits used
      by the KVM_GET_DIRTY_LOG ioctl, and those used by kvm_age_hva() and
      kvm_test_age_hva()).
      Signed-off-by: default avatarPaul Mackerras <paulus@samba.org>
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      a64fd707
    • Paul Mackerras's avatar
      KVM: PPC: Book3S HV: Provide a method for userspace to read and write the HPT · a2932923
      Paul Mackerras authored
      
      
      A new ioctl, KVM_PPC_GET_HTAB_FD, returns a file descriptor.  Reads on
      this fd return the contents of the HPT (hashed page table), writes
      create and/or remove entries in the HPT.  There is a new capability,
      KVM_CAP_PPC_HTAB_FD, to indicate the presence of the ioctl.  The ioctl
      takes an argument structure with the index of the first HPT entry to
      read out and a set of flags.  The flags indicate whether the user is
      intending to read or write the HPT, and whether to return all entries
      or only the "bolted" entries (those with the bolted bit, 0x10, set in
      the first doubleword).
      
      This is intended for use in implementing qemu's savevm/loadvm and for
      live migration.  Therefore, on reads, the first pass returns information
      about all HPTEs (or all bolted HPTEs).  When the first pass reaches the
      end of the HPT, it returns from the read.  Subsequent reads only return
      information about HPTEs that have changed since they were last read.
      A read that finds no changed HPTEs in the HPT following where the last
      read finished will return 0 bytes.
      
      The format of the data provides a simple run-length compression of the
      invalid entries.  Each block of data starts with a header that indicates
      the index (position in the HPT, which is just an array), the number of
      valid entries starting at that index (may be zero), and the number of
      invalid entries following those valid entries.  The valid entries, 16
      bytes each, follow the header.  The invalid entries are not explicitly
      represented.
      Signed-off-by: default avatarPaul Mackerras <paulus@samba.org>
      [agraf: fix documentation]
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      a2932923
    • Paul Mackerras's avatar
      KVM: PPC: Book3S HV: Make a HPTE removal function available · 6b445ad4
      Paul Mackerras authored
      
      
      This makes a HPTE removal function, kvmppc_do_h_remove(), available
      outside book3s_hv_rm_mmu.c.  This will be used by the HPT writing
      code.
      Signed-off-by: default avatarPaul Mackerras <paulus@samba.org>
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      6b445ad4
    • Paul Mackerras's avatar
      KVM: PPC: Book3S HV: Add a mechanism for recording modified HPTEs · 44e5f6be
      Paul Mackerras authored
      
      
      This uses a bit in our record of the guest view of the HPTE to record
      when the HPTE gets modified.  We use a reserved bit for this, and ensure
      that this bit is always cleared in HPTE values returned to the guest.
      
      The recording of modified HPTEs is only done if other code indicates
      its interest by setting kvm->arch.hpte_mod_interest to a non-zero value.
      The reason for this is that when later commits add facilities for
      userspace to read the HPT, the first pass of reading the HPT will be
      quicker if there are no (or very few) HPTEs marked as modified,
      rather than having most HPTEs marked as modified.
      Signed-off-by: default avatarPaul Mackerras <paulus@samba.org>
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      44e5f6be
    • Paul Mackerras's avatar
      KVM: PPC: Book3S HV: Fix bug causing loss of page dirty state · 4879f241
      Paul Mackerras authored
      
      
      This fixes a bug where adding a new guest HPT entry via the H_ENTER
      hcall would lose the "changed" bit in the reverse map information
      for the guest physical page being mapped.  The result was that the
      KVM_GET_DIRTY_LOG could return a zero bit for the page even though
      the page had been modified by the guest.
      
      This fixes it by only modifying the index and present bits in the
      reverse map entry, thus preserving the reference and change bits.
      We were also unnecessarily setting the reference bit, and this
      fixes that too.
      Signed-off-by: default avatarPaul Mackerras <paulus@samba.org>
      Signed-off-by: default avatarAlexander Graf <agraf@suse.de>
      4879f241