1. 01 Sep, 2014 1 commit
    • implementing victim TLB for QEMU system emulated TLB · 88e89a57
      Xin Tong authored
      QEMU system mode page table walks are expensive. Measured by running
      qemu-system-x86_64 system mode on Intel PIN, a TLB miss and a walk of
      the 4-level page tables in the guest Linux OS takes ~450 x86
      instructions on average. The QEMU system mode TLB is implemented as a
      directly-mapped hashtable. This structure suffers from conflict
      misses. Increasing the associativity of the TLB may not solve the
      conflict misses, as all the ways may have to be walked in serial.
      A victim TLB holds translations evicted from the primary TLB upon
      replacement. It lies between the main TLB and its refill path, and has
      greater associativity (fully associative in this patch). A victim TLB
      lookup takes longer, but it is likely still cheaper than a full page
      table walk. The memory translation path is changed as follows:
      Before Victim TLB:
      1. Inline TLB lookup
      2. Exit code cache on TLB miss.
      3. Check for unaligned, IO accesses
      4. TLB refill.
      5. Do the memory access.
      6. Return to code cache.
      After Victim TLB:
      1. Inline TLB lookup
      2. Exit code cache on TLB miss.
      3. Check for unaligned, IO accesses
      4. Victim TLB lookup.
      5. If victim TLB misses, TLB refill
      6. Do the memory access.
      7. Return to code cache
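      The lookup order above can be sketched with a toy model. This is not
      QEMU's actual code; all names (tlb_entry, VTLB_SIZE, tlb_fill,
      translate) and sizes are illustrative assumptions, and the page table
      walk is stubbed out with an identity mapping:

      ```c
      /* Sketch: direct-mapped primary TLB backed by a small, fully
         associative victim TLB that is consulted before the refill path. */
      #include <stdint.h>
      #include <stdbool.h>

      #define TLB_SIZE   256               /* direct-mapped primary TLB */
      #define VTLB_SIZE  8                 /* fully associative victim TLB */
      #define PAGE_BITS  12
      #define INVALID    ((uint64_t)-1)

      typedef struct { uint64_t vpage, ppage; } tlb_entry;

      static tlb_entry tlb[TLB_SIZE];
      static tlb_entry vtlb[VTLB_SIZE];
      static unsigned vtlb_next;           /* round-robin replacement cursor */

      static void tlb_init(void) {
          for (int i = 0; i < TLB_SIZE; i++)  tlb[i].vpage = INVALID;
          for (int i = 0; i < VTLB_SIZE; i++) vtlb[i].vpage = INVALID;
      }

      /* Stand-in for the expensive page table walk: identity map. */
      static uint64_t page_walk(uint64_t vpage) { return vpage; }

      static void tlb_fill(uint64_t vpage) {
          unsigned idx = vpage & (TLB_SIZE - 1);
          if (tlb[idx].vpage != INVALID) {
              /* The evicted primary entry moves into the victim TLB. */
              vtlb[vtlb_next] = tlb[idx];
              vtlb_next = (vtlb_next + 1) % VTLB_SIZE;
          }
          tlb[idx].vpage = vpage;
          tlb[idx].ppage = page_walk(vpage);
      }

      /* Translate vaddr; *walked reports whether a full walk was needed. */
      static uint64_t translate(uint64_t vaddr, bool *walked) {
          uint64_t vpage = vaddr >> PAGE_BITS;
          uint64_t off = vaddr & ((1 << PAGE_BITS) - 1);
          unsigned idx = vpage & (TLB_SIZE - 1);
          *walked = false;
          if (tlb[idx].vpage == vpage)             /* primary TLB hit */
              return (tlb[idx].ppage << PAGE_BITS) | off;
          /* Primary miss: scan the victim TLB before walking. */
          for (int i = 0; i < VTLB_SIZE; i++) {
              if (vtlb[i].vpage == vpage) {
                  /* Swap the hit back into the primary TLB. */
                  tlb_entry tmp = tlb[idx];
                  tlb[idx] = vtlb[i];
                  vtlb[i] = tmp;
                  return (tlb[idx].ppage << PAGE_BITS) | off;
              }
          }
          *walked = true;                          /* victim miss: refill */
          tlb_fill(vpage);
          return (tlb[idx].ppage << PAGE_BITS) | off;
      }
      ```

      Two pages whose virtual page numbers differ by TLB_SIZE conflict in the
      direct-mapped array; with the victim TLB, alternating between them no
      longer forces a walk on every access.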
      The advantage is that a victim TLB adds associativity to the
      directly-mapped TLB, and thus potentially fewer page table walks,
      while still keeping the time taken to flush within reasonable limits.
      However, placing the victim TLB before the refill path lengthens the
      TLB refill path, as the victim TLB is consulted before the TLB refill.
      The performance results demonstrate that the pros outweigh the cons.
      Some performance results, taken on SPECINT2006 train
      datasets, a kernel boot, and the qemu configure script on an
      Intel(R) Xeon(R) CPU E5620 @ 2.40GHz Linux machine, are shown in the
      Google Doc link below.
      In summary, the victim TLB improves the performance of qemu-system-x86_64
      by 11% on average across SPECINT2006, kernel boot, and the qemu configure
      script, with the highest improvement of 26% in 456.hmmer. The victim TLB
      does not cause a performance degradation in any of the measured
      benchmarks. Furthermore, the implemented victim TLB is architecture
      independent and is expected to benefit other architectures in QEMU as well.
      Although there are measurement fluctuations, the performance
      improvement is significant and by no means within the range of
      measurement noise.
      Signed-off-by: Xin Tong <trent.tong@gmail.com>
      Message-id: 1407202523-23553-1-git-send-email-trent.tong@gmail.com
      Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
      Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
  2. 05 Jun, 2014 4 commits
  3. 13 Mar, 2014 8 commits
  4. 11 Feb, 2014 2 commits
  5. 13 Jan, 2014 5 commits
  6. 23 Dec, 2013 2 commits
  7. 07 Oct, 2013 1 commit
  8. 03 Sep, 2013 1 commit
  9. 09 Jul, 2013 1 commit
  10. 04 Jul, 2013 2 commits
  11. 28 Jun, 2013 1 commit
  12. 20 Jun, 2013 1 commit
    • exec: Resolve subpages in one step except for IOTLB fills · 90260c6c
      Jan Kiszka authored
      Except for the case of setting the IOTLB entry in TCG mode, we can avoid
      the subpage dispatching handlers and do the resolution directly in
      address_space_lookup_region. An IOTLB entry describes a full page, not
      only the region that the first access to a sub-divided page may return.
      This patch therefore introduces a special translation function,
      address_space_translate_for_iotlb, that avoids the subpage resolutions.
      In contrast, callers of the existing address_space_translate service
      will now always receive the terminal memory region section. This will be
      important for breaking the BQL and for enabling unaligned memory region
      accesses.
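      The distinction between the two resolution depths can be sketched in
      miniature. This is purely illustrative and not QEMU's actual types or
      functions; the assumed struct and both helpers are invented for the
      example:

      ```c
      /* Sketch: a page that is sub-divided carries a pointer to a nested
         section; normal translation drills down to the terminal region,
         while the IOTLB variant stops at the page-sized container. */
      #include <stddef.h>
      #include <string.h>

      typedef struct MemoryRegionSection {
          const char *name;
          struct MemoryRegionSection *subpage; /* non-NULL if sub-divided */
      } MemoryRegionSection;

      /* Resolve subpages in one step: always return the terminal section. */
      static MemoryRegionSection *resolve(MemoryRegionSection *s) {
          while (s->subpage)
              s = s->subpage;
          return s;
      }

      /* IOTLB fill must describe the full page, so do not descend. */
      static MemoryRegionSection *resolve_for_iotlb(MemoryRegionSection *s) {
          return s;
      }
      ```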
      Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  13. 14 Jun, 2013 1 commit
  14. 29 May, 2013 2 commits
  15. 16 Feb, 2013 1 commit
  16. 19 Dec, 2012 1 commit
  17. 23 Oct, 2012 1 commit
    • Rename target_phys_addr_t to hwaddr · a8170e5e
      Avi Kivity authored
      target_phys_addr_t is unwieldy, violates the C standard (_t suffixes are
      reserved) and its purpose doesn't match the name (most target_phys_addr_t
      addresses are not target specific).  Replace it with a finger-friendly,
      standards conformant hwaddr.
      Outstanding patchsets can be fixed up with the command
        git rebase -i --exec 'find -name "*.[ch]"
                              | xargs sed -i s/target_phys_addr_t/hwaddr/g' origin
      Signed-off-by: Avi Kivity <avi@redhat.com>
      Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
  18. 22 Oct, 2012 1 commit
    • memory: per-AddressSpace dispatch · ac1970fb
      Avi Kivity authored
      Currently we use a global radix tree to dispatch memory access.  This only
      works with a single address space; to support multiple address spaces we
      make the radix tree a member of AddressSpace (via an intermediate structure
      AddressSpaceDispatch to avoid exposing too many internals).
      A side effect is that address_space_io also gains a dispatch table.  When
      we remove all the pre-memory-API I/O registrations, we can use that for
      dispatching I/O and get rid of the original I/O dispatch.
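      The structural change can be sketched as follows. The sketch is not
      QEMU's code: the radix tree is replaced by a flat table, and the
      helper names (address_space_new, address_space_map,
      address_space_lookup) are assumptions for illustration only:

      ```c
      /* Sketch: each AddressSpace owns its own dispatch structure (via an
         intermediate AddressSpaceDispatch) instead of sharing one global
         tree, so multiple address spaces can dispatch independently. */
      #include <stdlib.h>
      #include <string.h>

      #define DISPATCH_SLOTS 16            /* toy stand-in for the radix tree */

      typedef struct {
          const char *region[DISPATCH_SLOTS];  /* region name per slot */
      } AddressSpaceDispatch;

      typedef struct {
          const char *name;
          AddressSpaceDispatch *dispatch;  /* per-AddressSpace, not global */
      } AddressSpace;

      static AddressSpace *address_space_new(const char *name) {
          AddressSpace *as = malloc(sizeof(*as));
          as->name = name;
          as->dispatch = calloc(1, sizeof(*as->dispatch));
          return as;
      }

      static void address_space_map(AddressSpace *as, unsigned slot,
                                    const char *region) {
          as->dispatch->region[slot % DISPATCH_SLOTS] = region;
      }

      static const char *address_space_lookup(AddressSpace *as, unsigned slot) {
          const char *r = as->dispatch->region[slot % DISPATCH_SLOTS];
          return r ? r : "unassigned";
      }
      ```

      With a per-AddressSpace table, a memory space and an I/O space can map
      the same slot to different regions without interfering.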
      Signed-off-by: Avi Kivity <avi@redhat.com>
  19. 15 Oct, 2012 1 commit
  20. 15 Sep, 2012 1 commit
  21. 15 Aug, 2012 1 commit
  22. 12 May, 2012 1 commit