Commit 03cbc732 authored by Wanpeng Li, committed by Ingo Molnar

sched/cputime: Resync steal time when guest & host lose sync


Commit:

  57430218 ("sched/cputime: Count actually elapsed irq & softirq time")

... fixed a bug but also triggered a regression:

On an i5 laptop with 4 pCPUs, running one full dynticks guest with 4 vCPUs,
there are four CPU hog processes (for loops) running in the guest. I hot-unplug
the pCPUs on the host one by one until there is only one left, then observe CPU
utilization via 'top' in the guest. It shows:

  100% st for cpu0(housekeeping)
   75% st for other CPUs (nohz full mode)

However, without commit 57430218 it shows the correct 75% for all four CPUs.
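
As a rough check on why 75% is the expected figure, here is a minimal sketch.
It assumes the host shares the remaining pCPU evenly among the busy vCPUs, which
the changelog does not state explicitly. The helper name expected_steal_pct() is
hypothetical and not part of the kernel.

```c
/*
 * Illustrative userspace model, not kernel code.
 * Assumption: all vCPUs are CPU bound and the host scheduler shares
 * the available pCPUs evenly among them.
 */
static unsigned int expected_steal_pct(unsigned int nr_pcpus,
				       unsigned int nr_vcpus)
{
	if (nr_vcpus <= nr_pcpus)
		return 0;	/* every vCPU can have a pCPU to itself */

	/* Each vCPU runs nr_pcpus/nr_vcpus of the time; the rest is steal. */
	return 100 - (100 * nr_pcpus) / nr_vcpus;
}
```

With one pCPU left and four busy vCPUs, expected_steal_pct(1, 4) gives 75.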

When a guest is interrupted for a longer amount of time, missed clock ticks
are not redelivered later. Because of that, we should not limit the amount
of steal time accounted to the amount of time that the calling functions
think has passed.
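
One plausible way to picture the effect of the limit on a tick-based CPU is the
userspace model below. Everything in it is an assumption for illustration: the
struct, the function names, and the numbers (a vCPU that repeatedly loses 300
jiffies to the host and then runs for 100 jiffies) are hypothetical and only
mimic the clamp-then-return pattern of the tick call sites.

```c
#include <stdint.h>

/* Illustrative model, not kernel code. All units are jiffies. */
struct steal_model {
	uint64_t host_steal;	/* total steal time reported by the host */
	uint64_t prev_steal;	/* steal time the guest already accounted */
	uint64_t acct_steal;	/* time booked as steal */
	uint64_t acct_other;	/* time booked as user/system/idle */
};

/* Account pending steal time, limited to maxtime. */
static uint64_t model_steal_account(struct steal_model *m, uint64_t maxtime)
{
	uint64_t steal = m->host_steal - m->prev_steal;

	if (steal > maxtime)
		steal = maxtime;
	m->prev_steal += steal;
	m->acct_steal += steal;
	return steal;
}

/* One delivered tick: steal time eats the tick first. */
static void model_tick(struct steal_model *m, uint64_t maxtime)
{
	const uint64_t cputime = 1;	/* one jiffy per tick */
	uint64_t steal = model_steal_account(m, maxtime);

	if (steal >= cputime)
		return;
	m->acct_other += cputime - steal;
}

/* Hypothetical workload: 300 jiffies stolen, then 100 delivered ticks. */
static unsigned int model_steal_pct(uint64_t maxtime)
{
	struct steal_model m = {0};

	for (int cycle = 0; cycle < 10; cycle++) {
		m.host_steal += 300;		/* descheduled: ticks are lost */
		for (int t = 0; t < 100; t++)	/* running: ticks are delivered */
			model_tick(&m, maxtime);
	}
	return (unsigned int)(100 * m.acct_steal /
			      (m.acct_steal + m.acct_other));
}
```

In this model, limiting each tick to one jiffy of steal (model_steal_pct(1))
leaves a backlog that swallows every delivered tick, so the CPU reports 100%
steal. Passing UINT64_MAX accounts the whole backlog on the first tick after
the vCPU resumes and reports about 75%.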

However, the interval returned by account_other_time() is NOT rounded down
to the nearest jiffy, while the base interval in get_vtime_delta() it is
subtracted from is, so the max cputime limit is required to avoid underflow.
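
The underflow can be sketched in userspace as follows. This is a simplified
model under stated assumptions: model_cputime_t is a hypothetical stand-in for
an unsigned cputime_t, and model_vtime_delta() only mirrors the idea of
subtracting the "other" time from a jiffy-rounded delta.

```c
#include <stdint.h>

/* Hypothetical stand-in for an unsigned cputime_t; not the kernel type. */
typedef uint64_t model_cputime_t;

/*
 * delta:         elapsed time, rounded down to whole jiffies
 * pending_other: steal/irq/softirq time, NOT rounded down
 * maxtime:       upper limit applied to the "other" time
 */
static model_cputime_t model_vtime_delta(model_cputime_t delta,
					 model_cputime_t pending_other,
					 model_cputime_t maxtime)
{
	model_cputime_t other = pending_other < maxtime ? pending_other
							: maxtime;

	/* Unsigned subtraction: wraps around if other > delta. */
	return delta - other;
}
```

With delta = 1000 and pending_other = 1300, an unlimited maxtime makes the
result wrap to a huge value, while maxtime = delta yields 0.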

This patch fixes the regression by limiting only the account_other_time() call
in get_vtime_delta() to avoid underflow, and lets the other three call sites
(of account_other_time() and steal_account_process_time()) account however
much steal time the host told us elapsed.

Suggested-by: Rik van Riel <>
Suggested-by: Paolo Bonzini <>
Signed-off-by: Wanpeng Li <>
Reviewed-by: Rik van Riel <>
Cc: Frederic Weisbecker <>
Cc: Linus Torvalds <>
Cc: Mike Galbraith <>
Cc: Peter Zijlstra <>
Cc: Radim Krcmar <>
Cc: Thomas Gleixner <>
[ Improved the changelog. ]
Signed-off-by: Ingo Molnar <>
parent 173be9a1
@@ -263,6 +263,11 @@ void account_idle_time(cputime_t cputime)
 		cpustat[CPUTIME_IDLE] += (__force u64) cputime;
 }
 
+/*
+ * When a guest is interrupted for a longer amount of time, missed clock
+ * ticks are not redelivered later. Due to that, this function may on
+ * occasion account more time than the calling functions think elapsed.
+ */
 static __always_inline cputime_t steal_account_process_time(cputime_t maxtime)
 {
 #ifdef CONFIG_PARAVIRT
@@ -371,7 +376,7 @@ static void irqtime_account_process_tick(struct task_struct *p, int user_tick,
 	 * idle, or potentially user or system time. Due to rounding,
 	 * other time can exceed ticks occasionally.
 	 */
-	other = account_other_time(cputime);
+	other = account_other_time(ULONG_MAX);
 	if (other >= cputime)
 		return;
 	cputime -= other;
@@ -486,7 +491,7 @@ void account_process_tick(struct task_struct *p, int user_tick)
 	}
 
 	cputime = cputime_one_jiffy;
-	steal = steal_account_process_time(cputime);
+	steal = steal_account_process_time(ULONG_MAX);
 
 	if (steal >= cputime)
 		return;
@@ -516,7 +521,7 @@ void account_idle_ticks(unsigned long ticks)
 	}
 
 	cputime = jiffies_to_cputime(ticks);
-	steal = steal_account_process_time(cputime);
+	steal = steal_account_process_time(ULONG_MAX);
 
 	if (steal >= cputime)
 		return;
@@ -699,6 +704,13 @@ static cputime_t get_vtime_delta(struct task_struct *tsk)
 	unsigned long now = READ_ONCE(jiffies);
 	cputime_t delta, other;
 
+	/*
+	 * Unlike tick based timing, vtime based timing never has lost
+	 * ticks, and no need for steal time accounting to make up for
+	 * lost ticks. Vtime accounts a rounded version of actual
+	 * elapsed time. Limit account_other_time to prevent rounding
+	 * errors from causing elapsed vtime to go negative.
+	 */
 	delta = jiffies_to_cputime(now - tsk->vtime_snap);
 	other = account_other_time(delta);
 	WARN_ON_ONCE(tsk->vtime_snap_whence == VTIME_INACTIVE);