Commit 5181f4a4 authored by Steven Rostedt's avatar Steven Rostedt Committed by Ingo Molnar
Browse files

sched: Use pushable_tasks to determine next highest prio

Hillf Danton proposed a patch (see link) that cleaned up the
sched_rt code that calculates the priority of the next highest priority
task to be used in finding run queues to pull from.

His patch removed the calculating of the next prio to just use the current
prio when deteriming if we should examine a run queue to pull from. The problem
with his patch was that it caused more false checks. Because we check a run
queue for pushable tasks if the current priority of that run queue is higher
in priority than the task about to run on our run queue. But after grabbing
the locks and doing the real check, we find that there may not be a task
that has a higher prio task to pull. Thus the locks were taken with nothing to

I added some trace_printks() to record when and how many times the run queue
locks were taken to check for pullable tasks, compared to how many times we
pulled a task.

With the current method, it was:

  3806 locks taken vs 2812 pulled tasks

With Hillf's patch:

  6728 locks taken vs 2804 pulled tasks

The number of times locks were taken to pull a task went up almost double with
no more success rate.

But his patch did get me thinking. When we look at the priority of the highest
task to consider taking the locks to do a pull, a failure to pull can be one
of the following: (in order of most likely)

 o RT task was pushed off already between the check and taking the lock
 o Waiting RT task can not be migrated
 o RT task's CPU affinity does not include the target run queue's CPU
 o RT task's priority changed between the check and taking the lock

And with Hillf's patch, the thing that caused most of the failures, is
the RT task to pull was not at the right priority to pull (not greater than
the current RT task priority on the target run queue).

Most of the above cases we can't help. But the current method does not check
if the next highest prio RT task can be migrated or not, and if it can not,
we still grab the locks to do the test (we don't find out about this fact until
after we have the locks). I thought about this case, and realized that the
pushable task plist that is maintained only holds RT tasks that can migrate.
If we move the calculating of the next highest prio task from the inc/dec_rt_task()
functions into the queuing of the pushable tasks, then we only measure the
priorities of those tasks that we push, and we get this basically for free.

Not only does this patch make the code a little more efficient, it cleans it
up and makes it a little simpler.

Thanks to Hillf Danton for inspiring me on this patch.
Signed-off-by: default avatarSteven Rostedt <>
Signed-off-by: default avatarPeter Zijlstra <>
Cc: Hillf Danton <>
Cc: Gregory Haskins <>

Signed-off-by: default avatarIngo Molnar <>
parent c37495fd
......@@ -124,21 +124,33 @@ static void dec_rt_migration(struct sched_rt_entity *rt_se, struct rt_rq *rt_rq)
static inline int has_pushable_tasks(struct rq *rq)
return !plist_head_empty(&rq->rt.pushable_tasks);
static void enqueue_pushable_task(struct rq *rq, struct task_struct *p)
plist_del(&p->pushable_tasks, &rq->rt.pushable_tasks);
plist_node_init(&p->pushable_tasks, p->prio);
plist_add(&p->pushable_tasks, &rq->rt.pushable_tasks);
/* Update the highest prio pushable task */
if (p->prio < rq->
rq-> = p->prio;
static void dequeue_pushable_task(struct rq *rq, struct task_struct *p)
plist_del(&p->pushable_tasks, &rq->rt.pushable_tasks);
static inline int has_pushable_tasks(struct rq *rq)
return !plist_head_empty(&rq->rt.pushable_tasks);
/* Update the new highest prio pushable task */
if (has_pushable_tasks(rq)) {
p = plist_first_entry(&rq->rt.pushable_tasks,
struct task_struct, pushable_tasks);
rq-> = p->prio;
} else
rq-> = MAX_RT_PRIO;
......@@ -698,47 +710,13 @@ static void update_curr_rt(struct rq *rq)
#if defined CONFIG_SMP
static struct task_struct *pick_next_highest_task_rt(struct rq *rq, int cpu);
static inline int next_prio(struct rq *rq)
struct task_struct *next = pick_next_highest_task_rt(rq, rq->cpu);
if (next)
return next->prio;
return MAX_RT_PRIO;
static void
inc_rt_prio_smp(struct rt_rq *rt_rq, int prio, int prev_prio)
struct rq *rq = rq_of_rt_rq(rt_rq);
if (prio < prev_prio) {
* If the new task is higher in priority than anything on the
* run-queue, we know that the previous high becomes our
* next-highest.
rt_rq-> = prev_prio;
if (rq->online)
cpupri_set(&rq->rd->cpupri, rq->cpu, prio);
} else if (prio == rt_rq->highest_prio.curr)
* If the next task is equal in priority to the highest on
* the run-queue, then we implicitly know that the next highest
* task cannot be any lower than current
rt_rq-> = prio;
else if (prio < rt_rq->
* Otherwise, we need to recompute next-highest
rt_rq-> = next_prio(rq);
if (rq->online && prio < prev_prio)
cpupri_set(&rq->rd->cpupri, rq->cpu, prio);
static void
......@@ -746,9 +724,6 @@ dec_rt_prio_smp(struct rt_rq *rt_rq, int prio, int prev_prio)
struct rq *rq = rq_of_rt_rq(rt_rq);
if (rt_rq->rt_nr_running && (prio <= rt_rq->
rt_rq-> = next_prio(rq);
if (rq->online && rt_rq->highest_prio.curr != prev_prio)
cpupri_set(&rq->rd->cpupri, rq->cpu, rt_rq->highest_prio.curr);
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment