
From: Steven Rostedt <rostedt@goodmis.org>

According to the comments in include/linux/sched.h:

/*
* Priority of a process goes from 0..MAX_PRIO-1, valid RT
* priority is 0..MAX_RT_PRIO-1, and SCHED_NORMAL tasks are
* in the range MAX_RT_PRIO..MAX_PRIO-1. Priority values
* are inverted: lower p->prio value means higher priority.
*
* The MAX_USER_RT_PRIO value allows the actual maximum
* RT priority to be separate from the value exported to
* user-space.  This allows kernel threads to set their
* priority to a value higher than any user task. Note:
* MAX_RT_PRIO must not be smaller than MAX_USER_RT_PRIO.
*/

This makes it look like the priority goes as follows:

prio: 0 .. MAX_RT_PRIO .. MAX_USER_RT_PRIO .. MAX_PRIO

where 0 is the highest priority,

but in reality we have:

prio: 0 .. MAX_USER_RT_PRIO .. MAX_RT_PRIO .. MAX_PRIO

The comments say that MAX_RT_PRIO must not be smaller than
MAX_USER_RT_PRIO, but if it is bigger (taking "bigger" to mean
numerically greater), the system will crash on an SMP machine.

Here's how it works.  The migration_thread sets the priority of its
thread to MAX_RT_PRIO-1 via:

__setscheduler(p, SCHED_FIFO, MAX_RT_PRIO-1);

Now look at __setscheduler():

static void __setscheduler(struct task_struct *p, int policy, int prio)
{
        BUG_ON(p->array);
        p->policy = policy;
        p->rt_priority = prio;
        if (policy != SCHED_NORMAL)
                p->prio = MAX_USER_RT_PRIO-1 - p->rt_priority;
        else
                p->prio = p->static_prio;
}

If MAX_USER_RT_PRIO = 99 and MAX_RT_PRIO = 100, then we get

  p->prio = (99-1) - (100-1) = -1;

A negative p->prio would be very bad when it comes time to schedule.
In addition, kstop_machine uses MAX_RT_PRIO and then calls
sys_sched_setscheduler, which would fail if MAX_RT_PRIO >
MAX_USER_RT_PRIO. Below is a patch that makes MAX_RT_PRIO work when it
is greater than MAX_USER_RT_PRIO on an SMP machine. The p->mm test lets
kstop_machine, and any other kernel thread (kernel threads have no mm),
request priorities up to MAX_RT_PRIO-1.

I tested the patch on an SMP machine where MAX_RT_PRIO = 100 and
MAX_USER_RT_PRIO = 99. Without the patch, the system crashes and
reboots.

Funny: back in July 2002, Anton Wilson noticed this, but his report was
just lost in the noise!
http://seclists.org/lists/linux-kernel/2002/Jul/1695.html

Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
---

 arch/ia64/sn/kernel/xpc_main.c |    2 +-
 kernel/sched.c                 |    5 +++--
 2 files changed, 4 insertions(+), 3 deletions(-)

diff -puN arch/ia64/sn/kernel/xpc_main.c~max_user_rt_prio-and-max_rt_prio-are-wrong arch/ia64/sn/kernel/xpc_main.c
--- 25/arch/ia64/sn/kernel/xpc_main.c~max_user_rt_prio-and-max_rt_prio-are-wrong	2005-06-25 01:17:13.000000000 -0700
+++ 25-akpm/arch/ia64/sn/kernel/xpc_main.c	2005-06-25 01:17:13.000000000 -0700
@@ -420,7 +420,7 @@ xpc_activating(void *__partid)
 	partid_t partid = (u64) __partid;
 	struct xpc_partition *part = &xpc_partitions[partid];
 	unsigned long irq_flags;
-	struct sched_param param = { sched_priority: MAX_USER_RT_PRIO - 1 };
+	struct sched_param param = { sched_priority: MAX_RT_PRIO - 1 };
 	int ret;
 
 
diff -puN kernel/sched.c~max_user_rt_prio-and-max_rt_prio-are-wrong kernel/sched.c
--- 25/kernel/sched.c~max_user_rt_prio-and-max_rt_prio-are-wrong	2005-06-25 01:17:13.000000000 -0700
+++ 25-akpm/kernel/sched.c	2005-06-25 01:17:13.000000000 -0700
@@ -3527,7 +3527,7 @@ static void __setscheduler(struct task_s
 	p->policy = policy;
 	p->rt_priority = prio;
 	if (policy != SCHED_NORMAL)
-		p->prio = MAX_USER_RT_PRIO-1 - p->rt_priority;
+		p->prio = MAX_RT_PRIO-1 - p->rt_priority;
 	else
 		p->prio = p->static_prio;
 }
@@ -3559,7 +3559,8 @@ recheck:
 	 * 1..MAX_USER_RT_PRIO-1, valid priority for SCHED_NORMAL is 0.
 	 */
 	if (param->sched_priority < 0 ||
-	    param->sched_priority > MAX_USER_RT_PRIO-1)
+	    (p->mm &&  param->sched_priority > MAX_USER_RT_PRIO-1) ||
+	    (!p->mm && param->sched_priority > MAX_RT_PRIO-1))
 		return -EINVAL;
 	if ((policy == SCHED_NORMAL) != (param->sched_priority == 0))
 		return -EINVAL;
_
