[PATCH] 2.3.41 scheduler change
Rik van Riel
riel at nl.linux.org
Sat Jan 29 17:19:23 CST 2000
Hi,
Since the biggest penalty of a context switch is repopulating
the cache with the (user) pages of the new process, I've
written a scheduler patch for 2.3.41 that tries to optimise
for that.
The patch no longer uses p->counter for priority purposes,
which means we should incur fewer cache misses in the
following (quite common?) situation:
- you have a number of cpus, N (N = 1, 2, whatever)
- you have N+1 processes using a fair amount of CPU
- you have vi, pine, mutt, whatever using very little CPU
When you press a key in vi and the currently running CPU hog
gets descheduled, it will have a lower priority than the
other hog (because it has used part of its timeslice), so
immediately after vi goes back to sleep the _other_ hog gets
scheduled. This can result in unneeded cache thrashing.
My patch calculates a slower-moving priority, so that short-term
CPU use (within one timeslice) doesn't influence priority.
This means the _same_ hog will be rescheduled after vi goes
back to sleep, which should reduce cache thrashing and
increase efficiency.
I haven't done a lot of performance tests with this scheduler
yet, but xmms seems to use about 20% less CPU with this patch
than without ... however, this is on a dual Pentium with a shared
L2 cache. I suspect that a dual Celeron or Xeon will see the
biggest benefits from this patch.
I'm interested in any comments on, and measurements of, this patch.
p->counter and p->priority have been changed into shorts because
a short->int conversion is most probably much cheaper than an
extra cache miss ... any numbers on this are very welcome too.
regards,
Rik
--
The Internet is not a network of computers. It is a network
of people. That is its real strength.
--- linux-2.3.41/fs/proc/array.c.orig Sat Jan 29 03:55:49 2000
+++ linux-2.3.41/fs/proc/array.c Sat Jan 29 14:47:58 2000
@@ -317,7 +317,7 @@
/* scale priority and nice values from timeslices to -20..20 */
/* to make it look like a "normal" Unix priority/nice value */
- priority = task->counter;
+ priority = (task->dynpriority >> (SHIFT_PRIORITY - 1));
priority = 20 - (priority * 10 + DEF_PRIORITY / 2) / DEF_PRIORITY;
nice = task->priority;
nice = 20 - (nice * 20 + DEF_PRIORITY / 2) / DEF_PRIORITY;
--- linux-2.3.41/kernel/sched.c.orig Sat Jan 29 02:52:17 2000
+++ linux-2.3.41/kernel/sched.c Sat Jan 29 15:42:46 2000
@@ -110,7 +110,7 @@
static inline int goodness(struct task_struct * p, int this_cpu, struct mm_struct *this_mm)
{
- int weight;
+ int weight = 0;
/*
* Realtime process, select the first one on the
@@ -123,16 +123,17 @@
}
/*
- * Give the process a first-approximation goodness value
- * according to the number of clock-ticks it has left.
- *
- * Don't do any other calculations if the time slice is
- * over..
+ * Don't do any calculations if the time slice is over..
*/
- weight = p->counter;
- if (!weight)
+ if (!p->counter)
goto out;
+ /*
+ * Our basic priority; we can get some bonuses later.
+ * All of this is done to be cache friendly.
+ */
+ weight = p->dynpriority >> SHIFT_PRIORITY;
+
#ifdef __SMP__
/* Give a largish advantage to the same processor... */
/* (this is equivalent to penalizing other processors) */
@@ -141,9 +142,8 @@
#endif
/* .. and a slight advantage to the current MM */
- if (p->mm == this_mm)
+ if (p->active_mm == this_mm)
weight += 1;
- weight += p->priority;
out:
return weight;
@@ -173,7 +173,7 @@
*/
static inline int preemption_goodness(struct task_struct * prev, struct task_struct * p, int cpu)
{
- return goodness(p, cpu, prev->mm) - goodness(prev, cpu, prev->mm);
+ return goodness(p, cpu, prev->active_mm) - goodness(prev, cpu, prev->active_mm);
}
/*
@@ -609,7 +609,11 @@
spin_unlock_irq(&runqueue_lock);
read_lock(&tasklist_lock);
for_each_task(p)
- p->counter = (p->counter >> 1) + p->priority;
+ /* we won't dirty the cacheline when we don't have to */
+ if (p->dynpriority != (p->priority << SHIFT_PRIORITY) || p->counter != p->priority) {
+ p->dynpriority = p->dynpriority - (p->dynpriority >> SHIFT_PRIORITY) + p->counter;
+ p->counter = p->priority;
+ }
read_unlock(&tasklist_lock);
spin_lock_irq(&runqueue_lock);
}
--- linux-2.3.41/kernel/fork.c.orig Sat Jan 29 13:04:43 2000
+++ linux-2.3.41/kernel/fork.c Sat Jan 29 15:19:08 2000
@@ -714,6 +714,8 @@
*/
current->counter >>= 1;
p->counter = current->counter;
+ current->dynpriority -= (current->dynpriority >> SHIFT_PRIORITY);
+ p->dynpriority = current->dynpriority;
/*
* Ok, add it to the run-queues and make it
--- linux-2.3.41/include/linux/sched.h.orig Sat Jan 29 02:46:03 2000
+++ linux-2.3.41/include/linux/sched.h Sat Jan 29 04:46:01 2000
@@ -273,8 +273,9 @@
cycles_t avg_slice;
int lock_depth; /* Lock depth. We can context switch in and out of holding a syscall kernel lock... */
/* begin intel cache line */
- long counter;
- long priority;
+ short counter;
+ short priority;
+ long dynpriority;
unsigned long policy;
/* memory management info */
struct mm_struct *mm, *active_mm;
@@ -385,6 +386,7 @@
#define _STK_LIM (8*1024*1024)
#define DEF_PRIORITY (20*HZ/100) /* 200 ms time slices */
+#define SHIFT_PRIORITY 6 /* priority calculation falloff */
/*
* INIT_TASK is used to set up the first task table, touch at
@@ -393,7 +395,7 @@
#define INIT_TASK(name) \
/* state etc */ { 0,0,0,KERNEL_DS,&default_exec_domain,0, \
/* avg_slice */ 0, -1, \
-/* counter */ DEF_PRIORITY,DEF_PRIORITY,SCHED_OTHER, \
+/* counter */ DEF_PRIORITY,DEF_PRIORITY,(DEF_PRIORITY << SHIFT_PRIORITY),SCHED_OTHER, \
/* mm */ NULL, &init_mm, \
/* has_cpu */ 0,0, \
/* run_list */ LIST_HEAD_INIT(init_task.run_list), \
--- linux-2.3.41/arch/i386/kernel/process.c.orig Sat Jan 29 13:31:30 2000
+++ linux-2.3.41/arch/i386/kernel/process.c Sat Jan 29 13:33:08 2000
@@ -90,6 +90,7 @@
init_idle();
current->priority = 0;
current->counter = -100;
+ current->dynpriority = 0;
while (1) {
void (*idle)(void) = acpi_idle;
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo at vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/