CPU swapping on Linux

Rik van Riel riel at nl.linux.org
Mon Jan 24 04:29:27 CST 2000


On Fri, 21 Jan 2000, Mark Hahn wrote:

> > > My tests on one of those dual Celerons show that, in the extreme case of a set
> > > of processes that use all the available cache, Linux is still far too eager to
> > > swap processes across CPUs.
> 
> this is unsurprising, since AFAICS the cost calculation is quite wrong.

Indeed, you're completely right. This is a bug in the kernel,
and it explains some strange results I've been seeing even on
my dual P120...

> current machines deliver around 350 MB/s of dram bandwidth [*],
> which means that the machine I'm sitting at (dual celeron 450) will
> take around 375 us to load or flush the cache.  but the scheduler is 
> using an estimate of 25 us!

*nod*  The constant used is erroneous too :)

> in arch/i386/kernel/smpboot.c:
> 
> 	cacheflush_time = cpu_hz/1024*cachesize/5000;
> 
> which, if I'm reading it right, assumes ~5 GB/s dram bandwidth.
> the calculation should be
> 
> 	cacheflushInCycles = cpuHz * cachesizeBytes / bytesPerSecond;
> 
> a better calculation would be:
> 
> 	static const unsigned bandwidth = 350000000;  
> 	cacheflush_time = cpu_hz * (cachesize*1024) / bandwidth;
> 
> or, overflow-obscurified:
> 	cacheflush_time = (cpu_hz>>20) * (cachesize<<10) / (bandwidth>>20);

I think we should fix this immediately. Ingo, what do you
say? (Ingo wrote the original code; maybe he had something
in mind that we can't see just by looking at it.)
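
For concreteness, here is a minimal sketch of the fixed
calculation (untested, and the helper name is mine; cpu_hz is
in Hz and cachesize in KB as in smpboot.c, and 350 MB/s is your
figure). In the kernel itself the 64-bit division would need
do_div() or similar, but the idea is:

	/* cycles needed to refill the L2 cache from DRAM.
	 * The 64-bit intermediate avoids the overflow that the
	 * shifted version above works around: 450e6 * 128K is
	 * about 5.9e13, far beyond 32 bits. */
	static unsigned long cacheflush_cycles(unsigned long cpu_hz,
					       unsigned long cachesize)
	{
		const unsigned long long bandwidth = 350000000; /* bytes/sec */

		return (unsigned long long) cpu_hz
			* (cachesize * 1024) / bandwidth;
	}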

I've done the calculation by hand and, assuming 150 MB/sec
(which is very optimistic for my machine), I arrive at a
cacheflush_time of 5.3 msec, instead of the 0.3 usec the
kernel used before (whatever, I'm testing 2.2 now).

This >1000-fold difference might well explain the strange
variance in CPU usage I've seen in programs like xmms :)
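
Plugging your dual Celeron figures from above into both
formulas shows the size of the error; here's a throwaway
userspace check (the 450 MHz / 128 KB / 350 MB/s numbers are
all yours, the program is just the arithmetic):

	#include <stdio.h>

	int main(void)
	{
		unsigned long long cpu_hz = 450000000ULL;
		unsigned long long cachesize = 128;		/* KB */
		unsigned long long bandwidth = 350000000ULL;	/* bytes/sec */

		/* current kernel formula */
		unsigned long long old = cpu_hz / 1024 * cachesize / 5000;
		/* corrected formula */
		unsigned long long fix = cpu_hz * (cachesize * 1024) / bandwidth;

		printf("old: %llu cycles (~%llu us)\n",
		       old, old * 1000000 / cpu_hz);
		printf("fix: %llu cycles (~%llu us)\n",
		       fix, fix * 1000000 / cpu_hz);
		return 0;	/* ~25 us vs. ~375 us: 15x too eager */
	}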

> with this fix, you should see a little more reluctance to bounce 
> processes across CPUs.  in fact, the heuristic itself should probably
> be applied to goodness() in general, even for uniprocessors, since 
> this heuristic is currently only used in the SMP reschedule_idle.

This may be a good point too. It might be worthwhile to
schedule the short-timeslice processes in batches, in such
a way that they interfere less with the longer-timesliced
processes.

I don't know if this would be worth it, though...

> tweaking of PROC_CHANGE_PENALTY and p->mm==this_mm weights might
> help too.

Definitely. Maybe we want to calculate PROC_CHANGE_PENALTY
and/or the TLB_FLUSH weight at boot time?  We don't have to
worry much about the extra cache accesses that would cause,
because we'll either have a small run queue or we'll be using
those two values over and over again :)
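
Purely a sketch of what I mean for the first one (the function
name and the /10000 scaling are made up, and the right scale
factor would need benchmarking):

	/* derive the CPU-switch penalty at boot from the now
	 * correct cacheflush_time instead of hardcoding it */
	static unsigned long proc_change_penalty;

	void __init calc_proc_change_penalty(void)
	{
		/* one goodness() point per 10000 cycles of cache
		 * reload, floored at the old static value */
		proc_change_penalty = cacheflush_time / 10000;
		if (proc_change_penalty < PROC_CHANGE_PENALTY)
			proc_change_penalty = PROC_CHANGE_PENALTY;
	}

goodness() would then add the variable instead of the macro in
the p->processor == this_cpu case.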

regards,

Rik
--
The Internet is not a network of computers. It is a network
of people. That is its real strength.


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/


