Auto-Adaptive scheduler - Final chapter ( the numbers ) ...
Larry McVoy
lm at bitmover.com
Wed Jan 26 23:20:46 CST 2000
: The RQ = 2 case give me :
:
: Old = 759000 switches / sec = 1.317 us
: New = 735000 switches / sec = 1.360 us
I certainly hope that no one would use this as a basis for accepting or
rejecting this patch. For a number of reasons:
. this shows a 3% difference. My experience with Linux and context
switching is that the lack of page coloring can cause different
runs of the same test to vary more than this, so these numbers
may be right and then again, they may not. It's pretty hard to
know for sure because you don't know how the OS placed the pages.
The page placement can make all the difference in whether you
collide or fit in the cache.
. the real measure of any change is whether it increases or
decreases the amount of code in the icache, the number of mispredicted
branches, the amount of data in the dcache, and finally, the number
of cache misses.
. a toy benchmark which doesn't do anything but context switch
will never shed any light. The reason is this: suppose most
high context switch applications have a cache footprint of size
A and the context switch path has a cache footprint of size C.
The critical point is where A + C == sizeof(L1 cache). An application
for which A + C is larger than sizeof(L1 cache) has ``fallen out of
the cache'' and has dramatically worse performance.
I think people will agree that for all real workloads the "A" part of
the equation is much greater than the A part of the equation in the
typical context switch benchmark. A trivial context switch benchmark
can easily fit in a 4K cache, probably fits in a 1K cache.
The point is that the benchmark eats up very little of the cache, the
code path in Linux has been kept very small, so a 16K or 32K cache
actually has some room left.
OK, so you make your change and you benchmark it. As long as the
benchmark size A plus the context switch path C is less than or equal
to the cache size, your change will be essentially invisible. If
you can see any difference in the two process case, that's bad.
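The threshold argument above can be written down as a tiny C sketch.
This is only an illustration of the A + C <= sizeof(L1 cache) model;
the function names are made up here, and the model deliberately ignores
associativity and page placement, which (as noted earlier) can make
real behavior worse than simple byte counting suggests.

```c
#include <assert.h>

/*
 * Hypothetical helper: bytes of L1 cache left over once the application
 * footprint A and the context switch path footprint C are both resident.
 * A negative result means the workload has "fallen out of the cache".
 * Simplification: pure byte counting, no associativity or page coloring.
 */
static long cache_headroom(long a_bytes, long c_bytes, long l1_bytes)
{
    assert(a_bytes >= 0 && c_bytes >= 0 && l1_bytes > 0);
    return l1_bytes - (a_bytes + c_bytes);
}

/* Nonzero when A + C <= sizeof(L1 cache), i.e. the change stays invisible. */
static int fits_in_l1(long a_bytes, long c_bytes, long l1_bytes)
{
    return cache_headroom(a_bytes, c_bytes, l1_bytes) >= 0;
}
```

With a 4K toy benchmark and a 16K cache, any plausible value of C still
leaves headroom, which is why such a benchmark cannot see a slightly
larger C. The same C added to a workload that already nearly fills the
cache tips it over the edge.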
If we trust the numbers above (I most certainly do not, I'd want
to see cycle counters), then we are seeing a 43 nanosecond difference
per context switch. If the numbers were accurate, this would be
looking a lot like a cache miss.
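For anyone who wants to check the arithmetic, here is a minimal C sketch
that converts the quoted switch rates into time per switch. The function
names are invented for this example; only the two rates (759000 and
735000 switches/sec) come from the quoted numbers.

```c
#include <assert.h>

/* Convert a context switch rate (switches/sec) to nanoseconds per switch. */
static double ns_per_switch(double switches_per_sec)
{
    assert(switches_per_sec > 0.0);
    return 1e9 / switches_per_sec;
}

/* Extra nanoseconds per switch at new_rate compared with old_rate. */
static double delta_ns(double old_rate, double new_rate)
{
    return ns_per_switch(new_rate) - ns_per_switch(old_rate);
}

/* The same difference expressed as a percentage of the old per-switch cost. */
static double slowdown_percent(double old_rate, double new_rate)
{
    return 100.0 * delta_ns(old_rate, new_rate) / ns_per_switch(old_rate);
}
```

Calling delta_ns(759000.0, 735000.0) gives roughly 43 ns, and
slowdown_percent() with the same arguments gives a little over 3%,
matching the figures discussed above.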
I'm sure many of you think I'm a raving lunatic to care about one stinking
cache miss. I'm sorry, but the only way you prevent an OS from becoming
bloated is to care about each and every cache miss, each and every cache
line, and actually weigh the cost vs. the benefit.
In this case, I certainly don't see these numbers as conclusive. For all I
know, the new code could be faster rather than slower - we need to look at
the cycle counters to find out. If Richard is reading this, I'll bet he
can tell us how to do that; I think he did a patch for that.
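As a rough idea of what measuring with a cycle counter looks like from
user space, here is a hedged C sketch. It is not the patch mentioned
above, which I have not reproduced. It assumes GCC or Clang; on x86 it
reads the time stamp counter via __rdtsc() from <x86intrin.h>, and
elsewhere it falls back to clock_gettime() in nanoseconds. read_ticks(),
min_ticks() and sample_work() are names made up for this example. Note
that on many newer CPUs the TSC ticks at a constant rate rather than
counting actual core cycles.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

#if defined(__i386__) || defined(__x86_64__)
#include <x86intrin.h>   /* __rdtsc() on GCC and Clang */
#endif

/* Read a monotonically increasing tick count: TSC on x86, ns elsewhere. */
static uint64_t read_ticks(void)
{
#if defined(__i386__) || defined(__x86_64__)
    return __rdtsc();
#else
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
#endif
}

/*
 * Time fn() `runs` times and return the smallest tick count seen.
 * Taking the minimum filters out runs disturbed by interrupts or
 * unlucky cache state, which matters when hunting for one cache miss.
 */
static uint64_t min_ticks(void (*fn)(void), int runs)
{
    uint64_t best = UINT64_MAX;

    assert(fn != NULL && runs > 0);
    for (int i = 0; i < runs; i++) {
        uint64_t start = read_ticks();
        fn();
        uint64_t elapsed = read_ticks() - start;
        if (elapsed < best)
            best = elapsed;
    }
    return best;
}

/* Hypothetical stand-in for the code path being measured. */
static void sample_work(void)
{
    volatile unsigned sink = 0;
    for (unsigned i = 0; i < 1000; i++)
        sink += i;
}
```

Something like min_ticks(sample_work, 100) would then be run against
both the old and the new code path. Measuring the scheduler itself
would need the counter read inside the kernel, which this sketch does
not attempt.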
--lm
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo at vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/