Auto-Adaptive scheduler - Final chapter ( the numbers ) ...

Larry McVoy lm en bitmover.com
Mie Ene 26 23:20:46 CST 2000


: The RQ = 2 case give me :
: 
: Old = 759000 switches / sec = 1.317 us
: New = 735000 switches / sec = 1.360 us

I certainly hope that noone would use this as a basis for accepting or
rejecting this patch.  For a number of reasons:

    . this shows a 3% difference.  My experience with linux and context
      switching is that the lack of page coloring can cause different
      runs of the same test to vary more than this, so these numbers 
      may be right and then again, they may not.  It's pretty hard to
      know for sure because you don't know how the OS placed the pages.
      The page placement can make all the difference in whether you
      collide or fit in the cache.
    
    . the real measure of any change is whether or not it increases or
      decreases the amount of code in the icache, the number of mispredicted
      branches in the icache, the amount of data in the dcache, and finally,
      any changes in the number of cache misses.
    
    . a toy benchmark which doesn't do anything but context switch
      will never shed any light.  The reason is this:  suppose most
      high context switch applications have a cache footprint of size
      A and the context switch path has a cache footprint of size C.
      The critical point is where A + C == sizeof(L1 cache).  An application
      which is larger than sizeof(L1 cache) has ``fallen out of the cache''
      and has dramatically worse performance.

      I think people will agree that for all real workloads the "A" part of
      the equation is much greater than the A part of the equation in the
      typical context switch benchmark.  A trivial context switch benchmark
      can easily fit in a 4K cache, probably fits in a 1K cache.

      The point is that the benchmark eats up very little of the cache, the
      code path in Linux has been kept very small, so a 16K or 32K cache
      actually has some room left.

      OK, so you make your change and you benchmark it.  As long as the
      benchmark size A plus the context switch path C is less or equal
      to the cache size, your change will be essentially invisible.  If
      you can see any difference in the two process case, that's bad.
      if we trust the numbers above (I most certainly do not, I'd want
      to see cycle counters), then we are seeing 43 nanosecond difference
      per context switch.  If the numbers were accurate, this would be
      looking a lot like a cache miss.

I'm sure many of you think I'm a raving lunatic to care about one stinking
cache miss.  I'm sorry, but the only way you prevent an OS from becoming
bloated is to care about each and every cache miss, each and every cache
line, and actually weigh the cost vs. the benefit.

In this case, I certainly don't see these numbers as conclusive, for all I
know, the new code could be faster rather than slower - we need to look at
the cycle counters to find out.  If Richard is reading this, I'll bet he
can tell us how to do that, I think he did a patch for that.

--lm

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo en vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/



Más información sobre la lista de distribución Ayuda