Auto-Adaptive scheduler - Final chapter ( the numbers ) ...

Davide Libenzi  davidel at maticad.it
Thu Jan 27 07:37:04 CST 2000


I want to apologize to all of you if my English is not good enough to
explain my feelings, but has someone gotten the impression that I _want_
this patch included in the kernel?
I've reported only code and numbers, to be evaluated together with people
whose experience is greater than mine; these are only code and numbers.
I _want_ exactly nothing.

So Larry, don't mistake my politeness for an opening for flames, which are
not well suited to this technical list, and please stop Your personal
crusade against people who _want_ to change the scheduler and who want
children to have nightmares about kernel code.


> : The RQ = 2 case gives me :
> :
> : Old = 759000 switches / sec = 1.317 us
> : New = 735000 switches / sec = 1.360 us

> I certainly hope that no one would use this as a basis for accepting or
> rejecting this patch.  For a number of reasons:

>     . this shows a 3% difference.

Mathematics is mathematics, and its beauty is that it leaves no room for
politicians ;)
If You think that an RQ = 2 can result in 759000 switches / sec, then You're
right, this is a 3 % difference. But even heavy benchmarks ( SpecWebXX,
etc. ) show that, plotting switches / sec against RQ length on an XY graph,
the curve intersects the line X = 2 in the range 1200 to 2000. Taking the
higher figure, the extra ~43 ns per switch costs 2000 * 43 ns = 86 us out of
every second, so the patch gives a 0.0086 % performance loss for RQ = 2.
And this is mathematics plus real experience.
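
To make the arithmetic explicit, here is a small sketch ( my own
illustration, using only the numbers quoted above; the 2000 switches / sec
rate is the upper bound just mentioned ):

<CODE>
/* Sketch of the RQ = 2 loss estimate above -- illustration only. */
#include <stdio.h>

int main(void)
{
    double old_us   = 1.0e6 / 759000.0;  /* old: ~1.317 us per switch */
    double new_us   = 1.0e6 / 735000.0;  /* new: ~1.360 us per switch */
    double delta_us = new_us - old_us;   /* ~0.043 us = 43 ns lost per switch */
    double rate     = 2000.0;            /* switches / sec at RQ = 2 ( upper bound ) */

    /* fraction of each second spent paying the extra per-switch cost */
    printf("loss = %.4f %%\n", rate * delta_us / 1.0e6 * 100.0);
    return 0;
}
</CODE>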


>       My experience with linux and context
>       switching is that the lack of page coloring can cause different
>       runs of the same test to vary more than this, so these numbers
>       may be right and then again, they may not.  It's pretty hard to
>       know for sure because you don't know how the OS placed the pages.
>       The page placement can make all the difference in whether you
>       collide or fit in the cache.

As Your great experience taught me, these numbers are the lower limits over
a set of 16 test runs.


>     . the real measure of any change is whether or not it increases or
>       decreases the amount of code in the icache, the number of
>       mispredicted branches in the icache, the amount of data in the
>       dcache, and finally, any changes in the number of cache misses.

This is the code used for the test ( sorry, I forgot to include the
benchmark software before ):

<CODE>
/*
 * Original version by Willy Tarreau
 * Modified by Davide Libenzi
 */
#include <unistd.h>
#include <pthread.h>

extern volatile int stop, count;    /* flags, written only outside the loop */
extern long long int totalwork[];   /* per-thread switch counters */

void            oneatwork(int thr)
{
    long long int counter = 0;

    while (!stop)      /* written only at the end of the test */
    {
        if (count)     /* written only once, outside this loop */
            ++counter;
        syscall(158);  /* sys_sched_yield() on i386 */
    }
    totalwork[thr] = counter;
    pthread_exit(0);
}
</CODE>

This is a pure switching-time test.
It is a _real_ switching test that can be used to measure switching times
under _certain_ RQ loads :

vmstat during a "lat_ctx -s 0 20" run shows no more than 5-6 tasks in the RQ

vmstat during a "threads 20 xx" run always shows RQ = 20 ( with no need for
cost compensation )

Now, as even my sister ( who works in business ) can see, this code is
stored entirely in the cache, and the write to the counter goes onto the
process stack. I _want_ this behaviour because I _want_ a measure of clean
switching times, keeping cache issues out of the test.
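
Since the threads.c attachment may get scrubbed by the archive, here is a
minimal sketch of how a driver for oneatwork() could look ( my
reconstruction, not the original threads.c; the warm-up second and the
argument names are assumptions ):

<CODE>
/* Hypothetical driver for oneatwork() -- a sketch, not the original
 * threads.c.  Link with -lpthread. */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <pthread.h>

#define MAXTHR 128

volatile int stop = 0, count = 0;    /* flags read by oneatwork() */
long long int totalwork[MAXTHR];     /* per-thread switch counters */

void oneatwork(int thr);             /* the loop shown above */

static void *worker(void *arg)
{
    oneatwork((int) (long) arg);
    return NULL;
}

int main(int argc, char **argv)
{
    int i, nthr, secs;
    long long total = 0;
    pthread_t thr[MAXTHR];

    if (argc < 3)
        return 1;
    nthr = atoi(argv[1]);            /* RQ length to sustain */
    secs = atoi(argv[2]);            /* measurement time in seconds */

    for (i = 0; i < nthr; i++)
        pthread_create(&thr[i], NULL, worker, (void *) (long) i);
    sleep(1);                        /* warm-up: threads yield, nothing counted */
    count = 1;                       /* start counting ( written once ) */
    sleep(secs);
    stop = 1;                        /* end of test ( written once ) */
    for (i = 0; i < nthr; i++) {
        pthread_join(thr[i], NULL);
        total += totalwork[i];
    }
    printf("%lld switches / sec\n", total / secs);
    return 0;
}
</CODE>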


>     . a toy benchmark which doesn't do anything but context switch
>       will never shed any light.  The reason is this:  suppose most
>       high context switch applications have a cache footprint of size
>       A and the context switch path has a cache footprint of size C.
>       The critical point is where A + C == sizeof(L1 cache).  An
>       application which is larger than sizeof(L1 cache) has ``fallen
>       out of the cache'' and has dramatically worse performance.

>       I think people will agree that for all real workloads the "A" part
>       of the equation is much greater than the A part of the equation in
>       the typical context switch benchmark.  A trivial context switch
>       benchmark can easily fit in a 4K cache, probably fits in a 1K cache.

>       The point is that the benchmark eats up very little of the cache,
>       the code path in Linux has been kept very small, so a 16K or 32K
>       cache actually has some room left.

>       OK, so you make your change and you benchmark it.  As long as the
>       benchmark size A plus the context switch path C is less or equal
>       to the cache size, your change will be essentially invisible.  If
>       you can see any difference in the two process case, that's bad.
>       If we trust the numbers above (I most certainly do not, I'd want
>       to see cycle counters), then we are seeing 43 nanosecond difference
>       per context switch.  If the numbers were accurate, this would be
>       looking a lot like a cache miss.

> I'm sure many of you think I'm a raving lunatic to care about one stinking
> cache miss.  I'm sorry, but the only way you prevent an OS from becoming
> bloated is to care about each and every cache miss, each and every cache
> line, and actually weigh the cost vs. the benefit.

I'm asking You Larry ( and the others ) to prove the "dramatically worse
performance" ( with low RQ ) of this patch in the same way I've reported my
"code and numbers".
Or better, if You don't have time to do it, point me to a test suite that
can prove what You are saying. I'll run it for You ( but I don't have an SMP
machine :( ).
If the tests report this "dramatically worse performance" of the patch,
I'll be _really_ happy to feed it to my cat for dinner.
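
In the meantime, here is the kind of probe I could imagine for Larry's
A + C argument ( a sketch under my own assumptions, not an existing test
suite ): give each thread a private working set of A bytes and touch it
between yields, then watch switches / sec as A grows past the L1 size:

<CODE>
/* Hypothetical cache-footprint probe -- a sketch, not an existing suite.
 * Same flags as oneatwork(); "footprint" is the per-thread "A" in bytes. */
#include <stdlib.h>
#include <unistd.h>

extern volatile int stop, count;
extern long long int totalwork[];

void oneatwork_sized(int thr, size_t footprint)
{
    long long int counter = 0;
    volatile char *buf = malloc(footprint);
    size_t i;

    while (!stop) {
        for (i = 0; i < footprint; i += 32)  /* touch one 32-byte line */
            buf[i]++;
        if (count)
            ++counter;
        syscall(158);                        /* sys_sched_yield() on i386 */
    }
    totalwork[thr] = counter;
}
</CODE>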


> In this case, I certainly don't see these numbers as conclusive, for all I
> know, the new code could be faster rather than slower - we need to look at
> the cycle counters to find out.  If Richard is reading this, I'll bet he
> can tell us how to do that, I think he did a patch for that.

As the tons of question marks at the end of my message reveal, and as I've
already said, I'm anything but sure about the convenience of including the
patch, so We can agree here Larry.
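
For the record, reading the cycle counter on a Pentium-class CPU can be done
with rdtsc; a minimal sketch ( my own illustration of what Larry suggests,
not Richard's patch ):

<CODE>
/* Sketch: time one sched_yield() with the TSC ( i386, gcc inline asm ). */
#include <stdio.h>
#include <unistd.h>

static inline unsigned long long rdtsc(void)
{
    unsigned int lo, hi;
    __asm__ __volatile__("rdtsc" : "=a" (lo), "=d" (hi));
    return ((unsigned long long) hi << 32) | lo;
}

int main(void)
{
    unsigned long long t0, t1;

    t0 = rdtsc();
    syscall(158);    /* sys_sched_yield() on i386 */
    t1 = rdtsc();
    printf("%llu cycles\n", t1 - t0);
    return 0;
}
</CODE>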


PS:
I include the "instrument" used for my measurements.


Always with respect,
Davide.

--
All this stuff is IMVHO

-------------- next part --------------
A non-text attachment was scrubbed...
Name: threads.c
Type: application/octet-stream
Size: 3234 bytes
Desc: not available
URL: <https://lists.srvr.mx/pipermail/ayuda/attachments/20000128/99b3451e/attachment.obj>
