Auto-Adaptive scheduler and semaphore patch ( 2.2.14 ) ...

Davide Libenzi davidel at maticad.it
Tue Jan 25 19:25:24 CST 2000


Tuesday, January 25, 2000 5:00 AM
Larry McVoy <lm at bitmover.com> wrote:
> :
> : From my point of view this is a winner.  What do other people's numbers
> : look like?

> Umm, looking at your vmstat numbers, which say you are doing 88,000
> context switches/sec, or in other words, if the context switch itself
> took 0 time, each process is running all of 11.4 usecs before switching.
> Given that you are running loads of processes, it's not unreasonable to
> think that the context switch itself is taking 100% of that 11.4 usecs.

Larry, we're testing context switching times; what else should the benchmark
do, write to the serial port ?

The _very_ interesting points in Ed's benchmarks are :

1) We get a 10% improvement with only 4-5 tasks in the RQ.
2) The number of processes in the RQ decreases with the patch installed.
    This _means_ that the semaphore release patch gives waiting tasks a
    faster exit path ( it avoids flushing all the threads into the RQ, so
    the RQ stays shorter ); see the little model below.
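
Here is a minimal user-space model of point 2 ( not the kernel code, it just
counts how many tasks each release policy moves into the RQ per up() ;
NWAITERS and the bookkeeping are only illustrative ) :

/*
 * Minimal user-space model of point 2, not the kernel code: it only
 * counts how many tasks each release policy moves into the RQ per up().
 * NWAITERS and the bookkeeping are illustrative assumptions.
 */
#include <stdio.h>

#define NWAITERS 16

int main(void)
{
    int asleep, rq, i;

    /* wake-all release: every up() flushes all the sleepers into the RQ,
     * one of them wins the semaphore and the others re-block. */
    asleep = NWAITERS;
    for (i = 1; asleep > 0; i++) {
        rq = asleep;            /* everybody is made runnable    */
        asleep = rq - 1;        /* only one acquires, rest sleep */
        printf("wake-all up() #%2d: RQ spike = %2d\n", i, rq);
    }

    /* wake-one release: every up() moves exactly one sleeper to the RQ. */
    asleep = NWAITERS;
    for (i = 1; asleep > 0; i++) {
        asleep--;
        printf("wake-one up() #%2d: RQ spike =  1\n", i);
    }
    return 0;
}

The wake-all column is the RQ flushing I'm talking about above; the wake-one
column is what the semaphore release patch aims at.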


> How many times does this need to be said?  This is an irrelevant
> data point.  I don't care if you see a 100% improvement in this case.
> It's a stupid test case.  Under no circumstances would I trade off even
> a 1% improvement in this case for so much as one more cache miss in
> the scheduler.  It's just silly.
> This game is like talking about how much smoother you can make your
> engine run at 110,000 RPM, while merrily ignoring that (a) you never
> run your engine at 110,000 RPM, and (b) the supposed improvement gives
> you worse gas mileage at 4,000 RPM, where you spend all your time.

Larry, you asked for numbers and these are numbers, aren't they ?
As I said in the first announcement of the new scheduler, it uses the
linear version ( the current one ) for low workloads and switches to the
cluster version under high workloads.
The threshold and the delay at which the new scheduler switches on/off
can be made tunable via proc fs.
In this way one can tune the scheduler behaviour to adapt it to the
applications ( and load ) running on the system.
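
Just to show what I mean, here is a minimal user-space sketch of the on/off
logic ; rq_hiwater, rq_lowater and switch_delay are names I'm making up for
this example, not the /proc entries of the patch :

/*
 * Minimal user-space sketch of the on/off logic, not the patch itself.
 * rq_hiwater, rq_lowater and switch_delay stand in for whatever /proc
 * entries the real patch exports; the names and values are assumptions.
 */
#include <stdio.h>

static int rq_hiwater = 8;    /* go to the cluster scheduler above this RQ length */
static int rq_lowater = 3;    /* go back to the linear one below this             */
static int switch_delay = 4;  /* ticks the condition must hold before switching   */

static int cluster_mode;      /* 0 = linear scheduler, 1 = cluster scheduler */
static int pending;           /* how long the switch condition has held      */

static void sched_adapt(int rq_len)
{
    int want = cluster_mode;

    if (!cluster_mode && rq_len > rq_hiwater)
        want = 1;
    else if (cluster_mode && rq_len < rq_lowater)
        want = 0;

    if (want != cluster_mode) {
        if (++pending >= switch_delay) {
            cluster_mode = want;
            pending = 0;
        }
    } else {
        pending = 0;
    }
}

int main(void)
{
    int rq[] = { 2, 2, 9, 10, 12, 11, 12, 13, 4, 2, 2, 2, 2, 2 };
    int i;

    for (i = 0; i < (int)(sizeof(rq) / sizeof(rq[0])); i++) {
        sched_adapt(rq[i]);
        printf("tick %2d: RQ = %2d -> %s scheduler\n",
               i, rq[i], cluster_mode ? "cluster" : "linear");
    }
    return 0;
}

The two thresholds plus the delay give a hysteresis, so the scheduler doesn't
flip back and forth when the RQ length oscillates around the threshold.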

Moreover, I want to spend a word on the importance of a cache miss in the
scheduler when there are only 1-2 tasks in its RQ.
We can state that the number of processes in the RQ is proportional to
the number of processes running on the system, with a factor K that
depends on the applications' behaviour.
A system with 20 processes in the RQ will probably have 10 times the
processes of a system with 2 processes in the RQ ( given K constant for
the two systems ).
Those 10x processes, unless they're all doing "for (;;);", are probably
requesting some form of interaction with the devices plugged into the
system.
The way a device notifies the system of an I/O completion is an IRQ,
which fires an interrupt, which fires schedule().
Think of the interrupts that a SCSI card ( which can handle more than one
request at a time ), or a network card ( systems can have more than one ),
can fire in a second when hit by a lot of requests.
So 10 times more processes will generate ( even if not 10x ) far more
interrupts ( read : schedule() calls ) than the lightly loaded system.
So giving the scheduler the ability to resolve the next process election
quickly results in better system behaviour and lower interrupt latency.

So, definitely, one cache miss on a system with an RQ of 1 or 2 is _not_ a
big deal.
And, definitely, a longer schedule() path on a heavily loaded system _is_ a
big deal.
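
For those who haven't looked at the patch, here is a tiny user-space sketch
of the difference between the two elections ( the data layout and the slot
count are my own simplifications, not the code in the patch ) :

/*
 * Tiny user-space sketch of the two elections; the data layout and the
 * slot count are my own simplifications, not the code in the patch.
 */
#include <stdio.h>

#define NSLOTS 41                       /* goodness slots 0..40, illustrative */

struct task { int goodness; struct task *next; };

/* linear election: walk every runnable task and keep the best goodness
 * ( what the stock scheduler does, cost grows with the RQ length ) */
static struct task *pick_linear(struct task *rq)
{
    struct task *t, *best = NULL;

    for (t = rq; t; t = t->next)
        if (!best || t->goodness > best->goodness)
            best = t;
    return best;
}

/* cluster election: tasks are kept hashed by goodness slot, the winner is
 * the head of the highest non-empty slot, cost bounded by NSLOTS */
static struct task *pick_cluster(struct task *slot[NSLOTS])
{
    int i;

    for (i = NSLOTS - 1; i >= 0; i--)
        if (slot[i])
            return slot[i];
    return NULL;
}

int main(void)
{
    struct task t[4] = { { 17, NULL }, { 23, NULL }, { 5, NULL }, { 23, NULL } };
    struct task *slot[NSLOTS] = { NULL };
    struct task *rq = NULL;
    int i;

    for (i = 0; i < 4; i++) {           /* linear list for the full scan */
        t[i].next = rq;
        rq = &t[i];
    }
    for (i = 0; i < 4; i++)             /* slot hash for the cluster pick;
                                           a real slot would hold a list,
                                           here the last task simply wins */
        slot[t[i].goodness] = &t[i];

    printf("linear pick : goodness %d\n", pick_linear(rq)->goodness);
    printf("cluster pick: goodness %d\n", pick_cluster(slot)->goodness);
    return 0;
}

The linear pick costs a walk over the whole RQ, the cluster pick is bounded
by the number of slots, which is why the extra bookkeeping only pays off when
the RQ gets long.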

If you've looked at the patch you've seen how it performs and works, so if
you want to talk about the code, the patch is there, and if you want to talk
about the numbers, the numbers are here.


I also have in mind another change that can, IMVHO, speed up the cluster
scheduler even more.
Right now the counter member of task_struct is charged ( for normal
priority ) with a value of 20.
When a task has exhausted its counter it waits until all the processes in
the RQ have exhausted their own, and then a recharge loop is performed.
This value is kept low to avoid processes waiting at 0 goodness sitting in
an inactive state for more than a little while.
The result is a relatively high rate of recharge loops, with all the RQ
tasks restarting at the same level ( for equal priority ) and falling down
along this short path.
This kind of behaviour limits the performance the cluster scheduler can
reach because it concentrates all the tasks in a short interval of goodness
( similar tasks can have the same fall-down and therefore share the same
slot ).
The cluster scheduler has cluster ( slot ) 0 populated by exhausted
processes that, in this implementation, are waiting for a recharge.
My idea is to lengthen the counter charge value ( _not_ the time slice )
and implement a single process recharge ( in a FIFO way ) for the tasks in
cluster 0.
This can lead to a finer grained distribution across the clusters as well
as better ( or equal ) task responsiveness.
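
A rough user-space sketch of the idea ( CHARGE = 40, the queue layout and
the task count are assumptions made for this example, not what the final
patch would use ) :

/*
 * Rough user-space sketch of the recharge idea; CHARGE, the queue layout
 * and the task count are assumptions made for the example only.
 */
#include <stdio.h>

#define CHARGE   40             /* longer charge than the current 20 */
#define NTASKS    4

struct task { int id; int counter; };

static struct task *fifo[NTASKS];   /* cluster 0 : exhausted tasks, FIFO order */
static int fifo_head, fifo_tail, fifo_len;

/* a task that has burned its counter joins the tail of cluster 0 */
static void exhaust(struct task *t)
{
    t->counter = 0;
    fifo[fifo_tail] = t;
    fifo_tail = (fifo_tail + 1) % NTASKS;
    fifo_len++;
}

/* recharge only the task at the head of the FIFO, not the whole RQ */
static struct task *recharge_one(void)
{
    struct task *t;

    if (!fifo_len)
        return NULL;
    t = fifo[fifo_head];
    fifo_head = (fifo_head + 1) % NTASKS;
    fifo_len--;
    t->counter = CHARGE;
    return t;
}

int main(void)
{
    struct task t[NTASKS] = { { 1, 0 }, { 2, 0 }, { 3, 0 }, { 4, 0 } };
    struct task *r;
    int i;

    for (i = 0; i < NTASKS; i++)
        exhaust(&t[i]);
    while ((r = recharge_one()) != NULL)
        printf("recharged task %d to counter = %d ( the others still wait )\n",
               r->id, r->counter);
    return 0;
}

The point is that a task in cluster 0 does not have to wait for the global
recharge loop : it gets its own recharge, in arrival order.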

What do you think about it ?



With respect,
    Davide.

--
All this stuff is IMVHO





