Auto-Adaptive scheduler - Final chapter ( the numbers ) ...
Marco Colombo
marco en esi.it
Lun Ene 31 13:39:51 CST 2000
On Sat, 29 Jan 2000, Jamie Lokier wrote:
[...]
> > > How many switch / sec will do a system with RQ << 1 ?
> > > Stay big say 500 ( keep this number in mind ).
> >
> > ??? where do this figure come from?
>
> It's an assumption, to demonstrate a technical point.
> The actual figure does not matter.
It's a wrong assumption. And it suggests another wrong assumption
behind it. The question "How many switch / sec will do a system
with RQ << 1?" makes no sense since there's no fixed relationship
between the two numbers.
> > [ And RQ is not the same of context switching rate, BTW ].
>
> Davide knows that RQ is different from switching rate.
Yes he does. He does not realize that he's assuming such an equivance
in his arguments.
> > Let me state this: switch rate has *nothing* to do with load, nor does
> > the RQ.
>
> which you follow by this counter-example:
>
> > you can have a server with 5 clients which shows a RQ of 5, if the
> > backend server is trivially implemented with the basic accept()/fork()
> > framework, or a RQ of 1 (and NEVER MORE, even with 1000 clients) if
> > implemented with a clever select() loop (there are clever ways to do
> > the fork() thing, too). Note that the load on the server is the same,
> > and (with 5 clients) also the performances will be nearly the same.
>
> As you can clearly see,
>
> - when the first server is busy, it has a RQ of 5 and switches often
> - when the second server is busy, it has a RQ of 1 and never switches
[quite unrelated]
? never switches ? - i'm not sure of this. I'd like to see real world
figures. Even the single process may sleep for net I/O, i think.
So you get some switches. Anyway, that's not the point here.
> Therefore in your example, RQ and switching rate are very closely related.
They are (in my example).
The example was made to show that "RQ" is not related to "load". Under
the same load, you can have very different RQ, depending (in my example)
on application design.
Also it shows that "switch rate" is not related to "load". Again, in my
example, it's related to application design instead.
The example was not made to show that RQ is not related to switch rate.
I can provide a easy one, but i don't think it's necessary, since I'm sure
you know there's no such a fixed relationship.
> Real servers are not so simple of course. But let us stick with the
> simple one.
Yes. Counter-examples may be simple and silly... B-)
Positive examples cannot not.
> > So please stop stating that:
> >
> > long RQ <=> high loads (1)
>
> Your own example just demonstrated that long RQ <=> high switching rate
Oh, well, Jamie... below you teach me Logic. I'm pretty sure you know
that an example does not (positively) "demonstrate" anything.
You can use examples only to demonstrate something is *false*.
And, anyway, i don't see the point. I wrote "long RQ <=> high loads".
This has nothing to do with "long RQ <=> high switching rate", as you
wrote. Unless you're assuming that "high loads <=> high switching rate".
My example was made exacly to make it clear they are two different
things...
> > And please don't go on with other arguments until you've somewhat proven
> > (1) is true for any real world case.
>
> I'll leave that to you...
>
> > That's 1 person [me] telling you that (1) is false.
> >
> > > Jamie say :
> > > > Real world problems do not have that kind of load. Real world problems
> > > > _do_ have cache issues.
> > > > The cache issue is so important for real applications that optimising
> > > > the scheduler under unusual RQ loads isn't worth doing.
> >
> > That's 2.
>
> Thanks for quoting me, but you have misunderstood it.
>
> I did not confirm the assertion (1). I said (and really I was only
> explaining others' words), that cache reload timing is more significant
> than scheduler switching time.
You're right. I've included you in the number because you quoted Rik and
seemed to agree with him.
> The corollary is that the both scheduler and application should be
> written to minimise cache loading first, and things like switching time
> second. And that means keeping the scheduler small, and optimising
> applications so they have small RQs.
Ok. That's pretty much (3). If you see RQ > 30, it's likely you see
high cache miss-rate. Which is, as you wrote, worse that the scheduler
overhead.
> I said nothing about the relationship between RQ length and switching
> rates, or system load.
Rik said "Real world problems do not have that kind of load". As i read it,
it means that real applications do generate high loads, but usually not
long RQs. "(1) is (usually) false" follows directly, it think.
> > I can't think of any RL example in which a long RQ does not come out of
> > a bad design (including the use of a underpowered HW in the first place,
> > of course). I've said REAL LIFE EXAMPLE... not just a kind of
> > benchmark.
>
> You may be right here. That is certainly what everyone on the "keep the
> scheduler simple" side has said.
>
> But IMO, Davide has cleverly pointed out that the "good applications
> don't have a large RQ" assertion isn't actually relevent to the optimisation.
>
> Why? Because all the *good* applications don't *care* if the scheduler
> is simple or not. Under load, good applications *don't schedule at all*!
Agreed.
> Now, that's not really true of web servers invoking CGI and copying the
> response through a pipe to a socket, etc. If you call that an
> interesting case, I agree. If you call that broken design, then we can
> assume the simple single-threaded server model. So let's assume that...
>
> In other words, you have taken A => B and deduced B => A.
> We both know that's a common logical mistake.
I don't follow you here. What is A and what is B?
Besides that, I do think that the CGI case is "broken design".
Do you fork()/exec() /usr/bin/perl everytime? That's bad design.
Use mod_perl or mod_php. Do you do SQL queries to your postgres server
on the same system, and that kills performance? Buy one more CPU,
or move it to another hw. Are you serving 10000 cuncurrent clients
with a 486SX/25 and 8MB of RAM? That's broken design anyway.
You may eventually hit against some limitations of the current *kernel*
design. True. But fix them, not the scheduler. If the limitations
you're hitting against are kernel ones, I agree, that's an "interesting"
case. Running an overloaded Web server on a 486 is not. Nor is
running an overloaded server on a quad Xeon, for that matter.
> > [snip variations on the theme "long RQ <=> broken application design"]
>
> I hope I have managed to explain why this point is not relevant to
> optimising the scheduler. (Assuming simple single-thread-is-busy
> applications). From this point of view, non-broken application designs
> just don't care about the scheduler at all.
So, what's the purpose of the patch? To support broken application designs?
If it comes at no costs, I agree. Davide's "no cost" argument is simply
based on the wrong assumpion that "short RQ" means "low switch rate".
> > Againg, you can have high workloads with RQ ~= 1!
>
> You can. When RQ <= 1, you don't care about the scheduler perfo
Why? You're assuming (like Davide) that small RQ means small switch rate...
> > You should be reading, in the first place...
> > The cache issue mainly cames into play when the system is LOADED... and:
> > - this can happen with RQ being small. The scheduler should be optimized
> > for that (and actually is), since this is the NORMAL condition under
> > high loads;
>
> Ah, but is the normal condition that the scheduler isn't entered at all
> under that kind of high load?
Again! Everytime I say "small RQ" you translate into "low switch rate"!!
[ "the scheduler isn't entered at all" - what do you mean?
I take it for "low switch rate". But I've written "small RQ".
Why do you keep mixing two unrelated things? ]
Yes, the scheduler *is* entered a lot of times under (certain kinds of)
heavy load, and that's a NORMAL condition. High switch rates MAY be
caused by real applications, even optimized ones. That's why the scheduler
should be (and is) optimized to be fast. Please note that under this
(common, sensible, right, desiderable, what else?) condition Davide's
patch fails to be effective because the RQ is small.
> I don't think it is, because real servers cobble together complicated
> stuff, but it is true in the select() based server example.
Also the select() server may switch somethimes...
but yes, I *agree* than in real world cases you can have high switch rates...
> > - by the time you get a long RQ, the system is so overloaded (in real cases)
> > that the cache issue is a MAJOR one anyway, no matter the scheduler...
>
> Now now, that isn't true if all the processes are scheduling around the
> same memory image. At that point we might even find that it's kernel
> stack cache conflicts that are most significant, in which case we might
> want to colour the stack entry position a little...
OK. The whole matter is for real world cases. I thought it was clear...
Do you think the case you illustraded is common in real world applications?
And BTW, if "the processes are scheduling around the same memory image",
i think you are referring to the i-cache. They are doing the same
"computation". Are they also processing the same data? If yes, can you
even imagine a real, useful application that comes close to that behavior
(all the CPUs doing the same thing on the same data)? And if not, it seems
clear to me that if you have N threads doing the same job, it's useless
to have N bigger than the number of the CPUs, in the general case,
unless they sleep for I/O every since and then. So the RQ is small anyway,
unless we're on a > 16 CPUs SMP...
> > The only thing you can do to get the scheduler play a major role, is
> > to have the threads do *nothing*, just release the CPU as soon as they
> > get scheduled.
>
> > I can't think of any real world application that will
> > come close to that behaviour, unless it's very badly designed (or very
> > badly used).
>
> If you have threads doing sendfile, or write from already mapped files,
> there will be relatively little icache footprint and it will be shared
> between the different threads.
>
> The dcache footprint will be identical whatever the server architecture.
Correct. It depends on the load (the requests).
> Threads or not. (Related request clustering not being considered
> here.., that's an orthogonal thing).
Server design has an impact on switch rate, though. It can be high or low.
Anyway, it says *nothing* on the RQ size. Let's assume the server is
overloaded, and thus the RQ is long.
I think you can shorten the RQ even it this case, optimizing the
application... and if you can't, either the hardware is overloaded
(and there's little you can do, software-side), or some other kernel
parts need to be optimized (assuming the application is not at fault).
It's not a scheduler problem anyway. So please don't change it. Give
us a better poll() , a clone_httpd_thread(), a do_apache_all(), or
whatever. But don't change the scheduler, which is not at fault.
> I have written a more coherent summary of the technical argument in
> another message, subject "On optimising the scheduler for large run
> queues".
Ok, I'll dive into it... B-)
But the matter is that operating a system with a large RQ it's usally
a mistake. So no need to optimize the scheduler for that... fix the problem
instead.
>
> be well,
> -- Jamie
>
TM.
--
____/ ____/ /
/ / / Marco Colombo
___/ ___ / / Technical Manager
/ / / ESI s.r.l.
_____/ _____/ _/ Colombo en ESI.it
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo en vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/
Más información sobre la lista de distribución Ayuda