On optimising the scheduler for large run queues

Mon Jan 31 11:31:13 CST 2000

On Sat, 29 Jan 2000, Jamie Lokier wrote:

> I will summarise.
> 
>   Heavyweight kernel developers (no that doesn't mean >40 years old :-)
>   believe that, for all well designed real applications, scheduler
>   overhead is dominated by cache overhead.  Therefore optimising cache
>   overhead takes priority.
> 
> I agree.  Though I personally do not have figures to support it, I have
> the impression that others do.

I can't call myself a "kernel developer", and I don't have figures.
But I believe it anyway [someone told me he had measured it, and also
gave good explanations].

[...]
>   Multi-threaded I/O can be reduced to select() I/O.
> 
> This isn't true for Java at the application level.  You might call it a
> flaw in Java itself, and it probably is for now[1].  But there isn't a
> better alternative yet.  Rewriting Java "business objects" in C is not
> plausible.  User space threading is often a better idea in Java.  If
  ^^^^^^^^^
That's a Java problem. Not a Linux one.

> they worked perfectly run queues would not be an issue, but user space
> threads introduce their own overheads.[2]  Maybe dynamic compilers
> should try dynamically optimising thread switches outside the kernel.
> 
>   Real applications that are fully optimised for performance do not have
>   large run queues.
> 
> Apparently this is true.

Such applications have many "good" properties. Good cache behaviour is
one of the most important. Avoiding unnecessary (kernel) schedules is
another. Both are difficult to obtain.

>   Therefore optimising the scheduler for large run queues, at a cost for
>   small run queues (in maintenance, footprint and overhead) is
>   counterproductive for the most critical cases.
> 
> Here is a logical reasoning error.[3]  By the kernel heavyweights.
> 
> The overhead of a scheduler change is *only* relevant to high switching
> rate applications.  And that doesn't include purely select() based
> single-threaded monolithic servers.
> 
> No-one has shown any real, well designed, well tuned, short run queue
> applications that have a high switching rate!
>
> They certainly haven't used such examples in their arguments!  They've
> used other examples.  That's why it's a reasoning error: Those examples
> aren't relevant!

Good point. It's true for me, at least.

> Now, I expect there are examples of real, well designed, well tuned
> applications that switch very often.
> 
> Until someone demonstrates that *those* applications have small run
> queues, and only those, then we have to consider the large run queue
> patches seriously.  Remember the other applications, including

I'm sorry, Jamie, but this is *your* reasoning error. You should
demonstrate that real, well designed, well tuned applications
that switch very often (provided they exist: they lack one of the
"good" properties, a low switching rate, so they may not be "so well
designed and well tuned") also have long RQs.

In other words, I won't call an application that both has a high
switch rate and causes a long RQ "well designed, well tuned".
I can hardly think of such an application that has good cache
behaviour at the same time (that's my impression).

So I think that an application that has both a high switch rate and a
long RQ is NOT "well designed, well tuned", and it is the application
that you should optimize.
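
To put rough numbers on this, here is a small sketch in Python. It
assumes a scheduler that scans the whole run queue on every switch, so
the cost of one switch is a fixed part plus a part proportional to the
RQ length. The function name and both cost constants are made up for
illustration; they are not measurements.

```python
def scheduler_overhead(switches_per_sec, rq_len, fixed_us=1.0, per_task_us=0.1):
    """Fraction of one CPU spent choosing the next task to run.

    Assumes a linear scan of the run queue on every switch.
    fixed_us and per_task_us are hypothetical costs in microseconds.
    """
    cost_us = fixed_us + per_task_us * rq_len      # cost of one switch
    return switches_per_sec * cost_us / 1_000_000  # share of one second


# Compare low/high switch rates against short/long run queues.
for rate, rq in [(100, 2), (100, 200), (20_000, 2), (20_000, 200)]:
    share = scheduler_overhead(rate, rq)
    print(f"{rate:>6} switches/s, RQ {rq:>3}: {share:.2%} of a CPU")
```

Under these assumed costs the figure only becomes large when a high
switch rate and a long RQ occur together, which is exactly the kind of
application being argued about here.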

[...]

> Dear experienced kernel developers, please give examples of real world
> application, properly written for performance (as you like), that would
> be adversely affected by the proposed scheduler changes.

That's a good point.

But:
you failed to show any real world application, properly written for
performance (as we ALL like), that would take advantage of the proposed
scheduler change.

If you show me a badly written one, I can do the same. B-)

Unless someone more experienced than me has something to say about it,
the point now is:

- we can't show any good application that will take advantage of the patch;
- we can't show any good application that will be adversely affected by it.

So, right now, IMHO there are no reasons to consider the patch...

[ this leaves out systems with more than 16 CPUs... the patch should be
  considered for them, if numbers show some real improvement for an RQ
  of around 20 ]

TM.

-- 
      ____/  ____/   /
     /      /       /			Marco Colombo
    ___/  ___  /   /		      Technical Manager
   /          /   /			 ESI s.r.l.
 _____/ _____/  _/		       Colombo at ESI.it
