Interesting analysis of Linux kernel threading by IBM

Sun Jan 23 17:09:05 CST 2000

On Sat, 22 Jan 2000, Sean Hunter wrote:
> > You can argue that I get no benefit on uniprocessor systems.
> > But I do on SMP, which is, IMHO, the future of computing technology.
> > You can find the need for parallelism everywhere, from CPU execution units
> > up to complex software systems, up to the organization of daily work.
> > And if the OS is the bottleneck for parallelism, we must try to improve it,
> > not avoid multithreading.
> 
> No, yet again the _app_ design you propose is the bottleneck.

I respect your opinion, but mine is different.

> A Java guy contributed to this thread a while ago, going "Well, say
> you want to take advantage of 128 processors.  You'll want to have 128
> threads right there".  Err, not if you want good performance and not
> if your task doesn't suit that level of parallelism.  This is, to
> paraphrase Alan Cox, "My programming sucks, fix the language. Err...
> my programming still sucks, fix the kernel"

The simplest way to discredit a technology is to push it at the extremes.

> You need to do the design by looking at the nature of the task, rather
> than enforcing a preconceived C-S paradigm (threads, or <insert
> buzzword here>) on it.  Look at how your task is structured.  In the
> above, the six tasks you have listed cannot run in parallel.

Are we talking about the same thing?

> For instance, frame output can't happen until the other tasks have all
> completed.  You can't do the ray-trace for the illumination until you
> know what the texture's like (bumpy, soft, clear, reflective etc)
> otherwise your illumination will be wrong.  You can't map a texture
> until you know what the thing you're mapping onto is structured like,
> so you can't do the texture until the first three tasks are
> completed. etc etc

First of all, as I've said at least three times, I'm talking about scanline
renderers, not ray tracers.
Ray tracing requires continuous tree traversals to find object intersections.
In a scanline renderer, each of the following stages runs on a different
processor, and all of them are done within a single pipeline quantum:

1) 3D triangle production
2) 2D triangle production
3) Scanline production
4) Texturing
5) Illumination
6) Frame output

> 
> As such, all the time you spend cloning off and synchronising those
> threads is pure waste.  You might as well chuck those cycles into the
> bin.  That's not the sort of performance compromise the high-perf guys 
> I know would want to live with.
> Secondly, the threads in your example would spend most of their lives
> blocked, waiting for the other threads to finish, which would not lead
> to the long runqueues you postulate.
> 

The processes are created when rendering starts, not spawned continuously
while rendering is in progress.
Yes, you're right, these threads must be synchronized, but each lock is
accessed (read: its cache line is invalidated) by only two tasks.

[STEP1] -- LOCK -- [STEP2] -- LOCK -- [STEP3] .....

Therefore only two processors' caches are invalidated for each lock.

> Now, what you _could_ probably do, is divide the image up into regions
> or objects and do bits of the first four tasks you mentioned in
> parallel over those regions, then do the illumination in one step
> (this _could_ be separated into parallel tasks, but you'd need to be
> pretty sure this was a win) when they've all completed (because
> otherwise you'd have problems with the boundaries of the regions), and
> output the frame.  These divisions would most likely be separate
> _processes_ rather than threads, so they can run on different
> machines.  This gets rid of cache/processor affinity issues, and
> ensures that your boxes all have about load avg 1 (i.e. optimal work).
> I believe this is similar to what some "rendering farms" do.
> 
> There is _no_ benefit to just blindly forking off loads of threads,
> other than to spend most of your time synchronising them, scheduling
> them etc.  You'll totally trash your cache (even _if_ your task is
> massively parallel), and whatever scheduler design you choose, you may
> as well prompt the user for what task to schedule next for all the
> difference it'll make compared to a decent app design.
> 
> Poor design leads to poor performance.  Always has, always will.

To avoid repeating myself, please read my reply to Larry.

Davide.

-- 
All this stuff is IMVHO

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo at vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/