Code optimization <LEA Instruction>
Jamie Lokier
lkd at tantalophile.demon.co.uk
Fri Jan 28 09:04:32 CST 2000
Richard B. Johnson wrote:
> The following program clearly shows that many more 'addl' instructions
> may be executed than 'leal' instructions within a given time.
Your test is wrong because the particular `leal' instructions you use
are dependent: each one computes its address from the %eax that the one
before it has just written. The delay you are seeing is due to pipeline
scheduling, not the instruction itself. If you use the instruction in a
different context, you will find it is fast.
There's a section on Address Generation Interlock (AGI) stalls, which
affect Pentiums and 486s, in _any_ 486 or Pentium optimisation book. AGI
is an important scheduling detail. I think modern GCC knows about it
and schedules accordingly. Certainly, PGCC does.
> "\tleal 2(%eax), %eax\n" /* 10 leals */
> "\tleal 2(%eax), %eax\n"
> "\tleal 2(%eax), %eax\n"
> "\tleal 2(%eax), %eax\n"
> [...]
On a 486, try writing
    leal 2(%eax),%eax
    leal 2(%ebx),%ebx
    leal 2(%eax),%eax
    leal 2(%ebx),%ebx
etc. instead.
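To see why interleaving helps in the 486 case, here is a rough toy model
in C. It is only a sketch under stated assumptions: a single-issue
pipeline, one clock per instruction, and one extra stall clock whenever an
instruction takes its address base from the register the immediately
preceding instruction wrote. The names (struct insn, count_clocks and the
three sample sequences) are invented for illustration, and the Pentium's
dual-issue pairing is not modelled at all.

```c
#include <assert.h>
#include <stddef.h>

/* Registers for the toy model; NO_REG means "no address base used". */
enum reg { EAX, EBX, ECX, EDX, NO_REG };

/* Hypothetical instruction record, not a real decoder structure. */
struct insn {
    enum reg addr_base; /* register fed to address generation, or NO_REG */
    enum reg dest;      /* register written by the instruction */
};

/* Count clocks under the assumed rule: one clock per instruction, plus
 * one stall clock when the address base was written by the instruction
 * immediately before it. */
static unsigned count_clocks(const struct insn *code, size_t n)
{
    unsigned clocks = 0;
    assert(code != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        clocks++;
        if (i > 0 && code[i].addr_base != NO_REG &&
            code[i].addr_base == code[i - 1].dest)
            clocks++; /* AGI stall in this model */
    }
    return clocks;
}

/* leal 2(%eax),%eax repeated: every leal depends on the previous one. */
static const struct insn dependent_leal[4] = {
    { EAX, EAX }, { EAX, EAX }, { EAX, EAX }, { EAX, EAX }
};

/* leal on %eax and %ebx alternately, as suggested for the 486. */
static const struct insn interleaved_leal[4] = {
    { EAX, EAX }, { EBX, EBX }, { EAX, EAX }, { EBX, EBX }
};

/* addl $2,%eax repeated: no address generation, so no AGI in this model. */
static const struct insn dependent_addl[4] = {
    { NO_REG, EAX }, { NO_REG, EAX }, { NO_REG, EAX }, { NO_REG, EAX }
};
```

In this model the dependent leal run costs 7 clocks for 4 instructions,
while the interleaved leal run and the addl run both cost 4. Real
hardware has more going on, so treat the numbers as an illustration of
the scheduling effect rather than a measurement.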
On a Pentium, try writing
    leal 2(%eax),%eax
    leal 2(%ebx),%ebx
    leal 2(%ecx),%ecx
    leal 2(%edx),%edx
    leal 2(%eax),%eax
    leal 2(%ebx),%ebx
    leal 2(%ecx),%ecx
    leal 2(%edx),%edx
instead. Then go and read about AGIs.
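If you want to drop both orderings into a C test program, something
along these lines might serve as a starting point. It is a sketch, not
the original test: it assumes GCC-style inline asm on x86, the function
names are made up, and the "+r" constraints let the compiler pick the
registers, so they will not necessarily be %eax and %ebx. On other
targets it falls back to plain C additions, which say nothing about AGI.

```c
/* Use inline asm only with a GCC-compatible compiler on x86. */
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#define USE_X86_LEAL 1
#else
#define USE_X86_LEAL 0
#endif

/* Hypothetical helper: four leals, each taking its address base from
 * the register the previous leal just wrote. Adds 8 to a. */
static unsigned leal_dependent(unsigned a)
{
#if USE_X86_LEAL
    __asm__ volatile(
        "leal 2(%0), %0\n\t"
        "leal 2(%0), %0\n\t"
        "leal 2(%0), %0\n\t"
        "leal 2(%0), %0"
        : "+r"(a));
#else
    a += 8; /* fallback: same result, no leal involved */
#endif
    return a;
}

/* Hypothetical helper: the same four leals split across two registers,
 * so no leal uses a register written by its immediate predecessor.
 * Adds 4 to each of *pa and *pb. */
static void leal_interleaved(unsigned *pa, unsigned *pb)
{
    unsigned a = *pa, b = *pb;
#if USE_X86_LEAL
    __asm__ volatile(
        "leal 2(%0), %0\n\t"
        "leal 2(%1), %1\n\t"
        "leal 2(%0), %0\n\t"
        "leal 2(%1), %1"
        : "+r"(a), "+r"(b));
#else
    a += 4; /* fallback: same results, no leal involved */
    b += 4;
#endif
    *pa = a;
    *pb = b;
}
```

To compare them, call each helper in a long loop and time the loops on
the CPU you care about; the interesting difference only shows up on
processors that actually have the AGI stall.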
You're right that `leal' is not always faster than `addl', because
`addl' doesn't have the AGI stall. But it does depend on context.
I think you should still apologise to Alan anyway :-)
And Alan should apologise for being so terse, because it does depend on
context and he didn't say so :-)
have a nice day
-- Jamie