Code optimization <LEA Instruction>

Jamie Lokier lkd en tantalophile.demon.co.uk
Vie Ene 28 09:04:32 CST 2000


Richard B. Johnson wrote:
> The following program clearly shows that many more 'addl' instructions
> may be executed than 'leal' instructions within a given time.

Your test is wrong because the particular `leal' instructions you use
are dependent.  The delay you are seeing is due to pipeline scheduling,
not the instruction by itself.  If you use the instruction in a
different context, you will find it is fast.

There's a section on Address Generation Interlock stalls which affects
Pentiums and 486s.  See _any_ 486 or Pentium optimisation book...  AGI
is an important scheduling detail.  I think modern GCC knows about it
and schedules accordingly.  Certainly, PGCC does.

> 	    "\tleal 2(%eax), %eax\n"   /* 10 leals */
> 	    "\tleal 2(%eax), %eax\n"
> 	    "\tleal 2(%eax), %eax\n"
> 	    "\tleal 2(%eax), %eax\n"
> [...]

On a 486, try writing

	leal 2(%eax),%eax
	leal 2(%ebx),%ebx
	leal 2(%eax),%eax
	leal 2(%ebx),%ebx

etc. instead.

On a Pentium, try writing

	leal 2(%eax),%eax
	leal 2(%ebx),%ebx
	leal 2(%ecx),%ecx
	leal 2(%edx),%edx
	leal 2(%eax),%eax
	leal 2(%ebx),%ebx
	leal 2(%ecx),%ecx
	leal 2(%edx),%edx

instead.  Then go and read about AGIs.

You're right that `leal' is not always faster than `addl', because
`addl' doesn't have the AGI stall.  But it does depend on context.

I think you should still apologise to Alan anyway :-)

And Alan should apologise for being so terse, because it does depend on
context and he didn't say so :-)

have a nice day
-- Jamie

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo en vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/



Más información sobre la lista de distribución Ayuda