Code optimization <LEA Instruction>
Jamie Lokier
lkd en tantalophile.demon.co.uk
Sab Ene 29 04:30:16 CST 2000
Richard B. Johnson wrote:
> This is from GCC version 2.7.2.3, the exact utility that
> ../linux/Documentation/Changes requires.
>From another recent thread, we know that Linux 2.3.recent spits out a
few warnings about cache-aligned structures with GCC 2.7.2.3.
So I guess none of the people that write that code use GCC 2.7.2.3 :-)
> In most cases LEA is used where an offset is already known at compile-
> time. Note the stack operations using LEA and that the "pop" instruction
> uses the stack-pointer that has just been changed in value using LEA. This
> is a 3-clock penality.
>
> 333: 8d 65 e0 lea 0xffffffe0(%ebp),%esp
> 336: 5b pop %ebx
The AGI occurs when address calculation is dependent on the prior
instruction, in the pipeline. There is no AGI in this example.
In fact AGIs are precisely because address calculation is done earlier
than ALU arithmetic, so in this example the use of "lea" actually helps!
(Aside: %esp is treated specially anyway, so additional rules apply
where "push" and "pop" are concerned).
Quite apart from all that, the alternative is three instructions long and
two bytes longer:
0: 89 e5 mov %esp,%ebp
2: 83 c5 e0 add $0xffffffe0,%ebp
5: 5b pop %ebx
Everything tells me "lea" is better than "add" here...
> 667: 8d 65 e0 lea 0xffffffe0(%ebp),%esp
> 66a: 5b pop %ebx
etc., etc. all these are smaller and faster with "lea". We win :-)
> 00000a08 <schedule>:
> d78: 8d 1c 9d 0d 00 00 00 lea 0xd(,%ebx,4),%ebx
> d7f: 8d 0c dd 00 00 00 00 lea 0x0(,%ebx,8),%ecx
These are multiply-accumulate operations. GCC decided that's the
fastest way to say %ecx = 32 * %ebx + 104. There is an AGI stall as
there so it would be nice if appropriate scheduling would insert
something in the middle, but it's still faster than using adds and shifts.
> dad: 8d 76 00 lea 0x0(%esi),%esi ! NOP
It's as fast as a nop too, provided nobody wrote to %esi in the
preceding cycle. It's done that way because it's a faster way to say
"nop 3 bytes long" for jump target alignment.
> fda: 8d 5c 1e 01 lea 0x1(%esi,%ebx,1),%ebx
> fde: 89 5d cc mov %ebx,0xffffffcc(%ebp)
This means %ebx = %esi + %ebx + 1.
You'd have to use two instructions to get the same result with "add".
There is no AGI stall between the two instructions because the
instruction's address calculation is not dependent on the first. There
may be an AGI due to stalls between dependent pairs on a Pentium, but
you haven't shown enough context for me to say.
You show lots of these. An AGI due to "lea" occurs when you have the
lea _after_ the instruction that sets a register used in the address.
On a Pentium, due to pairing, there can be two instructions in between.
> 1115: 8d 76 00 lea 0x0(%esi),%esi
> 1118: 8b 71 38 mov 0x38(%ecx),%esi
That's ok. There's no AGI between these two because they're not dependent.
> 2e98: 8d 87 1a 03 00 00 lea 0x31a(%edi),%eax
> 2e9e: 50 push %eax
The first instruction's equivalent using "add" is two instructions long,
involving a move and an add. It would be larger code and in most cases
slower. That's because %edi and %eax are not the same register. "lea"
is the x86 way of saying "add A to CONSTANT plus B times POWEROF2 and
store in C". "add" can only do "add A to B".
> 2f24: 8d 90 08 fb ff ff lea 0xfffffb08(%eax),%edx
> 2f2a: 29 fa sub %edi,%edx
There's no AGI here. Max speed if the pairing state is working nicely.
> 2fe5: 8d 75 ec lea 0xffffffec(%ebp),%esi
> 2fe8: 56 push %esi
> 2fe9: 8d 87 ac 04 00 00 lea 0x4ac(%edi),%eax
> 2fef: 50 push %eax
>
> 2ff5: 8d 5d d8 lea 0xffffffd8(%ebp),%ebx
> 2ff8: 53 push %ebx
>
> 2ff9: 8d 87 b4 04 00 00 lea 0x4b4(%edi),%eax
> 2fff: 50 push %eax
>
> 304c: 8d 65 cc lea 0xffffffcc(%ebp),%esp
> 304f: 5b pop %ebx
No stalls in any of these either. We still win :-)
All this, and I believe GCC 2.7.2.3 doesn't even know about AGIs. From
your examples, it seems to work out nicely without considering them.
have a radiant day,
-- Jamie
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo en vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/
Más información sobre la lista de distribución Ayuda