<linux/linkage.h> generates incorrect cache alignments for 486 and above

Jamie Lokier lkd at tantalophile.demon.co.uk
Fri Jan 28 08:41:37 CST 2000


[Linus, you're Cc'ed because this refers to a previous discussion with
you, and because there's a suggested optimisation to i386 entry.S at the end]

Chris Sears wrote:
> So this is where the 16 comes from.  These ifetch blocks still get
> fetched from the cache.  Ya wanna see a bad example?  This is from entry.S
> 
> ENTRY(system_call)
>         ...
>         movl %eax,EAX(%esp)             # save the return value
>         ALIGN
>         .globl ret_from_sys_call
>         .globl ret_from_intr
> ret_from_sys_call:
>         movl SYMBOL_NAME(bh_mask),%eax
>         ...
> 
> This is the output from readelf -i 1 entry.o
> 
>         0x000000f4  movl        %eax,0x18(%esp)
>         0x000000f8  nop
>         0x000000f9  leal        0x0(%esi),%esi
>         0x00000100  movl        0x00000000,%eax
>         0x00000105  andl        0x00000000,%eax
> 
> What is this "leal" junk?  That there is one very large nop.
> The price of the alignment is two nop instructions.
> If ALIGN were set to the cache line size it would be the same
> because system_call to the ALIGN is very near two cache lines
> already.
> 
> This one is a tough call.  Two nops in the straightline code vs
> mis-alignment in the shared code.  Ok, one nop,
> they would be paired in the UV pipeline.  Probably leave it be.
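
For anyone who wants to check the padding in that listing, here is a
minimal sketch in C. It assumes ALIGN pads to a 16-byte boundary, which
is what the listing suggests (0xf8 is padded up to 0x100). The names
align_up and pad_bytes are made up for illustration and are not from
the kernel.

```c
#include <assert.h>

/* Round addr up to the next multiple of align.
 * align must be a non-zero power of two (assumption of this sketch). */
static unsigned long align_up(unsigned long addr, unsigned long align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~(align - 1);
}

/* Number of filler bytes the assembler has to emit to reach that
 * boundary, i.e. the total size of the nop-like instructions. */
static unsigned long pad_bytes(unsigned long addr, unsigned long align)
{
    return align_up(addr, align) - addr;
}
```

pad_bytes(0xf8, 16) comes to 8, which matches the 1-byte nop at 0xf8
plus the 7-byte leal at 0xf9.  pad_bytes(0xf8, 32) is also 8, so
aligning to a 32-byte Pentium cache line would cost the same padding
at this particular address.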

Linus already gave his opinion on this months ago: the ALIGN should go
because syscalls are the most common path.  However, he didn't remove it.

On a Pentium, as you say, the alignment costs only 1 cycle for the paired
nops.  That is probably less of a hit than the misaligned jump in the page
fault case.

That is especially so because the jump target is a pair of large
instructions (10 bytes for the pair), though I don't know whether that
makes any difference, as I don't know whether the Pentium's decoder can
work with partial ifetch blocks.

I'd guess the two most common paths are system calls and page faults.
Some applications will fault often, others will make a lot of syscalls.
That's my guess (I haven't measured anything).

So on balance I'd leave the ALIGN there.  Even though the exit code is
very small, most applications do syscalls _and_ page faults, so
duplicating even a single cache line will sometimes mean one more cache
line of the application to reload later.  And we know that a cache line
takes longer to load than a paired nop takes to execute.

Having said that, a cycle or two can be shaved off the page_fault path by
making it the case that falls through to error_code instead of
divide_error.  I'm sure that page_fault is a lot more common than
divide_error, so that would seem a sensible tweak.

have a nice day,
-- Jamie

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo at vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/


