shm bugs revisited
Manfred Spraul
manfreds en colorfullife.com
Jue Ene 27 19:32:59 CST 2000
Christoph Rohland wrote:
>
> Hi folks,
>
> I am still trying to hunt down the shm bugs I am experiencing under
> high smp load. I reported kernel oopses during swapping.
>
> But I now realized that errors occur without even swapping. My test
> program shows inconsistent data on rereading the segment.
>
I think I found a possible source: there's a tiny window where the tlb's
are not flushed properly. The window is about as large as the
"current->active_mm == NULL" window, ie. you could find it on your
8-way box:
CPU0: CPU1
[shm test program]
[page stealer]
switch_mm()
* changes %cr3
* sets "next_mm->cpu_vm_mask"
flush_tlb_page()
* resets "current->cpu_vm_mask"
* sends an IPI to all cpu's.
* the IPI arrives, but "current" still points to the old thread
* "next_mm->cpu_vm_mask" is not updates.
* switch_to() changes current.
* the cpu returns to user space.
flush_tlb_page()
* "current->cpu_vm_mask" is zero
(at least bit0 is zero)
no IPI.
* cpu0 is not flushed
* the cpu could use outdated tlb entries.
I'm working on a solution. I think the tlb flush interrupt must not
access "current->{active_,}mm", because %cr3 and %esp are not exchanged
in one atomic operation.
--
Manfred
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo en vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/
Más información sobre la lista de distribución Ayuda