shm bugs revisited

Manfred Spraul manfreds at colorfullife.com
Thu Jan 27 19:32:59 CST 2000


Christoph Rohland wrote:
> 
> Hi folks,
> 
> I am still trying to hunt down the shm bugs I am experiencing under
> high smp load. I reported kernel oopses during swapping.
> 
> But I now realized that errors occur without even swapping. My test
> program shows inconsistent data on rereading the segment.
> 

I think I found a possible source: there's a tiny window where the TLBs
are not flushed properly. The window is about as large as the
"current->active_mm == NULL" window, i.e. you could hit it on your
8-way box:

CPU0:			CPU1:
[shm test program]
			[page stealer]
switch_mm()
* changes %cr3
* sets "next_mm->cpu_vm_mask"

			flush_tlb_page()
			* resets "current->cpu_vm_mask"
			* sends an IPI to all CPUs.
* the IPI arrives, but "current" still points to the old thread
* "next_mm->cpu_vm_mask" is not updated.
* switch_to() changes current.
* the cpu returns to user space.

			flush_tlb_page()
			* "current->cpu_vm_mask" is zero
			  (at least bit 0 is zero), so no IPI is sent.
			* cpu0 is not flushed
* the cpu could use outdated tlb entries.

I'm working on a solution. I think the TLB flush interrupt must not
access "current->{active_,}mm", because %cr3 and %esp are not switched
in one atomic operation.

--
	Manfred


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo at vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/


