Scheduler problem in 2.2.1[34..]

Andris Pavenis andris at stargate.astr.lu.lv
Fri Jan 28 04:41:31 CST 2000


I was getting kernel oopses in schedule() several times a day on average
while XFree86 was running (and much less often otherwise).

This happened with kernels 2.2.12, 2.2.13, 2.2.14 and also the latest
prereleases of 2.2.15 (I have now built 2.2.15pre5, but the data below
are from 2.2.15pre4 with Rik's fix for mm/page_alloc.c). I am also
getting many gfp messages, but they don't resolve to anything useful.
I posted data about the oopses earlier this month. The compiler I use to
build the kernel doesn't seem to matter either (these data are for a
kernel built with gcc-2.7.2.3, but I have also tried egcs-1.1.2 and
gcc-2.95.2 and didn't see any difference).

After that I patched del_from_runqueue() to verify its argument (similar
to what was done in the 2.0.3X kernels). Below is the related output from
the log file:

Jan 27 21:21:05 hal kernel: del_from_runqueue(C03EE000) : Task not in run queue
Jan 27 21:21:05 hal kernel: prev_run=00000000  next_run=00000000  state=1  flags=0  nr_running=2
Jan 27 21:21:05 hal kernel: prev=c0db4000  next=c03b2000  pid=205
Jan 27 21:21:05 hal kernel: current=c03ee000  current->pid=205 current->state=1 current->flags=0
Jan 27 22:21:05 hal kernel: del_from_runqueue(C03EE000) : Task not in run queue
Jan 27 22:21:05 hal kernel: prev_run=00000000  next_run=00000000  state=1  flags=0  nr_running=2
Jan 27 22:21:05 hal kernel: prev=c0db4000  next=c03b2000  pid=205
Jan 27 22:21:05 hal kernel: current=c03ee000  current->pid=205 current->state=1 current->flags=0

It seems that schedule() tries to remove the current task from the run
queue twice (I don't know why). Without the included patch I got an oops.
Of course this patch doesn't fix the real problem; it only avoids the
crashes. The affected process is practically always maudio (from
KDE-1.1.2), but sometimes also kwm (also KDE-1.1.2). At least maudio
seems to be in a usable state after this happens.
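
To show the suspected failure mode in isolation, here is a minimal
user-space sketch. It is not the kernel code: struct task, the
stand-alone init_task and double_removal_demo() are hypothetical
simplifications made up for illustration. It also assumes that a task's
run-queue links are cleared when it is taken off the queue, which is
what the prev_run=00000000 next_run=00000000 values in the log suggest.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, stripped-down stand-in for the kernel's task_struct. */
struct task {
    int pid;
    struct task *next_run;
    struct task *prev_run;
};

/* Head of a circular doubly linked run queue; it points to itself when empty. */
static struct task init_task = { 0, &init_task, &init_task };
static int nr_running = 0;

/* Link p in right after the queue head. */
static void add_to_runqueue(struct task *p)
{
    struct task *next = init_task.next_run;

    p->prev_run = &init_task;
    init_task.next_run = p;
    p->next_run = next;
    next->prev_run = p;
    nr_running++;
}

/* Unlink p. Returns 0 on success, -1 if p is not on the queue. */
static int del_from_runqueue(struct task *p)
{
    struct task *next = p->next_run;
    struct task *prev = p->prev_run;

    /* Without this check a second removal would dereference NULL below. */
    if (!prev || !next) {
        fprintf(stderr, "del_from_runqueue(%p): task %d not in run queue\n",
                (void *) p, p->pid);
        return -1;
    }

    nr_running--;
    next->prev_run = prev;
    prev->next_run = next;
    p->next_run = NULL;   /* assumption: links are cleared on removal */
    p->prev_run = NULL;
    return 0;
}

/* Remove the same task twice; returns 1 if only the second call is rejected. */
int double_removal_demo(void)
{
    struct task a = { 205, NULL, NULL };
    struct task b = { 206, NULL, NULL };
    int first, second;

    add_to_runqueue(&a);
    add_to_runqueue(&b);

    first = del_from_runqueue(&a);   /* task is queued: succeeds */
    second = del_from_runqueue(&a);  /* links already cleared: rejected */

    del_from_runqueue(&b);           /* leave the queue empty again */
    assert(nr_running == 0);

    return first == 0 && second == -1;
}
```

Calling double_removal_demo() from a main() function prints one
"not in run queue" message for the second removal. With the check
taken out, that second call would write through a null pointer, which
is the user-space equivalent of the oops.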

At least after patching kernel/sched.c I got no more oopses.

  PID TTY      STAT   TIME COMMAND
  205 ?        S      0:02 maudio -media 1
  
I'm including below the patch for kernel/sched.c that I used.

What should I do next to debug this problem? Is it possible to get a
kernel stack trace without an oops?

Andris

PS. I'm not subscribed to the kernel mailing list, so please send answers
    to me as well.

======================================================================
*** linux-2.2.15pre4/kernel/sched.c~1	Tue Jan  4 20:12:25 2000
--- linux-2.2.15pre4/kernel/sched.c	Wed Jan 26 10:04:57 2000
***************
*** 380,385 ****
--- 380,397 ----
  	struct task_struct *next = p->next_run;
  	struct task_struct *prev = p->prev_run;
  
+ 	if (!prev || !next)
+ 	  {
+ 	      printk ("del_from_runqueue(%08X) : Task not in run queue\n",p);
+ 	      printk ("prev_run=%p  next_run=%p  state=%X  flags=%X  nr_running=%d\n",
+ 	              prev, next, p->state, p->flags, nr_running);
+ 	      printk ("prev=%p  next=%p  pid=%d\n",
+ 	              p->prev_task, p->next_task, (int) p->pid);
+ 	      printk ("current=%p  current->pid=%d current->state=%X current->flags=%X\n",
+ 	               current,current->pid,current->state,current->flags);
+ 	      return;
+ 	  }
+ 
  	nr_running--;
  	next->prev_run = prev;
  	prev->next_run = next;




