Please review all modules for unload races
Keith Owens
kaos at ocs.com.au
Sun Jan 30 02:15:34 CST 2000
Would kernel developers and maintainers please review all of their code
that can be loaded as a module. There are two requirements for module
code which, AFAICT, have never been explicitly documented; consider
this mail to be that missing documentation. Unfortunately some code is
violating these requirements and is open to races, which is why I need
everybody to review and verify that their module code is correct.
First the requirements for modules, then the task list.
Coding requirements for modules to avoid unload races.
(1) All code and structures that are exported from a module must be
protected by the big kernel lock, either directly or indirectly.
The routines that load and unload modules run under the big kernel
lock and if the data they manipulate is not similarly protected
then modules can be unloaded because they appear to be not in use
even when they are being used.
(a) Code and structures can be explicitly exported via
EXPORT_SYMBOL. Access to these symbols is controlled by the
sys_init_module() and sys_delete_module() routines, both of
which hold the big kernel lock while they adjust symbol
reference counts. Anything accessed via EXPORT_SYMBOL is
automatically safe, provided that the higher level modules that
access those symbols are also safe.
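For example, an imaginary foo module (the names here are made up)
might export a helper like this; any module loaded later that
references foo_do_work() keeps foo pinned until it is itself
unloaded:

#include <linux/module.h>

/* Explicit export.  When another module that references
 * foo_do_work() is loaded, sys_init_module() records the dependency
 * under the big kernel lock, so sys_delete_module() will refuse to
 * unload foo while that module is still present.
 */
int foo_do_work(int arg)
{
        return arg * 2;
}

EXPORT_SYMBOL(foo_do_work);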
(b) Code and structures can also be implicitly exported when they
are passed to a register_service() routine. For these symbols,
there is no automatic guarantee of locking, instead we have to
rely on correct coding techniques to preserve code and data
integrity.
The big kernel lock protection can be direct or indirect.
Direct protection is when the code holds the kernel lock while
it runs. For example, sys_open for foo does lock_kernel(),
foo_open(), unlock_kernel(). Anything that foo_open() does is
atomic wrt module loading and unloading, *** but only when
foo_open() is coded according to requirement 2 ***.
Some routines do not run under the kernel lock, usually because
it would cause too much contention. Instead the code requires
a kernel token of some kind and that token can only be obtained
from code that was protected (directly or indirectly) by the
big kernel lock. Indirect protection relies on the routine
that provides the token also doing MOD_INC_USE_COUNT for every
token it gives out.
For example, read and write do not get the kernel lock. These
routines can only be called with a valid file descriptor; that
fd comes from open, which runs under the big lock. The open
routine does MOD_INC_USE_COUNT for every fd it assigns, the
close routine does MOD_DEC_USE_COUNT for each fd that is
retired.
Only when the module use count is zero can the module be
unloaded. The module use count has the same effect wrt module
loading and unloading as the big kernel lock but it allows code
to run without incurring the contention that the big kernel
lock would generate.
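To make the pattern concrete, here is a minimal sketch for an
imaginary foo character driver (FOO_MAJOR and the foo_* names are
made up, only the placement of the use count operations matters).
foo_open() and foo_release() are implicitly exported via
register_chrdev(), and foo_read() does not touch the use count
because any fd it can be called on already holds a reference taken
in foo_open():

#include <linux/module.h>
#include <linux/fs.h>

#define FOO_MAJOR 240                   /* made up, experimental range */

static int foo_open(struct inode *inode, struct file *file)
{
        MOD_INC_USE_COUNT;              /* one reference per open fd */
        return 0;
}

static int foo_release(struct inode *inode, struct file *file)
{
        MOD_DEC_USE_COUNT;              /* fd retired */
        return 0;
}

static ssize_t foo_read(struct file *file, char *buf, size_t count,
                        loff_t *ppos)
{
        /* No use count games here: read runs without the big kernel
         * lock, but the fd it was called on came from foo_open(),
         * which already bumped the count, so the module cannot go
         * away underneath us.
         */
        return 0;
}

static struct file_operations foo_fops = {
        read:           foo_read,
        open:           foo_open,
        release:        foo_release,
};

int init_module(void)
{
        /* Implicit export: from here on foo_open() and friends can
         * be called from outside the module, so they must maintain
         * the use count themselves.
         */
        return register_chrdev(FOO_MAJOR, "foo", &foo_fops);
}

void cleanup_module(void)
{
        unregister_chrdev(FOO_MAJOR, "foo");
}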
(2) The big kernel lock is automatically released when a process sleeps
in the kernel and is automatically reacquired on reschedule if the
process had the lock originally. Any code that can be compiled as
a module and is entered with the big kernel lock held *MUST*
increment the use count to activate the indirect module protection
before doing anything that might sleep.
In practice, this means that all routines that live in modules and
are invoked under the big kernel lock should do MOD_INC_USE_COUNT
as their very first action. And all failure paths from that
routine must do MOD_DEC_USE_COUNT before returning.
Many modules are violating this requirement by doing work that can
sleep before they issue MOD_INC_USE_COUNT; this code leaves itself
open to races. For example, this code is wrong:
static int open_foo(struct inode *inode, struct file *file)
{
if (request_irq(FOO_IRQ, foo_interrupt, 0, "FOO", NULL))
return -EBUSY;
MOD_INC_USE_COUNT;
return 0;
}
request_irq calls kmalloc(,GFP_KERNEL) which can sleep. While it
is sleeping, open_foo loses the kernel lock, allowing other
routines such as delete_module to run. The module use count has
not been incremented yet so as far as delete_module is concerned,
the foo module is not in use and can be removed. But request_irq
will eventually return into the foo module after the module has
been deleted - Oops. The code must be rewritten this way:
static int open_foo(struct inode *inode, struct file *file)
{
MOD_INC_USE_COUNT; /* must be first */
if (request_irq(FOO_IRQ, foo_interrupt, 0, "FOO", NULL)) {
MOD_DEC_USE_COUNT; /* failure path */
return -EBUSY;
}
return 0;
}
This is the only way to ensure that the code is protected by the
big kernel lock all the way through and therefore cannot be removed
in mid flight. This protection can be direct (holds the lock) or
indirect (module has a non-zero use count) but the indirect
protection (increment the use count) *MUST* be activated before
doing anything that can sleep.
The task list.
Find all the module routines that
* are called from outside the module and
* are accessed via register_xxx (ignore EXPORT_SYMBOLs) and
* do not require kernel tokens to run.
Typical routines are open, statfs and socket, basically anything that
takes a "name" instead of a "fd".
(1) Make sure that all these routines are invoked under the big kernel
lock. In most cases this is already being done, e.g. sys_open and
sys_statfs grab the lock before calling the module routine.
Unfortunately sys_socket does not get the big kernel lock, which
means that any module routines called from socket are unsafe wrt
module unloading. For example, sys_socket -> sock_create ->
inet6_create is unsafe when IPv6 is a module - yes, I know that v6
does not really work as a module but the attempt was made and there
are still bits of module code in af_inet6.c. Other socket based
modules are also open to races in this area; is there any reason
that sock_create does not do lock_kernel() around the calls to the
network family create routine?
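Something along these lines in sock_create() would close that
window (a sketch only, not a tested patch, and the surrounding
code is paraphrased):

        /* Sketch: in net/socket.c, sock_create(), hold the big
         * kernel lock across the protocol family's create routine,
         * so that a family implemented as a module (e.g.
         * inet6_create) is protected until it has had the chance to
         * do MOD_INC_USE_COUNT itself.
         */
        lock_kernel();
        err = net_families[family]->create(sock, protocol);
        unlock_kernel();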
(2) Make sure that the first thing these module routines do is
MOD_INC_USE_COUNT and that all failure paths do MOD_DEC_USE_COUNT.
This check is particularly important because many modules have been
incorrectly coded on the assumption that MOD_INC_USE_COUNT can be
delayed until the end of the routine.
Keith Owens, modules maintainer
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo at vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/