2.2.10-14 i686 SMP: IDE RAID-5 array hangs on mount

Adam C Powell IV hazelsct at mit.edu
Mon Jan 24 20:34:43 CST 2000


Khimenko Victor wrote:

> In <3888CD6E.29D407EB at mit.edu> Adam C Powell IV (hazelsct at mit.edu) wrote:
> AI> Apologies for the delay, I've been having some email trouble.  Future followups
> AI> will be a lot quicker.
>
> AI> Khimenko Victor wrote:
>
> >> In <3883285B.450536A at mit.edu> Adam C Powell IV (hazelsct at mit.edu) wrote:
> >> > Greetings,
> >>
> >> > I have a RAID-5 array across 10 GB partitions on four 17 GB IDE drives
> >> > at hda-d, and a RAID-0 array across 5 GB partitions on the same drives,
> >> > on a dual-Celeron ABIT BP6 motherboard (haven't installed the recent
> >> > BIOS upgrade).
> >>
> >> > I've built SMP kernels from 2.2.10 and 2.2.13 source with gcc-2.95, and
> >> > then 2.2.13 and 2.2.14 with gcc-2.7.2.3, all using the standard Debian
> >> > .config except 686 and SMP (well, approximately the Debian 2.2.13 config
> >> > for 2.2.14).
> >>
> >> That is: you are using a broken RAID implementation and expect it to work
> >> somehow?
>
> AI> Uh, there's a broken RAID implementation in a stable kernel?
>
> Not exactly "broken". More like "unmaintained". It works for some people but
> there are LOTS of problems with it :-/ And no one bothers to fix them since
> version 0.90 exists :-)
>
> >> > For all of these kernels, the machine always hangs while mounting the
> >> > RAID-5 array.  It never hangs while mounting the RAID-0 array, which
> >> > happens to come before it.  And it never hangs under the non-SMP 386
> >> > Debian kernel images.
> >>
> >> > Are there any known races which got into 2.2.14?  I haven't yet tried
> >> > 2.2.15-pre, is there any reason to believe it might solve the problem?
> >>
> >> It will not solve the problem. RAID as it is in 2.2.x kernels is unstable and
> >> unmaintained. You REALLY should use the raid patches. There are some
> >> incompatibilities between the 2.2.x RAID implementation and the latest RAID
> >> patches, and thus it's not going into 2.2.x, but if you need working raid (and
> >> you do not have one) you SHOULD NOT use stock 2.2.x RAID. Use the one from
> >> http://people.redhat.com/mingo/raid-2.2.14-B1
>
> AI> Hmm, I sense a contradiction here: 2.2.x RAID is incompatible with the latest
> AI> RAID patches and so will stay in, but 2.2.x RAID is unstable and unmaintained.
> AI> If you don't mind my asking, what kind of logic motivated that policy?
>
> 1. If you are lucky and have a working RAID based on stock 2.2.x (for example
> RAID 0 :-) you should be able to upgrade to 2.2.14 without big hassle.
> So the upgrade to RAID 0.90 in the mainstream kernel is postponed to 2.4 ...

Thanks, I will do that as soon as possible.

> 2. RAID 0.90 needs some changes in some important kernel structures and such
> changes will affect even users without RAID.
>
> RedHat 6.1 includes RAID patches anyway so I'm not sure if 2. still can be
> considered seriously.

Cool, then maybe they'll be in 2.2.15?

> AI> But seriously.  So what you're telling me is that I need to back up all of my
> AI> data, install a patched kernel (and Debian raidtools2), and completely wipe and
> AI> reinstall the RAID arrays, right?  Are there any tools to translate old RAID
> AI> arrays into new ones?
>
> AFAIK you do not need to back up the data and reinstall it (I may be wrong here).
> You should install the new RAID tools and adapt the configuration files, but
> the actual contents of the RAID array do not need to be reinstalled.

Thank you, this is very comforting news!  I'll back up anyway, but it's very good to
know that the old arrays should mount.

> AI> Please seriously consider the attached patch for 2.2.15.  It will save a lot of
> AI> time and grief for RAID amateurs like myself.
>
> RAID amateurs will use RedHat 6.1 with patched kernel :-)

Okay, then RAID amateurs who use any other distro (okay, perhaps SuSE and TurboLinux
have the patches...), or RAID amateurs who download the latest stock kernel.  When
2.2.15 comes out, should such amateurs be expected to know to wait for RedHat 6.2 (or
SuSE whatever)?

Thank you very much for the help, I appreciate it very much.  But I do still think
there needs to be some change in the way RAID is done in the stable kernel.  Either it
should be patched to use RAID 0.90, or there should be VERY LOUD WARNINGS in many
places, including drivers/block/Config.in, Documentation/Configure.help and
Documentation/md.txt (and in Debian, in /usr/share/doc/raidtools/DO-NOT-USE), telling
people not to expect what's there to work.

I cannot tell you how frustrated this has made me over the last three months, when
kernel after kernel just failed.  I tried different IDE options, I tried different
compilers, and each time my RAID array took more than seven hours to ckraid --fix
(which was not automatic in Debian potato until about a month ago, so the downtime was
often a lot longer), during which /home for my entire research group- including
everyone's web pages- was unavailable.  After the third or fourth failure, I went to
the debian-user mailing list, where the advice given was to just use gcc 2.7.2, which
of course did nothing for me, and I asked again and nobody had any clue.

I've been enjoying Linux for long enough on a wide enough variety of platforms- and
have had bad enough experiences with NT (really bad, hate it, hate it, hate it!)- that
switching has not been an option.  But the value of the hours and hours of lost time-
several days in total- made me *very* seriously reconsider this- this has been almost
as bad as the NT nightmares.  It probably would have even been worth the extra price
of Solaris/Sparc!

Linux can be one of two things:

  1. A professional system whose stable kernels just work.
  2. A hobbyist/geek system which requires patches to make things work, and doesn't
     even say so anywhere in the documentation, one must "just know" these things.

My impression has been that Linux aims for #1, but it seems I am wrong, or at the very
least, the RAID policy has been pure #2.  If it is about to be fixed, that's great,
but for future reference, quality control issues like this matter a whole lot to a
great many people.  I don't expect it to be perfect- we are human after all- but where
there are known problems or unmaintained sections of the kernel, please document them
far and wide.

Still disappointed, but grateful for the help!

             Adam Powell                     http://lyre.mit.edu/~powell/
             Thomas B. King Assistant Professor of Materials Engineering
             77 Massachusetts Ave. Rm. 4-117         Phone (617) 452-2086
             Cambridge, MA 02139 USA                   Fax (617) 253-5418


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo at vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/


