Discussion:
SATA (hot) swapping for backup
Ian Stakenvicius
2006-09-21 15:56:44 UTC
Hey everyone -- I just stumbled on this thread and thought it might be
helpful to add some of my own experience.
1. If I boot the server with no drive in the bay or with the drive
power off, it is never detected, even after powering it on. Is
there a way to get that to work?
Yes there is. The issue here is that the scsi bus does not automatically
re-scan when something is plugged in. You can trigger a rescan by
manipulating /sys/... or using the 'rescan_scsi_bus.sh' script (search the
web for it) or the 'scsiadd' tool (http://llg.cubic.org/tools/). It doesn't
happen automatically because libata and the sata drivers lacked kernel
hotplug support prior to 2.6.18.
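The /sys manipulation mentioned above amounts to writing wildcards into the controller's scan attribute. A minimal sketch, assuming host0 is your SATA controller (check /sys/class/scsi_host/ for the right number):

```shell
#!/bin/sh
# Trigger a scsi bus rescan through sysfs.  "host0" is an assumption --
# list /sys/class/scsi_host/ to find the host that matches your controller.
scan_path() {
    printf '/sys/class/scsi_host/host%s/scan' "$1"
}

# Writing "- - -" (wildcards for channel, target and lun) asks the kernel
# to probe the bus for newly attached devices.  Needs root.
rescan_host() {
    echo "- - -" > "$(scan_path "$1")"
}

# rescan_host 0    # uncomment to actually rescan host0
```

After the rescan, the new drive should show up in dmesg and get its devnode.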
2. I'm not 100% sure that unmounting the drive, powering it off,
removing it, and putting a new disk in its place is legit. Can
anyone confirm? My motherboard *does* claim to support SATA
hotswap, but I'm not sure if Linux supports it.
Linux probably doesn't, save in the most recent kernels, and possibly
only with appropriate patches.
I mostly concur on this. Kernel support for yank-the-drive hot-plug in
libata is very new and has only just been included in 2.6.18 (with the
appropriate updates to the Promise TX/TX2 SATA driver), but that's it so
far. Now that this is done, though, most of the sata controllers that do
support this (i.e., those not based on the ICH series) will probably follow
suit.

Warm-plug support, however, has been around for quite some time, and any 2.6
kernel that supports your sata controller should allow you to do this
(afaik)... more on this later.
In my case I'm wondering what could possibly go wrong? If the drive
is completely unmounted before it is powered down and removed, it
seems as though the OS has no reason to be concerned with how/when I
plug it in. Any ideas?
[ Snip doomsday scenarios and such ]


While experimenting with my system (two-drive RAID1 w/trays, Promise SATAII
controller, kernel 2.6.15), yanking one of the drives while it was mounted and
active in the raid had no ill effects with respect to the hardware or the linux
subsystem. However, linux did not detect the removal of the drive and
bogged right down with IO errors as it constantly tried to keep updating
the missing hdd.

It did this forever (I would have assumed it would stop trying
after a while and throw the device out of the raid, but it did not).

Since there was nothing in the driver or libata to catch the drive-removed
interrupt (which did fire, I noticed, as there was some sort of error about
an interrupt not being caught), this is not surprising, and hopefully 2.6.18
will fix that. That would handle removing the device from the
kernel, but I'm still not sure what will happen to the raid subsystem if a
device suddenly disappears.
Also, if you don't stop the drive spinning before you pull it then you
have cut power to a disk in rotation.
Huh. I guess unmounting the drive isn't enough to stop the spindle.
No, though you can send the appropriate command sequences with sdparm or
hdparm to put the drive to sleep. That should stop it well enough to
help here.
You can also do this by manipulating /sys/... and probably with the 'scsiadd'
tool mentioned above.
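The hdparm route above can be wrapped in a couple of lines. A sketch, with /dev/sdb as a placeholder for your removable bay's device; DRY_RUN is a hypothetical switch added here so the commands can be previewed without root:

```shell
#!/bin/sh
# Put a drive into standby before pulling it.  Pass the device as $1.
# Set DRY_RUN=1 to print the commands instead of running them.
spin_down() {
    run() { if [ -n "$DRY_RUN" ]; then echo "$@"; else "$@"; fi; }
    run sync               # flush dirty buffers first
    run hdparm -y "$1"     # ATA STANDBY IMMEDIATE: park heads, stop spindle
    # sdparm --command=stop "$1" would be the scsi-layer equivalent
}

# DRY_RUN=1 spin_down /dev/sdb
```

Once the spindle has stopped (give it a few seconds), cutting power to the bay is much gentler on the drive.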

I'm not sure what the issue is with stopping a drive in rotation, however.
If the drive isn't physically being moved while it's spinning down, is that
really a problem?
Lastly, if there's any standard way to automate backup jobs (mounting
disks, rsync or whatever, unmounting, etc.) I'd appreciate a
reference. I can always use cron scripts but I imagine someone has
probably come up with something better.
udev can fire off arbitrary code on insertion of a device. You can use
that to trigger a script that will, basically, do all the work for you.
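A udev rule for this is only one line. A hypothetical example (the rule file name, the "backup" filesystem label and the script path are all assumptions; exact rule keys vary with udev version):

```shell
# /etc/udev/rules.d/90-backup.rules
# When a block device whose filesystem label is "backup" appears,
# run the backup script.  RUN+= hands the new devnode to the script
# via the udev environment.
ACTION=="add", SUBSYSTEM=="block", ENV{ID_FS_LABEL}=="backup", RUN+="/usr/local/bin/do-backup.sh"
```

The script itself would then mount, rsync, unmount, and spin the drive down.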
Are there "standard" scripts for this purpose, or will I be whipping
one up at home?
I haven't looked into this fully yet as to its applicability to a udev
system, but there is the 'scsirastools' package and its scripts
(http://scsirastools.sourceforge.net/), so that might give you something to
start from.

With my raid-1 system, for general ease (so my users do not need to interact
with any software), i do an automated cold-swap: I use ACPI w/the power
button to cleanly power down the server, swap the drive, then on bootup i
use the following script to add the degraded hdd into the array (md0 is made
up of sda1 and sdb1):

#!/bin/sh
# Re-add whichever half of the mirror is missing from md0.
if ! grep md0 /proc/mdstat | grep -q sda; then
    mdadm /dev/md0 -a /dev/sda1
fi
if ! grep md0 /proc/mdstat | grep -q sdb; then
    mdadm /dev/md0 -a /dev/sdb1
fi

Once I upgrade to 2.6.18 I'm going to look into the scsirastools scripts in
depth, as hopefully they will at least provide a basis for automating the
entire process. Unfortunately, I think multiple layers of custom scripts
may still be required though.

[ Snip ]
Yeah. I think, basically, SATA hot swap is still too new for me to want
to use it in production. The USB case, for which hot swap is years old
and well tested, is much more likely to be a success.
Sorry if that wasn't clear to you -- the driver, OS and controller
hardware need to be hot-swap capable for this to have a chance of
working even remotely reliably.
Sounds like I've one out of three at the moment. Probably falling
back to USB is my best bet in the near term. I need to get the system
going and a backup system in place -- that's far more important than
having the backups be super fast.
I'm trying to get true yank-the-drive hotplug working with my system, but
i'm using this on a RAID1. David, since your system seems to be a simple
plug-in/bring-drive-up/rsync/bring-drive-down/un-plug, you might want to
look into using a warm-plug method:

1. plug drive in
2. manipulate /sys/ or use 'scsiadd' to rescan the scsi bus (udev should
pick up the new drive)
3. mount
4. do your backing up
5. unmount
6. manipulate /sys/ or use 'scsiadd' to power down the drive and rescan the
scsi bus (udev should remove the drive's devnode)
7. unplug the drive
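The steps above can be sketched as one script. The host number, device node, mount point and rsync source are all assumptions to adjust for your setup; the DRY_RUN switch is a hypothetical addition so the sequence can be previewed without root:

```shell
#!/bin/sh
# Warm-plug backup cycle (steps 2-6 above).  Set DRY_RUN=1 to print the
# commands instead of running them; real runs need root.
SCSI_HOST=host0
DISK=/dev/sdc
MNT=/mnt/backup

run() { if [ -n "$DRY_RUN" ]; then echo "$@"; else "$@"; fi; }

backup_cycle() {
    run sh -c "echo - - - > /sys/class/scsi_host/$SCSI_HOST/scan"  # step 2: rescan bus
    run mount "${DISK}1" "$MNT"                                    # step 3: mount
    run rsync -a /home/ "$MNT/home/"                               # step 4: back up
    run umount "$MNT"                                              # step 5: unmount
    run hdparm -y "$DISK"                                          # step 6: spin down
    run sh -c "echo 1 > /sys/block/${DISK#/dev/}/device/delete"    # step 6: detach from kernel
}

# DRY_RUN=1 backup_cycle
```

Steps 1 and 7 (plugging and unplugging the tray) stay manual, of course.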

You would have to perform the commands interactively after plugging in the
drive and before unplugging it, but you certainly wouldn't need to bring
down the system to do it. And with 2.6.18 or higher (whenever your
controller's driver gets support) this process could be automated with
udev/hal/dbus/whatever.
David Abrahams
2007-04-28 21:38:08 UTC
[Picking up this thread from long ago; sorry for the big gap!]
Post by Ian Stakenvicius
Hey everyone -- I just stumbled on this thread and thought it might be
helpful to add some of my own experience.
1. If I boot the server with no drive in the bay or with the drive
power off, it is never detected, even after powering it on. Is
there a way to get that to work?
Yes there is. The issue here is that the scsi bus does not automatically
re-scan when something is plugged in. You can trigger a rescan by
manipulating /sys/... or using the 'rescan_scsi_bus.sh' script (search the
web for it) or the 'scsiadd' tool (http://llg.cubic.org/tools/). It doesn't
happen automatically because libata and the sata drivers lacked kernel
hotplug support prior to 2.6.18.
http://linux-ata.org/software-status.html#hotplug seems to imply that
if I upgrade my server to Feisty (which uses a 2.6.20 kernel), it will
work.
Post by Ian Stakenvicius
2. I'm not 100% sure that unmounting the drive, powering it off,
removing it, and putting a new disk in its place is legit. Can
anyone confirm? My motherboard *does* claim to support SATA
hotswap, but I'm not sure if Linux supports it.
Linux probably doesn't, save in the most recent kernels, and possibly
only with appropriate patches.
I mostly concur on this. Kernel support for yank-the-drive hot-plug in
libata is very new and has only just been included in 2.6.18 (with the
appropriate updates to the Promise TX/TX2 SATA driver), but that's it so
far. Now that this is done, though, most of the sata controllers that do
support this (i.e., those not based on the ICH series) will probably follow
suit.
According to http://linux-ata.org/software-status.html#hotplug, as
long as I'm just hotplugging devices, I guess I should be OK?
Post by Ian Stakenvicius
Warm-plug support, however, has been around for quite some time, and
any 2.6 kernel that supports your sata controller should allow you
to do this (afaik)... more on this later.
Indeed.
Post by Ian Stakenvicius
In my case I'm wondering what could possibly go wrong? If the drive
is completely unmounted before it is powered down and removed, it
seems as though the OS has no reason to be concerned with how/when I
plug it in. Any ideas?
[ Snip doomsday scenarios and such ]
While experimenting with my system (two-drive RAID1 w/trays, Promise SATAII
controller, kernel 2.6.15), yanking one of the drives while it was mounted and
active in the raid had no ill effects with respect to the hardware or the linux
subsystem. However, linux did not detect the removal of the drive and
bogged right down with IO errors as it constantly tried to keep updating
the missing hdd.
Yeah, but I was talking about unmounting it first.
Post by Ian Stakenvicius
It did this forever (I would have assumed it would stop trying
after a while and throw the device out of the raid, but it did not).
Since there was nothing in the driver or libata to catch the drive-removed
interrupt (which did fire, I noticed, as there was some sort of error about
an interrupt not being caught), this is not surprising, and hopefully 2.6.18
will fix that. That would handle removing the device from the
kernel, but I'm still not sure what will happen to the raid subsystem if a
device suddenly disappears.
Huh. I wasn't thinking about hotplugging my RAIDs, though it seems like
a natural thing to want to do in an emergency if you absolutely can't
turn your machine off. My machine doesn't have to be up quite so
continuously, though. I just wanted this for backups, and I don't
think swapping RAID drives in and out is a very good approach to
backup anyway.
Post by Ian Stakenvicius
Also, if you don't stop the drive spinning before you pull it then you
have cut power to a disk in rotation.
Huh. I guess unmounting the drive isn't enough to stop the spindle.
No, though you can send the appropriate command sequences with sdparm or
hdparm to put the drive to sleep. That should stop it well enough to
help here.
You can also do this by manipulating /sys/... and probably with the 'scsiadd'
tool mentioned above.
I'm not sure what the issue is with stopping a drive in rotation, however.
If the drive isn't physically being moved while it's spinning down, is that
really a problem?
IIUC there are all kinds of issues about how the drive deals with losing
power. Leaving the drive to do emergency shutdown instead of telling
it to park the heads is supposed to reduce the lifetime or something
like that.
Post by Ian Stakenvicius
Lastly, if there's any standard way to automate backup jobs (mounting
disks, rsync or whatever, unmounting, etc.) I'd appreciate a
reference. I can always use cron scripts but I imagine someone has
probably come up with something better.
udev can fire off arbitrary code on insertion of a device. You can use
that to trigger a script that will, basically, do all the work for you.
Are there "standard" scripts for this purpose, or will I be whipping
one up at home?
I haven't looked into this fully yet as to its applicability to a udev
system, but there is the 'scsirastools' package and its scripts
(http://scsirastools.sourceforge.net/), so that might give you something to
start from.
With my raid-1 system, for general ease (so my users do not need to interact
with any software), i do an automated cold-swap: I use ACPI w/the power
button to cleanly power down the server, swap the drive, then on bootup i
use the following script to add the degraded hdd into the array (md0 is made
up of sda1 and sdb1):
#!/bin/sh
# Re-add whichever half of the mirror is missing from md0.
if ! grep md0 /proc/mdstat | grep -q sda; then
    mdadm /dev/md0 -a /dev/sda1
fi
if ! grep md0 /proc/mdstat | grep -q sdb; then
    mdadm /dev/md0 -a /dev/sdb1
fi
Once I upgrade to 2.6.18 I'm going to look into the scsirastools scripts in
depth, as hopefully they will at least provide a basis for automating the
entire process. Unfortunately, I think multiple layers of custom scripts
may still be required.
How'd it work out for you?
Post by Ian Stakenvicius
Sounds like I've one out of three at the moment. Probably falling
back to USB is my best bet in the near term. I need to get the system
going and a backup system in place -- that's far more important than
having the backups be super fast.
I'm trying to get true yank-the-drive hotplug working with my system, but
i'm using this on a RAID1. David, since your system seems to be a simple
plug-in/bring-drive-up/rsync/bring-drive-down/un-plug, you might want to
look into using a warm-plug method:
1. plug drive in
2. manipulate /sys/ or use 'scsiadd' to rescan the scsi bus (udev should
pick up the new drive)
3. mount
4. do your backing up
5. unmount
6. manipulate /sys/ or use 'scsiadd' to power down the drive and rescan the
scsi bus (udev should remove the drive's devnode)
7. unplug the drive
You would have to perform the commands interactively after plugging in the
drive and before unplugging it, but you certainly wouldn't need to bring
down the system to do it. And with 2.6.18 or higher (whenever your
controller's driver gets support) this process could be automated with
udev/hal/dbus/whatever.
Unfortunately the worlds of "manipulate /sys/" and
"udev/hal/dbus/whatever" are still somewhat opaque to me.
--
Dave Abrahams
Boost Consulting
http://www.boost-consulting.com

Don't Miss BoostCon 2007! ==> http://www.boostcon.com