[gpfsug-discuss] Multipath configurations
Orlando Richards
orlando.richards at ed.ac.uk
Thu Sep 19 15:53:12 BST 2013
On 16/09/13 16:25, Orlando Richards wrote:
> Hi folks,
>
> We're building a new storage service and are planning on using
> multipathd rather than LSI's rdac to handle the multipathing.
>
> It's all working well, but I'm looking at settling on the final
> parameters for the multipath.conf. In particular, the values for:
>
> * rr_min_io (1?)
> * failback (I think "manual" or "followover"?)
> * no_path_retry (guessing here - fail?)
> * dev_loss_tmo (guessing here - 15?)
> * fast_io_fail_tmo (guessing here - 10?)
>
> Does anyone have a working multipath.conf for LSI based storage systems
> (or others, for that matter), and/or have experience and wisdom to share
> on the above settings (and any others I may have missed?). Any war
> stories about dm-multipath to share?
>
>
Hi all,
Thanks for the feedback on all this. From that, and more digging and
testing, we've settled on the following multipath.conf stanzas:
path_grouping_policy group_by_prio
prio rdac
path_checker rdac
path_selector "round-robin 0"
hardware_handler "1 rdac"
features "2 pg_init_retries 50"
# All "standard" up to here
# Prevent ping-ponging of controllers, but
# allow for automatic failback
failback followover
# Massively accelerate the failure detection time
# (default settings give ~30-90 seconds, this gives ~5s)
fast_io_fail_tmo 5
# Keep the /dev device entries in situ for 90 seconds,
# in case of rapid recovery of paths
dev_loss_tmo 90
# Don't queue traffic down a failed path
no_path_retry fail
# balance much more aggressively across the active paths
rr_min_io 1
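For context, these settings go inside a device stanza in /etc/multipath.conf. A minimal sketch of the full stanza - note that the vendor/product strings below are placeholders, not our exact values, so match them against your own array's output from "multipath -ll":

devices {
    device {
        # Placeholder identifiers - replace with your array's actual values
        vendor                "LSI"
        product               ".*"
        path_grouping_policy  group_by_prio
        prio                  rdac
        path_checker          rdac
        path_selector         "round-robin 0"
        hardware_handler      "1 rdac"
        features              "2 pg_init_retries 50"
        failback              followover
        fast_io_fail_tmo      5
        dev_loss_tmo          90
        no_path_retry         fail
        rr_min_io             1
    }
}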
The primary goal was to have rapid and reliable failover in a cluster
environment (without ping-ponging). The defaults from multipathd gave a
30-90 second pause in I/O every time a path went away - we've managed to
get it down to ~5s with the above settings.
Note that we've not tried this "in production" yet, but it has held up
fine under heavy benchmark load.
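For anyone wanting to exercise the same failure path themselves, one way (assuming direct access to the SCSI path devices; sdX below is a placeholder for one path of a multipath map) is to offline a path under I/O and watch multipathd react:

# Check current path states for all multipath maps
multipath -ll

# Simulate a path failure by offlining one SCSI path (sdX is a placeholder)
echo offline > /sys/block/sdX/device/state

# Bring the path back afterwards and confirm it rejoins the map
echo running > /sys/block/sdX/device/state
multipath -ll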
Along the way we discovered an odd GPFS "feature": if some nodes in the
cluster use RDAC (and thus have /dev/sdXX devices) and some use
multipathd (and thus use /dev/dm-XX devices), then nodes can either
fail to find attached NSD devices (in the case of an RDAC host where
the NSDs were initially created on a multipath host) or can try to talk
to them down the wrong device (for instance, talking to /dev/sdXX
rather than /dev/dm-XX). We only set up this mixed environment to
compare rdac with dm-multipath, and don't expect to put it into production
- but it's the kind of thing which could crop up in a system
migrating from RDAC to dm-multipath, or vice versa. It seems that on
creation, the NSD is tagged somewhere as either "dmm" (dm-multipath) or
"generic" (rdac), and servers using one type can't see the other.
We're testing a workaround for the "dm-multipath server accessing via
/dev/sdXX" case just now: create the following (executable, root-owned)
script as /var/mmfs/etc/nsddevices on the dm-multipath hosts:
#!/bin/ksh
#
# This script ensures that GPFS does not use the raw /dev/sd* devices,
# but the multipath /dev/dm-* devices instead.

# Emit every dm- device listed in /proc/partitions, tagged "generic"
for dev in $( cat /proc/partitions | grep dm- | awk '{print $4}' )
do
    echo $dev generic
done

# Returning 0 tells GPFS to skip its own device discovery
exit 0
except change that simple "$dev generic" echo to one which prints "$dev
dmm" or "$dev generic", depending on whether the device's NSD was created
on a dm-multipath or an rdac-attached host. The reverse would likely also
work to get the rdac host to pick up the dm-multipath-created NSDs (echo
"$dev dmm" for the /dev/sdXX devices).
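For example, a sketch of that modified script - the DMM_DEVICES list here is purely illustrative, and you'd populate it with whichever dm- devices carry NSDs created on dm-multipath hosts:

#!/bin/ksh
# Illustrative sketch only - DMM_DEVICES is a hypothetical, hand-maintained
# list of the dm- devices whose NSDs were created on dm-multipath hosts.
DMM_DEVICES="dm-2 dm-3"

for dev in $( cat /proc/partitions | grep dm- | awk '{print $4}' )
do
    case " $DMM_DEVICES " in
        *" $dev "*) echo $dev dmm ;;      # NSD created under dm-multipath
        *)          echo $dev generic ;;  # NSD created under rdac
    esac
done

# Returning 0 tells GPFS to skip its own device discovery
exit 0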
Thankfully, we have no plans to mix the environment - but for future
reference it could be important (if ever migrating existing systems from
rdac to dm-multipath, for instance).
--
Dr Orlando Richards
Information Services
IT Infrastructure Division
Unix Section
Tel: 0131 650 4994
skype: orlando.richards
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.