[gpfsug-discuss] Odd behavior - GPFS failed to start after initial node add
Edward Wahl
ewahl at osc.edu
Mon Jun 5 20:54:31 BST 2017
Just a thought, as we noticed the EXACT opposite of this, and I think it is
new behavior in either mmmount or mmfsfuncs. Does the file system exist in
your /etc/fstab (or the AIX equivalent) yet?
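
One rough way to check on a Linux node is sketched below. It assumes GPFS
entries carry "gpfs" in the filesystem-type (third) column of /etc/fstab; the
helper name has_gpfs_fstab_entry is made up for illustration and is not part
of GPFS.

```shell
# Hypothetical helper (not a GPFS command): print fstab entries whose
# filesystem type (3rd field) is "gpfs"; return non-zero if there are none.
# Assumption: GPFS file systems are listed with type "gpfs" in fstab.
has_gpfs_fstab_entry() {
    fstab="${1:-/etc/fstab}"
    # Skip comment lines, match on the type column, and report via exit status.
    awk '$1 !~ /^#/ && $3 == "gpfs" { found = 1; print }
         END { exit !found }' "$fstab"
}

# Example use in a build script:
#   has_gpfs_fstab_entry || echo "no GPFS entry in /etc/fstab yet"
```

Running it just before "service gpfs start" in the build script would show
whether the entry is already there on the first start.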
Ed
On Mon, 5 Jun 2017 15:54:09 +0000
"Oesterlin, Robert" <Robert.Oesterlin at nuance.com> wrote:
> Our node build process re-adds a node to the cluster and then does a “service
> gpfs start”, but GPFS doesn’t start. From the build log:
>
> + ssh -o StrictHostKeyChecking=no nrg1-gpfs01.nrg1.us.grid.nuance.com
> '/usr/local/sbin/addnode.sh cnq-r02r09u27.nrg1.us.grid.nuance.com'
> + rc=0
> + chkconfig gpfs on
> + service gpfs start
>
> The “service gpfs start” command hangs and never seems to return.
>
> If I look at the process tree:
>
> [root@cnq-r02r09u27 ~]# ps ax | egrep "mm|gpfs"
> 11715 ?  S    0:00 /bin/bash ./nrgX_gpfs_post
> 12191 ?  Ssl  0:00 /usr/lpp/mmfs/bin/mmsdrserv 1191 10 10 /var/adm/ras/mmsdrserv.log 128 yes no
> 12208 ?  S    0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15
> 12271 ?  S    0:00 /bin/sh /sbin/service gpfs start
> 12276 ?  S    0:00 /bin/sh /etc/init.d/gpfs start
> 12278 ?  S    0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmautoload reboot
> 12292 ?  S    0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmautoload reboot
> 12293 ?  S    0:00 /bin/grep -lw /var/mmfs/gen/nodeFiles/*.num
> 12294 ?  S    0:00 /bin/sed -e s%/var/mmfs/gen/nodeFiles/....%% -e s/\.num$//
> 21639 ?  S    0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15
>
> This is GPFS 4.2.2-1
>
> This seems to occur only on the initial startup after the build. If I try to
> start GPFS again, it works just fine. Any ideas on what it’s sitting here
> waiting for? There is nothing in mmfslog (the file does not exist).
>
> Bob Oesterlin
> Sr Principal Storage Engineer, Nuance
> 507-269-0413
>
>
--
Ed Wahl
Ohio Supercomputer Center
614-292-9302