[gpfsug-discuss] Odd behavior - GPFS failed to start after initial node add

Oesterlin, Robert Robert.Oesterlin at nuance.com
Mon Jun 5 16:54:09 BST 2017


Our node build process re-adds a node to the cluster and then runs “service gpfs start”, but GPFS doesn’t start. From the build log:

+ ssh -o StrictHostKeyChecking=no nrg1-gpfs01.nrg1.us.grid.nuance.com '/usr/local/sbin/addnode.sh cnq-r02r09u27.nrg1.us.grid.nuance.com'
+ rc=0
+ chkconfig gpfs on
+ service gpfs start

The “service gpfs start” command hangs and never seems to return.

If I look at the process tree:

[root@cnq-r02r09u27 ~]# ps ax | egrep "mm|gpfs"
11715 ?        S      0:00 /bin/bash ./nrgX_gpfs_post
12191 ?        Ssl    0:00 /usr/lpp/mmfs/bin/mmsdrserv 1191 10 10 /var/adm/ras/mmsdrserv.log 128 yes no
12208 ?        S      0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15
12271 ?        S      0:00 /bin/sh /sbin/service gpfs start
12276 ?        S      0:00 /bin/sh /etc/init.d/gpfs start
12278 ?        S      0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmautoload reboot
12292 ?        S      0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmautoload reboot
12293 ?        S      0:00 /bin/grep -lw /var/mmfs/gen/nodeFiles/*.num
12294 ?        S      0:00 /bin/sed -e s%/var/mmfs/gen/nodeFiles/....%% -e s/\.num$//
21639 ?        S      0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15

This is GPFS 4.2.2-1.

This seems to occur only on the initial startup after the build. If I try to start GPFS again, it works just fine. Any ideas on what it’s sitting here waiting for? There is nothing in mmfslog (the file does not exist).
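
Since a second start attempt works, one stopgap for the build script might be to put a time limit on the start and retry it. The following is only a rough sketch, not something taken from our build: the helper name run_with_retry and the 120-second / 2-attempt values are made up for illustration, and it assumes bash and timeout(1) from GNU coreutils are available on the node.

```shell
#!/bin/bash
# Hypothetical helper (not part of GPFS or of the existing build script):
# run a command with a time limit and retry it if it hangs or fails.
# Usage: run_with_retry <timeout_seconds> <max_attempts> <command> [args...]
run_with_retry() {
    local limit="$1" attempts="$2"
    shift 2
    local i
    for ((i = 1; i <= attempts; i++)); do
        # timeout(1) ends the command once the limit is reached and then
        # returns a non-zero exit status, so a hang is treated as a failure.
        if timeout "$limit" "$@"; then
            return 0
        fi
        echo "attempt $i of $attempts failed or timed out: $*" >&2
    done
    return 1
}

# In the build script the plain start could then become something like
# (the values are guesses, not tested against GPFS):
#   run_with_retry 120 2 service gpfs start
```

This only papers over the hang. It does not explain why the first start blocks, and leftover mmautoload processes from the timed-out attempt may still need to be checked for.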

Bob Oesterlin
Sr Principal Storage Engineer, Nuance
507-269-0413

