[gpfsug-discuss] filesystem manager crashes every time mmdelsnapshot (from either the filesystem manager or some other nsd/client) is called

Sabuj Pattanayek sabujp at gmail.com
Fri May 30 02:34:00 BST 2014


This is still happening in 3.5.0.18, and while a snapshot is being deleted
NFS read speeds slow to a crawl (GPFS access and NFS writes are unaffected).


On Thu, May 15, 2014 at 7:48 AM, Sabuj Pattanayek <sabujp at gmail.com> wrote:

> Hi all,
>
> We're running 3.5.0.17 now, and it looks like the filesystem manager
> automatically reboots (and sometimes fails to automatically reboot) after
> mmdelsnapshot is called, either from the filesystem manager itself or from
> some other NSD/node. It didn't start happening immediately after we
> updated to 17, but we never had this issue when we were at 3.5.0.11. The
> error mmdelsnapshot throws at some point is:
>
> Lost connection to file system daemon.
> mmdelsnapshot: An internode connection between GPFS nodes was disrupted.
> mmdelsnapshot: Command failed.  Examine previous error messages to
> determine cause.
>
> It also causes an mmfs generic error and/or a
> "kernel: BUG: soft lockup - CPU#15 stuck for 67s! [mmfsd:39266]".
> The former makes the system reboot itself; the latter does not, which is
> actually worse.
>
>
> It also causes some havoc with CNFS file locking, even after the
> filesystem manager has rebooted and come back up:
>
>
> May 15 07:10:12 mako-nsd1 sm-notify[19387]: Failed to bind RPC socket: Address already in use
> May 15 07:21:03 mako-nsd1 sm-notify[11052]: Invalid bind address or port for RPC socket: Name or service not known
>
>
> I saw some snapshot-related fixes in 3.5.0.18. Has anyone seen this
> behavior, or does anyone know if it's fixed in 18?
>
>
> Thanks,
>
> Sabuj
>
>
>