[gpfsug-discuss] spontaneous tracing?

IBM Spectrum Scale scale at us.ibm.com
Mon Mar 12 15:13:00 GMT 2018

/usr/lpp/mmfs/bin/mmcommon notifyOverload will not cause tracing to be
started. You can verify this by calling the underlying command directly, as
shown in the following example. Here /tmp/n contains the names of the nodes
that will receive the notification, one per line, and the IP address is that
of the file system manager from which the command is issued.

/usr/lpp/mmfs/bin/mmsdrcli notifyOverload /tmp/n 1191 3 8
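If you want to script this check, a minimal Python sketch could write the node file and build the same command line. The helper name prepare_notification and the node names are hypothetical. The trailing arguments are copied verbatim from the example above and passed through unchanged, because their meaning is not explained here.

```python
from pathlib import Path

MMSDRCLI = "/usr/lpp/mmfs/bin/mmsdrcli"

def prepare_notification(kind, node_names, node_file, extra_args=("1191", "3", "8")):
    """Write node_file (one node name per line) and return the mmsdrcli argv.

    Hypothetical helper. extra_args defaults to the values used in the
    example; they are passed through as-is.
    """
    if kind not in ("notifyOverload", "notifyDeadlock"):
        raise ValueError(f"unsupported notification kind: {kind!r}")
    # One node name per line, as the command expects for the node file.
    Path(node_file).write_text("".join(f"{name}\n" for name in node_names))
    return [MMSDRCLI, kind, str(node_file), *extra_args]
```

On the file system manager you would then run something like subprocess.run(prepare_notification("notifyOverload", ["node1"], "/tmp/n"), check=True), substituting real node names.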

The only case in which the deadlock detection code will initiate tracing is
when debugDataControl is set to "heavy" and tracing is not already running.
In that case, when a deadlock is detected, tracing is turned on for 20 seconds
and then turned off.
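The rule above can be written down as a small decision function. This is a toy model of the behaviour as described in this post, not GPFS source code. NodeState and auto_trace_seconds are hypothetical names, and the comparison against "heavy" assumes the setting value is reported exactly as written here.

```python
from dataclasses import dataclass

# Duration stated in the post for deadlock-triggered tracing.
DEADLOCK_TRACE_SECONDS = 20

@dataclass
class NodeState:
    debug_data_control: str  # current debugDataControl value, e.g. "heavy"
    tracing_active: bool     # True if tracing is already running on the node

def auto_trace_seconds(event: str, state: NodeState) -> int:
    """Seconds of automatic tracing a given event triggers (0 = none).

    Toy model of the rule described in the post, not GPFS source code.
    """
    if event == "overload":
        return 0  # notifyOverload never starts tracing
    if event == "deadlock":
        if state.debug_data_control == "heavy" and not state.tracing_active:
            return DEADLOCK_TRACE_SECONDS
        return 0  # other setting, or tracing already on: nothing is started
    raise ValueError(f"unknown event: {event!r}")
```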

This can be tested with a command such as:
/usr/lpp/mmfs/bin/mmsdrcli notifyDeadlock /tmp/n 1191 3 8

mmfs.log will then tell you what is going on, so this tracing is not silent:

2018-03-12_10:16:11.243-0400: [N] sdrServ: Received deadlock notification from
2018-03-12_10:16:11.243-0400: [N] GPFS will attempt to collect debug data on this node.
2018-03-12_10:16:11.953-0400: [I] Tracing in overwrite mode  <== tracing started
Trace started: Wait 20 seconds before cut and stop trace
2018-03-12_10:16:37.147-0400: [I] Tracing disabled  <== tracing stopped 20 seconds later
mmtrace: move /tmp/mmfs/lxtrace.trc.c69bc2xn01.cpu0 /tmp/mmfs/trcfile.2018-03-12_10.16.11.2982.deadlock.c69bc2xn01.cpu0
mmtrace: formatting /tmp/mmfs/trcfile.2018-03-12_10.16.11.2982.deadlock.c69bc2xn01 to /tmp/mmfs/trcrpt.2018-03-12_10.16.11.2982.deadlock.c69bc2xn01.gz

> What's odd is there are no log events to indicate an overload occurred.

The overload message appears in mmfs.log only when debugDataControl is set to
"heavy". Starting with 4.2.3, mmdiag --deadlock also shows overload-related
information:

# mmdiag --deadlock

=== mmdiag: deadlock ===

Effective deadlock detection threshold on c69bc2xn01 is 1800 seconds
Effective deadlock detection threshold on c69bc2xn01 is 360 seconds for short waiters

Cluster c69bc2xn01.gpfs.net is overloaded. The overload index on c69bc2xn01 is 0.01812  <==
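If you want to monitor this rather than read it by eye, the output of mmdiag --deadlock could be captured (for example with subprocess.run(["/usr/lpp/mmfs/bin/mmdiag", "--deadlock"], capture_output=True, text=True).stdout) and parsed. The sketch below is based only on the sample output above. The function name is hypothetical, and the patterns may not match the wording of other releases or of a cluster that is not overloaded.

```python
import re

def parse_mmdiag_deadlock(text):
    """Extract thresholds and overload state from `mmdiag --deadlock` output.

    Patterns are based only on the sample output in this post.
    """
    flat = " ".join(text.split())  # undo any line wrapping in captured output
    info = {"overloaded": False, "overload_index": None}
    for m in re.finditer(r"threshold on \S+ is (\d+) seconds( for short waiters)?", flat):
        key = "short_waiter_threshold_s" if m.group(2) else "threshold_s"
        info[key] = int(m.group(1))
    if re.search(r"Cluster \S+ is overloaded\.", flat):
        info["overloaded"] = True
    m = re.search(r"overload index on \S+ is (\d+(?:\.\d+)?)", flat)
    if m:
        info["overload_index"] = float(m.group(1))
    return info
```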