[gpfsug-discuss] gpfs client expels, fs hangind and waiters

Salvatore Di Nardo sdinardo at ebi.ac.uk
Fri Aug 15 14:31:29 BST 2014


Hello people,
Its quite a bit of time that i'm triing to solve a problem to our GPFS 
system, without much luck so i think its time to ask some help.

*First of a bit of introduction:**
*
Our GPFS system is made by 3xgss-26, In other words its made with 6x 
servers ( 4x10g links each) and several disk enclosures SAS attacked. 
The todal amount of spare its roughly 2PB, and the disks are SATA ( 
except few SSD dedicated to logtip ). My metadata and on dedicated 
vdisks, but both data and metadata vdiosks are in the same declustered 
arrays and recovery groups, so in the end they share the same spindles.

The clients its a LSF farm configured as another cluster ( standard 
multiclustering configuration) of  roughly 600 nodes .


*The issue:**
*
Recently we became aware that when some massive io request has been done 
we experience a lot of client expells. Heres an example of our logs:

        Fri Aug 15 12:40:24.680 2014: Expel 10.7.28.34 (gss03a) request
        from 172.16.4.138 (ebi3-138 in ebi-cluster.ebi.ac.uk).
        Expelling: 172.16.4.138 (ebi3-138 in ebi-cluster.ebi.ac.uk)
        Fri Aug 15 12:40:41.652 2014: Expel 10.7.28.66 (gss02b) request
        from 10.7.34.38 (ebi5-037 in ebi-cluster.ebi.ac.uk). Expelling:
        10.7.34.38 (ebi5-037 in ebi-cluster.ebi.ac.uk)
        Fri Aug 15 12:40:45.754 2014: Expel 10.7.28.3 (gss01b) request
        from 172.16.4.58 (ebi3-058 in ebi-cluster.ebi.ac.uk). Expelling:
        172.16.4.58 (ebi3-058 in ebi-cluster.ebi.ac.uk)
        Fri Aug 15 12:40:52.305 2014: Expel 10.7.28.66 (gss02b) request
        from 10.7.34.68 (ebi5-067 in ebi-cluster.ebi.ac.uk). Expelling:
        10.7.34.68 (ebi5-067 in ebi-cluster.ebi.ac.uk)
        Fri Aug 15 12:41:17.069 2014: Expel 10.7.28.35 (gss03b) request
        from 172.16.4.161 (ebi3-161 in ebi-cluster.ebi.ac.uk).
        Expelling: 172.16.4.161 (ebi3-161 in ebi-cluster.ebi.ac.uk)
        Fri Aug 15 12:41:23.555 2014: Expel 10.7.28.67 (gss02a) request
        from 172.16.4.136 (ebi3-136 in ebi-cluster.ebi.ac.uk).
        Expelling: 172.16.4.136 (ebi3-136 in ebi-cluster.ebi.ac.uk)
        Fri Aug 15 12:41:54.258 2014: Expel 10.7.28.34 (gss03a) request
        from 10.7.34.22 (ebi5-021 in ebi-cluster.ebi.ac.uk). Expelling:
        10.7.34.22 (ebi5-021 in ebi-cluster.ebi.ac.uk)
        Fri Aug 15 12:41:54.540 2014: Expel 10.7.28.66 (gss02b) request
        from 10.7.34.57 (ebi5-056 in ebi-cluster.ebi.ac.uk). Expelling:
        10.7.34.57 (ebi5-056 in ebi-cluster.ebi.ac.uk)
        Fri Aug 15 12:42:57.288 2014: Expel 10.7.35.5 (ebi5-132 in
        ebi-cluster.ebi.ac.uk) request from 10.7.28.34 (gss03a).
        Expelling: 10.7.35.5 (ebi5-132 in ebi-cluster.ebi.ac.uk)
        Fri Aug 15 12:43:24.327 2014: Expel 10.7.28.34 (gss03a) request
        from 10.7.37.99 (ebi5-226 in ebi-cluster.ebi.ac.uk). Expelling:
        10.7.37.99 (ebi5-226 in ebi-cluster.ebi.ac.uk)
        Fri Aug 15 12:44:54.202 2014: Expel 10.7.28.67 (gss02a) request
        from 172.16.4.165 (ebi3-165 in ebi-cluster.ebi.ac.uk).
        Expelling: 172.16.4.165 (ebi3-165 in ebi-cluster.ebi.ac.uk)
        Fri Aug 15 13:15:54.450 2014: Expel 10.7.28.34 (gss03a) request
        from 10.7.37.89 (ebi5-216 in ebi-cluster.ebi.ac.uk). Expelling:
        10.7.37.89 (ebi5-216 in ebi-cluster.ebi.ac.uk)
        Fri Aug 15 13:20:16.524 2014: Expel 10.7.28.3 (gss01b) request
        from 172.16.4.55 (ebi3-055 in ebi-cluster.ebi.ac.uk). Expelling:
        172.16.4.55 (ebi3-055 in ebi-cluster.ebi.ac.uk)
        Fri Aug 15 13:26:54.177 2014: Expel 10.7.28.34 (gss03a) request
        from 10.7.34.64 (ebi5-063 in ebi-cluster.ebi.ac.uk). Expelling:
        10.7.34.64 (ebi5-063 in ebi-cluster.ebi.ac.uk)
        Fri Aug 15 13:27:53.900 2014: Expel 10.7.28.3 (gss01b) request
        from 10.7.35.15 (ebi5-142 in ebi-cluster.ebi.ac.uk). Expelling:
        10.7.35.15 (ebi5-142 in ebi-cluster.ebi.ac.uk)
        Fri Aug 15 13:28:24.297 2014: Expel 10.7.28.67 (gss02a) request
        from 172.16.4.50 (ebi3-050 in ebi-cluster.ebi.ac.uk). Expelling:
        172.16.4.50 (ebi3-050 in ebi-cluster.ebi.ac.uk)
        Fri Aug 15 13:29:23.913 2014: Expel 10.7.28.3 (gss01b) request
        from 172.16.4.156 (ebi3-156 in ebi-cluster.ebi.ac.uk).
        Expelling: 172.16.4.156 (ebi3-156 in ebi-cluster.ebi.ac.uk)

at the same time we experience also long waiters queue (1000+ lines). An 
example in case of massive writes ( dd ) :

        0x7F522E1EEF90 waiting 1.861233182 seconds, NSDThread: on ThCond
        0x7F5158019B08 (0x7F5158019B08) (MsgRecordCondvar), reason 'RPC
        wait' for getData on node 10.7.37.101 <c0n362>
        0x7F522E1EC9B0 waiting 1.490567470 seconds, NSDThread: on ThCond
        0x7F50F4038BA8 (0x7F50F4038BA8) (MsgRecordCondvar), reason 'RPC
        wait' for getData on node 10.7.34.45 <c0n271>
        0x7F522E1EB6C0 waiting 1.077098046 seconds, NSDThread: on ThCond
        0x7F50B40011F8 (0x7F50B40011F8) (MsgRecordCondvar), reason 'RPC
        wait' for getData on node 172.16.4.156 <c0n647>
        0x7F522E1EA3D0 waiting 7.714968554 seconds, NSDThread: on ThCond
        0x7F50BC0078B8 (0x7F50BC0078B8) (MsgRecordCondvar), reason 'RPC
        wait' for getData on node 10.7.34.107 <c0n455>
        0x7F522E1E90E0 waiting 4.774379417 seconds, NSDThread: on ThCond
        0x7F506801B1F8 (0x7F506801B1F8) (MsgRecordCondvar), reason 'RPC
        wait' for getData on node 10.7.34.23 <c0n493>
        0x7F522E1E7DF0 waiting 0.746172444 seconds, NSDThread: on ThCond
        0x7F5094007D78 (0x7F5094007D78) (MsgRecordCondvar), reason 'RPC
        wait' for getData on node 10.7.34.84 <c0n18>
        0x7F522E1E6B00 waiting 1.553030487 seconds, NSDThread: on ThCond
        0x7F51C0004C78 (0x7F51C0004C78) (MsgRecordCondvar), reason 'RPC
        wait' for getData on node 10.7.34.63 <c0n153>
        0x7F522E1E5810 waiting 2.165307633 seconds, NSDThread: on ThCond
        0x7F5178016A08 (0x7F5178016A08) (MsgRecordCondvar), reason 'RPC
        wait' for getData on node 10.7.34.29 <c0n686>
        0x7F522E1E4520 waiting 1.128089273 seconds, NSDThread: on ThCond
        0x7F5074004D98 (0x7F5074004D98) (MsgRecordCondvar), reason 'RPC
        wait' for getData on node 10.7.34.61 <c0n300>
        0x7F522E1E3230 waiting 2.515214328 seconds, NSDThread: on ThCond
        0x7F51F400EF08 (0x7F51F400EF08) (MsgRecordCondvar), reason 'RPC
        wait' for getData on node 10.7.37.90 <c0n494>
        0x7F522E1E1F40 waiting*162.966840834* seconds, NSDThread: on
        ThCond 0x7F51840207A8 (0x7F51840207A8) (MsgRecordCondvar),
        reason 'RPC wait' for getData on node 10.7.34.97 <c0n6>
        0x7F522E1E0C50 waiting 1.140787288 seconds, NSDThread: on ThCond
        0x7F51AC005C08 (0x7F51AC005C08) (MsgRecordCondvar), reason 'RPC
        wait' for getData on node 10.7.37.94 <c0n664>
        0x7F522E1DF960 waiting 41.907415248 seconds, NSDThread: on
        ThCond 0x7F5160019038 (0x7F5160019038) (MsgRecordCondvar),
        reason 'RPC wait' for getData on node 172.16.4.143 <c0n503>
        0x7F522E1DE670 waiting 0.466560418 seconds, NSDThread: on ThCond
        0x7F513802B258 (0x7F513802B258) (MsgRecordCondvar), reason 'RPC
        wait' for getData on node 172.16.4.168 <c0n598>
        0x7F522E1DD380 waiting 3.102803621 seconds, NSDThread: on ThCond
        0x7F516C0106C8 (0x7F516C0106C8) (MsgRecordCondvar), reason 'RPC
        wait' for getData on node 10.7.37.91 <c0n143>
        0x7F522E1DC090 waiting 2.751614295 seconds, NSDThread: on ThCond
        0x7F504C0011F8 (0x7F504C0011F8) (MsgRecordCondvar), reason 'RPC
        wait' for getData on node 10.7.35.25 <c0n651>
        0x7F522E1DADA0 waiting 5.083691891 seconds, NSDThread: on ThCond
        0x7F507401BE88 (0x7F507401BE88) (MsgRecordCondvar), reason 'RPC
        wait' for getData on node 10.7.34.61 <c0n300>
        0x7F522E1D9AB0 waiting 2.263374184 seconds, NSDThread: on ThCond
        0x7F5080003B98 (0x7F5080003B98) (MsgRecordCondvar), reason 'RPC
        wait' for getData on node 10.7.35.36 <c0n225>
        0x7F522E1D87C0 waiting 0.206989639 seconds, NSDThread: on ThCond
        0x7F505801F0D8 (0x7F505801F0D8) (MsgRecordCondvar), reason 'RPC
        wait' for getData on node 10.7.34.55 <c0n498>
        0x7F522E1D74D0 waiting *41.841279897* seconds, NSDThread: on
        ThCond 0x7F5194008B88 (0x7F5194008B88) (MsgRecordCondvar),
        reason 'RPC wait' for getData on node 172.16.4.143 <c0n503>
        0x7F522E1D61E0 waiting 5.618652361 seconds, NSDThread: on ThCond
        0x1BAB868 (0x1BAB868) (MsgRecordCondvar), reason 'RPC wait' for
        getData on node 10.7.35.59 <c0n532>
        0x7F522E1D4EF0 waiting 6.185658427 seconds, NSDThread: on ThCond
        0x7F513802AAE8 (0x7F513802AAE8) (MsgRecordCondvar), reason 'RPC
        wait' for getData on node 10.7.35.6 <c0n330>
        0x7F522E1D3C00 waiting 2.652370892 seconds, NSDThread: on ThCond
        0x7F5130004C78 (0x7F5130004C78) (MsgRecordCondvar), reason 'RPC
        wait' for getData on node 10.7.34.45 <c0n271>
        0x7F522E1D2910 waiting 11.396142225 seconds, NSDThread: on
        ThCond 0x7F51A401C0C8 (0x7F51A401C0C8) (MsgRecordCondvar),
        reason 'RPC wait' for getData on node 172.16.4.169 <c0n549>
        0x7F522E1D1620 waiting 63.710723043 seconds, NSDThread: on
        ThCond 0x7F5038004D08 (0x7F5038004D08) (MsgRecordCondvar),
        reason 'RPC wait' for getData on node 10.7.37.120 <c0n8>


or for massive reads:

        0x7FBCE69A8C20 waiting 29.262629530 seconds, NSDThread: on
        ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar),
        reason 'wait for permission to append to log'
        0x7FBCE699CEC0 waiting 29.260869141 seconds, NSDThread: on
        ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar),
        reason 'wait for permission to append to log'
        0x7FBCE698C5A0 waiting 29.124824888 seconds, NSDThread: on
        ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar),
        reason 'wait for permission to append to log'
        0x7FBCE6984110 waiting 22.729479654 seconds, NSDThread: on
        ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar),
        reason 'wait for permission to append to log'
        0x7FBCE69512C0 waiting 29.272805926 seconds, NSDThread: on
        ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar),
        reason 'wait for permission to append to log'
        0x7FBCE69409A0 waiting 28.833650198 seconds, NSDThread: on
        ThCond 0x18033B74D48 (0xFFFFC90033B74D48) (LeaseWaitCondvar),
        reason 'Waiting to acquire disklease'
        0x7FBCE6924320 waiting 29.237067128 seconds, NSDThread: on
        ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar),
        reason 'wait for permission to append to log'
        0x7FBCE6921D40 waiting 29.237953228 seconds, NSDThread: on
        ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar),
        reason 'wait for permission to append to log'
        0x7FBCE6915FE0 waiting 29.046721161 seconds, NSDThread: on
        ThCond 0x18033B74D48 (0xFFFFC90033B74D48) (LeaseWaitCondvar),
        reason 'Waiting to acquire disklease'
        0x7FBCE6913A00 waiting 29.264534710 seconds, NSDThread: on
        ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar),
        reason 'wait for permission to append to log'
        0x7FBCE6900B00 waiting 29.267691105 seconds, NSDThread: on
        ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar),
        reason 'wait for permission to append to log'
        0x7FBCE68F7380 waiting 29.266402464 seconds, NSDThread: on
        ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar),
        reason 'wait for permission to append to log'
        0x7FBCE68D2870 waiting 29.276298231 seconds, NSDThread: on
        ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar),
        reason 'wait for permission to append to log'
        0x7FBCE68BADB0 waiting 28.665700576 seconds, NSDThread: on
        ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar),
        reason 'wait for permission to append to log'
        0x7FBCE68B61F0 waiting 29.236878611 seconds, NSDThread: on
        ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar),
        reason 'wait for permission to append to log'
        0x7FBCE6885980 waiting *144*.530487248 seconds, NSDThread: on
        ThMutex 0x1803396A670 (0xFFFFC9003396A670) (DiskSchedulingMutex)
        0x7FBCE68833A0 waiting 29.231066610 seconds, NSDThread: on
        ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar),
        reason 'wait for permission to append to log'
        0x7FBCE68820B0 waiting 29.269954514 seconds, NSDThread: on
        ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar),
        reason 'wait for permission to append to log'
        0x7FBCE686A5F0 waiting *140*.662994256 seconds, NSDThread: on
        ThMutex 0x180339A3140 (0xFFFFC900339A3140) (DiskSchedulingMutex)
        0x7FBCE6864740 waiting 29.254180742 seconds, NSDThread: on
        ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar),
        reason 'wait for permission to append to log'
        0x7FBCE683FC30 waiting 29.271840565 seconds, NSDThread: on
        ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar),
        reason 'wait for permission to append to log'
        0x7FBCE682E020 waiting 29.200969209 seconds, NSDThread: on
        ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar),
        reason 'wait for permission to append to log'
        0x7FBCE6825B90 waiting 19.136732919 seconds, NSDThread: on
        ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar),
        reason 'wait for permission to append to log'
        0x7FBCE6805C40 waiting 29.236055550 seconds, NSDThread: on
        ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar),
        reason 'wait for permission to append to log'
        0x7FBCE67FEAA0 waiting 29.283264161 seconds, NSDThread: on
        ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar),
        reason 'wait for permission to append to log'
        0x7FBCE67FC4C0 waiting 29.268992663 seconds, NSDThread: on
        ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar),
        reason 'wait for permission to append to log'
        0x7FBCE67DFE40 waiting 29.150900786 seconds, NSDThread: on
        ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar),
        reason 'wait for permission to append to log'
        0x7FBCE67D2DF0 waiting 29.199058463 seconds, NSDThread: on
        ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar),
        reason 'wait for permission to append to log'
        0x7FBCE67D1B00 waiting 29.203199738 seconds, NSDThread: on
        ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar),
        reason 'wait for permission to append to log'
        0x7FBCE67768D0 waiting 29.208231742 seconds, NSDThread: on
        ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar),
        reason 'wait for permission to append to log'
        0x7FBCE6768590 waiting 5.228192589 seconds, NSDThread: on ThCond
        0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason
        'wait for permission to append to log'
        0x7FBCE67672A0 waiting 29.252839376 seconds, NSDThread: on
        ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar),
        reason 'wait for permission to append to log'
        0x7FBCE6757C70 waiting 28.869359044 seconds, NSDThread: on
        ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar),
        reason 'wait for permission to append to log'
        0x7FBCE6748640 waiting 29.289284179 seconds, NSDThread: on
        ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar),
        reason 'wait for permission to append to log'
        0x7FBCE6734450 waiting 29.253591817 seconds, NSDThread: on
        ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar),
        reason 'wait for permission to append to log'
        0x7FBCE6730B80 waiting 29.289987273 seconds, NSDThread: on
        ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar),
        reason 'wait for permission to append to log'
        0x7FBCE6720260 waiting 26.597589551 seconds, NSDThread: on
        ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar),
        reason 'wait for permission to append to log'
        0x7FBCE66F32C0 waiting 29.177692849 seconds, NSDThread: on
        ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar),
        reason 'wait for permission to append to log'
        0x7FBCE66E3C90 waiting 29.160268518 seconds, NSDThread: on
        ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar),
        reason 'wait for permission to append to log'
        0x7FBCE66CC1D0 waiting 5.334330188 seconds, NSDThread: on ThCond
        0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar), reason
        'wait for permission to append to log'
        0x7FBCE66B3420 waiting 34.274433161 seconds, NSDThread: on
        ThMutex 0x180339A3140 (0xFFFFC900339A3140) (DiskSchedulingMutex)
        0x7FBCE668E910 waiting 27.699999488 seconds, NSDThread: on
        ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar),
        reason 'wait for permission to append to log'
        0x7FBCE6689D50 waiting 34.279090465 seconds, NSDThread: on
        ThMutex 0x180339A3140 (0xFFFFC900339A3140) (DiskSchedulingMutex)
        0x7FBCE66805D0 waiting 24.688626241 seconds, NSDThread: on
        ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar),
        reason 'wait for permission to append to log'
        0x7FBCE6675B60 waiting 35.367745840 seconds, NSDThread: on
        ThCond 0x18033B74D48 (0xFFFFC90033B74D48) (LeaseWaitCondvar),
        reason 'Waiting to acquire disklease'
        0x7FBCE665E0A0 waiting 29.235994598 seconds, NSDThread: on
        ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar),
        reason 'wait for permission to append to log'
        0x7FBCE663CE60 waiting 29.162911979 seconds, NSDThread: on
        ThCond 0x7FBBF0045D40 (0x7FBBF0045D40) (VdiskLogAppendCondvar),
        reason 'wait for permission to append to log'


Another example with mmfsadm in case of massive reads:

        [root at gss02b ~]# mmfsadm dump waiters
        0x7F519000AEA0 waiting 28.915010347 seconds, replyCleanupThread:
        on ThCond 0x7F51101B27B8 (0x7F51101B27B8) (MsgRecordCondvar),
        reason 'RPC wait'
        0x7F511C012A10 waiting 279.522206863 seconds, Msg handler
        commMsgCheckMessages: on ThCond 0x7F52000095F8 (0x7F52000095F8)
        (InuseCondvar), reason 'waiting for exclusive use of connection
        for sending msg'
        0x7F5120000B80 waiting 279.524782437 seconds, Msg handler
        commMsgCheckMessages: on ThCond 0x7F5214000EE8 (0x7F5214000EE8)
        (InuseCondvar), reason 'waiting for exclusive use of connection
        for sending msg'
        0x7F5154006310 waiting 138.164386224 seconds, Msg handler
        commMsgCheckMessages: on ThCond 0x7F5174003F08 (0x7F5174003F08)
        (InuseCondvar), reason 'waiting for exclusive use of connection
        for sending msg'
        0x7F522E1EB6C0 waiting 23.060703000 seconds, NSDThread: for poll
        on sock 85
        0x7F522E1E6B00 waiting 0.068456104 seconds, NSDThread: on ThCond
        0x7F50CC00E478 (0x7F50CC00E478) (InuseCondvar), reason 'waiting
        for exclusive use of connection for sending msg'
        0x7F522E1D0330 waiting 17.207907857 seconds, NSDThread: on
        ThCond 0x7F5078001688 (0x7F5078001688) (InuseCondvar), reason
        'waiting for exclusive use of connection for sending msg'
        0x7F522E1BFA10 waiting 0.181011711 seconds, NSDThread: on ThCond
        0x7F504000E558 (0x7F504000E558) (InuseCondvar), reason 'waiting
        for exclusive use of connection for sending msg'
        0x7F522E1B4FA0 waiting 0.021780338 seconds, NSDThread: on ThCond
        0x7F522000E488 (0x7F522000E488) (InuseCondvar), reason 'waiting
        for exclusive use of connection for sending msg'
        0x7F522E1B3CB0 waiting 0.794718000 seconds, NSDThread: for poll
        on sock 799
        0x7F522E186D10 waiting 0.191606803 seconds, NSDThread: on ThCond
        0x7F5184015D58 (0x7F5184015D58) (InuseCondvar), reason 'waiting
        for exclusive use of connection for sending msg'
        0x7F522E184730 waiting 0.025562000 seconds, NSDThread: for poll
        on sock 867
        0x7F522E12CDD0 waiting 0.008921000 seconds, NSDThread: for poll
        on sock 543
        0x7F522E126F20 waiting 1.459531000 seconds, NSDThread: for poll
        on sock 983
        0x7F522E10F460 waiting 17.177936972 seconds, NSDThread: on
        ThCond 0x7F51EC002CE8 (0x7F51EC002CE8) (InuseCondvar), reason
        'waiting for exclusive use of connection for sending msg'
        0x7F522E101120 waiting 17.232580316 seconds, NSDThread: on
        ThCond 0x7F51BC005BB8 (0x7F51BC005BB8) (InuseCondvar), reason
        'waiting for exclusive use of connection for sending msg'
        0x7F522E0F1AF0 waiting 438.556030000 seconds, NSDThread: for
        poll on sock 496
        0x7F522E0E7080 waiting 393.702839774 seconds, NSDThread: on
        ThCond 0x7F5164013668 (0x7F5164013668) (InuseCondvar), reason
        'waiting for exclusive use of connection for sending msg'
        0x7F522E09DA60 waiting 52.746984660 seconds, NSDThread: on
        ThCond 0x7F506C008858 (0x7F506C008858) (InuseCondvar), reason
        'waiting for exclusive use of connection for sending msg'
        0x7F522E084CB0 waiting 23.096688206 seconds, NSDThread: on
        ThCond 0x7F521C008E18 (0x7F521C008E18) (InuseCondvar), reason
        'waiting for exclusive use of connection for sending msg'
        0x7F522E0839C0 waiting 0.093456000 seconds, NSDThread: for poll
        on sock 962
        0x7F522E076970 waiting 2.236659731 seconds, NSDThread: on ThCond
        0x7F51E0027538 (0x7F51E0027538) (InuseCondvar), reason 'waiting
        for exclusive use of connection for sending msg'
        0x7F522E044E10 waiting 52.752497765 seconds, NSDThread: on
        ThCond 0x7F513802BDD8 (0x7F513802BDD8) (InuseCondvar), reason
        'waiting for exclusive use of connection for sending msg'
        0x7F522E033200 waiting 16.157355796 seconds, NSDThread: on
        ThCond 0x7F5104240D58 (0x7F5104240D58) (InuseCondvar), reason
        'waiting for exclusive use of connection for sending msg'
        0x7F522E02AD70 waiting 436.025203220 seconds, NSDThread: on
        ThCond 0x7F50E0016C28 (0x7F50E0016C28) (InuseCondvar), reason
        'waiting for exclusive use of connection for sending msg'
        0x7F522E01A450 waiting 393.673252777 seconds, NSDThread: on
        ThCond 0x7F50A8009C18 (0x7F50A8009C18) (InuseCondvar), reason
        'waiting for exclusive use of connection for sending msg'
        0x7F522DFE0460 waiting 1.781358358 seconds, NSDThread: on ThCond
        0x7F51E0027638 (0x7F51E0027638) (InuseCondvar), reason 'waiting
        for exclusive use of connection for sending msg'
        0x7F522DF99420 waiting 0.038405427 seconds, NSDThread: on ThCond
        0x7F50F0172B18 (0x7F50F0172B18) (InuseCondvar), reason 'waiting
        for exclusive use of connection for sending msg'
        0x7F522DF7CDA0 waiting 438.204625355 seconds, NSDThread: on
        ThCond 0x7F50900023D8 (0x7F50900023D8) (InuseCondvar), reason
        'waiting for exclusive use of connection for sending msg'
        0x7F522DF76EF0 waiting 435.903645734 seconds, NSDThread: on
        ThCond 0x7F5084004BC8 (0x7F5084004BC8) (InuseCondvar), reason
        'waiting for exclusive use of connection for sending msg'
        0x7F522DF74910 waiting 21.749325022 seconds, NSDThread: on
        ThCond 0x7F507C011F48 (0x7F507C011F48) (InuseCondvar), reason
        'waiting for exclusive use of connection for sending msg'
        0x7F522DF71040 waiting 1.027274000 seconds, NSDThread: for poll
        on sock 866
        0x7F522DF536D0 waiting 52.953847324 seconds, NSDThread: on
        ThCond 0x7F5200006FF8 (0x7F5200006FF8) (InuseCondvar), reason
        'waiting for exclusive use of connection for sending msg'
        0x7F522DF510F0 waiting 0.039278000 seconds, NSDThread: for poll
        on sock 837
        0x7F522DF4EB10 waiting 0.085745937 seconds, NSDThread: on ThCond
        0x7F51F0006828 (0x7F51F0006828) (InuseCondvar), reason 'waiting
        for exclusive use of connection for sending msg'
        0x7F522DF4C530 waiting 21.850733000 seconds, NSDThread: for poll
        on sock 986
        0x7F522DF4B240 waiting 0.054739884 seconds, NSDThread: on ThCond
        0x7F51EC0168D8 (0x7F51EC0168D8) (InuseCondvar), reason 'waiting
        for exclusive use of connection for sending msg'
        0x7F522DF48C60 waiting 0.186409714 seconds, NSDThread: on ThCond
        0x7F51E4000908 (0x7F51E4000908) (InuseCondvar), reason 'waiting
        for exclusive use of connection for sending msg'
        0x7F522DF41AC0 waiting 438.942861290 seconds, NSDThread: on
        ThCond 0x7F51CC010168 (0x7F51CC010168) (InuseCondvar), reason
        'waiting for exclusive use of connection for sending msg'
        0x7F522DF3F4E0 waiting 0.060235106 seconds, NSDThread: on ThCond
        0x7F51C400A438 (0x7F51C400A438) (InuseCondvar), reason 'waiting
        for exclusive use of connection for sending msg'
        0x7F522DF22E60 waiting 0.361288000 seconds, NSDThread: for poll
        on sock 518
        0x7F522DF21B70 waiting 0.060722464 seconds, NSDThread: on ThCond
        0x7F51580162D8 (0x7F51580162D8) (InuseCondvar), reason 'waiting
        for exclusive use of connection for sending msg'
        0x7F522DF12540 waiting 23.077564448 seconds, NSDThread: on
        ThCond 0x7F512C13E1E8 (0x7F512C13E1E8) (InuseCondvar), reason
        'waiting for exclusive use of connection for sending msg'
        0x7F522DEFD060 waiting 0.723370000 seconds, NSDThread: for poll
        on sock 503
        0x7F522DEE09E0 waiting 1.565799175 seconds, NSDThread: on ThCond
        0x7F5084004D58 (0x7F5084004D58) (InuseCondvar), reason 'waiting
        for exclusive use of connection for sending msg'
        0x7F522DEDF6F0 waiting 22.063017342 seconds, NSDThread: on
        ThCond 0x7F5078003E08 (0x7F5078003E08) (InuseCondvar), reason
        'waiting for exclusive use of connection for sending msg'
        0x7F522DEDD110 waiting 0.049108780 seconds, NSDThread: on ThCond
        0x7F5070001D78 (0x7F5070001D78) (InuseCondvar), reason 'waiting
        for exclusive use of connection for sending msg'
        0x7F522DEDAB30 waiting 229.603224376 seconds, NSDThread: on
        ThCond 0x7F50680221B8 (0x7F50680221B8) (InuseCondvar), reason
        'waiting for exclusive use of connection for sending msg'
        0x7F522DED7260 waiting 0.071855457 seconds, NSDThread: on ThCond
        0x7F506400A5A8 (0x7F506400A5A8) (InuseCondvar), reason 'waiting
        for exclusive use of connection for sending msg'
        0x7F522DED5F70 waiting 0.648324000 seconds, NSDThread: for poll
        on sock 766
        0x7F522DEC3070 waiting 1.809205756 seconds, NSDThread: on ThCond
        0x7F522000E518 (0x7F522000E518) (InuseCondvar), reason 'waiting
        for exclusive use of connection for sending msg'
        0x7F522DEB1460 waiting 436.017396645 seconds, NSDThread: on
        ThCond 0x7F51E4000978 (0x7F51E4000978) (InuseCondvar), reason
        'waiting for exclusive use of connection for sending msg'
        0x7F522DEAC8A0 waiting 393.734102000 seconds, NSDThread: for
        poll on sock 609
        0x7F522DEA3120 waiting 17.960778837 seconds, NSDThread: on
        ThCond 0x7F51B4001708 (0x7F51B4001708) (InuseCondvar), reason
        'waiting for exclusive use of connection for sending msg'
        0x7F522DE86AA0 waiting 23.112060045 seconds, NSDThread: on
        ThCond 0x7F5154096118 (0x7F5154096118) (InuseCondvar), reason
        'waiting for exclusive use of connection for sending msg'
        0x7F522DE64570 waiting 0.076167410 seconds, NSDThread: on ThCond
        0x7F50D8005EF8 (0x7F50D8005EF8) (InuseCondvar), reason 'waiting
        for exclusive use of connection for sending msg'
        0x7F522DE1AF50 waiting 17.460836000 seconds, NSDThread: for poll
        on sock 737
        0x7F522DE104E0 waiting 0.205037000 seconds, NSDThread: for poll
        on sock 865
        0x7F522DDB8B80 waiting 0.106192000 seconds, NSDThread: for poll
        on sock 78
        0x7F522DDA36A0 waiting 0.738921180 seconds, NSDThread: on ThCond
        0x7F505400E048 (0x7F505400E048) (InuseCondvar), reason 'waiting
        for exclusive use of connection for sending msg'
        0x7F522DD9C500 waiting 0.731118367 seconds, NSDThread: on ThCond
        0x7F503C00B518 (0x7F503C00B518) (InuseCondvar), reason 'waiting
        for exclusive use of connection for sending msg'
        0x7F522DD89600 waiting 229.609363000 seconds, NSDThread: for
        poll on sock 515
        0x7F522DD567B0 waiting 1.508489195 seconds, NSDThread: on ThCond
        0x7F514C021F88 (0x7F514C021F88) (InuseCondvar), reason 'waiting
        for exclusive use of connection for sending msg'



Another thing worth to mention is that the filesystem its totaly 
unresponsive. Even a simple "cd" to a directory or an ls to a directory 
just hangs for several minutes ( litterally). This happens also if i try 
from the NSD servers.


*Few things i have looked into:*
* Our network seems fine, there might be some bottleneck on part of 
them, and this could explain the waiters, but doesnt explain why ad some 
poit those client ask to expel the NSD servers. THis also doesn't 
justify why the FS is slow even on NSD itself.

* Disk bottleneck? i dont think so. NSD servers have cpu usage  (and io 
wait ) very low. Also mmdiag --iohist seems condirming that the 
operation on the disks are reasonable fast:


        === mmdiag: iohist ===

        I/O history:

          I/O start time RW    Buf type disk:sectorNum     nSec  time
        ms  Type  Device/NSD ID         NSD server
        --------------- -- ----------- ----------------- ----- ------- 
        ---- ------------------ ---------------
        13:54:29.209276  W        data   34:5066338808    2056 88.307 
        lcl  sdtu
        13:54:29.209277  W        data   55:5095698936    2056 27.592 
        lcl  sdaab
        13:54:29.209278  W        data  171:5104087544    2056 22.801 
        lcl  sdtg
        13:54:29.209279  W        data  116:5011812856    2056 65.983 
        lcl  sdqr
        13:54:29.209280  W        data   98:4860817912    2056 17.892 
        lcl  sddl
        13:54:29.209281  W        data  159:4999229944    2056 21.324 
        lcl  sdjg
        13:54:29.209282  W        data   84:5049561592    2056 31.932 
        lcl  sdqz
        13:54:29.209283  W        data    8:5003424248    2056 30.912 
        lcl  sdcw
        13:54:29.209284  W        data   23:4965675512    2056 27.366 
        lcl  sdpt
        13:54:29.297715  W  vdiskMDLog    2:144008496        1 0.236 
        lcl  sdkr
        13:54:29.297717  W  vdiskMDLog    0:331703600        1 0.230 
        lcl  sdcm
        13:54:29.297718  W  vdiskMDLog    1:273769776        1 0.241 
        lcl  sdbp
        13:54:29.244902  W        data   51:3857589752    2056 35.566 
        lcl  sdyi
        13:54:29.244904  W        data   10:3773703672    2056 28.512 
        lcl  sdma
        13:54:29.244905  W        data   48:3639485944    2056 24.124 
        lcl  sdel
        13:54:29.244906  W        data   25:3777897976    2056 18.691 
        lcl  sdgt
        13:54:29.244908  W        data   91:3832423928    2056 20.699 
        lcl  sdlc
        13:54:29.244909  W        data  115:3723372024    2056 30.783 
        lcl  sdho
        13:54:29.244910  W        data  173:3882755576    2056 53.241 
        lcl  sdti
        13:54:29.244911  W        data   42:3782092280    2056 22.785 
        lcl  sddz
        13:54:29.244912  W        data   45:3647874552    2056 24.289 
        lcl  sdei
        13:54:29.244913  W        data   32:3652068856    2056 17.220 
        lcl  sdbn
        13:54:29.244914  W        data   39:3677234680    2056 26.017 
        lcl  sddw
        13:54:29.298273  W  vdiskMDLog    2:144008497        1 2.522 
        lcl  sduf
        13:54:29.298274  W  vdiskMDLog    0:331703601        1 1.025 
        lcl  sdlo
        13:54:29.298275  W  vdiskMDLog    1:273769777        1 2.586 
        lcl  sdtt
        13:54:29.288275  W        data   27:2249588200    2056 20.071 
        lcl  sdhb
        13:54:29.288279  W        data   33:2224422376    2056 19.682 
        lcl  sdts
        13:54:29.288281  W        data   47:2115370472    2056 21.667 
        lcl  sdwo
        13:54:29.288282  W        data   82:2316697064    2056 21.524 
        lcl  sdxy
        13:54:29.288283  W        data   85:2232810984    2056 17.467 
        lcl  sdra
        13:54:29.288285  W        data   30:2127953384    2056 18.475 
        lcl  sdqg
        13:54:29.288286  W        data   67:1876295144    2056 16.383 
        lcl  sdmx
        13:54:29.288287  W        data   64:2127953384    2056 21.908 
        lcl  sduh
        13:54:29.288288  W        data   38:2253782504    2056 19.775 
        lcl  sddv
        13:54:29.288290  W        data   15:2207645160    2056 20.599 
        lcl  sdet
        13:54:29.288291  W        data  157:2283142632    2056 21.198 
        lcl  sdiy


Bonding problem on the interfaces? Mellanox ( interface card prodicer) 
drivers and firmware updated, and we even tested the system with a 
single link ( without bonding).


Could someone help me with this? in particular:

* What exactly are client are looking to decide that another node is 
unresponsive? Ping? i dont think so because both NSD servers and clients 
can be pinged, so what they look? if comeone can also specify what port 
are they using i can try to tcpdump what exactly is cauding this expell.

* How can i monitor metadata operations to understand where EXACTLY is 
the bottleneck that causes this:

        [sdinardo at ebi5-001 ~]$ time ls /gpfs/nobackup/sdinardo

        1                   ebi3-054.ebi.ac.uk ebi3-154           
        ebi5-019.ebi.ac.uk  ebi5-052 ebi5-101           
        ebi5-156            ebi5-197 ebi5-228            ebi5-262.ebi.ac.uk
        10                  ebi3-055 ebi3-155           
        ebi5-021.ebi.ac.uk  ebi5-053 ebi5-104.ebi.ac.uk 
        ebi5-160.ebi.ac.uk  ebi5-198 ebi5-229            ebi5-263
        2                   ebi3-056.ebi.ac.uk ebi3-156           
        ebi5-022            ebi5-054.ebi.ac.uk ebi5-106           
        ebi5-161            ebi5-200 ebi5-230.ebi.ac.uk  ebi5-264
        3                   ebi3-057 ebi3-157           
        ebi5-023            ebi5-056 ebi5-109           
        ebi5-162.ebi.ac.uk  ebi5-201 ebi5-231.ebi.ac.uk  ebi5-265
        4                   ebi3-058 ebi3-158.ebi.ac.uk 
        ebi5-024.ebi.ac.uk  ebi5-057 ebi5-110.ebi.ac.uk 
        ebi5-163.ebi.ac.uk  ebi5-202.ebi.ac.uk ebi5-232           
        ebi5-266.ebi.ac.uk
        5                   ebi3-059.ebi.ac.uk ebi3-160           
        ebi5-025            ebi5-060 ebi5-111.ebi.ac.uk 
        ebi5-164            ebi5-204 ebi5-233            ebi5-267
        6                   ebi3-132 ebi3-161.ebi.ac.uk 
        ebi5-026            ebi5-061.ebi.ac.uk ebi5-112.ebi.ac.uk 
        ebi5-165            ebi5-205 ebi5-234            ebi5-269.ebi.ac.uk
        7                   ebi3-133 ebi3-163.ebi.ac.uk 
        ebi5-028            ebi5-062.ebi.ac.uk ebi5-129.ebi.ac.uk 
        ebi5-166            ebi5-206.ebi.ac.uk ebi5-236            ebi5-270
        8                   ebi3-134 ebi3-165           
        ebi5-030            ebi5-064 ebi5-131.ebi.ac.uk 
        ebi5-169.ebi.ac.uk  ebi5-207 ebi5-237            ebi5-271
        9                   ebi3-135 ebi3-166.ebi.ac.uk 
        ebi5-031            ebi5-065 ebi5-132           
        ebi5-170.ebi.ac.uk  ebi5-209 ebi5-239.ebi.ac.uk  launcher.sh

        _*real    21m14.948s*_( WTH ?!?!?!)
        user    0m0.004s
        sys    0m0.014s



I know that the question are not easy to answer, and i need to dig more, 
but could be very helpful if someone give me some hints about where to 
look at. My gpfs skills are limited since this is our first system and 
is in production for just few months, and the things stated to worsen 
just recenlty. In past we could get over 200Gb/s ( both read and write) 
without any issue. Now some clients get expelled even when data 
thoughuput is ad 4-5Gb/s.

Thanks in advance for any help.

Regards,
Salvatore
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://gpfsug.org/pipermail/gpfsug-discuss_gpfsug.org/attachments/20140815/3eaa5bc1/attachment-0002.htm>


More information about the gpfsug-discuss mailing list