[gpfsug-discuss] node lockups in gpfs > 4.1.1.14

Aaron Knister aaron.s.knister at nasa.gov
Fri Aug 4 09:00:35 BST 2017


Hey All,

Anyone seen any strange behavior running either 4.1.1.15 or 4.1.1.16?

We are mid-upgrade to 4.1.1.16 from 4.1.1.14 and have seen some rather
disconcerting behavior. Specifically, on some of the upgraded nodes GPFS
seemingly deadlocks the entire node, rendering it unusable. I can't even
get a session on the node (but I can trigger a crash dump via a sysrq
trigger).

Most blocked tasks have cxiWaitEventWait at the top of their call
trace. That's probably not very helpful in and of itself, but I'm
curious whether anyone else out there has run into this issue or whether
this is a known bug.

(I'll open a PMR later today once I've gathered more diagnostic 
information).

-Aaron

-- 
Aaron Knister
NASA Center for Climate Simulation (Code 606.2)
Goddard Space Flight Center
(301) 286-2776


