[gpfsug-discuss] gpfsgui in a core dump/restart loop
Losen, Stephen C (scl)
scl at virginia.edu
Tue Nov 30 12:47:46 GMT 2021
Hi folks,
Our gpfsgui service keeps crashing and restarting. About every three minutes, files like these appear in /var/crash/scalemgmt:
-rw------- 1 scalemgmt scalemgmt 1067843584 Nov 30 06:54 core.20211130.065414.59174.0001.dmp
-rw-r--r-- 1 scalemgmt scalemgmt 2636747 Nov 30 06:54 javacore.20211130.065414.59174.0002.txt
-rw-r--r-- 1 scalemgmt scalemgmt 1903304 Nov 30 06:54 Snap.20211130.065414.59174.0003.trc
-rw-r--r-- 1 scalemgmt scalemgmt 202 Nov 30 06:54 jitdump.20211130.065414.59174.0004.dmp
The core.*.dmp files are cores from the java command.
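To check how regular the restarts are, here is a minimal Python sketch that groups the dump files into crash events and prints the gap between crashes. It assumes the file names follow the pattern in the listing above (<kind>.<YYYYMMDD>.<HHMMSS>.<pid>.<seq>.<ext>) and that all files from one crash share the same timestamp and pid. The helper names are hypothetical.

```python
import os
import re
import sys
from datetime import datetime

# Assumption: dump names follow the pattern seen in the listing,
# <kind>.<YYYYMMDD>.<HHMMSS>.<pid>.<seq>.<ext>
DUMP_RE = re.compile(
    r"^(core|javacore|Snap|jitdump)\.(\d{8})\.(\d{6})\.(\d+)\.(\d{4})\.(dmp|txt|trc)$"
)


def crash_events(filenames):
    """Collapse dump files into sorted (timestamp, pid) crash events."""
    events = set()
    for name in filenames:
        m = DUMP_RE.match(name)
        if not m:
            continue  # skip anything that does not match the dump naming pattern
        _kind, day, clock, pid, _seq, _ext = m.groups()
        ts = datetime.strptime(day + clock, "%Y%m%d%H%M%S")
        # Files written for one crash share timestamp and pid, so the set dedupes them.
        events.add((ts, int(pid)))
    return sorted(events)


def gaps_seconds(events):
    """Seconds between consecutive crash events."""
    return [(b[0] - a[0]).total_seconds() for a, b in zip(events, events[1:])]


def main(directory):
    if not os.path.isdir(directory):
        print(f"{directory}: not a directory")
        return
    events = crash_events(os.listdir(directory))
    for ts, pid in events:
        print(f"{ts.isoformat()}  pid={pid}")
    gaps = gaps_seconds(events)
    if gaps:
        print(f"mean gap: {sum(gaps) / len(gaps):.0f} s over {len(gaps)} restarts")


if __name__ == "__main__":
    main(sys.argv[1] if len(sys.argv) > 1 else "/var/crash/scalemgmt")
```

If one crash's files were written across a one-second boundary, they would show up as two events, so treat the output as a rough measure.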
The errors below also keep repeating in /var/adm/ras/mmsysmonitor.log.
Any suggestions? Thanks for any help.
2021-11-30_07:25:09.944-0500: [W] ET_gui Event=gui_down identifier= arg0=started arg1=stopped
2021-11-30_07:25:09.961-0500: [I] ET_gui state_change for service: gui to FAILED at 2021.11.30 07.25.09.961572
2021-11-30_07:25:09.963-0500: [I] ClientThread-4 received command: 'thresholds refresh collectors 4021694'
2021-11-30_07:25:09.964-0500: [I] ClientThread-4 reload collectors
2021-11-30_07:25:09.964-0500: [I] ClientThread-4 read_collectors
2021-11-30_07:25:10.059-0500: [W] ClientThread-4 QueryHandler: query response has no data results
2021-11-30_07:25:10.059-0500: [W] ClientThread-4 QueryProcessor::execute: Error sending query in execute, quitting
2021-11-30_07:25:10.060-0500: [W] ClientThread-4 QueryHandler: query response has no data results
2021-11-30_07:25:10.060-0500: [W] ClientThread-4 QueryProcessor::execute: Error sending query in execute, quitting
2021-11-30_07:25:10.061-0500: [I] ClientThread-4 _activate_rules_scheduler completed
2021-11-30_07:25:10.147-0500: [I] ET_gui Event=component_state_change identifier= arg0=GUI arg1=FAILED
2021-11-30_07:25:10.148-0500: [I] ET_gui StateChange: change_to=FAILED nodestate=DEGRADED CESState=UNKNOWN
2021-11-30_07:25:10.148-0500: [I] ET_gui Service gui state changed. isInRunningState=True, wasInRunningState=True. New state=4
2021-11-30_07:25:10.148-0500: [I] ET_gui Monitor: LocalState:FAILED Events:607 Entities:0 RT: 0.83
2021-11-30_07:25:11.975-0500: [W] ET_perfmon got rc (153) while executing ['/usr/lpp/mmfs/bin/mmccr', 'fput', 'collectors', '/var/mmfs/tmp/tmpq4ac8o', '-c 4021693']
2021-11-30_07:25:11.975-0500: [E] ET_perfmon fput failed: Version mismatch on conditional put (err 805)
- CCRProxy._run_ccr_command:256
2021-09-29_20:03:53.322-0500: [I] MainThread ---------------------------------
2021-11-30_07:25:04.553-0500: [D] ET_perfmon File collectors has no newer version than 4021693 - CCRProxy.getFile:119
2021-11-30_07:25:11.975-0500: [W] ET_perfmon Conditional put for file collectors with version 4021693 failed
2021-11-30_07:25:11.975-0500: [W] ET_perfmon New version received, start new collectors update cycle
2021-11-30_07:25:11.976-0500: [I] ET_perfmon read_collectors
2021-11-30_07:25:12.077-0500: [I] ET_perfmon write_collectors
2021-11-30_07:25:13.333-0500: [I] ClientThread-20 received command: 'thresholds refresh collectors 4021695'
2021-11-30_07:25:13.334-0500: [I] ClientThread-20 reload collectors
2021-11-30_07:25:13.335-0500: [I] ClientThread-20 read_collectors
2021-11-30_07:25:13.453-0500: [W] ClientThread-20 QueryHandler: query response has no data results
2021-11-30_07:25:13.454-0500: [W] ClientThread-20 QueryProcessor::execute: Error sending query in execute, quitting
2021-11-30_07:25:13.463-0500: [W] ClientThread-20 QueryHandler: query response has no data results
2021-11-30_07:25:13.463-0500: [W] ClientThread-20 QueryProcessor::execute: Error sending query in execute, quitting
2021-11-30_07:25:13.464-0500: [I] ClientThread-20 _activate_rules_scheduler completed
2021-11-30_07:25:15.528-0500: [W] ET_perfmon got rc (153) while executing ['/usr/lpp/mmfs/bin/mmccr', 'fput', 'collectors', '/var/mmfs/tmp/tmpKTN69I', '-c 4021694']
2021-11-30_07:25:15.528-0500: [E] ET_perfmon fput failed: Version mismatch on conditional put (err 805)
- CCRProxy._run_ccr_command:256
2021-09-29_20:03:53.322-0500: [I] MainThread ---------------------------------
2021-11-30_07:25:12.076-0500: [D] ET_perfmon File collectors has no newer version than 4021694 - CCRProxy.getFile:119
2021-11-30_07:25:15.529-0500: [W] ET_perfmon Conditional put for file collectors with version 4021694 failed
2021-11-30_07:25:15.529-0500: [W] ET_perfmon New version received, start new collectors update cycle
2021-11-30_07:25:15.529-0500: [I] ET_perfmon read_collectors
2021-11-30_07:25:15.626-0500: [I] ET_perfmon write_collectors
2021-11-30_07:25:16.594-0500: [I] ClientThread-3 received command: 'thresholds refresh collectors 4021696'
2021-11-30_07:25:16.595-0500: [I] ClientThread-3 reload collectors
2021-11-30_07:25:16.595-0500: [I] ClientThread-3 read_collectors
2021-11-30_07:25:19.780-0500: [W] ET_perfmon got rc (153) while executing ['/usr/lpp/mmfs/bin/mmccr', 'fput', 'collectors', '/var/mmfs/tmp/tmp3joeUB', '-c 4021695']
2021-11-30_07:25:19.780-0500: [E] ET_perfmon fput failed: Version mismatch on conditional put (err 805)
- CCRProxy._run_ccr_command:256
2021-09-29_20:03:53.322-0500: [I] MainThread ---------------------------------
2021-11-30_07:25:15.625-0500: [D] ET_perfmon File collectors has no newer version than 4021695 - CCRProxy.getFile:119
2021-11-30_07:25:16.781-0500: [D] ClientThread-3 File zmrules.json has no newer version than 1 - CCRProxy.getFile:119
2021-11-30_07:25:19.780-0500: [W] ET_perfmon Conditional put for file collectors with version 4021695 failed
2021-11-30_07:25:19.781-0500: [W] ET_perfmon New version received, start new collectors update cycle
2021-11-30_07:25:19.781-0500: [I] ET_perfmon read_collectors
2021-11-30_07:25:19.881-0500: [I] ET_perfmon write_collectors
2021-11-30_07:25:21.238-0500: [I] ClientThread-7 received command: 'thresholds refresh collectors 4021697'
2021-11-30_07:25:21.239-0500: [I] ClientThread-7 reload collectors
2021-11-30_07:25:21.239-0500: [I] ClientThread-7 read_collectors
2021-11-30_07:25:21.324-0500: [W] NMES monitor event arrived while still busy for perfmon
2021-11-30_07:25:21.481-0500: [I] ET_threshold Event=thresh_monitor_del_active identifier=active_thresh_monitor arg0=active_thresh_monitor
2021-11-30_07:25:21.482-0500: [I] ET_threshold Monitor: LocalState:HEALTHY Events:1 Entities:1 RT: 0.16
2021-11-30_07:25:24.211-0500: [W] ET_perfmon got rc (153) while executing ['/usr/lpp/mmfs/bin/mmccr', 'fput', 'collectors', '/var/mmfs/tmp/tmp8HAusb', '-c 4021696']
2021-11-30_07:25:24.211-0500: [E] ET_perfmon fput failed: Version mismatch on conditional put (err 805)
- CCRProxy._run_ccr_command:256
2021-09-29_20:03:53.322-0500: [I] MainThread ---------------------------------
2021-11-30_07:25:19.881-0500: [D] ET_perfmon File collectors has no newer version than 4021696 - CCRProxy.getFile:119
2021-11-30_07:25:21.411-0500: [D] ClientThread-7 File zmrules.json has no newer version than 1 - CCRProxy.getFile:119
2021-11-30_07:25:24.211-0500: [W] ET_perfmon Conditional put for file collectors with version 4021696 failed
2021-11-30_07:25:24.212-0500: [W] ET_perfmon New version received, start new collectors update cycle
2021-11-30_07:25:24.212-0500: [I] ET_perfmon read_collectors
2021-11-30_07:25:24.314-0500: [I] ET_perfmon write_collectors
2021-11-30_07:25:24.543-0500: [I] ET_gui ServiceMonitor => out=Type=notify
And then gpfsgui apparently crashes and systemd automatically restarts it.
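To see how often the collectors update loop fails, here is a minimal Python sketch that pulls the "Conditional put for file collectors with version N failed" lines out of mmsysmonitor.log. It reports the expected versions and the time between failures. The regex is written against the lines quoted above, so it assumes that exact wording, which may differ in other Scale releases. In the excerpt above, the expected version goes up by one on each attempt (4021693 to 4021696), and a put fails every 3 to 5 seconds.

```python
import re
import sys
from datetime import datetime

# Written against the mmsysmonitor.log lines quoted above; the exact
# wording may differ between Scale releases (assumption).
PUT_FAIL_RE = re.compile(
    r"^(\S+): \[W\] ET_perfmon Conditional put for file collectors "
    r"with version (\d+) failed"
)
TS_FORMAT = "%Y-%m-%d_%H:%M:%S.%f%z"  # e.g. 2021-11-30_07:25:11.975-0500


def failed_puts(lines):
    """Return (timestamp, expected_version) for each failed conditional put."""
    result = []
    for line in lines:
        m = PUT_FAIL_RE.match(line)
        if m:
            ts = datetime.strptime(m.group(1), TS_FORMAT)
            result.append((ts, int(m.group(2))))
    return result


def summarize(puts):
    """Return the versions tried and the seconds between consecutive failures."""
    versions = [v for _, v in puts]
    gaps = [(b[0] - a[0]).total_seconds() for a, b in zip(puts, puts[1:])]
    return versions, gaps


def main(path):
    try:
        with open(path, errors="replace") as fh:
            versions, gaps = summarize(failed_puts(fh))
    except OSError as exc:
        print(f"cannot read {path}: {exc}")
        return
    print(f"{len(versions)} failed conditional puts on 'collectors'")
    if versions:
        print(f"versions {versions[0]} .. {versions[-1]}")
    if gaps:
        print(f"mean gap {sum(gaps) / len(gaps):.1f} s")


if __name__ == "__main__":
    main(sys.argv[1] if len(sys.argv) > 1 else "/var/adm/ras/mmsysmonitor.log")
```

Lining these timestamps up against the gui_down events may help show whether the perfmon update loop and the GUI crashes are related. The sketch does not establish that link on its own.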
Steve Losen
Research Computing
University of Virginia
scl at virginia.edu 434-924-0640