[gpfsug-discuss] Bad performance with GPFS system monitoring (mmsysmon) in GPFS 220.127.116.11
Jonathon A Anderson
jonathon.anderson at colorado.edu
Tue Feb 21 21:39:48 GMT 2017
This thread predates my joining gpfsug-discuss, but be advised that we also saw severe (1.5x-3x) performance degradation in user applications while mmsysmon was running. Specifically, we’re running a Haswell+OPA (Omni-Path) system.
The issue appears to occur only when the user application is simultaneously using all available cores *and* communicating over the network. Synthetic CPU tests with HPL did not expose the issue, nor did the OSU micro-benchmarks, which are designed to saturate the network without necessarily using all CPUs.
I’ve stopped mmsysmon by hand[^1] for now, but I haven’t yet gone as far as removing the config file to prevent it from starting in the future.
We intend to run further tests, but I wanted to share our experience so far, as this took us far longer to diagnose than I would have liked.