[gpfsug-discuss] mmsysmon.py revisited

Wed Jul 19 14:28:23 BST 2017

I have opened a PMR, and the official response reflects what you just posted.
In addition, it seems there are some performance issues with Python 2 that will be 
improved with eventual migration to Python 3.  I was unaware of the mmhealth
functions that the mmsysmon daemon provides. The impact we were seeing 
was some variation in MPI benchmark results when the nodes were fully loaded.
I suppose it would be possible to turn off mmsysmon during the benchmarking,
but I appreciate the effort at streamlining the monitor service.  Cutting back on
fork/exec, better Python, less polling, more notifications… all good.

Thanks for the details,

 — ddj

> On Jul 19, 2017, at 9:05 AM, Mathias Dietz <MDIETZ at de.ibm.com> wrote:
> 
> Thanks for the feedback.
> 
> Let me clarify what mmsysmon is doing.
> Since IBM Spectrum Scale 4.2.1, the mmsysmon process has been used for the overall health monitoring and CES failover handling.
> Even without CES it is an essential part of the system, because it monitors the individual components and provides health state information and error events.
> This information is needed by other Spectrum Scale components (the mmhealth command, the IBM Spectrum Scale GUI, support tools, the Install Toolkit, ...), so disabling mmsysmon will impact them.
> 
> > It’s a huge problem. I don’t understand why it hasn’t been given 
> > much credit by dev or support.
> 
> Over the last couple of months, the development team has put a strong focus on this topic.
> In order to monitor the health of the individual components, mmsysmon listens for notifications/callbacks but also has to do some polling.
> We are continually working to reduce the polling overhead and to replace polling with notifications where possible.
> 
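
To make the notification-plus-polling split concrete, here is a minimal Python sketch of the general pattern. It is not mmsysmon's code; the function name, the callback, and the interval are hypothetical and chosen only for illustration. The monitor blocks on an event and wakes immediately when a notification arrives; it falls back to an explicit poll only when nothing arrives within the polling interval.

```python
import threading


def monitor_once(notified: threading.Event, poll_check, poll_interval: float) -> str:
    """Run one monitoring cycle (illustrative only, not mmsysmon's logic).

    Wait up to poll_interval seconds for a notification. If one arrives,
    consume it and report "notification"; otherwise run the polling
    callback and report "poll".
    """
    if notified.wait(timeout=poll_interval):
        notified.clear()  # consume the notification
        return "notification"
    poll_check()  # nothing arrived in time, so check state explicitly
    return "poll"


if __name__ == "__main__":
    event = threading.Event()
    event.set()  # simulate a component pushing a state change
    print(monitor_once(event, lambda: None, poll_interval=0.05))
    print(monitor_once(event, lambda: None, poll_interval=0.05))
```

The more component state is delivered through the event, the longer the polling interval can be without losing responsiveness, which is the trade-off described above.
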
> Several improvements have been added to 4.2.3, including the ability to configure the polling frequency to reduce the overhead (mmhealth config interval).
> See https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmhealth.htm
> In addition, a new option has been introduced to clock-align the monitoring threads in order to reduce CPU jitter.
> 
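
For anyone wondering why clock alignment helps with jitter, a minimal Python sketch of the idea follows. This is an assumption about the general technique, not the mmsysmon implementation; the function name and the 30-second interval are made up for illustration. Each monitor sleeps until the next wall-clock multiple of the interval instead of "interval seconds from whenever it started", so monitors that started at different times all wake at the same instant.

```python
import math


def next_aligned_wakeup(now: float, interval: float) -> float:
    """Return the next time strictly after `now` that is a whole
    multiple of `interval` seconds (illustrative helper only)."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    return (math.floor(now / interval) + 1) * interval


if __name__ == "__main__":
    # Three monitors that started at unrelated times (seconds).
    starts = [101.2, 117.9, 119.999]
    # With a 30-second interval they all wake at the same boundary.
    print([next_aligned_wakeup(t, 30.0) for t in starts])
    # prints [120.0, 120.0, 120.0]
```

If nodes share a synchronized clock, the monitoring work then lands at the same moment on every node rather than being smeared across the interval, which is what matters for tightly coupled MPI jobs.
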
> Nevertheless, we don't see significant CPU consumption by mmsysmon on our test systems. 
> It might be a problem specific to your system environment or a misconfiguration, so please contact IBM support to analyze the root cause of the high usage.
> 
> Kind regards
> 
> Mathias Dietz
> 
> IBM Spectrum Scale - Release Lead Architect and RAS Architect 
> 
> 
> gpfsug-discuss-bounces at spectrumscale.org wrote on 07/18/2017 07:51:21 PM:
> 
> > From: Jonathon A Anderson <jonathon.anderson at colorado.edu>
> > To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
> > Date: 07/18/2017 07:51 PM
> > Subject: Re: [gpfsug-discuss] mmsysmon.py revisited
> > Sent by: gpfsug-discuss-bounces at spectrumscale.org
> > 
> > There’s no official way to cleanly disable it yet, so far as I know;
> > but you can de facto disable it by deleting
> > /var/mmfs/mmsysmon/mmsysmonitor.conf.
> > 
> > It’s a huge problem. I don’t understand why it hasn’t been given 
> > much credit by dev or support.
> > 
> > ~jonathon
> > 
> > 
> > On 7/18/17, 11:21 AM, "gpfsug-discuss-bounces at spectrumscale.org on 
> > behalf of David Johnson" <gpfsug-discuss-bounces at spectrumscale.org 
> > on behalf of david_johnson at brown.edu> wrote:
> > 
> >     We also noticed a fair amount of CPU time accumulated by mmsysmon.py on
> >     our diskless compute nodes. I read the earlier query, where it was
> >     answered:
> >     
> >     ces == Cluster Export Services, mmsysmon.py comes from mmcesmon.
> >     It is used for managing export services of GPFS. If it is killed,
> >     your nfs/smb etc will be out of work.
> >     Their overhead is small and they are very important. Don't attempt
> >     to kill them.
> >     
> >     Our question is this: we don’t run the latest “protocols”, our
> >     NFS is CNFS, and our CIFS is clustered CIFS.
> >     I can understand it might be needed with Ganesha, but on every node?
> >     
> >     Why in the world would I be getting this daemon running on all
> >     client nodes, when I didn’t install the “protocols” version
> >     of the distribution? We have release 4.2.2 at the moment. How
> >     can we disable this?
> >     
> >     
> >     Thanks,
> >      — ddj
> >     
> > 
> > _______________________________________________
> > gpfsug-discuss mailing list
> > gpfsug-discuss at spectrumscale.org
> > http://gpfsug.org/mailman/listinfo/gpfsug-discuss
> 
