[gpfsug-discuss] mmsysmon.py revisited
David Johnson
david_johnson at brown.edu
Wed Jul 19 14:28:23 BST 2017
I have opened a PMR, and the official response reflects what you just posted.
In addition, it seems there are some performance issues with Python 2 that should
improve with the eventual migration to Python 3. I was unaware of the mmhealth
functions that the mmsysmon daemon provides. The impact we were seeing
was some variation in MPI benchmark results when the nodes were fully loaded.
I suppose it would be possible to turn off mmsysmon during the benchmarking,
but I appreciate the effort at streamlining the monitor service. Cutting back on
fork/exec, better Python, less polling, more notifications… all good.
Thanks for the details,
— ddj
> On Jul 19, 2017, at 9:05 AM, Mathias Dietz <MDIETZ at de.ibm.com> wrote:
>
> Thanks for the feedback.
>
> Let me clarify what mmsysmon is doing.
> Since IBM Spectrum Scale 4.2.1, the mmsysmon process has been used for overall health monitoring and CES failover handling.
> Even without CES it is an essential part of the system, because it monitors the individual components and provides health state information and error events.
> This information is needed by other Spectrum Scale components (the mmhealth command, the IBM Spectrum Scale GUI, support tools, the Install Toolkit, ...), so disabling mmsysmon will impact them.
>
> > It’s a huge problem. I don’t understand why it hasn’t been given
> > much credit by dev or support.
>
> Over the last couple of months, the development team has put a strong focus on this topic.
> To monitor the health of the individual components, mmsysmon listens for notifications/callbacks but also has to do some polling.
> We are continually working to reduce the polling overhead and to replace polling with notifications where possible.
>
> Several improvements have been added in 4.2.3, including the ability to configure the polling frequency to reduce the overhead (mmhealth config interval).
> See https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmhealth.htm
> In addition, a new option has been introduced to clock-align the monitoring threads in order to reduce CPU jitter.
>
> Nevertheless, we don't see significant CPU consumption by mmsysmon on our test systems.
> It might be a problem specific to your system environment or a configuration error, so please contact IBM support to analyze the root cause of the high usage.
>
> Kind regards
>
> Mathias Dietz
>
> IBM Spectrum Scale - Release Lead Architect and RAS Architect
>
>
> gpfsug-discuss-bounces at spectrumscale.org wrote on 07/18/2017 07:51:21 PM:
>
> > From: Jonathon A Anderson <jonathon.anderson at colorado.edu>
> > To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
> > Date: 07/18/2017 07:51 PM
> > Subject: Re: [gpfsug-discuss] mmsysmon.py revisited
> > Sent by: gpfsug-discuss-bounces at spectrumscale.org
> >
> > As far as I know, there’s no official way to cleanly disable it yet,
> > but you can de facto disable it by deleting
> > /var/mmfs/mmsysmon/mmsysmonitor.conf.
> >
> > It’s a huge problem. I don’t understand why it hasn’t been given
> > much credit by dev or support.
> >
> > ~jonathon
> >
> >
> > On 7/18/17, 11:21 AM, "gpfsug-discuss-bounces at spectrumscale.org on
> > behalf of David Johnson" <gpfsug-discuss-bounces at spectrumscale.org
> > on behalf of david_johnson at brown.edu> wrote:
> >
> > We also noticed a fair amount of CPU time accumulated by mmsysmon.py on
> > our diskless compute nodes. I read the earlier query, where it
> > was answered:
> >
> > CES == Cluster Export Services; mmsysmon.py comes from
> > mmcesmon. It is used for managing the export services of GPFS. If it is
> > killed, your NFS/SMB etc. will stop working.
> > Their overhead is small and they are very important. Don't
> > attempt to kill them.
> >
> > Our question is this: we don’t run the latest “protocols”; our
> > NFS is CNFS, and our CIFS is clustered CIFS.
> > I can understand it might be needed with Ganesha, but on every node?
> >
> >
> > Why in the world would I be getting this daemon running on all
> > client nodes, when I didn’t install the “protocols” version
> > of the distribution? We are running release 4.2.2 at the moment. How
> > can we disable this?
> >
> > Thanks,
> > — ddj
> >
> >
> > _______________________________________________
> > gpfsug-discuss mailing list
> > gpfsug-discuss at spectrumscale.org
> > http://gpfsug.org/mailman/listinfo/gpfsug-discuss
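Mathias mentions a new option to clock-align the monitoring threads to reduce CPU jitter. The thread does not say how mmsysmon implements this, so what follows is only a minimal Python sketch of the general idea, assuming the nodes' clocks are synchronized (for example via NTP): each node schedules its periodic checks on shared wall-clock boundaries, so the monitoring work on all nodes of an MPI job lands in the same short window instead of interrupting different ranks at different times. The functions `next_aligned_wakeup` and `run_aligned` are hypothetical illustrations, not part of mmsysmon or any Spectrum Scale API.

```python
import math
import time
from typing import Callable


def next_aligned_wakeup(now: float, interval: float) -> float:
    """Return the first time strictly after `now` that is a whole multiple of
    `interval` seconds since the Unix epoch (with interval=15, the :00, :15,
    :30 and :45 second marks of each minute)."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    return (math.floor(now / interval) + 1) * interval


def run_aligned(check: Callable[[], None], interval: float, iterations: int,
                clock: Callable[[], float] = time.time,
                sleep: Callable[[float], None] = time.sleep) -> None:
    """Call `check` `iterations` times, each on an aligned boundary.

    Every node derives the same boundaries from its synchronized clock, so
    the periodic work on all nodes coincides instead of being scattered.
    """
    target = next_aligned_wakeup(clock(), interval)
    for _ in range(iterations):
        # Wait until the boundary; never pass a negative duration to sleep().
        sleep(max(0.0, target - clock()))
        check()
        # Schedule from the later of "now" and the boundary just used, so an
        # early wakeup cannot fire twice on one boundary and a slow check
        # skips missed boundaries instead of running back to back.
        target = next_aligned_wakeup(max(clock(), target), interval)
```

For example, `run_aligned(my_check, interval=15.0, iterations=4)` started a few seconds apart on two nodes would invoke the (hypothetical) `my_check` at the same 15-second boundaries on both. The `clock` and `sleep` parameters exist only so the schedule can be exercised without real waiting.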