[gpfsug-discuss] mmsysmon.py revisited
david_johnson at brown.edu
david_johnson at brown.edu
Wed Jul 19 19:12:37 BST 2017
We have an FDR14 Mellanox fabric, with probably a similar interrupt load to OPA.
-- ddj
Dave Johnson
On Jul 19, 2017, at 1:52 PM, Jonathon A Anderson <jonathon.anderson at colorado.edu> wrote:
>> It might be a problem specific to your system environment or a misconfiguration; therefore, please get in contact with IBM support to analyze the root cause of the high usage.
>
> I suspect it’s actually a result of frequent I/O interrupts causing jitter that conflicts with MPI traffic on the shared Intel Omni-Path network, in our case.
>
> We’ve already tried pursuing support on this through our vendor, DDN, and got nowhere. Eventually we were the ones who tried killing mmsysmon, and that fixed our problem.
>
> The official company line of “we don't see significant CPU consumption by mmsysmon on our test systems” isn’t helping. Do you have a test system with OPA?
>
> ~jonathon
>
>
> On 7/19/17, 7:05 AM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Mathias Dietz" <gpfsug-discuss-bounces at spectrumscale.org on behalf of MDIETZ at de.ibm.com> wrote:
>
> thanks for the feedback.
>
> Let me clarify what mmsysmon is doing.
> Since IBM Spectrum Scale 4.2.1, the mmsysmon process has handled the overall health monitoring and CES failover.
> Even without CES it is an essential part of the system, because it monitors the individual components and provides health-state information and error events.
>
> This information is needed by other Spectrum Scale components (the mmhealth command, the IBM Spectrum Scale GUI, support tools, the Install Toolkit, ...), so disabling mmsysmon will impact them.
>
>
>> It’s a huge problem. I don’t understand why it hasn’t been given
>
>> much credit by dev or support.
>
> Over the last couple of months, the development team has put a strong focus on this topic.
>
> In order to monitor the health of the individual components, mmsysmon listens for notifications/callbacks, but it also has to do some polling.
> We are constantly trying to reduce the polling overhead and to replace polling with notifications where possible.
>
>
> Several improvements have been added to 4.2.3, including the ability to configure the polling frequency to reduce the overhead. (mmhealth config interval)
>
> See https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmhealth.htm
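A minimal sketch of using that option (the preset names below are assumptions based on the 4.2.3 mmhealth documentation; verify against your installed release):

```shell
# Lower the health-monitoring polling frequency cluster-wide to reduce
# overhead. "low" is an assumed preset name; 4.2.3 is reported to accept
# off / low / medium / default / high.
mmhealth config interval low

# Confirm that health-state reporting still works afterwards.
mmhealth node show
```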
> In addition, a new option has been introduced to clock-align the monitoring threads in order to reduce CPU jitter.
>
>
> Nevertheless, we don't see significant CPU consumption by mmsysmon on our test systems.
>
> It might be a problem specific to your system environment or a misconfiguration; therefore, please get in contact with IBM support to analyze the root cause of the high usage.
>
> Kind regards
>
> Mathias Dietz
>
> IBM Spectrum Scale - Release Lead Architect and RAS Architect
>
>
>
> gpfsug-discuss-bounces at spectrumscale.org wrote on 07/18/2017 07:51:21 PM:
>
>> From: Jonathon A Anderson <jonathon.anderson at colorado.edu>
>> To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
>> Date: 07/18/2017 07:51 PM
>> Subject: Re: [gpfsug-discuss] mmsysmon.py revisited
>> Sent by: gpfsug-discuss-bounces at spectrumscale.org
>>
>> There’s no official way to cleanly disable it, so far as I know;
>> but you can de facto disable it by deleting
>> /var/mmfs/mmsysmon/mmsysmonitor.conf.
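For reference, the workaround described above amounts to something like the following (unsupported: mmhealth, the GUI and the Install Toolkit lose their data source, the file may be recreated on a daemon restart, and the backup path here is an arbitrary choice):

```shell
# Unofficial, unsupported workaround: move the monitor's configuration
# file aside so mmsysmon stops running; keep a copy so it can be restored.
mv /var/mmfs/mmsysmon/mmsysmonitor.conf /root/mmsysmonitor.conf.disabled
```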
>>
>> It’s a huge problem. I don’t understand why it hasn’t been given
>> much credit by dev or support.
>>
>> ~jonathon
>>
>>
>> On 7/18/17, 11:21 AM, "gpfsug-discuss-bounces at spectrumscale.org on
>> behalf of David Johnson" <gpfsug-discuss-bounces at spectrumscale.org
>> on behalf of david_johnson at brown.edu> wrote:
>>
>>
>> We also noticed a fair amount of CPU time accumulated by mmsysmon.py on
>> our diskless compute nodes. I read the earlier query, where it
>> was answered:
>>
>> ces == Cluster Export Services. mmsysmon.py comes from
>> mmcesmon; it is used for managing the export services of GPFS. If it is
>> killed, your NFS/SMB exports will stop working.
>> Its overhead is small and it is very important. Don't
>> attempt to kill it.
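A minimal, Linux-only sketch for quantifying how much CPU time a daemon such as mmsysmon.py actually accumulates, by sampling /proc/&lt;pid&gt;/stat; the idea is simply to take two readings over an interval (the mmsysmon PID is something you would fill in on a real node):

```python
import os
import time

CLK_TCK = os.sysconf("SC_CLK_TCK")  # kernel clock ticks per second

def cpu_seconds(pid):
    """Total CPU seconds (user + system) consumed by a process so far,
    read from /proc/<pid>/stat (Linux-only)."""
    with open(f"/proc/{pid}/stat") as f:
        # Split after the ")" that closes the comm field, so a process
        # name containing spaces or parentheses cannot shift the fields.
        rest = f.read().rsplit(")", 1)[1].split()
    utime, stime = int(rest[11]), int(rest[12])  # stat fields 14 and 15
    return (utime + stime) / CLK_TCK

# Sample twice over an interval to estimate the accumulation rate.
pid = os.getpid()  # on a real node: the PID of mmsysmon.py instead
before = cpu_seconds(pid)
time.sleep(1)
rate = cpu_seconds(pid) - before  # CPU seconds burned per wall second
```

Sampling the daemon this way over a few minutes gives a concrete per-node overhead number to attach to a support case.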
>>
>> Our question is this: we don’t run the latest “protocols”; our
>> NFS is CNFS, and our CIFS is clustered CIFS.
>> I can understand it might be needed with Ganesha, but on every node?
>>
>>
>> Why in the world would I be getting this daemon running on all
>> client nodes, when I didn’t install the “protocols" version
>> of the distribution? We have release 4.2.2 at the moment. How
>> can we disable this?
>>
>>
>> Thanks,
>> — ddj
>>
>>
>> _______________________________________________
>> gpfsug-discuss mailing list
>> gpfsug-discuss at spectrumscale.org
>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>
>
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss