[gpfsug-discuss] mmsysmon.py revisited

david_johnson at brown.edu david_johnson at brown.edu
Wed Jul 19 19:12:37 BST 2017


We have an FDR14 Mellanox fabric, which probably has an interrupt load similar to OPA.

  -- ddj
Dave Johnson

On Jul 19, 2017, at 1:52 PM, Jonathon A Anderson <jonathon.anderson at colorado.edu> wrote:

>> It might be a problem specific to your system environment or a wrong configuration; therefore, please get in contact with IBM support to analyze the root cause of the high usage.
> 
> In our case, I suspect it’s actually the result of frequent I/O interrupts causing jitter that conflicts with MPI on the shared Intel Omni-Path network.
> 
> We’ve already tried pursuing support on this through our vendor, DDN, and got nowhere. Eventually we were the ones who tried killing mmsysmon, and that fixed our problem.
> 
> The official company line of “we don't see significant CPU consumption by mmsysmon on our test systems” isn’t helping. Do you have a test system with OPA?
> 
> ~jonathon
> 
> 
> On 7/19/17, 7:05 AM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Mathias Dietz" <gpfsug-discuss-bounces at spectrumscale.org on behalf of MDIETZ at de.ibm.com> wrote:
> 
>    Thanks for the feedback.
> 
>    Let me clarify what mmsysmon is doing.
>    Since IBM Spectrum Scale 4.2.1, the mmsysmon process has been used for overall health monitoring and CES failover handling.
>    Even without CES it is an essential part of the system because it monitors the individual components and provides health state information and error events.
> 
>    This information is needed by other Spectrum Scale components (the mmhealth command, the IBM Spectrum Scale GUI, Support tools, Install Toolkit, ...), and therefore disabling mmsysmon will impact them.
> 
> 
>> It’s a huge problem. I don’t understand why it hasn’t been given
>> much credit by dev or support.
> 
>    Over the last couple of months, the development team has put a strong focus on this topic.
> 
>    In order to monitor the health of the individual components, mmsysmon listens for notifications/callbacks but also has to do some polling.
>    We are continually working to reduce the polling overhead and to replace polling with notifications where possible.
> 
> 
>    Several improvements have been added to 4.2.3, including the ability to configure the polling frequency to reduce the overhead (mmhealth config interval).
> 
>    See https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adm_mmhealth.htm
>    In addition, a new option has been introduced to clock-align the monitoring threads in order to reduce CPU jitter.
> 
> 
>    Nevertheless, we don't see significant CPU consumption by mmsysmon on our test systems.
>        
>    It might be a problem specific to your system environment or a wrong configuration; therefore, please get in contact with IBM support to analyze the root cause of the high usage.
> 
>    Kind regards
> 
>    Mathias Dietz
> 
>    IBM Spectrum Scale - Release Lead Architect and RAS Architect
> 
> 
> 
>    gpfsug-discuss-bounces at spectrumscale.org wrote on 07/18/2017 07:51:21 PM:
> 
>> From: Jonathon A Anderson <jonathon.anderson at colorado.edu>
>> To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
>> Date: 07/18/2017 07:51 PM
>> Subject: Re: [gpfsug-discuss] mmsysmon.py revisited
>> Sent by: gpfsug-discuss-bounces at spectrumscale.org
>> 
>> There’s no official way to cleanly disable it yet, so far as I know,
>> but you can de facto disable it by deleting
>> /var/mmfs/mmsysmon/mmsysmonitor.conf.
>> 
>> It’s a huge problem. I don’t understand why it hasn’t been given 
>> much credit by dev or support.
>> 
>> ~jonathon
>> 
>> 
>> On 7/18/17, 11:21 AM, "gpfsug-discuss-bounces at spectrumscale.org on 
>> behalf of David Johnson" <gpfsug-discuss-bounces at spectrumscale.org 
>> on behalf of david_johnson at brown.edu> wrote:
>> 
>> 
>> 
>> 
>>    We also noticed a fair amount of CPU time accumulated by mmsysmon.py on
>>    our diskless compute nodes. I read the earlier query, where it
>>    was answered:
>> 
>> 
>> 
>> 
>>    ces == Cluster Export Services; mmsysmon.py comes from mmcesmon.
>>    It is used for managing the export services of GPFS. If it is
>>    killed, your NFS/SMB etc. will stop working.
>>    Their overhead is small and they are very important. Don't
>>    attempt to kill them.
>> 
>> 
>> 
>> 
>> 
>> 
>>    Our question is this: we don’t run the latest “protocols”; our
>>    NFS is CNFS, and our CIFS is clustered CIFS.
>>    I can understand it might be needed with Ganesha, but on every node?
>> 
>> 
>>    Why in the world would I be getting this daemon running on all
>>    client nodes, when I didn’t install the “protocols” version
>>    of the distribution? We have release 4.2.2 at the moment. How
>>    can we disable this?
>> 
>> 
>>    Thanks,
>>     — ddj
>> 
>> 
>> _______________________________________________
>> gpfsug-discuss mailing list
>> gpfsug-discuss at spectrumscale.org
>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
> 
> 
> 


