[gpfsug-discuss] CES log files

Sobey, Richard A r.sobey at imperial.ac.uk
Thu Jan 12 09:51:12 GMT 2017


Thanks Christof. Would this patch have made it into CES/GPFS 4.2.1-2? From what you say, probably not?

This whole incident was caused by a scheduled and extremely rare shutdown of our main datacentre for electrical testing. It's not something that's likely to happen again, if at all, so reproducing it will be nigh on impossible.

Food for thought though!

Richard

-----Original Message-----
From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Christof Schmitt
Sent: 11 January 2017 22:33
To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Subject: Re: [gpfsug-discuss] CES log files

A winbindd process taking up 100% CPU could be caused by the problem documented in https://bugzilla.samba.org/show_bug.cgi?id=12105

Capturing a brief strace of the affected process and reporting that through a PMR would be helpful to debug this problem and provide a fix.
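If it helps, a minimal way to grab that would be something like the following (a sketch; the PID 12345 is just a placeholder for the spinning winbindd process):

  # attach to the busy winbindd for ~30 seconds, following threads, with timestamps
  timeout 30 strace -f -tt -p 12345 -o /tmp/winbindd.strace

The resulting /tmp/winbindd.strace can then be attached to the PMR.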

To answer the wider question: Log files are kept in /var/adm/ras/. In case more detailed traces are required, use the mmprotocoltrace command.
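For example, to capture SMB traces for a specific client (a sketch; check the mmprotocoltrace man page for the options available in your release, and the client IP below is only a placeholder):

  mmprotocoltrace start smb -c 10.0.0.1
  ... reproduce the problem ...
  mmprotocoltrace stop smb

Stopping the trace should then gather the trace files and report where they were written.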

Regards,

Christof Schmitt || IBM || Spectrum Scale Development || Tucson, AZ
christof.schmitt at us.ibm.com  ||  +1-520-799-2469    (T/L: 321-2469)



From:   "Sobey, Richard A" <r.sobey at imperial.ac.uk>
To:     gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Date:   01/11/2017 07:00 AM
Subject:        Re: [gpfsug-discuss] CES log files
Sent by:        gpfsug-discuss-bounces at spectrumscale.org



Thanks. Some of the nodes would just say “failed” or “degraded” with the DCs offline. Of those that thought they were happy to host a CES IP address, they did not respond, and the winbindd process would take up 100% CPU (as seen through top) with no users on them.
 
Interesting that even though all CES nodes had the same configuration, three of them never had a problem at all.
 
JF – I’ll look at the protocol tracing next time this happens. It’s a rare thing for three DCs to go offline at once, but even so there should have been enough resiliency to cope.
 
Thanks
Richard
 
From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Andrew Beattie
Sent: 11 January 2017 09:55
To: gpfsug-discuss at spectrumscale.org
Cc: gpfsug-discuss at spectrumscale.org
Subject: Re: [gpfsug-discuss] CES log files
 
mmhealth might be a good place to start
 
CES should probably throw a message along the lines of the following:
 
mmhealth shows something is wrong with the AD server:
...
CES                      DEGRADED                 ads_down 
...
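Something along these lines on the protocol node should surface that state (a sketch; exact subcommands and output columns vary a little by release):
 
  mmhealth node show CES
  mmhealth node eventlog
 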
Andrew Beattie
Software Defined Storage  - IT Specialist
Phone: 614-2133-7927
E-mail: abeattie at au1.ibm.com
 
 
----- Original message -----
From: "Sobey, Richard A" <r.sobey at imperial.ac.uk> Sent by: gpfsug-discuss-bounces at spectrumscale.org
To: "'gpfsug-discuss at spectrumscale.org'" <gpfsug-discuss at spectrumscale.org
>
Cc:
Subject: [gpfsug-discuss] CES log files
Date: Wed, Jan 11, 2017 7:27 PM
 
Which files do I need to look in to determine what’s happening with CES… supposing, for example, a load of domain controllers were shut down, CES had no clue how to handle this, and it stopped working until the DCs were switched back on again.
 
mmfs.log.latest said everything was fine, by the way.
 
Thanks
Richard




_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

