[gpfsug-discuss] nosmap parameter for RHEL7 x86_64 on Haswell/Broadwell?

Yuri L Volobuev volobuev at us.ibm.com
Tue Jun 7 19:25:39 BST 2016


Hi Paul,

Yes, GPFS certainly needs to behave better in this situation.  We are
currently working on proper support for running on newer hardware that
supports Supervisor Mode Access Prevention (SMAP) instructions.  I believe
those are new to Broadwell CPUs, but there's some confusing info out there,
and I'm not positive what the deal is with Haswell.  For the time being,
booting with the "nosmap" kernel parameter is the workaround, but you're
absolutely correct: the code needs to fail more gracefully when SMAP is
enabled.  We'll fix that.
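
Until that fix is available, the pre-start check Paul proposes in the quoted
message (look at /proc/cpuinfo and /proc/cmdline before starting GPFS) could
be approximated on the site side.  The following is only a minimal sketch
under stated assumptions, not anything GPFS ships: the function names are
hypothetical, and it assumes that an active SMAP shows up as the "smap" flag
in /proc/cpuinfo and that the workaround shows up as the "nosmap" word in
/proc/cmdline.

```python
import sys


def cpu_has_smap(cpuinfo_text):
    """Return True if a 'flags' line in /proc/cpuinfo text lists 'smap'."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Compare whole words so that 'smep' is not mistaken for 'smap'.
        if sep and key.strip() == "flags" and "smap" in value.split():
            return True
    return False


def booted_with_nosmap(cmdline_text):
    """Return True if the kernel command line contains the word 'nosmap'."""
    return "nosmap" in cmdline_text.split()


def smap_is_active(cpuinfo_text, cmdline_text):
    """Assumption: SMAP is in effect if the CPU flag is present and
    'nosmap' was not passed at boot."""
    return cpu_has_smap(cpuinfo_text) and not booted_with_nosmap(cmdline_text)


def main():
    """Hypothetical pre-start hook: return 1 (and explain) if SMAP is active."""
    with open("/proc/cpuinfo") as f:
        cpuinfo = f.read()
    with open("/proc/cmdline") as f:
        cmdline = f.read()
    if smap_is_active(cpuinfo, cmdline):
        print("SMAP appears to be enabled; add 'nosmap' to the kernel "
              "command line and reboot before starting GPFS.",
              file=sys.stderr)
        return 1
    return 0
```

A wrapper around the GPFS startup could call main() and skip the start when
it returns non-zero, so the node reports the problem instead of hanging.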

The current FAQ structure is, without question, suboptimal.  We're looking
for a better format to present this information, along the lines of more
modern approaches like a structured Knowledge Base.  The problem is
recognized on our end, but we've been having a hard time making forward
progress on this.

yuri



From:	"Sanchez, Paul" <Paul.Sanchez at deshaw.com>
To:	"gpfsug main discussion list
            (gpfsug-discuss at spectrumscale.org)"
            <gpfsug-discuss at spectrumscale.org>,
Date:	06/03/2016 06:38 AM
Subject:	[gpfsug-discuss] nosmap parameter for RHEL7 x86_64 on
            Haswell/Broadwell?
Sent by:	gpfsug-discuss-bounces at spectrumscale.org



After some puzzling debugging on our new Broadwell servers, all of which
slowly became brick-like after getting stuck starting GPFS, we
discovered that this was already a known issue in the FAQ.  Adding “nosmap”
to the kernel command line in grub prevents SMAP from seeing the
kernel-userspace memory interactions of GPFS as a reason to slowly grind
all cores to a standstill, apparently spinning on stuck locks(?).  (Big
thanks go to Red Hat for turning us on to the answer when we opened a case.)

From
https://www.ibm.com/support/knowledgecenter/STXKQY/gpfsclustersfaq.html,
section 3.2:

Note:  In order for IBM Spectrum Scale on RHEL 7 to run on the Haswell
processor
Disable the Supervisor Mode Access Prevention (smap) kernel parameter
Reboot the RHEL 7 node before using GPFS


Some observations worth noting:

1.	We’ve been running for a year with Haswell processors and have hundreds
of Haswell RHEL7 nodes which do not exhibit this problem.  So maybe this
only really affects Broadwell CPUs?
2.	It would be very nice for SpectrumScale to take a peek at /proc/cpuinfo
and /proc/cmdline before starting up, and refuse to break the host when it
has affected processors and kernel without “nosmap”.  Instead, an error
message describing the fix would have made my day.
3.	I’m going to have to start using a script to diff the FAQ for these
gotchas, unless anyone knows of a better way to subscribe just to updates
to this doc.
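
The FAQ-diffing script mentioned in item 3 could be sketched roughly as
follows.  This is an illustration only: it assumes two plain-text snapshots
of the FAQ have already been saved by some other means (fetching and
HTML-to-text conversion are left out), and the function name and the
"faq.previous" / "faq.current" labels are hypothetical.

```python
import difflib


def faq_changes(old_text, new_text):
    """Return unified-diff lines between two saved text copies of the FAQ.

    An empty list means the two snapshots are identical.
    """
    return list(difflib.unified_diff(
        old_text.splitlines(),
        new_text.splitlines(),
        fromfile="faq.previous",   # hypothetical label for the older snapshot
        tofile="faq.current",      # hypothetical label for the newer snapshot
        lineterm="",
    ))
```

Run from cron against yesterday's and today's snapshots, a non-empty result
could be mailed to whoever needs to watch for new gotchas.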

Thanks,
Paul Sanchez
 _______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
