[gpfsug-discuss] bizarre performance behavior

Aaron Knister aaron.s.knister at nasa.gov
Fri Feb 17 17:13:08 GMT 2017


Well, I'm somewhat scrounging for hardware. This is in our test 
environment :) And yep, it's got the 2U gpu-tray in it although even 
without the riser it has 2 PCIe slots onboard (excluding the on-board 
dual-port mezz card) so I think it would make a fine NSD server even 
without the riser.

-Aaron

On 2/17/17 11:43 AM, Simon Thompson (Research Computing - IT Services) 
wrote:
> Maybe its related to interrupt handlers somehow? You drive the load up on one socket, you push all the interrupt handling to the other socket where the fabric card is attached?
>
> Dunno ... (Though I am intrigued you use idataplex nodes as NSD servers, I assume its some 2U gpu-tray riser one or something !)
>
> Simon
> ________________________________________
> From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Aaron Knister [aaron.s.knister at nasa.gov]
> Sent: 17 February 2017 15:52
> To: gpfsug main discussion list
> Subject: [gpfsug-discuss] bizarre performance behavior
>
> This is a good one. I've got an NSD server with 4x 16GB fibre
> connections coming in and 1x FDR10 and 1x QDR connection going out to
> the clients. I was having a really hard time getting anything resembling
> sensible performance out of it (4-5Gb/s writes but maybe 1.2Gb/s for
> reads). The back-end is a DDN SFA12K and I *know* it can do better than
> that.
>
> I don't remember quite how I figured this out but simply by running
> "openssl speed -multi 16" on the nsd server to drive up the load I saw
> an almost 4x performance jump which is pretty much goes against every
> sysadmin fiber in me (i.e. "drive up the cpu load with unrelated crap to
> quadruple your i/o performance").
>
> This feels like some type of C-states frequency scaling shenanigans that
> I haven't quite ironed down yet. I booted the box with the following
> kernel parameters "intel_idle.max_cstate=0 processor.max_cstate=0" which
> didn't seem to make much of a difference. I also tried setting the
> frequency governer to userspace and setting the minimum frequency to
> 2.6ghz (it's a 2.6ghz cpu). None of that really matters-- I still have
> to run something to drive up the CPU load and then performance improves.
>
> I'm wondering if this could be an issue with the C1E state? I'm curious
> if anyone has seen anything like this. The node is a dx360 M4
> (Sandybridge) with 16 2.6GHz cores and 32GB of RAM.
>
> -Aaron
>
> --
> Aaron Knister
> NASA Center for Climate Simulation (Code 606.2)
> Goddard Space Flight Center
> (301) 286-2776
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>

-- 
Aaron Knister
NASA Center for Climate Simulation (Code 606.2)
Goddard Space Flight Center
(301) 286-2776



More information about the gpfsug-discuss mailing list