[gpfsug-discuss] bizarre performance behavior
Kenneth Waegeman
kenneth.waegeman at ugent.be
Fri Apr 21 10:43:25 BST 2017
Hi,
We are running a test setup with 2 NSD servers backed by 4 Dell
PowerVault MD3460s. nsd00 is the primary server for the LUNs on
controller A of the 4 PowerVaults; nsd02 is the primary server for the
LUNs on controller B.
We are testing from 2 client machines connected to the NSD servers over
InfiniBand, with verbs enabled.
When we run dd on the NSD servers, we do indeed see performance reach
5.8GB/s with one NSD server and 7.2GB/s with the two! So it looks like
GPFS is able to get the data at a decent speed. Since we can write from
the clients at a good speed, I didn't suspect the communication between
the clients and the NSD servers of being the issue, especially since
total performance stays the same with 1 or multiple clients.
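
For anyone who wants to repeat this kind of dd-style sequential read test
without dd, here is a minimal Python sketch. It is an illustration only and
not the exact test run above: the function names, the 16 MiB block size and
the optional byte cap are assumptions made for the example. It also reads
through the normal caching layers (no direct I/O), so the test file should
be much larger than any cache for the number to mean anything.

```python
import time


def sequential_read_throughput(path, block_size=16 * 1024 * 1024, max_bytes=None):
    """Read `path` front to back in fixed-size blocks.

    Returns (bytes_read, elapsed_seconds). If `max_bytes` is given, reading
    stops once at least that many bytes have been read.
    """
    total = 0
    buf = bytearray(block_size)  # reused for every read to avoid reallocations
    start = time.perf_counter()
    # buffering=0 returns a raw file object, so each readinto() maps to
    # a single read request of up to block_size bytes.
    with open(path, "rb", buffering=0) as f:
        while max_bytes is None or total < max_bytes:
            n = f.readinto(buf)
            if not n:  # 0 means end of file
                break
            total += n
    elapsed = time.perf_counter() - start
    return total, elapsed


def gigabytes_per_second(total_bytes, seconds):
    """Convert a byte count and a duration to GB/s (1 GB = 10**9 bytes)."""
    return total_bytes / seconds / 1e9 if seconds > 0 else float("inf")
```

Calling sequential_read_throughput() on a large file in the GPFS mount and
passing the result to gigabytes_per_second() gives a figure comparable to
what dd prints. Running it once on an NSD server and once on a client shows
whether the two see the same read rate.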
I'll use the nsdperf tool to see if we can find anything,
thanks!
K
On 20/04/17 17:04, Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]
wrote:
> Interesting. Could you share a little more about your architecture? Is
> it possible to mount the fs on an NSD server and do some dd's from the
> fs on the NSD server? If that gives you decent performance perhaps try
> NSDPERF next
> https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General+Parallel+File+System+(GPFS)/page/Testing+network+performance+with+nsdperf
>
>
> -Aaron
>
>
>
>
> On April 20, 2017 at 10:53:47 EDT, Kenneth Waegeman
> <kenneth.waegeman at ugent.be> wrote:
>>
>> Hi,
>>
>>
>> Having an issue that looks the same as this one:
>>
>> We can do sequential writes to the filesystem at 7.8 GB/s in total,
>> which is the expected speed for our current storage backend. While we
>> get even better performance with sequential reads on the raw storage
>> LUNs, through GPFS we can only reach 1 GB/s in total (each NSD server
>> seems limited to 0.5 GB/s), independent of the number of clients
>> (1, 2, 4, ...) or the tool we tested with (fio, dd). We played with
>> blockdev params, maxMBpS, prefetchThreads, hyperthreading,
>> C1E/C-states, ... as discussed in this thread, but nothing seems to
>> affect this read performance.
>>
>> Any ideas?
>>
>> Thanks!
>>
>> Kenneth
>>
>> On 17/02/17 19:29, Jan-Frode Myklebust wrote:
>>> I just had a similar experience with a SanDisk InfiniFlash system
>>> SAS-attached to a single host. gpfsperf reported 3.2 GByte/s for
>>> writes, and 250-300 MByte/s on sequential reads!! Random reads were
>>> on the order of 2 GByte/s.
>>>
>>> After a bit of head scratching and fumbling around I found out that
>>> reducing maxMBpS from 10000 to 100 fixed the problem! Digging
>>> further I found that reducing prefetchThreads from default=72 to 32
>>> also fixed it, while leaving maxMBpS at 10000. Can now also read at
>>> 3.2 GByte/s.
>>>
>>> Could something like this be the problem on your box as well?
>>>
>>>
>>>
>>> -jf
>>> On Fri, 17 Feb 2017 at 18:13, Aaron Knister
>>> <aaron.s.knister at nasa.gov> wrote:
>>>
>>> Well, I'm somewhat scrounging for hardware. This is in our test
>>> environment :) And yep, it's got the 2U gpu-tray in it although even
>>> without the riser it has 2 PCIe slots onboard (excluding the on-board
>>> dual-port mezz card) so I think it would make a fine NSD server even
>>> without the riser.
>>>
>>> -Aaron
>>>
>>> On 2/17/17 11:43 AM, Simon Thompson (Research Computing - IT Services)
>>> wrote:
>>> > Maybe it's related to interrupt handlers somehow? You drive the load
>>> > up on one socket, you push all the interrupt handling to the other
>>> > socket where the fabric card is attached?
>>> >
>>> > Dunno ... (Though I am intrigued you use iDataPlex nodes as NSD
>>> > servers, I assume it's some 2U gpu-tray riser one or something!)
>>> >
>>> > Simon
>>> > ________________________________________
>>> > From: gpfsug-discuss-bounces at spectrumscale.org
>>> > [gpfsug-discuss-bounces at spectrumscale.org] on behalf of
>>> > Aaron Knister [aaron.s.knister at nasa.gov]
>>> > Sent: 17 February 2017 15:52
>>> > To: gpfsug main discussion list
>>> > Subject: [gpfsug-discuss] bizarre performance behavior
>>> >
>>> > This is a good one. I've got an NSD server with 4x 16GB fibre
>>> > connections coming in and 1x FDR10 and 1x QDR connection going out to
>>> > the clients. I was having a really hard time getting anything
>>> > resembling sensible performance out of it (4-5Gb/s writes but maybe
>>> > 1.2Gb/s for reads). The back-end is a DDN SFA12K and I *know* it can
>>> > do better than that.
>>> >
>>> > I don't remember quite how I figured this out but simply by running
>>> > "openssl speed -multi 16" on the nsd server to drive up the load I saw
>>> > an almost 4x performance jump, which pretty much goes against every
>>> > sysadmin fiber in me (i.e. "drive up the cpu load with unrelated crap
>>> > to quadruple your i/o performance").
>>> >
>>> > This feels like some type of C-states frequency scaling shenanigans
>>> > that I haven't quite ironed down yet. I booted the box with the
>>> > following kernel parameters "intel_idle.max_cstate=0
>>> > processor.max_cstate=0" which didn't seem to make much of a
>>> > difference. I also tried setting the frequency governor to userspace
>>> > and setting the minimum frequency to 2.6ghz (it's a 2.6ghz cpu). None
>>> > of that really matters-- I still have to run something to drive up the
>>> > CPU load and then performance improves.
>>> >
>>> > I'm wondering if this could be an issue with the C1E state? I'm
>>> > curious if anyone has seen anything like this. The node is a dx360 M4
>>> > (Sandybridge) with 16 2.6GHz cores and 32GB of RAM.
>>> >
>>> > -Aaron
>>> >
>>> > --
>>> > Aaron Knister
>>> > NASA Center for Climate Simulation (Code 606.2)
>>> > Goddard Space Flight Center
>>> > (301) 286-2776
>>> > _______________________________________________
>>> > gpfsug-discuss mailing list
>>> > gpfsug-discuss at spectrumscale.org
>>> > http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>>> >
>>>
>>> --
>>> Aaron Knister
>>> NASA Center for Climate Simulation (Code 606.2)
>>> Goddard Space Flight Center
>>> (301) 286-2776
>>
>
>