[gpfsug-discuss] RAID type for system pool

Bryan Banister bbanister at jumptrading.com
Wed Sep 5 18:33:03 BST 2018


I agree with Anderson on his thoughts, mainly that if you want to go with RAID5 then you should analyze your current workload to see whether it is mostly read operations or more of a heavy write situation.  Read-modify-write penalties and write-amplification wear will hurt both the performance and the life of the SSDs if you have a heavy metadata write workload.  This also applies to the data-in-inode situation.  The current workload can be inspected with standard iostat, mmdiag --iohist, mmpmon, and the GPFS perfmon tools.
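As a rough illustration of putting a number on the read/write mix, the sketch below computes the fraction of completed I/Os that were reads between two snapshots of Linux /proc/diskstats (the same counters iostat reports from). The device name and the sample counter values are made up for the example, and this only sees block-device traffic on one server; it is not a substitute for mmdiag --iohist or mmpmon.

```python
def io_counts(diskstats_text, device):
    """Return (reads_completed, writes_completed) for one device.

    /proc/diskstats fields: major, minor, name, reads completed, reads merged,
    sectors read, ms reading, writes completed, ...
    """
    for line in diskstats_text.splitlines():
        fields = line.split()
        if len(fields) >= 8 and fields[2] == device:
            return int(fields[3]), int(fields[7])
    raise KeyError(device)


def read_fraction(before, after, device):
    """Fraction of I/Os that were reads between two snapshots (None if idle)."""
    r0, w0 = io_counts(before, device)
    r1, w1 = io_counts(after, device)
    reads, writes = r1 - r0, w1 - w0
    total = reads + writes
    return reads / total if total else None


# Hypothetical snapshots taken some time apart (values invented for the demo).
BEFORE = "   8       0 sda 1000 0 0 0 500 0 0 0 0 0 0\n"
AFTER = "   8       0 sda 8000 0 0 0 3500 0 0 0 0 0 0\n"

if __name__ == "__main__":
    frac = read_fraction(BEFORE, AFTER, "sda")
    print(f"read fraction: {frac:.2f}")  # prints "read fraction: 0.70"
```

The counters are cumulative since boot, which is why the sketch takes the difference of two snapshots rather than reading a single one.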

We have SSDs in both RAID1 (metadata) and RAID5 configurations (data).  We’re using the RAID controllers to split up the RAID sets into multiple virtual volumes so that we can have more NSD servers hosting the storage and increase the number of I/O commands (aka queue depth x N LUNs > queue depth x 1 LUN) being sent to the storage.  Since there isn’t a seek penalty this is working well for us.

As mentioned below, be sure to round-robin the ServerList for the NSDs to spread the load across servers.

Hope that helps!
-Bryan

From: gpfsug-discuss-bounces at spectrumscale.org <gpfsug-discuss-bounces at spectrumscale.org> On Behalf Of Anderson Ferreira Nobre
Sent: Wednesday, September 5, 2018 11:51 AM
To: gpfsug-discuss at spectrumscale.org
Cc: gpfsug-discuss at spectrumscale.org
Subject: Re: [gpfsug-discuss] RAID type for system pool

________________________________
Hi Kevin,

RAID5 is good when the read ratio of I/Os is 70% or more. As for creating two RAID5 arrays, you need to consider the size of the disks and the time to rebuild the RAID in case of failure. Maybe a single RAID5 would be better, because you have more disks working in the backend for a single RAID. I think that since you are using SSDs, the time to rebuild the RAID will always be fast, so you wouldn't need RAID6. It may be a good idea to read the manual of the SAS RAID controller to see how long a rebuild takes in case of a failure.
About the stripe size of the controller vs. the block size in GPFS: this is just a guess, and you would need to do some performance tests to make sure. You could make the stripe width of the RAID equal to the metadata block size. I think this is the best you can do.
Breaking the RAID into several LUNs is a good idea, I think, so that you don't have long queue lengths on the LUNs, especially if the I/O profile is many I/Os with small block sizes.
Balancing the LUNs over the NSD servers is a best practice. Do not leave all the LUNs pointing to the first node. Just remember that when you create the NSDs, the device always corresponds to the first node in the servers list. This can be laborious work, so to make things easier I create two NSD stanza files. The first one points to the first node, like this:
%nsd device=/dev/mapper/mpatha
    nsd=nsd001
    servers=host1,host2,host3,host4
    usage=metadataOnly
    failureGroup=1
    pool=system

%nsd device=/dev/mapper/mpathb
    nsd=nsd002
    servers=host1,host2,host3,host4
    usage=metadataOnly
    failureGroup=1
    pool=system

Then I use this stanza file to create the NSDs, and I create a second stanza file with the server order rotated:
%nsd
    nsd=nsd001
    servers=host1,host2,host3,host4
    usage=metadataOnly
    failureGroup=1
    pool=system

%nsd
    nsd=nsd002
    servers=host2,host3,host4,host1
    usage=metadataOnly
    failureGroup=1
    pool=system

Then I apply the new server order with mmchnsd.
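With more than a couple of NSDs, the second stanza file can be generated rather than typed by hand. The following is a minimal sketch, assuming the NSD names and host names from the example above; the helper names (rotate, stanza, round_robin_stanzas) are hypothetical and not part of any GPFS tooling. The output is meant to be saved to a file and passed to mmchnsd -F.

```python
def rotate(servers, k):
    """Rotate the server list left by k positions (wraps around)."""
    k %= len(servers)
    return servers[k:] + servers[:k]


def stanza(nsd, servers, usage="metadataOnly", failure_group=1, pool="system"):
    """Format one %nsd stanza in the same layout as the example above."""
    return "\n".join([
        "%nsd",
        f"    nsd={nsd}",
        f"    servers={','.join(servers)}",
        f"    usage={usage}",
        f"    failureGroup={failure_group}",
        f"    pool={pool}",
    ])


def round_robin_stanzas(nsd_names, servers):
    """One stanza per NSD, each starting one server later than the previous."""
    blocks = [stanza(name, rotate(servers, i)) for i, name in enumerate(nsd_names)]
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    hosts = ["host1", "host2", "host3", "host4"]
    nsds = [f"nsd{i:03d}" for i in range(1, 5)]  # nsd001 .. nsd004
    print(round_robin_stanzas(nsds, hosts), end="")
```

Each NSD gets a different first server, so the primary role is spread evenly once there are at least as many NSDs as servers.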

Abraços / Regards / Saludos,


Anderson Nobre
AIX & Power Consultant
Master Certified IT Specialist
IBM Systems Hardware Client Technical Team – IBM Systems Lab Services


________________________________

Phone: 55-19-2132-4317
E-mail: anobre at br.ibm.com



----- Original message -----
From: "Buterbaugh, Kevin L" <Kevin.Buterbaugh at Vanderbilt.Edu>
Sent by: gpfsug-discuss-bounces at spectrumscale.org
To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Cc:
Subject: [gpfsug-discuss] RAID type for system pool
Date: Wed, Sep 5, 2018 12:35 PM

Hi All,

We are in the process of finalizing the purchase of some new storage arrays (so no sales people who might be monitoring this list need contact me) to life-cycle some older hardware.  One of the things we are considering is the purchase of some new SSDs for our “/home” filesystem and I have a question or two related to that.

Currently, the existing home filesystem has its metadata on SSDs … two RAID 1 mirrors and metadata replication set to two.  However, the filesystem itself is old enough that it uses 512 byte inodes.  We have analyzed our users’ files and know that if we create a new filesystem with 4K inodes, a very significant portion of the files would now have their _data_ stored in the inode as well, because those files are 3.5K or smaller (currently all data is on spinning HD RAID 1 mirrors).

Of course, if we increase the size of the inodes by a factor of 8 then we also need 8 times as much space to store those inodes.  Given that Enterprise class SSDs are still very expensive and our budget is not unlimited, we’re trying to get the best bang for the buck.

We have always - even back in the day when our metadata was on spinning disk and not SSD - used RAID 1 mirrors and metadata replication of two.  However, we are wondering if it might be possible to switch to RAID 5?  Specifically, what we are considering doing is buying 8 new SSDs and creating two 3+1P RAID 5 LUNs (metadata replication would stay at two).  That would give us 50% more usable space than if we configured those same 8 drives as four RAID 1 mirrors.
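The 50% figure can be checked with a few lines of arithmetic. This is only a sketch that counts usable capacity in units of one drive; the function names are made up for illustration, and it ignores controller overhead and spare drives.

```python
def usable_raid1(total_drives):
    """RAID 1 mirrors: half of the drives hold usable capacity."""
    return total_drives // 2


def usable_raid5(arrays, data_drives_per_array):
    """RAID 5 as N+1P: each array contributes its N data drives."""
    return arrays * data_drives_per_array


raid1 = usable_raid1(8)      # four RAID 1 mirrors -> 4 drives of capacity
raid5 = usable_raid5(2, 3)   # two 3+1P arrays     -> 6 drives of capacity
gain = (raid5 - raid1) / raid1

# Inode growth from 512-byte to 4K inodes.
inode_factor = 4096 // 512

if __name__ == "__main__":
    print(f"RAID 1: {raid1} drives usable, RAID 5: {raid5} drives usable")
    print(f"gain: {gain:.0%}, inode space factor: {inode_factor}x")
```

Metadata replication of two halves the effective capacity in both layouts, so it does not change the ratio between them.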

Unfortunately, unless I’m misunderstanding something, that would mean that the RAID stripe size and the GPFS block size could not match.  Therefore, even though we don’t need the space, would we be much better off to buy 10 SSDs and create two 4+1P RAID 5 LUNs?
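The mismatch comes from simple arithmetic: a full stripe is (number of data drives) x (per-drive strip size), and three times a power of two is never a power of two. The sketch below assumes the controller only offers power-of-two strip sizes and that GPFS block sizes are powers of two from 64 KiB to 16 MiB; both lists are assumptions for illustration, so check them against the actual controller and GPFS release.

```python
# Assumed per-drive strip sizes offered by a hypothetical controller (KiB).
STRIP_SIZES_KIB = [16, 32, 64, 128, 256, 512]

# Assumed GPFS block sizes: powers of two from 64 KiB to 16 MiB (KiB).
BLOCK_SIZES_KIB = [64 * 2**i for i in range(9)]


def matching_block_sizes(data_drives, strips=STRIP_SIZES_KIB, blocks=BLOCK_SIZES_KIB):
    """Return (strip_kib, full_stripe_kib) pairs whose full stripe equals a block size."""
    return [
        (strip, data_drives * strip)
        for strip in strips
        if data_drives * strip in blocks
    ]


if __name__ == "__main__":
    for data_drives in (3, 4):
        matches = matching_block_sizes(data_drives)
        print(f"{data_drives}+1P:", matches if matches else "no full-stripe match")
```

Under these assumptions the 3+1P layout has no strip size that lines up with a block size, while every listed strip size works for 4+1P.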

I’ve searched the mailing list archives and scanned the DeveloperWorks wiki and even glanced at the GPFS documentation and haven’t found anything that says “bad idea, Kevin”… ;-)

Expanding on this further … if we just present those two RAID 5 LUNs to GPFS as NSDs then we can only have two NSD servers as primary for them.  So another thing we’re considering is to take those RAID 5 LUNs and further sub-divide them into a total of 8 logical volumes, each of which could be a GPFS NSD and therefore would allow us to have each of our 8 NSD servers be primary for one of them.  Even worse idea?!?  Good idea?

Anybody have any better ideas???  ;-)

Oh, and currently we’re on GPFS 4.2.3-10, but are also planning on moving to GPFS 5.0.1-x before creating the new filesystem.

Thanks much…

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


