[gpfsug-discuss] RAID type for system pool
Aaron Knister
aaron.s.knister at nasa.gov
Wed Sep 5 23:42:05 BST 2018
I've heard it highly recommended (and have been *really* glad at times
to have it) to have at least 2 replicas of metadata to help maintain fs
consistency in the event of fs issues or hardware bugs (e.g. a torn write).
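
(For reference, a minimal sketch of how that gets set - the filesystem
and stanza file names here are hypothetical:

    # at creation time: default and maximum metadata replicas of 2
    mmcrfs fs1 -F nsd.stanzas -m 2 -M 2

    # check the current settings on an existing filesystem
    mmlsfs fs1 -m -M

The default (-m) can be raised later with mmchfs, but I believe the
maximum (-M) is fixed at creation, so it pays to set it up front.)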
-Aaron
On 9/5/18 1:37 PM, Frederick Stock wrote:
> Another option for saving space is to not keep 2 copies of the metadata
> within GPFS. The SSDs are mirrored, so you would still have two copies,
> though they very likely share a single point of failure and that could
> be a deal breaker. I have my doubts that RAID5 will perform well for
> the reasons Marc described, but it is worth testing to see how it does
> perform. If you do test, I presume you would also run equivalent tests
> with a RAID1 (mirrored) configuration.
>
> Regarding your point about making multiple volumes that would become
> GPFS NSDs for metadata: it has been my experience that with traditional
> RAID systems it is better to have many small metadata LUNs (more IO
> paths) than a few large metadata LUNs. This becomes less of an issue
> with ESS, i.e. there you can have a few metadata NSDs and still get
> very good performance.
>
> Fred
> __________________________________________________
> Fred Stock | IBM Pittsburgh Lab | 720-430-8821
> stockf at us.ibm.com
>
>
>
> From: "Marc A Kaplan" <makaplan at us.ibm.com>
> To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
> Date: 09/05/2018 01:22 PM
> Subject: Re: [gpfsug-discuss] RAID type for system pool
> Sent by: gpfsug-discuss-bounces at spectrumscale.org
> ------------------------------------------------------------------------
>
>
>
> It's good to try to reason and think this out... But there's a good
> likelihood that we don't understand ALL the details, some of which may
> negatively impact performance - so no matter what scheme you come up
> with - test, test, and re-test before deploying and depending on it in
> production.
>
> Having said that, I'm pretty sure that old "spinning" RAID 5
> implementations had horrible performance for the GPFS metadata/system
> pool. Why? Among other things, the mismatch between the large RAID
> stripe size and the almost random small writes directed to the system
> pool.
>
> That random-small-writes pattern won't change when we go to SSD RAID 5,
> so you'd have to see if the SSD implementation is somehow smarter than
> an old-fashioned RAID 5 implementation, which I believe requires
> several physical reads and writes for each "small" logical write: read
> the old data block and old parity, then write the new data block and
> new parity - a 4x write penalty.
> (Top decent Google result I found quickly:
> http://rickardnobel.se/raid-5-write-penalty/ - but you will probably
> want to do more research!)
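>
> As a rough back-of-the-envelope (a sketch, not a benchmark - the
> per-device IOPS number below is invented for illustration):
>
>     # Effective small-random-write IOPS under classic RAID write
>     # penalties: RAID1 costs 2 physical IOs per logical write (two
>     # mirrored writes); RAID5 read-modify-write costs 4 (read old
>     # data, read old parity, write new data, write new parity).
>     def effective_write_iops(device_iops, num_drives, penalty):
>         return device_iops * num_drives / penalty
>
>     DEVICE_IOPS = 50_000  # hypothetical per-SSD random write IOPS
>     print(effective_write_iops(DEVICE_IOPS, 8, 2))  # RAID1: 200000.0
>     print(effective_write_iops(DEVICE_IOPS, 8, 4))  # RAID5: 100000.0
>
> Whether a given array's controller can dodge the read-modify-write path
> (e.g. by coalescing full stripes in cache) is exactly what the testing
> should reveal.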
>
> Consider GPFS small write performance for: inode updates, log writes,
> small files (possibly in inode), directory updates, allocation map
> updates, index of indirect blocks.
>
>
>
> From: "Buterbaugh, Kevin L" <Kevin.Buterbaugh at Vanderbilt.Edu>
> To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
> Date: 09/05/2018 11:36 AM
> Subject: [gpfsug-discuss] RAID type for system pool
> Sent by: gpfsug-discuss-bounces at spectrumscale.org
> ------------------------------------------------------------------------
>
>
>
> Hi All,
>
> We are in the process of finalizing the purchase of some new storage
> arrays (so no sales people who might be monitoring this list need
> contact me) to life-cycle some older hardware. One of the things we are
> considering is the purchase of some new SSDs for our “/home” filesystem,
> and I have a question or two related to that.
>
> Currently, the existing home filesystem has its metadata on SSDs … two
> RAID 1 mirrors and metadata replication set to two. However, the
> filesystem itself is old enough that it uses 512-byte inodes. We have
> analyzed our users’ files and know that if we create a new filesystem
> with 4K inodes, a very significant portion of the files would have
> their _data_ stored in the inode as well, due to those files being
> 3.5K or smaller (currently all data is on spinning HD RAID 1 mirrors).
>
> Of course, if we increase the size of the inodes by a factor of 8, then
> we also need 8 times as much space to store those inodes. Given that
> enterprise-class SSDs are still very expensive and our budget is not
> unlimited, we’re trying to get the best bang for the buck.
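>
> (To put rough numbers on that … a back-of-the-envelope sketch, with the
> inode count invented for illustration:
>
>     # Metadata capacity needed just for inodes, at two replicas.
>     NUM_INODES = 200_000_000  # hypothetical allocated inode count
>     REPLICAS = 2
>     for inode_size in (512, 4096):
>         bytes_needed = NUM_INODES * inode_size * REPLICAS
>         print(f"{inode_size:>5}B inodes: {bytes_needed / 1e12:.1f} TB")
>     #   512B inodes: 0.2 TB
>     #  4096B inodes: 1.6 TB
>
> Though to be fair, every small file whose data fits in a 4K inode no
> longer needs a data block on the spinning disk pool, so some of that
> cost comes back.)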
>
> We have always - even back in the day when our metadata was on spinning
> disk and not SSD - used RAID 1 mirrors and metadata replication of two.
> However, we are wondering if it might be possible to switch to RAID 5?
> Specifically, what we are considering doing is buying 8 new SSDs and
> creating two 3+1P RAID 5 LUNs (metadata replication would stay at two).
> That would give us 50% more usable space than if we configured those
> same 8 drives as four RAID 1 mirrors.
>
> Unfortunately, unless I’m misunderstanding something, that would mean
> that the RAID stripe size and the GPFS block size could not match.
> Therefore, even though we don’t need the space, would we be much better
> off to buy 10 SSDs and create two 4+1P RAID 5 LUNs?
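>
> (My reasoning, as a quick sketch … the segment size and block size here
> are hypothetical:
>
>     # A full RAID 5 stripe holds (data_drives * segment_size) of data.
>     # GPFS block sizes are powers of two, so a 3-data-drive stripe can
>     # never divide one evenly, while a 4-data-drive stripe can.
>     SEGMENT = 256 * 1024      # hypothetical 256 KiB RAID segment
>     GPFS_BLOCK = 1024 * 1024  # hypothetical 1 MiB GPFS block size
>     for data_drives in (3, 4):
>         stripe = data_drives * SEGMENT
>         print(data_drives, stripe // 1024, "KiB",
>               "aligned" if GPFS_BLOCK % stripe == 0 else "misaligned")
>     # 3 768 KiB misaligned
>     # 4 1024 KiB aligned
>
> With 4+1P, a full-block write can be a full-stripe write and skip the
> read-modify-write penalty Marc described.)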
>
> I’ve searched the mailing list archives and scanned the DeveloperWorks
> wiki and even glanced at the GPFS documentation and haven’t found
> anything that says “bad idea, Kevin”… ;-)
>
> Expanding on this further … if we just present those two RAID 5 LUNs to
> GPFS as NSDs, then only two of our NSD servers can be primary for them.
> So another thing we’re considering is taking those RAID 5 LUNs and
> further sub-dividing them into a total of 8 logical volumes, each of
> which could be a GPFS NSD, which would allow each of our 8 NSD servers
> to be primary for one of them. Even worse idea?!? Good idea?
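>
> (In case it helps to picture it, here is a rough sketch of the NSD
> stanzas for that sub-divided layout - the device paths, NSD names, and
> server names are all made up, with lv1-lv4 carved from the first RAID 5
> LUN and lv5-lv8 from the second:
>
>     %nsd: device=/dev/mapper/meta_lv1
>       nsd=meta_nsd1
>       servers=nsd1,nsd2,nsd3,nsd4,nsd5,nsd6,nsd7,nsd8
>       usage=metadataOnly
>       failureGroup=1
>
>     %nsd: device=/dev/mapper/meta_lv5
>       nsd=meta_nsd5
>       servers=nsd5,nsd6,nsd7,nsd8,nsd1,nsd2,nsd3,nsd4
>       usage=metadataOnly
>       failureGroup=2
>
> … and so on for the other six, rotating the server list so a different
> server is listed first (primary) for each NSD, and using two failure
> groups so the two metadata replicas land on different RAID 5 arrays.)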
>
> Anybody have any better ideas??? ;-)
>
> Oh, and currently we’re on GPFS 4.2.3-10, but are also planning on
> moving to GPFS 5.0.1-x before creating the new filesystem.
>
> Thanks much…
>
> —
> Kevin Buterbaugh - Senior System Administrator
> Vanderbilt University - Advanced Computing Center for Research and
> Education
> Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633
>
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>
--
Aaron Knister
NASA Center for Climate Simulation (Code 606.2)
Goddard Space Flight Center
(301) 286-2776