[gpfsug-discuss] RAID type for system pool

Aaron Knister aaron.s.knister at nasa.gov
Wed Sep 5 23:42:05 BST 2018


I've heard it highly recommended (and have been *really* glad at times 
to have it) to have at least 2 replicas of metadata to help maintain fs 
consistency in the event of fs issues or hardware bugs (e.g. a torn write).
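For anyone who wants to check or raise that on an existing filesystem, the
usual sequence is something like the following (untested as typed, and
"gpfs0" is just a placeholder device name):

    mmlsfs gpfs0 -m        # show the current default number of metadata replicas
    mmchfs gpfs0 -m 2      # set the default to two copies (the -M maximum must already allow it)
    mmrestripefs gpfs0 -R  # re-replicate existing metadata to match the new default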

-Aaron

On 9/5/18 1:37 PM, Frederick Stock wrote:
> Another option for saving space is to not keep two copies of the metadata 
> within GPFS.  The SSDs are mirrored, so you would still have two copies, 
> though very likely they would share a single point of failure, and that 
> could be a deal breaker.  I have my doubts that RAID 5 will perform well, 
> for the reasons Marc described, but it is worth testing to see how it does 
> perform.  If you do test, I presume you would also run equivalent tests 
> with a RAID 1 (mirrored) configuration.
> 
> Regarding your point about making multiple volumes that would become 
> GPFS NSDs for metadata: it has been my experience that for traditional 
> RAID systems it is better to have many small metadata LUNs (more IO 
> paths) than a few large metadata LUNs.  This becomes less of an issue 
> with ESS, i.e. there you can have a few metadata NSDs yet still get very 
> good performance.
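> 
> As a rough illustration (device names and servers made up, so treat this 
> as a sketch rather than a recipe), the mmcrnsd stanza file would simply 
> list each small LUN as its own metadata-only NSD and rotate the server 
> order to spread the primaries around:
> 
>     %nsd: nsd=md_nsd1 device=/dev/mapper/md_lun1 servers=nsd01,nsd02 usage=metadataOnly failureGroup=1 pool=system
>     %nsd: nsd=md_nsd2 device=/dev/mapper/md_lun2 servers=nsd02,nsd01 usage=metadataOnly failureGroup=2 pool=system
> 
>     mmcrnsd -F metadata_nsds.stanza
> 
> More stanzas of the same shape give you more IO paths.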
> 
> Fred
> __________________________________________________
> Fred Stock | IBM Pittsburgh Lab | 720-430-8821
> stockf at us.ibm.com
> 
> 
> 
> From: "Marc A Kaplan" <makaplan at us.ibm.com>
> To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
> Date: 09/05/2018 01:22 PM
> Subject: Re: [gpfsug-discuss] RAID type for system pool
> Sent by: gpfsug-discuss-bounces at spectrumscale.org
> ------------------------------------------------------------------------
> 
> 
> 
> It's good to try to reason and think this out... But there's a good 
> likelihood that we don't understand ALL the details, some of which may 
> negatively impact performance - so no matter what scheme you come up 
> with - test, test, and re-test before deploying and depending on it in 
> production.
> 
> Having said that, I'm pretty sure that old "spinning" RAID 5 
> implementations had horrible performance for the GPFS metadata/system pool.
> Why? Among other things, the large stripe size versus the almost random 
> small writes directed at the system pool.
> 
> That random-small-writes pattern won't change when we go to SSD RAID 5, 
> so you'd have to see if the SSD implementation is somehow smarter than 
> an old-fashioned RAID 5 implementation, which I believe requires several 
> physical reads and writes for each "small" logical write.
> (Top decent Google result I found quickly: 
> http://rickardnobel.se/raid-5-write-penalty/ - but you will probably want 
> to do more research!)
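> 
> (Back-of-the-envelope only, with made-up numbers, but it shows the shape 
> of the problem:
> 
>     per_ssd_write_iops=50000   # assumed random-write IOPS of one SSD
>     drives=4                   # a 3+1P RAID 5 group
>     penalty=4                  # read old data + read old parity, write new data + new parity
>     echo $(( per_ssd_write_iops * drives / penalty ))   # 50000 effective small writes/sec
> 
> whereas the same four drives as two RAID 1 mirrors only pay a penalty of 
> 2, so roughly twice the effective small-write rate.)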
> 
> Consider GPFS small write performance for:  inode updates, log writes, 
> small files (possibly in inode), directory updates, allocation map 
> updates, index of indirect blocks.
> 
> 
> 
> From: "Buterbaugh, Kevin L" <Kevin.Buterbaugh at Vanderbilt.Edu>
> To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
> Date: 09/05/2018 11:36 AM
> Subject: [gpfsug-discuss] RAID type for system pool
> Sent by: gpfsug-discuss-bounces at spectrumscale.org
> ------------------------------------------------------------------------
> 
> 
> 
> Hi All,
> 
> We are in the process of finalizing the purchase of some new storage 
> arrays (so no sales people who might be monitoring this list need 
> contact me) to life-cycle some older hardware.  One of the things we are 
> considering is the purchase of some new SSDs for our “/home” filesystem, 
> and I have a question or two related to that.
> 
> Currently, the existing home filesystem has its metadata on SSDs … two 
> RAID 1 mirrors and metadata replication set to two.  However, the 
> filesystem itself is old enough that it uses 512 byte inodes.  We have 
> analyzed our users’ files and know that if we create a new filesystem 
> with 4K inodes, a very significant portion of the files would have 
> their _data_ stored in the inode as well, due to those files being 
> 3.5K or smaller (currently all data is on spinning HD RAID 1 mirrors).
> 
> Of course, if we increase the size of the inodes by a factor of 8, then 
> we also need 8 times as much space to store those inodes.  Given that 
> enterprise-class SSDs are still very expensive and our budget is not 
> unlimited, we’re trying to get the best bang for the buck.
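> 
> (For what it’s worth, the plan would be to create the new filesystem with 
> something like “mmcrfs ... -i 4096”, and the space math is simple enough 
> to sanity check with made-up numbers:
> 
>     inodes=400000000                          # say, 400 million allocated inodes
>     echo $(( inodes * 512 / 1024**3 )) GiB    # ~190 GiB of inode space at 512 bytes
>     echo $(( inodes * 4096 / 1024**3 )) GiB   # ~1525 GiB at 4K … the 8x factor
> 
> though every small file whose data then lives in its inode gives space 
> back on the spinning-disk data pool.)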
> 
> We have always - even back in the day when our metadata was on spinning 
> disk and not SSD - used RAID 1 mirrors and metadata replication of two.
> However, we are wondering if it might be possible to switch to RAID 5?
> Specifically, what we are considering doing is buying 8 new SSDs and 
> creating two 3+1P RAID 5 LUNs (metadata replication would stay at two).
> That would give us 50% more usable space than if we configured those 
> same 8 drives as four RAID 1 mirrors.
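> 
> (Sanity-checking that with 8 identical SSDs, sizes made up:
> 
>     cap_gb=3840
>     echo $(( 2 * 3 * cap_gb ))   # two 3+1P RAID 5 LUNs -> 23040 GB usable
>     echo $(( 4 * 1 * cap_gb ))   # four RAID 1 mirrors  -> 15360 GB usable
> 
> which is where the 50% comes from, before GPFS-level metadata replication.)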
> 
> Unfortunately, unless I’m misunderstanding something, that would mean that 
> the RAID stripe size and the GPFS block size could not match.  Therefore, 
> even though we don’t need the space, would we be much better off to buy 10 
> SSDs and create two 4+1P RAID 5 LUNs?
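> 
> (What I mean is something like the following, where the segment size is 
> just an example of what the array might let us pick:
> 
>     segment_kib=256
>     echo $(( segment_kib * 3 )) KiB   # 3+1P full stripe = 768 KiB; no power-of-two GPFS block size lines up
>     echo $(( segment_kib * 4 )) KiB   # 4+1P full stripe = 1024 KiB; matches a 1 MiB block size exactly
> 
> and that mismatch in the 3+1P case is what worries me.)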
> 
> I’ve searched the mailing list archives and scanned the DeveloperWorks 
> wiki and even glanced at the GPFS documentation and haven’t found 
> anything that says “bad idea, Kevin”… ;-)
> 
> Expanding on this further … if we just present those two RAID 5 LUNs to 
> GPFS as NSDs, then we can only have two NSD servers as primary for them.
> So another thing we’re considering is to take those RAID 5 LUNs and 
> further sub-divide them into a total of 8 logical volumes, each of which 
> could be a GPFS NSD and therefore would allow us to have each of our 8 
> NSD servers be primary for one of them.  Even worse idea?!?  Good idea?
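> 
> Roughly what I have in mind (names made up, and untested, so please poke 
> holes in it) is to carve each RAID 5 LUN into four LVM logical volumes 
> and hand each one to a different primary NSD server:
> 
>     pvcreate /dev/mapper/raid5_lun1
>     vgcreate md_vg1 /dev/mapper/raid5_lun1
>     lvcreate -l 25%VG -n md_lv1 md_vg1    # repeat for md_lv2 .. md_lv4, then the same for the second LUN
> 
>     # one stanza per LV, rotating the servers= list so each of the 8 NSD
>     # servers ends up primary for exactly one metadata NSD:
>     %nsd: nsd=md01 device=/dev/md_vg1/md_lv1 servers=nsd01,nsd02,nsd03 usage=metadataOnly failureGroup=1 pool=system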
> 
> Anybody have any better ideas???  ;-)
> 
> Oh, and currently we’re on GPFS 4.2.3-10, but we are also planning on 
> moving to GPFS 5.0.1-x before creating the new filesystem.
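> 
> (The rough order of operations I have in mind, once every node is on 
> 5.0.1 … untested, so corrections welcome:
> 
>     mmlsconfig minReleaseLevel   # confirm what the cluster currently reports
>     mmchconfig release=LATEST    # raise the cluster release level
>     mmcrfs ...                   # then create the new filesystem so it gets the 5.0.x on-disk format
> 
> rather than creating it first and carrying the 4.2 format forward.)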
> 
> Thanks much…
> 
> Kevin Buterbaugh - Senior System Administrator
> Vanderbilt University - Advanced Computing Center for Research and 
> Education
> Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633
> 
> 
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
> 

-- 
Aaron Knister
NASA Center for Climate Simulation (Code 606.2)
Goddard Space Flight Center
(301) 286-2776


