[gpfsug-discuss] Fwd: Blocksize

Buterbaugh, Kevin L Kevin.Buterbaugh at Vanderbilt.Edu
Thu Sep 29 16:03:08 BST 2016


Resending from the right e-mail address...

Begin forwarded message:

From: gpfsug-discuss-owner at spectrumscale.org
Subject: Re: [gpfsug-discuss] Blocksize
Date: September 29, 2016 at 10:00:36 AM CDT
To: klb at accre.vanderbilt.edu

You are not allowed to post to this mailing list, and your message has
been automatically rejected.  If you think that your messages are
being rejected in error, contact the mailing list owner at
gpfsug-discuss-owner at spectrumscale.org.


From: "Kevin L. Buterbaugh" <klb at accre.vanderbilt.edu<mailto:klb at accre.vanderbilt.edu>>
Subject: Re: [gpfsug-discuss] Blocksize
Date: September 29, 2016 at 10:00:29 AM CDT
To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>


Hi Marc and others,

I understand … I guess I did a poor job of wording my question, so I’ll try again.  The IBM recommendation for metadata block size seems to be somewhere between 256 KB and 1 MB, depending on who responds to the question.  If I were to hypothetically use a 256 KB metadata block size, does the “1/32nd of a block” subblock size come into play like it does for non-metadata?  I.e. 256 KB / 32 = 8 KB, so am I reading / writing a minimum of *2* inodes (assuming a 4 KB inode size)?
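(Just to make sure we’re all doing the same arithmetic, here’s a hypothetical sketch assuming the 1/32nd subblock rule applied to metadata:

echo $(( 262144 / 32 ))    # 256 KB block / 32 = 8192, i.e. an 8 KB subblock
# 8 KB subblock / 4 KB inode = 2 inodes per subblock, *if* that granularity applied

You can see the actual block and minimum fragment sizes for an existing filesystem with mmlsfs <device> -B -f.)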

And here’s a really off the wall question … yesterday we were discussing the fact that there is now a single inode file.  Historically, we have always used RAID 1 mirrors for metadata (first on spinning disk and, as of last fall, on SSD) and then used GPFS replication on top of that.  But given that there is a single inode file, is that “old way” of doing things still the right way?  In other words, could we potentially be better off using a couple of 8+2P RAID 6 LUNs?

One potential downside of that would be that we would then have only two NSD servers serving up metadata, so we discussed the idea of taking each RAID 6 LUN, splitting it up into multiple logical volumes (all done on the storage array, of course), and presenting those to GPFS as NSDs.
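To make that concrete, here is a purely hypothetical sketch of the mmcrnsd stanza file I have in mind (device names, server names, and failure groups all made up):

# each RAID 6 LUN split into logical volumes on the array,
# each presented to GPFS as a metadata-only NSD
%nsd: nsd=meta_lun1_lv1 device=/dev/mapper/lun1_lv1 servers=nsda,nsdb usage=metadataOnly failureGroup=1 pool=system
%nsd: nsd=meta_lun1_lv2 device=/dev/mapper/lun1_lv2 servers=nsda,nsdb usage=metadataOnly failureGroup=1 pool=system
%nsd: nsd=meta_lun2_lv1 device=/dev/mapper/lun2_lv1 servers=nsdb,nsda usage=metadataOnly failureGroup=2 pool=system
%nsd: nsd=meta_lun2_lv2 device=/dev/mapper/lun2_lv2 servers=nsdb,nsda usage=metadataOnly failureGroup=2 pool=system

The two LUNs would stay in distinct failure groups so GPFS metadata replication still has somewhere independent to go.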

Or have I gone from merely asking stupid questions to Trump-level craziness?  ;-)

Kevin

On Sep 28, 2016, at 10:23 AM, Marc A Kaplan <makaplan at us.ibm.com> wrote:

OKAY, I'll say it again: inodes are PACKED into a single inode file.  So a 4 KB inode takes 4 KB, REGARDLESS of metadata blocksize.  There is no wasted space.

(Of course, if you have metadata replication = 2, then yes, double that.  And yes, there is overhead for indirect blocks (indices), allocation maps, etc.)
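(Back-of-the-envelope, with made-up numbers: 10 million allocated inodes at 4 KB each is

echo $(( 10000000 * 4096 / 1073741824 ))    # ~38 GB of inode file

whether your metadata blocksize is 256 KB or 1 MB; with replication = 2, call it ~76 GB.)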

And your choice is not just 512 or 4096.  Maybe 1 KB or 2 KB is a good choice for your data distribution, to optimize packing of data and/or directories into inodes...

Hmmm... I don't know why the doc leaves out 2048, perhaps a typo...

mmcrfs x2K -i 2048

[root at n2 charts]# mmlsfs x2K -i
flag                value                    description
------------------- ------------------------ -----------------------------------
 -i                 2048                     Inode size in bytes

Works for me!
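(And if you want to see the 1/32nd subblock for yourself, mmlsfs reports it as the minimum fragment size.  Hypothetical output for a filesystem with a 256 KB blocksize:

mmlsfs somefs -f
 -f                 8192                     Minimum fragment size in bytes

But that granularity applies to directory blocks, indirect blocks, and the like, not to the packed inode file.)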
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss



