[gpfsug-discuss] Sub-block size wrong on GPFS 5 filesystem?

Marc A Kaplan makaplan at us.ibm.com
Wed Aug 1 19:47:31 BST 2018


I guess that particular table is not the whole truth, nor a specification, 
nor a promise, but a simplified summary of what you get when there is just 
one block size that applies to both meta-data and data-data. 

You have discovered that it does not apply to systems where metadata has a 
different blocksize than data-data. 

My guesstimate (speculation!) is that the deployed code chooses one 
subblocks-per-full-block parameter and applies it to both, which would 
explain the results we're seeing.  Further, it seems that the mmlsfs command 
assumes, at least in some places, that there is only one subblocks-per-block 
parameter...
Looking deeper into the code is another story for another day -- but I'll say 
that there seems to be sufficient flexibility that, if this were deemed a 
burning issue, there could be further "enhancements..."  ;-)
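
To make that speculation concrete (back-of-the-envelope arithmetic only, not 
a statement about what the code actually does): if the subblocks-per-full-block 
count is fixed once by the smaller 1 MiB metadata block size and then reused 
for the 4 MiB data pools, Kevin's numbers fall out exactly:

# hypothetical check of the single-parameter theory, plain bash arithmetic
echo $(( 1024 * 1024 / 8192 ))       # 1 MiB block / 8 KiB subblock = 128 subblocks per full block
echo $(( 4 * 1024 * 1024 / 128 ))    # 4 MiB block / 128 subblocks  = 32768 bytes = 32 KiB

That matches both the -f and --subblocks-per-full-block values in the mmlsfs 
output quoted below.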




From:   "Buterbaugh, Kevin L" <Kevin.Buterbaugh at Vanderbilt.Edu>
To:     gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Date:   08/01/2018 02:24 PM
Subject:        Re: [gpfsug-discuss] Sub-block size wrong on GPFS 5 
filesystem?
Sent by:        gpfsug-discuss-bounces at spectrumscale.org



Hi Marc, 

Thanks for the response … I understand what you’re saying, but since I’m 
asking for a 1 MB block size for metadata and a 4 MB block size for data, 
and according to the chart in the mmcrfs man page both should result in an 
8 KB sub-block size, I’m still confused as to why I’ve got a 32 KB sub-block 
size for my non-system (i.e. data) pools.  Especially when you consider 
that 32 KB isn’t even the default if I had chosen an 8 or 16 MB block 
size!

Kevin

Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and 
Education
Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633

On Aug 1, 2018, at 12:21 PM, Marc A Kaplan <makaplan at us.ibm.com> wrote:

I haven't looked into all the details but here's a clue -- notice there is 
only one "subblocks-per-full-block" parameter.  

And it is the same for both metadata blocks and data-data blocks.

So maybe (MAYBE) that is a constraint somewhere...

Certainly, in the currently supported code, that's what you get.

From:        "Buterbaugh, Kevin L" <Kevin.Buterbaugh at Vanderbilt.Edu>
To:        gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Date:        08/01/2018 12:55 PM
Subject:        [gpfsug-discuss] Sub-block size wrong on GPFS 5 
filesystem?
Sent by:        gpfsug-discuss-bounces at spectrumscale.org



Hi All, 

Our production cluster is still on GPFS 4.2.3.x, but in preparation for 
moving to GPFS 5 I have upgraded our small (7 node) test cluster to GPFS 
5.0.1-1.  I am setting up a new filesystem there using hardware that we 
recently life-cycled out of our production environment.

I “successfully” created a filesystem but I believe the sub-block size is 
wrong.  I’m using a 4 MB filesystem block size, so according to the mmcrfs 
man page the sub-block size should be 8K:

         Table 1. Block sizes and subblock sizes

+-------------------------------+-------------------------------+
| Block size                    | Subblock size                 |
+-------------------------------+-------------------------------+
| 64 KiB                        | 2 KiB                         |
+-------------------------------+-------------------------------+
| 128 KiB                       | 4 KiB                         |
+-------------------------------+-------------------------------+
| 256 KiB, 512 KiB, 1 MiB,      | 8 KiB                         |
| 2 MiB, 4 MiB                  |                               |
+-------------------------------+-------------------------------+
| 8 MiB, 16 MiB                 | 16 KiB                        |
+-------------------------------+-------------------------------+

However, it appears that it’s 8K for the system pool but 32K for the other 
pools:

flag                value                    description
------------------- ------------------------ -----------------------------------
 -f                 8192                     Minimum fragment (subblock) size in bytes (system pool)
                    32768                    Minimum fragment (subblock) size in bytes (other pools)
 -i                 4096                     Inode size in bytes
 -I                 32768                    Indirect block size in bytes
 -m                 2                        Default number of metadata replicas
 -M                 3                        Maximum number of metadata replicas
 -r                 1                        Default number of data replicas
 -R                 3                        Maximum number of data replicas
 -j                 scatter                  Block allocation type
 -D                 nfs4                     File locking semantics in effect
 -k                 all                      ACL semantics in effect
 -n                 32                       Estimated number of nodes that will mount file system
 -B                 1048576                  Block size (system pool)
                    4194304                  Block size (other pools)
 -Q                 user;group;fileset       Quotas accounting enabled
                    user;group;fileset       Quotas enforced
                    none                     Default quotas enabled
 --perfileset-quota No                       Per-fileset quota enforcement
 --filesetdf        No                       Fileset df enabled?
 -V                 19.01 (5.0.1.0)          File system version
 --create-time      Wed Aug  1 11:39:39 2018 File system creation time
 -z                 No                       Is DMAPI enabled?
 -L                 33554432                 Logfile size
 -E                 Yes                      Exact mtime mount option
 -S                 relatime                 Suppress atime mount option
 -K                 whenpossible             Strict replica allocation option
 --fastea           Yes                      Fast external attributes enabled?
 --encryption       No                       Encryption enabled?
 --inode-limit      101095424                Maximum number of inodes
 --log-replicas     0                        Number of log replicas
 --is4KAligned      Yes                      is4KAligned?
 --rapid-repair     Yes                      rapidRepair enabled?
 --write-cache-threshold 0                   HAWC Threshold (max 65536)
 --subblocks-per-full-block 128              Number of subblocks per full block
 -P                 system;raid1;raid6       Disk storage pools in file system
 --file-audit-log   No                       File Audit Logging enabled?
 --maintenance-mode No                       Maintenance Mode enabled?
 -d                 test21A3nsd;test21A4nsd;test21B3nsd;test21B4nsd;test23Ansd;test23Bnsd;test23Cnsd;test24Ansd;test24Bnsd;test24Cnsd;test25Ansd;test25Bnsd;test25Cnsd  Disks in file system
 -A                 yes                      Automatic mount option
 -o                 none                     Additional mount options
 -T                 /gpfs5                   Default mount point
 --mount-priority   0                        Mount priority
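
(For anyone reproducing this: the full listing isn't needed. mmlsfs accepts 
individual attribute flags, so querying just the block size and fragment size 
-- device name gpfs5 as above, and, if your level accepts it, 
--subblocks-per-full-block as well -- is enough to see the mismatch:)

mmlsfs gpfs5 -B -f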

Output of mmcrfs:

mmcrfs gpfs5 -F ~/gpfs/gpfs5.stanza -A yes -B 4M -E yes -i 4096 -j scatter -k all -K whenpossible -m 2 -M 3 -n 32 -Q yes -r 1 -R 3 -T /gpfs5 -v yes --nofilesetdf --metadata-block-size 1M

The following disks of gpfs5 will be formatted on node testnsd3:
    test21A3nsd: size 953609 MB
    test21A4nsd: size 953609 MB
    test21B3nsd: size 953609 MB
    test21B4nsd: size 953609 MB
    test23Ansd: size 15259744 MB
    test23Bnsd: size 15259744 MB
    test23Cnsd: size 1907468 MB
    test24Ansd: size 15259744 MB
    test24Bnsd: size 15259744 MB
    test24Cnsd: size 1907468 MB
    test25Ansd: size 15259744 MB
    test25Bnsd: size 15259744 MB
    test25Cnsd: size 1907468 MB
Formatting file system ...
Disks up to size 8.29 TB can be added to storage pool system.
Disks up to size 16.60 TB can be added to storage pool raid1.
Disks up to size 132.62 TB can be added to storage pool raid6.
Creating Inode File
   8 % complete on Wed Aug  1 11:39:19 2018
  18 % complete on Wed Aug  1 11:39:24 2018
  27 % complete on Wed Aug  1 11:39:29 2018
  37 % complete on Wed Aug  1 11:39:34 2018
  48 % complete on Wed Aug  1 11:39:39 2018
  60 % complete on Wed Aug  1 11:39:44 2018
  72 % complete on Wed Aug  1 11:39:49 2018
  83 % complete on Wed Aug  1 11:39:54 2018
  95 % complete on Wed Aug  1 11:39:59 2018
 100 % complete on Wed Aug  1 11:40:01 2018
Creating Allocation Maps
Creating Log Files
   3 % complete on Wed Aug  1 11:40:07 2018
  28 % complete on Wed Aug  1 11:40:14 2018
  53 % complete on Wed Aug  1 11:40:19 2018
  78 % complete on Wed Aug  1 11:40:24 2018
 100 % complete on Wed Aug  1 11:40:25 2018
Clearing Inode Allocation Map
Clearing Block Allocation Map
Formatting Allocation Map for storage pool system
  85 % complete on Wed Aug  1 11:40:32 2018
 100 % complete on Wed Aug  1 11:40:33 2018
Formatting Allocation Map for storage pool raid1
  53 % complete on Wed Aug  1 11:40:38 2018
 100 % complete on Wed Aug  1 11:40:42 2018
Formatting Allocation Map for storage pool raid6
  20 % complete on Wed Aug  1 11:40:47 2018
  39 % complete on Wed Aug  1 11:40:52 2018
  60 % complete on Wed Aug  1 11:40:57 2018
  79 % complete on Wed Aug  1 11:41:02 2018
 100 % complete on Wed Aug  1 11:41:08 2018
Completed creation of file system /dev/gpfs5.
mmcrfs: Propagating the cluster configuration data to all
  affected nodes.  This is an asynchronous process.

And contents of stanza file:

%nsd:
  nsd=test21A3nsd
  usage=metadataOnly
  failureGroup=210
  pool=system
  servers=testnsd3,testnsd1,testnsd2
  device=dm-15

%nsd:
  nsd=test21A4nsd
  usage=metadataOnly
  failureGroup=210
  pool=system
  servers=testnsd1,testnsd2,testnsd3
  device=dm-14

%nsd:
  nsd=test21B3nsd
  usage=metadataOnly
  failureGroup=211
  pool=system
  servers=testnsd1,testnsd2,testnsd3
  device=dm-17

%nsd:
  nsd=test21B4nsd
  usage=metadataOnly
  failureGroup=211
  pool=system
  servers=testnsd2,testnsd3,testnsd1
  device=dm-16

%nsd:
  nsd=test23Ansd
  usage=dataOnly
  failureGroup=23
  pool=raid6
  servers=testnsd2,testnsd3,testnsd1
  device=dm-10

%nsd:
  nsd=test23Bnsd
  usage=dataOnly
  failureGroup=23
  pool=raid6
  servers=testnsd3,testnsd1,testnsd2
  device=dm-9

%nsd:
  nsd=test23Cnsd
  usage=dataOnly
  failureGroup=23
  pool=raid1
  servers=testnsd1,testnsd2,testnsd3
  device=dm-5

%nsd:
  nsd=test24Ansd
  usage=dataOnly
  failureGroup=24
  pool=raid6
  servers=testnsd3,testnsd1,testnsd2
  device=dm-6

%nsd:
  nsd=test24Bnsd
  usage=dataOnly
  failureGroup=24
  pool=raid6
  servers=testnsd1,testnsd2,testnsd3
  device=dm-0

%nsd:
  nsd=test24Cnsd
  usage=dataOnly
  failureGroup=24
  pool=raid1
  servers=testnsd2,testnsd3,testnsd1
  device=dm-2

%nsd:
  nsd=test25Ansd
  usage=dataOnly
  failureGroup=25
  pool=raid6
  servers=testnsd1,testnsd2,testnsd3
  device=dm-6

%nsd:
  nsd=test25Bnsd
  usage=dataOnly
  failureGroup=25
  pool=raid6
  servers=testnsd2,testnsd3,testnsd1
  device=dm-6

%nsd:
  nsd=test25Cnsd
  usage=dataOnly
  failureGroup=25
  pool=raid1
  servers=testnsd3,testnsd1,testnsd2
  device=dm-3

%pool:
  pool=system
  blockSize=1M
  usage=metadataOnly
  layoutMap=scatter
  allowWriteAffinity=no

%pool:
  pool=raid6
  blockSize=4M
  usage=dataOnly
  layoutMap=scatter
  allowWriteAffinity=no

%pool:
  pool=raid1
  blockSize=4M
  usage=dataOnly
  layoutMap=scatter
  allowWriteAffinity=no
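
(Aside for anyone following along: per the man-page table quoted earlier, a 
single block size for metadata and data should give an 8 KiB subblock in every 
pool -- for example, raising the system pool to 4M and passing a matching 
--metadata-block-size, or omitting that option entirely. Whether a 4 MiB 
metadata block size is sensible for these NSDs is a separate sizing question; 
this is only a sketch of the stanza change involved:)

%pool:
  pool=system
  blockSize=4M
  usage=metadataOnly
  layoutMap=scatter
  allowWriteAffinity=no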

What am I missing or what have I done wrong?  Thanks…

Kevin
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and 
Education
Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633


_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss




