[gpfsug-discuss] Question about changing inode capacity safely

Jared David Baker Jared.Baker at uwyo.edu
Fri Jan 2 18:37:19 GMT 2015


Hello GPFS admins! I hope everybody has had a great start to the new year.

Lately, I've had a few of my users get an error similar to:

      error creating file: no space left on device.


when trying to create even simple files (using the Linux `touch` command). However, if they try again a second or two later, the file is created without a problem and they go on with their work. I can never predict when they will hit the 'no space left on device' message. The file system creates many files in parallel (depending on the usage of the system and the movement of files from other sites).
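For what it's worth, below is a minimal sketch of the loop I've been using to try to catch the error in the act (the directory path is just a placeholder, not a real path on our system):

--
#!/bin/bash
# Repeatedly create small files and log any transient failures.
DIR=/project/ModMast/inode_test      # placeholder path on the project file system
LOG=/tmp/touch_errors.log            # log somewhere off the affected file system
mkdir -p "$DIR"
for i in $(seq 1 1000); do
    if ! touch "$DIR/testfile.$i" 2>>"$LOG"; then
        echo "$(date): touch failed on iteration $i" >> "$LOG"
    fi
    sleep 0.1
done
--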

However, let me first describe our environment a little better. We have three GPFS file systems (home, project, gscratch) on a RHEL 6.3 InfiniBand HPC cluster. The GPFS version is 3.5.0-11. We use fileset quotas (on block limits, not file limits) for each file system. Each user has a home fileset for storing basic configuration files, notes, and other small files. Each user belongs to at least one project, and the quota is shared among the users of the project. The gscratch file system is similar to the project file system, except that files are deleted after ~9 days.

The partially good news (perhaps) is that the error mentioned above occurs only on the project file system; at least, we have not observed it on the home or gscratch file systems. Here's my investigation so far:


1.) Checked the fileset quota on one of the affected filesets:

--
# mmlsquota -j ModMast project
                         Block Limits                                    |     File Limits
Filesystem type             KB      quota      limit   in_doubt    grace |    files   quota    limit in_doubt    grace  Remarks
project    FILESET   953382016          0 16106127360          0     none |  8666828       0        0        0     none
--

From this output it would seem that the fileset is indeed well under its block quota.
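I haven't yet checked the fileset itself, but I plan to make sure there isn't a limit defined at the fileset level rather than the file system level; something like the following, assuming I'm reading the mmlsfileset man page right (-L should show the extended fileset attributes):

--
# Show extended attributes for the affected fileset,
# including whether it has its own inode space/limits.
mmlsfileset project ModMast -L
--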


2.) Then I checked the overall file system to see whether the capacity or inode count is nearly full:

--
# mmdf project
disk                disk size  failure holds    holds              free KB             free KB
name                    in KB    group metadata data        in full blocks        in fragments
--------------- ------------- -------- -------- ----- -------------------- -------------------
Disks in storage pool: system (Maximum disk size allowed is 397 TB)
U01_L0            15623913472       -1 Yes      Yes      7404335104 ( 47%)     667820032 ( 4%)
U01_L1            15623913472       -1 Yes      Yes      7498215424 ( 48%)     642773120 ( 4%)
U01_L2            15623913472       -1 Yes      Yes      7497969664 ( 48%)     642664576 ( 4%)
U01_L3            15623913472       -1 Yes      Yes      7496232960 ( 48%)     644327936 ( 4%)
U01_L4            15623913472       -1 Yes      Yes      7499296768 ( 48%)     640117376 ( 4%)
U01_L5            15623913472       -1 Yes      Yes      7494881280 ( 48%)     644168320 ( 4%)
U01_L6            15623913472       -1 Yes      Yes      7494164480 ( 48%)     643673216 ( 4%)
U01_L7            15623913472       -1 Yes      Yes      7497433088 ( 48%)     639918976 ( 4%)
U01_L8            15623913472       -1 Yes      Yes      7494139904 ( 48%)     645130240 ( 4%)
U01_L9            15623913472       -1 Yes      Yes      7498375168 ( 48%)     639979520 ( 4%)
U01_L10           15623913472       -1 Yes      Yes      7496028160 ( 48%)     641909632 ( 4%)
U01_L11           15623913472       -1 Yes      Yes      7496093696 ( 48%)     643749504 ( 4%)
U01_L12           15623913472       -1 Yes      Yes      7496425472 ( 48%)     641556992 ( 4%)
U01_L13           15623913472       -1 Yes      Yes      7495516160 ( 48%)     643395840 ( 4%)
U01_L14           15623913472       -1 Yes      Yes      7496908800 ( 48%)     642418816 ( 4%)
U01_L15           15623913472       -1 Yes      Yes      7495823360 ( 48%)     643580416 ( 4%)
U01_L16           15623913472       -1 Yes      Yes      7499939840 ( 48%)     641538688 ( 4%)
U01_L17           15623913472       -1 Yes      Yes      7497355264 ( 48%)     642184704 ( 4%)
U13_L0             2339553280       -1 Yes      No       2322395136 ( 99%)       8190848 ( 0%)
U13_L1             2339553280       -1 Yes      No       2322411520 ( 99%)       8189312 ( 0%)
U13_L12           15623921664       -1 Yes      Yes      7799422976 ( 50%)     335150208 ( 2%)
U13_L13           15623921664       -1 Yes      Yes      8002662400 ( 51%)     126059264 ( 1%)
U13_L14           15623921664       -1 Yes      Yes      8001093632 ( 51%)     126107648 ( 1%)
U13_L15           15623921664       -1 Yes      Yes      8001732608 ( 51%)     126167168 ( 1%)
U13_L16           15623921664       -1 Yes      Yes      8000077824 ( 51%)     126240768 ( 1%)
U13_L17           15623921664       -1 Yes      Yes      8001458176 ( 51%)     126068480 ( 1%)
U13_L18           15623921664       -1 Yes      Yes      7998636032 ( 51%)     127111680 ( 1%)
U13_L19           15623921664       -1 Yes      Yes      8001892352 ( 51%)     125148928 ( 1%)
U13_L20           15623921664       -1 Yes      Yes      8001916928 ( 51%)     126187904 ( 1%)
U13_L21           15623921664       -1 Yes      Yes      8002568192 ( 51%)     126591616 ( 1%)
                -------------                         -------------------- -------------------
(pool total)     442148765696                          219305402368 ( 50%)   13078121728 ( 3%)

                =============                         ==================== ===================
(data)           437469659136                          214660595712 ( 49%)   13061741568 ( 3%)
(metadata)       442148765696                          219305402368 ( 50%)   13078121728 ( 3%)
                =============                         ==================== ===================
(total)          442148765696                          219305402368 ( 50%)   13078121728 ( 3%)

Inode Information
-----------------
Number of used inodes:       133031523
Number of free inodes:         1186205
Number of allocated inodes:  134217728
Maximum number of inodes:    134217728
--

Eureka! From this it seems that the inode count is teetering on its limit. I think at this point it would also be best to educate our users not to write millions of small text files, as I don't believe it is possible to lower the GPFS block size after the fact (the block size is currently 4 MB). The system was originally targeted at large reads/writes from traditional HPC users, but we have since diversified our user base to include computing areas outside traditional HPC. The documentation states that if parallel writes are to be done, a minimum of 5% of the inodes needs to be free, otherwise performance will suffer. From the output above, we have less than 1% free, which I think is the root of our problem.
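Before doing anything, I want to confirm the block size and inode size straight from the file system rather than from my notes. Assuming I'm reading the mmlsfs man page right, -B and -i report the block size and inode size:

--
# Confirm block size and inode size for the project file system
mmlsfs project -B     # block size
mmlsfs project -i     # inode size in bytes
--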

Therefore, is there a method to safely increase the maximum inode count, and can it be done while the file system is mounted and in use, or should it be unmounted first? I've read the man pages and searched online and found hints suggesting the command below, but I was curious about its safety on a live file system:

      mmchfs project --inode-limit <new_max_inode_count>
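Concretely, I was thinking of stepping the limit up in a modest increment rather than jumping straight to any theoretical ceiling; a rough sketch of what I have in mind (the number is just an example):

--
# Raise the maximum number of inodes (example value only)
mmchfs project --inode-limit 200000000

# Check the new maximum and allocation afterwards
mmdf project | grep -i inode
--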

The man page describes that the limit is:

      max_files = total_filesystem_space / (inode_size + subblock_size)

and the subblock size is defined on IBM's website as 1/32 of the block size. Since our block size is 4 MB, the subblock size is 128 KB. Therefore, I calculate that the maximum number of inodes I could potentially have is:

      3440846425

Which is approximately 25x the current maximum, so I think there is reason to believe I can increase the inode count without too much worry (see the quick arithmetic below). Are there any caveats to my logic here? I'm not saying I'll increase it to the maximum value right away, because the inode space would take away some usable capacity from the system.
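For reference, the arithmetic behind that number, assuming the default 512-byte inode size (which I still need to confirm with mmlsfs):

--
# Back-of-the-envelope maximum inode calculation
total_kb=442148765696                     # pool total from mmdf, in KB
inode_size=512                            # bytes (assumed default)
subblock_size=$((4 * 1024 * 1024 / 32))   # 1/32 of the 4 MB block size = 131072 bytes
echo $(( total_kb * 1024 / (inode_size + subblock_size) ))
# prints 3440846425
--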

Thanks for any comments and recommendations. I have a sizable maintenance period coming up due to datacenter power upgrades, and I'll be given ~2 weeks of downtime. I'm trying to get all my ducks in a row, so if I need to do something time consuming with the file systems, I'd like to know ahead of time and do it during the maintenance window, as I probably won't get another one for many months afterwards.

Again, thank you all!

Jared Baker
ARCC


