[gpfsug-discuss] Inode scan optimization

Marc A Kaplan makaplan at us.ibm.com
Thu Feb 8 13:56:42 GMT 2018


Recall that many years ago we demonstrated a billion files scanned with 
mmapplypolicy in under 20 minutes...
And that was on ordinary (for the time) spinning disks, not SSDs... Granted, 
we packed about 1000 files per directory and made some other choices that 
might not be typical usage...  OTOH, storage and nodes have improved since 
then...

SO when you say it takes 60 days to back up 2 billion files and that's a 
problem....
Like any large computing job, one has to do some analysis to find out which 
parts of the job are taking how much time...

So... what commands are you using to do the backup...?
What timing statistics or measurements have you collected?

If you are using mmbackup and/or mmapplypolicy, those commands can show 
you how much time they spend scanning the file system looking for files to 
back up AND then how much time they spend copying the data to backup media. 
In fact they operate in distinct phases... directory scan, inode scan, 
THEN data copying... so it's straightforward to see which phases are 
taking how much time.

OH... I see you also say you are using gpfs_stat_inode_with_xattrs64 -- 
These APIs are tricky and not a panacea.... That's why we provide you with 
mmapplypolicy, which in fact uses those APIs in clever, patented ways -- 
optimized and honed over years of work....
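
For reference, a bare-bones scan with those APIs looks roughly like the 
sketch below -- error handling is trimmed, the prototypes should be verified 
against the gpfs.h shipped with your release, and the mount point is just a 
placeholder; build with something like "cc iscan.c -lgpfs".

#include <stdio.h>
#include <stdlib.h>
#include <gpfs.h>

int main(int argc, char **argv)
{
    /* placeholder mount point; pass your own file system path instead */
    const char *fsPath = (argc > 1) ? argv[1] : "/gpfs/fs0";
    gpfs_fssnap_handle_t *fsp;
    gpfs_iscan_t *iscan;
    const gpfs_iattr64_t *iattr;
    gpfs_ino64_t maxIno = 0;
    int rc;

    fsp = gpfs_get_fssnaphandle_by_path(fsPath);
    if (fsp == NULL) { perror("gpfs_get_fssnaphandle_by_path"); return 1; }

    /* NULL previous-snapshot handle: scan all inodes, not just changed ones */
    iscan = gpfs_open_inodescan64(fsp, NULL, &maxIno);
    if (iscan == NULL) { perror("gpfs_open_inodescan64"); return 1; }

    /* termIno bounds the scan; passing the maxIno returned above walks
     * (essentially) the whole inode space.  End of scan is signaled by
     * rc == 0 with a NULL iattr pointer. */
    while ((rc = gpfs_next_inode64(iscan, maxIno, &iattr)) == 0 && iattr != NULL) {
        printf("inode %llu size %lld\n",
               (unsigned long long)iattr->ia_inode,
               (long long)iattr->ia_size);
    }
    if (rc != 0)
        perror("gpfs_next_inode64");

    gpfs_close_inodescan(iscan);
    gpfs_free_fssnaphandle(fsp);
    return 0;
}

mmapplypolicy does this (and much more) in parallel across nodes, which is 
exactly the hard part to get right by hand.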

And more recently, we provided you with samples/ilm/mmfind -- which has 
the functionality of the classic Unix find command -- but runs in parallel, 
using mmapplypolicy.
TRY IT on your file system!



From:   "Tomasz.Wolski at ts.fujitsu.com" <Tomasz.Wolski at ts.fujitsu.com>
To:     "gpfsug-discuss at spectrumscale.org" 
<gpfsug-discuss at spectrumscale.org>
Date:   02/08/2018 05:50 AM
Subject:        [gpfsug-discuss] Inode scan optimization
Sent by:        gpfsug-discuss-bounces at spectrumscale.org



Hello All,
 
A full backup of a 2-billion-inode Spectrum Scale file system on 
V4.1.1.16 takes 60 days.
 
We are trying to optimize this, and using inode scans seems to help, even 
though we still use a directory scan and use the inode scan only to get 
better stat performance (via gpfs_stat_inode_with_xattrs64). With 20 
processes doing directory scans in parallel (plus inode scans for the stat 
info), we have reduced the time to 40 days.
All NSDs are dataAndMetadata type.
 
I have the following questions:
- Is there a way to increase the inode scan cache (we could use 32 GByte)?
  - Can we use the "hidden" config parameters:
    - iscanPrefetchAggressiveness 2
    - iscanPrefetchDepth 0
    - iscanPrefetchThreadsPerNode 0
- Is there any documentation on the inode scan cache behavior?
  - If not, is the inode scan cache process-specific or node-specific?
  - Is there a suggestion for choosing the termIno parameter of 
    gpfs_stat_inode_with_xattrs64() in such a use case? (A sketch 
    illustrating this follows below.)
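
For illustration, the per-process stat loop we have in mind looks roughly 
like the sketch below; the range partitioning, the stat_range name, and the 
choice of termIno are simplified placeholders rather than our production 
code, and the prototype should be checked against gpfs.h (build with -lgpfs).

#include <stdio.h>
#include <stddef.h>
#include <gpfs.h>

/* Sketch: stat a list of inode numbers (collected by the directory scan)
 * that all fall below termIno, the upper bound of the inode range this
 * worker process owns. */
static void stat_range(gpfs_iscan_t *iscan,
                       const gpfs_ino64_t *inodes, size_t nInodes,
                       gpfs_ino64_t termIno)
{
    const gpfs_iattr64_t *iattr;
    const char *xattrBuf;
    unsigned int xattrBufLen;
    size_t i;

    for (i = 0; i < nInodes; i++) {
        /* The open question above: how tight should termIno be so that
         * the inode scan cache / prefetch works best? */
        int rc = gpfs_stat_inode_with_xattrs64(iscan, inodes[i], termIno,
                                               &iattr, &xattrBuf, &xattrBufLen);
        if (rc != 0 || iattr == NULL) {
            fprintf(stderr, "stat of inode %llu failed\n",
                    (unsigned long long)inodes[i]);
            continue;
        }
        /* ... hand iattr (and the xattr buffer) to the backup logic ... */
    }
}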
 
Thanks! 
 
Best regards,
Tomasz Wolski




