[gpfsug-discuss] Blocksize - file size distribution

Wed Sep 28 21:18:55 BST 2016

On Wed, 28 Sep 2016 10:34:05 -0400
Marc A Kaplan <makaplan at us.ibm.com> wrote:

> Consider using samples/ilm/mmfind (or mmapplypolicy with a LIST ...
> SHOW rule) to gather the stats much faster.  Should be minutes, not
> hours.
> 

I'll agree on the policy engine. It runs like a beast if you tune it a
little for nodes and threads.

It only takes a couple of minutes to collect info on over a hundred
million files. Want to show where the data is now by pool and sort it
by age? Here's a quick hacked-up example; you could sort the output on
the front end fairly quickly. (Use fileset, pool, etc. as your storage
needs dictate.)

RULE '2yrold_files' LIST  '2yrold_filelist.txt'

SHOW (varchar(file_size) || '  ' || varchar(USER_ID) || '  ' || varchar(POOL_NAME))
WHERE DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME) >= 730 AND DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME) < 1095

Don't forget to run the engine with -I defer for this kind of
LIST/SHOW policy.
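For the "sort it on the front end" step, here is a minimal Python sketch that totals the deferred list file by pool and buckets file sizes into powers of two (handy when you are thinking about blocksize). It assumes each record has the form `<inode> <gen> <snapid> <size> <uid> <pool> -- <path>`, meaning the SHOW() fields from the rule above sit between the snapshot ID and the " -- " separator. That layout is an assumption, so check a few lines of your own output first. The function names `size_bucket`, `summarize` and `main` are hypothetical, not part of any GPFS tooling.

```python
"""Summarize a deferred mmapplypolicy LIST file by pool and size bucket.

ASSUMPTION: each record looks like
    <inode> <gen> <snapid>  <size>  <uid>  <pool> -- <path>
with the SHOW() fields between the snapshot ID and " -- ".
"""
import sys
from collections import defaultdict


def size_bucket(size: int) -> int:
    """Return the smallest power of two >= size (0 stays 0)."""
    if size <= 0:
        return 0
    return 1 << (size - 1).bit_length()


def summarize(lines):
    """Return ({pool: [files, bytes]}, {bucket: files}) for the records."""
    by_pool = defaultdict(lambda: [0, 0])
    by_bucket = defaultdict(int)
    for line in lines:
        # Split at the first " -- "; the path may itself contain " -- ".
        head, sep, _path = line.partition(" -- ")
        if not sep:
            continue  # not a list record
        fields = head.split()
        if len(fields) < 6:
            continue  # record without the expected SHOW fields
        size, _uid, pool = int(fields[3]), fields[4], fields[5]
        by_pool[pool][0] += 1
        by_pool[pool][1] += size
        by_bucket[size_bucket(size)] += 1
    return dict(by_pool), dict(by_bucket)


def main(path):
    with open(path) as fh:
        pools, buckets = summarize(fh)
    # Pools with the most bytes first.
    for pool, (files, nbytes) in sorted(pools.items(), key=lambda kv: -kv[1][1]):
        print(f"{pool:<16} {files:>12} files {nbytes:>20} bytes")
    # Size distribution, smallest bucket first.
    for bucket in sorted(buckets):
        print(f"<= {bucket:>16} bytes: {buckets[bucket]:>12} files")


if __name__ == "__main__":
    main(sys.argv[1])
```

Run it against the list file that mmapplypolicy leaves behind, e.g. `python summarize_list.py <listfile>` (the script name is hypothetical). Power-of-two buckets keep the output short even for a hundred million files. Swap in a different bucketing function if you want to line the buckets up with a specific blocksize or subblock size.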

Ed

-- 

Ed Wahl
Ohio Supercomputer Center
614-292-9302