[gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not?

Zachary Giles zgiles at gmail.com
Tue Apr 18 14:56:43 BST 2017


Kevin,
Here's a silly theory: have you tried putting a WEIGHT clause in? I wonder
if, during migration, it hits some large file that would push the pool over
the threshold and stops there. With a weight expression you could move all
the small files in first, or order by lack of heat, etc., to pack the tier
more tightly.
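For example, a rough sketch layered on the existing 'OldStuff' rule (untested,
and the WEIGHT expression is only illustrative -- higher weights are migrated
first, so this packs the coldest files in first; WEIGHT(-KB_ALLOCATED) would
do smallest-first instead):

RULE 'OldStuff'
  MIGRATE FROM POOL 'gpfs23data'
  TO POOL 'gpfs23capacity'
  LIMIT(98)
  WEIGHT(DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME))  /* order candidates, oldest access first */
  WHERE ((DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)) > 14) AND (KB_ALLOCATED > 3584)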
Just something else to try before the PMR process.
Zach



On Apr 18, 2017 9:32 AM, "Buterbaugh, Kevin L" <Kevin.Buterbaugh at vanderbilt.edu> wrote:

Hi All, but especially Marc,

I ran the mmapplypolicy again last night and, unfortunately, it again did
not fill the capacity pool like it said it would.  From the log file:

[I] Summary of Rule Applicability and File Choices:
 Rule#      Hit_Cnt          KB_Hit          Chosen       KB_Chosen       KB_Ill     Rule
     0      3632859     181380873184        1620175     61434283936           0     RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO POOL 'gpfs23capacity' LIMIT(98.000000) WHERE(.)
     1           88        99230048              88        99230048           0     RULE 'INeedThatAfterAll' MIGRATE FROM POOL 'gpfs23capacity' TO POOL 'gpfs23data' LIMIT(75.000000) WHERE(.)

[I] Filesystem objects with no applicable rules: 442962867.

[I] GPFS Policy Decisions and File Choice Totals:
 Chose to migrate 61533513984KB: 1620263 of 3632947 candidates;
Predicted Data Pool Utilization in KB and %:
Pool_Name                   KB_Occupied        KB_Total  Percent_Occupied
gpfs23capacity             122483878464    124983549952     97.999999609%
gpfs23data                 128885076416    343753326592     37.493477574%
system                                0               0      0.000000000%
(no user data)
[I] 2017-04-18@02:52:48.402 Policy execution. 0 files dispatched.

And the tail end of the log file says that it moved those files:

[I] 2017-04-18@09:06:51.124 Policy execution. 1620263 files dispatched.
[I] A total of 1620263 files have been migrated, deleted or processed by an
EXTERNAL EXEC/script;
        0 'skipped' files and/or errors.

But mmdf (and how quickly the mmapplypolicy itself ran) say otherwise:

Disks in storage pool: gpfs23capacity (Maximum disk size allowed is 519 TB)
eon35Ansd               58.2T       35 No       Yes          29.73T ( 51%)        64.16G ( 0%)
eon35Dnsd               58.2T       35 No       Yes          29.73T ( 51%)        64.61G ( 0%)
                -------------                         -------------------- -------------------
(pool total)           116.4T                                59.45T ( 51%)        128.8G ( 0%)

Ideas?  Or is it time for me to open a PMR?

Thanks…

Kevin

On Apr 17, 2017, at 4:16 PM, Buterbaugh, Kevin L <Kevin.Buterbaugh at Vanderbilt.Edu> wrote:

Hi Marc, Alex, all,

Thank you for the responses.  To answer Alex’s questions first … the full
command line I used (except for some stuff I’m redacting but you don’t need
the exact details anyway) was:

/usr/lpp/mmfs/bin/mmapplypolicy gpfs23 -A 75 -a 4 -g <some folder on another gpfs filesystem> -I yes -L 1 -P ~/gpfs/gpfs23_migration.policy -N some,list,of,NSD,server,nodes


And yes, it printed out the very normal, “Hey, I migrated all 1.8 million
files I said I would successfully, so I’m done here” message:

[I] A total of 1869469 files have been migrated, deleted or processed by an
EXTERNAL EXEC/script;
       0 'skipped' files and/or errors.


Marc - I ran what you suggested in your response below - section 3a.  The
output of a “test” mmapplypolicy and of mmdf were consistent with each other.  Therefore,
I’m moving on to 3b and running against the full filesystem again … the
only difference between the command line above and what I’m doing now is
that I’m running with “-L 2” this time around.  I’m not fond of doing this
during the week but I need to figure out what’s going on and I *really*
need to get some stuff moved from my “data” pool to my “capacity” pool.

I will follow up on the list when there’s something to report.
Thanks again, all…

Kevin

On Apr 17, 2017, at 3:11 PM, Marc A Kaplan <makaplan at us.ibm.com> wrote:

Kevin,

1. Running with both fairly simple rules so that you migrate "in both
directions" is fine.  It was designed to do that!

2. Glad you understand the logic of "rules hit" vs "files chosen".

3. To begin to understand "what the hxxx is going on" (as our fearless
leader liked to say before he was in charge ;-) ) I suggest:

(a) Run mmapplypolicy on a directory of just a few files, `mmapplypolicy
/gpfs23/test-directory -I test ...`, and check that the
[I] ... Current data pool utilization
message is consistent with the output of `mmdf gpfs23`.

They should be, but if they're not, that's a weird problem right there
since they're supposed to be looking at the same metadata!

You can do this anytime; it should complete almost instantly...

(b) When time and resources permit, re-run mmapplypolicy on the full FS
with your desired migration policy.
Again, do the "Current", "Chosen" and "Predicted" messages make sense, and
"add up"?
Do the file counts seem reasonable, considering that you recently did
migrations/deletions that should have changed the counts compared to
previous runs
of mmapplypolicy?  If you just want to look and not actually change
anything, use `-I test`, which will skip the migration steps.  If you want
to see the list of files chosen, run with a higher display level, e.g. `-L 2`,
which prints each chosen file.
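Another way to capture the candidates, a sketch only (untested): add a LIST
rule alongside the migration rules and run with `-I defer` plus an `-f` prefix
so the generated file lists are left on disk rather than anything being
executed.  Note that a LIST rule reports every file matching the WHERE clause,
not just the subset that the LIMIT would let MIGRATE choose.

RULE EXTERNAL LIST 'oldstuff' EXEC ''  /* empty EXEC: just write the list files */
RULE 'ListOldStuff'
  LIST 'oldstuff'
  FROM POOL 'gpfs23data'
  SHOW(varchar(KB_ALLOCATED) || ' ' || varchar(DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)))
  WHERE ((DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)) > 14) AND (KB_ALLOCATED > 3584)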

(c) If you continue to see significant discrepancies between mmapplypolicy
and mmdf, let us know.

(d) Also at some point you may consider running mmrestripefs with options
to make sure every file has its data blocks where they are supposed to be
and is replicated
as you have specified.

Let's see where those steps take us...

-- marc of Spectrum Scale (né GPFS)



From:        "Buterbaugh, Kevin L" <Kevin.Buterbaugh at Vanderbilt.Edu>
To:        gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Date:        04/17/2017 11:25 AM
Subject:        Re: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not?
Sent by:        gpfsug-discuss-bounces at spectrumscale.org
------------------------------



Hi Marc,

I do understand what you’re saying about mmapplypolicy deciding it only
needed to move ~1.8 million files to fill the capacity pool to ~98% full.
However, it is now more than 24 hours since the mmapplypolicy finished
“successfully” and:

Disks in storage pool: gpfs23capacity (Maximum disk size allowed is 519 TB)
eon35Ansd               58.2T       35 No       Yes          29.66T ( 51%)        64.16G ( 0%)
eon35Dnsd               58.2T       35 No       Yes          29.66T ( 51%)        64.61G ( 0%)
                -------------                         -------------------- -------------------
(pool total)           116.4T                                59.33T ( 51%)        128.8G ( 0%)

And yes, I did run the mmapplypolicy with “-I yes” … here’s the partially
redacted command line:

/usr/lpp/mmfs/bin/mmapplypolicy gpfs23 -A 75 -a 4 -g <some folder on another gpfs filesystem> -I yes -L 1 -P ~/gpfs/gpfs23_migration.policy -N some,list,of,NSD,server,nodes

And here’s that policy file:

define(access_age,(DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)))
define(GB_ALLOCATED,(KB_ALLOCATED/1048576.0))

RULE 'OldStuff'
  MIGRATE FROM POOL 'gpfs23data'
  TO POOL 'gpfs23capacity'
  LIMIT(98)
  WHERE ((access_age > 14) AND (KB_ALLOCATED > 3584))

RULE 'INeedThatAfterAll'
  MIGRATE FROM POOL 'gpfs23capacity'
  TO POOL 'gpfs23data'
  LIMIT(75)
  WHERE (access_age < 14)

The one thing that has changed is that formerly I only ran the migration in
one direction at a time … i.e. I used to have those two rules in two
separate files and would run an mmapplypolicy using the OldStuff rule the
1st weekend of the month and run the other rule the other weekends of the
month.  This is the 1st weekend that I attempted to run an mmapplypolicy
that did both at the same time.  Did I mess something up with that?

I have not run it again yet because we also run migrations on the other
filesystem that we are still in the process of migrating off of.  So gpfs23
goes 1st and as soon as it’s done the other filesystem migration kicks
off.  I don’t like to run two migrations simultaneously if at all
possible.  The 2nd migration ran until this morning, when it was
unfortunately terminated by a network switch crash that has also had me
tied up all morning until now.  :-(

And yes, there is something else going on … well, was going on - the
network switch crash killed this too … I have been running an rsync on one
particular ~80TB directory tree from the old filesystem to gpfs23.  I
understand that the migration wouldn’t know about those files and that’s
fine … I just don’t understand why mmapplypolicy said it was going to fill
the capacity pool to 98% but didn’t do it … wait, mmapplypolicy hasn’t gone
into politics, has it?!?  ;-)

Thanks - and again, if I should open a PMR for this please let me know...

Kevin

On Apr 16, 2017, at 2:15 PM, Marc A Kaplan <makaplan at us.ibm.com> wrote:

Let's look at how mmapplypolicy does the reckoning.
Before it starts, it sees your pools as:

[I] GPFS Current Data Pool Utilization in KB and %
Pool_Name                   KB_Occupied        KB_Total  Percent_Occupied
gpfs23capacity              55365193728    124983549952     44.297984614%
gpfs23data                 166747037696    343753326592     48.507759721%
system                                0               0      0.000000000%
(no user data)
[I] 75142046 of 209715200 inodes used: 35.830520%.

Your rule says you want to migrate data to gpfs23capacity, up to 98% full:

RULE 'OldStuff'
 MIGRATE FROM POOL 'gpfs23data'
 TO POOL 'gpfs23capacity'
 LIMIT(98) WHERE ...

We scan your files and find and reckon...
[I] Summary of Rule Applicability and File Choices:
Rule#      Hit_Cnt          KB_Hit          Chosen       KB_Chosen       KB_Ill     Rule
    0      5255960     237675081344        1868858     67355430720           0     RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO POOL 'gpfs23capacity' LIMIT(98.000000) WHERE(.)

So yes, 5.25 million files match the rule, but the utility chooses
1.868 million files that add up to 67,355GB and figures that if it migrates
those to gpfs23capacity (and also figuring in the other migrations chosen by
your second rule), then gpfs23capacity will end up 97.9999% full.
We show you that with our "predictions" message.

Predicted Data Pool Utilization in KB and %:
Pool_Name                   KB_Occupied        KB_Total  Percent_Occupied
gpfs23capacity             122483878944    124983549952     97.999999993%
gpfs23data                 104742360032    343753326592     30.470209865%

So that's why it chooses to migrate "only" 67,355GB....
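Spelling out the reckoning for gpfs23capacity (all numbers from your log):

    55,365,193,728 KB   currently occupied in gpfs23capacity
  -    236,745,504 KB   chosen to move out by 'INeedThatAfterAll'
  + 67,355,430,720 KB   chosen to move in by 'OldStuff'
  = 122,483,878,944 KB  which is ~98.0% of the pool's 124,983,549,952 KB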

See? Makes sense to me.

Questions:
Did you run with -I yes or -I defer ?

Were some of the files illreplicated or illplaced?

Did you give the cluster-wide space reckoning protocols time to see the
changes?  mmdf is usually "behind" by some non-negligible amount of time.

What else is going on?
If  you're moving  or deleting or creating data by other means while
mmapplypolicy is running -- it doesn't "know" about that!

Run it again!




From:        "Buterbaugh, Kevin L" <Kevin.Buterbaugh at Vanderbilt.Edu>
To:        gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Date:        04/16/2017 09:47 AM
Subject:        [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not?
Sent by:        gpfsug-discuss-bounces at spectrumscale.org
------------------------------



Hi All,

First off, I can open a PMR for this if I need to.  Second, I am far from
an mmapplypolicy guru.  With that out of the way … I have an mmapplypolicy
job that didn’t migrate anywhere close to what it could / should have.
From the log file I have it create, here is the part where it shows the
policies I told it to invoke:

[I] Qos 'maintenance' configured as inf
[I] GPFS Current Data Pool Utilization in KB and %
Pool_Name                   KB_Occupied        KB_Total  Percent_Occupied
gpfs23capacity              55365193728    124983549952     44.297984614%
gpfs23data                 166747037696    343753326592     48.507759721%
system                                0               0      0.000000000%
(no user data)
[I] 75142046 of 209715200 inodes used: 35.830520%.
[I] Loaded policy rules from /root/gpfs/gpfs23_migration.policy.
Evaluating policy rules with CURRENT_TIMESTAMP = 2017-04-15@01:13:02 UTC
Parsed 2 policy rules.

RULE 'OldStuff'
 MIGRATE FROM POOL 'gpfs23data'
 TO POOL 'gpfs23capacity'
 LIMIT(98)
 WHERE (((DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)) > 14) AND
(KB_ALLOCATED > 3584))

RULE 'INeedThatAfterAll'
 MIGRATE FROM POOL 'gpfs23capacity'
 TO POOL 'gpfs23data'
 LIMIT(75)
 WHERE ((DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)) < 14)

And then the log shows it scanning all the directories and then says, "OK,
here’s what I’m going to do":

[I] Summary of Rule Applicability and File Choices:
Rule#      Hit_Cnt          KB_Hit          Chosen       KB_Chosen       KB_Ill     Rule
    0      5255960     237675081344        1868858     67355430720           0     RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO POOL 'gpfs23capacity' LIMIT(98.000000) WHERE(.)
    1          611       236745504             611       236745504           0     RULE 'INeedThatAfterAll' MIGRATE FROM POOL 'gpfs23capacity' TO POOL 'gpfs23data' LIMIT(75.000000) WHERE(.)

[I] Filesystem objects with no applicable rules: 414911602.

[I] GPFS Policy Decisions and File Choice Totals:
Chose to migrate 67592176224KB: 1869469 of 5256571 candidates;
Predicted Data Pool Utilization in KB and %:
Pool_Name                   KB_Occupied        KB_Total  Percent_Occupied
gpfs23capacity             122483878944    124983549952     97.999999993%
gpfs23data                 104742360032    343753326592     30.470209865%
system                                0               0      0.000000000%
(no user data)

Notice that it says it’s only going to migrate less than 2 million of the
5.25 million candidate files!!  And sure enough, that’s all it did:

[I] A total of 1869469 files have been migrated, deleted or processed by an
EXTERNAL EXEC/script;
       0 'skipped' files and/or errors.

And, not surprisingly, the gpfs23capacity pool on gpfs23 is nowhere near
98% full:

Disks in storage pool: gpfs23capacity (Maximum disk size allowed is 519 TB)
eon35Ansd               58.2T       35 No       Yes          29.54T ( 51%)        63.93G ( 0%)
eon35Dnsd               58.2T       35 No       Yes          29.54T ( 51%)        64.39G ( 0%)
                -------------                         -------------------- -------------------
(pool total)           116.4T                                59.08T ( 51%)        128.3G ( 0%)

I don’t understand why it migrated only a small subset of what it could /
should have.

We are doing a migration from one filesystem (gpfs21) to gpfs23 and I
really need to stuff my gpfs23capacity pool as full of data as I can to
keep the migration going.  Any ideas anyone?  Thanks in advance…

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633



—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633




_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss