[gpfsug-discuss] AFM Recovery of SW cache does a full scan of home - is this to be expected?

Billich Heinrich Rainer (ID SD) heinrich.billich at id.ethz.ch
Mon Jan 20 15:15:33 GMT 2020


Hello Venkat,

Thank you very much, upgrading to 5.0.4.1 did indeed fix the issue. AFM now compiles the list of pending changes in a few hours. Before, we estimated it would take more than 20 days.

We had to increase disk space in /var/mmfs/afm/ and /var/mmfs/tmp/ to allow AFM to store all intermediate file lists. The manual only recommends providing ample disk space in /var/mmfs/afm/, but some processes doing a resync placed lists in /var/mmfs/tmp/ as well.
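
In case it is useful to others: a minimal sketch (in Python, our own helper idea, not an AFM tool) of how one could keep an eye on free space in both directories during a long recovery. The 20 GiB threshold is just an example value, not an official recommendation.

#!/usr/bin/env python3
# Check free space in the directories AFM uses for its intermediate
# file lists during recovery/resync. The threshold is only an example.
import shutil

PATHS = ["/var/mmfs/afm", "/var/mmfs/tmp"]
MIN_FREE_GIB = 20   # arbitrary example value, tune to your fileset size

for path in PATHS:
    usage = shutil.disk_usage(path)
    free_gib = usage.free / 2**30
    status = "OK" if free_gib >= MIN_FREE_GIB else "LOW"
    print(f"{path}: {free_gib:.1f} GiB free ({status})")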

Cheers,

Heiner

From: <gpfsug-discuss-bounces at spectrumscale.org> on behalf of Venkateswara R Puvvada <vpuvvada at in.ibm.com>
Reply to: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Date: Tuesday, 14 January 2020 at 17:51
To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Subject: Re: [gpfsug-discuss] AFM Recovery of SW cache does a full scan of home - is this to be expected?

Hi,

>The dirtyDirs file holds 130’000 lines, which does not seem like many, but dirtyDirDirents holds about 80M entries. Can we estimate how long it will take to finish processing?

Yes, this is the major problem fixed as mentioned in the APAR below. The dirtyDirs file is opened for each entry in the dirtyDirDirents file, and this causes the performance overhead.
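
To illustrate the effect (a rough sketch in Python, my own illustration, not the actual tspcachescan code): reopening and rereading the dirtyDirs file for every dirtyDirDirents entry costs O(entries x file size), while reading it once into memory makes each lookup O(1).

# Illustration only (Python) -- not the actual tspcachescan code.
from collections import namedtuple

DirEntry = namedtuple("DirEntry", "parent_id name")   # hypothetical record shape

def slow_scan(dirty_dirs_path, dirent_entries):
    """Pre-fix pattern: reopen and rescan dirtyDirs for every dirent entry."""
    matches = 0
    for entry in dirent_entries:              # ~80M iterations in this case
        with open(dirty_dirs_path) as f:      # open/read/close every time
            if any(line.split(None, 1)[0] == entry.parent_id
                   for line in f if line.strip()):
                matches += 1
    return matches

def fast_scan(dirty_dirs_path, dirent_entries):
    """Post-fix idea: read dirtyDirs once, then do hash lookups per entry."""
    with open(dirty_dirs_path) as f:
        dirty_ids = {line.split(None, 1)[0] for line in f if line.strip()}
    return sum(1 for entry in dirent_entries if entry.parent_id in dirty_ids)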

>At the moment all we can do is wait? We run version 5.0.2.3. Would version 5.0.3 or 5.0.4 show different behavior? Is this fixed or improved in a later release?
>There probably is no way to flush the pending queue entries while recovery is ongoing?

Later versions have the fix mentioned in that APAR, and I believe it should fix your current performance issue. Flushing the pending queue entries is not available as of today (5.0.4); we are currently working on this feature.

~Venkat (vpuvvada at in.ibm.com)



From:        "Billich  Heinrich Rainer (ID SD)" <heinrich.billich at id.ethz.ch>
To:        gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Date:        01/13/2020 05:29 PM
Subject:        [EXTERNAL] Re: [gpfsug-discuss] AFM Recovery of SW cache does a full scan of home - is this to be expected?
Sent by:        gpfsug-discuss-bounces at spectrumscale.org
________________________________



Hello Venkat,

Thank you, this seems to match our issue. I traced tspcachescan and see a long series of open()/read()/close() calls on the dirtyDirs file. The dirtyDirs file holds 130’000 lines, which does not seem like many, but dirtyDirDirents holds about 80M entries. Can we estimate how long it will take to finish processing?

tspcachescan repeats the following for different directories:

11:11:36.837032 stat("/fs3101/XXXXX/.snapshots/XXXXX.afm.75872/yyyyy/yyyy", {st_mode=S_IFDIR|0770, st_size=4096, ...}) = 0
11:11:36.837092 open("/var/mmfs/afm/fs3101-43/recovery/policylist.data.list.dirtyDirs", O_RDONLY) = 8
11:11:36.837127 fstat(8, {st_mode=S_IFREG|0600, st_size=32564140, ...}) = 0
11:11:36.837160 mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x3fff96930000
11:11:36.837192 read(8, "539492355 65537 2795  553648131 "..., 8192) = 8192
11:11:36.837317 read(8, "Caches/com.apple.helpd/Generated"..., 8192) = 8192
11:11:36.837439 read(8, "ish\n539848852 1509237202 2795  5"..., 8192) = 8192
[... many more read() calls ...]
11:11:36.864104 close(8)                = 0
11:11:36.864135 munmap(0x3fff96930000, 8192) = 0

A single iteration takes about 27 ms. Doing this 130’000 times would be fine, but if tspcachescan does it 80M times we wait 600 hours. Is there a way to estimate how many iterations tspcachescan will do? The cache fileset holds 140M inodes.
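
For reference, the arithmetic behind these numbers (a back-of-the-envelope check in Python):

# back-of-the-envelope check: per-iteration cost times iteration count
per_iteration_s = 0.027                       # ~27 ms per open/read/close cycle (strace above)
print(130_000 * per_iteration_s / 3600)       # ~1 hour if the loop runs per dirtyDirs line
print(80_000_000 * per_iteration_s / 3600)    # ~600 hours if it runs per dirtyDirDirents entry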

At the moment all we can do is wait? We run version 5.0.2.3. Would version 5.0.3 or 5.0.4 show different behavior? Is this fixed or improved in a later release?

There probably is no way to flush the pending queue entries while recovery is ongoing?

I opened a case with IBM (TS003219893) and will continue there.

Kind regards,

Heiner





From: <gpfsug-discuss-bounces at spectrumscale.org> on behalf of Venkateswara R Puvvada <vpuvvada at in.ibm.com>
Reply to: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Date: Monday, 13 January 2020 at 08:40
To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Subject: Re: [gpfsug-discuss] AFM Recovery of SW cache does a full scan of home - is this to be expected?

AFM maintains an in-memory queue at the gateway node to keep track of changes happening on the fileset. If the in-memory queue is lost (memory pressure, daemon shutdown, etc.), AFM runs the recovery process, which involves creating a snapshot, running the policy scan and finally queueing the recovered operations. Due to message (operation) dependencies, any changes to the AFM fileset during the recovery will not get replicated until the recovery completes.

AFM scans the home directory only for dirty directories, to get the names of the deleted and renamed files, because the old name of a renamed file and the name of a deleted file are not available on disk at the cache. A directory is marked dirty when a rename or unlink operation is performed inside it. In your case it may be that all the directories became dirty due to the rename/unlink operations. The AFM recovery process is single-threaded.

>Is this to be expected and normal behavior? What can we do about it?
>Will every reboot of a gateway node trigger a recovery of all AFM filesets and a full scan of home? This would make normal rolling updates rather impractical, or is there some better way?

Only for the dirty directories; see above.

>Home is a GPFS cluster, so we could easily produce the needed file list on home with a policy scan in a few minutes.

There is some work going on to preserve the file names of unlinked/renamed files in the cache until they get replicated to home, so that the home directory scan can be avoided.

Some issues have already been fixed in this regard; see the APAR below. Which Scale version are you running?

https://www-01.ibm.com/support/docview.wss?uid=isg1IJ15436

~Venkat (vpuvvada at in.ibm.com)



From:        "Billich  Heinrich Rainer (ID SD)" <heinrich.billich at id.ethz.ch>
To:        gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Date:        01/08/2020 10:32 PM
Subject:        [EXTERNAL] [gpfsug-discuss] AFM Recovery of SW cache does a full scan of home - is this to be expected?
Sent by:        gpfsug-discuss-bounces at spectrumscale.org
________________________________




Hello,


I am still new to AFM, so here are some basic questions on how recovery works for an SW (single-writer) cache:

We have an AFM SW cache in recovery mode. Recovery first ran policies on the cache cluster, but now I see a ‘tspcachescan’ process on the cache slowly scanning home via NFS. Single host, single process, no parallelism as far as I can see, but I may be wrong. This scan of home on a cache AFM gateway takes a very long time, while further updates on the cache queue up. Home has about 100M files. After 8 hours I see about 70M entries in the file /var/mmfs/afm/…/recovery/homelist, i.e. we get about 2500 lines/s. (We may have very many changes on cache due to some recursive ACL operations, but I’m not sure.)

So I expect that about 12 hours will pass building up file lists before recovery starts to update home. I see some risk: in this time new changes pile up on the cache. Could memory become an issue? Could the cache fill up so that we cannot evict?
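
A quick back-of-the-envelope check of that estimate (Python, assuming the scan rate stays roughly constant):

# extrapolation from the observed homelist growth
rate_per_s = 70_000_000 / (8 * 3600)          # ~2430 entries/s, matches the ~2500 lines/s above
total_files_home = 100_000_000
print(total_files_home / rate_per_s / 3600)   # ~11.4 hours for a full scan of home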

I wonder

  *   Is this to be expected and normal behavior? What can we do about it?
  *   Will every reboot of a gateway node trigger a recovery of all AFM filesets and a full scan of home? This would make normal rolling updates rather impractical, or is there some better way?

Home is a GPFS cluster, so we could easily produce the needed file list on home with a policy scan in a few minutes.

Thank you, I will welcome any clarification, advice or comments.

Kind regards,

Heiner

--
=======================
Heinrich Billich
ETH Zürich
Informatikdienste
Tel.: +41 44 632 72 56
heinrich.billich at id.ethz.ch
========================





_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss



