[gpfsug-discuss] Using AFM to migrate files. (Peter Childs)

Bill Pappas bpappas at dstonline.com
Fri Oct 21 19:46:09 BST 2016


>>the largest of the filesets has 52TB and 63 million files


Are you using NFS as the transport path between the home and cache?


If you are using NFS, how are you producing the list of files to migrate?  mmafmctl with the prefetch option? If so, I would measure the time it takes for that command (with that option) to produce the list of files it intends to prefetch. From my experience this is very important, as a) it can take a long time if you have more than 10 million files and b) I've seen this operation crash when the list grew large.  Does anyone else on this thread have any experiences?  I would love to hear positive experiences as well.  I tried so hard and for so long to make AFM work with one customer, but we gave up as it was not reliable or stable for large-scale (many-file) migrations.
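For reference, a minimal sketch of that list-then-prefetch workflow -- the fileset and path names are made up, and the exact flag spellings vary between releases, so check the mmafmctl man page on your level:

# 1. Produce the candidate list (a plain find is shown; a policy scan
#    scales much better once you get past ~10 million files).
ssh home-node 'cd /gpfs/oldfs/projects && find . -type f' > /tmp/prefetch.list

# 2. Time the prefetch submission; this includes the home/cache
#    comparison phase that can dominate on very large lists.
time mmafmctl newfs prefetch -j projects --list-file /tmp/prefetch.list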


If you are using GPFS as the conduit between the home and cache (i.e. no NFS), I would still ask the same question, more with respect to stability for large file lists during the initial prefetch stages.


As far as I could tell, from GPFS 3.5 to 4.2, the phase of prefetch in which the home and cache are compared (i.e. building the list of what is to be migrated over) before the data transfer begins runs only on the GW node managing that cache.  It does not leverage multiple GW nodes and multiple home nodes to speed up this 'list and find' stage of prefetch.  I hope some AFM developers can clarify or correct my findings.  This was a huge impediment for large file migrations where it is difficult (organizationally, not technically) to split a folder structure into multiple filesets.  The lack of stability under these large scans was the real failing for us.
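If it helps, one quick way to see how cache filesets map onto gateway nodes (the file system name is illustrative, and the exact output columns vary by release, but the report should include the owning gateway node and the queue length for each cache):

# Show the AFM state of every cache fileset in file system 'newfs',
# including which gateway node is currently managing each one.
mmafmctl newfs getstate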


Bill Pappas

901-619-0585

bpappas at dstonline.com



http://www.prweb.com/releases/2016/06/prweb13504050.htm


________________________________
From: gpfsug-discuss-bounces at spectrumscale.org <gpfsug-discuss-bounces at spectrumscale.org> on behalf of gpfsug-discuss-request at spectrumscale.org <gpfsug-discuss-request at spectrumscale.org>
Sent: Thursday, October 20, 2016 2:07 PM
To: gpfsug-discuss at spectrumscale.org
Subject: gpfsug-discuss Digest, Vol 57, Issue 53

Send gpfsug-discuss mailing list submissions to
        gpfsug-discuss at spectrumscale.org

To subscribe or unsubscribe via the World Wide Web, visit
        http://gpfsug.org/mailman/listinfo/gpfsug-discuss
or, via email, send a message with subject or body 'help' to
        gpfsug-discuss-request at spectrumscale.org

You can reach the person managing the list at
        gpfsug-discuss-owner at spectrumscale.org

When replying, please edit your Subject line so it is more specific
than "Re: Contents of gpfsug-discuss digest..."


Today's Topics:

   1. Re: Using AFM to migrate files. (Peter Childs)


----------------------------------------------------------------------

Message: 1
Date: Thu, 20 Oct 2016 19:07:44 +0000
From: Peter Childs <p.childs at qmul.ac.uk>
To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Subject: Re: [gpfsug-discuss] Using AFM to migrate files. (Peter
        Childs)
Message-ID: <5qv6d7inj2j1pa94kqamk2uf.1476989646711 at email.android.com>
Content-Type: text/plain; charset="iso-8859-1"

Yes, most of the filesets are based on research groups, projects or departments, with the exception of scratch and home, hence the idea to use a different method for these filesets.

There are approximately 230 million files and 300TB in total; the largest of the filesets has 52TB and 63 million files.

Peter Childs
Research Storage
ITS Research and Teaching Support
Queen Mary, University of London


---- Bill Pappas wrote ----


I have some ideas to suggest based on my experience. First, I have some questions:


How many files are you migrating?

Will you be creating multiple filesets on the target system based on business or project needs? For example, fileset A for "department A" and fileset B for "large-scale project A"?


Thanks.


Bill Pappas

901-619-0585

bpappas at dstonline.com



http://www.prweb.com/releases/2016/06/prweb13504050.htm


________________________________
From: gpfsug-discuss-bounces at spectrumscale.org <gpfsug-discuss-bounces at spectrumscale.org> on behalf of gpfsug-discuss-request at spectrumscale.org <gpfsug-discuss-request at spectrumscale.org>
Sent: Wednesday, October 19, 2016 3:12 PM
To: gpfsug-discuss at spectrumscale.org
Subject: gpfsug-discuss Digest, Vol 57, Issue 49



Today's Topics:

   1. Using AFM to migrate files. (Peter Childs)
   2. subnets (Brian Marshall)
   3. Re: subnets (Simon Thompson (Research Computing - IT Services))
   4. Re: subnets (Uwe Falke)
   5. Will there be any more GPFS 4.2.0-x releases?
      (Buterbaugh, Kevin L)


----------------------------------------------------------------------

Message: 1
Date: Wed, 19 Oct 2016 14:12:41 +0000
From: Peter Childs <p.childs at qmul.ac.uk>
To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Subject: [gpfsug-discuss] Using AFM to migrate files.
Message-ID:
        <HE1PR0701MB2554710DD534587615543AE5A4D20 at HE1PR0701MB2554.eurprd07.prod.outlook.com>

Content-Type: text/plain; charset="iso-8859-1"


We are planning to use AFM to migrate our old GPFS file store to a new GPFS file store. This will give us the advantages of Spectrum Scale (GPFS) 4.2, such as larger block and inode sizes. I would like to gain some insight into my plans before I start.

The old file store was running GPFS 3.5 with 512-byte inodes and a 1MB block size. We have now upgraded it to 4.1 and are working towards 4.2; it holds 300TB of files (385TB maximum space). This is so we can use both the old and new storage via multi-cluster.

We are moving to a new GPFS cluster so we can eventually use the new protocol nodes, and also make the new storage machines the cluster managers, as this should be faster and more future-proof.

The new hardware has 1PB of space and is running GPFS 4.2.

We have multiple filesets, and would like to maintain our namespace as far as possible.

My plan is to:

1. Create a read-only (RO) AFM cache on the new storage.
2a. Move the old fileset aside and replace it with a symlink to the new location.
2b. Convert the RO AFM cache to Local Updates (LU) mode, pointing at the new parking area for the old files.
2c. Move user access to the new location in the cache.
3. Flush everything into the cache and disconnect.

I've read the docs, including the ones on migration, but it's not clear whether it's safe to move the home of a cache and update the target. It looks like it should be possible, and my tests say it works.
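For reference, a rough command-level outline of steps 1 and 2b above -- every name is a placeholder, option spellings and unlink/relink requirements differ between releases, and on some levels changing the target has to go through mmafmctl failover, so treat this as a sketch rather than a tested recipe:

# 1. Create the read-only cache on the new cluster, pointing at the old
#    file system over the native GPFS (multi-cluster) protocol.
mmcrfileset newfs projects --inode-space new \
    -p afmTarget=gpfs:///gpfs/oldfs/projects -p afmMode=ro
mmlinkfileset newfs projects -J /gpfs/newfs/projects

# 2b. Later, convert the cache to local-updates mode and repoint it at
#     the parking area for the old files.
mmunlinkfileset newfs projects
mmchfileset newfs projects -p afmMode=local-updates \
    -p afmTarget=gpfs:///gpfs/oldfs/parking/projects
mmlinkfileset newfs projects -J /gpfs/newfs/projects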

An alternative plan is to use an Independent Writer (IW) AFM cache to move the home directories, which are pointed to by LDAP. That way we can move users one at a time and only have to drain the HPC cluster at the end, in order to disconnect the cache. I assume that migrating users over an Independent Writer cache is safe so long as the users don't use both sides of the cache at once (i.e. home and cache).

I'm also interested in any recipes people have for GPFS policies to preseed and flush the cache.
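As one possible starting point, a hedged sketch of a policy-driven preseed -- the rule names, paths and output handling are placeholders, and whether the prefetch command accepts the policy output directly or needs the leading fields stripped depends on the release:

# Write a minimal policy that lists every file in the fileset:
cat > /tmp/preseed.pol <<'EOF'
RULE EXTERNAL LIST 'prefetch' EXEC ''
RULE 'all' LIST 'prefetch'
EOF

# Scan in deferred mode so mmapplypolicy only writes the candidate list:
mmapplypolicy /gpfs/newfs/projects -P /tmp/preseed.pol -f /tmp/preseed -I defer

# The list lands in /tmp/preseed.list.prefetch; strip the leading
# inode/generation fields if your release expects bare path names, then:
mmafmctl newfs prefetch -j projects --list-file /tmp/preseed.list.prefetch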

We plan to do all of the migration using AFM over GPFS; we're not currently using NFS and have no plans to start. I believe using GPFS is the faster method to perform the migration.

Any suggestions and experience of doing similar migration jobs would be helpful.

Peter Childs
Research Storage
ITS Research and Teaching Support
Queen Mary, University of London



------------------------------

Message: 2
Date: Wed, 19 Oct 2016 13:46:02 -0400
From: Brian Marshall <mimarsh2 at vt.edu>
To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Subject: [gpfsug-discuss] subnets
Message-ID:
        <CAD0XtKRDTXe9Y5qQB5-qVRdo_RTbv9WctoJKf+CB97kNkmss0g at mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

All,

We are setting up communication between 2 clusters using Ethernet and IPoFabric.

The daemon interface is running on Ethernet, so all admin traffic will use it.

We are still working on getting the subnets setting correct.

Question:

Does GPFS have a way to query how it is connecting to a given cluster/node?  I.e. once we have subnets set up, how can we tell whether GPFS is actually using them?  Currently we just do a large transfer and check tcpdump for any packets flowing on the high-speed/data/non-admin subnet.


Thank you,
Brian Marshall

------------------------------

Message: 3
Date: Wed, 19 Oct 2016 18:10:38 +0000
From: "Simon Thompson (Research Computing - IT Services)"
        <S.J.Thompson at bham.ac.uk>
To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Subject: Re: [gpfsug-discuss] subnets
Message-ID:
        <CF45EE16DEF2FE4B9AA7FF2B6EE26545F584168E at EX13.adf.bham.ac.uk>
Content-Type: text/plain; charset="us-ascii"


mmdiag --network
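
A hedged sketch of putting that together with the subnets attribute -- the subnet and cluster names below are placeholders, and the setting normally only takes effect after the daemons are restarted:

# Prefer the IPoFabric network for daemon traffic to the remote cluster
# (value format: subnet, optionally followed by /cluster-name).
mmchconfig subnets="10.20.0.0/remote.cluster.example"
mmlsconfig subnets          # confirm the setting

# On a node that is moving data, list the established mmfsd connections
# and the address each one is using:
mmdiag --network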

Simon


------------------------------

Message: 4
Date: Wed, 19 Oct 2016 20:15:52 +0200
From: "Uwe Falke" <UWEFALKE at de.ibm.com>
To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Subject: Re: [gpfsug-discuss] subnets
Message-ID:
        <OF96AC7F85.6594A994-ONC1258051.0064379D-C1258051.0064547E at notes.na.collabserv.com>

Content-Type: text/plain; charset="ISO-8859-1"

Hi Brian,
you might use

mmfsadm saferdump tscomm

to check on which route peer cluster members are reached.


Mit freundlichen Grüßen / Kind regards


Dr. Uwe Falke

IT Specialist
High Performance Computing Services / Integrated Technology Services /
Data Center Services
-------------------------------------------------------------------------------------------------------------------------------------------
IBM Deutschland
Rathausstr. 7
09111 Chemnitz
Phone: +49 371 6978 2165
Mobile: +49 175 575 2877
E-Mail: uwefalke at de.ibm.com
-------------------------------------------------------------------------------------------------------------------------------------------
IBM Deutschland Business & Technology Services GmbH / Geschäftsführung:
Frank Hammer, Thorsten Moehring
Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart,
HRB 17122





------------------------------

Message: 5
Date: Wed, 19 Oct 2016 20:11:57 +0000
From: "Buterbaugh, Kevin L" <Kevin.Buterbaugh at Vanderbilt.Edu>
To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Subject: [gpfsug-discuss] Will there be any more GPFS 4.2.0-x
        releases?
Message-ID: <142FECE0-E157-42D9-BC10-4C48E78FA065 at vanderbilt.edu>
Content-Type: text/plain; charset="utf-8"

Hi All,

We're currently running GPFS 4.2.0-4 with an efix installed, and now we need a 2nd efix.  I'm not a big fan of adding efix to efix and would prefer to go to a new PTF that contains both efixes.

So, is there going to be a GPFS 4.2.0-5 (it's been a longer than normal interval since PTF 4 came out), or do we need to go to GPFS 4.2.1-x?  If the latter, any major changes to watch out for?  Thanks...

Kevin

--
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633




------------------------------

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


End of gpfsug-discuss Digest, Vol 57, Issue 49
**********************************************

------------------------------

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


End of gpfsug-discuss Digest, Vol 57, Issue 53
**********************************************

