[gpfsug-discuss] Best way to migrate data

Carl mutantllama at gmail.com
Thu Oct 18 21:54:42 BST 2018


It may be overkill for your use case, but mpiFileUtils is very good for
large datasets.

https://github.com/hpc/mpifileutils
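
For example, a parallel copy with dsync (one of the mpifileutils tools)
might look roughly like this (the rank count and host list are placeholders;
check the docs for the options your build supports):

  mpirun -np 64 --hostfile ./nodes \
    dsync /gpfs/home/user /research/project/user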

Cheers,

Carl.

On Fri, 19 Oct 2018 at 7:05 am, <Dwayne.Hart at med.mun.ca> wrote:

> Thank you all for the responses. I'm currently using msrsync and things
> appear to be going very well.
>
> The data transfer is contained inside our DC. I'm transferring a user's
> home directory content from one GPFS file system to another. Our IBM
> Spectrum Scale solution consists of 12 I/O nodes connected to IB, and the
> client node I'm using to copy the data from one file system to the other
> is also connected to IB, with a maximum of 2 hops.
>
> [root at client-system]# /gpfs/home/dwayne/bin/msrsync -P --stats -p 32
> /gpfs/home/user/ /research/project/user/
> [64756/992397 entries] [30.1 T/239.6 T transferred] [81 entries/s] [39.0
> G/s bw] [monq 0] [jq 62043]
>
> Best,
> Dwayne
>
> -----Original Message-----
> From: gpfsug-discuss-bounces at spectrumscale.org [mailto:
> gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Christopher Black
> Sent: Thursday, October 18, 2018 4:43 PM
> To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
> Subject: Re: [gpfsug-discuss] Best way to migrate data
>
> Other tools and approaches that we've found helpful:
> msrsync: handles parallelizing rsync within a dir tree and can greatly
> speed up transfers on a single node with both filesystems mounted,
> especially when dealing with many small files.
> Globus/GridFTP: set up one or more endpoints on each side; GridFTP will
> auto-parallelize and recover from disruptions.
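>
> If you go the Globus route, kicking off a recursive transfer from the CLI
> looks roughly like this (the endpoint UUIDs and label are placeholders for
> your own setup):
>
>   globus transfer SRC_ENDPOINT_UUID:/gpfs/home/user/ \
>     DST_ENDPOINT_UUID:/research/project/user/ \
>     --recursive --label "home-migration"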
>
> msrsync is easier to get going but is limited to one parent dir per node.
> We've sometimes added another level of parallelization by running msrsync
> with different top-level directories on different HPC nodes
> simultaneously.
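>
> As a rough sketch of that pattern (host and directory names are just
> examples; the flags mirror the msrsync invocation earlier in this thread):
>
>   # on hpc01
>   msrsync -P --stats -p 32 /gpfs/home/user/dirA/ /research/project/user/dirA/
>   # on hpc02
>   msrsync -P --stats -p 32 /gpfs/home/user/dirB/ /research/project/user/dirB/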
>
> Best,
> Chris
>
> Refs:
> https://github.com/jbd/msrsync
> https://www.globus.org/
>
> On 10/18/18, 2:54 PM, "gpfsug-discuss-bounces at spectrumscale.org on
> behalf of Sanchez, Paul" <gpfsug-discuss-bounces at spectrumscale.org on
> behalf of Paul.Sanchez at deshaw.com> wrote:
>
>     Sharding can also work, if you have a storage-connected compute grid
> in your environment:  if you enumerate all of the directories, then use a
> non-recursive rsync for each one, you may be able to parallelize the
> workload across several clients simultaneously.  It may still max out the
> links of those clients (assuming your source read throughput and target
> write throughput bottlenecks aren't encountered first), but it may finish
> in roughly 1/100th of the time if you can use 100+ machines.
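>
>     A rough sketch of that approach (paths, shard count, and file names are
> placeholders): enumerate the directories once, split the list across the
> clients, and have each client rsync just the top level of every directory
> it owns:
>
>         cd /gpfs/home/user
>         find . -type d > /tmp/dirs.txt
>         # GNU split: 100 chunks, keeping whole lines; one chunk per client
>         split -n l/100 /tmp/dirs.txt /tmp/shard.
>
>         # on each client, using its assigned shard file:
>         cd /gpfs/home/user
>         while read -r d; do
>             mkdir -p "/research/project/user/$d"
>             rsync -dlptgoD "$d/" "/research/project/user/$d/"
>         done < /tmp/shard.aa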
>
>     -Paul
>     -----Original Message-----
>     From: gpfsug-discuss-bounces at spectrumscale.org <
> gpfsug-discuss-bounces at spectrumscale.org> On Behalf Of Buterbaugh, Kevin L
>     Sent: Thursday, October 18, 2018 2:26 PM
>     To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
>     Subject: Re: [gpfsug-discuss] Best way to migrate data
>
>     Hi Dwayne,
>
>     I’m assuming you can’t just let an rsync run, possibly throttled in
> some way?  If not, and if you’re just tapping out your network, then would
> it be possible to go old school?  We have parts of the Medical Center here
> where their network connections are … um, less than robust.  So they tar
> stuff up to a portable HD, sneakernet it to us, and we untar it on an
> NSD server.
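>
>     For the throttled option, rsync’s --bwlimit is the usual knob, e.g.
> something like this (the limit is in KB/s and the value here is only an
> example):
>
>         rsync -a --bwlimit=200000 /gpfs/home/user/ /research/project/user/
>
>     And the old-school version is just tar in, carry, tar out (the mount
> point for the portable drive is a placeholder):
>
>         tar -C /gpfs/home/user -cf /mnt/portable/user.tar .
>         # ...walk the drive over to an NSD server, then...
>         tar -C /research/project/user -xf /mnt/portable/user.tar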
>
>     HTH, and I really hope that someone has a better idea than that!
>
>     Kevin
>
>     > On Oct 18, 2018, at 12:19 PM, Dwayne.Hart at med.mun.ca wrote:
>     >
>     > Hi,
>     >
>     > Just wondering what the best recipe is for migrating a user’s home
> directory content from one GPFS file system to another, which hosts a larger
> research GPFS file system? I’m currently using rsync and it has maxed out
> the client system’s IB interface.
>     >
>     > Best,
>     > Dwayne
>     > —
>     > Dwayne Hart | Systems Administrator IV
>     >
>     > CHIA, Faculty of Medicine
>     > Memorial University of Newfoundland
>     > 300 Prince Philip Drive
>     > St. John’s, Newfoundland | A1B 3V6
>     > Craig L Dobbin Building | 4M409
>     > T 709 864 6631
>
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>


More information about the gpfsug-discuss mailing list