[gpfsug-discuss] Best way to migrate data : Plan B: policy engine + rsync

Alexander Saupp Alexander.Saupp at de.ibm.com
Tue Oct 23 06:51:54 BST 2018



Hi,

I agree, a tool with proper wrapping delivered in samples would be the
right approach.

No warranty, no support - below is a prototype I documented 2 years ago
(prior to mmfind availability). The BP used an alternate approach, so it's
not tested at scale, but the principle was tested and works.
Reading through it now, I'd re-test the 'delete files on the destination
that were deleted on the source' scenario; that might need some fixing by
now.

# Use 'GPFS patched' rsync on both ends to keep GPFS attributes
      https://github.com/gpfsug/gpfsug-tools/tree/master/bin/rsync

# Policy - initial & differential. Add MODIFICATION_TIME > .. for
incremental runs. Use MODIFICATION_TIME < .. to have a defined start for
the next incremental rsync; remove it for the 'final' rsync.
      # http://www.ibm.com/support/knowledgecenter/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adv.doc/bl1adv_usngfileattrbts.htm

      cat /tmp/policy.pol
      RULE 'mmfind'
          LIST 'mmfindList'
          DIRECTORIES_PLUS
          SHOW(
                VARCHAR(MODE) || ' ' ||
                VARCHAR(NLINK) || ' ' ||
                VARCHAR(USER_ID) || ' ' ||
                VARCHAR(GROUP_ID) || ' ' ||
                VARCHAR(FILE_SIZE) || ' ' ||
                VARCHAR(KB_ALLOCATED) || ' ' ||
                VARCHAR(POOL_NAME) || ' ' ||
                VARCHAR(MISC_ATTRIBUTES) || ' ' ||
                VARCHAR(ACCESS_TIME) || ' ' ||
                VARCHAR(CREATION_TIME) || ' ' ||
                VARCHAR(MODIFICATION_TIME)
              )
      /* Keep exactly one WHERE clause active per run */
      /* First run */
          WHERE MODIFICATION_TIME < TIMESTAMP('2016-08-10 00:00:00')
      /* Incremental runs
          WHERE MODIFICATION_TIME >= TIMESTAMP('2016-08-10 00:00:00') AND
                MODIFICATION_TIME < TIMESTAMP('2016-08-20 00:00:00') */
      /* Final run during maintenance; it should also handle deletes, so
         make sure you call rsync the proper way (--delete)
          WHERE TRUE */
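
A minimal sketch of how the three WHERE variants could be generated instead of edited by hand. The function name policy_where and its arguments are made up for illustration. The lower bound uses >= (a choice of this sketch) so that a file modified exactly at the boundary is picked up by one of the runs.

```shell
#!/bin/sh
# Print the WHERE clause for one migration pass (hypothetical helper).
#   policy_where first END
#   policy_where incremental START END
#   policy_where final
policy_where() {
    case "$1" in
        first)
            # Everything modified before the cut-off timestamp
            printf "WHERE MODIFICATION_TIME < TIMESTAMP('%s')\n" "$2"
            ;;
        incremental)
            # Everything modified inside the window [START, END)
            printf "WHERE MODIFICATION_TIME >= TIMESTAMP('%s') AND MODIFICATION_TIME < TIMESTAMP('%s')\n" "$2" "$3"
            ;;
        final)
            # No time bound for the last pass during maintenance
            printf "WHERE TRUE\n"
            ;;
        *)
            echo "usage: policy_where first END | incremental START END | final" >&2
            return 1
            ;;
    esac
}
```

For example, policy_where incremental '2016-08-10 00:00:00' '2016-08-20 00:00:00' prints the clause to append to the rule in /tmp/policy.pol for that window.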


# Apply policy; '-I defer' ensures the result file(s) are not deleted
      mmapplypolicy group3fs -P /tmp/policy.pol -f /ibm/group3fs/pol.txt -I defer

# FYI only - look at the results (not required)
      # cat /ibm/group3fs/pol.txt.list.mmfindList
      3 1 0  drwxr-xr-x 4 0 0 262144 512 system D2u 2016-08-25 08:30:35.053057 -- /ibm/group3fs
      41472 1077291531 0  drwxr-xr-x 5 0 0 4096 0 system D2u 2016-08-18 21:07:36.996777 -- /ibm/group3fs/ces
      60416 842873924 0  drwxr-xr-x 4 0 0 4096 0 system D2u 2016-08-18 21:07:45.947920 -- /ibm/group3fs/ces/ha
      60417 2062486126 0  -rw-r--r-- 1 0 0 0 0 system FAu 2016-08-19 15:17:57.428922 -- /ibm/group3fs/ces/ha/.dummy
      60418 436745294 0  drwxr-xr-x 4 0 0 4096 0 system D2u 2016-08-18 21:05:54.482094 -- /ibm/group3fs/ces/ces
      60419 647668346 0  -rw-r--r-- 1 0 0 0 0 system FAu 2016-08-19 15:17:57.484923 -- /ibm/group3fs/ces/ces/.dummy
      60420 1474765985 0  -rw-r--r-- 1 0 0 0 0 system FAu 2016-08-18 21:06:43.133640 -- /ibm/group3fs/ces/ces/addrs/1471554403-node0-9.155.118.69
      60421 1020724013 0  drwxr-xr-x 2 0 0 4096 0 system D2um 2016-08-18 21:07:37.000695 -- /ibm/group3fs/ces/ganesha
      cat /ibm/group3fs/pol.txt.list.mmfindList  |awk ' { print $19}'

      /ibm/group3fs/ces/ha/.dummy
      /ibm/group3fs/ces/ces/.dummy
      /ibm/group3fs/ces/ha/nfs/ganesha/v4recov/node3
      /ibm/group3fs/ces/ha/nfs/ganesha/v4old/node3
      /ibm/group3fs/pol.txt.list.mmfindList
      /ibm/group3fs/ces/ces/connections
      /ibm/group3fs/ces/ha/nfs/ganesha/gpfs-epoch
      /ibm/group3fs/ces/ha/nfs/ganesha/v4recov
      /ibm/group3fs/ces/ha/nfs/ganesha/v4old
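
The awk call above depends on the path being field 19 and breaks as soon as a path name contains a blank. A minimal sketch that cuts each record at the ' -- ' separator instead, assuming the list format shown above, no ' -- ' inside the SHOW text, and no newlines in path names. The function name list_paths is made up for illustration.

```shell
#!/bin/sh
# Print only the path names from a policy list file (hypothetical helper).
# $1 = list file written by mmapplypolicy, e.g. pol.txt.list.mmfindList
list_paths() {
    awk '{
        i = index($0, " -- ")              # first occurrence of the separator
        if (i > 0) print substr($0, i + 4) # everything after it is the path
    }' "$1"
}
```

Usage would be along the lines of list_paths /ibm/group3fs/pol.txt.list.mmfindList > /tmp/rsync.files, where /tmp/rsync.files is just an example name.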

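# Start rsync - could split up the single result file into multiple ones
for parallel / multi node runs. --files-from expects a file that contains
only path names, so extract the path column first (field 19 with the
policy above, assuming no blanks in path names). The paths in the list are
absolute, hence '/' as the source directory; /tmp/rsync.files is just an
example name.
      awk '{ print $19 }' /ibm/group3fs/pol.txt.list.mmfindList > /tmp/rsync.files
      rsync -av --gpfs-attrs --progress --files-from=/tmp/rsync.files / 10.10.10.10:/path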
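
The split for parallel runs mentioned above could look roughly like this. It is a sketch only: the function name split_list, the chunk prefix and the chunk count are made up, and the input is assumed to be a file with one path name per line (not the raw policy list). Lines are distributed round-robin so the chunks end up about the same size.

```shell
#!/bin/sh
# Split a file of path names into N chunk files (hypothetical helper).
# $1 = input file, one path per line
# $2 = number of chunks
# $3 = output prefix; chunks are written to PREFIX.0 .. PREFIX.(N-1)
split_list() {
    awk -v n="$2" -v prefix="$3" '{
        # line number modulo n picks the chunk this path goes to
        print > (prefix "." (NR % n))
    }' "$1"
}

# Possible use, one rsync per chunk in the background (not run here):
#   split_list /tmp/rsync.files 4 /tmp/rsync.chunk
#   for f in /tmp/rsync.chunk.*; do
#       rsync -av --gpfs-attrs --files-from="$f" / 10.10.10.10:/path &
#   done
#   wait
```

For multi node runs the chunk files would be handed to different nodes instead of background jobs on one node; how to distribute them is left open here.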

Be sure to verify that extended attributes are properly replicated. As far
as I remember, you need to ensure the 'remote' rsync is not the default one
but the one with GPFS capabilities (rsync --rsync-path=... picks the remote
binary; -e "remoteshell" only picks the remote shell).

Kind regards,
Alex Saupp


Alexander Saupp

IBM Systems, Storage Platform, EMEA Storage Competence Center