[gpfsug-discuss] GPFS(snapshot, backup) vs. GPFS(backup scripts) vs. TSM(backup)

Jaime Pinto pinto at scinet.utoronto.ca
Thu Mar 10 11:17:41 GMT 2016


Here is some feedback on the use of mmbackup:

Last night I decided to test mmbackup again, with the simplest possible  
syntax (see below), and it ran like a charm!

We have a 15TB GPFS file system with some 41 million files, running GPFS  
v3.5; it certainly behaved better than what I remember from when I last  
tried this under 3.3 or 3.2, though I still didn't specify a snapshot.

I guess it didn't really matter. My idea of sourcing the dsmenv file  
normally used by the TSM BA client before starting mmbackup was just  
what I needed to land the backup material in the same pool, using  
the same policies the TSM BA client normally uses for this file  
system. To my surprise, mmbackup was smart enough to query the proper  
TSM database for all files already there and perform the incremental  
backup just as the TSM client would on its own.
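
In practice the recipe was just this (the dsmenv path below is only a  
placeholder for wherever your BA client environment file lives):

   # source the same TSM BA client environment normally used by dsmc
   . /usr/local/tsm/dsmenv
   # then run mmbackup with the simplest possible syntax
   mmbackup /sysadmin -t incremental -s /tmp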

Best of all: it took just under 7 hours, while previously the TSM  
client was taking over 27 hours; that is nearly 1/4 of the time, using  
the same node! This is really good, since now I can finally do a true  
*daily* backup of this FS, so I'll be refining and adopting this process  
moving forward, possibly adding a few more nodes as traversing helpers.
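
If I read the mmbackup man page right, those helpers would go in via  
the -N option, something along these lines (node names made up):

   # hand the directory traversal work to a list of helper nodes
   mmbackup /sysadmin -t incremental -s /tmp -N node01,node02,node03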

Cheers
Jaime



[root at gpc-f114n016 bin]# mmbackup /sysadmin -t incremental -s /tmp
--------------------------------------------------------
mmbackup: Backup of /sysadmin begins at Wed Mar 9 19:45:27 EST 2016.
--------------------------------------------------------
Wed Mar  9 19:45:48 2016 mmbackup:Could not restore previous shadow  
file from TSM server TAPENODE
Wed Mar  9 19:45:48 2016 mmbackup:Querying files currently backed up  
in TSM server:TAPENODE.
Wed Mar  9 21:55:59 2016 mmbackup:Built query data file from TSM  
server: TAPENODE rc = 0
Wed Mar  9 21:56:01 2016 mmbackup:Scanning file system sysadmin
Wed Mar  9 23:47:53 2016 mmbackup:Reconstructing previous shadow file  
/sysadmin/.mmbackupShadow.1.TAPENODE from query data for TAPENODE
Thu Mar 10 01:05:06 2016 mmbackup:Determining file system changes for  
sysadmin [TAPENODE].
Thu Mar 10 01:08:40 2016 mmbackup:changed=26211, expired=30875,  
unsupported=0 for server [TAPENODE]
Thu Mar 10 01:08:40 2016 mmbackup:Sending files to the TSM server  
[26211 changed, 30875 expired].
Thu Mar 10 01:38:41 2016 mmbackup:Expiring files: 0 backed up, 15500  
expired, 0 failed.
Thu Mar 10 02:42:08 2016 mmbackup:Backing up files: 10428 backed up,  
30875 expired, 72 failed.
Thu Mar 10 02:58:40 2016 mmbackup:mmapplypolicy for Backup detected  
errors (rc=9).
Thu Mar 10 02:58:40 2016 mmbackup:Completed policy backup run with 0  
policy errors, 72 files failed, 0 severe errors, returning rc=9.
Thu Mar 10 02:58:40 2016 mmbackup:Policy for backup returned 9 Highest  
TSM error 4
mmbackup: TSM Summary Information:
	Total number of objects inspected: 	57086
	Total number of objects backed up: 	26139
	Total number of objects updated: 	0
	Total number of objects rebound: 	0
	Total number of objects deleted: 	0
	Total number of objects expired: 	30875
	Total number of objects failed: 	72
Thu Mar 10 02:58:40 2016 mmbackup:Analyzing audit log file  
/sysadmin/mmbackup.audit.sysadmin.TAPENODE
Thu Mar 10 02:58:40 2016 mmbackup:72 files not backed up for this  
server. ( failed:72 )
Thu Mar 10 02:58:40 2016 mmbackup:Worst TSM exit 4
Thu Mar 10 02:58:41 2016 mmbackup:72 failures were logged.  
Compensating shadow database...
Thu Mar 10 03:06:23 2016 mmbackup:Analysis complete.
	72 of 72 failed or excluded paths compensated for in 1 pass(es).
Thu Mar 10 03:09:08 2016 mmbackup:TSM server TAPENODE
	had 72 failures or excluded paths and returned 4.
	Its shadow database has been updated.
Thu Mar 10 03:09:08 2016 mmbackup:Incremental backup completed with  
some skipped files.
	TSM had 0 severe errors and returned 4. See the TSM log file for more  
information.
  	72 files had errors, TSM audit logs recorded 72 errors from 1 TSM  
servers, 0 TSM servers skipped.
   exit 4

----------------------------------------------------------
mmbackup: Backup of /sysadmin completed with some skipped files at Thu  
Mar 10 03:09:11 EST 2016.
----------------------------------------------------------
mmbackup: Command failed.  Examine previous error messages to determine cause.






Quoting Jaime Pinto <pinto at scinet.utoronto.ca>:

> Quoting Yaron Daniel <YARD at il.ibm.com>:
>
>> Hi
>>
>> Did u use mmbackup with TSM ?
>>
>> https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adm.doc/bl1adm_mmbackup.htm
>
> I have used mmbackup in test mode a few times before, under gpfs
> 3.2 and 3.3, but not yet under 3.5 or the 4.x series (not installed in
> our facility yet).
>
> Under both 3.2 and 3.3, mmbackup would always lock up our cluster when
> using a snapshot. I never understood its behavior without a snapshot, and
> the lockup was intermittent in the carved-out small test cluster, so I
> never felt confident enough to deploy it over the larger cluster with
> 4000+ clients.
>
> Another issue was that the version of mmbackup at the time would not let
> me choose the client environment associated with a particular gpfs file
> system, fileset or path, nor the equivalent storage pool and/or policy
> on the TSM side.
>
> With the native TSM client we can do this by configuring the dsmenv
> file, and even the NODENAME/ASNODENAME, etc., with which to access TSM,
> so we can keep the backups segregated on different pools/tapes if
> necessary (by user, by group, by project, etc.).
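>
> For example, a dsmenv plus a dsm.sys stanza along these lines (all
> names and paths made up for illustration) is the sort of thing that
> points one file system's backups at its own TSM node:
>
>    # dsmenv (sourced before running dsmc)
>    export DSM_DIR=/opt/tivoli/tsm/client/ba/bin
>    export DSM_CONFIG=/opt/tivoli/tsm/client/ba/bin/dsm.opt.sysadmin
>    export DSM_LOG=/var/log/tsm
>
>    # dsm.sys stanza for this file system's backups
>    SERVERNAME  TAPENODE
>      NODENAME    GPC-SYSADMIN
>      ASNODENAME  PROJECT-SYSADMIN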
>
> The problem we all agree on is that TSM client traversing is VERY SLOW,
> and cannot be parallelized. I always knew that the mmbackup client was
> supposed to replace the TSM client for the traversing, and then pass
> the "necessary parameters" and file lists to the native TSM client, so it
> could then take over for the remainder of the workflow.
>
> Therefore, the remaining problems are as follows:
> * I never understood the snapshot-induced lockup, and how to fix it.
> Was it due to the size of our cluster or the version of GPFS? Has it
> been addressed under the 3.5 or 4.x series? Without the snapshot, how
> would mmbackup know what has already gone to backup since the previous
> incremental backup? Does it check each file against what is already on
> TSM to build the list of candidates? What is the experience out there?
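>
> (The v4r2 man page does list a -S SnapshotName option, so in principle
> something like the following should give a consistent point-in-time
> backup; untested on our side, and the snapshot name is made up:
>
>    mmcrsnapshot sysadmin backup_snap
>    mmbackup /sysadmin -t incremental -s /tmp -S backup_snap
>    mmdelsnapshot sysadmin backup_snap )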
>
> * In the v4r2 version of the manual for the mmbackup utility we still
> don't seem to be able to specify which TSM BA Client dsmenv to use as
> a parameter. All we can do is choose the --tsm-servers
> TSMServer[,TSMServer...] option. I can only conclude that all the
> contents of any backup on the GPFS side will always end up in a default
> storage pool and use the standard TSM policy if nothing else is done.
> I'm now wondering if it would be OK to simply 'source dsmenv' from a
> shell for each instance of mmbackup we fire up, in addition to setting
> the other variables such as MMBACKUP_DSMC_MISC, MMBACKUP_DSMC_BACKUP,
> etc., as described in the man page.
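>
> That is, something like this (the dsmenv path and the option strings
> passed through to dsmc are only guesses at what one might need):
>
>    . /usr/local/tsm/dsmenv
>    export MMBACKUP_DSMC_MISC="-asnodename=PROJECT-SYSADMIN"
>    export MMBACKUP_DSMC_BACKUP="-asnodename=PROJECT-SYSADMIN"
>    mmbackup /sysadmin -t incremental -s /tmp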
>
> * What about the restore side of things? Most mm* commands can only be
> executed by root. Do we still have to rely on the TSM BA Client
> (dsmc|dsmj) if unprivileged users want to restore their own files?
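>
> (Presumably a user restoring their own tree would still do something
> like the following; the path is made up:
>
>    dsmc restore "/sysadmin/home/jdoe/*" -subdir=yes )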
>
> I guess I'll have to conduct more experiments.
>
>
>
>>
>> Please also review this :
>>
>> http://files.gpfsug.org/presentations/2015/SBENDER-GPFS_UG_UK_2015-05-20.pdf
>>
>
> This is pretty good as a high-level overview. Much better than a few
> others I've seen with the release of the Spectrum Suite, since it focuses
> entirely on GPFS/TSM/backup|HSM. It would be nice to have some
> typical implementation examples.
>
>
>
> Thanks a lot for the references Yaron, and again thanks for any further
> comments.
> Jaime
>
>
>>
>>
>> Regards
>>
>>
>>
>>
>>
>> Yaron Daniel
>> Server, Storage and Data Services - Team Leader
>> Global Technology Services
>> IBM Israel
>> 94 Em Ha'Moshavot Rd, Petach Tiqva, 49527, Israel
>> Phone: +972-3-916-5672
>> Fax: +972-3-916-5672
>> Mobile: +972-52-8395593
>> e-mail: yard at il.ibm.com
>>
>> gpfsug-discuss-bounces at spectrumscale.org wrote on 03/09/2016 09:56:13 PM:
>>
>>> From: Jaime Pinto <pinto at scinet.utoronto.ca>
>>> To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
>>> Date: 03/09/2016 09:56 PM
>>> Subject: [gpfsug-discuss] GPFS(snapshot, backup) vs. GPFS(backup
>>> scripts) vs. TSM(backup)
>>> Sent by: gpfsug-discuss-bounces at spectrumscale.org
>>>
>>> Here is another area where I've been reading material from several
>>> sources for years, and in fact trying one solution over another from
>>> time to time in a test environment. However, to date I have not been
>>> able to find a single document where all these different IBM
>>> alternatives for backup are discussed at length, with the pros and cons
>>> well explained, along with the how-tos.
>>>
>>> I'm currently using TSM (built-in backup client), and over the years I
>>> developed a set of tricks to rely on disk-based volumes as
>>> intermediate cache, and on multiple backup client nodes, to split the
>>> load and substantially improve the performance of the backup compared
>>> to when I first deployed this solution. However, I suspect it could
>>> still be improved further if I were to apply tools from the GPFS side
>>> of the equation.
>>>
>>> I would appreciate any comments/pointers.
>>>
>>> Thanks
>>> Jaime
>>>
>>
>
>
>
>
>
>






          ************************************
           TELL US ABOUT YOUR SUCCESS STORIES
          http://www.scinethpc.ca/testimonials
          ************************************
---
Jaime Pinto
SciNet HPC Consortium  - Compute/Calcul Canada
www.scinet.utoronto.ca - www.computecanada.org
University of Toronto
256 McCaul Street, Room 235
Toronto, ON, M5T1W5
P: 416-978-2755
C: 416-505-1477

----------------------------------------------------------------
This message was sent using IMP at SciNet Consortium, University of Toronto.




