[gpfsug-discuss] mmbackup questions

Jonathan Buzzard jonathan.buzzard at strath.ac.uk
Thu Oct 17 19:04:47 BST 2019


On 17/10/2019 15:26, Skylar Thompson wrote:
> On Thu, Oct 17, 2019 at 10:26:45AM +0000, Jonathan Buzzard wrote:
>> I have been looking to give mmbackup another go (a very long history
>> with it being a pile of steaming dinosaur droppings last time I tried,
>> but that was seven years ago).
>>
>> Anyway having done a backup last night I am curious about something
>> that does not appear to be explained in the documentation.
>>
>> Basically the output has a line like the following
>>
>>          Total number of objects inspected:      474630
>>
>> What is this number? Is it the number of files that have changed since
>> the last backup or something else as it is not the number of files on
>> the file system by any stretch of the imagination. One would hope that
>> it inspected everything on the file system...
> 
> I believe this is the number of paths that matched some include rule (or
> didn't match some exclude rule) for mmbackup. I would assume it would
> differ from the "total number of objects backed up" line if there were
> include/exclude rules that mmbackup couldn't process, leaving it to dsmc to
> decide whether to process.
>   

After digging through dsminstr.log it would appear to be the sum of the 
new, changed and deleted files that mmbackup is going to process. There 
is some weird sh*t going on with mmbackup on the face of it though, 
where it sends one file at a time to the TSM server.

A line giving the total number of files in the file system (aka the 
potential backup candidates) would be nice, I think.
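For checking what the counter actually covers, the summary line can be pulled apart mechanically; a minimal sketch (the embedded sample line is copied from the output quoted above, and the idea of capturing it from a log file is an assumption):

```shell
# A minimal sketch: extract the counter from mmbackup's
# "Total number of objects inspected" summary line so it can be
# compared against the file system's real file count (e.g. from
# mmdf or df -i). The sample line below is from the output above.
summary='          Total number of objects inspected:      474630'
inspected=$(printf '%s\n' "$summary" | awk -F: '{gsub(/[[:space:]]/,"",$2); print $2}')
echo "inspected=$inspected"
```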

>> Also it appears that the shadow database is held on the GPFS file system
>> that is being backed up. Is there any way to change the location of that?
>> I am only using one node for backup (because I am cheap and don't like
>> paying for more PVUs than I need to) and would like to hold it on the
>> node doing the backup where I can put it on SSD. That does two things:
>> firstly it hopefully goes a lot faster, and secondly it reduces the
>> impact of the backup on the file system.
> 
> I haven't tried it, but there is a MMBACKUP_RECORD_ROOT environment
> variable noted in the mmbackup man page:
> 
>                    Specifies an alternative directory name for
>                    storing all temporary and permanent records for
>                    the backup. The directory name specified must
>                    be an existing directory and it cannot contain
>                    special characters (for example, a colon,
>                    semicolon, blank, tab, or comma).
> 
> Which seems like it might provide a mechanism to store the shadow database
> elsewhere. For us, though, we provide storage via a cost center, so we
> would want our customers to eat the full cost of their excessive file counts.
>   
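Untested, but based on the man page text quoted above, relocating the records to local SSD on the backup node might look something like this (the path, device and node names are assumptions):

```shell
# Hedged sketch: point mmbackup's temporary and permanent records
# (shadow database included) at a local SSD directory on the backup
# node. /ssd/mmbackup-records, /gpfs/fs0 and backupnode01 are all
# made-up names; see the mmbackup man page for MMBACKUP_RECORD_ROOT.
export MMBACKUP_RECORD_ROOT=/ssd/mmbackup-records
mkdir -p "$MMBACKUP_RECORD_ROOT"
mmbackup /gpfs/fs0 -t incremental -N backupnode01
```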

We have set a file quota of one million for all our users. So far only 
one user has actually needed it raised. It does however make users come 
and have a conversation with us about what they are doing. With that 
one exception, they have found ways to do their work without abusing 
the file system as a database.
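For reference, a per-user file quota like that can be set with mmsetquota; a hedged sketch (the device and user names are made up, and the exact option syntax should be checked against your Scale release):

```shell
# Hedged sketch: cap a user at one million files (soft limit equals
# hard limit here; fs0 and jbloggs are assumptions).
mmsetquota fs0 --user jbloggs --files 1000000:1000000
# Verify the limits took effect:
mmrepquota -u fs0
```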

We don't have an SSD storage pool on the file system, so moving it to 
the backup node, to which we can add SSD cheaply (I mean really really 
cheap these days), is more realistic than adding SSD as a storage pool 
to the file system. Once I am a bit more familiar with it I will try 
moving it to the system disks. They are not SSD at the moment, but if 
it works I can easily justify getting some and replacing the existing 
drives (it would just be two RAID rebuilds away).

Last time it was brought up you could not add extra shelves to an 
existing DSS-G system; you had to buy a whole new one. This is despite 
the servers shipping with a full complement of SAS cards and a large 
box full of 12Gbps SAS cables (well over £1000 worth at list, I 
reckon) that are completely useless. Ok, they work and I could use 
them elsewhere, but frankly why ship them if I can't expand!

>> Anyway a significant speed up (assuming it worked) was achieved, but I
>> note even the ancient Xeon E3113 (dual core 3GHz) was never taxed (load
>> average never went above one) and we didn't touch swap despite only
>> having 24GB of RAM. The 10GbE networking did get busy while data was
>> being transferred to the TSM server, but during the "assembly stage" it
>> was all a bit quiet, and the DSS-G server nodes were not busy either.
>> What options are there for tuning things? I feel it should be able to
>> go a lot faster.
> 
> We have some TSM nodes (corresponding to GPFS filesets) that stress out our
> mmbackup cluster at the sort step of mmbackup. UNIX sort is not
> RAM-friendly, as it happens.
> 

I have configured more monitoring of the system and will watch it over 
the coming days, but as far as I can tell nothing was stressed on our 
system at all, yet it was going slower than I had hoped. It was still 
way faster than a traditional dsmc incr, but I was hoping for more, 
though I am not sure why, as the backup now completes well inside my 
backup window. Perhaps I am being greedy.
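On the tuning question, mmbackup does expose a few knobs worth experimenting with; a hedged sketch (the paths, node name and thread counts are all guesses for a small dual-core node, and flag availability varies by Spectrum Scale release, so check your man page first):

```shell
# Hedged sketch: raise mmbackup parallelism and keep the policy
# engine's temporary files off the file system being backed up.
# Every name and value here is an assumption to be tuned against
# monitoring data, not a recommendation.
mmbackup /gpfs/fs0 -t incremental -N backupnode01 \
    -s /ssd/mmbackup-tmp \
    --backup-threads 4 --expire-threads 2 \
    --max-backup-count 4096
```

The -s local work directory and the thread options are passed through to the underlying policy scan and dsmc sessions, so watching the 10GbE interface and dsminstr.log while varying them should show which one is the bottleneck.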


JAB.

-- 
Jonathan A. Buzzard                         Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG


More information about the gpfsug-discuss mailing list