[gpfsug-discuss] GPFS+TSM+HSM: staging vs. migration priority

Konstantin Arnold konstantin.arnold at unibas.ch
Thu Mar 10 08:56:01 GMT 2016


Hi Jaime,

... maybe I can offer some comments based on experience from the field:
I would suggest that, after reaching a high-watermark threshold, the
recall speed be throttled to a rate lower than the migration speed
(but still high enough not to run into a timeout). I don't think it's
a good idea to return "access denied" while trying to prioritize
migration. If non-IT people saw this message, they could think the
system was broken. It would also be unclear what a batch job that has
to prepare data would do; in the worst case, processing would start
with incomplete data.
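A minimal sketch of that throttling idea, purely illustrative: the watermark, the rates, and the function name are my assumptions, not a GPFS/TSM interface.

```python
# Illustrative sketch of recall throttling above a high watermark.
# All names and numbers here are assumptions for illustration only;
# this is not a real GPFS or Spectrum Protect API.

HIGH_WATERMARK = 0.90      # assumed: start throttling recalls above 90% occupancy
MIGRATION_RATE_MBS = 400   # assumed sustained migration speed
MIN_RECALL_RATE_MBS = 50   # floor: slow, but fast enough to avoid client timeouts

def recall_rate_limit(occupancy: float) -> float:
    """Return the allowed recall rate (MB/s) for a file system occupancy in [0, 1]."""
    if occupancy < HIGH_WATERMARK:
        return float("inf")  # below the watermark: recalls are not throttled
    # Above the watermark, scale recalls down as occupancy rises, keeping
    # them strictly below migration speed but never below the timeout floor.
    headroom = max(0.0, (1.0 - occupancy) / (1.0 - HIGH_WATERMARK))
    throttled = MIN_RECALL_RATE_MBS + headroom * (MIGRATION_RATE_MBS * 0.5)
    return min(throttled, MIGRATION_RATE_MBS * 0.5)
```

The key property is that above the watermark the returned rate is always lower than the migration rate, so migration can win the race for free space without recalls failing outright.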

We are currently recalling all our data on tape to be moved to a
different system. There is 15x more data on tape than would fit on the
disk pool (and there were millions of files before we set the inode
quota to a low number). We are moving one user/project after another
using tape-ordered recalls. For that, we had to disable a policy that
was aggressively pre-migrating files, which had allowed us to free
space on the disk pool quickly. I must admit it took us a while to
tune the thresholds and policies.
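The idea behind tape-ordered recalls can be sketched as follows; the (path, volume, position) triples and the function are my illustration of the concept, not the actual dsmrecall implementation.

```python
# Illustrative sketch of "tape-ordered" recalls: group pending recall
# requests by tape volume and sort by position on tape, so each tape is
# mounted once and read sequentially instead of seeking back and forth.
# The (path, volume, position) request format is an assumption.
from collections import defaultdict

def tape_ordered(requests):
    """requests: iterable of (path, volume, position) tuples.
    Returns the recall order: one pass per volume, sequential within it."""
    by_volume = defaultdict(list)
    for path, volume, position in requests:
        by_volume[volume].append((position, path))
    ordered = []
    for volume in sorted(by_volume):               # one mount per volume
        for _, path in sorted(by_volume[volume]):  # sequential reads on tape
            ordered.append(path)
    return ordered
```

As Dominic notes later in the thread, newer versions of Spectrum Protect for Space Management provide a tape-optimized recall method that does this ordering for you; the sketch only shows why it reduces load on the server.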

Best
Konstantin



On 03/09/2016 01:12 PM, Jaime Pinto wrote:
> Yes! A behavior along those lines would be desirable. Users understand
> very well what it means for a file system to be near full.
> 
> Are there any customers already doing something similar?
> 
> Thanks
> Jaime
> 
> Quoting Dominic Mueller-Wicke01 <dominic.mueller at de.ibm.com>:
> 
>>
>> Hi Jaime,
>>
>> I see. So the recall-shutdown would be something for a short time
>> period, right? Just for the time it takes to migrate files out and
>> free space. If HSM allowed the recall-shutdown, the impact for the
>> users would be that each access to a migrated file would lead to an
>> access-denied error. Would that be acceptable for the users?
>>
>> Greetings, Dominic.
>>
>> ______________________________________________________________________________________________________________
>>
>>
>> Dominic Mueller-Wicke | IBM Spectrum Protect Development | Technical
>> Lead |
>> +49 7034 64 32794 | dominic.mueller at de.ibm.com
>>
>> Chairwoman of the Supervisory Board: Martina Koederitz; Management:
>> Dirk Wittkopp
>> Registered office: Böblingen; registration court: Amtsgericht Stuttgart,
>> HRB 243294
>>
>>
>>
>> From:    Jaime Pinto <pinto at scinet.utoronto.ca>
>> To:    Dominic Mueller-Wicke01/Germany/IBM at IBMDE
>> Cc:    gpfsug-discuss at spectrumscale.org
>> Date:    08.03.2016 21:38
>> Subject:    Re: [gpfsug-discuss] GPFS+TSM+HSM: staging vs. migration
>>             priority
>>
>>
>>
>> Thanks for the suggestions, Dominic.
>>
>> I remember playing around with premigrated files at the time, and that
>> was not satisfactory.
>>
>> What we are looking for is a configuration-based parameter that will
>> basically break out of the "transparency for the user" mode and not
>> perform any further recalls, period, if/when file system occupancy is
>> above a certain threshold (98%). We would not mind if GPFS instead
>> issued a preemptive "disk full" error to any user/app/job relying on
>> those files being recalled, so that migration on demand would have a
>> chance to be performed. What we would prefer is to swap precedence,
>> i.e., any migration requests would be executed ahead of any recalls,
>> at least until a certain amount of free space on the file system has
>> been cleared.
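The behavior Jaime asks for can be sketched like this. It is purely illustrative: GPFS/HSM exposes no such hook (which is exactly the point of the request), and the class, threshold, and queue are my assumptions.

```python
# Illustrative sketch of the requested precedence swap: above a hard
# occupancy threshold, recalls fail fast with a "disk full" style error
# while queued migrations always run first. All names are assumptions.
import errno

HARD_THRESHOLD = 0.98  # assumed hard occupancy limit

class HsmQueue:
    def __init__(self):
        self.migrations = []
        self.recalls = []

    def submit_recall(self, path, occupancy):
        if occupancy >= HARD_THRESHOLD:
            # Preemptive "disk full" instead of a transparent recall.
            raise OSError(errno.ENOSPC, "file system above hard threshold", path)
        self.recalls.append(path)

    def next_job(self):
        # Migrations take precedence over recalls.
        if self.migrations:
            return ("migrate", self.migrations.pop(0))
        if self.recalls:
            return ("recall", self.recalls.pop(0))
        return None
```

The two pieces correspond to the two asks in the paragraph above: ENOSPC instead of a recall near 100%, and migrations drained ahead of recalls.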
>>
>> It's really important that this type of feature is present for us to
>> reconsider the TSM version of HSM as a solution. It's not clear from
>> the manual that this can be accomplished in some fashion.
>>
>> Thanks
>> Jaime
>>
>> Quoting Dominic Mueller-Wicke01 <dominic.mueller at de.ibm.com>:
>>
>>>
>>>
>>> Hi,
>>>
>>> In all cases, a recall request is handled transparently for the user
>>> at the time a migrated file is accessed. This can't be prevented and
>>> has two downsides: a) the space used in the file system increases,
>>> and b) random access to storage media on the Spectrum Protect server
>>> happens. With newer versions of Spectrum Protect for Space Management,
>>> a so-called tape-optimized recall method is available that can reduce
>>> the impact on the system (especially the Spectrum Protect server).
>>> If the problem was that the file system ran out of space at the time
>>> the recalls came in, I would recommend reducing the threshold settings
>>> for the file system and increasing the number of premigrated files.
>>> This allows space to be freed very quickly when needed. If you haven't
>>> used policy-based threshold migration so far, I recommend using it.
>>> This method is significantly faster than the classical HSM-based
>>> threshold migration approach.
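The high/low watermark semantics of threshold migration can be modeled roughly as below. This is an illustrative Python model of the behavior only; in GPFS it is expressed in the policy language, and the watermark values and function are my assumptions.

```python
# Illustrative model of threshold-migration semantics: once occupancy
# exceeds the high watermark, candidates (pre-sorted by policy weight)
# are migrated until occupancy drops below the low watermark. Migrating
# a premigrated file only needs its data blocks freed, which is why a
# large premigrated pool lets space be reclaimed quickly.

def threshold_migrate(total_bytes, used_bytes, candidates, high=0.90, low=0.80):
    """candidates: list of (path, size_bytes), already ordered by policy weight.
    Returns (paths_migrated, new_used_bytes)."""
    migrated = []
    if used_bytes / total_bytes <= high:
        return migrated, used_bytes          # below high watermark: nothing to do
    for path, size in candidates:
        if used_bytes / total_bytes < low:   # stop once low watermark is reached
            break
        used_bytes -= size                   # stubbing the file frees its blocks
        migrated.append(path)
    return migrated, used_bytes
```

Lowering the thresholds, as suggested above, simply moves the high/low pair down so the freeing starts earlier and reclaims a larger margin.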
>>>
>>> Greetings, Dominic.
>>>
>>>
>>> ----- Forwarded by Dominic Mueller-Wicke01/Germany/IBM on 08.03.2016
>> 18:21
>>> -----
>>>
>>> From:         Jaime Pinto <pinto at scinet.utoronto.ca>
>>> To:         gpfsug main discussion list
>>> <gpfsug-discuss at spectrumscale.org>
>>> Date:         08.03.2016 17:36
>>> Subject:         [gpfsug-discuss] GPFS+TSM+HSM: staging vs. migration
>> priority
>>> Sent by:         gpfsug-discuss-bounces at spectrumscale.org
>>>
>>>
>>>
>>> I'm wondering whether the new version of the "Spectrum Suite" will
>>> allow us to set the priority of HSM migration higher than staging.
>>>
>>>
>>> I ask this because back in 2011, when we were still using Tivoli HSM
>>> with GPFS, during mixed requests for migration and staging operations
>>> we had a very annoying behavior in which staging would always take
>>> precedence over migration. The end result was that GPFS would fill up
>>> to 100% and induce a deadlock on the cluster, unless we identified
>>> all the user-driven stage requests in time and killed them all. We
>>> contacted IBM support a few times asking for a way to fix this, and
>>> were told it was built into TSM. Back then we gave up on IBM's HSM
>>> primarily for this reason, although performance was also a
>>> consideration (more on this in another post).
>>>
>>> We are now reconsidering HSM for a new deployment, but only if this
>>> issue has been resolved (among a few others).
>>>
>>> What has been some of the experience out there?
>>>
>>> Thanks
>>> Jaime
>>>
>>>
>>>
>>>
>>> ---
>>> Jaime Pinto
>>> SciNet HPC Consortium  - Compute/Calcul Canada
>>> www.scinet.utoronto.ca - www.computecanada.org
>>> University of Toronto
>>> 256 McCaul Street, Room 235
>>> Toronto, ON, M5T1W5
>>> P: 416-978-2755
>>> C: 416-505-1477
>>>
>>> ----------------------------------------------------------------
>>> This message was sent using IMP at SciNet Consortium, University of
>>> Toronto.
>>>
>>>
>>> _______________________________________________
>>> gpfsug-discuss mailing list
>>> gpfsug-discuss at spectrumscale.org
>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>>>
>>>
>>>
>>>
>>
>>
>>
>>
>>
>>
>>           ************************************
>>            TELL US ABOUT YOUR SUCCESS STORIES
>>           http://www.scinethpc.ca/testimonials
>>           ************************************
> 
> 


