[gpfsug-discuss] GPFS+TSM+HSM: staging vs. migration priority

Jaime Pinto pinto at scinet.utoronto.ca
Thu Mar 10 10:55:21 GMT 2016


Quoting Konstantin Arnold <konstantin.arnold at unibas.ch>:

> Hi Jaime,
>
> ... maybe I can give some comments based on experience from the field:
> I would suggest that, after a high-watermark threshold is reached, the recall
> speed be throttled to a rate lower than the migration speed
> (but still high enough not to run into a timeout). I don't think it's a
> good idea to send "access denied" while trying to prioritize migration. If
> non-IT people saw this message they might think the system is
> broken. It would also be unclear what a batch job that has to prepare
> data would do; in the worst case, processing would start with incomplete data.

I wouldn't object to any strategy that lets us empty the vase quicker  
than it's being filled. It may just make the solution more complex for  
developers, since this feels a lot like a mini-scheduler.

On the other hand, I don't see much of an issue for non-IT people or
batch jobs that depend on the data being recalled: we already enable
quotas on our file systems. When quotas are reached the system is
supposed to "break" anyway for that particular user/group or
application, and they still have to handle that situation properly.
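
To make the "mini-scheduler" idea concrete, here is a toy Python sketch of the precedence swap discussed in this thread. Below a high watermark, recalls are served first, as today; above it, migrations take most of each dispatch round and recalls are throttled to a small share instead of being refused outright, so jobs slow down rather than getting access-denied errors. All names (HsmQueue, Request, recall_share, the 0.90 watermark) are hypothetical and do not correspond to any Spectrum Scale or Spectrum Protect interface; this only illustrates the policy, not how HSM works internally.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    kind: str  # "migrate" or "recall" (hypothetical request types)
    path: str


class HsmQueue:
    """Toy dispatcher, NOT a Spectrum Scale/Protect API.

    Below `high_watermark` recalls are served first; at or above it,
    migrations get priority and recalls are capped at `recall_share`
    of the slots in each dispatch round.
    """

    def __init__(self, high_watermark: float = 0.90, recall_share: float = 0.25):
        self.high_watermark = high_watermark
        self.recall_share = recall_share
        self.migrations: deque = deque()
        self.recalls: deque = deque()

    def submit(self, req: Request) -> None:
        queue = self.migrations if req.kind == "migrate" else self.recalls
        queue.append(req)

    def next_batch(self, occupancy: float, slots: int) -> list:
        """Choose up to `slots` requests given occupancy in [0.0, 1.0]."""
        batch = []
        if occupancy < self.high_watermark:
            # Normal mode: users are waiting on recalls, so serve them first.
            for queue in (self.recalls, self.migrations):
                while queue and len(batch) < slots:
                    batch.append(queue.popleft())
            return batch

        # Nearly full: drain faster than we fill. With more than one slot,
        # keep at least one recall per round so recalls slow down rather
        # than time out.
        recall_cap = max(1, int(slots * self.recall_share)) if slots > 1 else 0
        while self.migrations and len(batch) < slots - recall_cap:
            batch.append(self.migrations.popleft())
        while self.recalls and recall_cap > 0:
            batch.append(self.recalls.popleft())
            recall_cap -= 1
        # If recalls did not use their share, give the spare slots to migrations.
        while self.migrations and len(batch) < slots:
            batch.append(self.migrations.popleft())
        return batch


q = HsmQueue()
for i in range(5):
    q.submit(Request("recall", f"r{i}"))
    q.submit(Request("migrate", f"m{i}"))
print([r.path for r in q.next_batch(occupancy=0.95, slots=4)])  # ['m0', 'm1', 'm2', 'r0']
print([r.path for r in q.next_batch(occupancy=0.50, slots=4)])  # ['r1', 'r2', 'r3', 'r4']
```

Throttling instead of denying follows Konstantin's suggestion; the recall share would be the knob to tune so recalls never hit a timeout while the file system drains.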


>
> We are currently recalling all of our data on tape so it can be moved to a
> different system. There is 15x more data on tape than would fit on
> the disk pool (and there were millions of files before we set the inode quota
> to a low number). We are moving one user/project after another using
> tape-ordered recalls. For that we had to disable a policy that was
> aggressively pre-migrating files and allowed us to quickly free space on
> the disk pool. I must admit that it took us a while to tune thresholds
> and policies.

That is certainly an approach to consider. We still think the
application should be able to properly manage occupancy on the same
file system. We run a different system which also has a disk-based cache
layer, and the strategy there is to keep it as full as possible
(85-90%), so as to avoid retrieving data from tape whenever possible,
while still leaving some cushion for newly saved data. Indeed, finding
the sweet spot is a balancing act.
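
The "keep it 85-90% full" strategy is essentially a hysteresis band, similar in spirit to the high/low percentages of the GPFS policy THRESHOLD clause. A minimal sketch, assuming a hypothetical occupancy reading (fraction of capacity used) sampled periodically: migration starts when occupancy reaches the upper bound and keeps running until it falls back to the lower bound, so it does not flap on and off around a single threshold. The helper name and sample values are illustrative only.

```python
def migration_needed(occupancy: float, migrating: bool,
                     high: float = 0.90, low: float = 0.85) -> bool:
    """Hysteresis check for threshold migration (hypothetical helper).

    Start migrating once occupancy reaches `high`; once started, keep
    going until occupancy drops to `low` or below.
    """
    if migrating:
        return occupancy > low
    return occupancy >= high


# Occupancy samples over time; illustrative values, not measurements.
samples = [0.80, 0.88, 0.91, 0.87, 0.85, 0.88]
migrating = False
states = []
for occ in samples:
    migrating = migration_needed(occ, migrating)
    states.append(migrating)
print(states)  # [False, False, True, True, False, False]
```

The gap between the two bounds is the "cushion" for newly saved data; a wider gap means fewer migration runs but more free space sitting unused.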

Thanks for the feedback
Jaime


>
> Best
> Konstantin
>
>
>
> On 03/09/2016 01:12 PM, Jaime Pinto wrote:
>> Yes! A behavior along those lines would be desirable. Users understand
>> very well what it means for a file system to be near full.
>>
>> Are there any customers already doing something similar?
>>
>> Thanks
>> Jaime
>>
>> Quoting Dominic Mueller-Wicke01 <dominic.mueller at de.ibm.com>:
>>
>>>
>>> Hi Jaime,
>>>
>>> I see. So the recall shutdown would only be for a short time period,
>>> right? Just for the time it takes to migrate files out and free space. If
>>> HSM allowed the recall shutdown, the impact for the users would be that
>>> each access to a migrated file would lead to an "access denied" error.
>>> Would that be acceptable for the users?
>>>
>>> Greetings, Dominic.
>>>
>>> ______________________________________________________________________________________________________________
>>>
>>>
>>> Dominic Mueller-Wicke | IBM Spectrum Protect Development | Technical Lead |
>>> +49 7034 64 32794 | dominic.mueller at de.ibm.com
>>>
>>> Chairwoman of the Supervisory Board: Martina Koederitz; Management: Dirk Wittkopp
>>> Registered office: Böblingen; Commercial register: Amtsgericht Stuttgart,
>>> HRB 243294
>>>
>>>
>>>
>>> From:    Jaime Pinto <pinto at scinet.utoronto.ca>
>>> To:    Dominic Mueller-Wicke01/Germany/IBM at IBMDE
>>> Cc:    gpfsug-discuss at spectrumscale.org
>>> Date:    08.03.2016 21:38
>>> Subject:    Re: [gpfsug-discuss] GPFS+TSM+HSM: staging vs. migration
>>>             priority
>>>
>>>
>>>
>>> Thanks for the suggestions, Dominic
>>>
>>> I remember playing around with premigrated files at the time, and that
>>> was not satisfactory.
>>>
>>> What we are looking for is a configuration-based parameter that will
>>> basically break out of the "transparency for the user" mode and not
>>> perform any further recalls, period, if/when the file system
>>> occupancy is above a certain threshold (98%). We would not mind if
>>> GPFS instead issued a preemptive "disk full" error message to any
>>> user/app/job relying on those files being recalled, so that on-demand
>>> migration has a chance to be performed. What we would prefer is to swap
>>> precedence, i.e., any migration requests would be executed ahead of any
>>> recalls, at least until a certain amount of free space on the file
>>> system has been cleared.
>>>
>>> It's really important that this type of feature be present for us to
>>> reconsider the TSM version of HSM as a solution. It's not clear from
>>> the manual whether this can be accomplished in some fashion.
>>>
>>> Thanks
>>> Jaime
>>>
>>> Quoting Dominic Mueller-Wicke01 <dominic.mueller at de.ibm.com>:
>>>
>>>>
>>>>
>>>> Hi,
>>>>
>>>> In all cases, a recall request is handled transparently for the user
>>>> when a migrated file is accessed. This can't be prevented and has two
>>>> downsides: a) the space used in the file system increases, and b) random
>>>> access to storage media on the Spectrum Protect server occurs. With newer
>>>> versions of Spectrum Protect for Space Management, a so-called
>>>> tape-optimized recall method is available that can reduce the impact on
>>>> the system (especially the Spectrum Protect server).
>>>> If the problem was that the file system ran out of space at the time
>>>> the recalls came in, I would recommend reducing the threshold settings
>>>> for the file system and increasing the number of premigrated files. This
>>>> allows space to be freed very quickly if needed. If you haven't used
>>>> policy-based threshold migration so far, I recommend using it. This
>>>> method is significantly faster than the classical HSM-based threshold
>>>> migration approach.
>>>>
>>>> Greetings, Dominic.
>>>>
>>>>
>>>> ----- Forwarded by Dominic Mueller-Wicke01/Germany/IBM on 08.03.2016 18:21 -----
>>>>
>>>> From:         Jaime Pinto <pinto at scinet.utoronto.ca>
>>>> To:           gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
>>>> Date:         08.03.2016 17:36
>>>> Subject:      [gpfsug-discuss] GPFS+TSM+HSM: staging vs. migration priority
>>>> Sent by:      gpfsug-discuss-bounces at spectrumscale.org
>>>>
>>>>
>>>>
>>>> I'm wondering whether the new version of the "Spectrum Suite" will
>>>> allow us to set the priority of HSM migration higher than that of
>>>> staging.
>>>>
>>>>
>>>> I ask this because back in 2011, when we were still using Tivoli HSM
>>>> with GPFS, we saw a very annoying behavior during mixed migration and
>>>> staging requests: staging would always take precedence over migration.
>>>> The end result was that GPFS would fill up to 100% and induce a
>>>> deadlock on the cluster, unless we identified all the user-driven stage
>>>> requests in time and killed them. We contacted IBM support a few times
>>>> asking for a way to fix this, and were told this behavior was built into
>>>> TSM. Back then we gave up on IBM's HSM primarily for this reason,
>>>> although performance was also a consideration (more on this in another
>>>> post).
>>>>
>>>> We are now reconsidering HSM for a new deployment, however only if
>>>> this issue has been resolved (among a few others).
>>>>
>>>> What has been some of the experience out there?
>>>>
>>>> Thanks
>>>> Jaime
>>>>
>>>>
>>>>
>>>>
>>>> ---
>>>> Jaime Pinto
>>>> SciNet HPC Consortium  - Compute/Calcul Canada
>>>> www.scinet.utoronto.ca - www.computecanada.org
>>>> University of Toronto
>>>> 256 McCaul Street, Room 235
>>>> Toronto, ON, M5T1W5
>>>> P: 416-978-2755
>>>> C: 416-505-1477
>>>>
>>>> ----------------------------------------------------------------
>>>> This message was sent using IMP at SciNet Consortium, University of
>>>> Toronto.
>>>>
>>>>
>>>> _______________________________________________
>>>> gpfsug-discuss mailing list
>>>> gpfsug-discuss at spectrumscale.org
>>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>           ************************************
>>>            TELL US ABOUT YOUR SUCCESS STORIES
>>>           http://www.scinethpc.ca/testimonials
>>>           ************************************
>





