[gpfsug-discuss] Same file opened by many nodes / processes

Peter Childs p.childs at qmul.ac.uk
Sun Jul 22 12:26:35 BST 2018


Yes, we run mmbackup, using a snapshot.

The scan usually takes an hour, but for the last week it has been taking many hours (I saw it take 12 last Tuesday).

It has sped up again and is now back to its normal hour, but the high-I/O jobs accessing the same file from many nodes also appear to have finished for the time being.

I was trying to figure out how to control the bad I/O using mmchqos, to prioritise certain nodes over others, but had not yet worked out whether that is possible.

We've only previously seen this problem when we had some bad disks in our storage, which we replaced. I've checked and can't see that issue currently.

Thanks for the help.



Peter Childs
Research Storage
ITS Research and Teaching Support
Queen Mary, University of London

---- Yaron Daniel wrote ----

Hi

Do you run mmbackup on a snapshot, which is read-only?


Regards

________________________________



Yaron Daniel     94 Em Ha'Moshavot Rd

Storage Architect – IL Lab Services (Storage)    Petach Tiqva, 49527
IBM Global Markets, Systems HW Sales     Israel

Phone:  +972-3-916-5672
Fax:    +972-3-916-5672
Mobile: +972-52-8395593
e-mail: yard at il.ibm.com
IBM Israel<http://www.ibm.com/il/he/>






From:        Peter Childs <p.childs at qmul.ac.uk>
To:        "gpfsug-discuss at spectrumscale.org" <gpfsug-discuss at spectrumscale.org>
Date:        07/10/2018 05:51 PM
Subject:        [gpfsug-discuss] Same file opened by many nodes / processes
Sent by:        gpfsug-discuss-bounces at spectrumscale.org
________________________________



We have a situation where the same file is being read by around 5000
"jobs". This is an array job in UGE with a tc limit set, so the file in
question is being opened by about 100 processes/jobs at the same time.

It's a ~200GB file, so copying the file locally first is not an easy
answer. These jobs are causing issues with mmbackup scanning the
file system: the scan is taking 3 hours instead of the normal
40-60 minutes.

This is read-only access to the file; I don't know the specifics of
the job.
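
For a rough sense of scale, the figures above can be turned into a
back-of-envelope estimate. This is only a sketch: the assumption that
every job reads the whole file, the fraction_read parameter and the
one-hour job duration are all hypothetical, since the specifics of the
job are not known.

```python
# Back-of-envelope read-volume estimate for the array job described above.
# Figures from the post: ~200 GB file, ~5000 jobs, ~100 running at once.
# ASSUMPTIONS (not from the post): fraction_read and job_seconds are
# hypothetical, and client-side caching is ignored.

FILE_SIZE_GB = 200
TOTAL_JOBS = 5000
CONCURRENT_JOBS = 100


def aggregate_read_tb(file_size_gb, jobs, fraction_read=1.0):
    """Total data read over the whole array job, in TB (1 TB = 1000 GB)."""
    return file_size_gb * jobs * fraction_read / 1000.0


def concurrent_demand_gb_per_s(file_size_gb, concurrent, job_seconds,
                               fraction_read=1.0):
    """Average read rate needed to keep `concurrent` jobs fed, in GB/s."""
    return file_size_gb * fraction_read * concurrent / job_seconds


if __name__ == "__main__":
    total = aggregate_read_tb(FILE_SIZE_GB, TOTAL_JOBS)
    rate = concurrent_demand_gb_per_s(FILE_SIZE_GB, CONCURRENT_JOBS,
                                      job_seconds=3600)
    print(f"Whole-file reads by every job: {total:.0f} TB in total")
    print(f"{CONCURRENT_JOBS} concurrent one-hour jobs: {rate:.1f} GB/s on average")
```

If the jobs only read a small part of the file, these totals shrink
proportionally, so the real access pattern matters more than the file
size alone.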

It looks like the metanode is moving around a fair amount (given what I
can see from mmfsadm saferdump file).

I'm wondering if there is anything we can do to improve things, or
anything that can be tuned within GPFS. I don't think we have an issue
with token management, but would increasing maxFilesToCache on our token
manager node help, say?

Is there anything else I should look at to help GPFS share this file
better?

Thanks in advance

Peter Childs

--
Peter Childs
ITS Research Storage
Queen Mary, University of London
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss



