[gpfsug-discuss] Same file opened by many nodes / processes

Mon Jul 23 09:37:41 BST 2018

On Mon, 2018-07-23 at 00:51 +1200, José Filipe Higino wrote:

Hi there,

Have you been able to create a test case (replicate the problem)? Can you tell us a bit more about the setup?

Not really. It feels like a perfect storm: any one of the tasks running on its own would be fine; it's the sheer load. Our mmpmon data shows the storage flatlining when it occurs.

It's a reasonably standard (small) HPC cluster with a very mixed workload. We can usually find the "bad" jobs from an I/O point of view, and on this occasion we can see a few large array jobs all accessing the same file. The cluster runs fine up to a certain point, and then one more job tips the balance. We've been trying to contain the problem by limiting the number of tasks in an array that can run at once, but that feels like firefighting.

Are you using the GPFS API in place of any administrative commands? Any problems with the network (whether Ethernet or IB)?

We're not using the GPFS API. We never got it working, which is a shame; I've never managed to figure out the setup, although it is on my to-do list.

Network-wise, we've just removed a great deal of ARP request noise by increasing the ARP cache size on the nodes. It's currently a mixed 1GBit/10GBit network, and we're looking at removing all the 1GBit nodes within the next few months and adding some new, faster kit. The storage is attached at 40GBit, but it doesn't seem to want to run much above 5GBit; I suspect that's Ethernet back-off caused by the mixed speeds.

While we do have some IB we don't currently run our storage over it.
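The thread doesn't say which setting was changed to enlarge the ARP cache. On Linux the IPv4 neighbour (ARP) table is usually sized with the `net.ipv4.neigh.default.gc_thresh1`/`2`/`3` sysctls, with `gc_thresh3` as the hard upper limit, so here is a minimal sketch, assuming that is the knob in question, for comparing the current number of ARP entries (one per line after the header in `/proc/net/arp`) against those thresholds. The function names are hypothetical helpers, not part of any GPFS tooling.

```python
from pathlib import Path

# Assumption: a Linux node, where the IPv4 neighbour (ARP) table limits live here.
NEIGH_DIR = Path("/proc/sys/net/ipv4/neigh/default")
ARP_TABLE = Path("/proc/net/arp")


def read_gc_thresholds(neigh_dir: Path = NEIGH_DIR) -> dict:
    """Return gc_thresh1..3; gc_thresh3 is the hard cap on neighbour table size."""
    return {
        name: int((neigh_dir / name).read_text().strip())
        for name in ("gc_thresh1", "gc_thresh2", "gc_thresh3")
    }


def count_arp_entries(arp_text: str) -> int:
    """Count entries in /proc/net/arp content (the first line is a column header)."""
    lines = [line for line in arp_text.splitlines() if line.strip()]
    return max(len(lines) - 1, 0)


def arp_headroom(entries: int, thresholds: dict) -> int:
    """How many more entries fit before the hard limit (gc_thresh3) is reached."""
    return thresholds["gc_thresh3"] - entries


if __name__ == "__main__":
    if ARP_TABLE.exists() and NEIGH_DIR.exists():
        thresholds = read_gc_thresholds()
        entries = count_arp_entries(ARP_TABLE.read_text())
        print(f"ARP entries: {entries}, thresholds: {thresholds}")
        print(f"headroom before gc_thresh3: {arp_headroom(entries, thresholds)}")
    else:
        print("No Linux /proc neighbour table found on this host")
```

Running this on a busy node and on a quiet one gives a rough idea of how close the table gets to its limit as the number of peers grows.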

Thanks in advance

Peter Childs

Sorry for posting here for the first time without introducing myself, but I would like to help if I can.

Jose Higino,
from NIWA
New Zealand

Cheers

On Sun, 22 Jul 2018 at 23:26, Peter Childs <p.childs at qmul.ac.uk> wrote:
Yes, we run mmbackup, using a snapshot.

The scan usually takes an hour, but for the last week it has been taking many hours (I saw it take 12 last Tuesday).

It has sped up again, back to its normal hour, but the high-I/O jobs accessing the same file from many nodes also seem to have come to an end for the time being.

I was trying to figure out how to control the bad I/O using mmchqos, to prioritise certain nodes over others, but had not yet worked out whether that was possible.

We've only seen this problem before when we had some bad disks in our storage, which we replaced. I've checked, and I can't see that issue currently.

Thanks for the help.

Peter Childs
Research Storage
ITS Research and Teaching Support
Queen Mary, University of London

---- Yaron Daniel wrote ----

Hi

Do you run mmbackup on a snapshot, which is read-only?

Regards

________________________________

Yaron Daniel
Storage Architect – IL Lab Services (Storage)
IBM Global Markets, Systems HW Sales
94 Em Ha'Moshavot Rd, Petach Tiqva, 49527, Israel

Phone:  +972-3-916-5672
Fax:    +972-3-916-5672
Mobile: +972-52-8395593
e-mail: yard at il.ibm.com
IBM Israel<http://www.ibm.com/il/he/>


From:        Peter Childs <p.childs at qmul.ac.uk>
To:        "gpfsug-discuss at spectrumscale.org" <gpfsug-discuss at spectrumscale.org>
Date:        07/10/2018 05:51 PM
Subject:        [gpfsug-discuss] Same file opened by many nodes / processes
Sent by:        gpfsug-discuss-bounces at spectrumscale.org
________________________________

We have a situation where the same file is being read by around 5000
"jobs". This is an array job in UGE with a tc (maximum concurrent
tasks) limit set, so the file in question is being opened by about
100 processes/jobs at the same time.
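To illustrate what the tc setting does (in Grid Engine it is the `-tc` option to `qsub`, which caps how many tasks of an array job run at once), here is a toy model in Python, not UGE itself: a semaphore stands in for the tc limit and each thread stands in for one array task reading the shared file. The function name and parameters are hypothetical. The point it makes is that the limit bounds the number of simultaneous readers, not the total number of reads.

```python
import threading
import time


def run_array_job(n_tasks: int, max_concurrent: int, work_seconds: float = 0.01) -> int:
    """Simulate n_tasks array tasks with at most max_concurrent running at once.

    Returns the peak number of tasks observed "reading" at the same time.
    """
    slots = threading.BoundedSemaphore(max_concurrent)  # stands in for the tc limit
    lock = threading.Lock()
    active = 0
    peak = 0

    def task() -> None:
        nonlocal active, peak
        with slots:  # a task only starts once a slot is free
            with lock:
                active += 1
                peak = max(peak, active)
            time.sleep(work_seconds)  # stand-in for reading the shared file
            with lock:
                active -= 1

    threads = [threading.Thread(target=task) for _ in range(n_tasks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return peak


if __name__ == "__main__":
    # Scaled down from the ~5000 tasks / ~100 concurrent in the post.
    print("peak concurrent readers:", run_array_job(500, 50))
```

Every task still reads the file eventually; lowering the cap only spreads the same total I/O over a longer period.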

It's a ~200GB file, so copying it locally first is not an easy
answer, and these jobs are causing issues with mmbackup scanning the
file system: the scan is taking 3 hours instead of the normal
40-60 minutes.
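For a sense of scale when weighing up local copies, here is a back-of-the-envelope sketch of the ideal time to move the ~200GB file once over each link speed mentioned in the thread (1GBit and 10GBit nodes, the 40GBit storage link, and the ~5GBit the storage actually seems to reach). It assumes decimal gigabytes and ignores protocol overhead, contention and caching, so the figures are lower bounds, not measurements; `transfer_seconds` is a hypothetical helper.

```python
def transfer_seconds(size_bytes: float, link_gbit_per_s: float) -> float:
    """Ideal time to move size_bytes over a link, ignoring overhead and contention."""
    return size_bytes * 8 / (link_gbit_per_s * 1e9)


FILE_BYTES = 200e9  # "~200GB", assumed to be decimal gigabytes

LINKS = [
    ("1 GBit node", 1),
    ("10 GBit node", 10),
    ("observed ~5 GBit storage throughput", 5),
    ("40 GBit storage link", 40),
]

if __name__ == "__main__":
    for label, gbit in LINKS:
        minutes = transfer_seconds(FILE_BYTES, gbit) / 60
        print(f"{label}: {minutes:.1f} min per full copy")
```

That works out to 1600 s (about 27 minutes) per copy onto a 1GBit node and 320 s at ~5GBit. If, hypothetically, each of ~100 concurrent tasks read the whole file through the ~5GBit the storage seems to sustain, that alone would be 100 × 320 s, or roughly 8.9 hours of transfer.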

This is read-only access to the file; I don't know the specifics of
the job.

It looks like the metanode is moving around a fair amount (judging by
what I can see from mmfsadm saferdump file).

I'm wondering whether there is anything we can do to improve things,
or anything that can be tuned within GPFS. I don't think we have an
issue with token management, but would increasing maxFilesToCache on
our token manager node help, say?

Is there anything else I should look at to help GPFS share this
file better?

Thanks in advance

Peter Childs

_______________________________________________

gpfsug-discuss mailing list

gpfsug-discuss at spectrumscale.org

http://gpfsug.org/mailman/listinfo/gpfsug-discuss

--

Peter Childs
ITS Research Storage
Queen Mary, University of London
