[gpfsug-discuss] Same file opened by many nodes / processes

Steve Crusan scrusan at ddn.com
Tue Jul 10 18:09:48 BST 2018


I’ve used ‘preferDesignatedMnode=1’ in the past, but that was for a specific use case, and setting it would have to come at the direction of support.
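
For reference, undocumented tunables like that are set the same way as any documented GPFS attribute, via mmchconfig; a minimal sketch, assuming support has actually told you to set it:

    # undocumented tunable: set only at IBM support's direction
    /usr/lpp/mmfs/bin/mmchconfig preferDesignatedMnode=1 -i
    # -i applies the change immediately and makes it persistent
    /usr/lpp/mmfs/bin/mmlsconfig preferDesignatedMnode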

I guess if you wanted to test your metanode theory, you could open that file (and keep it open) on a node from a different remote cluster, or on one of your local NSD servers, and see what kind of results you get out of it.
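
Holding the file open for that test is trivial from a shell on the chosen node; a minimal sketch (the path is hypothetical):

    # open the file read-only on fd 3 and keep it open in this shell
    exec 3< /gpfs/fs0/path/to/the/file
    # ...observe metanode placement and performance, then release it
    exec 3<&-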




----
Steve Crusan
scrusan at ddn.com
(719) 695-3190


From: <gpfsug-discuss-bounces at spectrumscale.org> on behalf of Marc A Kaplan <makaplan at us.ibm.com>
Reply-To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Date: Tuesday, July 10, 2018 at 11:16 AM
To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Subject: Re: [gpfsug-discuss] Same file opened by many nodes / processes

I would start by making sure that the application(s)... open the file O_RDONLY. Then you may want to fiddle with the GPFS atime settings:

https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.ins.doc/bl1ins_atime.htm
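
A quick way to check both: confirm the jobs really do open read-only, then suppress atime updates file-system-wide. A sketch, where the PID and device name are hypothetical, and -S yes may need a remount to take effect:

    # watch the open flags used by one of the running jobs
    strace -f -e trace=open,openat -p <pid>
    # suppress atime updates on the file system, then verify
    /usr/lpp/mmfs/bin/mmchfs gpfs0 -S yes
    /usr/lpp/mmfs/bin/mmlsfs gpfs0 -S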

At first I thought "uge" was a typo, but I guess you are referring to:
https://supcom.hgc.jp/english/utili_info/manual/uge.html

Still, not being familiar with it, it would be "interesting" to know, from a file operations point of view, what's going on in terms of opens, reads, and closes per second.
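
mmpmon should give you exactly those numbers. A sketch, run as root on a client node; the counters are cumulative, so diff successive samples for per-second rates:

    # ten samples of this node's GPFS I/O statistics, one second apart;
    # io_s reports opens, closes, reads and writes, and -p gives
    # machine-parseable output
    echo io_s | /usr/lpp/mmfs/bin/mmpmon -p -r 10 -d 1000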



From:        Peter Childs <p.childs at qmul.ac.uk>
To:        "gpfsug-discuss at spectrumscale.org" <gpfsug-discuss at spectrumscale.org>
Date:        07/10/2018 10:51 AM
Subject:        [gpfsug-discuss] Same file opened by many nodes / processes
Sent by:        gpfsug-discuss-bounces at spectrumscale.org
________________________________



We have a situation where the same file is being read by around 5000
"jobs". This is an array job in uge with a tc (maximum concurrent
tasks) set, so the file in question is being opened by about 100
processes/jobs at the same time.

It's a ~200GB file, so copying the file locally first is not an easy
answer, and these jobs are causing issues with mmbackup scanning the
file system, in that the scan is taking 3 hours instead of the normal
40-60 minutes.

This is read-only access to the file; I don't know the specifics of
the job.

It looks like the metanode is moving around a fair amount (given what
I can see from mmfsadm saferdump file).
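
Roughly what I'm looking at, in case it matters; saferdump is
undocumented and its output format varies by release, and the path
here is made up:

    # get the file's inode number, then find its stanza in the dump
    stat -c %i /gpfs/fs0/path/to/the/file
    /usr/lpp/mmfs/bin/mmfsadm saferdump files | less
    # the per-file stanza reports which node currently holds the metanode role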

I'm wondering if there is anything we can do to improve things, or
anything that can be tuned within GPFS. I don't think we have an issue
with token management, but would increasing maxFilesToCache on our
token manager node help, say?
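
For context, this is roughly what checking and raising it would look
like; the value and node class below are made up, and maxFilesToCache
changes need the GPFS daemon restarted on the affected nodes before
they take effect:

    # current setting and token manager state
    /usr/lpp/mmfs/bin/mmlsconfig maxFilesToCache
    /usr/lpp/mmfs/bin/mmdiag --tokenmgr
    # hypothetical bump, limited to the manager nodes
    /usr/lpp/mmfs/bin/mmchconfig maxFilesToCache=500000 -N managerNodes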

Is there anything else I should look at to try and allow GPFS to share
this file better?

Thanks in advance

Peter Childs

--
Peter Childs
ITS Research Storage
Queen Mary, University of London
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss




