[gpfsug-discuss] Same file opened by many nodes / processes

Peter Childs p.childs at qmul.ac.uk
Tue Jul 10 22:06:27 BST 2018


The reason I think the metanode is moving around is that I made a limited attempt to track it down using "mmfsadm saferdump file", and it moved before I had found the correct metanode. But I might have been chasing ghosts, so it may be operating normally and be nothing to worry about.

The user reading the file only has read access to it according to the file permissions.

Mmbackup has only slowed down while this job has been running. As I say, the scan for what to back up usually takes 40-60 minutes, but is currently taking 3-4 hours with these jobs running. I've seen it take 3 days when our storage went bad (slow and failing disks), but that is usually a sign of a bad disk, and pulling the disk and rebuilding the RAID "fixed" that straight away. I can't see anything like that currently, however.

It might be that it's network congestion we're suffering from and nothing to do with token management, but as the mmpmon bytes-read data is running very high with this job and the load is spread over 50+ nodes, it's difficult to see one culprit. It's a mixed-speed Ethernet network, mainly 10Gb connected, although the nodes in question are legacy with only 1Gb connections (and 40Gb to the back of the storage).

We're currently running 4.2.3-8.

Peter Childs
Research Storage
ITS Research and Teaching Support
Queen Mary, University of London

---- IBM Spectrum Scale wrote ----

What is in the dump that indicates the metanode is moving around?  Could you please provide an example of what you are seeing?

You noted that the access is all read only. Is the file opened for read only or for read and write?

What makes you state that this particular file is interfering with the scan done by mmbackup?  Reading a file, no matter how large, should not significantly impact a policy scan.

What version of Spectrum Scale are you running and how large is your cluster?

Regards, The Spectrum Scale (GPFS) team

------------------------------------------------------------------------------------------------------------------
If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWorks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479.

If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract, please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries.

The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team.



From:        Peter Childs <p.childs at qmul.ac.uk>
To:        "gpfsug-discuss at spectrumscale.org" <gpfsug-discuss at spectrumscale.org>
Date:        07/10/2018 10:51 AM
Subject:        [gpfsug-discuss] Same file opened by many nodes / processes
Sent by:        gpfsug-discuss-bounces at spectrumscale.org
________________________________



We have a situation where the same file is being read by around 5000
"jobs". This is an array job in UGE with a tc set, so the file in
question is being opened by about 100 processes/jobs at the same time.

It's a ~200GB file, so copying the file locally first is not an easy
answer, and these jobs are causing issues with mmbackup scanning the
file system, in that the scan is taking 3 hours instead of the normal
40-60 minutes.

This is read-only access to the file; I don't know the specifics about
the job.

It looks like the metanode is moving around a fair amount (given what I
can see from "mmfsadm saferdump file").

I'm wondering if there is anything we can do to improve things or
that can be tuned within GPFS. I don't think we have an issue with
token management, but would increasing maxFilesToCache on our token
manager node help, say?

Is there anything else I should look at to try to allow
GPFS to share this file better?

Thanks in advance

Peter Childs

--
Peter Childs
ITS Research Storage
Queen Mary, University of London



