[gpfsug-discuss] Same file opened by many nodes / processes
IBM Spectrum Scale
scale at us.ibm.com
Tue Jul 10 23:15:01 BST 2018
Regarding the permissions on the file I assume you are not using ACLs,
correct? If you are then you would need to check what the ACL allows.
Is your metadata on separate NSDs? Having metadata on separate NSDs, and
preferably fast NSDs, would certainly help your mmbackup scanning.
Have you looked at the information from netstat or similar network tools
to see how your network is performing? Faster networks generally require
a bit of OS tuning and some GPFS tuning to optimize their performance.
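As a rough illustration of the kind of check suggested above: on Linux, the per-interface byte counters that netstat-style tools report can also be read from /proc/net/dev and sampled twice to estimate throughput. The sketch below is not part of GPFS; the function names `parse_net_dev` and `rates_mbit` are made up for this example, and it assumes the standard Linux /proc/net/dev layout (two header lines, then one line per interface).

```python
def parse_net_dev(text):
    """Return {interface: (rx_bytes, tx_bytes)} from /proc/net/dev content."""
    counters = {}
    for line in text.splitlines()[2:]:  # the first two lines are column headers
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = data.split()
        # Field 0 is received bytes, field 8 is transmitted bytes.
        counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def rates_mbit(before, after, seconds):
    """Per-interface (rx, tx) throughput in Mbit/s between two samples."""
    rates = {}
    for name, (rx1, tx1) in after.items():
        if name not in before:
            continue  # interface appeared between samples; skip it
        rx0, tx0 = before[name]
        rates[name] = ((rx1 - rx0) * 8 / seconds / 1e6,
                       (tx1 - tx0) * 8 / seconds / 1e6)
    return rates
```

To use it, read /proc/net/dev, wait a known number of seconds, read it again, and pass both parsed samples to `rates_mbit`. A 1Gb node that sits close to 1000 Mbit/s on receive for long periods is probably limited by its link.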
Regards, The Spectrum Scale (GPFS) team
------------------------------------------------------------------------------------------------------------------
If you feel that your question can benefit other users of Spectrum Scale
(GPFS), then please post it to the public IBM developerWorks Forum at
https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479
.
If your query concerns a potential software error in Spectrum Scale (GPFS)
and you have an IBM software maintenance contract please contact
1-800-237-5511 in the United States or your local IBM Service Center in
other countries.
The forum is informally monitored as time permits and should not be used
for priority messages to the Spectrum Scale (GPFS) team.
From: Peter Childs <p.childs at qmul.ac.uk>
To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Date: 07/10/2018 05:23 PM
Subject: Re: [gpfsug-discuss] Same file opened by many nodes /
processes
Sent by: gpfsug-discuss-bounces at spectrumscale.org
Oh, the cluster is currently 296 nodes, with the file system sized for 300
nodes (mmcrfs -n 300).
We're currently looking to upgrade the 1G connected nodes to 10G within
the next few months.
Peter Childs
Research Storage
ITS Research and Teaching Support
Queen Mary, University of London
---- Peter Childs wrote ----
The reason I think the metanode is moving around is I'd done a limited
amount of trying to track it down using "mmfsadm saferdump file" and it
moved before I'd tracked down the correct metanode. But I might have been
chasing ghosts, so it may be operating normally and nothing to worry
about.
The user reading the file only has read access to it from the file
permissions.
mmbackup has only slowed down while this job has been running. As I say,
the scan for what to back up usually takes 40-60 minutes, but is currently
taking 3-4 hours with these jobs running. I've seen it take 3 days when
our storage went bad (slow and failing disks), but that is usually a sign of
a bad disk, and pulling the disk and rebuilding the RAID "fixed" that
straight away. I can't see anything like that currently, however.
It might be that it's network congestion we're suffering from and nothing to
do with token management, but as the mmpmon bytes-read figure is running very
high with this job and the load is spread over 50+ nodes, it's difficult to
see one culprit. It's a mixed-speed Ethernet network, mainly 10Gb connected,
although the nodes in question are legacy with only 1Gb connections (and
40Gb to the back of the storage).
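To put the link speeds above in perspective, here is a back-of-the-envelope sketch of how long one full read of the file would take at each speed. It assumes the ~200GB figure from the original message, ideal line rate with no protocol overhead or caching, and that a task reads the whole file, which the thread does not confirm. The helper `seconds_to_read` is hypothetical.

```python
FILE_SIZE_GB = 200  # approximate file size quoted in the thread (decimal GB)


def seconds_to_read(size_gb, link_gbit_per_s, sharers=1):
    """Best-case seconds to move size_gb over a link shared equally by `sharers` readers."""
    size_gbit = size_gb * 8  # bytes to bits
    return size_gbit * sharers / link_gbit_per_s


# Link speeds mentioned in the thread: 1Gb legacy nodes, 10Gb nodes, 40Gb storage.
for link in (1, 10, 40):
    minutes = seconds_to_read(FILE_SIZE_GB, link) / 60
    print(f"{link:>2} Gbit/s link: about {minutes:.1f} minutes per full read")
```

Under these assumptions a single full read keeps a 1Gb link saturated for roughly 27 minutes, and two tasks sharing that node's link would each take about twice as long, so sustained saturation on the legacy nodes would not be surprising.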
We're currently running 4.2.3-8
Peter Childs
Research Storage
ITS Research and Teaching Support
Queen Mary, University of London
---- IBM Spectrum Scale wrote ----
What is in the dump that indicates the metanode is moving around? Could
you please provide an example of what you are seeing?
You noted that the access is all read only. Is the file opened for read
only or for read and write?
What makes you state that this particular file is interfering with the
scan done by mmbackup? Reading a file, no matter how large, should not
significantly impact a policy scan.
What version of Spectrum Scale are you running and how large is your
cluster?
Regards, The Spectrum Scale (GPFS) team
From: Peter Childs <p.childs at qmul.ac.uk>
To: "gpfsug-discuss at spectrumscale.org"
<gpfsug-discuss at spectrumscale.org>
Date: 07/10/2018 10:51 AM
Subject: [gpfsug-discuss] Same file opened by many nodes /
processes
Sent by: gpfsug-discuss-bounces at spectrumscale.org
We have a situation where the same file is being read by around 5000
"jobs". This is an array job in UGE with a tc limit set, so the file in
question is being opened by about 100 processes/jobs at the same time.
It's a ~200GB file, so copying the file locally first is not an easy
answer, and these jobs are causing issues with mmbackup scanning the
file system, in that the scan is taking 3 hours instead of the normal
40-60 minutes.
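For a sense of scale, the figures above can be worked through as follows. This is only a sketch: the constants are the approximate numbers from this message, and `fraction_read` is an assumption, since the message does not say how much of the file each task actually reads.

```python
import math

TOTAL_TASKS = 5000      # size of the array job
CONCURRENT_TASKS = 100  # approximate concurrency allowed by the tc limit
FILE_SIZE_GB = 200      # approximate size of the shared file


def waves(total, concurrent):
    """Number of successive batches needed to run every task."""
    return math.ceil(total / concurrent)


def total_read_tb(total, size_gb, fraction_read=1.0):
    """Aggregate volume in TB if each task reads `fraction_read` of the file (assumption)."""
    return total * size_gb * fraction_read / 1000


def scan_slowdown(slow_minutes, normal_minutes):
    """How many times longer the mmbackup scan takes than normal."""
    return slow_minutes / normal_minutes


print(waves(TOTAL_TASKS, CONCURRENT_TASKS), "waves of tasks")
print(total_read_tb(TOTAL_TASKS, FILE_SIZE_GB), "TB read if every task reads the whole file")
print(scan_slowdown(180, 60), "to", scan_slowdown(180, 40), "times slower scan")
```

If each task really did read the whole file, the job as a whole would pull on the order of a petabyte through the cluster, which is why the fraction actually read matters when judging the load.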
This is read-only access to the file; I don't know the specifics of
the job.
It looks like the metanode is moving around a fair amount (given what I
can see from mmfsadm saferdump file).
I'm wondering if there is anything we can do to improve things or
that can be tuned within GPFS. I don't think we have an issue with
token management, but would increasing maxFilesToCache on our token
manager node help, say?
Is there anything else I should look at to help GPFS share this file
better?
Thanks in advance
Peter Childs
--
Peter Childs
ITS Research Storage
Queen Mary, University of London
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss