[gpfsug-discuss] Same file opened by many nodes / processes

Frederick Stock stockf at us.ibm.com
Mon Jul 23 13:06:22 BST 2018


Have you considered keeping the 1G network for daemon traffic and moving 
the data traffic to another network?
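
If you go that route, the usual mechanism is the GPFS subnets setting; a
minimal sketch only, assuming a hypothetical faster data network on
10.20.0.0 (verify against the mmchconfig documentation for your release):

    # keep the existing 1G addresses as the admin/daemon network and let
    # GPFS prefer the faster subnet for data transfers between nodes
    mmchconfig subnets="10.20.0.0"
    mmlsconfig subnets    # confirm the setting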

Given the description of your configuration, with only two manager nodes 
handling mmbackup and other tasks, my guess is that this is where the 
performance problem lies when mmbackup runs while many nodes are accessing 
a single file.  You said the fs managers were on hardware; does that mean 
the other nodes in this cluster are VMs of some kind?

You stated that your NSD servers were underpowered.  Did you address that 
problem in any way, that is, by adding memory/CPUs, or did you just move 
other GPFS activity off of those nodes?

Fred
__________________________________________________
Fred Stock | IBM Pittsburgh Lab | 720-430-8821
stockf at us.ibm.com



From:   Peter Childs <p.childs at qmul.ac.uk>
To:     "gpfsug-discuss at spectrumscale.org" 
<gpfsug-discuss at spectrumscale.org>
Date:   07/23/2018 07:06 AM
Subject:        Re: [gpfsug-discuss] Same file opened by many nodes / 
processes
Sent by:        gpfsug-discuss-bounces at spectrumscale.org



On Mon, 2018-07-23 at 22:13 +1200, José Filipe Higino wrote:
I think the network problems need to be cleared first. Then I would 
investigate further. 

But if that is not a trivial path... 
Are you able to understand from the mmfslog what happens when the tipping 
point occurs?

mmfslog: that's not a term I've come across before. If you mean 
/var/adm/ras/mmfs.log.latest then I've already looked there, and there is 
not a lot in it. In other words, no expulsions or errors, just a very slow 
filesystem. We've not seen any significantly long waiters either (mmdiag 
--waiters), so as far as I can see it's just behaving like a very, very 
busy filesystem.
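
For reference, the checks I mean are roughly the following (a sketch only;
worth repeating on the manager nodes as well as the clients):

    # look for expels or errors in the daemon log around the slow period
    grep -iE "expel|error" /var/adm/ras/mmfs.log.latest

    # outstanding waiters on this node
    mmdiag --waiters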

We've already had IBM looking at the snaps due to the rather slow mmbackup 
process; all I've had back is to try increasing -a, i.e. the number of sort 
threads, which has sped it up to a certain extent. But once again I think 
we're looking at the results of the issue, not the cause.
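
(For reference, the tuning suggested was roughly the following; a sketch
only, where the device name, thread count and node list are examples:)

    # -a raises the number of parallel inode scan threads used by the
    # policy scan that mmbackup drives; 8 is only an example value
    mmbackup gpfs0 -t incremental -a 8 -N fsmanager1,fsmanager2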


In my view, when troubleshooting is not easy, the usual methods help to 
find the next step:
- Narrow the window of troubleshooting (by discarding, for now, events 
that did not happen within the same timeframe).
- Use events that are as precisely time-based as possible to read the 
reaction of the cluster (via logs or other sources), and make assumptions 
about other observed situations.
- If possible, and while the problem is happening, run some traces and 
gpfs.snap, and ask for support via a PMR (see the sketch below).
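
(If it helps, a minimal sketch of that data collection, assuming default
locations and that all nodes are reachable:)

    # gather a cluster snapshot to attach to the PMR
    gpfs.snap

    # optionally, a short trace while the slowdown is visible
    mmtracectl --start -N all
    # ... reproduce the problem for a few minutes ...
    mmtracectl --stop -N all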

Also,

What is the version of GPFS?

4.2.3-8 

How many quorum nodes?

4 quorum nodes with tiebreaker disks; however, these are not the file 
system manager nodes. To fix a previous problem (our NSD servers not being 
powerful enough), our fs manager nodes are on dedicated physical hardware. 
We have two file system manager nodes (which do token management, quota 
management, etc.) and they also run mmbackup.

How many filesystems?

1, although we do have a second that is accessed via multi-cluster from 
our older GPFS setup (that's running 4.2.3-6 currently).

Is the management network the same as the daemon network?

Yes, the management network and the daemon network are the same network. 

Thanks in advance

Peter Childs



On Mon, 23 Jul 2018 at 20:37, Peter Childs <p.childs at qmul.ac.uk> wrote:
On Mon, 2018-07-23 at 00:51 +1200, José Filipe Higino wrote:

Hi there, 

Have you been able to create a test case (replicate the problem)? Can you 
tell us a bit more about the setup?

Not really. It feels like a perfect storm: any one of the tasks running on 
its own would be fine; it's the sheer load. Our mmpmon data says the 
storage has been flatlining when it occurs.

It's a reasonably standard (small) HPC cluster with a very mixed workload, 
so while we can usually find "bad" jobs from an I/O point of view, on this 
occasion we can see a few large array jobs all accessing the same file. 
The cluster runs fine until we get to a certain point and one more job 
will tip the balance. We've been attempting to limit the problem by 
capping the number of jobs in an array that can run at once, but that 
feels like firefighting. 


Are you using the GPFS API rather than administrative commands? Any 
problems with the network (be it Ethernet or IB)?

We're not using the GPFS API; we never got it working, which is a shame. 
I've never managed to figure out the setup, although it is on my to-do 
list.

Network-wise, we've just removed a great deal of noise from ARP requests 
by increasing the ARP cache size on the nodes. It's a mixed 1Gbit/10Gbit 
network currently; we're looking at removing all the 1Gbit nodes within 
the next few months and adding some new, faster kit. The storage is 
attached at 40Gbit but it does not look to want to run much above 5Gbit, 
I suspect due to Ethernet back-off caused by the mixed speeds. 
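
(For anyone hitting the same ARP noise, the change was along these lines;
a sketch only, the threshold values are examples and need sizing to your
node count:)

    # /etc/sysctl.d/90-arp.conf -- raise the kernel neighbour table limits
    net.ipv4.neigh.default.gc_thresh1 = 4096
    net.ipv4.neigh.default.gc_thresh2 = 8192
    net.ipv4.neigh.default.gc_thresh3 = 16384

    # apply without a reboot
    sysctl -p /etc/sysctl.d/90-arp.conf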

While we do have some IB we don't currently run our storage over it.

Thanks in advance

Peter Childs





Sorry to appear here unannounced for the first time, but I would like to 
help if I can.

Jose Higino,
from NIWA
New Zealand

Cheers

On Sun, 22 Jul 2018 at 23:26, Peter Childs <p.childs at qmul.ac.uk> wrote:
Yes, we run mmbackup, using a snapshot.

The scan usually takes an hour, but for the last week it has been taking 
many hours (I saw it take 12 last Tuesday).
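
(For context, our snapshot-based run is essentially the standard pattern;
a sketch, where the device and snapshot names are examples only:)

    # back up from a snapshot so TSM reads a consistent, read-only image
    # while the live file system keeps changing
    mmcrsnapshot gpfs0 backup_snap
    mmbackup gpfs0 -t incremental -S backup_snap
    mmdelsnapshot gpfs0 backup_snap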

It's sped up again now, back to its normal hour, but the high-I/O jobs 
accessing the same file from many nodes also look to have come to an end 
for the time being.

I was trying to figure out how to control the bad I/O using mmchqos, to 
prioritise certain nodes over others, but had not worked out if that was 
possible yet.
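
(The QoS idea would look roughly like the following; a sketch only. As far
as I can tell, mmchqos throttles by IOPS class per storage pool rather
than per node, so per-node prioritisation may not map onto it directly;
the device, pool and numbers here are examples:)

    # cap "maintenance" I/O (policy scans, restripes, etc.) so normal
    # "other" I/O keeps priority in the system pool
    mmchqos gpfs0 --enable pool=system,maintenance=1000IOPS,other=unlimited
    mmlsqos gpfs0    # watch the effect while mmbackup runs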

We've only previously seen this problem when we had some bad disks in our 
storage, which we replaced; I've checked and I can't see that issue 
currently.

Thanks for the help.



Peter Childs
Research Storage
ITS Research and Teaching Support
Queen Mary, University of London

---- Yaron Daniel wrote ----

Hi

Do you run mmbackup on a snapshot, which is read-only?

 
Regards,

Yaron Daniel
Storage Architect – IL Lab Services (Storage)
IBM Global Markets, Systems HW Sales
94 Em Ha'Moshavot Rd, Petach Tiqva, 49527, Israel
Phone: +972-3-916-5672 | Fax: +972-3-916-5672 | Mobile: +972-52-8395593
e-mail: yard at il.ibm.com
IBM Israel



From:        Peter Childs <p.childs at qmul.ac.uk>
To:        "gpfsug-discuss at spectrumscale.org" <
gpfsug-discuss at spectrumscale.org>
Date:        07/10/2018 05:51 PM
Subject:        [gpfsug-discuss] Same file opened by many nodes / 
processes
Sent by:        gpfsug-discuss-bounces at spectrumscale.org



We have a situation where the same file is being read by around 5000
"jobs". This is an array job in UGE with a tc (task concurrency) limit
set, so the file in question is being opened by about 100 processes/jobs
at the same time.

It's a ~200GB file, so copying the file locally first is not an easy
answer, and these jobs are causing issues with mmbackup scanning the
file system, in that the scan is taking 3 hours instead of the normal
40-60 minutes.

This is read-only access to the file; I don't know the specifics of
the job.

It looks like the metanode is moving around a fair amount (given what I
can see from mmfsadm saferdump file)

I'm wondering if there is anything we can do to improve things, or that
can be tuned within GPFS. I don't think we have an issue with token
management, but would increasing maxFilesToCache on our token manager
node help, say?
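
(The sort of change I mean would be along these lines; a sketch only,
where the value and node names are examples, and my understanding is that
maxFilesToCache needs enough token/pagepool memory behind it and a daemon
restart on the affected nodes to take effect:)

    # check the current setting
    mmlsconfig maxFilesToCache

    # raise the file cache on the token manager / fs manager nodes only
    mmchconfig maxFilesToCache=200000 -N fsmanager1,fsmanager2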

Is there anything else I should look at to try to allow GPFS to share
this file better?

Thanks in advance

Peter Childs

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

-- 
Peter Childs
ITS Research Storage
Queen Mary, University of London


-- 
Peter Childs
ITS Research Storage
Queen Mary, University of London
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss







More information about the gpfsug-discuss mailing list