[gpfsug-discuss] gpfs performance monitoring

Sven Oehme oehmes at us.ibm.com
Thu Sep 4 01:50:25 BST 2014


> Hello everybody,

Hi

> here I come again, this time to ask for some hints about how to monitor GPFS.
> 
> I know about mmpmon, but the issue with its "fs_io_s" and "io_s" is
> that they return numbers based only on the requests made on the
> current host, so I have to run them on all the clients (over 600
> nodes), which is quite impractical. Instead I would like to know from
> the servers what is going on, and I came across the vio_s statistics,
> which are less documented, and I don't know exactly what they mean.
> There is also this script "/usr/lpp/mmfs/samples/vdisk/viostat" that
> runs VIO_S.
> 
> My problem with the output of this command:
>  echo "vio_s" | /usr/lpp/mmfs/bin/mmpmon -r 1
> 
> mmpmon> mmpmon node 10.7.28.2 name gss01a vio_s OK VIOPS per second
> timestamp:                          1409763206/477366
> recovery group:                     *
> declustered array:                  *
> vdisk:                              *
> client reads:                          2584229
> client short writes:                  55299693
> client medium writes:                   190071
> client promoted full track writes:      465145
> client full track writes:                 9249
> flushed update writes:                 4187708
> flushed promoted full track writes:        123
> migrate operations:                        114
> scrub operations:                       450590
> log writes:                           28509602
> 
> it says "VIOPS per second", but they seem to me to be just counters, as
> every time I re-run the command the numbers increase by a bit.
> Can anyone confirm whether those numbers are counters or OPS/sec?

The numbers are cumulative, so every time you run the command they just show 
the values accumulated since start (or the last reset).
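
If you want per-second numbers, you have to take two samples yourself and 
divide the difference by the sample interval. Below is a rough, untested 
Python sketch of that idea; it assumes the human-readable vio_s output format 
shown above (lines of the form "name:   value"), so treat it as a starting 
point rather than a finished tool.

#!/usr/bin/env python
# Rough sketch (untested): turn the cumulative vio_s counters into per-second
# rates by taking two samples and dividing the delta by the sample interval.
# Assumes the human-readable output format shown above ("name:   value").
import re
import subprocess
import time

CMD = 'echo vio_s | /usr/lpp/mmfs/bin/mmpmon -r 1'
INTERVAL = 10  # seconds between the two samples

def sample_vio_s():
    """Return a dict of counter name -> cumulative value from one vio_s call."""
    out = subprocess.check_output(CMD, shell=True, universal_newlines=True)
    counters = {}
    for line in out.splitlines():
        m = re.match(r'^([a-z ]+):\s+(\d+)$', line.strip())
        if m:
            counters[m.group(1)] = int(m.group(2))
    return counters

first = sample_vio_s()
time.sleep(INTERVAL)
second = sample_vio_s()

for name in sorted(second):
    rate = (second[name] - first.get(name, 0)) / float(INTERVAL)
    print('%-40s %12.1f ops/sec' % (name, rate))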

> 
> Looking more closely, I don't understand what most of those values
> mean. For example, what exactly is a "flushed promoted full track write"?
> I tried to find documentation about this output, but could not
> find any. Can anyone point me to a link where the output of vio_s is explained?
> 
> Another thing I don't understand about those numbers is whether they
> are just operations, or the number of blocks that were read/written/etc.

They are just operations, and if I explained what each number means I might 
confuse you even more, because this is not what you are really looking for.

What you are looking for is what the client I/Os look like on the server 
side, while the VIO layer sits between the server and the disks, so it is one 
level lower than what you are looking for, from what I can read out of the 
description above.

So the layer you care about is the NSD server layer, which sits on top of the 
VIO layer (which is essentially the software RAID layer in GNR).

> I'm asking because if they are just ops, I don't know how useful
> they could be. For example, one write operation could mean
> writing 1 block or writing a 100GB file. If those are operations,
> is there a way to get the output in bytes or blocks?

There are multiple ways to get information on the NSD layer. One would be to 
use the dstat plugin (see /usr/lpp/mmfs/samples/util), but those are 
cumulative counters again.

The alternative is to use mmdiag --iohist. This shows you a history of the 
last X I/O operations on either the client or the server side, e.g. on a 
client:

# mmdiag --iohist

=== mmdiag: iohist ===

I/O history:

 I/O start time RW    Buf type disk:sectorNum     nSec  time ms qTime ms   RpcTimes ms  Type  Device/NSD ID         NSD server
--------------- -- ----------- ----------------- -----  ------- -------- -----------------  ---- ------------------ ---------------
14:25:22.169617  R  LLIndBlock    1:1075622848      64   13.073    0.000   12.959    0.063  cli   C0A70401:53BEEA7F     192.167.4.1
14:25:22.182723  R       inode    1:1071252480       8    6.970    0.000    6.908    0.038  cli   C0A70401:53BEEA7F     192.167.4.1
14:25:53.659918  R  LLIndBlock    1:1081202176      64    8.309    0.000    8.210    0.046  cli   C0A70401:53BEEA7F     192.167.4.1
14:25:53.668262  R       inode    2:1081373696       8   14.117    0.000   14.032    0.058  cli   C0A70402:53BEEA5E     192.167.4.2
14:25:53.682750  R  LLIndBlock    1:1065508736      64    9.254    0.000    9.180    0.038  cli   C0A70401:53BEEA7F     192.167.4.1
14:25:53.692019  R       inode    2:1064356608       8   14.899    0.000   14.847    0.029  cli   C0A70402:53BEEA5E     192.167.4.2
14:25:53.707100  R       inode    2:1077830152       8   16.499    0.000   16.449    0.025  cli   C0A70402:53BEEA5E     192.167.4.2
14:25:53.723788  R  LLIndBlock    1:1081202432      64    4.280    0.000    4.203    0.040  cli   C0A70401:53BEEA7F     192.167.4.1
14:25:53.728082  R       inode    2:1081918976       8    7.760    0.000    7.710    0.027  cli   C0A70402:53BEEA5E     192.167.4.2
14:25:57.877416  R    metadata    2:678978560       16   13.343    0.000   13.254    0.053  cli   C0A70402:53BEEA5E     192.167.4.2
14:25:57.891048  R  LLIndBlock    1:1065508608      64   15.491    0.000   15.401    0.058  cli   C0A70401:53BEEA7F     192.167.4.1
14:25:57.906556  R       inode    2:1083476520       8   11.723    0.000   11.676    0.029  cli   C0A70402:53BEEA5E     192.167.4.2
14:25:57.918516  R  LLIndBlock    1:1075622720      64    8.062    0.000    8.001    0.032  cli   C0A70401:53BEEA7F     192.167.4.1
14:25:57.926592  R       inode    1:1076503480       8    8.087    0.000    8.043    0.026  cli   C0A70401:53BEEA7F     192.167.4.1
14:25:57.934856  R  LLIndBlock    1:1071088512      64    6.572    0.000    6.510    0.033  cli   C0A70401:53BEEA7F     192.167.4.1
14:25:57.941441  R       inode    2:1069885984       8   11.686    0.000   11.641    0.024  cli   C0A70402:53BEEA5E     192.167.4.2
14:25:57.953294  R       inode    2:1083476936       8    8.951    0.000    8.912    0.021  cli   C0A70402:53BEEA5E     192.167.4.2
14:25:57.965475  R       inode    1:1076503504       8    0.477    0.000    0.053    0.000  cli   C0A70401:53BEEA7F     192.167.4.1
14:25:57.965755  R       inode    2:1083476488       8    0.410    0.000    0.061    0.321  cli   C0A70402:53BEEA5E     192.167.4.2
14:25:57.965787  R       inode    2:1083476512       8    0.439    0.000    0.053    0.342  cli   C0A70402:53BEEA5E     192.167.4.2

You basically see whether it is an inode or a data block, what size it has 
(in sectors), which NSD server the request was sent to, etc.

On the server side you see the type, which physical disk it goes to, and 
what size of disk I/O it causes, like:

14:26:50.129995  R       inode   12:3211886376      64   14.261    0.000   0.000    0.000  pd   sdis
14:26:50.137102  R       inode   19:3003969520      64    9.004    0.000   0.000    0.000  pd   sdad
14:26:50.136116  R       inode   55:3591710992      64   11.057    0.000   0.000    0.000  pd   sdoh
14:26:50.141510  R       inode   21:3066810504      64    5.909    0.000   0.000    0.000  pd   sdaf
14:26:50.130529  R       inode   89:2962370072      64   17.437    0.000   0.000    0.000  pd   sddi
14:26:50.131063  R       inode   78:1889457000      64   17.062    0.000   0.000    0.000  pd   sdsj
14:26:50.143403  R       inode   36:3323035688      64    4.807    0.000   0.000    0.000  pd   sdmw
14:26:50.131044  R       inode   37:2513579736     128   17.181    0.000   0.000    0.000  pd   sddv
14:26:50.138181  R       inode   72:3868810400      64   10.951    0.000   0.000    0.000  pd   sdbz
14:26:50.138188  R       inode  131:2443484784     128   11.792    0.000   0.000    0.000  pd   sdug
14:26:50.138003  R       inode  102:3696843872      64   11.994    0.000   0.000    0.000  pd   sdgp
14:26:50.137099  R       inode  145:3370922504      64   13.225    0.000   0.000    0.000  pd   sdmi
14:26:50.141576  R       inode   62:2668579904      64    9.313    0.000   0.000    0.000  pd   sdou
14:26:50.134689  R       inode  159:2786164648      64   16.577    0.000   0.000    0.000  pd   sdpq
14:26:50.145034  R       inode   34:2097217320      64    7.409    0.000   0.000    0.000  pd   sdmt
14:26:50.138140  R       inode  139:2831038792      64   14.898    0.000   0.000    0.000  pd   sdlw
14:26:50.130954  R       inode  164:282120312       64   22.274    0.000   0.000    0.000  pd   sdzd
14:26:50.137038  R       inode   41:3421909608      64   16.314    0.000   0.000    0.000  pd   sdef
14:26:50.137606  R       inode  104:1870962416      64   16.644    0.000   0.000    0.000  pd   sdgx
14:26:50.141306  R       inode   65:2276184264      64   16.593    0.000   0.000    0.000  pd   sdrk

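Since you asked about bytes rather than just operation counts: the nSec 
column in iohist is the transfer size in sectors, so you can aggregate bytes 
and average latency per target from this output yourself. Here is a rough, 
untested Python sketch of that idea; it assumes the column layout shown above 
and 512-byte sectors. On a client the last column is the NSD server, on the 
server it is the physical disk, so the same grouping works for both.

#!/usr/bin/env python
# Rough sketch (untested): aggregate "mmdiag --iohist" output into bytes and
# average latency per target (NSD server on a client, physical disk on a
# server).  Assumes the column layout shown above and 512-byte sectors.
import subprocess
from collections import defaultdict

out = subprocess.check_output(["/usr/lpp/mmfs/bin/mmdiag", "--iohist"],
                              universal_newlines=True)

nbytes = defaultdict(int)
ops = defaultdict(int)
iotime = defaultdict(float)

for line in out.splitlines():
    fields = line.split()
    # Data rows start with a timestamp like 14:25:22.169617
    if not fields or fields[0].count(":") != 2:
        continue
    nsec = int(fields[4])       # transfer size in sectors
    io_ms = float(fields[5])    # I/O completion time in ms
    target = fields[-1]         # last column: NSD server or physical disk
    nbytes[target] += nsec * 512
    ops[target] += 1
    iotime[target] += io_ms

for target in sorted(nbytes):
    print("%-20s %6d ops %12.1f KiB %8.2f ms avg" %
          (target, ops[target], nbytes[target] / 1024.0,
           iotime[target] / ops[target]))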

> 
> Last but not least, and this is what I really would like to
> accomplish: I would like to be able to monitor the latency of metadata
> operations.

You can't do this on the server side, as you don't know how much time is 
spent on the client, the network, or anything else between the application 
and the physical disk, so you can only reliably look at this from the client. 
The iohist output only shows you the server-side disk I/O processing time, 
which can be just a fraction of the overall time (in other cases it can 
obviously also be the dominant part, depending on your workload).

The easiest way on the client is to run:

mmfsadm vfsstats enable

From now on, vfs statistics are collected until you restart GPFS.

Dumping the statistics collected so far on this node then gives output like this:

vfs statistics currently enabled
started at: Fri Aug 29 13:15:05.380 2014
  duration: 448446.970 sec

 name                    calls  time per call     total time
 -------------------- -------- -------------- --------------
 statfs                      9       0.000002       0.000021
 startIO              246191176       0.005853 1441049.976740

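Once you have that output captured (from one client or from many), a small 
filter can rank the most expensive VFS operations by total time. A rough, 
untested Python sketch, assuming the four-column format shown above and 
reading the captured text on stdin:

#!/usr/bin/env python
# Rough sketch (untested): rank captured vfs statistics by total time to spot
# the most expensive VFS operations.  Expects the four-column format shown
# above (name, calls, time per call, total time) on stdin.
import sys

rows = []
for line in sys.stdin:
    fields = line.split()
    if len(fields) != 4:
        continue
    try:
        rows.append((fields[0], int(fields[1]),
                     float(fields[2]), float(fields[3])))
    except ValueError:
        continue  # skips headers and separator lines

for name, calls, per_call, total in sorted(rows, key=lambda r: -r[3])[:20]:
    print("%-20s %12d calls %12.6f s/call %16.3f s total" %
          (name, calls, per_call, total))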

> In my environment there are users that literally overwhelm our
> storage with metadata requests, so even if there is no massive
> throughput or huge waiters, any "ls" can take ages. I would like
> to be able to monitor metadata behaviour. Is there a way to do
> that from the NSD servers?

As described above, there is no simple way to do that from the NSD servers.

> 
> Thanks in advance for any tip/help.
> 
> Regards,
> Salvatore

