[gpfsug-discuss] very low read performance in simple spectrum scale/gpfs cluster with a storage-server SAN

Giovanni Bracco giovanni.bracco at enea.it
Fri Jun 5 14:53:23 BST 2020


Answers inline in the text below.

On 05/06/20 14:58, Jan-Frode Myklebust wrote:
> 
> Could maybe be interesting to drop the NSD servers, and let all nodes 
> access the storage via srp ?

No, we cannot: the production fabric is a mix of a QDR-based cluster and 
an OPA-based cluster, and the NSD servers provide the service to both.

> 
> Maybe turn off readahead, since it can cause performance degradation 
> when GPFS reads 1 MB blocks scattered on the NSDs, so that read-ahead 
> always reads too much. This might be the cause of the slow read seen — 
> maybe you’ll also overflow it if reading from both NSD-servers at the 
> same time?

I have switched readahead off and this produced a small (~10%) 
increase in performance when reading from an NSD server, but no change 
in the bad behaviour on the GPFS clients.
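
For reference, a sketch of how readahead can be disabled at the Linux
block-device level on the NSD servers (device names here are
placeholders, not the actual configuration):

   blockdev --getra /dev/mapper/mpatha    # current readahead, in 512-byte sectors
   blockdev --setra 0 /dev/mapper/mpatha  # disable readahead on the multipath device
   blockdev --setra 0 /dev/mapper/mpathb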

> 
> 
> Plus.. it’s always nice to give a bit more pagepool to the clients than 
> the default.. I would prefer to start with 4 GB.

we'll do that as well and let you know!
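
For reference, a minimal sketch of raising the pagepool on the client
nodes (node names are placeholders), typically followed by a restart of
GPFS on those nodes for the new value to take effect:

   mmchconfig pagepool=4G -N client1,client2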

Giovanni

> 
> 
> 
>    -jf
> 
> fre. 5. jun. 2020 kl. 14:22 skrev Giovanni Bracco 
> <giovanni.bracco at enea.it <mailto:giovanni.bracco at enea.it>>:
> 
>     In our lab we have received two storage servers, Supermicro
>     SSG-6049P-E1CR24L, with 24 HDDs each (9 TB SAS3) and an Avago 3108 RAID
>     controller (2 GB cache). Before putting them into production for other
>     purposes we have set up a small GPFS test cluster to verify whether they
>     can be used as storage (our GPFS production cluster is licensed per
>     NSD server socket, so it would be interesting to expand the storage
>     size just by adding storage servers to an InfiniBand-based SAN, without
>     changing the number of NSD servers).
> 
>     The test cluster consists of:
> 
>     1) two NSD servers (IBM x3550 M2), each with a dual-port QDR InfiniBand
>     Truescale HCA
>     2) a Mellanox FDR switch used as the SAN switch
>     3) a Truescale QDR switch as the GPFS cluster switch
>     4) two GPFS clients (Supermicro AMD nodes), with one QDR port each
> 
>     All the nodes run CentOS 7.7.
> 
>     On each storage server a RAID 6 volume of 11 disks, 80 TB, has been
>     configured and is exported over InfiniBand as an SRP target (LIO), so
>     that both volumes appear as devices accessed by srp_daemon on the NSD
>     servers, where multipath (not really necessary in this case) has been
>     configured for these two LIO-ORG devices.
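> 
>     (A rough sketch of the kind of targetcli commands involved in such a
>     LIO/SRP export; backstore name, device path and port GUID below are
>     placeholders, not the actual configuration:)
> 
>        targetcli /backstores/block create name=raid6_vd2 dev=/dev/sdb
>        targetcli /srpt create 0xfe80000000000000aabbccddeeff0011
>        targetcli /srpt/ib.fe80000000000000aabbccddeeff0011/luns \
>            create /backstores/block/raid6_vd2
>        targetcli saveconfig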
> 
>     GPFS version 5.0.4-0 has been installed and RDMA has been properly
>     configured.
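> 
>     (The RDMA-related settings are of the usual kind, e.g. the following;
>     the port name is a placeholder for the actual HCA ports:)
> 
>        mmchconfig verbsRdma=enable
>        mmchconfig verbsPorts="qib0/1"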
> 
>     Two NSDs have been created and a GPFS file system has been
>     configured.
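> 
>     (A sketch of the kind of NSD stanza and commands involved; device
>     paths and server names are placeholders, only the NSD and file system
>     names match the configuration shown below:)
> 
>        # nsd.stanza - one entry per RAID 6 LUN
>        %nsd: nsd=nsdfs4lun2 device=/dev/mapper/mpatha servers=nsd1,nsd2 usage=dataAndMetadata failureGroup=1
>        %nsd: nsd=nsdfs5lun2 device=/dev/mapper/mpathb servers=nsd2,nsd1 usage=dataAndMetadata failureGroup=2
> 
>        mmcrnsd -F nsd.stanza
>        mmcrfs vsd_gexp2 -F nsd.stanza -B 1M -T /gexp2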
> 
>     Very simple tests have been performed using lmdd serial write/read.
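> 
>     (The exact invocations are not reproduced here; they were something
>     along these lines, e.g. a 100 GB file written and read back in 1 MiB
>     records, with placeholder paths:)
> 
>        lmdd if=internal of=/gexp2/testfile bs=1m count=102400 fsync=1   # write
>        lmdd if=/gexp2/testfile of=internal bs=1m count=102400           # read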
> 
>     1) storage-server local performance: before configuring the RAID 6
>     volume as an NSD, a local XFS file system was created on it and lmdd
>     write/read performance for a 100 GB file was verified to be about
>     1 GB/s.
> 
>     2) once the GPFS cluster had been created, write/read tests were
>     performed directly from one NSD server at a time:
> 
>     write performance 2 GB/s, read performance 1 GB/s for a 100 GB file
> 
>     By checking with iostat, it was observed that the I/O in this case
>     involved only the NSD server where the test was performed, so when
>     writing, double the base performance was obtained, while reading gave
>     the same performance as on a local file system; this seems correct.
>     Values are stable when the test is repeated.
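> 
>     (The iostat check was of the form "iostat -xm 2", run on both NSD
>     servers while the test was running; the exact options are an
>     assumption.)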
> 
>     3) when the same test is performed from the GPFS clients, the lmdd
>     results for a 100 GB file are:
> 
>     write - 900 MB/s and stable, not too bad but half of what is seen from
>     the NSD servers.
> 
>     read - 30 MB/s to 300 MB/s: very low and unstable values
> 
>     No tuning of any kind has been applied to any of the involved systems;
>     only default values are used.
> 
>     Any suggestions to explain the very bad read performance from a GPFS
>     client?
> 
>     Giovanni
> 
>     Here are the configuration of the virtual drive on the storage server
>     and the file system configuration in GPFS:
> 
> 
>     Virtual drive
>     ==============
> 
>     Virtual Drive: 2 (Target Id: 2)
>     Name                :
>     RAID Level          : Primary-6, Secondary-0, RAID Level Qualifier-3
>     Size                : 81.856 TB
>     Sector Size         : 512
>     Is VD emulated      : Yes
>     Parity Size         : 18.190 TB
>     State               : Optimal
>     Strip Size          : 256 KB
>     Number Of Drives    : 11
>     Span Depth          : 1
>     Default Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
>     Current Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
>     Default Access Policy: Read/Write
>     Current Access Policy: Read/Write
>     Disk Cache Policy   : Disabled
> 
> 
>     GPFS file system from mmlsfs
>     ============================
> 
>     mmlsfs vsd_gexp2
>     flag                value                    description
>     ------------------- ------------------------ -----------------------------------
>        -f                 8192                     Minimum fragment (subblock) size in bytes
>        -i                 4096                     Inode size in bytes
>        -I                 32768                    Indirect block size in bytes
>        -m                 1                        Default number of metadata replicas
>        -M                 2                        Maximum number of metadata replicas
>        -r                 1                        Default number of data replicas
>        -R                 2                        Maximum number of data replicas
>        -j                 cluster                  Block allocation type
>        -D                 nfs4                     File locking semantics in effect
>        -k                 all                      ACL semantics in effect
>        -n                 512                      Estimated number of nodes that will mount file system
>        -B                 1048576                  Block size
>        -Q                 user;group;fileset       Quotas accounting enabled
>                           user;group;fileset       Quotas enforced
>                           none                     Default quotas enabled
>        --perfileset-quota No                       Per-fileset quota enforcement
>        --filesetdf        No                       Fileset df enabled?
>        -V                 22.00 (5.0.4.0)          File system version
>        --create-time      Fri Apr  3 19:26:27 2020 File system creation time
>        -z                 No                       Is DMAPI enabled?
>        -L                 33554432                 Logfile size
>        -E                 Yes                      Exact mtime mount option
>        -S                 relatime                 Suppress atime mount option
>        -K                 whenpossible             Strict replica allocation option
>        --fastea           Yes                      Fast external attributes enabled?
>        --encryption       No                       Encryption enabled?
>        --inode-limit      134217728                Maximum number of inodes
>        --log-replicas     0                        Number of log replicas
>        --is4KAligned      Yes                      is4KAligned?
>        --rapid-repair     Yes                      rapidRepair enabled?
>        --write-cache-threshold 0                   HAWC Threshold (max 65536)
>        --subblocks-per-full-block 128              Number of subblocks per full block
>        -P                 system                   Disk storage pools in file system
>        --file-audit-log   No                       File Audit Logging enabled?
>        --maintenance-mode No                       Maintenance Mode enabled?
>        -d                 nsdfs4lun2;nsdfs5lun2    Disks in file system
>        -A                 yes                      Automatic mount option
>        -o                 none                     Additional mount options
>        -T                 /gexp2                   Default mount point
>        --mount-priority   0                        Mount priority
> 
> 
>     -- 
>     Giovanni Bracco
>     phone  +39 351 8804788
>     E-mail giovanni.bracco at enea.it <mailto:giovanni.bracco at enea.it>
>     WWW http://www.afs.enea.it/bracco
> 
> 

-- 
Giovanni Bracco
phone  +39 351 8804788
E-mail  giovanni.bracco at enea.it
WWW http://www.afs.enea.it/bracco


