[gpfsug-discuss] Write performances and filesystem size

Jan-Frode Myklebust janfrode at tanso.net
Thu Nov 16 02:34:57 GMT 2017


Olaf, this looks like a Lenovo version of the «ESS GLxS». It should be using
the same number of spindles for any size of filesystem, so I would also
expect them to perform the same.
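
To put the figures quoted further down next to that expectation, here is a rough back-of-the-envelope sketch in Python. The numbers are the ones Ivano reported; the variable and function names are only made up for illustration.

```python
# Figures reported in the original post: fraction of the 1.2 PB capacity
# used by the filesystem -> aggregate write throughput in GB/s.
reported = {1.0: 14.0, 0.5: 10.0, 0.25: 5.0}

full_rate = reported[1.0]


def relative_throughput(measurements, baseline):
    """Return {capacity_fraction: throughput / baseline}."""
    return {frac: rate / baseline for frac, rate in measurements.items()}


ratios = relative_throughput(reported, full_rate)
for frac, rel in sorted(ratios.items(), reverse=True):
    print(f"capacity {frac:>4.0%}  throughput {rel:.0%} of the full-size filesystem")
```

If the same spindles really serve every filesystem size, all three rows would be expected to sit close to 100%, which is not what the reported figures show.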



-jf


On Wed, 15 Nov 2017 at 11:26, Olaf Weiser <olaf.weiser at de.ibm.com> wrote:

> To add a comment, very simply: it depends on how you allocate the
> physical block storage. If you simply use fewer physical resources
> when reducing the capacity (in the same ratio), you get what you
> see.
>
> So you need to tell us how you allocate your block storage. (Are you
> using RAID controllers? Where are your LUNs coming from? Are fewer
> RAID groups involved when you reduce the capacity?)
>
> GPFS can be configured to give you pretty much what the hardware can
> deliver. If you reduce resources, you'll get less; if you enhance
> your hardware, you get more, almost regardless of the total capacity in
> #blocks.
>
>
>
>
>
>
> From:        "Kumaran Rajaram" <kums at us.ibm.com>
> To:        gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
> Date:        11/15/2017 11:56 AM
> Subject:        Re: [gpfsug-discuss] Write performances and filesystem
> size
> Sent by:        gpfsug-discuss-bounces at spectrumscale.org
> ------------------------------
>
>
>
> Hi,
>
> >> Am I missing something? Is this expected behaviour, and does someone
> >> have an explanation for it?
>
> Based on your scenario, write degradation as the file-system is populated
> is possible if you formatted the file-system with "-j cluster".
>
> For consistent file-system performance, we recommend the mmcrfs "-j scatter"
> layoutMap. Also, we need to ensure that the mmcrfs "-n" value is set properly.
>
> [snip from mmcrfs]
>
> # mmlsfs <fs> | egrep 'Block allocation| Estimated number'
>  -j                 scatter                  Block allocation type
>  -n                 128                      Estimated number of nodes that will mount file system
> [/snip]
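
A minimal sketch of checking those two settings from a script, assuming the mmlsfs output keeps the three-column "flag, value, description" layout of the snippet above. The helper name parse_mmlsfs and the sample text are hypothetical, and the layout of real output may differ between releases.

```python
import re

# Sample text modelled on the snippet above; not captured from a real system.
SAMPLE = """\
 -j                 scatter                  Block allocation type
 -n                 128                      Estimated number of nodes that will mount file system
"""


def parse_mmlsfs(text):
    """Map each flag (e.g. '-j') to its value, assuming 'flag value description' rows."""
    settings = {}
    for line in text.splitlines():
        match = re.match(r"\s*(-\S+)\s+(\S+)\s+\S", line)
        if match:
            settings[match.group(1)] = match.group(2)
    return settings


settings = parse_mmlsfs(SAMPLE)
if settings.get("-j") != "scatter":
    print("block allocation type is not scatter")
print("estimated number of mounting nodes:", settings.get("-n"))
```

On a real cluster the text would come from running mmlsfs against the file system rather than from the SAMPLE string.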
>
>
> [snip from man mmcrfs]
> layoutMap={scatter | cluster}
>         Specifies the block allocation map type. When allocating
>         blocks for a given file, GPFS first uses a round-robin
>         algorithm to spread the data across all disks in the
>         storage pool. After a disk is selected, the location of
>         the data block on the disk is determined by the block
>         allocation map type. If cluster is specified, GPFS
>         attempts to allocate blocks in clusters. Blocks that
>         belong to a particular file are kept adjacent to each
>         other within each cluster. If scatter is specified, the
>         location of the block is chosen randomly.
>
>         The cluster allocation method may provide better disk
>         performance for some disk subsystems in relatively small
>         installations. The benefits of clustered block allocation
>         diminish when the number of nodes in the cluster or the
>         number of disks in a file system increases, or when the
>         file system’s free space becomes fragmented. The cluster
>         allocation method is the default for GPFS clusters with
>         eight or fewer nodes and for file systems with eight or
>         fewer disks.
>
>         The scatter allocation method provides more consistent
>         file system performance by averaging out performance
>         variations due to block location (for many disk
>         subsystems, the location of the data relative to the disk
>         edge has a substantial effect on performance). This
>         allocation method is appropriate in most cases and is the
>         default for GPFS clusters with more than eight nodes or
>         file systems with more than eight disks.
>
>         The block allocation map type cannot be changed after the
>         storage pool has been created.
>
> -n NumNodes
>         The estimated number of nodes that will mount the file
>         system in the local cluster and all remote clusters. This
>         is used as a best guess for the initial size of some file
>         system data structures. The default is 32. This value can
>         be changed after the file system has been created but it
>         does not change the existing data structures. Only the
>         newly created data structure is affected by the new
>         value. For example, new storage pool.
>
>         When you create a GPFS file system, you might want to
>         overestimate the number of nodes that will mount the file
>         system. GPFS uses this information for creating data
>         structures that are essential for achieving maximum
>         parallelism in file system operations (For more
>         information, see GPFS architecture in IBM Spectrum Scale:
>         Concepts, Planning, and Installation Guide). If you are
>         sure there will never be more than 64 nodes, allow the
>         default value to be applied. If you are planning to add
>         nodes to your system, you should specify a number larger
>         than the default.
>
> [/snip from man mmcrfs]
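
The two-step placement described in the man page text above (round-robin over disks, then a location chosen by the map type) can be pictured with a toy model. This is only an illustrative sketch: the function name, parameters and data structures are hypothetical and say nothing about how GPFS implements allocation internally.

```python
import random


def allocate(num_blocks, num_disks, blocks_per_disk, layout, seed=0):
    """Toy model: pick disks round-robin, then pick a block location per layout."""
    rng = random.Random(seed)
    next_free = [0] * num_disks  # next adjacent slot per disk ('cluster' only)
    free = [list(range(blocks_per_disk)) for _ in range(num_disks)]  # 'scatter' only
    placement = []
    for i in range(num_blocks):
        disk = i % num_disks  # round-robin across all disks in the pool
        if layout == "cluster":
            # Keep the file's blocks adjacent to each other on each disk.
            loc = next_free[disk]
            next_free[disk] += 1
        elif layout == "scatter":
            # Choose a random free location on the selected disk.
            loc = free[disk].pop(rng.randrange(len(free[disk])))
        else:
            raise ValueError(f"unknown layout: {layout}")
        placement.append((disk, loc))
    return placement


print("cluster:", allocate(8, 4, 100, "cluster"))
print("scatter:", allocate(8, 4, 100, "scatter"))
```

In both layouts the disk sequence is the same; only the location within each disk differs, which is the part the man page ties to performance variation by block position.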
>
> Regards,
> -Kums
>
>
>
>
>
> From:        Ivano Talamo <Ivano.Talamo at psi.ch>
> To:        <gpfsug-discuss at spectrumscale.org>
> Date:        11/15/2017 11:25 AM
> Subject:        [gpfsug-discuss] Write performances and filesystem size
> Sent by:        gpfsug-discuss-bounces at spectrumscale.org
> ------------------------------
>
>
>
> Hello everybody,
>
> Together with my colleagues, I am currently running some tests on a new
> DSS G220 system, and we see some unexpected behaviour.
>
> What we see is that write performance (we have not tested reads yet)
> decreases as the filesystem size decreases.
>
> I will not go into the details of the tests, but here are some numbers:
>
> - with a filesystem using the full 1.2 PB space we get 14 GB/s as the
> sum of the disk activity on the two IO servers;
> - with a filesystem using half of the space we get 10 GB/s;
> - with a filesystem using 1/4 of the space we get 5 GB/s.
>
> We also saw that performance is not affected by the vdisk layout, i.e.
> taking the full space with one big vdisk or 2 half-size vdisks per RG
> gives the same performance.
>
> To our understanding, the IO should be spread evenly across all the
> pdisks in the declustered array, and looking at iostat all disks seem to
> be accessed. So there must be some other element that affects
> performance.
>
> Am I missing something? Is this expected behaviour, and does someone have
> an explanation for it?
>
> Thank you,
> Ivano
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
>
> https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=McIf98wfiVqHU8ZygezLrQ&m=py_FGl3hi9yQsby94NZdpBFPwcUU0FREyMSSvuK_10U&s=Bq1J9eIXxadn5yrjXPHmKEht0CDBwfKJNH72p--T-6s&e=
>
>