[gpfsug-discuss] Write performances and filesystem size

Olaf Weiser olaf.weiser at de.ibm.com
Thu Nov 16 12:03:16 GMT 2017


Thx, that makes it a bit clearer. Since your vdisks are big enough to span all pdisks in each of your tests, 1/1, 1/2 or 1/4 of the capacity should deliver the same performance.
 
You mentioned something about the vdisk layout:
in your test with the full capacity you used just one vdisk per RG - so two in total for 'data' - right?
 
What about metadata - did you create separate vdisks for MD, and if so, what size?
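For illustration only - a couple of standard GNR/DSS-G commands (with "rg1" as a placeholder recovery group name) should show how the vdisks are laid out and whether separate metadata vdisks exist:

   # mmlsvdisk
       (lists all vdisks with their RAID code, block size and declustered array)
   # mmlsrecoverygroup rg1 -L
       (shows the recovery group in detail, including its declustered arrays
        and the vdisks defined in them)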

Sent from IBM Verse


   Ivano Talamo --- Re: [gpfsug-discuss] Write performances and filesystem size ---
    From:    "Ivano Talamo" <Ivano.Talamo at psi.ch>
    To:      "gpfsug main discussion list" <gpfsug-discuss at spectrumscale.org>
    Date:    Thu 16.11.2017 03:49
    Subject: Re: [gpfsug-discuss] Write performances and filesystem size
  
Hello Olaf,

yes, I confirm that it is the Lenovo version of the ESS GL2, so 2 enclosures / 4 drawers / 166 disks in total.
Each recovery group has one declustered array with all disks inside, so vdisks use all the physical ones, even in the case of a vdisk that is 1/4 of the total size.
Regarding the layout allocation we used scatter.
The tests were done on the just-created filesystem, so no close-to-full effect. And we ran gpfsperf write seq.

Thanks,
Ivano

On 16/11/17 04:42, Olaf Weiser wrote:
> Sure... as long as we assume that really all physical disks are used.. the
> fact that you were told 1/2 or 1/4 might turn out to mean that one or two
> complete enclosures are eliminated...? ..that's why I was asking for more
> details..
>
> I don't see this degradation in my environments.. as long as the vdisks are
> big enough to span over all pdisks (which should be the case for capacity
> in a range of TB)... the performance stays the same
>
> Sent from IBM Verse
>
> Jan-Frode Myklebust --- Re: [gpfsug-discuss] Write performances and
> filesystem size ---
>
> From:    "Jan-Frode Myklebust" <janfrode at tanso.net>
> To:      "gpfsug main discussion list" <gpfsug-discuss at spectrumscale.org>
> Date:    Wed 15.11.2017 21:35
> Subject: Re: [gpfsug-discuss] Write performances and filesystem size
>
> Olaf, this looks like a Lenovo «ESS GLxS» version. It should be using the
> same number of spindles for any size filesystem, so I would also expect
> them to perform the same.
>
> -jf
>
> On Wed, 15 Nov 2017 at 11:26, Olaf Weiser <olaf.weiser at de.ibm.com> wrote:
>
>     to add a comment... very simply... it depends on how you allocate the
>     physical block storage... if you are - simply - using less physical
>     resources when reducing the capacity (in the same ratio).. you get
>     what you see...
>
>     so you need to tell us how you allocate your block storage.. (Do you
>     use RAID controllers, where are your LUNs coming from, are fewer RAID
>     groups involved when reducing the capacity?...)
>
>     GPFS can be configured to give you pretty much as much as the hardware
>     can deliver.. if you reduce resources... you'll get less; if you
>     enhance your hardware.. you get more... almost regardless of the total
>     capacity in #blocks..
>
>
>     From:    "Kumaran Rajaram" <kums at us.ibm.com>
>     To:      gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
>     Date:    11/15/2017 11:56 AM
>     Subject: Re: [gpfsug-discuss] Write performances and filesystem size
>     Sent by: gpfsug-discuss-bounces at spectrumscale.org
>
>     Hi,
>
>     >> Am I missing something? Is this an expected behaviour and does
>     >> someone have an explanation for this?
>
>     Based on your scenario, write degradation as the file system is
>     populated is possible if you had formatted the file system with
>     "-j cluster".
>
>     For consistent file-system performance, we recommend the mmcrfs
>     "-j scatter" layoutMap. Also, we need to ensure the mmcrfs "-n" is
>     set properly.
>
>     [snip from mmlsfs]
>     # mmlsfs <fs> | egrep 'Block allocation| Estimated number'
>     -j                 scatter                  Block allocation type
>     -n                 128                      Estimated number of nodes that will mount file system
>     [/snip]
>
>     [snip from man mmcrfs]
>     layoutMap={scatter|cluster}
>         Specifies the block allocation map type. When allocating blocks
>         for a given file, GPFS first uses a round-robin algorithm to
>         spread the data across all disks in the storage pool. After a
>         disk is selected, the location of the data block on the disk is
>         determined by the block allocation map type. If cluster is
>         specified, GPFS attempts to allocate blocks in clusters. Blocks
>         that belong to a particular file are kept adjacent to each other
>         within each cluster. If scatter is specified, the location of
>         the block is chosen randomly.
>
>         The cluster allocation method may provide better disk
>         performance for some disk subsystems in relatively small
>         installations. The benefits of clustered block allocation
>         diminish when the number of nodes in the cluster or the number
>         of disks in a file system increases, or when the file system's
>         free space becomes fragmented. The cluster allocation method is
>         the default for GPFS clusters with eight or fewer nodes and for
>         file systems with eight or fewer disks.
>
>         The scatter allocation method provides more consistent file
>         system performance by averaging out performance variations due
>         to block location (for many disk subsystems, the location of the
>         data relative to the disk edge has a substantial effect on
>         performance). This allocation method is appropriate in most
>         cases and is the default for GPFS clusters with more than eight
>         nodes or file systems with more than eight disks.
>
>         The block allocation map type cannot be changed after the
>         storage pool has been created.
>
>     -n NumNodes
>         The estimated number of nodes that will mount the file system in
>         the local cluster and all remote clusters. This is used as a
>         best guess for the initial size of some file system data
>         structures. The default is 32. This value can be changed after
>         the file system has been created, but it does not change the
>         existing data structures. Only the newly created data structure
>         is affected by the new value, for example, a new storage pool.
>
>         When you create a GPFS file system, you might want to
>         overestimate the number of nodes that will mount the file
>         system. GPFS uses this information for creating data structures
>         that are essential for achieving maximum parallelism in file
>         system operations (for more information, see GPFS architecture
>         in IBM Spectrum Scale: Concepts, Planning, and Installation
>         Guide). If you are sure there will never be more than 64 nodes,
>         allow the default value to be applied. If you are planning to
>         add nodes to your system, you should specify a number larger
>         than the default.
>     [/snip from man mmcrfs]
>
>     Regards,
>     -Kums
>
>
>     From:    Ivano Talamo <Ivano.Talamo at psi.ch>
>     To:      <gpfsug-discuss at spectrumscale.org>
>     Date:    11/15/2017 11:25 AM
>     Subject: [gpfsug-discuss] Write performances and filesystem size
>     Sent by: gpfsug-discuss-bounces at spectrumscale.org
>
>     Hello everybody,
>
>     together with my colleagues we are currently running some tests on a
>     new DSS G220 system and we see some unexpected behaviour.
>
>     What we actually see is that write performance (we did not test reads
>     yet) decreases as the filesystem size decreases.
>
>     I will not go into the details of the tests, but here are some numbers:
>
>     - with a filesystem using the full 1.2 PB space we get 14 GB/s as the
>       sum of the disk activity on the two IO servers;
>     - with a filesystem using half of the space we get 10 GB/s;
>     - with a filesystem using 1/4 of the space we get 5 GB/s.
>
>     We also saw that performance is not affected by the vdisk layout,
>     i.e. taking the full space with one big vdisk or two half-size vdisks
>     per RG gives the same performance.
>
>     To our understanding the IO should be spread evenly across all the
>     pdisks in the declustered array, and looking at iostat all disks seem
>     to be accessed. But then there must be some other element that affects
>     performance.
>
>     Am I missing something? Is this an expected behaviour and does someone
>     have an explanation for this?
>
>     Thank you,
>     Ivano
>
>     _______________________________________________
>     gpfsug-discuss mailing list
>     gpfsug-discuss at spectrumscale.org
>     http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


More information about the gpfsug-discuss mailing list