[gpfsug-discuss] Write performances and filesystem size
Ivano Talamo
Ivano.Talamo at psi.ch
Thu Nov 16 13:51:51 GMT 2017
Hi,
as additional information I paste the recovery group information for the
full-size and half-size cases.
In both cases:
- data is on sf_g_01_vdisk01
- metadata is on sf_g_01_vdisk02
- sf_g_01_vdisk07 is not used in the filesystem.
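(Both listings below are recovery group detail views, i.e. roughly what
"mmlsrecoverygroup sf-g-01 -L" prints.)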
This is with the full-space filesystem:
                     declustered                       current        allowable
 recovery group           arrays  vdisks  pdisks  format version  format version
 -----------------  -----------  ------  ------  --------------  --------------
 sf-g-01                       3       6      86         4.2.2.0         4.2.2.0

 declustered   needs                            replace                 scrub     background activity
    array     service  vdisks  pdisks  spares  threshold  free space  duration  task   progress  priority
 -----------  -------  ------  ------  ------  ---------  ----------  --------  -------------------------
 NVR          no            1       2    0,0           1    3632 MiB   14 days  scrub       95%  low
 DA1          no            4      83   2,44           1      57 TiB   14 days  scrub        0%  low
 SSD          no            1       1    0,0           1     372 GiB   14 days  scrub       79%  low

                                           declustered               block  checksum
 vdisk                 RAID code           array        vdisk size    size  granularity  state  remarks
 --------------------  ------------------  -----------  ----------  ------  -----------  -----  -------
 sf_g_01_logTip        2WayReplication     NVR              48 MiB   2 MiB         4096  ok     logTip
 sf_g_01_logTipBackup  Unreplicated        SSD              48 MiB   2 MiB         4096  ok     logTipBackup
 sf_g_01_logHome       4WayReplication     DA1             144 GiB   2 MiB         4096  ok     log
 sf_g_01_vdisk02       3WayReplication     DA1             103 GiB   1 MiB       32 KiB  ok
 sf_g_01_vdisk07       3WayReplication     DA1             103 GiB   1 MiB       32 KiB  ok
 sf_g_01_vdisk01       8+2p                DA1             540 TiB  16 MiB       32 KiB  ok

 config data           declustered array   spare space  remarks
 --------------------  ------------------  -----------  -------
 rebuild space         DA1                 53 pdisk     increasing VCD spares is suggested

 config data           disk group fault tolerance        remarks
 --------------------  --------------------------------  -------
 rg descriptor         1 enclosure + 1 drawer + 2 pdisk  limited by rebuild space
 system index          1 enclosure + 1 drawer + 2 pdisk  limited by rebuild space

 vdisk                 disk group fault tolerance        remarks
 --------------------  --------------------------------  -------
 sf_g_01_logTip        1 pdisk
 sf_g_01_logTipBackup  0 pdisk
 sf_g_01_logHome       1 enclosure + 1 drawer + 1 pdisk  limited by rebuild space
 sf_g_01_vdisk02       1 enclosure + 1 drawer            limited by rebuild space
 sf_g_01_vdisk07       1 enclosure + 1 drawer            limited by rebuild space
 sf_g_01_vdisk01       2 pdisk
This is with the half-space filesystem:
                     declustered                       current        allowable
 recovery group           arrays  vdisks  pdisks  format version  format version
 -----------------  -----------  ------  ------  --------------  --------------
 sf-g-01                       3       6      86         4.2.2.0         4.2.2.0

 declustered   needs                            replace                 scrub     background activity
    array     service  vdisks  pdisks  spares  threshold  free space  duration  task   progress  priority
 -----------  -------  ------  ------  ------  ---------  ----------  --------  -------------------------
 NVR          no            1       2    0,0           1    3632 MiB   14 days  scrub        4%  low
 DA1          no            4      83   2,44           1     395 TiB   14 days  scrub        0%  low
 SSD          no            1       1    0,0           1     372 GiB   14 days  scrub       79%  low

                                           declustered               block  checksum
 vdisk                 RAID code           array        vdisk size    size  granularity  state  remarks
 --------------------  ------------------  -----------  ----------  ------  -----------  -----  -------
 sf_g_01_logTip        2WayReplication     NVR              48 MiB   2 MiB         4096  ok     logTip
 sf_g_01_logTipBackup  Unreplicated        SSD              48 MiB   2 MiB         4096  ok     logTipBackup
 sf_g_01_logHome       4WayReplication     DA1             144 GiB   2 MiB         4096  ok     log
 sf_g_01_vdisk02       3WayReplication     DA1             103 GiB   1 MiB       32 KiB  ok
 sf_g_01_vdisk07       3WayReplication     DA1             103 GiB   1 MiB       32 KiB  ok
 sf_g_01_vdisk01       8+2p                DA1             270 TiB  16 MiB       32 KiB  ok

 config data           declustered array   spare space  remarks
 --------------------  ------------------  -----------  -------
 rebuild space         DA1                 68 pdisk     increasing VCD spares is suggested

 config data           disk group fault tolerance        remarks
 --------------------  --------------------------------  -------
 rg descriptor         1 node + 3 pdisk                  limited by rebuild space
 system index          1 node + 3 pdisk                  limited by rebuild space

 vdisk                 disk group fault tolerance        remarks
 --------------------  --------------------------------  -------
 sf_g_01_logTip        1 pdisk
 sf_g_01_logTipBackup  0 pdisk
 sf_g_01_logHome       1 node + 2 pdisk                  limited by rebuild space
 sf_g_01_vdisk02       1 node + 1 pdisk                  limited by rebuild space
 sf_g_01_vdisk07       1 node + 1 pdisk                  limited by rebuild space
 sf_g_01_vdisk01       2 pdisk
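
For completeness, here is a minimal sketch of how such a vdisk set is
typically turned into NSDs and a filesystem. The stanza attributes, device
name and mount point below are illustrative only, not copied from our
configuration:

  # hypothetical vdisk stanza file (vdisk.stanza)
  %vdisk: vdiskName=sf_g_01_vdisk01 rg=sf-g-01 da=DA1 raidCode=8+2p
    blocksize=16m size=540t diskUsage=dataOnly pool=data
  %vdisk: vdiskName=sf_g_01_vdisk02 rg=sf-g-01 da=DA1 raidCode=3WayReplication
    blocksize=1m size=103g diskUsage=metadataOnly pool=system

  # create the vdisks, register them as NSDs, then build the filesystem
  mmcrvdisk -F vdisk.stanza
  mmcrnsd -F vdisk.stanza
  mmcrfs fs1 -F vdisk.stanza -B 16m --metadata-block-size 1m -j scatter -T /gpfs/fs1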
Thanks,
Ivano
On 16/11/17 13:03, Olaf Weiser wrote:
> Thx, that makes it a bit clearer. Since your vdisk is big enough to span
> over all pdisks, each of your tests (1/1, 1/2 or 1/4 of the capacity)
> should deliver the same performance.
>
> You mentioned something about the vdisk layout:
> so for the full-capacity test you used just one vdisk per
> RG - so 2 in total for 'data' - right?
>
> What about metadata? Did you create a separate vdisk for MD, and if so
> what size is it?
>
> Sent from IBM Verse
>
> Ivano Talamo --- Re: [gpfsug-discuss] Write performances and filesystem size ---
>
> From: "Ivano Talamo" <Ivano.Talamo at psi.ch>
> To: "gpfsug main discussion list" <gpfsug-discuss at spectrumscale.org>
> Date: Thu 16.11.2017 03:49
> Subject: Re: [gpfsug-discuss] Write performances and filesystem size
>
> ------------------------------------------------------------------------
>
> Hello Olaf,
>
> yes, I confirm that it is the Lenovo version of the ESS GL2, so 2
> enclosures / 4 drawers / 166 disks in total.
>
> Each recovery group has one declustered array with all disks inside, so
> vdisks use all the physical ones, even in the case of a vdisk that is
> 1/4 of the total size.
>
> Regarding the allocation layout (layoutMap), we used scatter.
>
> The tests were done on the just-created filesystem, so there is no
> close-to-full effect. We ran gpfsperf write seq.
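>
> For reference, the runs were roughly along these lines (file name, size and
> thread count here are illustrative, not the exact parameters):
>
>   /usr/lpp/mmfs/samples/perf/gpfsperf create seq /gpfs/fs1/testfile -n 200g -r 16m
>   /usr/lpp/mmfs/samples/perf/gpfsperf write seq /gpfs/fs1/testfile -n 200g -r 16m -th 16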
>
> Thanks,
> Ivano
>
>
> On 16/11/17 04:42, Olaf Weiser wrote:
>> Sure... as long as we assume that really all physical disks are used. The
>> fact that you were told 1/2 or 1/4 might turn out to mean that one or two
>> complete enclosures are eliminated... that's why I was asking for more
>> details.
>>
>> I don't see this degradation in my environments. As long as the vdisks are
>> big enough to span over all pdisks (which should be the case for
>> capacities in the TB range), the performance stays the same.
>>
>> Sent from IBM Verse
>>
>> Jan-Frode Myklebust --- Re: [gpfsug-discuss] Write performances and filesystem size ---
>>
>> From: "Jan-Frode Myklebust" <janfrode at tanso.net>
>> To: "gpfsug main discussion list" <gpfsug-discuss at spectrumscale.org>
>> Date: Wed 15.11.2017 21:35
>> Subject: Re: [gpfsug-discuss] Write performances and filesystem size
>>
>> ------------------------------------------------------------------------
>>
>> Olaf, this looks like a Lenovo «ESS GLxS» version. It should be using the
>> same number of spindles for any filesystem size, so I would also expect
>> them to perform the same.
>>
>>
>>
>> -jf
>>
>>
>> On Wed, 15 Nov 2017 at 11:26, Olaf Weiser <olaf.weiser at de.ibm.com
>> <mailto:olaf.weiser at de.ibm.com>> wrote:
>>
>> To add a comment... very simply: it depends on how you
>> allocate the physical block storage. If you are simply using
>> fewer physical resources when reducing the capacity (in the same
>> ratio), then you get what you see.
>>
>> So you need to tell us how you allocate your block storage. (Are
>> you using RAID controllers? Where are your LUNs coming from? Are
>> fewer RAID groups involved when you reduce the capacity?)
>>
>> GPFS can be configured to give you pretty much what the
>> hardware can deliver. If you reduce resources, you get less;
>> if you enhance your hardware, you get more, almost regardless
>> of the total capacity in #blocks.
>>
>>
>>
>>
>>
>>
>> From: "Kumaran Rajaram" <kums at us.ibm.com
>> <mailto:kums at us.ibm.com>>
>> To: gpfsug main discussion list
>> <gpfsug-discuss at spectrumscale.org
>> <mailto:gpfsug-discuss at spectrumscale.org>>
>> Date: 11/15/2017 11:56 AM
>> Subject: Re: [gpfsug-discuss] Write performances and
>> filesystem size
>> Sent by: gpfsug-discuss-bounces at spectrumscale.org
>> <mailto:gpfsug-discuss-bounces at spectrumscale.org>
>>
>> ------------------------------------------------------------------------
>>
>>
>>
>> Hi,
>>
>> >>Am I missing something? Is this an expected behaviour and someone
>> has an explanation for this?
>>
>> Based on your scenario, write degradation as the file-system is
>> populated would be possible if you had formatted the file-system with
>> "-j cluster".
>>
>> For consistent file-system performance, we recommend *mmcrfs "-j
>> scatter" layoutMap.* Also, we need to ensure the mmcrfs "-n" is
>> set properly.
>>
>> [snip from mmlsfs]
>>  # mmlsfs <fs> | egrep 'Block allocation| Estimated number'
>>  -j         scatter        Block allocation type
>>  -n         128            Estimated number of nodes that will mount file system
>> [/snip]
>>
>>
>> [snip from man mmcrfs]
>>  layoutMap={scatter|cluster}
>>      Specifies the block allocation map type. When
>>      allocating blocks for a given file, GPFS first
>>      uses a round-robin algorithm to spread the data
>>      across all disks in the storage pool. After a
>>      disk is selected, the location of the data
>>      block on the disk is determined by the block
>>      allocation map type. If cluster is
>>      specified, GPFS attempts to allocate blocks in
>>      clusters. Blocks that belong to a particular
>>      file are kept adjacent to each other within
>>      each cluster. If scatter is specified,
>>      the location of the block is chosen randomly.
>>
>>      The cluster allocation method may provide
>>      better disk performance for some disk
>>      subsystems in relatively small installations.
>>      The benefits of clustered block allocation
>>      diminish when the number of nodes in the
>>      cluster or the number of disks in a file system
>>      increases, or when the file system's free space
>>      becomes fragmented. The cluster
>>      allocation method is the default for GPFS
>>      clusters with eight or fewer nodes and for file
>>      systems with eight or fewer disks.
>>
>>      The scatter allocation method provides
>>      more consistent file system performance by
>>      averaging out performance variations due to
>>      block location (for many disk subsystems, the
>>      location of the data relative to the disk edge
>>      has a substantial effect on performance). This
>>      allocation method is appropriate in most cases
>>      and is the default for GPFS clusters with more
>>      than eight nodes or file systems with more than
>>      eight disks.
>>
>>      The block allocation map type cannot be changed
>>      after the storage pool has been created.
>>
>>  -n NumNodes
>>      The estimated number of nodes that will mount the file
>>      system in the local cluster and all remote clusters.
>>      This is used as a best guess for the initial size of
>>      some file system data structures. The default is 32.
>>      This value can be changed after the file system has been
>>      created but it does not change the existing data
>>      structures. Only the newly created data structure is
>>      affected by the new value. For example, new storage
>>      pool.
>>
>>      When you create a GPFS file system, you might want to
>>      overestimate the number of nodes that will mount the
>>      file system. GPFS uses this information for creating
>>      data structures that are essential for achieving maximum
>>      parallelism in file system operations (For more
>>      information, see GPFS architecture in IBM Spectrum
>>      Scale: Concepts, Planning, and Installation Guide). If
>>      you are sure there will never be more than 64 nodes,
>>      allow the default value to be applied. If you are
>>      planning to add nodes to your system, you should specify
>>      a number larger than the default.
>>
>> [/snip from man mmcrfs]
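>>
>> As a hypothetical example, a create command that applies both options
>> (device name, stanza file, block size and mount point are illustrative only):
>>
>>   mmcrfs fs1 -F nsd.stanza -B 16M -j scatter -n 128 -T /gpfs/fs1
>>
>> Note that -j cannot be changed after creation, and a later change of -n
>> only affects newly created data structures, so both are best set at
>> mmcrfs time.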
>>
>> Regards,
>> -Kums
>>
>>
>>
>>
>>
>> From: Ivano Talamo <Ivano.Talamo at psi.ch
>> <mailto:Ivano.Talamo at psi.ch>>
>> To: <gpfsug-discuss at spectrumscale.org
>> <mailto:gpfsug-discuss at spectrumscale.org>>
>> Date: 11/15/2017 11:25 AM
>> Subject: [gpfsug-discuss] Write performances and filesystem size
>> Sent by: gpfsug-discuss-bounces at spectrumscale.org
>> <mailto:gpfsug-discuss-bounces at spectrumscale.org>
>>
>> ------------------------------------------------------------------------
>>
>>
>>
>> Hello everybody,
>>
>> Together with my colleagues we are currently running some tests on a new
>> DSS G220 system, and we see some unexpected behaviour.
>>
>> What we see is that write performance (we did not test reads yet)
>> decreases as the filesystem size decreases.
>>
>> I will not go into the details of the tests, but here are some
>> numbers:
>>
>> - with a filesystem using the full 1.2 PB space we get 14 GB/s as the
>> sum of the disk activity on the two IO servers;
>> - with a filesystem using half of the space we get 10 GB/s;
>> - with a filesystem using 1/4 of the space we get 5 GB/s.
>>
>> We also saw that performance is not affected by the vdisk layout,
>> i.e. taking the full space with one big vdisk or two half-size vdisks
>> per RG gives the same performance.
>>
>> To our understanding the IO should be spread evenly across all the
>> pdisks in the declustered array, and looking at iostat all disks seem
>> to be accessed. So there must be some other element that affects
>> performance.
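>>
>> (A quick way to double-check this, assuming standard sysstat and GNR
>> tooling, is to watch "iostat -xm 5" on both IO servers during a run and
>> to verify with "mmlspdisk all --not-ok" that no pdisk is degraded or
>> rebuilding.)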
>>
>> Am I missing something? Is this expected behaviour, and does someone
>> have an explanation for it?
>>
>> Thank you,
>> Ivano
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>