[gpfsug-discuss] mmrestripefs "No space left on device"

John Hanks griznog at gmail.com
Thu Nov 2 17:16:36 GMT 2017


We do have different amounts of space in the system pool which had the
changes applied:

[root at scg4-hn01 ~]# mmdf gsfs0 -P system
disk                disk size  failure holds    holds           free KB             free KB
name                    in KB    group metadata data     in full blocks        in fragments
--------------- ------------- -------- -------- ----- -------------------- -------------------
Disks in storage pool: system (Maximum disk size allowed is 3.6 TB)
VD000               377487360      100 Yes      No      143109120 ( 38%)     35708688 ( 9%)
DMD_NSD_804         377487360      100 Yes      No       79526144 ( 21%)      2924584 ( 1%)
VD002               377487360      100 Yes      No      143067136 ( 38%)     35713888 ( 9%)
DMD_NSD_802         377487360      100 Yes      No       79570432 ( 21%)      2926672 ( 1%)
VD004               377487360      100 Yes      No      143107584 ( 38%)     35727776 ( 9%)
DMD_NSD_805         377487360      200 Yes      No       79555584 ( 21%)      2940040 ( 1%)
VD001               377487360      200 Yes      No      142964992 ( 38%)     35805384 ( 9%)
DMD_NSD_803         377487360      200 Yes      No       79580160 ( 21%)      2919560 ( 1%)
VD003               377487360      200 Yes      No      143132672 ( 38%)     35764200 ( 9%)
DMD_NSD_801         377487360      200 Yes      No       79550208 ( 21%)      2915232 ( 1%)
                -------------                        -------------------- -------------------
(pool total)       3774873600                        1113164032 ( 29%)    193346024 ( 5%)
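
A quick way to compare the two failure groups from that output is to sum the
"free KB in full blocks" column per group; a rough sketch (field positions
assume the mmdf layout above, and the /^(VD|DMD)/ pattern is just what matches
this system's NSD names):

# sum free KB in full blocks (field 6) by failure group (field 3)
mmdf gsfs0 -P system | awk '
  /^(VD|DMD)/ { free[$3] += $6 }
  END { for (fg in free) printf "failure group %s: %d KB free in full blocks\n", fg, free[fg] }
'

By eye that works out to roughly 588 million KB free in full blocks in failure
group 100 versus roughly 525 million KB in group 200, so neither group is
actually out of full blocks.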


and mmlsdisk shows that there is a problem with replication:

...
Number of quorum disks: 5
Read quorum value:      3
Write quorum value:     3
Attention: Due to an earlier configuration change the file system
is no longer properly replicated.


I thought 'mmrestripefs -r' would fix this, not that I would have to fix it
first before restriping?
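
For reference, the commands I'm using to look at replication here, in case
I'm misreading something (the file path below is only a placeholder):

mmlsfs gsfs0 -m -M -r -R          # default and maximum metadata/data replicas
mmlsdisk gsfs0 -L                 # per-disk failure group, status, and the replication warning above
mmlsattr -L /path/to/some/file    # per-file replication counts and flags (e.g. illreplicated)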

jbh


On Thu, Nov 2, 2017 at 9:45 AM, Frederick Stock <stockf at us.ibm.com> wrote:

> Assuming you are replicating data and metadata, have you confirmed that all
> failure groups have the same free space?  That is, could it be that one of
> your failure groups has less space than the others?  You can verify this
> with the output of mmdf by looking at the NSD sizes and space available.
>
> Fred
> __________________________________________________
> Fred Stock | IBM Pittsburgh Lab | 720-430-8821
> stockf at us.ibm.com
>
>
>
> From:        John Hanks <griznog at gmail.com>
> To:        gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
> Date:        11/02/2017 12:20 PM
> Subject:        Re: [gpfsug-discuss] mmrestripefs "No space left on
> device"
> Sent by:        gpfsug-discuss-bounces at spectrumscale.org
> ------------------------------
>
>
>
> Addendum to last message:
>
> We haven't upgraded recently as far as I know (I just inherited this a
> couple of months ago), but I am planning an outage soon to upgrade from
> 4.2.0-4 to 4.2.3-5.
>
> My growing collection of output files generally contains something like
>
> This inode list was generated in the Parallel Inode Traverse on Thu Nov  2 08:34:22 2017
> INODE_NUMBER DUMMY_INFO SNAPSHOT_ID ISGLOBAL_SNAPSHOT INDEPENDENT_FSETID MEMO(INODE_FLAGS FILE_TYPE [ERROR])
>  53506        0:0        0           1                 0                  illreplicated REGULAR_FILE RESERVED Error: 28 No space left on device
>
> With that inode varying slightly.
>
> jbh
>
> On Thu, Nov 2, 2017 at 8:55 AM, Scott Fadden <sfadden at us.ibm.com> wrote:
> Sorry, I just reread as I hit send and saw this was mmrestripefs; in my
> case it was mmdeldisk.
>
> Did you try running the command on just one pool, or using -B instead?
>
> What is the file it is complaining about in "/var/mmfs/tmp/gsfs0.pit.interestingInodes.12888779711"?
>
> Looks like it could be related to the maxfeaturelevel of the cluster. Have
> you recently upgraded? Is everything up to the same level?
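>
> Restricting the repair to a single pool, if your level supports the -P
> option on mmrestripefs, would be something along the lines of (file system
> and pool names as used elsewhere in this thread):
>
> mmrestripefs gsfs0 -r -P system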
>
> Scott Fadden
> Spectrum Scale - Technical Marketing
> Phone: (503) 880-5833
> sfadden at us.ibm.com
> http://www.ibm.com/systems/storage/spectrum/scale
>
>
> ----- Original message -----
> From: Scott Fadden/Portland/IBM
> To: gpfsug-discuss at spectrumscale.org
> Cc: gpfsug-discuss at spectrumscale.org
> Subject: Re: [gpfsug-discuss] mmrestripefs "No space left on device"
> Date: Thu, Nov 2, 2017 8:44 AM
>
> I opened a defect on this the other day; in my case it was an incorrect
> error message. What it meant to say was, "The pool is not empty." Are you
> trying to remove the last disk in a pool? If so, did you empty the pool
> with a MIGRATE policy first?
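>
> A minimal MIGRATE policy for emptying a pool, assuming 'system' as the
> source and 'sas0' as the target (pool names as used elsewhere in this
> thread, purely as an illustration), would look something like:
>
> /* empty-pool.pol -- move all data files out of the source pool */
> RULE 'drain' MIGRATE FROM POOL 'system' TO POOL 'sas0'
>
> mmapplypolicy gsfs0 -P empty-pool.pol -I test   # dry run, report what would move
> mmapplypolicy gsfs0 -P empty-pool.pol -I yes    # actually migrate the data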
>
>
> Scott Fadden
> Spectrum Scale - Technical Marketing
> Phone: (503) 880-5833
> sfadden at us.ibm.com
> http://www.ibm.com/systems/storage/spectrum/scale
>
>
> ----- Original message -----
> From: John Hanks <griznog at gmail.com>
> Sent by: gpfsug-discuss-bounces at spectrumscale.org
> To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
> Cc:
> Subject: Re: [gpfsug-discuss] mmrestripefs "No space left on device"
> Date: Thu, Nov 2, 2017 8:34 AM
>
> We have no snapshots (they were the first to go when we initially hit the
> full metadata NSDs).
>
> I've increased quotas so that no filesets have hit a space quota.
>
> Verified that there are no inode quotas anywhere.
>
> mmdf shows the least amount of free space on any NSD to be 9% free.
>
> Still getting this error:
>
> [root at scg-gs0 ~]# mmrestripefs gsfs0 -r -N scg-gs0,scg-gs1,scg-gs2,scg-gs3
> Scanning file system metadata, phase 1 ...
> Scan completed successfully.
> Scanning file system metadata, phase 2 ...
> Scanning file system metadata for sas0 storage pool
> Scanning file system metadata for sata0 storage pool
> Scan completed successfully.
> Scanning file system metadata, phase 3 ...
> Scan completed successfully.
> Scanning file system metadata, phase 4 ...
> Scan completed successfully.
> Scanning user file metadata ...
> Error processing user file metadata.
> No space left on device
> Check file '/var/mmfs/tmp/gsfs0.pit.interestingInodes.12888779711' on
> scg-gs0 for inodes with broken disk addresses or failures.
> mmrestripefs: Command failed. Examine previous error messages to determine
> cause.
>
> I should note too that this fails almost immediately, far too quickly to
> fill up any location it could be trying to write to.
>
> jbh
>
> On Thu, Nov 2, 2017 at 7:57 AM, David Johnson <david_johnson at brown.edu> wrote:
> One thing that may be relevant: if you have snapshots, depending on your
> release level, inodes in the snapshot may be considered immutable, and
> will not be migrated.  Once the snapshots have been deleted, the inodes
> are freed up and you won’t see the (somewhat misleading) message about
> no space.
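>
> A quick way to check for leftover snapshots (file system name as used
> elsewhere in the thread):
>
> mmlssnapshot gsfs0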
>
>  — ddj
> Dave Johnson
> Brown University
>
> On Nov 2, 2017, at 10:43 AM, John Hanks <griznog at gmail.com> wrote:
> Thanks all for the suggestions.
>
> Having our metadata NSDs fill up was what prompted this exercise, but
> space was previously freed up on those by switching them from
> metadata+data to metadataOnly and using a policy to migrate files out of
> that pool. So these now have about 30% free space (more if you include
> fragmented space). The restripe attempt is just to make a final move of
> any remaining data off those devices. All the NSDs now have free space
> on them.
>
> df -i shows inode usage at about 84%, so plenty of free inodes for the
> filesystem as a whole.
>
> We did have old .quota files lying around, but removing them didn't have
> any impact.
>
> mmlsfileset fs -L -i is taking a while to complete; I'll let it simmer
> while I get to work.
>
> mmrepquota does show about a half-dozen filesets that have hit their quota
> for space (we don't set quotas on inodes). Once I'm settled in this morning
> I'll try giving them a little extra space and see what happens.
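>
> For example, something along these lines, where the fileset name and the
> limits are placeholders and the mmsetquota Device:Fileset form assumes a
> reasonably recent release:
>
> mmrepquota -j gsfs0                            # per-fileset block usage vs. limits
> mmsetquota gsfs0:somefileset --block 20T:22T   # raise soft:hard block limits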
>
> jbh
>
>
> On Thu, Nov 2, 2017 at 4:19 AM, Oesterlin, Robert <Robert.Oesterlin at nuance.com> wrote:
> One thing that I’ve run into before is that on older file systems you had
> the “*.quota” files in the file system root. If you upgraded the file
> system to a newer version (so these files aren’t used), there was a bug
> at one time where these didn’t get properly migrated during a restripe.
> The solution was to just remove them.
>
>
>
>
>
> Bob Oesterlin
>
> Sr Principal Storage Engineer, Nuance
>
>
>
> From: <gpfsug-discuss-bounces at spectrumscale.org> on behalf of John Hanks <griznog at gmail.com>
> Reply-To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
> Date: Wednesday, November 1, 2017 at 5:55 PM
> To: gpfsug <gpfsug-discuss at spectrumscale.org>
> Subject: [EXTERNAL] [gpfsug-discuss] mmrestripefs "No space left on device"
>
>
>
> Hi all,
>
>
>
> I'm trying to do a restripe after setting some NSDs to metadataOnly and I
> keep running into this error:
>
>
>
> Scanning user file metadata ...
>    0.01 % complete on Wed Nov  1 15:36:01 2017  (     40960 inodes with total     531689 MB data processed)
> Error processing user file metadata.
> Check file '/var/mmfs/tmp/gsfs0.pit.interestingInodes.12888779708' on scg-gs0 for inodes with broken disk addresses or failures.
> mmrestripefs: Command failed. Examine previous error messages to determine cause.
>
>
>
> The file it points to says:
>
>
>
> This inode list was generated in the Parallel Inode Traverse on Wed Nov  1 15:36:06 2017
> INODE_NUMBER DUMMY_INFO SNAPSHOT_ID ISGLOBAL_SNAPSHOT INDEPENDENT_FSETID MEMO(INODE_FLAGS FILE_TYPE [ERROR])
>  53504        0:0        0           1                 0                  illreplicated REGULAR_FILE RESERVED Error: 28 No space left on device
>
>
>
>
>
> /var on the node I am running this on has > 128 GB free, all the NSDs
> have plenty of free space, the filesystem being restriped has plenty of
> free space, and if I watch the node while running this, no filesystem on
> it even starts to get full. Could someone tell me where mmrestripefs is
> attempting to write and/or how to point it at a different location?
>
>
>
> Thanks,
>
>
>
> jbh
>
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>
>