[gpfsug-discuss] mmrestripefs "No space left on device"

John Hanks griznog at gmail.com
Thu Nov 2 18:18:27 GMT 2017


Yep, looks like Robert Oesterlin was right: it was the old quota files
causing the snag. Not sure how "mv *.quota" managed to move the group file
and not the user file, but I'll let that remain a mystery of the universe.
In any case I have a restripe running now and have learned a LOT about all
the bits in the process. Many thanks to everyone who replied, I learn
something from this list every time I get near it.

Thank you,

jbh

On Thu, Nov 2, 2017 at 11:14 AM, John Hanks <griznog at gmail.com> wrote:

> tsfindinode tracked the file to user.quota, which somehow escaped my
> previous attempt to "mv *.quota /elsewhere/". I've moved that now and
> verified it is actually gone, and will retry once the current restripe on
> the sata0 pool is wrapped up.
>
> jbh
>
> On Thu, Nov 2, 2017 at 10:57 AM, Frederick Stock <stockf at us.ibm.com>
> wrote:
>
>> Did you run the tsfindinode command to see where that file is located?
>> Also, what does mmdf show for your other pools, notably the sas0 storage
>> pool?
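>>
>> tsfindinode is one of the unsupported ts* helpers and its arguments vary
>> by release, so here is a rough alternative using only documented tooling:
>> a LIST policy to map an inode number to a path. Treat it as a sketch; the
>> inode number, policy file name, output prefix, and the INODE attribute
>> name are as I remember them, and reserved system files such as the quota
>> files may not show up in a policy scan:
>>
>> # find_inode.pol
>> RULE EXTERNAL LIST 'paths' EXEC ''
>> RULE 'byinode' LIST 'paths' DIRECTORIES_PLUS WHERE INODE = 53506
>>
>> # matches land in /tmp/badfile.list.paths
>> mmapplypolicy gsfs0 -P find_inode.pol -I defer -f /tmp/badfile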
>>
>> Fred
>> __________________________________________________
>> Fred Stock | IBM Pittsburgh Lab | 720-430-8821
>> stockf at us.ibm.com
>>
>>
>>
>> From:        John Hanks <griznog at gmail.com>
>> To:        gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
>> Date:        11/02/2017 01:17 PM
>> Subject:        Re: [gpfsug-discuss] mmrestripefs "No space left on device"
>> Sent by:        gpfsug-discuss-bounces at spectrumscale.org
>> ------------------------------
>>
>>
>>
>> We do have different amounts of free space in the system pool, which is
>> where the changes were applied:
>>
>> [root at scg4-hn01 ~]# mmdf gsfs0 -P system
>> disk                disk size  failure holds    holds           free KB             free KB
>> name                    in KB    group metadata data     in full blocks        in fragments
>> --------------- ------------- -------- -------- ----- -------------------- -------------------
>> Disks in storage pool: system (Maximum disk size allowed is 3.6 TB)
>> VD000               377487360      100 Yes      No        143109120 ( 38%)      35708688 ( 9%)
>> DMD_NSD_804         377487360      100 Yes      No         79526144 ( 21%)       2924584 ( 1%)
>> VD002               377487360      100 Yes      No        143067136 ( 38%)      35713888 ( 9%)
>> DMD_NSD_802         377487360      100 Yes      No         79570432 ( 21%)       2926672 ( 1%)
>> VD004               377487360      100 Yes      No        143107584 ( 38%)      35727776 ( 9%)
>> DMD_NSD_805         377487360      200 Yes      No         79555584 ( 21%)       2940040 ( 1%)
>> VD001               377487360      200 Yes      No        142964992 ( 38%)      35805384 ( 9%)
>> DMD_NSD_803         377487360      200 Yes      No         79580160 ( 21%)       2919560 ( 1%)
>> VD003               377487360      200 Yes      No        143132672 ( 38%)      35764200 ( 9%)
>> DMD_NSD_801         377487360      200 Yes      No         79550208 ( 21%)       2915232 ( 1%)
>>                 -------------                          -------------------- -------------------
>> (pool total)       3774873600                           1113164032 ( 29%)     193346024 ( 5%)
>>
>>
>> and mmlsdisk shows that there is a problem with replication:
>>
>> ...
>> Number of quorum disks: 5
>> Read quorum value:      3
>> Write quorum value:     3
>> Attention: Due to an earlier configuration change the file system
>> is no longer properly replicated.
>>
>>
>> I thought 'mmrestripefs -r' was what would fix this, not something I have
>> to fix first before restriping?
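>>
>> The plan, roughly (a sketch using the same file system name as above):
>>
>> mmlsdisk gsfs0 -L      # disk status, plus the "not properly replicated" attention
>> # clear whatever is causing the ENOSPC first, then:
>> mmrestripefs gsfs0 -r  # restore replication
>> mmlsdisk gsfs0 -L      # the attention message should be gone afterwards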
>>
>> jbh
>>
>>
>> On Thu, Nov 2, 2017 at 9:45 AM, Frederick Stock <stockf at us.ibm.com> wrote:
>> Assuming you are replicating data and metadata, have you confirmed that
>> all failure groups have the same free space? That is, could it be that one
>> of your failure groups has less space than the others? You can verify this
>> in the mmdf output by looking at the NSD sizes and space available.
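>>
>> A quick way to compare them, as a sketch (the field numbers match the mmdf
>> output earlier in this thread and may differ on other releases):
>>
>> mmdf gsfs0 -P system | awk '$2 ~ /^[0-9]+$/ { free[$3] += $6 }
>>     END { for (fg in free) print "failure group", fg, "free KB (full blocks):", free[fg] }'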
>>
>> Fred
>> __________________________________________________
>> Fred Stock | IBM Pittsburgh Lab | 720-430-8821
>> stockf at us.ibm.com
>>
>>
>>
>> From:        John Hanks <griznog at gmail.com>
>> To:        gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
>> Date:        11/02/2017 12:20 PM
>> Subject:        Re: [gpfsug-discuss] mmrestripefs "No space left on device"
>> Sent by:        gpfsug-discuss-bounces at spectrumscale.org
>> ------------------------------
>>
>>
>>
>> Addendum to last message:
>>
>> We haven't upgraded recently as far as I know (I just inherited this a
>> couple of months ago), but I am planning an outage soon to upgrade from
>> 4.2.0-4 to 4.2.3-5.
>>
>> My growing collection of output files generally contains something like:
>>
>> This inode list was generated in the Parallel Inode Traverse on Thu Nov 2 08:34:22 2017
>> INODE_NUMBER DUMMY_INFO SNAPSHOT_ID ISGLOBAL_SNAPSHOT INDEPENDENT_FSETID MEMO(INODE_FLAGS FILE_TYPE [ERROR])
>>  53506        0:0        0           1                 0                 illreplicated REGULAR_FILE RESERVED Error: 28 No space left on device
>>
>> The inode number varies slightly between runs.
>>
>> jbh
>>
>> On Thu, Nov 2, 2017 at 8:55 AM, Scott Fadden <sfadden at us.ibm.com> wrote:
>> Sorry, I just reread this as I hit send and saw this was mmrestripefs; in
>> my case it was mmdeldisk.
>>
>> Did you try running the command on just one pool, or using -B instead?
>>
>> What is in the file it is complaining about, "/var/mmfs/tmp/gsfs0.pit.interestingInodes.12888779711"?
>>
>> Looks like it could be related to the maxfeaturelevel of the cluster.
>> Have you recently upgraded? Is everything up to the same level?
>>
>> Scott Fadden
>> Spectrum Scale - Technical Marketing
>> Phone: (503) 880-5833
>> sfadden at us.ibm.com
>> http://www.ibm.com/systems/storage/spectrum/scale
>>
>>
>> ----- Original message -----
>> From: Scott Fadden/Portland/IBM
>> To: gpfsug-discuss at spectrumscale.org
>> Cc: gpfsug-discuss at spectrumscale.org
>> Subject: Re: [gpfsug-discuss] mmrestripefs "No space left on device"
>> Date: Thu, Nov 2, 2017 8:44 AM
>>
>> I opened a defect on this the other day; in my case it was an incorrect
>> error message. What it meant to say was, "The pool is not empty." Are you
>> trying to remove the last disk in a pool? If so, did you empty the pool
>> with a MIGRATE policy first?
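>>
>> For reference, a minimal MIGRATE policy sketch; the pool names and the
>> policy file name are illustrative, so adjust them to your layout:
>>
>> # empty_system.pol
>> RULE 'drain' MIGRATE FROM POOL 'system' TO POOL 'sas0'
>>
>> mmapplypolicy gsfs0 -P empty_system.pol -I test   # dry run, report what would move
>> mmapplypolicy gsfs0 -P empty_system.pol -I yes    # actually migrate the data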
>>
>>
>> Scott Fadden
>> Spectrum Scale - Technical Marketing
>> Phone: (503) 880-5833
>> sfadden at us.ibm.com
>> http://www.ibm.com/systems/storage/spectrum/scale
>>
>>
>> ----- Original message -----
>> From: John Hanks <griznog at gmail.com>
>> Sent by: gpfsug-discuss-bounces at spectrumscale.org
>> To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
>> Cc:
>> Subject: Re: [gpfsug-discuss] mmrestripefs "No space left on device"
>> Date: Thu, Nov 2, 2017 8:34 AM
>>
>> We have no snapshots (they were the first to go when we initially hit
>> the full metadata NSDs).
>>
>> I've increased quotas so that no filesets have hit a space quota.
>>
>> Verified that there are no inode quotas anywhere.
>>
>> mmdf shows that the NSD with the least free space is still 9% free.
>>
>> Still getting this error:
>>
>> [root at scg-gs0 ~]# mmrestripefs gsfs0 -r -N scg-gs0,scg-gs1,scg-gs2,scg-gs3
>> Scanning file system metadata, phase 1 ...
>> Scan completed successfully.
>> Scanning file system metadata, phase 2 ...
>> Scanning file system metadata for sas0 storage pool
>> Scanning file system metadata for sata0 storage pool
>> Scan completed successfully.
>> Scanning file system metadata, phase 3 ...
>> Scan completed successfully.
>> Scanning file system metadata, phase 4 ...
>> Scan completed successfully.
>> Scanning user file metadata ...
>> Error processing user file metadata.
>> No space left on device
>> Check file '/var/mmfs/tmp/gsfs0.pit.interestingInodes.12888779711' on
>> scg-gs0 for inodes with broken disk addresses or failures.
>> mmrestripefs: Command failed. Examine previous error messages to
>> determine cause.
>>
>> I should note too that this fails almost immediately, far too quickly to
>> fill up any location it could be trying to write to.
>>
>> jbh
>>
>> On Thu, Nov 2, 2017 at 7:57 AM, David Johnson <david_johnson at brown.edu> wrote:
>> One thing that may be relevant: if you have snapshots then, depending on
>> your release level, inodes in the snapshot may be considered immutable and
>> will not be migrated. Once the snapshots have been deleted, the inodes are
>> freed up and you won't see the (somewhat misleading) message about no
>> space.
>>
>>  — ddj
>> Dave Johnson
>> Brown University
>>
>> On Nov 2, 2017, at 10:43 AM, John Hanks <griznog at gmail.com> wrote:
>> Thanks all for the suggestions.
>>
>> Having our metadata NSDs fill up was what prompted this exercise, but
>> space was previously freed up on those by switching them from metadata+data
>> to metadataOnly and using a policy to migrate files out of that pool. So
>> these now have about 30% free space (more if you include fragmented space).
>> The restripe attempt is just to make a final move of any remaining data off
>> those devices. All the NSDs now have free space on them.
>>
>> df -i shows inode usage at about 84%, so plenty of free inodes for the
>> filesystem as a whole.
>>
>> We did have old .quota files lying around, but removing them didn't have
>> any impact.
>>
>> mmlsfileset fs -L -i is taking a while to complete; I'll let it simmer
>> while I get to work.
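>>
>> What I'm looking for, roughly: independent filesets have their own inode
>> space, so the filesystem-wide df -i number can hide a fileset that is out
>> of inodes. A sketch of the check, with a placeholder fileset name and
>> limit:
>>
>> mmlsfileset gsfs0 -L -i                               # used/allocated vs. max inodes per fileset
>> mmchfileset gsfs0 somefileset --inode-limit 2000000   # raise the limit if one is exhausted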
>>
>> mmrepquota does show about a half-dozen filesets that have hit their
>> quota for space (we don't set quotas on inodes). Once I'm settled in this
>> morning I'll try giving them a little extra space and see what happens.
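>>
>> The check and the bump, as a sketch; the fileset name and the limits are
>> placeholders:
>>
>> mmrepquota -j gsfs0                            # per-fileset block usage vs. quota and limit
>> mmsetquota gsfs0:somefileset --block 10T:12T   # raise soft:hard block limits for one fileset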
>>
>> jbh
>>
>>
>> On Thu, Nov 2, 2017 at 4:19 AM, Oesterlin, Robert <Robert.Oesterlin at nuance.com> wrote:
>> One thing that I've run into before: on older file systems you had the
>> "*.quota" files in the file system root. If you upgraded the file system
>> to a newer version (so these files aren't used any more), there was a bug
>> at one time where they didn't get properly migrated during a restripe. The
>> solution was to just remove them.
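>>
>> The check is roughly this (the mount point is illustrative, and only move
>> the files once you have confirmed they are the unused legacy copies):
>>
>> mmlsfs gsfs0 -V                       # current and original file system version
>> ls -l /gsfs0/*.quota                  # user.quota, group.quota, fileset.quota in the fs root
>> mv /gsfs0/*.quota /root/quota.bak/    # move them aside rather than deleting outright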
>>
>>
>>
>>
>>
>> Bob Oesterlin
>>
>> Sr Principal Storage Engineer, Nuance
>>
>>
>>
>> From: <gpfsug-discuss-bounces at spectrumscale.org> on behalf of John Hanks <griznog at gmail.com>
>> Reply-To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
>> Date: Wednesday, November 1, 2017 at 5:55 PM
>> To: gpfsug <gpfsug-discuss at spectrumscale.org>
>> Subject: [EXTERNAL] [gpfsug-discuss] mmrestripefs "No space left on device"
>>
>>
>>
>> Hi all,
>>
>>
>>
>> I'm trying to do a restripe after setting some NSDs to metadataOnly, and I
>> keep running into this error:
>>
>>
>>
>> Scanning user file metadata ...
>>
>>    0.01 % complete on Wed Nov  1 15:36:01 2017  (     40960 inodes with total     531689 MB data processed)
>>
>> Error processing user file metadata.
>>
>> Check file '/var/mmfs/tmp/gsfs0.pit.interestingInodes.12888779708' on
>> scg-gs0 for inodes with broken disk addresses or failures.
>>
>> mmrestripefs: Command failed. Examine previous error messages to
>> determine cause.
>>
>>
>>
>> The file it points to says:
>>
>>
>>
>> This inode list was generated in the Parallel Inode Traverse on Wed Nov 1 15:36:06 2017
>> INODE_NUMBER DUMMY_INFO SNAPSHOT_ID ISGLOBAL_SNAPSHOT INDEPENDENT_FSETID MEMO(INODE_FLAGS FILE_TYPE [ERROR])
>>  53504        0:0        0           1                 0                 illreplicated REGULAR_FILE RESERVED Error: 28 No space left on device
>>
>>
>>
>>
>>
>> /var on the node I am running this on has > 128 GB free, all the NSDs
>> have plenty of free space, the filesystem being restriped has plenty of
>> free space, and if I watch the node while running this, no filesystem on
>> it even starts to get full. Could someone tell me where mmrestripefs is
>> attempting to write and/or how to point it at a different location?
>>
>>
>>
>> Thanks,
>>
>>
>>
>> jbh
>>
>> _______________________________________________
>> gpfsug-discuss mailing list
>> gpfsug-discuss at spectrumscale.org
>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss

