[gpfsug-discuss] Lost disks

Uwe Falke UWEFALKE at de.ibm.com
Thu Jul 27 15:18:02 BST 2017


"Just doing something" usually makes things worse. It is doubtful whether a 
third-party tool knows how to handle GPFS NSDs (unless it is dedicated to 
that purpose). 

First, I'd look at what is actually on the sectors where the NSD headers used 
to be, and try to find out whether data beyond that area were also modified. 
If so, restoring the NSDs does not make much sense, as data and/or metadata 
(depending on disk usage) would also be corrupted. 
If you are sure that only the NSD header area has been affected, you might 
try to trick GPFS by writing into the header area just the information it 
needs to recognise the devices as the NSDs they were. 
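To look at those sectors without changing anything, one could first copy the start of the device into a regular file and inspect the copy. The sketch below assumes a 512-byte sector size and copies the first 4 KiB; the function name `dump_nsd_head` and the device path are hypothetical placeholders. The `od` options are the ones used in the listing below (GNU coreutils `od`).

```shell
#!/bin/sh
# Hypothetical helper: copy the first 4 KiB (8 sectors of 512 bytes) of a
# device or disk image into a regular file, then hex-dump the copy.
# The source is only read, never written.
dump_nsd_head() {
    src=$1    # device or image to inspect, e.g. a placeholder like /dev/sdX
    out=$2    # regular file that receives the copy
    dd if="$src" of="$out" bs=512 count=8 2>/dev/null || return 1
    # Hex words plus characters, with hex addresses (GNU od).
    od --address-radix=x -xc "$out"
}

# Usage (placeholder device name):
#   dump_nsd_head /dev/sdX sdX_head.bin | less
```

Working on a copy also means the dump can be kept as evidence before anyone starts writing to the disk.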

The first 4 KiB of a v1 NSD from a VM on my laptop look like this:

$ cat nsdv1head | od --address-radix=x -xc
000000    0000    0000    0000    0000    0000    0000    0000    0000
        \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
*
000200    cf70    4192    0000    0100    0000    3000    e930    a028
         p 317 222   A  \0  \0  \0 001  \0  \0  \0   0   0 351   ( 240
000210    a8c0    ce7a    a251    1f92    a251    1a92    0000    0800
       300 250   z 316   Q 242 222 037   Q 242 222 032  \0  \0  \0  \b
000220    0000    f20f    0000    0000    0000    0000    0000    0000
        \0  \0 017 362  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
000230    0000    0000    0000    0000    0000    0000    0000    0000
        \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
*
000400    93d2    7885    0000    0100    0000    0002    141e    64a8
       322 223 205   x  \0  \0  \0 001  \0  \0 002  \0 036 024 250   d
000410    a8c0    ce7a    a251    3490    0000    fa0f    0000    0800
       300 250   z 316   Q 242 220   4  \0  \0 017 372  \0  \0  \0  \b
000420    0000    0000    0000    0000    0000    0000    0000    0000
        \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
*
000480    534e    2044    6564    6373    6972    7470    726f    6620
         N   S   D       d   e   s   c   r   i   p   t   o   r       f
000490    726f    2f20    6564    2f76    6476    2062    7263    6165
         o   r       /   d   e   v   /   v   d   b       c   r   e   a
0004a0    6574    2064    7962    4720    4650    2053    6f4d    206e
         t   e   d       b   y       G   P   F   S       M   o   n 
0004b0    614d    2079    3732    3020    3a30    3434    303a    2034
         M   a   y       2   7       0   0   :   4   4   :   0   4 
0004c0    3032    3331    000a    0000    0000    0000    0000    0000
         2   0   1   3  \n  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
0004d0    0000    0000    0000    0000    0000    0000    0000    0000
        \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
*
000e00    4c5f    4d56    0000    017d    0000    017d    0000    017d
         _   L   V   M  \0  \0   } 001  \0  \0   } 001  \0  \0   } 001
000e10    0000    017d    0000    0000    0000    0000    0000    0000
        \0  \0   } 001  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
000e20    0000    0000    0000    0000    0000    0000    0000    0000
        \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
000e30    0000    0000    0000    0000    0000    0000    017d    0000
        \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0   } 001  \0  \0
000e40    0000    0000    0000    0000    0000    0000    0000    0000
        \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
*
001000

I suppose the important area starts at 0x0200 (i.e. with the second 
512-byte sector) and ends at 0x04df (which falls within the third 512-byte 
sector), so the second and third sectors appear crucial. I think there 
is some more space before the payload area starts. Without knowing 
exactly what has to go into the header, I'd try to create an NSD on one or 
two (new) disks, save the headers, then create an FS on them, save the 
headers again, and check whether anything has changed. 
So creating some new NSDs and checking which keys appear there and in 
the cluster configuration could get you very close to recreating the lost 
header information. Of course, whether that is worth it depends on how 
valuable the data on the lost FS AKA SG are and how hard it would be to 
rebuild them otherwise (restore from backup, recalculate, ...). 
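The "save the headers, change something, compare" step could be done with `cmp -l`, which prints every differing byte as a 1-based decimal offset followed by both byte values in octal. The sketch below (helper names hypothetical, 512-byte sectors assumed) also cuts the second and third sectors (offsets 0x0200 to 0x05ff) out of a saved 4 KiB dump, and prints differing offsets as 0-based hex so they line up with an `od --address-radix=x` listing.

```shell
#!/bin/sh
# Hypothetical helpers. Possible sequence on a scratch disk:
#   1. mmcrnsd on the new disk, save its first 4 KiB with dd -> after_crnsd.bin
#   2. mmcrfs on the new NSDs, save the first 4 KiB again   -> after_crfs.bin
#   3. diff_offsets after_crnsd.bin after_crfs.bin

# Cut sectors 2 and 3 (byte offsets 0x200..0x5ff) out of a header dump.
extract_sectors_2_3() {
    dd if="$1" of="$2" bs=512 skip=1 count=2 2>/dev/null
}

# Print, as 0-based hex, every byte offset at which two dumps differ.
diff_offsets() {
    # cmp -l prints: <1-based decimal offset> <octal byte in $1> <octal byte in $2>
    cmp -l "$1" "$2" | awk '{ printf "0x%04x\n", $1 - 1 }'
}
```

An empty result from `diff_offsets` means the two dumps are byte-for-byte identical.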

It seems a good idea to keep a backup of your NSD headers :-)
And also now: before modifying any blocks on your disks, save them!
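A minimal sketch of saving a range of 512-byte sectors before touching them, and of writing such a saved copy back to the same position, using `dd`. The helper names, sector numbers and paths are hypothetical; writing back is destructive, so check the target device twice before running it.

```shell
#!/bin/sh
# Hypothetical helpers for saving and restoring 512-byte sectors.

# save_sectors DEVICE FIRST_SECTOR COUNT OUTFILE   (only reads DEVICE)
save_sectors() {
    dd if="$1" of="$4" bs=512 skip="$2" count="$3" 2>/dev/null
}

# restore_sectors INFILE DEVICE FIRST_SECTOR   (DESTRUCTIVE: overwrites DEVICE)
restore_sectors() {
    # conv=notrunc keeps the rest of the target intact when it is an image
    # file; on a block device the target is never truncated anyway.
    dd if="$1" of="$2" bs=512 seek="$3" conv=notrunc 2>/dev/null
}

# Example (placeholder paths): keep sectors 0-7 (the first 4 KiB) of an NSD
#   save_sectors /dev/sdX 0 8 /safe/place/sdX.nsdhead
```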

 
Mit freundlichen Grüßen / Kind regards

 
Dr. Uwe Falke
 
IT Specialist
High Performance Computing Services / Integrated Technology Services / 
Data Center Services
-------------------------------------------------------------------------------------------------------------------------------------------
IBM Deutschland
Rathausstr. 7
09111 Chemnitz
Phone: +49 371 6978 2165
Mobile: +49 175 575 2877
E-Mail: uwefalke at de.ibm.com
-------------------------------------------------------------------------------------------------------------------------------------------
IBM Deutschland Business & Technology Services GmbH / Management: 
Andreas Hasse, Thomas Wolter
Registered office: Ehningen / Register court: Amtsgericht Stuttgart, 
HRB 17122 




From:   Jonathan Buzzard <jonathan.buzzard at strath.ac.uk>
To:     gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Date:   07/27/2017 01:59 PM
Subject:        Re: [gpfsug-discuss] Lost disks
Sent by:        gpfsug-discuss-bounces at spectrumscale.org



On Thu, 2017-07-27 at 07:28 -0400, RICHARD RUPP wrote:
> If you are under IBM support, leverage IBM for help. A third party
> utility has the possibility of making it worse. 
> 

The chances of recovering from this sort of problem are slim in the first
place, at least with v1 NSD descriptors. Furthermore, IBM have *ALREADY*
told him the data is lost. I quote:

    But in their PMR they were told that all that data is lost now
    and that the disk headers didn't appear as GPFS disk headers. 

So in this scenario you have little to lose by trying something, because
you are now on your own. The worst case is that whatever you try
does not work, which leaves you no worse off than you are now, apart
from the time lost for the restore, and you might already have started
that restore to somewhere else.

I was once told by IBM (nine years ago now) that my GPFS file system was
kaput and to arrange a restore from tape. At which point some fiddling
on my part fixed the problem and a 100 TB restore was no longer required.
However, that was not due to overwritten NSD descriptors. When that did
happen, the two file systems affected had to be restored. Well,
bizarrely, one was still mounted and I was able to rsync the data off. 

However, the point is that at this stage fiddling with third-party tools
is the only option left.

JAB.

-- 
Jonathan A. Buzzard                         Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG


_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss