[gpfsug-discuss] data integrity documentation

Stijn De Weirdt stijn.deweirdt at ugent.be
Wed Aug 2 20:20:14 BST 2017


hi sven,

the data is not corrupted. mmfsck compares 2 inode replicas, says they don't
match, but checking the data with tsdbfs reveals they are equal.
(one replica has to be fetched over the network; the nsd servers cannot each
access all disks)
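
to illustrate what we compared (the filesystem name and inode number below
are made up, and tsdbfs subcommands can differ between releases, so take this
as a sketch):

  # read-only check that reports the inode replica mismatches
  mmfsck fs0 -n -v

  # low-level look at a suspect inode (interactive; 'inode N' at the prompt)
  /usr/lpp/mmfs/bin/tsdbfs fs0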

with some nsdChksum... settings enabled, we see a lot of "Encountered XYZ
checksum errors on network I/O to NSD Client disk" messages during this
mmfsck.

ibm support says these are hardware issues, but that the mmfsck mismatches
themselves are false positives.

anyway, our current question is: if these are hardware issues, is there
anything in the gpfs client->nsd path (on the network side) that would detect
such errors? i.e. can we trust the data (and metadata)?
i was under the impression that client to disk is not covered, but i assumed
that at least client to nsd (the network part) was checksummed.

stijn


On 08/02/2017 09:10 PM, Sven Oehme wrote:
> ok, i think i understand now: the data was already corrupted. the config
> change i proposed only prevents a known potential on-the-wire corruption in
> the future; it will not fix something that already made it to disk.
> 
> Sven
> 
> 
> 
> On Wed, Aug 2, 2017 at 11:53 AM Stijn De Weirdt <stijn.deweirdt at ugent.be>
> wrote:
> 
>> yes ;)
>>
>> the system is in preproduction, so nothing that can't be stopped/started in
>> a few minutes (the current setup has only 4 nsds, and no clients).
>> mmfsck triggers the errors very early during inode replica compare.
>>
>>
>> stijn
>>
>> On 08/02/2017 08:47 PM, Sven Oehme wrote:
>>> How can you reproduce this so quick ?
>>> Did you restart all daemons after that ?
>>>
>>> On Wed, Aug 2, 2017, 11:43 AM Stijn De Weirdt <stijn.deweirdt at ugent.be>
>>> wrote:
>>>
>>>> hi sven,
>>>>
>>>>
>>>>> the very first thing you should check is if you have this setting set:
>>>> maybe the very first thing to check should be the faq/wiki that has this
>>>> documented?
>>>>
>>>>>
>>>>> mmlsconfig envVar
>>>>>
>>>>> envVar MLX4_POST_SEND_PREFER_BF 0 MLX4_USE_MUTEX 1 MLX5_SHUT_UP_BF 1
>>>>> MLX5_USE_MUTEX 1
>>>>>
>>>>> if that doesn't come back the way above, you need to set it:
>>>>>
>>>>> mmchconfig envVar="MLX4_POST_SEND_PREFER_BF=0 MLX5_SHUT_UP_BF=1
>>>>> MLX5_USE_MUTEX=1 MLX4_USE_MUTEX=1"
>>>> i just set this (wasn't set before), but problem is still present.
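
for completeness, roughly how i verified it afterwards (i'm not 100% sure a
full daemon restart is required for envVar changes, but that is what sven
asked about earlier):

  mmlsconfig envVar               # confirm the new values are in place
  mmshutdown -a && mmstartup -a   # restart all daemons so mmfsd re-reads them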
>>>>
>>>>>
>>>>> there was a problem in the Mellanox FW in various versions that was never
>>>>> completely addressed (bugs were found and fixed, but it was never fully
>>>>> proven to be addressed). the above environment variables turn on code in
>>>>> the mellanox driver that prevents this potential code path from being used
>>>>> to begin with.
>>>>>
>>>>> in Spectrum Scale 4.2.4 (not yet released) we added a workaround in Scale
>>>>> so that even if you don't set these variables the problem can't happen
>>>>> anymore. until then the only choice you have is the envVar above (which
>>>>> btw ships as default on all ESS systems).
>>>>>
>>>>> you also should be on the latest available Mellanox FW & drivers, as not
>>>>> all versions even have the code that is activated by the environment
>>>>> variables above. i think at a minimum you need to be at 3.4, but i don't
>>>>> remember the exact version. there have been multiple defects opened around
>>>>> this area; the last one i remember was:
>>>> we run mlnx ofed 4.1. the fw is not the latest; we have edr cards from
>>>> dell, and their fw is a bit behind. i'm trying to convince dell to make a
>>>> new one. mellanox used to allow you to make your own, but they don't
>>>> anymore.
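
for reference, roughly how we check the installed driver and firmware
versions (device names and output differ per host):

  ofed_info -s                 # MLNX_OFED version
  ibv_devinfo | grep fw_ver    # HCA firmware version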
>>>>
>>>>>
>>>>> 00154843 : ESS ConnectX-3 performance issue - spinning on pthread_spin_lock
>>>>>
>>>>> you may ask your mellanox representative if they can get you access to
>>>>> this defect. while it was found on ESS, i.e. on PPC64 and with ConnectX-3
>>>>> cards, it's a general issue that affects all cards, on Intel as well as
>>>>> Power.
>>>> ok, thanks for this. maybe such a reference is enough for dell to update
>>>> their firmware.
>>>>
>>>> stijn
>>>>
>>>>>
>>>>> On Wed, Aug 2, 2017 at 8:58 AM Stijn De Weirdt <stijn.deweirdt at ugent.be>
>>>>> wrote:
>>>>>
>>>>>> hi all,
>>>>>>
>>>>>> is there any documentation wrt data integrity in spectrum scale?
>>>>>> assuming a crappy network, does gpfs somehow guarantee that data written
>>>>>> by a client ends up safe in the nsd gpfs daemon, and similarly from the
>>>>>> nsd gpfs daemon to disk?
>>>>>>
>>>>>> and what about rdma on a crappy network? is it the same?
>>>>>>
>>>>>> (we are hunting down a crappy infiniband issue; ibm support says it's a
>>>>>> network issue; and we see no errors anywhere...)
>>>>>>
>>>>>> thanks a lot,
>>>>>>
>>>>>> stijn


