[gpfsug-discuss] Bizarre fcntl locking behavior
Aaron Knister
aaron.s.knister at nasa.gov
Thu Dec 6 18:56:44 GMT 2018
Just for the sake of completeness, when the test program fails in the
expected fashion this is the message it prints:
Opening file 'read' in /gpfs/aaronFS/testFile mode. stride = 1048576
l_len = 262144
Non-zero return from fcntl. errno = 37 (No locks available)
Aborted
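
For anyone matching this against the MPICH message quoted below: that message prints errno in hex without a 0x prefix, so "25" there is the same value as the 37 shown here. A minimal sketch of the conversion follows; the helper name errno_from_mpich_field is hypothetical and not part of MPICH or of fcntl_stress.c.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: parse the errno field from the MPICH
 * "File locking failed" message. The field is hexadecimal with no
 * 0x prefix, so the string "25" means 0x25, which is 37 decimal. */
int errno_from_mpich_field(const char *field)
{
    return (int)strtol(field, NULL, 16);
}

/* Nonzero if the decoded value is ENOLCK ("No locks available").
 * ENOLCK is 37 on Linux; other platforms may use a different number. */
int mpich_field_is_enolck(const char *field)
{
    return errno_from_mpich_field(field) == ENOLCK;
}
```

On Linux, mpich_field_is_enolck("25") should be true, which lines up with the "No locks available" text in both messages.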
-Aaron
On 12/6/18 1:47 PM, Aaron Knister wrote:
> I've been trying to chase down an error one of our users periodically
> sees with Intel MPI. The body of the error is this:
>
> This requires fcntl(2) to be implemented. As of 8/25/2011 it is not.
> Generic MPICH Message: File locking failed in ADIOI_Set_lock(fd F,cmd
> F_SETLKW/7,type F_RDLCK/0,whence 0) with return value FFFFFFFF and errno
> 25.
> - If the file system is NFS, you need to use NFS version 3, ensure that
> the lockd daemon is running on all the machines, and mount the directory
> with the 'noac' option (no attribute caching).
> - If the file system is LUSTRE, ensure that the directory is mounted
> with the 'flock' option.
> ADIOI_Set_lock:: No locks available
> ADIOI_Set_lock:offset 0, length 8
>
> When this happens, a new job is reading back in the checkpoint files a
> previous job wrote. It is consistently the reading of previously
> written files that triggers this, although the occurrence is sporadic,
> and if the job retries enough times the error goes away.
>
> The really curious thing is that there is only one byte-range lock per
> file per node open at any time, so error 37 (I know it says 25, but
> that's actually in hex even though it's not prefixed with 0x), meaning
> we are out of byte-range locks, is a little odd to me. The default
> limit is 200, but we should be nowhere near that.
>
> I've been frantically trying to chase this down with various MPI
> reproducers but came up short until this morning, when I gave up on
> the MPI approach and tried something a little simpler. I've
> discovered that the error reproduces with the following sequence:
>
> - A file is opened by node A (a key requirement to reproduce seems to be
> that node A is *also* the metanode for the file. I've not been able to
> reproduce if node A is *not* the metanode)
> - Node A acquires a bunch of write locks in the file
> - Node B then also acquires a bunch of write locks in the file
> - Node B then acquires a bunch of read locks in the file
> - Node A then also acquires a bunch of read locks in the file
>
> At that last step, Node A gets errno 37 while attempting to acquire
> the read locks.
>
> Here are the actual commands to reproduce this (source code for
> fcntl_stress.c is attached):
>
> Node A: rm /gpfs/aaronFS/testFile; dd if=/dev/zero of=/gpfs/aaronFS/testFile bs=1M count=4000
> Node A: ./fcntl_stress /gpfs/aaronFS/testFile $((1024*1024)) $((256*1024)) 1
> Node B: ./fcntl_stress /gpfs/aaronFS/testFile $((1024*1024)) $((256*1024)) 1
> Node B: ./fcntl_stress /gpfs/aaronFS/testFile $((1024*1024)) $((256*1024))
> Node A: ./fcntl_stress /gpfs/aaronFS/testFile $((1024*1024)) $((256*1024))
>
> Now that I've typed this out, I realize this really should be a PMR not
> a post to the mailing list :) but I thought it was interesting and
> wanted to share.
>
> -Aaron
>
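
Since the attachment does not travel with the archived message, here is a rough sketch of the kind of locking loop the sequence above describes. It is not the attached fcntl_stress.c. The function name lock_ranges is hypothetical, the parameter names stride and l_len are borrowed from the program output, and it assumes every lock stays held until the descriptor is closed, which may not match what the real program does.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not taken from the attached fcntl_stress.c.
 * Walks the file in steps of `stride` bytes and takes a blocking
 * byte-range lock of `l_len` bytes at each step. `type` is F_WRLCK
 * or F_RDLCK. Locks are left held (assumption). Returns the number
 * of locks taken, or -1 with errno set (for example ENOLCK) on the
 * first failure. */
long lock_ranges(int fd, off_t file_size, off_t stride,
                 off_t l_len, short type)
{
    long taken = 0;

    if (stride <= 0 || l_len <= 0)
        return 0;

    for (off_t off = 0; off < file_size; off += stride) {
        struct flock fl = {0};

        fl.l_type = type;
        fl.l_whence = SEEK_SET;
        fl.l_start = off;
        fl.l_len = l_len;

        if (fcntl(fd, F_SETLKW, &fl) != 0)
            return -1;
        taken++;
    }
    return taken;
}
```

Under those assumptions, the four steps would correspond to calling lock_ranges with F_WRLCK on node A, F_WRLCK on node B, F_RDLCK on node B, and finally F_RDLCK on node A, each with a stride of 1048576 and an l_len of 262144.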
--
Aaron Knister
NASA Center for Climate Simulation (Code 606.2)
Goddard Space Flight Center
(301) 286-2776