[gpfsug-discuss] Bizarre fcntl locking behavior

Aaron Knister aaron.s.knister at nasa.gov
Thu Dec 6 18:56:44 GMT 2018


Just for the sake of completeness, when the test program fails in the 
expected fashion this is the message it prints:

Opening file 'read' in /gpfs/aaronFS/testFile mode. stride = 1048576 
l_len = 262144
Non-zero return from fcntl. errno = 37 (No locks available)
Aborted

-Aaron

On 12/6/18 1:47 PM, Aaron Knister wrote:
> I've been trying to chase down an error one of our users periodically 
> sees with Intel MPI. The body of the error is this:
> 
> This requires fcntl(2) to be implemented. As of 8/25/2011 it is not. 
> Generic MPICH Message: File locking failed in ADIOI_Set_lock(fd F,cmd 
> F_SETLKW/7,type F_RDLCK/0,whence 0) with return value FFFFFFFF and errno 
> 25.
> - If the file system is NFS, you need to use NFS version 3, ensure that 
> the lockd daemon is running on all the machines, and mount the directory 
> with the 'noac' option (no attribute caching).
> - If the file system is LUSTRE, ensure that the directory is mounted 
> with the 'flock' option.
> ADIOI_Set_lock:: No locks available
> ADIOI_Set_lock:offset 0, length 8
> 
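For reference, the MPICH message above describes one blocking read-lock request: cmd F_SETLKW, type F_RDLCK, whence 0 (SEEK_SET), offset 0, length 8. Below is a minimal sketch of the equivalent POSIX fcntl(2) call. The helper name set_range_lock is hypothetical. This is not ROMIO's actual ADIOI_Set_lock source, just the same request expressed directly.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: take (or release) a POSIX byte-range lock with the
 * parameters the MPICH message reports: F_SETLKW, whence 0, offset, length. */
int set_range_lock(int fd, short type, off_t start, off_t len)
{
    struct flock fl;
    memset(&fl, 0, sizeof fl);
    fl.l_type = type;        /* F_RDLCK, F_WRLCK or F_UNLCK */
    fl.l_whence = SEEK_SET;  /* "whence 0" in the message */
    fl.l_start = start;      /* "offset 0" */
    fl.l_len = len;          /* "length 8" */

    int rc;
    /* F_SETLKW blocks until the lock is granted; retry if a signal interrupts it. */
    do {
        rc = fcntl(fd, F_SETLKW, &fl);
    } while (rc == -1 && errno == EINTR);
    return rc; /* -1 with errno == ENOLCK is the failure reported above */
}
```

The request in the message corresponds to set_range_lock(fd, F_RDLCK, 0, 8).
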
> When this happens, a new job is reading back in the checkpoint files a 
> previous job wrote. It is consistently the reading-in of previously 
> written files that triggers this, although the occurrence is sporadic, 
> and if the job retries enough times the error goes away.
> 
> The really curious thing is that there is only one byte-range lock per 
> file, per node, open at any time, so error 37 (I know it says 25, but 
> that's actually hex even though it isn't prefixed with 0x), i.e. being 
> out of byte-range locks, is a little odd to me. The default limit is 
> 200, and we should be nowhere near that.
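The hex reading of the MPICH fields can be checked directly. Below is a small sketch. It assumes Linux, where ENOLCK is 37, and plain un-prefixed hex fields as described above. The helper names are hypothetical. It also shows that the "return value FFFFFFFF" field is simply -1.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helpers: MPICH prints these fields with %X, i.e. hex with no 0x. */

/* "25" -> 37 */
int mpich_hex_errno(const char *field)
{
    return (int)strtol(field, NULL, 16);
}

/* "FFFFFFFF" -> -1 on two's-complement targets (all mainstream Linux ones) */
int32_t mpich_hex_retval(const char *field)
{
    return (int32_t)(uint32_t)strtoul(field, NULL, 16);
}

/* True when the decoded errno is ENOLCK ("No locks available"). */
int mpich_is_enolck(const char *field)
{
    return mpich_hex_errno(field) == ENOLCK;
}

/* Human-readable text for the decoded errno. */
const char *mpich_errno_text(const char *field)
{
    return strerror(mpich_hex_errno(field));
}
```
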
> 
> I've been frantically trying to chase this down with various MPI 
> reproducers, but alas I came up short until this morning, when I gave 
> up on the MPI approach and tried something a little simpler. I've 
> discovered that when:
> 
> - A file is opened by node A (a key requirement to reproduce seems to be 
> that node A is *also* the metanode for the file; I've not been able to 
> reproduce it if node A is *not* the metanode)
> - Node A acquires a bunch of write locks in the file
> - Node B then also acquires a bunch of write locks in the file
> - Node B then acquires a bunch of read locks in the file
> - Node A then also acquires a bunch of read locks in the file
> 
> At that last step, Node A gets errno 37 when attempting to acquire the 
> read locks.
> 
> Here are the actual commands to reproduce this (source code for 
> fcntl_stress.c is attached):
> 
> Node A: rm /gpfs/aaronFS/testFile; dd if=/dev/zero of=/gpfs/aaronFS/testFile bs=1M count=4000
> Node A: ./fcntl_stress /gpfs/aaronFS/testFile $((1024*1024)) $((256*1024)) 1
> Node B: ./fcntl_stress /gpfs/aaronFS/testFile $((1024*1024)) $((256*1024)) 1
> Node B: ./fcntl_stress /gpfs/aaronFS/testFile $((1024*1024)) $((256*1024))
> Node A: ./fcntl_stress /gpfs/aaronFS/testFile $((1024*1024)) $((256*1024))
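The attached fcntl_stress.c is not reproduced in this archive. The sketch below is a hypothetical re-creation inferred only from the command lines above and the failure output at the top of this message. It assumes the arguments are the file, the stride, the lock length (l_len), and an optional flag that selects write locks, with read mode otherwise. It is not the attached program, and it does not by itself demonstrate the GPFS behavior. The real tool appears to abort on failure, whereas this sketch returns -1 with errno set.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* HYPOTHETICAL re-creation of fcntl_stress (the real source is not shown).
 * Locks l_len bytes at every stride-byte offset of the file: write locks when
 * write_mode != 0, read locks otherwise. Returns the number of locks taken,
 * or -1 with errno set at the first failure. */
long stress_locks(const char *path, off_t stride, off_t l_len, int write_mode)
{
    int fd = open(path, write_mode ? O_RDWR : O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) != 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }

    long taken = 0;
    for (off_t off = 0; off < st.st_size; off += stride) {
        struct flock fl;
        memset(&fl, 0, sizeof fl);
        fl.l_type = write_mode ? F_WRLCK : F_RDLCK;
        fl.l_whence = SEEK_SET;
        fl.l_start = off;
        fl.l_len = l_len;
        if (fcntl(fd, F_SETLKW, &fl) != 0) {
            int saved = errno; /* ENOLCK (37) in the failing case */
            close(fd);
            errno = saved;
            return -1;
        }
        taken++;
    }

    close(fd); /* closing the descriptor releases this process's locks on the file */
    return taken;
}
```

Under this assumed logic, the commands above (a 4000 MiB file, a 1 MiB stride, and a 256 KiB l_len) would take 4000 locks per run, one per stride.
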
> 
> Now that I've typed this out, I realize this really should be a PMR, 
> not a post to the mailing list :), but I thought it was interesting and 
> wanted to share.
> 
> -Aaron
> 

-- 
Aaron Knister
NASA Center for Climate Simulation (Code 606.2)
Goddard Space Flight Center
(301) 286-2776


