[gpfsug-discuss] Use AFM for migration of many small files

Billich Heinrich Rainer (PSI) heiner.billich at psi.ch
Wed Sep 6 17:16:18 BST 2017


Hello Venkateswara, Edward,

Thank you for the comments on how to speed up AFM prefetch with small files. We run 4.2.2-3, the AFM mode is RO, and we have just a single gateway, i.e. no parallel reads for large files. We will try to increase the value of afmNumFlushThreads. It wasn’t clear to me that these threads also read from home, at least for prefetch. First I will try a plain NFS mount and see how throughput scales with parallel reads of many small files. Next I will try AFM prefetch. I won’t do careful benchmarking, just watch the dstat output. We prefetch 100’000 files in one batch, so there is ample time to observe.
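For the plain NFS test, a minimal sketch of what "parallel reads of many small files" could look like, assuming a Python thread pool is an acceptable load generator. The function names and the list of worker counts are hypothetical, chosen only for illustration; this is not an AFM or GPFS tool. Note that repeated runs over the same files may be served from the client page cache, so each run should use files that have not been read before.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def read_file(path):
    """Read one file completely and return its size in bytes."""
    with open(path, "rb") as f:
        return len(f.read())


def measure_parallel_read(paths, workers):
    """Read all paths with `workers` threads; return (total_bytes, seconds)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        total = sum(pool.map(read_file, paths))
    elapsed = time.perf_counter() - start
    return total, elapsed


def scan(root, worker_counts=(1, 4, 16, 32)):
    """Print MB/s for several thread counts over all files below `root`.

    `root` is assumed to be a directory on the NFS mount (hypothetical).
    """
    paths = [p for p in Path(root).rglob("*") if p.is_file()]
    for workers in worker_counts:
        total, elapsed = measure_parallel_read(paths, workers)
        mb_per_s = total / 1e6 / elapsed if elapsed > 0 else float("inf")
        print(f"{workers:3d} threads: {mb_per_s:8.1f} MB/s")
```

Calling `scan()` with the path of the mounted directory while dstat runs in another terminal gives a rough picture of how throughput grows with the number of reads in flight.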

The basic issue is that we get only about 45MB/s for sequential reads of many thousands of files of 1MB each on the home cluster, i.e. we read one file at a time before we switch to the next. This is no surprise. Each read takes about 20ms to complete, so at most we get 50 reads of 1MB per second. We’ve seen this on classical RAID storage and on DSS/ESS systems. It’s likely just the physics of spinning disks and the fact that we do one read at a time and don’t allow any parallelism: we wait for one or two I/Os to single disks to complete before we continue. With larger files prefetch kicks in and fires many reads in parallel … To get 1’000MB/s I need to do 1’000 reads/s, and at 20ms per read that means ~20 reads in progress in parallel all the time … we’ll see how close we get to 1’000MB/s with ‘many small files’.
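The arithmetic above can be written down as a small sketch. It assumes every read takes the same 20ms regardless of how many reads are in flight, which is an idealization; real disks and gateways will saturate at some point. The function names are made up for illustration.

```python
import math


def throughput_mb_s(latency_s, file_mb, in_flight=1):
    """Ideal throughput when `in_flight` reads are always outstanding.

    Each outstanding read completes one file of `file_mb` MB every
    `latency_s` seconds (assumed constant, independent of load).
    """
    return in_flight * file_mb / latency_s


def required_in_flight(target_mb_s, file_mb, latency_s):
    """Reads that must be outstanding to sustain `target_mb_s`.

    This is Little's law: concurrency = arrival rate * latency.
    """
    reads_per_s = target_mb_s / file_mb
    # Round before ceil so floating-point noise does not add an extra read.
    return math.ceil(round(reads_per_s * latency_s, 9))


# One read at a time, 20 ms per 1 MB file: the ceiling is about 50 MB/s,
# close to the ~45 MB/s observed on the home cluster.
print(throughput_mb_s(0.020, 1.0))

# Outstanding reads needed for 1000 MB/s with 1 MB files at 20 ms each.
print(required_in_flight(1000.0, 1.0, 0.020))
```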

Kind regards,

Heiner
--
Paul Scherrer Institut
Science IT
Heiner Billich
WHGA 106
CH 5232  Villigen PSI
056 310 36 02
https://www.psi.ch
 


