[gpfsug-discuss] Mitigating Poor Small-file I/O Performance
Sven Oehme
oehmes at us.ibm.com
Wed May 21 22:42:51 BST 2014
Hi Alex, Stewart,
The problem is more likely latency and a lack of parallelism in the
transfer.
If you have one thread transferring 1 KB files, you don't get much
bandwidth: the files go over one at a time, and the latency kills the
throughput. To explain this with an example:
Assume the network between client and server is 10 Gbit and both nodes
are capable of pushing that.
If it takes 1 ms to read + write each 1 KB file, you get ~1.02 MB/sec.
If the file size changes to 4 KB, even with nothing else changed, that
goes up to ~4.06 MB/sec.
If you can reduce the read + write latency to, let's say, 100 us, the
same process would transfer ~37.86 MB/sec.
So this is a latency / parallelism problem, and it has nothing to do with
GPFS; you would see exactly the same issue on any filesystem.
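To make the arithmetic explicit, here is a rough back-of-the-envelope
sketch (it assumes per-file latency is the only bottleneck; the figures
above include a bit of extra per-file overhead, so this simplified model
comes out slightly higher):

  # single-thread throughput ~= file_size / per_file_latency
  awk 'BEGIN {
      printf "1 KB @ 1 ms   : ~%.2f MB/s\n", (1 * 1024) / 0.001  / 1e6;
      printf "4 KB @ 1 ms   : ~%.2f MB/s\n", (4 * 1024) / 0.001  / 1e6;
      printf "4 KB @ 100 us : ~%.2f MB/s\n", (4 * 1024) / 0.0001 / 1e6;
  }'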
The solution is to copy the data with a tool that does it multi-threaded,
as the numbers above are based on a single thread.
If you had 100 us read + write time and transferred 4 KB files with 10
threads in parallel, the same transfer would be close to 400 MB/sec.
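As a sketch of what "multi-threaded" can mean in practice (the paths and
the worker count of 10 are placeholders, not a recommendation; a dedicated
parallel-copy tool may do better than spawning one cp per file):

  # fan the file list out to up to 10 cp processes in parallel
  mkdir -p /path/to/destination
  cd /path/to/source
  find . -type f -print0 | \
      xargs -0 -P 10 -I{} cp --parents {} /path/to/destination/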
Hope that helps. Sven
------------------------------------------
Sven Oehme
Scalable Storage Research
email: oehmes at us.ibm.com
Phone: +1 (408) 824-8904
IBM Almaden Research Lab
------------------------------------------
From: Alex Chekholko <chekh at stanford.edu>
To: gpfsug-discuss at gpfsug.org
Date: 05/21/2014 01:05 PM
Subject: Re: [gpfsug-discuss] Mitigating Poor Small-file I/O
Performance
Sent by: gpfsug-discuss-bounces at gpfsug.org
Hi Stewart,
First, a good simple reproducible benchmark is fdtree:
https://computing.llnl.gov/?set=code&page=sio_downloads
Something simple like this should take a minute or two:
bash fdtree.bash -l 3 -s 64
The exact same small run can take hours on a slow system.
For GPFS, since it's a clustered filesystem, you first need to make sure
you're looking at the aggregate performance and not just the performance
of one client. Perhaps your filesystem is performing great but is already
maxed out at the moment you run your test from a single client. So you
need to be able to monitor the disk system.
In general, the answer to your question, in order of simplicity, is: add
more spindles; possibly also separate the metadata out onto dedicated
storage; possibly make your filesystem block size smaller.
The first you can do by adding more hardware; the second is easiest when
you design the whole system, though it is possible on a running
filesystem; the third can only be done at filesystem creation.
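For reference, a rough sketch of where those knobs live in GPFS (the
device name "gpfs0", the stanza file name, and the 256K block size are
placeholders):

  # show the current filesystem block size
  mmlsfs gpfs0 -B
  # block size is fixed at filesystem creation time, e.g.:
  #   mmcrfs gpfs0 -F nsd.stanza -B 256K
  # metadata can be steered onto dedicated (ideally fast) NSDs via the
  # stanza file, e.g. usage=metadataOnly on those disks and usage=dataOnly
  # on the rest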
For "small files", how "small" is "small". I guess generally we mean
smaller than filesystem block size.
Regards,
Alex
On 5/20/14, 7:17 AM, Howard, Stewart Jameson wrote:
> Hi All,
>
> My name is Stewart Howard and I work for Indiana University as an admin
> on a two-site replicated GPFS cluster. I'm a new member of this mailing
> list and this is my first post :)
>
> Recently, we've discovered that small-file performance on our system is
> pretty lackluster. For comparison, here are some numbers:
>
> 1) When transferring large files (~2 GB), we get outstanding
> performance and can typically saturate the client's network connection.
> We generally see about 490 MB/s over a 10Gb line, which should be about
> right, given that we lose half of our bandwidth to replication.
>
> 2) When transferring a large number of small files, we get a very poor
> transfer rate, generally on the order of 2 MB/s, writing from a client
> node *inside* the GPFS cluster.
>
> I'm wondering if anyone else has experience with similar performance
> issues and what ended up being the cause/solution. Also, I would be
> interested in hearing any general rules-of-thumb that the group has
> found helpful in balancing performance between large-file and small-file
> I/O.
>
> We have gathered some diagnostic information while performing various
> small-file I/O operations, as well as a variety of metadata operations
> in quick succession. I'd be happy to share results of the diagnostics,
> if it would help provide context.
>
> Thank you so much for all of your help!
>
> Stewart Howard
> Indiana University
>
--
chekh at stanford.edu 347-401-4860
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at gpfsug.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss