[gpfsug-discuss] GPFS architecture choice: large servers or directly-attached clients?
mark.bergman at uphs.upenn.edu
Mon Mar 11 19:26:58 GMT 2013
I'm in the process of planning a new HPC cluster, and I'd appreciate getting
some feedback on different approaches to the GPFS architecture.
The cluster will have about 25~50 nodes initially (up to 1000 CPU-cores),
expected to grow to about 50~80 nodes.
The jobs are primarily independent, single-threaded, with a mixture of
small- to medium-sized IO, and a lot of random access. It is very common to
have 100s or 1000s of jobs on different cores and nodes each accessing
the same directories, often with an overlap of the same data files.
For example, many jobs on different nodes will use the same executable
and the same baseline data models, but will differ in individual data
files to compare to the model.
My goal is to ensure reasonable performance, particularly when there's a lot
of contention from multiple jobs accessing the same meta-data and some of the
same data.
My question here is in a choice between two GPFS architecture designs
(the storage array configurations, drive types, RAID types, etc. are
also being examined separately). I'd really like to hear any suggestions
about these (or other) configurations:
[1] Large GPFS servers
About 5 GPFS servers with significant RAM. Each GPFS server would
be connected to storage via an 8Gb/s fibre SAN (multiple paths)
to storage arrays.
Each GPFS server would provide NSDs via 10Gb/s and 1Gb/s (for legacy
servers) ethernet to GPFS clients (the compute nodes).
Questions:
Since the GPFS clients would not be SAN attached
with direct access to block storage, and many
clients (~50) will access similar data (and the
same directories) for many jobs, it seems like it
would make sense to do a lot of caching on the
GPFS servers. Multiple clients would benefit by
reading from the same cached data on the servers.
I'm thinking of sizing caches to handle 1~2GB
per core across the compute nodes, divided by the
number of GPFS servers. That works out to roughly
200GB+ of cache (pagepool, plus maxFilesToCache
and maxStatCache entries) on each GPFS server.
Is there any way to configure GPFS so that the
GPFS servers can do a large amount of caching
without requiring the same resources on the
GPFS clients?
Is there any way to configure the GPFS clients
so that their RAM can be used primarily for
computational jobs?
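One way I can see to get that kind of asymmetric caching (assuming a GPFS release with node-class support) is to set the tunables per node class with mmchconfig, so the NSD servers carry a large pagepool while the compute nodes keep theirs small. The class names, node names, and sizes below are placeholders, not a recommendation:

```shell
# Hypothetical node names; substitute your own server and client lists.
mmcrnodeclass nsdServers -N gpfs01,gpfs02,gpfs03,gpfs04,gpfs05
mmcrnodeclass computeNodes -N node001,node002,node003

# Large cache on the NSD servers only.
mmchconfig pagepool=192G,maxFilesToCache=1000000,maxStatCache=4000000 -N nsdServers

# Keep the clients lean so their RAM stays available for jobs.
mmchconfig pagepool=1G,maxFilesToCache=4000 -N computeNodes
```

As I understand it, settings applied with -N only affect the listed nodes/classes, so the clients never inherit the server-sized values.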
[2] Direct-attached GPFS clients
About 3~5 GPFS servers with modest resources (8 CPU-cores, ~60GB RAM).
Each GPFS server and client (HPC compute node) would be directly
connected to the SAN (8Gb/s fibre, iSCSI over 10Gb/s ethernet,
FCoE over 10Gb/s ethernet).
Either 10Gb/s or 1Gb/s ethernet for communication between GPFS nodes.
Since this is a relatively small cluster in terms of the total
node count, the increased cost in terms of HBAs, switches, and
cabling for direct-connecting all nodes to the storage shouldn't
be excessive.
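For the direct-attached design, my understanding is that the usual pattern is to define each NSD with a server list as a fallback: nodes that can see the LUNs over the SAN do block I/O directly, while any node without SAN access routes through the listed NSD servers. A sketch of a stanza file for mmcrnsd, with placeholder device, NSD, and server names:

```
%nsd: device=/dev/mapper/lun01
  nsd=nsd01
  servers=gpfs01,gpfs02
  usage=dataAndMetadata
  failureGroup=1
```

That way a compute node that loses its SAN path (or a legacy node that never had one) degrades to server-mediated access instead of losing the filesystem.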
Ideas? Suggestions? Things I'm overlooking?
Thanks,
Mark