[gpfsug-discuss] Preferred NSD

Skylar Thompson skylar2 at u.washington.edu
Wed Mar 14 15:42:37 GMT 2018


I agree. We have a small Gluster filesystem we use to perform failover of
our job scheduler, but it predates our use of GPFS. We've run into a number
of strange failures and "soft failures" (i.e. filesystem admin tools don't
work but the filesystem is available), and the logging is much more cryptic
and jumbled than mmfs.log. We'll soon be retiring it in favor of GPFS.
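
To put a finger on what "soft failure" means for us: the admin CLI errors
out or hangs while the FUSE mount itself still answers I/O. A rough sketch
of the check we effectively do by hand (Python; the volume name, mount
point, and probe file are placeholders):

    #!/usr/bin/env python3
    # Flag the "soft failure" case: gluster admin tooling unhappy while the
    # mounted filesystem still serves I/O. Names below are illustrative.
    import os
    import subprocess

    VOLUME = "scratch"           # hypothetical volume name
    MOUNTPOINT = "/mnt/scratch"  # hypothetical FUSE mount point

    def admin_ok():
        try:
            subprocess.run(["gluster", "volume", "status", VOLUME],
                           check=True, capture_output=True, timeout=30)
            return True
        except (FileNotFoundError, subprocess.CalledProcessError,
                subprocess.TimeoutExpired):
            return False

    def mount_ok():
        try:
            os.statvfs(MOUNTPOINT)               # cheap metadata probe
            probe = os.path.join(MOUNTPOINT, ".probe")
            with open(probe, "w") as f:          # small write round-trip
                f.write("ok\n")
            os.unlink(probe)
            return True
        except OSError:
            return False

    if mount_ok() and not admin_ok():
        print("soft failure: mount serves I/O but admin tools do not")

When that happens, the Gluster side of the story is scattered across
/var/log/glusterfs/, whereas mmfs.log (under /var/adm/ras/) has been far
easier to follow.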

On Wed, Mar 14, 2018 at 11:28:53AM -0400, Aaron Knister wrote:
> I don't want to start a religious filesystem war, but I'd think twice about
> GlusterFS based on a number of operational issues I've personally
> experienced and seen others experience with it.
> 
> I'm curious how GlusterFS would resolve the issue raised here of multiple
> clients failing simultaneously (unless you're talking about using disperse
> volumes)? That does, actually, bring up an interesting question for IBM,
> which is: when will mestor see the light of day? This is admittedly
> something other filesystems can do that GPFS cannot.
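
For anyone not familiar with the disperse side of that question: a disperse
volume is Gluster's erasure-coded layout, k data plus m redundancy bricks
per subvolume, so it rides out m simultaneous brick losses at (k+m)/k raw
overhead instead of the r-fold overhead of straight replication. A purely
illustrative back-of-the-envelope comparison (not a sizing recommendation):

    # Raw-capacity overhead vs. simultaneous failures tolerated:
    # r-way replication vs. an erasure-coded (disperse-style) k+m layout.
    def replica_overhead(r):
        return float(r)      # r full copies; survives r - 1 losses

    def disperse_overhead(k, m):
        return (k + m) / k   # k data + m redundancy; survives m losses

    for r in (2, 3, 5):
        print(f"replica {r}:   {replica_overhead(r):.2f}x raw, survives {r - 1}")
    for k, m in ((4, 2), (8, 3)):
        print(f"disperse {k}+{m}: {disperse_overhead(k, m):.2f}x raw, survives {m}")

So a 4+2 disperse set covers the same two simultaneous failures as replica 3
at half the raw cost, paid for with extra network and CPU on every write.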
> 
> -Aaron
> 
> On 3/14/18 6:57 AM, Michal Zacek wrote:
> > Hi,
> > 
> > I don't think GPFS is a good choice for your setup. Did you consider
> > GlusterFS? It's used at the Max Planck Institute in Dresden for HPC
> > processing of molecular biology data. They have a similar setup: tens
> > (hundreds) of computers with shared local storage in GlusterFS. But you
> > will need a 10Gb network.
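
To make the shape of that concrete: assembling such a volume out of the
nodes' local disks boils down to peering the nodes and handing their bricks
to one volume. A sketch, with hostnames, brick path, volume name, and
replica count all as placeholders:

    # Sketch only: build a replicated Gluster volume from local bricks.
    import subprocess

    nodes = [f"node{i:02d}" for i in range(1, 61)]          # e.g. 60 nodes
    bricks = [f"{n}:/data/brick1/scratch" for n in nodes]   # one brick each

    def run(*cmd):
        print("+", " ".join(cmd))
        subprocess.run(cmd, check=True)

    for n in nodes[1:]:                      # run from nodes[0]
        run("gluster", "peer", "probe", n)

    # replica 3: bricks are consumed in groups of three
    run("gluster", "volume", "create", "scratch", "replica", "3", *bricks)
    run("gluster", "volume", "start", "scratch")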
> > 
> > Michal
> > 
> > 
> > On 12.3.2018 at 16:23, Lukas Hejtmanek wrote:
> > > On Mon, Mar 12, 2018 at 11:18:40AM -0400, valdis.kletnieks at vt.edu wrote:
> > > > On Mon, 12 Mar 2018 15:51:05 +0100, Lukas Hejtmanek said:
> > > > > I don't think 5 or more data/metadata replicas are practical here. On the
> > > > > other hand, multiple node failures are something we really do expect.
> > > > Umm.. do I want to ask *why*, out of only 60 nodes, multiple node
> > > > failures are an expected event - to the point that you're thinking
> > > > about needing 5 replicas to keep things running?
> > > In my experience with cluster management, we have multiple nodes down on a
> > > regular basis (HW failures, SW maintenance, and so on).
> > > 
> > > I'm basically thinking that 2-3 replicas might not be enough, while 5 or
> > > more become too expensive, in both disk space and required bandwidth (this
> > > being scratch space, a high I/O load is expected).
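
Just to put rough numbers on that trade-off (illustrative only: it assumes
each block's replicas land on distinct, uniformly random nodes, treats
blocks as independent, and ignores failure groups and rebuild windows):
with N nodes, r replicas, and f nodes down at once, a given block has all
of its replicas on failed nodes with probability C(f,r)/C(N,r), but across
millions of blocks that small number adds up.

    # Odds of some data being unavailable with f of N nodes down at once,
    # r replicas per block, random distinct placement. Illustrative only.
    from math import comb

    N = 60                                       # nodes in the cluster
    BLOCKS = 10_000_000                          # assumed number of blocks
    for r in (2, 3, 5):
        for f in (2, 3, 5):
            if f < r:
                continue
            p_block = comb(f, r) / comb(N, r)    # one block fully offline
            p_any = 1 - (1 - p_block) ** BLOCKS  # at least one block offline
            print(f"r={r}, f={f} down: per-block {p_block:.2e}, "
                  f"any-of-{BLOCKS} {p_any:.3f}")

Real placement policies (failure groups, copyset-style constraints) pull
these numbers down, but it does show why 2-3 replicas feel thin once
simultaneous node outages are routine, and why erasure-coded layouts keep
coming up for scratch space.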
> > > 
> 
> -- 
> Aaron Knister
> NASA Center for Climate Simulation (Code 606.2)
> Goddard Space Flight Center
> (301) 286-2776
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss

-- 
-- Skylar Thompson (skylar2 at u.washington.edu)
-- Genome Sciences Department, System Administrator
-- Foege Building S046, (206)-685-7354
-- University of Washington School of Medicine


