[gpfsug-discuss] mmfsd write behavior

Sven Oehme oehmes at gmail.com
Mon Oct 9 22:07:10 BST 2017


Hi,

Yeah, sorry, I intended to reply before my vacation and forgot about it;
the vacation flushed it all away :-D
Right now the assumption in Scale/GPFS is that the underlying storage
doesn't have any form of volatile write cache enabled. The problem is that
even if we set REQ_FUA, some stacks or devices may not have implemented it
at all, or not correctly, so even if we set it there is no guarantee that
it will do what you think it does. The benefit of adding the flag would at
least be that we could blame everything on the underlying stack/device,
but I am not sure that will make anybody happy if bad things happen, so a
non-volatile device underneath Scale will still be required at all times.
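Just to illustrate what adding the flag would mean at the block layer,
here is a minimal sketch following the writeback_cache_control.txt
document referenced further down the thread. This is not GPFS code; the
function name and the bio setup are made up for the example, and it
assumes a roughly 4.10+ kernel where the operation and flags live together
in bio->bi_opf:

#include <linux/bio.h>
#include <linux/blkdev.h>

/* Illustrative only -- not GPFS code. */
static int write_through_example(struct block_device *bdev,
                                 struct page *page, sector_t sector)
{
        struct bio *bio = bio_alloc(GFP_NOIO, 1);
        int ret;

        bio->bi_bdev = bdev;              /* target block device */
        bio->bi_iter.bi_sector = sector;  /* offset in 512-byte sectors */
        bio_add_page(bio, page, PAGE_SIZE, 0);

        /*
         * REQ_PREFLUSH: flush the device's volatile cache before this write.
         * REQ_FUA: complete the write only once the data is on non-volatile
         * media (the block layer emulates this with a post-flush if the
         * device lacks FUA). A broken stack/device can still ignore both,
         * which is exactly the concern described above.
         */
        bio->bi_opf = REQ_OP_WRITE | REQ_PREFLUSH | REQ_FUA;

        ret = submit_bio_wait(bio);       /* synchronous for simplicity */
        bio_put(bio);
        return ret;
}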
So if you think we should do this, please open a PMR with the details of
your test so it can go through the regular support path. You can mention
me in the PMR as a reference, as we have already looked at the places
where the request would have to be added.
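For completeness, the userspace side of what Aaron observed below:
O_DIRECT only bypasses the page cache and says nothing about the drive's
own write cache; an explicit fdatasync()/fsync() is what makes the kernel
send a flush down to the device. A minimal sketch (the device path, buffer
size, and function name are placeholders, not anything from GPFS):

#define _GNU_SOURCE         /* for O_DIRECT */
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int durable_direct_write(const char *dev)  /* e.g. a /dev/sdX placeholder */
{
    void *buf;
    int fd, rc = -1;

    /* O_DIRECT bypasses the page cache but not the device's write cache. */
    fd = open(dev, O_WRONLY | O_DIRECT);
    if (fd < 0)
        return -1;

    /* O_DIRECT needs aligned buffers; 4096 is safe for most devices. */
    if (posix_memalign(&buf, 4096, 4096) != 0) {
        close(fd);
        return -1;
    }
    memset(buf, 0xab, 4096);

    if (pwrite(fd, buf, 4096, 0) == 4096) {
        /* fdatasync() is what triggers the cache flush / FUA toward the
         * device; without it the data may still sit in volatile cache. */
        rc = fdatasync(fd);
    }

    free(buf);
    close(fd);
    return rc;
}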

Sven


On Mon, Oct 9, 2017 at 1:47 PM Aaron Knister <aaron.s.knister at nasa.gov>
wrote:

> Hi Sven,
>
> Just wondering if you've had any additional thoughts/conversations about
> this.
>
> -Aaron
>
> On 9/8/17 5:21 PM, Sven Oehme wrote:
> > Hi,
> >
> > the code assumption is that the underlying device has no volatile write
> > cache. I was absolutely sure we had that somewhere in the FAQ, but I
> > couldn't find it, so I will talk to somebody to correct this.
> > If I understand
> > https://www.kernel.org/doc/Documentation/block/writeback_cache_control.txt
> > correctly, one could enforce this by setting REQ_FUA, but that's not
> > explicitly set today, at least I can't see it. I will discuss this with
> > one of our devs who owns this code and come back.
> >
> > sven
> >
> >
> > On Thu, Sep 7, 2017 at 8:05 PM Aaron Knister <aaron.s.knister at nasa.gov>
> > wrote:
> >
> >     Thanks Sven. I didn't think GPFS itself was caching anything on that
> >     layer, but it's my understanding that O_DIRECT isn't sufficient to
> >     force I/O to be flushed (e.g. the device itself might have a volatile
> >     caching layer). Take someone using ZFS zvols as NSDs. I can write()
> >     all day long to that zvol (even with O_DIRECT) but there is
> >     absolutely no guarantee those writes have been committed to stable
> >     storage and aren't just sitting in RAM until an fsync() occurs (or
> >     some other bio function that causes a flush). I also don't believe
> >     writing to a SATA drive with O_DIRECT will force cache flushes of the
> >     drive's writeback cache... although I just tested that one and it
> >     seems to actually trigger a SCSI cache sync. Interesting.
> >
> >     -Aaron
> >
> >     On 9/7/17 10:55 PM, Sven Oehme wrote:
> >      > I am not sure what exactly you are looking for, but all block
> >      > devices are opened with O_DIRECT; we never cache anything on
> >      > this layer.
> >      >
> >      >
> >      > On Thu, Sep 7, 2017, 7:11 PM Aaron Knister
> >      > <aaron.s.knister at nasa.gov> wrote:
> >      >
> >      >     Hi Everyone,
> >      >
> >      >     This is something that's come up in the past and has recently
> >      >     resurfaced with a project I've been working on, and that is:
> >      >     it seems to me as though mmfsd never attempts to flush the
> >      >     cache of the block devices it's writing to (looking at
> >      >     blktrace output seems to confirm this). Is this actually the
> >      >     case? I've looked at the GPL headers for Linux and I don't
> >      >     see any sign of blkdev_fsync, blkdev_issue_flush, WRITE_FLUSH,
> >      >     or REQ_FLUSH. I'm sure there are other ways to trigger this
> >      >     behavior that GPFS may very well be using that I've missed.
> >      >     That's why I'm asking :)
> >      >
> >      >     I figure with FPO being pushed as an HDFS replacement using
> >      >     commodity drives, this feature has *got* to be in the code
> >      >     somewhere.
> >      >
> >      >     -Aaron
> >      >
> >      >     --
> >      >     Aaron Knister
> >      >     NASA Center for Climate Simulation (Code 606.2)
> >      >     Goddard Space Flight Center
> >      >     (301) 286-2776
> >      >     _______________________________________________
> >      >     gpfsug-discuss mailing list
> >      >     gpfsug-discuss at spectrumscale.org
> >      >     http://gpfsug.org/mailman/listinfo/gpfsug-discuss
> >      >
> >      >
> >      >
> >      > _______________________________________________
> >      > gpfsug-discuss mailing list
> >      > gpfsug-discuss at spectrumscale.org
> >      > http://gpfsug.org/mailman/listinfo/gpfsug-discuss
> >      >
> >
> >     --
> >     Aaron Knister
> >     NASA Center for Climate Simulation (Code 606.2)
> >     Goddard Space Flight Center
> >     (301) 286-2776
> >     _______________________________________________
> >     gpfsug-discuss mailing list
> >     gpfsug-discuss at spectrumscale.org
> >     http://gpfsug.org/mailman/listinfo/gpfsug-discuss
> >
> >
> >
> > _______________________________________________
> > gpfsug-discuss mailing list
> > gpfsug-discuss at spectrumscale.org
> > http://gpfsug.org/mailman/listinfo/gpfsug-discuss
> >
>
> --
> Aaron Knister
> NASA Center for Climate Simulation (Code 606.2)
> Goddard Space Flight Center
> (301) 286-2776
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>

