[gpfsug-discuss] Odd behavior with cat followed by grep.

John Hanks griznog at gmail.com
Wed Feb 14 19:17:19 GMT 2018


Thanks Bryan, mystery solved :)

We also stumbled across these related items, in case anyone else wanders
into this thread.

http://bug-grep.gnu.narkive.com/Y8cfvWDt/bug-27666-grep-on-gpfs-filesystem-seek-hole-problem

https://www.ibm.com/developerworks/community/forums/html/topic?id=c2a94433-9ec0-4a4b-abfe-d0a1e721d630
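
For anyone who wants to check a mount for this directly, here is a quick
probe I sketched up (my own toy, not anything from the APAR or the bug
thread): it just compares where the filesystem says the first hole is
against st_size, which, as far as I can tell from those links, is the check
grep trips over. On an affected GPFS mount, run against a freshly written
file, I'd expect it to report a hole before EOF (262144 vs 530721 in the
traces quoted below) even though the file has no real holes.

#define _GNU_SOURCE             /* for SEEK_HOLE on glibc */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/stat.h>

/* Report where the filesystem says the first hole is, versus st_size.
 * lseek(SEEK_HOLE) should return st_size for a file with no holes. */
int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s FILE\n", argv[0]);
        return 2;
    }
    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) { perror("open"); return 2; }

    struct stat st;
    if (fstat(fd, &st) != 0) { perror("fstat"); return 2; }

    off_t hole = lseek(fd, 0, SEEK_HOLE);
    if (hole < 0) { perror("lseek(SEEK_HOLE)"); return 2; }

    printf("%s: st_size=%lld first_hole=%lld%s\n", argv[1],
           (long long)st.st_size, (long long)hole,
           hole < st.st_size ? "  <-- hole reported before EOF" : "");
    close(fd);
    return hole < st.st_size ? 1 : 0;
}

Compile with 'cc -o seekhole seekhole.c' and run it against the temp file
on both the NFS and GPFS copies right after the cat.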

GPFS, the gift that keeps on giving ... me more things to do instead of
doing the things I want to be doing.

Thanks all,

jbh

On Wed, Feb 14, 2018 at 10:48 AM, Bryan Banister <bbanister at jumptrading.com>
wrote:

> Hi all,
>
>
>
> We found this a while back and IBM fixed it.  Here’s your answer:
> http://www-01.ibm.com/support/docview.wss?uid=isg1IV87385
>
>
>
> Cheers,
>
> -Bryan
>
>
>
> From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-
> bounces at spectrumscale.org] On Behalf Of John Hanks
> Sent: Wednesday, February 14, 2018 12:31 PM
> To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
> Subject: Re: [gpfsug-discuss] Odd behavior with cat followed by grep.
>
>
>
>
> Straces are interesting, but don't immediately open my eyes:
>
>
>
> strace of grep on NFS (works as expected)
>
>
>
> openat(AT_FDCWD, "/home/griznog/pipetest.tmp.txt", O_RDONLY) = 3
>
> fstat(3, {st_mode=S_IFREG|0644, st_size=530721, ...}) = 0
>
> ioctl(3, TCGETS, 0x7ffe2c26b0b0)        = -1 ENOTTY (Inappropriate ioctl
> for device)
>
> read(3, "chr1\t43452652\t43452652\tL1HS\tlib1"..., 32768) = 32768
>
> lseek(3, 32768, SEEK_HOLE)              = 530721
>
> lseek(3, 32768, SEEK_SET)               = 32768
>
> fstat(1, {st_mode=S_IFREG|0644, st_size=5977, ...}) = 0
>
> mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) =
> 0x7f3bf6c43000
>
> write(1, "chr1\t43452652\t43452652\tL1HS\tlib1"..., 8192chr1
>
>
>
> strace of grep on GPFS (thinks the file is binary)
>
>
>
> openat(AT_FDCWD, "/srv/gsfs0/projects/pipetest.tmp.txt", O_RDONLY) = 3
>
> fstat(3, {st_mode=S_IFREG|0644, st_size=530721, ...}) = 0
>
> ioctl(3, TCGETS, 0x7ffc9b52caa0)        = -1 ENOTTY (Inappropriate ioctl
> for device)
>
> read(3, "chr1\t43452652\t43452652\tL1HS\tlib1"..., 32768) = 32768
>
> lseek(3, 32768, SEEK_HOLE)              = 262144
>
> lseek(3, 32768, SEEK_SET)               = 32768
>
> fstat(1, {st_mode=S_IFREG|0644, st_size=6011, ...}) = 0
>
> mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) =
> 0x7fd45ee88000
>
> close(3)                                = 0
>
> write(1, "Binary file /srv/gsfs0/projects/"..., 72Binary file
> /srv/gsfs0/projects/levinson/xwzhu/pipetest.tmp.txt matches
>
> ) = 72
>
>
>
> Do the lseek() results indicate that the grep on the GPFS-mounted version
> thinks the file is a sparse file? (A sketch of the hole-skipping heuristic
> that would explain this follows the md5sum trace below.) For comparison I
> strace'd md5sum in place of the grep; it does not lseek() with SEEK_HOLE,
> and its access pattern in both cases looks identical:
>
>
>
> open("/home/griznog/pipetest.tmp.txt", O_RDONLY) = 3
>
> fadvise64(3, 0, 0, POSIX_FADV_SEQUENTIAL) = 0
>
> fstat(3, {st_mode=S_IFREG|0644, st_size=530721, ...}) = 0
>
> mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) =
> 0x7fb7d2c2b000
>
> read(3, "chr1\t43452652\t43452652\tL1HS\tlib1"..., 32768) = 32768
>
> ...[reads clipped]...
>
> read(3, "", 24576)                      = 0
>
> lseek(3, 0, SEEK_CUR)                   = 530721
>
> close(3)                                = 0
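>
> To make that concrete, here is a rough sketch (my guess at the shape of
> the heuristic, not GNU grep's actual code) of what the two lseek() calls
> in the trace would be doing:
>
> #define _GNU_SOURCE                       /* for SEEK_HOLE */
> #include <unistd.h>
> #include <sys/types.h>
>
> /* Hypothetical hole check, mirroring the two lseek() calls in the trace:
>  * ask where the next hole starts, put the offset back, and if the hole
>  * begins before EOF assume the unread tail is NUL bytes (i.e. "binary").
>  * This is my reconstruction, not GNU grep's actual source. */
> static int hole_before_eof(int fd, off_t bytes_read, off_t st_size)
> {
>     off_t hole = lseek(fd, bytes_read, SEEK_HOLE);
>     lseek(fd, bytes_read, SEEK_SET);      /* restore the read position */
>     return hole >= 0 && hole < st_size;
> }
>
> Plugging in the numbers above: on NFS the hole is reported at 530721 ==
> st_size, so this returns 0; on GPFS it is reported at 262144, so this
> returns 1, which would produce "Binary file ... matches" with no actual
> difference in the data.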
>
>
>
>
>
> jbh
>
>
>
>
>
> On Wed, Feb 14, 2018 at 9:51 AM, Aaron Knister <aaron.s.knister at nasa.gov>
> wrote:
>
> Just speculating here (also known as making things up) but I wonder if
> grep is somehow using the file's size in its determination of binary
> status. I also see mmap in the strace so maybe there's some issue with
> mmap where some internal GPFS buffer is getting truncated
> inappropriately but leaving a bunch of null values which gets returned
> to grep.
>
> -Aaron
>
> On 2/14/18 10:21 AM, John Hanks wrote:
> > Hi Valdis,
> >
> > I tried it with the grep replaced by 'ls -ls' and 'md5sum'; thankfully,
> > I don't think this is a data integrity issue:
> >
> > $ ./pipetestls.sh
> > 256 -rw-r--r-- 1 39073 3001 530721 Feb 14 07:16
> > /srv/gsfs0/projects/pipetest.tmp.txt
> > 0 -rw-r--r-- 1 39073 3953 530721 Feb 14 07:16
> /home/griznog/pipetest.tmp.txt
> >
> > $ ./pipetestmd5.sh
> > 15cb81a85c9e450bdac8230309453a0a  /srv/gsfs0/projects/pipetest.tmp.txt
> > 15cb81a85c9e450bdac8230309453a0a  /home/griznog/pipetest.tmp.txt
> >
> > And with grep replaced by 'file', it even properly identifies the files as ASCII:
> > $ ./pipetestfile.sh
> > /srv/gsfs0/projects/pipetest.tmp.txt: ASCII text, with very long lines
> > /home/griznog/pipetest.tmp.txt: ASCII text, with very long lines
> >
> > I'll poke a little harder at grep next and see what the difference in
> > strace of each reveals.
> >
> > Thanks,
> >
> > jbh
> >
> >
> >
> >
> > On Wed, Feb 14, 2018 at 7:08 AM, <valdis.kletnieks at vt.edu> wrote:
> >
> >     On Wed, 14 Feb 2018 06:20:32 -0800, John Hanks said:
> >
> >     > #  ls -aln /srv/gsfs0/projects/pipetest.tmp.txt $HOME/pipetest.tmp.txt
> >     > -rw-r--r-- 1 39073 3953 530721 Feb 14 06:10 /home/griznog/pipetest.tmp.txt
> >     > -rw-r--r-- 1 39073 3001 530721 Feb 14 06:10 /srv/gsfs0/projects/pipetest.tmp.txt
> >     >
> >     > We can "fix" the user case that exposed this by not using a temp
> >     > file or inserting a sleep, but I'd still like to know why GPFS is
> >     > behaving this way and make it stop.
> >
> >     May be related to replication, or other behind-the-scenes behavior.
> >
> >     Consider this example - 4.2.3.6, data and metadata replication both
> >     set to 2, two sites 95 cable miles apart, each with 3 Dell servers in
> >     a full Fibre Channel mesh to 3 Dell MD34something arrays.
> >
> >     % dd if=/dev/zero bs=1k count=4096 of=sync.test; ls -ls sync.test;
> >     sleep 5; ls -ls sync.test; sleep 5; ls -ls sync.test
> >     4096+0 records in
> >     4096+0 records out
> >     4194304 bytes (4.2 MB) copied, 0.0342852 s, 122 MB/s
> >     2048 -rw-r--r-- 1 root root 4194304 Feb 14 09:35 sync.test
> >     8192 -rw-r--r-- 1 root root 4194304 Feb 14 09:35 sync.test
> >     8192 -rw-r--r-- 1 root root 4194304 Feb 14 09:35 sync.test
> >
> >     Notice that the first /bin/ls shouldn't be starting until after the
> >     dd has completed - at which point it's only allocated half the blocks
> >     needed to hold the 4M of data at one site.  5 seconds later, it's
> >     allocated the blocks at both sites and thus shows the full 8M needed
> >     for 2 copies.
> >
> >     I've also seen (but haven't reproduced it as I write this) a small
> >     file (4-8K or so) showing first one full-sized block, then a second
> >     full-sized block, and then dropping back to what's needed for two
> >     1/32nd fragments.  That had me scratching my head.
> >
> >     Having said that, that's all metadata fun and games, while your case
> >     appears to have some problems with data integrity (which is a whole
> >     lot scarier).  It would be *really* nice if we understood the problem
> >     here.
> >
> >     The scariest part is:
> >
> >     > The first grep | wc -l returns 1, because grep outputs "Binary
> >     > file /path/to/gpfs/mount/test matches"
> >
> >     which seems to imply that we're failing on semantic consistency.
> >     Basically, your 'cat' command is completing and closing the file, but
> >     then a temporally later open of the same file is reading something
> >     other than the just-written data.  My first guess is that it's a race
> >     condition similar to the following: the cat command causes a write on
> >     one NSD server, and the first grep results in a read from a
> >     *different* NSD server, returning the data that *used* to be in the
> >     block because the read actually happens before the first NSD server
> >     completes the write.
> >
> >     It may be interesting to replace the greps with pairs of 'ls -ls' /
> >     'dd' commands to grab the raw data and its size, and check the
> >     following (a rough sketch of such a check appears after the list):
> >
> >     1) does the size (both blocks allocated and logical length) reported
> by
> >     ls match the amount of data actually read by the dd?
> >
> >     2) Is the file length as actually read equal to the written length,
> >     or does it
> >     overshoot and read all the way to the next block boundary?
> >
> >     3) If the length is correct, what's wrong with the data that's
> >     telling grep that
> >     it's a binary file?  ( od -cx is your friend here).
> >
> >     4) If it overshoots, is the remainder all-zeros (good) or does it
> >     return semi-random
> >     "what used to be there" data (bad, due to data exposure issues)?
> >
> >     (It's certainly not the most perplexing data consistency issue I've
> >     hit in 4 decades - the winner *has* to be an intermittent data read
> >     corruption on a GPFS 3.5 cluster that had us, IBM, SGI, DDN, and at
> >     least one vendor of networking gear all chasing our tails for 18
> >     months before we finally tracked it down. :)
> >
> >
> >
> >
> >
> >
>
> --
> Aaron Knister
> NASA Center for Climate Simulation (Code 606.2)
> Goddard Space Flight Center
> (301) 286-2776
>
>

