[gpfsug-discuss] HAWC/LROC in Ganesha server

Sanchez, Paul Paul.Sanchez at deshaw.com
Tue Mar 22 12:44:57 GMT 2016


It's worth sharing that we have seen two problems with CES providing NFS via ganesha in a similar deployment:



1.  Multicluster cache invalidation: the FSAL upcall that GPFS uses to invalidate ganesha's file descriptor cache doesn't appear to work for remote GPFS filesystems.  As mentioned by Simon, this configuration is unsupported, though the problem can be worked around, with some effort, by disabling ganesha's FD cache entirely (a config sketch for this follows just after this list).


2.  Readdir bad cookie bug: an interaction between certain Linux NFS clients and ganesha (we're still providing details to IBM) in which readdir calls may sporadically return empty results for directories that do contain files, without any corresponding error code (a crude client-side check is sketched a little further down).
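
For anyone who wants to try the workaround from (1): on our side the knob in question sits in ganesha's cache inode configuration. Something along these lines should be close, but parameter names move around between ganesha releases and CES manages the ganesha config centrally, so treat this as a sketch to verify against your installed version rather than our exact settings:

  # ganesha.conf fragment -- illustrative only; check the parameter name
  # against the ganesha release shipped with your CES level
  CACHEINODE {
      # Don't let ganesha hold its own cache of open file descriptors;
      # with the FD cache off there is no stale per-FD state left to
      # miss an invalidation coming from a remote GPFS filesystem.
      Cache_FDs = FALSE;
  }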



Given our multicluster requirements and the problems associated with the readdir bug, we've reverted to using CNFS for now.
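
If anyone wants to check whether they are seeing the same readdir behaviour, the symptom is easy to probe from a client: list a directory you know is non-empty in a loop and look for passes that come back empty without an error. A minimal sketch (the directory path is a placeholder for a non-empty directory on your own NFS mount):

  # probe for the sporadic-empty-readdir symptom
  DIR=/nfs/testmount/nonempty-dir   # placeholder path

  for i in $(seq 1 1000); do
      out=$(ls -A "$DIR")
      rc=$?
      # flag passes where ls reports success but the listing is empty
      if [ "$rc" -eq 0 ] && [ -z "$out" ]; then
          echo "pass $i: empty listing from $DIR with exit status 0"
      fi
  done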



Thx
Paul



-----Original Message-----
From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon Thompson (Research Computing - IT Services)
Sent: Tuesday, March 22, 2016 6:05 AM
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] HAWC/LROC in Ganesha server





Hi Martin,



We have LROC enabled on our CES protocol nodes for SMB:



# mmdiag --lroc





=== mmdiag: lroc ===
LROC Device(s): '0A0A001755E9634D#/dev/sdb;0A0A001755E96350#/dev/sdc;'
status Running
Cache inodes 1 dirs 1 data 1  Config: maxFile 0 stubFile 0
Max capacity: 486370 MB, currently in use: 1323 MB
Statistics from: Thu Feb 25 11:18:25 2016

Total objects stored 338690236 (2953113 MB) recalled 336905443 (1326912 MB)
      objects failed to store 0 failed to recall 94 failed to inval 0
      objects queried 0 (0 MB) not found 0 = 0.00 %
      objects invalidated 338719563 (3114191 MB)

      Inode objects stored 336876572 (1315923 MB) recalled 336884262 (1315948 MB) = 100.00 %
      Inode objects queried 0 (0 MB) = 0.00 % invalidated 336910469 (1316052 MB)
      Inode objects failed to store 0 failed to recall 0 failed to query 0 failed to inval 0

      Directory objects stored 2896 (115 MB) recalled 564 (29 MB) = 19.48 %
      Directory objects queried 0 (0 MB) = 0.00 % invalidated 2857 (725 MB)
      Directory objects failed to store 0 failed to recall 2 failed to query 0 failed to inval 0

      Data objects stored 1797127 (1636968 MB) recalled 16057 (10907 MB) = 0.89 %
      Data objects queried 0 (0 MB) = 0.00 % invalidated 1805234 (1797405 MB)
      Data objects failed to store 0 failed to recall 92 failed to query 0 failed to inval 0

  agent inserts=389305528, reads=337261110
        response times (usec):
        insert min/max/avg=1/47705/11
        read   min/max/avg=1/3145728/54

  ssd   writeIOs=5906506, writePages=756033024
        readIOs=44692016, readPages=44692610
        response times (usec):
        write  min/max/avg=3072/1117534/3253
        read   min/max/avg=56/3145728/364







So mostly it is inode objects being served from the cache. Whether that is small data-in-inode or plain inode (stat) type operations, we can't say.
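
For anyone wanting to reproduce this setup: LROC on a node amounts to presenting the local SSDs as localCache NSDs and telling GPFS which classes of object it may cache. A rough sketch follows -- the device, NSD and node names are placeholders rather than our real config, and the options are worth double-checking against the documentation for your release:

  # lroc.stanza -- illustrative stanza only; names are made up
  %nsd:
    device=/dev/sdb
    nsd=ces01_lroc1
    servers=ces01
    usage=localCache

  # create the local cache NSD, then choose what LROC may cache
  # (inodes, directory blocks and file data, as in the output above)
  mmcrnsd -F lroc.stanza
  mmchconfig lrocInodes=yes,lrocDirs=yes,lrocData=yes -N ces01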



We don't use HAWC on our protocol nodes: the HAWC pool needs to exist in the cluster where the NSD data is written, and we multi-cluster to the protocol nodes (technically this isn't supported, but it works fine for us).





On HAWC, we did test it out in another of our clusters using SSDs in the nodes, but we had a few issues when we shut down a rack of kit which included all of the HAWC devices, as they sat in client nodes. You probably want to think a bit carefully about how HAWC is implemented in your environment.

We are about to implement it in one of our clusters, but that will be with HAWC devices attached to the NSD servers rather than to client nodes.
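
For completeness, and hedged the same way since we haven't rolled this out yet: HAWC itself is enabled per filesystem via the write cache threshold, and the placement question above is really about which nodes host the fast devices backing the recovery logs. Roughly (the filesystem name is a placeholder, and the supported threshold values should be checked in the mmchfs documentation for your release):

  # switch on HAWC for a filesystem by setting a non-zero write cache
  # threshold; small synchronous writes up to this size are hardened
  # in the recovery log first
  mmchfs gpfs01 --write-cache-threshold 64K

  # confirm by listing the filesystem attributes
  mmlsfs gpfs01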



Simon





On 22/03/2016, 09:45, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Martin Gasthuber" <martin.gasthuber at desy.de> wrote:



>Hi,

>

>  we're looking for a powerful (and cost efficient) machine config to

>optimally support the new CES services, especially Ganesha. In more

>detail, we're wondering if somebody has already got some experience

>running these services on machines with HAWC and/or LROC enabled HW,

>resulting in a clearer understanding of the benefits of that config. We

>will have ~300 client boxes accessing GPFS via NFS and planning for 2

>nodes initially.

>

>best regards,

>  Martin

>

>_______________________________________________

>gpfsug-discuss mailing list

>gpfsug-discuss at spectrumscale.org

>http://gpfsug.org/mailman/listinfo/gpfsug-discuss



_______________________________________________

gpfsug-discuss mailing list

gpfsug-discuss at spectrumscale.org

http://gpfsug.org/mailman/listinfo/gpfsug-discuss

