[gpfsug-discuss] Filesystem access issues via CES NFS

Andreas Mattsson andreas.mattsson at maxiv.lu.se
Fri Jan 4 09:09:03 GMT 2019


Just reporting back that the issue we had seems to have been solved. In our case it was fixed by applying hotfix packages from IBM. We did this in December, and I can no longer trigger the issue. Hopefully it will stay fixed when the system is back under full production load in January.

Also, as far as I can see, Scale 5.0.2.2 already includes these fixes.


Regards,

Andreas Mattsson

____________________________________________


Andreas Mattsson

Systems Engineer



MAX IV Laboratory
Lund University
P.O. Box 118, SE-221 00 Lund, Sweden
Visiting address: Fotongatan 2, 224 84 Lund
Mobile: +46 706 64 95 44
andreas.mattsson at maxiv.lu.se
www.maxiv.se

________________________________
From: gpfsug-discuss-bounces at spectrumscale.org <gpfsug-discuss-bounces at spectrumscale.org> on behalf of Ulrich Sibiller <u.sibiller at science-computing.de>
Sent: 13 December 2018 14:52:42
To: gpfsug-discuss at spectrumscale.org
Subject: Re: [gpfsug-discuss] Filesystem access issues via CES NFS

On 23.11.2018 14:41, Andreas Mattsson wrote:
> Yes, this is repeating.
>
> We’ve ascertained that it has nothing to do at all with file operations on the GPFS side.
>
> Randomly throughout the filesystem mounted via NFS, ls or file access will give
>
>     ls: reading directory /gpfs/filessystem/test/testdir: Invalid argument
>
> Trying again later might work on that folder, but might fail somewhere else.
>
> We have tried exporting the same filesystem via a standard kernel NFS instead of the CES
> Ganesha-NFS, and then the problem doesn’t exist.
>
> So it is definitely related to the Ganesha NFS server, or its interaction with the file system.
>
> Will see if I can get a tcpdump of the issue.

We see this, too, but we cannot trigger it on demand. Fortunately, I have managed to capture some
logs with debugging enabled. I have now dug into the Ganesha 2.5.3 code, and I think the netgroup
caching is the culprit.
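
To illustrate what I mean by netgroup caching: the idea is roughly a TTL-based cache in front of
innetgr(3). This is only a simplified sketch with made-up names, sizes and TTL, not the actual
Ganesha implementation, but it shows how a wrong or stale "not a member" answer, once cached,
would make every export match against that netgroup fail until the entry expires:

/* Simplified sketch of a TTL-based netgroup cache in front of innetgr(3).
 * NOT the actual Ganesha code; names, sizes and TTL are made up. It only
 * illustrates how a stale or wrong cached "not a member" answer would keep
 * client_match from accepting the client until the entry expires. */
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

#define NG_CACHE_SIZE 64
#define NG_CACHE_TTL  300          /* seconds an answer stays cached */

struct ng_entry {
    char   netgroup[64];
    char   host[256];
    int    is_member;              /* cached innetgr() result */
    time_t expires;
};

static struct ng_entry ng_cache[NG_CACHE_SIZE];

int cached_innetgr(const char *netgroup, const char *host)
{
    time_t now = time(NULL);
    size_t i, slot = 0;

    for (i = 0; i < NG_CACHE_SIZE; i++) {
        struct ng_entry *e = &ng_cache[i];

        if (e->netgroup[0] == '\0' || e->expires <= now) {
            slot = i;              /* remember a free or expired slot */
            continue;
        }
        if (strcmp(e->netgroup, netgroup) == 0 && strcmp(e->host, host) == 0)
            return e->is_member;   /* cache hit: NSS is not asked at all */
    }

    /* Cache miss: ask NSS once and remember the answer. */
    int member = innetgr(netgroup, host, NULL, NULL);
    struct ng_entry *e = &ng_cache[slot];

    snprintf(e->netgroup, sizeof(e->netgroup), "%s", netgroup);
    snprintf(e->host, sizeof(e->host), "%s", host);
    e->is_member = member;
    e->expires   = now + NG_CACHE_TTL;
    return member;
}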

Here is some FULL_DEBUG output:
2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250]
export_check_access :EXPORT :M_DBG :Check for address 1.2.3.4 for export id 1 path /gpfsexport
2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] client_match
:EXPORT :M_DBG :Match V4: 0xcf7fe0 NETGROUP_CLIENT: netgroup1 (options=421021e2root_squash   , RWrw,
3--, ---, TCP, ----, Manage_Gids   , -- Deleg, anon_uid=    -2, anon_gid=    -2, sys)
2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] nfs_ip_name_get
:DISP :F_DBG :Cache get hit for 1.2.3.4->client1.domain
2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] client_match
:EXPORT :M_DBG :Match V4: 0xcfe320 NETGROUP_CLIENT: netgroup2 (options=421021e2root_squash   , RWrw,
3--, ---, TCP, ----, Manage_Gids   , -- Deleg, anon_uid=    -2, anon_gid=    -2, sys)
2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] nfs_ip_name_get
:DISP :F_DBG :Cache get hit for 1.2.3.4->client1.domain
2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] client_match
:EXPORT :M_DBG :Match V4: 0xcfe380 NETGROUP_CLIENT: netgroup3 (options=421021e2root_squash   , RWrw,
3--, ---, TCP, ----, Manage_Gids   , -- Deleg, anon_uid=    -2, anon_gid=    -2, sys)
2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] nfs_ip_name_get
:DISP :F_DBG :Cache get hit for 1.2.3.4->client1.domain
2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250]
export_check_access :EXPORT :M_DBG :EXPORT          (options=03303002              ,     ,    ,
      ,               , -- Deleg,                ,                )
2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250]
export_check_access :EXPORT :M_DBG :EXPORT_DEFAULTS (options=42102002root_squash   , ----, 3--, ---,
TCP, ----, Manage_Gids   ,         , anon_uid=    -2, anon_gid=    -2, sys)
2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250]
export_check_access :EXPORT :M_DBG :default options (options=03303002root_squash   , ----, 34-, UDP,
TCP, ----, No Manage_Gids, -- Deleg, anon_uid=    -2, anon_gid=    -2, none, sys)
2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250]
export_check_access :EXPORT :M_DBG :Final options   (options=42102002root_squash   , ----, 3--, ---,
TCP, ----, Manage_Gids   , -- Deleg, anon_uid=    -2, anon_gid=    -2, sys)
2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] nfs_rpc_execute
:DISP :INFO :DISP: INFO: Client ::ffff:1.2.3.4 is not allowed to access Export_Id 1 /gpfsexport,
vers=3, proc=18

The client "client1" is definitely a member of the "netgroup1". But the NETGROUP_CLIENT lookups for
"netgroup2" and "netgroup3" can only happen if the netgroup caching code reports that "client1" is
NOT a member of "netgroup1".
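
For what it's worth, the membership can be double-checked with a small test program that calls
innetgr(3) directly, outside of Ganesha and any caching layer (the netgroup and host names below
are the placeholders from the log above, not real values):

/* Direct innetgr(3) check, bypassing Ganesha and any caching layer.
 * "netgroup1" and "client1.domain" are the placeholder names from the
 * log above; substitute the real netgroup and client FQDN. */
#include <netdb.h>
#include <stdio.h>

int main(void)
{
    const char *netgroup = "netgroup1";
    const char *host     = "client1.domain";

    if (innetgr(netgroup, host, NULL, NULL))
        printf("%s IS a member of %s\n", host, netgroup);
    else
        printf("%s is NOT a member of %s\n", host, netgroup);

    return 0;
}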

I have also opened a support case at IBM for this.

@Malahal: It looks like you wrote the netgroup caching code; feel free to ask for further details
if required.

Kind regards,

Ulrich Sibiller

--
Dipl.-Inf. Ulrich Sibiller           science + computing ag
System Administration                    Hagellocher Weg 73
                                     72070 Tuebingen, Germany
                           https://atos.net/de/deutschland/sc
--
Science + Computing AG
Vorstandsvorsitzender/Chairman of the board of management:
Dr. Martin Matzke
Vorstand/Board of Management:
Matthias Schempp, Sabine Hohenstein
Vorsitzender des Aufsichtsrats/
Chairman of the Supervisory Board:
Philippe Miltin
Aufsichtsrat/Supervisory Board:
Martin Wibbe, Ursula Morgenstern
Sitz/Registered Office: Tuebingen
Registergericht/Registration Court: Stuttgart
Registernummer/Commercial Register No.: HRB 382196
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

