[gpfsug-discuss] Filesystem access issues via CES NFS

Jeno Cram jcram at ddn.com
Fri Dec 14 14:45:52 GMT 2018


Are you using extended attributes on the directories in question?

Jeno Cram | Systems Engineer
Mobile: 517-980-0495
jcram at ddn.com
DDN.com
 

On 12/13/18, 9:02 AM, "Ulrich Sibiller" <u.sibiller at science-computing.de> wrote:

    On 23.11.2018 14:41, Andreas Mattsson wrote:
    > Yes, this is repeating.
    > 
    > We’ve ascertained that it has nothing to do at all with file operations on the GPFS side.
    > 
    > Randomly throughout the filesystem mounted via NFS, ls or file access will give:
    > 
    >     ls: reading directory /gpfs/filessystem/test/testdir: Invalid argument
    > 
    > Trying again later might work on that folder, but might fail somewhere else.
    > 
    > We have tried exporting the same filesystem via the standard kernel NFS server instead of the CES 
    > Ganesha-NFS server, and then the problem does not occur.
    > 
    > So it is definitely related to the Ganesha NFS server, or its interaction with the file system.
    > I will see if I can get a tcpdump of the issue.
    
    We see this, too, but we cannot trigger it deliberately. Fortunately, I have managed to capture some 
    logs with debugging enabled. I have now dug into the Ganesha 2.5.3 code, and I think the netgroup 
    caching is the culprit.
    
    Here is some FULL_DEBUG output:
    2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] 
    export_check_access :EXPORT :M_DBG :Check for address 1.2.3.4 for export id 1 path /gpfsexport
    2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] client_match 
    :EXPORT :M_DBG :Match V4: 0xcf7fe0 NETGROUP_CLIENT: netgroup1 (options=421021e2root_squash   , RWrw, 
    3--, ---, TCP, ----, Manage_Gids   , -- Deleg, anon_uid=    -2, anon_gid=    -2, sys)
    2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] nfs_ip_name_get 
    :DISP :F_DBG :Cache get hit for 1.2.3.4->client1.domain
    2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] client_match 
    :EXPORT :M_DBG :Match V4: 0xcfe320 NETGROUP_CLIENT: netgroup2 (options=421021e2root_squash   , RWrw, 
    3--, ---, TCP, ----, Manage_Gids   , -- Deleg, anon_uid=    -2, anon_gid=    -2, sys)
    2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] nfs_ip_name_get 
    :DISP :F_DBG :Cache get hit for 1.2.3.4->client1.domain
    2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] client_match 
    :EXPORT :M_DBG :Match V4: 0xcfe380 NETGROUP_CLIENT: netgroup3 (options=421021e2root_squash   , RWrw, 
    3--, ---, TCP, ----, Manage_Gids   , -- Deleg, anon_uid=    -2, anon_gid=    -2, sys)
    2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] nfs_ip_name_get 
    :DISP :F_DBG :Cache get hit for 1.2.3.4->client1.domain
    2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] 
    export_check_access :EXPORT :M_DBG :EXPORT          (options=03303002              ,     ,    , 
          ,               , -- Deleg,                ,                )
    2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] 
    export_check_access :EXPORT :M_DBG :EXPORT_DEFAULTS (options=42102002root_squash   , ----, 3--, ---, 
    TCP, ----, Manage_Gids   ,         , anon_uid=    -2, anon_gid=    -2, sys)
    2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] 
    export_check_access :EXPORT :M_DBG :default options (options=03303002root_squash   , ----, 34-, UDP, 
    TCP, ----, No Manage_Gids, -- Deleg, anon_uid=    -2, anon_gid=    -2, none, sys)
    2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] 
    export_check_access :EXPORT :M_DBG :Final options   (options=42102002root_squash   , ----, 3--, ---, 
    TCP, ----, Manage_Gids   , -- Deleg, anon_uid=    -2, anon_gid=    -2, sys)
    2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] nfs_rpc_execute 
    :DISP :INFO :DISP: INFO: Client ::ffff:1.2.3.4 is not allowed to access Export_Id 1 /gpfsexport, 
    vers=3, proc=18
    
    The client "client1" is definitely a member of "netgroup1". However, the NETGROUP_CLIENT lookups for 
    "netgroup2" and "netgroup3" can only happen if the netgroup caching code reports that "client1" is 
    NOT a member of "netgroup1", because the export's client entries are checked in order and the first 
    match wins.
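To make the reasoning above concrete, here is a minimal Python sketch of first-match client matching. It is not Ganesha's code: `ClientEntry`, `match_client`, `correct_lookup` and `buggy_cached_lookup` are hypothetical names, and the faulty lookup simply assumes a cache that wrongly answers "not a member" for netgroup1. With a correct lookup, matching stops at netgroup1; with the faulty one, the matcher falls through to netgroup2 and netgroup3 and finally denies access, which is the sequence visible in the log.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


# Hypothetical model of one client entry in an export's client list.
@dataclass
class ClientEntry:
    netgroup: str
    options: str


def match_client(host: str,
                 clients: List[ClientEntry],
                 in_netgroup: Callable[[str, str], bool],
                 trace: List[str]) -> Optional[ClientEntry]:
    """Return the first entry whose netgroup contains host, else None.

    Each examined entry is appended to trace, loosely mirroring the
    'Match V4 ... NETGROUP_CLIENT' debug lines in the log.
    """
    for entry in clients:
        trace.append(entry.netgroup)
        if in_netgroup(host, entry.netgroup):
            return entry  # first match wins; later entries are not checked
    return None  # no entry matched: access is denied


# Hypothetical ground truth: client1 belongs to netgroup1 only.
MEMBERS = {
    "netgroup1": {"client1.domain"},
    "netgroup2": set(),
    "netgroup3": set(),
}


def correct_lookup(host: str, group: str) -> bool:
    return host in MEMBERS[group]


def buggy_cached_lookup(host: str, group: str) -> bool:
    # Assumed fault: a stale cache entry says client1 is NOT in netgroup1.
    return False if group == "netgroup1" else correct_lookup(host, group)


exports = [ClientEntry("netgroup1", "RW"),
           ClientEntry("netgroup2", "RW"),
           ClientEntry("netgroup3", "RW")]

good_trace: List[str] = []
good = match_client("client1.domain", exports, correct_lookup, good_trace)

bad_trace: List[str] = []
bad = match_client("client1.domain", exports, buggy_cached_lookup, bad_trace)

print(good_trace, good is not None)  # prints: ['netgroup1'] True
print(bad_trace, bad is not None)    # prints: ['netgroup1', 'netgroup2', 'netgroup3'] False
```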
    
    I have also opened a support case with IBM for this.
    
    @Malahal: It looks like you wrote the netgroup caching code; feel free to ask for further 
    details if required.
    
    Kind regards,
    
    Ulrich Sibiller
    
    -- 
    Dipl.-Inf. Ulrich Sibiller           science + computing ag
    System Administration                    Hagellocher Weg 73
                                         72070 Tuebingen, Germany
                               https://atos.net/de/deutschland/sc
    -- 
    Science + Computing AG
    Vorstandsvorsitzender/Chairman of the board of management:
    Dr. Martin Matzke
    Vorstand/Board of Management:
    Matthias Schempp, Sabine Hohenstein
    Vorsitzender des Aufsichtsrats/
    Chairman of the Supervisory Board:
    Philippe Miltin
    Aufsichtsrat/Supervisory Board:
    Martin Wibbe, Ursula Morgenstern
    Sitz/Registered Office: Tuebingen
    Registergericht/Registration Court: Stuttgart
    Registernummer/Commercial Register No.: HRB 382196
    _______________________________________________
    gpfsug-discuss mailing list
    gpfsug-discuss at spectrumscale.org
    http://gpfsug.org/mailman/listinfo/gpfsug-discuss
    