[gpfsug-discuss] data interface and management interface.
Salvatore Di Nardo
sdinardo at ebi.ac.uk
Wed Jul 22 14:51:04 BST 2015
Hello,
No, nothing yet, because we first have to drain 2PB of data onto slower
storage, which will take a few weeks. I expect to do it in the second
half of August.
Will let you all know the results once done and properly tested.
Salvatore
On 22/07/15 13:58, Muhammad Habib wrote:
> Did you implement it? It looks OK. All daemon traffic should go
> through the black network, including inter-cluster daemon traffic
> (assuming the black subnet is routable). All data traffic should go
> through the blue network. You may need to run iptrace or tcpdump to
> make sure the proper networks are in use. You can always open a PMR
> if you have issues during the configuration.
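One way to do the check suggested above: capture on each interface separately, filtered on the GPFS daemon port (TCP 1191 by default). The device names below are placeholders for the actual black and blue links, so this is a sketch rather than a ready-made recipe:

```shell
# Placeholders: eth_black = daemon/admin (black) interface,
# eth_blue = data (blue) interface; substitute the real device names.
# GPFS daemon traffic normally uses TCP port 1191.
tcpdump -ni eth_black 'tcp port 1191' -c 50   # lease/token/daemon chatter expected here
tcpdump -ni eth_blue  'tcp port 1191' -c 50   # NSD data traffic expected here
```

If the split is working, the black capture should show small, steady control packets while bulk transfers appear only on the blue interface.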
>
> Thanks
>
> On Wed, Jul 15, 2015 at 5:19 AM, Salvatore Di Nardo
> <sdinardo at ebi.ac.uk <mailto:sdinardo at ebi.ac.uk>> wrote:
>
> Thanks for the input... this is actually very interesting!
>
> Reading here:
> https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General+Parallel+File+System+%28GPFS%29/page/GPFS+Network+Communication+Overview
> specifically the "Using more than one network" part, it seems to me
> that this way we should be able to split the lease/token/ping traffic
> from the data.
>
> Supposing that I implement a GSS cluster with only NSD servers and a
> second cluster with only clients:
>
> [network diagram image attached]
>
> As far as I understood, if on the NSD cluster we add first the subnet
> 10.20.0.0/16 and then 10.30.0.0, it should use the internal network
> for all the node-to-node communication, leaving 10.30.0.0/16 only for
> data traffic with the remote cluster (the clients). Similarly, in the
> client cluster, adding first 10.10.0.0/16 and then 10.30.0.0 will
> guarantee that the node-to-node communication passes through a
> different interface than the one the data passes through. Since the
> clients are just "clients", the traffic through 10.10.0.0/16 should
> be minimal (only token, lease, ping and so on) and not affected by
> the rest. At this point it should also be possible to move the
> "admin network" onto the internal interface, so we effectively split
> all the "non data" traffic onto a dedicated interface.
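If I read the wiki page right, the ordered subnet lists would be set per cluster with the mmchconfig "subnets" parameter; a rough, untested sketch using the subnet values above (the order matters, since the first matching subnet wins):

```shell
# On the NSD/GSS cluster: internal black network first, shared blue second.
mmchconfig subnets="10.20.0.0 10.30.0.0"

# On the client cluster: internal 10.10.0.0/16 first, shared 10.30.0.0 second.
mmchconfig subnets="10.10.0.0 10.30.0.0"

# The ordered list can then be checked with:
mmlsconfig subnets
```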
>
> I'm wondering if I'm missing something, and in case I didn't, what
> the real traffic on the internal (black) networks would be (is a 1G
> link fine, or do I still need 10G for that). Another thing I'm
> wondering about is the load of the "non data" traffic between the
> clusters.. I suppose some "daemon traffic" goes through the blue
> interface for the inter-cluster communication.
>
>
> Any thoughts ?
>
> Salvatore
>
> On 13/07/15 18:19, Muhammad Habib wrote:
>> Did you look at the "subnets" parameter used with the "mmchconfig"
>> command? I think you can use an ordered list of subnets for daemon
>> communication, while the actual daemon interface can be used for
>> data transfer. When GPFS starts, it will use the actual daemon
>> interface for communication; however, once it is started, it will
>> use the IPs from the subnet list, whichever comes first in the
>> list. To further validate, you can put a network sniffer in place
>> before you do the actual implementation, or alternatively you can
>> open a PMR with IBM.
>>
>> If your cluster is having expel situations, you may fine-tune your
>> cluster, e.g. increase the ping timeout period, have multiple NSD
>> servers, and distribute filesystems across these NSD servers.
>> Critical servers can also have HBA cards installed for direct I/O
>> over fibre.
>>
>> Thanks
>>
>> On Mon, Jul 13, 2015 at 11:22 AM, Jason Hick <jhick at lbl.gov
>> <mailto:jhick at lbl.gov>> wrote:
>>
>> Hi,
>>
>> Yes having separate data and management networks has been
>> critical for us for keeping health monitoring/communication
>> unimpeded by data movement.
>>
>> Not as important, but you can also tune the networks differently
>> (packet sizes, buffer sizes, SACK, etc.), which can help.
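By way of illustration only, the sort of per-network Linux tuning mentioned above might look like this; the values and the device name are placeholders to be sized to the actual links:

```shell
# Illustrative Linux TCP tuning; size buffers to the bandwidth-delay product.
sysctl -w net.ipv4.tcp_sack=1          # selective acknowledgements
sysctl -w net.core.rmem_max=16777216   # max socket receive buffer (bytes)
sysctl -w net.core.wmem_max=16777216   # max socket send buffer (bytes)
# Jumbo frames on the data network only, leaving the admin link at 1500:
ip link set dev eth_data mtu 9000      # eth_data is a placeholder device name
```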
>>
>> Jason
>>
>> On Jul 13, 2015, at 7:25 AM, Vic Cornell
>> <viccornell at gmail.com <mailto:viccornell at gmail.com>> wrote:
>>
>>> Hi Salvatore,
>>>
>>> I agree that that is what the manual, and some of the wiki
>>> entries, say.
>>>
>>> However, when we have had problems (typically congestion) with
>>> ethernet networks in the past (20GbE or 40GbE), we have resolved
>>> them by setting up a separate “Admin” network.
>>>
>>> The before-and-after difference in cluster health, measured in the
>>> number of expels and waiters, has been very marked.
>>>
>>> Maybe someone “in the know” could comment on this split.
>>>
>>> Regards,
>>>
>>> Vic
>>>
>>>
>>>> On 13 Jul 2015, at 14:29, Salvatore Di Nardo
>>>> <sdinardo at ebi.ac.uk <mailto:sdinardo at ebi.ac.uk>> wrote:
>>>>
>>>> Hello Vic.
>>>> We are currently draining our GPFS to do all the recabling
>>>> to add a management network, but looking at what the admin
>>>> interface does (man mmchnode), it says something different:
>>>>
>>>> --admin-interface={hostname | ip_address}
>>>> Specifies the name of the node to be used by GPFS
>>>> administration commands when communicating between
>>>> nodes. The admin node name must be specified as an IP
>>>> address or a hostname that is resolved by the host
>>>> command to the desired IP address. If the keyword
>>>> DEFAULT is specified, the admin interface for the
>>>> node is set to be equal to the daemon interface for
>>>> the node.
>>>>
>>>>
>>>> So it seems to be used only for command propagation, and hence has
>>>> nothing to do with the node-to-node traffic. In fact the other
>>>> interface description is:
>>>>
>>>> --daemon-interface={hostname | ip_address}
>>>>     Specifies the host name or IP address to be used by the
>>>>     GPFS daemons for node-to-node communication. The host
>>>>     name or IP address must refer to the communication
>>>>     adapter over which the GPFS daemons communicate. Alias
>>>>     interfaces are not allowed. Use the original address or
>>>>     a name that is resolved by the host command to that
>>>>     original address.
>>>>
>>>>
>>>> The "expired lease" issue and the file locking mechanism (most of
>>>> our expels happen when 2 clients try to write to the same file)
>>>> are exactly node-to-node communication, so I'm wondering what the
>>>> point is of separating the "admin network". I want to be sure to
>>>> plan the right changes before we undertake such a massive task. We
>>>> are talking about adding a new interface on 700 clients, so the
>>>> recabling work is not small.
>>>>
>>>>
>>>> Regards,
>>>> Salvatore
>>>>
>>>>
>>>>
>>>> On 13/07/15 14:00, Vic Cornell wrote:
>>>>> Hi Salvatore,
>>>>>
>>>>> Does your GSS have the facility for a 1GbE “management”
>>>>> network? If so I think that changing the “admin” node
>>>>> names of the cluster members to a set of IPs on the
>>>>> management network would give you the split that you need.
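If the hardware is there, this suggestion would presumably come down to something like the following per node; the -mgmt hostname is hypothetical and would have to resolve to the node's 1GbE management IP:

```shell
# Hypothetical sketch: move the admin interface of gss01a onto a
# management hostname while the daemon interface stays on the 10GbE bond.
mmchnode --admin-interface=gss01a-mgmt.ebi.ac.uk -N gss01a.ebi.ac.uk

# Verify: the "Admin node name" column of mmlscluster should now show
# the management hostname for that node.
mmlscluster
```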
>>>>>
>>>>> What about the clients? Can they also connect to a
>>>>> separate admin network?
>>>>>
>>>>> Remember that if you are using multi-cluster, all of the
>>>>> nodes in both clusters must share the same admin network.
>>>>>
>>>>> Kind Regards,
>>>>>
>>>>> Vic
>>>>>
>>>>>
>>>>>> On 13 Jul 2015, at 13:31, Salvatore Di Nardo
>>>>>> <sdinardo at ebi.ac.uk <mailto:sdinardo at ebi.ac.uk>> wrote:
>>>>>>
>>>>>> Anyone?
>>>>>>
>>>>>> On 10/07/15 11:07, Salvatore Di Nardo wrote:
>>>>>>> Hello guys.
>>>>>>> Quite a while ago I mentioned that we have a big expel
>>>>>>> issue on our GSS (first gen), and quite a lot of people
>>>>>>> suggested that the root cause could be that we use the
>>>>>>> same interface for all the traffic, and that we should
>>>>>>> split the data network from the admin network. We could
>>>>>>> finally plan a downtime and we are migrating the data
>>>>>>> out, so I can soon safely play with the change, but
>>>>>>> looking at what exactly I should do, I'm a bit puzzled.
>>>>>>> Our mmlscluster looks like this:
>>>>>>>
>>>>>>> GPFS cluster information
>>>>>>> ========================
>>>>>>>   GPFS cluster name:         GSS.ebi.ac.uk
>>>>>>>   GPFS cluster id:           17987981184946329605
>>>>>>>   GPFS UID domain:           GSS.ebi.ac.uk
>>>>>>>   Remote shell command:      /usr/bin/ssh
>>>>>>>   Remote file copy command:  /usr/bin/scp
>>>>>>>
>>>>>>> GPFS cluster configuration servers:
>>>>>>> -----------------------------------
>>>>>>>   Primary server:    gss01a.ebi.ac.uk
>>>>>>>   Secondary server:  gss02b.ebi.ac.uk
>>>>>>>
>>>>>>>  Node  Daemon node name   IP address   Admin node name    Designation
>>>>>>> -----------------------------------------------------------------------
>>>>>>>    1   gss01a.ebi.ac.uk   10.7.28.2    gss01a.ebi.ac.uk   quorum-manager
>>>>>>>    2   gss01b.ebi.ac.uk   10.7.28.3    gss01b.ebi.ac.uk   quorum-manager
>>>>>>>    3   gss02a.ebi.ac.uk   10.7.28.67   gss02a.ebi.ac.uk   quorum-manager
>>>>>>>    4   gss02b.ebi.ac.uk   10.7.28.66   gss02b.ebi.ac.uk   quorum-manager
>>>>>>>    5   gss03a.ebi.ac.uk   10.7.28.34   gss03a.ebi.ac.uk   quorum-manager
>>>>>>>    6   gss03b.ebi.ac.uk   10.7.28.35   gss03b.ebi.ac.uk   quorum-manager
>>>>>>>
>>>>>>>
>>>>>>> It was my understanding that the "admin node" should use
>>>>>>> a different interface (a 1G copper link should be fine),
>>>>>>> while the daemon node is where the data passes, so it
>>>>>>> should point to the bonded 10G interfaces. But when I
>>>>>>> read the mmchnode man page I start to be quite confused.
>>>>>>> It says:
>>>>>>>
>>>>>>> --daemon-interface={hostname | ip_address}
>>>>>>>     Specifies the host name or IP address to be used by the
>>>>>>>     GPFS daemons for node-to-node communication. The host
>>>>>>>     name or IP address must refer to the communication
>>>>>>>     adapter over which the GPFS daemons communicate. Alias
>>>>>>>     interfaces are not allowed. Use the original address or
>>>>>>>     a name that is resolved by the host command to that
>>>>>>>     original address.
>>>>>>>
>>>>>>> --admin-interface={hostname | ip_address}
>>>>>>>     Specifies the name of the node to be used by GPFS
>>>>>>>     administration commands when communicating between
>>>>>>>     nodes. The admin node name must be specified as an IP
>>>>>>>     address or a hostname that is resolved by the host
>>>>>>>     command to the desired IP address. If the keyword
>>>>>>>     DEFAULT is specified, the admin interface for the node
>>>>>>>     is set to be equal to the daemon interface for the node.
>>>>>>>
>>>>>>> What exactly does "node-to-node communication" mean?
>>>>>>> Does it mean DATA, or also the "lease renew" and the token
>>>>>>> communication between the clients to get/steal the locks
>>>>>>> to be able to manage concurrent writes to the same file?
>>>>>>> Since we are getting expels (especially when several
>>>>>>> clients contend for the same file), I assumed I have to
>>>>>>> split this type of packets from the data stream, but
>>>>>>> reading the documentation it looks to me that those
>>>>>>> internal communications between nodes use the
>>>>>>> daemon interface, which I suppose is also used for the
>>>>>>> data. So HOW exactly can I split them?
>>>>>>>
>>>>>>>
>>>>>>> Thanks in advance,
>>>>>>> Salvatore
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> gpfsug-discuss mailing list
>>>>>>> gpfsug-discuss at gpfsug.org
>>>>>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>>>>>>
>>>>>
>>>>
>>>
>>
>>
>>
>>
>>
>> --
>> This communication contains confidential information intended
>> only for the persons to whom it is addressed. Any other
>> distribution, copying or disclosure is strictly prohibited. If
>> you have received this communication in error, please notify the
>> sender and delete this e-mail message immediately.
>>
>>
>>
>
>
>
>
>
>