[gpfsug-discuss] How Zimon/Grafana-bridge process data

Dorigo Alvise (PSI) alvise.dorigo at psi.ch
Tue Aug 21 15:48:15 BST 2018


More precisely the problem is the following:

If I set period=1 for a "rate" sensor (network speed, NSD read/write speed, PDisk read/write speed), everything is correct, because every second the sensors read the values of the cumulative counters (and do not divide them by 1, which makes no difference over a 1-second interval).
If I set period=2, the "rate" sensors collect the values from the cumulative counters every two seconds, but they do not divide those values by 2 (pmsensors do not actually divide; they seem to simply report what they read, which is understandable from a performance point of view). So Grafana receives twice the real speed.
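
A minimal Python sketch of the arithmetic above. It assumes, as described in this message, that the sensor reports the raw difference of a cumulative counter for each polling interval without dividing by the period. The function names are hypothetical and are not part of ZIMon.

```python
# Hypothetical model of a "rate" sensor polling a cumulative byte counter.
# Assumption: the sensor reports the raw counter difference per poll and
# does NOT divide it by the polling period.

def cumulative_counter(rate_bytes_per_s, seconds):
    """Cumulative counter values for a constant throughput, one per second."""
    return [rate_bytes_per_s * t for t in range(seconds + 1)]

def reported_values(counter, period):
    """Raw counter deltas, one per polling interval of `period` seconds."""
    samples = counter[::period]
    return [b - a for a, b in zip(samples, samples[1:])]

def real_rates(counter, period):
    """Deltas divided by the period: the average rate in bytes/s."""
    return [d / period for d in reported_values(counter, period)]

counter = cumulative_counter(100, 8)  # constant 100 bytes/s for 8 seconds
print(reported_values(counter, 1))    # [100, 100, 100, 100, 100, 100, 100, 100]
print(reported_values(counter, 2))    # [200, 200, 200, 200]
print(real_rates(counter, 2))         # [100.0, 100.0, 100.0, 100.0]
```

With period=1 the raw delta and the real rate coincide; with period=2 the raw delta is twice the rate, matching the symptom described above.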

I have to correct myself: the point here is not how sampling/downsampling is done by Grafana/grafana-bridge/whatever, as I wrongly wrote in my first email.
The point is: if I collect data every N seconds (because I do not want to overload the pmcollector node), how can I divide (in Grafana) the reported data by N to get the real average speed over that N-second interval?
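
One way to do this division outside Grafana is sketched below. It assumes the reported values can be retrieved together with their timestamps (for example by a script that queries the collector) and that each value is the amount transferred since the previous sample. Dividing by the actual elapsed time, rather than a fixed N, also tolerates slightly irregular intervals. The function `to_rate` is hypothetical.

```python
# Hypothetical post-processing step: convert per-interval byte counts into
# bytes/s by dividing each value by the time elapsed since the previous sample.

def to_rate(samples):
    """samples: list of (unix_timestamp, bytes_in_interval), sorted by time.
    Returns a list of (timestamp, bytes_per_second). The first sample is
    skipped because the length of its interval is unknown."""
    rates = []
    for (t_prev, _), (t, value) in zip(samples, samples[1:]):
        dt = t - t_prev
        if dt > 0:  # ignore duplicate or out-of-order timestamps
            rates.append((t, value / dt))
    return rates

samples = [(1000, 200), (1002, 200), (1004, 400), (1006, 200)]
print(to_rate(samples))  # [(1002, 100.0), (1004, 200.0), (1006, 100.0)]
```

If the period is fixed and known, this reduces to dividing every value by N.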

At the moment it seems that the only option is using N=1, which is bad because, as I stated, it overloads the collector when many nodes run many pmsensors...

  A

________________________________
From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of IBM Spectrum Scale [scale at us.ibm.com]
Sent: Friday, July 27, 2018 8:27 PM
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] How Zimon/Grafana-bridge process data


Hi,
as similar questions come up more and more often, we have just published an article on this topic on the Spectrum Scale Wiki:
https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/Downsampling%2C%20Upsampling%20and%20Aggregation%20of%20the%20performance%20data

While the article will receive some minor updates in the near future, it may already answer your questions.

Regards, The Spectrum Scale (GPFS) team

------------------------------------------------------------------------------------------------------------------
If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWorks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479.

If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries.

The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team.


From: "Dorigo Alvise (PSI)" <alvise.dorigo at psi.ch>
To: "gpfsug-discuss at spectrumscale.org" <gpfsug-discuss at spectrumscale.org>
Date: 13.07.2018 12:08
Subject: [gpfsug-discuss] How Zimon/Grafana-bridge process data
Sent by: gpfsug-discuss-bounces at spectrumscale.org

________________________________



Hi,
I have a GL2 cluster based on GPFS 4.2.3-6, with 1 support node and 2 IO/NSD nodes.

I have the following perfmon configuration for the metric group GPFSNSDDisk:

{
name = "GPFSNSDDisk"
period = 2
restrict = "nsdNodes"
},

As far as I know, this sends data to the collector every 2 seconds (correct?). But how? Does it send what it reads from the counter every two seconds? Does it aggregate the values in some way? Or something else?

The collector node runs pmcollector, grafana-bridge and grafana-server.

Now I need to understand how to set the Grafana parameters:
- Down sample (or Disable downsampling)
- Aggregator (on the same row as the metric).

See the attached picture 4s.png for reference.

In the past I had the period set to 1, and Grafana used to display correct data (bytes/s for the metric gpfs_nsdds_bytes_written) with the aggregator set to "sum", which AFAIK means "sum all the metrics that match the filter below" (again, see the attached picture for how the filter is set to collect data only from the IO nodes).
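
A hypothetical sketch of what the "Aggregator" field is understood to do here: for each timestamp, combine the values of all series that match the filter. Modelling "sum" and "avg" this way is an assumption, not taken from the grafana-bridge code, and the node names are made up.

```python
from collections import defaultdict

# Hypothetical model of the Grafana "Aggregator" field: values from all series
# matching the filter (e.g. the two IO nodes) are combined per timestamp.
def aggregate(series_by_node, how="sum"):
    """series_by_node: {node_name: [(timestamp, value), ...]}.
    Returns [(timestamp, combined_value), ...] sorted by timestamp."""
    by_time = defaultdict(list)
    for points in series_by_node.values():
        for t, v in points:
            by_time[t].append(v)
    if how == "sum":
        return [(t, sum(vs)) for t, vs in sorted(by_time.items())]
    if how == "avg":
        return [(t, sum(vs) / len(vs)) for t, vs in sorted(by_time.items())]
    raise ValueError(f"unsupported aggregator: {how}")

series = {
    "io1": [(1000, 150.0), (1002, 170.0)],  # bytes/s reported by IO node 1
    "io2": [(1000, 50.0), (1002, 30.0)],    # bytes/s reported by IO node 2
}
print(aggregate(series, "sum"))  # [(1000, 200.0), (1002, 200.0)]
print(aggregate(series, "avg"))  # [(1000, 100.0), (1002, 100.0)]
```

Under this model, with two IO nodes "avg" is simply "sum" divided by 2: switching the aggregator rescales the plot but does not by itself account for the sensor period.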

Today I changed to "period=2"... and Grafana started to display odd data rates (twice, or four times, the real rate).

I had to play (almost randomly) with "Aggregator" (from sum to avg, which as far as I understand doesn't mean anything in my case... an average between the two IO nodes? or what?) and "Down sample" (from empty to 2s, and then to 4s) to get back the real data rate, consistent with what I get from dstat.
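
A hypothetical model of the "Down sample" setting: samples are grouped into fixed time buckets and each bucket is combined by sum or by average. Whether grafana-bridge or pmcollector combines samples exactly this way is an assumption; the sketch only shows why both the bucket size and the combining function change the plotted scale, and how dividing a summed bucket by its length gives bytes/s.

```python
# Hypothetical model of downsampling: samples are grouped into buckets of
# bucket_s seconds (aligned to multiples of bucket_s) and each bucket is
# combined with either "sum" or "avg".
def downsample(samples, bucket_s, how="sum"):
    """samples: [(timestamp, value), ...]; returns [(bucket_start, value), ...]."""
    buckets = {}
    for t, v in samples:
        buckets.setdefault(t - t % bucket_s, []).append(v)
    out = []
    for start in sorted(buckets):
        vs = buckets[start]
        if how == "sum":
            out.append((start, sum(vs)))
        elif how == "avg":
            out.append((start, sum(vs) / len(vs)))
        else:
            raise ValueError(f"unsupported combine function: {how}")
    return out

# Sensor with period=2: each value is the bytes written in a 2-second interval
# (200 bytes per 2 s, i.e. a real rate of 100 bytes/s).
samples = [(1000, 200), (1002, 200), (1004, 200), (1006, 200)]

print(downsample(samples, 4, "sum"))  # [(1000, 400), (1004, 400)]
print(downsample(samples, 4, "avg"))  # [(1000, 200.0), (1004, 200.0)]

# Dividing each summed bucket by its length gives back bytes/s.
rates = [(t, v / 4) for t, v in downsample(samples, 4, "sum")]
print(rates)  # [(1000, 100.0), (1004, 100.0)]
```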

Can someone kindly explain how to set these parameters when the ZIMon sensor's period is changed?

Many thanks in advance
Regards,

Alvise Dorigo
[attachment "4s.png" deleted by Manfred Haubrich/Germany/IBM]
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss




