[gpfsug-discuss] GPFS GUI - DataPool_capUtil error

Buterbaugh, Kevin L Kevin.Buterbaugh at Vanderbilt.Edu
Mon Apr 9 18:17:52 BST 2018


Hi All,

I’m pretty new to using the GPFS GUI for health and performance monitoring, but am finding it very useful.  I’ve got an issue that I can’t figure out.  In my events I see:

Event name: pool-data_high_error
Component: File System
Entity type: Pool
Entity name: <redacted>
Event time: 3/26/18 4:44:10 PM
Message: The pool <redacted> of file system <redacted> reached a nearly exhausted data level. DataPool_capUtil
Description: The pool reached a nearly exhausted level.
Cause: The pool reached a nearly exhausted level.
User action: Add more capacity to pool or move data to different pool or delete data and/or snapshots.
Reporting node: <redacted>
Event type: Active health state of an entity which is monitored by the system.

Now this is for a “capacity” pool … i.e. one that mmapplypolicy is going to fill up to 97% full.  Therefore, I’ve modified the thresholds:

### Threshold Rules ###
rule_name             metric                error  warn    direction  filterBy  groupBy                                            sensitivity
--------------------------------------------------------------------------------------------------------------------------------------------------
InodeCapUtil_Rule     Fileset_inode         90.0   80.0    high                 gpfs_cluster_name,gpfs_fs_name,gpfs_fset_name      300
MemFree_Rule          mem_memfree           50000  100000  low                  node                                               300
MetaDataCapUtil_Rule  MetaDataPool_capUtil  90.0   80.0    high                 gpfs_cluster_name,gpfs_fs_name,gpfs_diskpool_name  300
DataCapUtil_Rule      DataPool_capUtil      99.0   90.0    high                 gpfs_cluster_name,gpfs_fs_name,gpfs_diskpool_name  300
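
To make the intent of the new thresholds concrete, here is a minimal sketch of how a rule with an error level, a warn level, and a direction could classify one sample. This is not the actual mmhealth logic: the function name classify, the state names, and the use of inclusive comparisons are all assumptions for illustration, and the earlier error level of 90.0 is taken from the value printed at the end of the mmhealth event message.

```python
def classify(value, warn, error, direction="high"):
    """Classify one metric sample against a warn/error threshold pair.

    Hypothetical helper: inclusive comparisons are an assumption,
    not documented behaviour of the real threshold monitor.
    """
    if direction == "high":
        # "high" rules fire when the value rises to or above a level.
        if value >= error:
            return "ERROR"
        if value >= warn:
            return "WARNING"
    else:
        # "low" rules fire when the value falls to or below a level.
        if value <= error:
            return "ERROR"
        if value <= warn:
            return "WARNING"
    return "HEALTHY"


# A capacity pool filled to 97% by mmapplypolicy:
old_state = classify(97.0, warn=80.0, error=90.0)  # assumed earlier levels
new_state = classify(97.0, warn=90.0, error=99.0)  # DataCapUtil_Rule as listed
print(old_state, new_state)
```

Under these assumptions a pool at 97% would be in error against the 90.0 level but only at warning level against the modified rule, which is the state I would expect to see once the new thresholds are picked up.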

But it’s still in an “Error” state.  I see that the time of the event is March 26th at 4:44 PM, so I’m thinking this is something that’s just stale, but I can’t figure out how to clear it.  The mmhealth command shows the error, too, and from that message it appears as if the event was triggered prior to my adjusting the thresholds:

Event                     Parameter     Severity    Active Since             Event Message
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
pool-data_high_error      redacted        ERROR       2018-03-26 16:44:10      The pool redacted of file system redacted reached a nearly exhausted data level. 90.0

What do I need to do to get the GUI / mmhealth to recognize the new thresholds and clear this error?  I’ve searched and searched in the GUI for a way to clear it.  I’ve read the “Monitoring and Managing IBM Spectrum Scale Using the GUI” Redbook pretty much cover to cover and haven’t found anything there about how to clear this.  Thanks...

Kevin
—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633
