
Differences

This shows you the differences between two versions of the page.

manual:087:4_help.2_debugging [2009/02/21 20:37]
gandalf
manual:087:4_help.2_debugging [2012/11/30 15:14] (current)
gandalf tackling snmpbulkwalks
Line 1: Line 1:
-==== Debugging ====
+===== Debugging =====
  
 Cacti users sometimes complain about NaN's in their graphs. Unfortunately, there are several reasons for this result. The following is a step-by-step procedure recommended for debugging.
  
-=== Check Cacti Log File ===
+==== Check Cacti Log File ====
  
 Please have a look at your cacti log file. Usually, you'll find it at //<path_cacti>/log/cacti.log//. Else see //Settings//, //Paths//. Check for this kind of error:
  
-<code>CACTID: Host[...] DS[....] WARNING: SNMP timeout detected [500 ms], ignoring host '........'</code>
+<code>SPINE: Host[...] DS[....] WARNING: SNMP timeout detected [500 ms], ignoring host '........'</code>
  
 For "reasonable" timeouts, this may be related to a snmpbulkwalk issue. To change this, see //Settings//, //Poller// and lower the value for //The Maximum SNMP OID's Per SNMP Get Request//. Start at a value of 1 and increase it again once the poller starts working. Some agents don't have the horsepower to deliver that many OIDs at a time. Therefore, we can reduce the number for those older/underpowered devices.
  
-=== Check Basic Data Gathering ===
+==== Check Basic Data Gathering ====
  
 For scripts, run them as cactiuser from cli to check basic functionality. E.g. for a perl script named //your-perl-script.pl// with parameters "p1 p2" under *nix this would look like:
Line 24: Line 24:
 .... (check output)</code>
  
-=== Check Cacti's Poller ===
+==== Check Cacti's Poller ====
  
-First make sure that crontab always shows poller.php. This program will either call cmd.php, the PHP based poller _or_ cactid, the fast alternative, written in C. Define the poller you're using at **Settings**, **Poller**. Cactid has to be implemented separately, it does not come with cacti by default.
+First make sure that crontab always shows poller.php. This program will either call cmd.php, the PHP based poller _or_ spine, the fast alternative, written in C. Define the poller you're using at **Settings**, **Poller**. Spine has to be implemented separately, it does not come with cacti by default.
  
 Now, clear //./log/cacti.log// (or rename it to get a fresh start). Then, change **Settings**, **Poller Logging Level** to DEBUG for _one_ polling cycle. You may rename this log as well to avoid more stuff added to it with subsequent polling cycles.
Line 40: Line 40:
 <code>php -q cmd.php <id> <id></code>
  
-If you're using cactid, you may override logging level when calling the poller:
+If you're using spine, you may override logging level when calling the poller:
  
-<code>./cactid --verbosity=5 <id> <id></code>
+<code>./spine --verbosity=5 <id> <id></code>
  
 All output is printed to STDOUT in both cases. This procedure allows for repeated tests without waiting for the next polling interval. And there's no need to manually search for the failing host between hundreds of lines of output.
  
-=== Check MySQL Update ===
+==== Check Bulkwalk Behaviour (SNMP Data Queries only) ====
+
+The goal of bulkwalks is to reduce SNMP traffic overhead. It works by packing several SNMP requests/responses into a single IP packet. This feature is not available with SNMP version 1. Some SNMP enabled devices do not like snmpbulkwalks.
+
+Cacti uses this feature automatically for SNMP enabled devices when version 2 or version 3 has been selected. The field "max OIDs" of each host governs how many requests are packed together. Side note: if too many are packed together, IP fragmentation takes care of splitting the result into chunks manageable by the IP layer.
+You will see a bulkwalk problem when e.g. your manual
+<code>snmpwalk -c <community string> -v 2c <target> <OID></code>
+produces a result but the cacti poller output shows NaN.
+
+Now you have two different means to tackle such an issue:
+  - reduce "max OIDs" to 1: Cacti will then disable all Cacti-internal mechanisms that use snmpbulkwalk
+  - select SNMP version 1: as SNMP V1 does not support snmpbulkwalks, all Cacti-internal and Cacti-external bulkwalk mechanisms are disabled
+
+Discussion:\\
+**Cacti-internal bulkwalk mechanisms**: Spine checks "max OIDs". If it is set to a value higher than 1, snmpbulkwalk-like code is used. Else, standard snmpwalks are used.\\
+
+**Cacti-external bulkwalk mechanisms**: It has been found that php-snmp automatically uses snmpbulkwalk, even when only snmpwalk has been requested. Currently, php-snmp joins 20 requests/responses. We can't change this setting externally. So the ultimate answer to this is to use SNMP version 1. The drawback of using SNMP version 1 is that e.g. COUNTER64 is **not** available with this setting. As a result, e.g. a **Verbose Query** from within the browser may fail while spine still works. Yes, crazy.
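The comparison between plain walks and bulkwalks can be tried from the command line before touching any Cacti setting. The following is only a sketch, assuming the net-snmp tools snmpwalk and snmpbulkwalk are installed; the function name probe_walks and its three arguments are hypothetical and not part of Cacti. The -Cr option of snmpbulkwalk sets the number of repetitions requested per packet.

```shell
# Hypothetical helper, not part of Cacti: count the lines each walk style
# returns for the same OID. Zero lines from a bulkwalk while the v1 walk
# answers hints at a device that does not like snmpbulkwalks.
probe_walks() {
  community=$1
  target=$2
  oid=$3

  # SNMP version 1 never uses bulk requests
  n=$(snmpwalk -v 1 -c "$community" "$target" "$oid" 2>/dev/null | wc -l | tr -d ' ')
  echo "snmpwalk v1: $n lines"

  # SNMP version 2c bulkwalk with 1, 10 and 20 repetitions per request
  for rep in 1 10 20; do
    n=$(snmpbulkwalk -v 2c -c "$community" -Cr"$rep" "$target" "$oid" 2>/dev/null | wc -l | tr -d ' ')
    echo "snmpbulkwalk v2c -Cr$rep: $n lines"
  done
}

# Usage (placeholders as in the text above):
#   probe_walks <community string> <target> <OID>
```

If the line count drops to zero above a certain number of repetitions, a "max OIDs" value below that number is a reasonable starting point for this host.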
+==== Check MySQL Update ====
  
 In most cases, this step may be skipped. You may want to return to this step if the next one fails (e.g. no rrdtool update to be found).
Line 52: Line 69:
 From debug log, please find the MySQL update statement for that host concerning table //poller_output//. On very rare occasions, this will fail. So please copy that sql statement and paste it to a mysql session started from cli. This may as well be done from some tool like phpMyAdmin. Check the sql return code.
  
-=== Check RRD File Update ===
+==== Check RRD File Update ====
  
 Down in the same log, you should find some
Line 62: Line 79:
 RRD files should be created by the poller. If it does not create them, it will not fill them either. If it does, please check your //Poller Cache// from Utilities and search for your target. Does the query show up here?
  
-=== Check RRD File Ownership ===
+==== Check RRD File Ownership ====
  
 If rrd files were created e.g. with root ownership, a poller running as cactiuser will not be able to update those files
Line 78: Line 95:
 <code>chown cactiuser:cactiuser *.rrd</code>
  
-=== Check RRD File Numbers ===
+==== Check RRD File Numbers ====
  
 You're perhaps wondering about this step, if the former was ok. But due to data sources MINIMUM and MAXIMUM definitions, it is possible that valid updates for rrd files are suppressed, because MINIMUM was not reached or MAXIMUM was exceeded.
Line 105: Line 122:
 At this step, it is wise to check **step** and **heartbeat** of the rrd file as well. For standard 300 seconds polling intervals (step=300), it is wise to set **minimal_heartbeat** to 600 seconds. If a single update is missing and the next one occurs in less than 600 seconds from the last one, rrdtool will interpolate the missing update. Thus, gaps are "filled" automatically by interpolation. Be aware that this is no "real" data! Again, this must be done in the Data Template itself and by using rrdtool tune for all existing rrd files of this type.
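The step/heartbeat check can be sketched from the command line. This is only an illustration, assuming that "rrdtool info" prints lines of the form "step = 300" and "ds[<name>].minimal_heartbeat = 600"; the function name check_heartbeat is hypothetical, and the file and data source names in the usage comments are placeholders.

```shell
# Hypothetical helper: read "rrdtool info" output on stdin and print every
# data source whose minimal_heartbeat is below twice the step.
check_heartbeat() {
  awk '
    $1 == "step"                { step = $3 + 0 }
    $1 ~ /\.minimal_heartbeat$/ { hb[$1] = $3 + 0 }
    END {
      for (name in hb)
        if (hb[name] < 2 * step)
          print name " = " hb[name] " (step " step ")"
    }'
}

# Usage (placeholders, adjust to your rrd file and data source name):
#   rrdtool info <file>.rrd | check_heartbeat
#   rrdtool tune <file>.rrd --heartbeat <ds-name>:600
```

No output means that every data source already has a heartbeat of at least two steps.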
  
-=== Check RRDTool Graph Statement ===
+==== Check RRDTool Graph Statement ====
  
 Last resort would be to check that the correct data sources are used. Go to //Graph Management// and select your Graph. Enable DEBUG Mode to find the whole //rrdtool graph// statement. You should notice the //DEF// statements. They specify the rrd file and data source to be used. You may check that all of them are as wanted.
  
-=== Miscellaneous ===
+==== Miscellaneous ====
  
 Up to cacti 0.8.6j, table //poller_output// may increase beyond reasonable size.
Line 125: Line 142:
 From cacti 0.8.7 on, measures were taken on both issues (memory size, truncating poller_output).
  
-=== RPM Installation? ===
+==== RPM Installation? ====
  
 Most rpm installations will set up the crontab entry now. If you've followed the installation instructions to the letter (which you should always do ;-) ), you may now have two pollers running. That's not a good thing, though. Most rpm installations will set up cron in ///etc/cron.d/cacti//
Line 141: Line 158:
 */5 * * * *     /usr/bin/php -q /var/www/html/cacti/poller.php > /var/local/log/poller.log 2>&1</code>
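Whether two pollers are scheduled can be checked by counting the active crontab lines that call poller.php. A minimal sketch, assuming the user crontab of cactiuser and ///etc/cron.d/cacti// are the only two places to look; the function name count_pollers is hypothetical.

```shell
# Hypothetical helper: read crontab text on stdin and print the number of
# active (non-comment) lines that call poller.php.
count_pollers() {
  grep -v '^[[:space:]]*#' | grep -c 'poller\.php'
}

# Usage (run as root so that both sources are readable):
#   { crontab -l -u cactiuser; cat /etc/cron.d/cacti; } 2>/dev/null | count_pollers
```

A result above 1 suggests that the poller is started twice per polling interval.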
  
-=== Not NaN, but 0 (zero) values? ===
+==== Not NaN, but 0 (zero) values? ====
  
 Pay attention to custom scripts. It is required that external commands called from there are in the $PATH of the cactiuser running the poller. It is therefore recommended to provide **/full/path/to/external/command**
Line 147: Line 164:
 User "criggie" reported an issue with running smartctl. It was complaining "you are not root" so a quick //chmod +s// on the script fixed that problem.
  
-Secondly, the script was taking several seconds to run. So cacti was logging a "U" for unparseable in the debug output, and was recording NAN. So my fix there was to make the script run faster - it has to complete in less than one second, and the age of my box made it difficult to accomplish.
+Secondly, the script was taking several seconds to run. So cacti was logging a "U" for unparseable in the debug output, and was recording NAN. So my fix there was to make the script run faster, and the age of my box made it difficult to accomplish.
+
+The timeout setting is governed by "Settings -> Poller -> Script and Script Server Timeout Value". In general, it is recommended to make scripts faster so that the poller finishes in time.
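To see whether a script stays below the configured timeout, it can be timed from the cli as cactiuser. This is only a sketch: the function name check_runtime is hypothetical, the limit in seconds has to be taken from your own "Script and Script Server Timeout Value" setting, and the resolution is whole seconds only.

```shell
# Hypothetical helper: run a command, discard its output and report whether
# it finished within the given number of seconds.
check_runtime() {
  limit=$1
  shift
  start=$(date +%s)
  "$@" >/dev/null 2>&1
  elapsed=$(( $(date +%s) - start ))
  if [ "$elapsed" -gt "$limit" ]; then
    echo "SLOW: ${elapsed}s (limit ${limit}s)"
  else
    echo "OK: ${elapsed}s (limit ${limit}s)"
  fi
}

# Usage (script name and parameters as in the example further up):
#   check_runtime <timeout in seconds> perl your-perl-script.pl p1 p2
```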




