A good application monitoring and alerting system significantly increases reliability and reduces the work needed to operate a client-server or cloud application. Without one, administrators must rely on hit-or-miss spot-checking of logs and whatever metrics or instrumentation happen to be available, and there is no way to be sure that the system is healthy. A good administrator should and will continue to spot-check logs and other instrumentation even with good monitoring in place, but spot checks alone cannot tell you whether the application is working properly right now. Leaving aside the need for 24/7 coverage, at any given minute there are simply too many things for an application team to watch by hand: transaction service response time, mail queue depth, transaction queue depth, replication delay, and so on at the application level, not to mention the standard checks the OS team should already be monitoring, such as load average, disk space, and server room temperature.

Some monitoring tips:

  • Use a monitoring tool that is easy to extend with quickly written custom scripts. You don’t want to wait a year to get a simple custom check built. Most custom service checks should take hours or days to develop.
  • Audit your system regularly. It’s not the end product but the process that matters most here. Make a spreadsheet and group your servers by hostgroup. List the hostgroups as rows and, in the columns, the different types of alerts each hostgroup needs. Put an X in each box where you have a service check in place. The boxes that need a check but have no X are your gaps. Try to fill all the gaps. Repeat every 3 to 6 months.
  • Dashboards are useful for many things. Some data, such as connections per hour over time, is much easier to understand when plotted on a simple line chart. A dashboard can also show service checks that are OK in green and those that are not in yellow, orange, or red.
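
As an illustration of how small a custom service check can be, here is a minimal sketch in Python. It assumes a monitoring system that follows the Nagios-style plugin convention, in which a check prints one status line and reports its state through the exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The function names, the thresholds of 100 and 500, and the file-based probe are all hypothetical placeholders, not part of any particular product.

```python
#!/usr/bin/env python3
"""Hypothetical custom check: warn or alert on mail queue depth."""
import sys

# Exit codes in the Nagios-style plugin convention (an assumption here).
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate(depth, warn=100, crit=500):
    """Map a queue depth to (exit_code, status_line). Thresholds are illustrative."""
    if depth is None or depth < 0:
        return UNKNOWN, "QUEUE UNKNOWN - could not read queue depth"
    if depth >= crit:
        return CRITICAL, f"QUEUE CRITICAL - {depth} messages queued"
    if depth >= warn:
        return WARNING, f"QUEUE WARNING - {depth} messages queued"
    return OK, f"QUEUE OK - {depth} messages queued"


def read_queue_depth(path):
    """Placeholder probe: read an integer from a file. Swap in a real query."""
    try:
        with open(path) as handle:
            return int(handle.read().strip())
    except (OSError, ValueError):
        return None


def main(argv):
    """Run the check once, print the status line, and return the exit code."""
    depth = read_queue_depth(argv[1]) if len(argv) > 1 else None
    code, message = evaluate(depth)
    print(message)
    return code

# When installed as a plugin, end the file with: sys.exit(main(sys.argv))
```

Keeping the threshold logic in `evaluate` separate from the probe means the same skeleton can be reused for transaction queue depth or replication delay by replacing only `read_queue_depth`.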

Good alerting ensures that your staff is not strained and stressed by irrelevant or nonsensical alert messages. Attention is a finite commodity: spend it where it counts and don’t waste it where it doesn’t. Some good practices to follow on alerting:

  • Buy an SMS sending device. An old mobile phone will do for this.
  • Instead of running your own SMS sending device, you can use an email-to-SMS gateway.
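  • Ensure a high probability of a real problem before an alert is sent. This is done by fine-tuning the alerting system. Most alerting systems offer several controls over the alerting mechanism, typically including how many consecutive failed checks are required before anyone is notified and how often a notification is repeated.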
  • Connect your laptop to a dawn simulation alarm system, backed by a rigged ping-pong gun to wake sleepy staff. Most people will wake with a dawn simulation system. Few people can sleep through a ping-pong storm.
  • Adrenalin in the middle of the night on a sustained basis is not healthy. Another tip is to make sure that the humans in your system practice stress management, such as regular exercise and sleep that is as regular as possible. The healthiest team always wins.
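
The advice above about alerting only when a problem is highly probable can be sketched as a small gate that stays quiet until a check has failed several times in a row. This is a minimal illustration, not how any particular alerting product implements it. The class name `AlertGate` and the default of three attempts are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AlertGate:
    """Suppress notifications until a check fails several times in a row."""
    max_attempts: int = 3   # hypothetical: consecutive failures needed to alert
    failures: int = 0       # current run of consecutive failures
    notified: bool = False  # True once an alert has gone out for this outage

    def observe(self, check_ok: bool) -> str:
        """Record one check result and return 'alert', 'recovery', or 'quiet'."""
        if check_ok:
            was_notified = self.notified
            self.failures = 0
            self.notified = False
            # Only announce a recovery if staff were told about the outage.
            return "recovery" if was_notified else "quiet"
        self.failures += 1
        if self.failures >= self.max_attempts and not self.notified:
            self.notified = True
            return "alert"
        return "quiet"


gate = AlertGate(max_attempts=3)
history = [False, True, False, False, False, False, True]
results = [gate.observe(ok) for ok in history]
print(results)
# ['quiet', 'quiet', 'quiet', 'quiet', 'alert', 'quiet', 'recovery']
```

A single failed check followed by a success never reaches anyone, and a sustained outage produces one alert and one recovery message rather than a message per failed check.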

© Copyright 2020 Rex Consulting, Inc. – All rights reserved