Metrics/Monitoring

Monitoring is a method of tracking the health of a system, service, or application. Ideally, alerting is configured to warn users and developers about issues before they become critical problems. Instead of a Severity 1, complete application outage caused by full database storage, an alert can fire when storage reaches 80% capacity, giving the team time to increase the size or clean up space if needed.
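The 80% storage example can be sketched as a simple threshold check. This is a minimal illustration, not a real monitoring integration: the function name `storage_alert`, the constant `WARN_THRESHOLD`, and the message format are all assumptions made for this example, and a production setup would normally rely on the alerting rules of whatever monitoring tool is in use.

```python
import shutil
from typing import Optional

# Assumed warning level, matching the 80% example in the text.
WARN_THRESHOLD = 0.80


def storage_alert(used_bytes: int, total_bytes: int,
                  threshold: float = WARN_THRESHOLD) -> Optional[str]:
    """Return a warning message if usage is at or above the threshold, else None."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    fraction = used_bytes / total_bytes
    if fraction >= threshold:
        return f"WARNING: storage at {fraction:.0%} of capacity"
    return None


if __name__ == "__main__":
    # Check the filesystem that holds the root path; a database volume
    # would be checked the same way by passing its mount point.
    usage = shutil.disk_usage("/")
    message = storage_alert(usage.used, usage.total)
    if message:
        print(message)
```

Run on a schedule, a check like this gives the team a warning while there is still room to act, rather than after writes start failing.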

Monitoring and metrics are about preventing outages. Using the tools and alerting properly can mean the difference between happy end users and a loss of revenue. Metrics can also be configured to track trends in services or applications. If an application sees a surge of users on certain days, a metrics dashboard can help pinpoint what is driving the increase in activity.

Traceability is also important. Some monitoring solutions can help debug slow systems or bottlenecks. If a database query is taking particularly long, the monitoring solution can identify it in a lower environment and alert the team that it took x seconds to complete. This leads to better-quality code and faster turnaround on fixes for end users.
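As a rough sketch of the slow-query case, the snippet below times a block of code and logs a warning when it exceeds a threshold. The names `timed_query` and `SLOW_QUERY_SECONDS`, the 2-second threshold, and the logger name are assumptions for illustration only; dedicated monitoring and tracing tools collect this kind of timing automatically.

```python
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger("query_monitor")

# Assumed threshold; pick a value that fits the application.
SLOW_QUERY_SECONDS = 2.0


@contextmanager
def timed_query(name, threshold=SLOW_QUERY_SECONDS, clock=time.perf_counter):
    """Time the wrapped block and log a warning if it runs too long."""
    start = clock()
    try:
        yield
    finally:
        # Runs even if the query raises, so failed slow queries are reported too.
        elapsed = clock() - start
        if elapsed >= threshold:
            logger.warning("slow query %r took %.2f seconds", name, elapsed)
```

A caller would wrap the database call, for example `with timed_query("load_orders"): ...`, where `load_orders` is a hypothetical query name. Routing the logger to the team's alerting channel is left to the logging configuration.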