Logging helps you understand how the system behaves.
(Google Stackdriver works well here.)
Monitoring
You can hook up a monitoring system to Slack and have it graph the
times when your system is unresponsive, or is returning more 4XX or
5XX responses than the norm.
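
As a rough sketch of the "more than the norm" check, the Python below computes 4XX and 5XX rates for a window of status codes, compares them to a baseline, and posts a message to a Slack incoming webhook. The function names, the baseline figures, and the 2x tolerance are all assumptions made for illustration; a real monitoring system would do this for you.

```python
import json
import urllib.request
from collections import Counter


def error_rates(status_codes):
    """Return the fraction of 4XX and 5XX responses in a window."""
    total = len(status_codes)
    if total == 0:
        return {"4xx": 0.0, "5xx": 0.0}
    classes = Counter(code // 100 for code in status_codes)
    return {"4xx": classes[4] / total, "5xx": classes[5] / total}


def find_alerts(rates, baseline, tolerance=2.0):
    """List the error classes whose rate exceeds tolerance x the norm.

    `baseline` and `tolerance` are hypothetical knobs, not part of any
    particular monitoring product.
    """
    alerts = []
    for kind in ("4xx", "5xx"):
        if rates[kind] > baseline[kind] * tolerance:
            alerts.append(
                f"{kind} rate {rates[kind]:.1%} exceeds norm {baseline[kind]:.1%}"
            )
    return alerts


def post_to_slack(webhook_url, text):
    """POST a plain-text message to a Slack incoming webhook URL."""
    body = json.dumps({"text": text}).encode("utf-8")
    request = urllib.request.Request(
        webhook_url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status


if __name__ == "__main__":
    # Made-up window of 100 responses and a made-up baseline.
    window = [200] * 90 + [404] * 4 + [500] * 6
    baseline = {"4xx": 0.05, "5xx": 0.01}
    for alert in find_alerts(error_rates(window), baseline):
        print(alert)
        # post_to_slack(YOUR_WEBHOOK_URL, alert)  # supply your own webhook URL
```

Comparing against a multiple of the baseline, rather than a fixed threshold, keeps a service with a normally high 4XX rate from alerting constantly.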