One problem that we encountered during development was a memory leak from PHP. To prevent another downtime (or at least help predict when the servers are about to go offline), I decided to use Server Density to implement an early warning system that would give us a heads-up when a server's memory is about to run out.

Server Density is a pretty great service, and we've been using it for quite some time now. It works by letting you install an agent on your servers; the agent then continuously pushes metrics to Server Density's servers. Server Density collects these metrics and displays them as nice graphs. If a metric crosses the threshold in one of your conditions (e.g., CPU load >= 90% or free disk space <= 5GB), Server Density sends you a notification via e-mail, SMS, and Slack.
The Server Density agent can be installed during each EB deployment by executing this inside your deployment script at .ebextensions/:
curl https://archive.serverdensity.com/agent-install.sh | bash -s -- -a ACCOUNTNAME -t PUTYOURKEYHERE -g GroupNameForYourEB -p amazon -i
Determining the amount of free memory in Linux is somewhat tricky. I can't use the memory usage metric to trigger a notification because Linux always shows nearly all of the memory as used, even though much of it is really just serving as cache. To get a more reliable metric, I used swap space instead. The theory is that the instances are supposed to have enough RAM for their tasks, and Linux falls back to swap only as a last resort when memory runs out. Thus, we can trigger a notification when swap usage is >= 1MB.
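To make the swap idea concrete, here is a rough sketch of the same check done by hand from /proc/meminfo. This is not how the Server Density agent does it internally (I don't know that); the function names swap_used_kb and swap_alert are made up for illustration, and the threshold is given in kB because that is the unit /proc/meminfo reports.

```shell
#!/bin/sh
# Hypothetical helpers, for illustration only.

# Print the amount of swap in use, in kB, read from a meminfo-format file.
# Defaults to /proc/meminfo when no file is given.
swap_used_kb() {
    awk '/^SwapTotal:/ {total = $2}
         /^SwapFree:/  {free = $2}
         END {print total - free}' "${1:-/proc/meminfo}"
}

# Succeed (exit status 0) when swap in use is at or above the threshold in kB.
swap_alert() {
    threshold_kb=$1
    used_kb=$(swap_used_kb "${2:-/proc/meminfo}")
    [ "$used_kb" -ge "$threshold_kb" ]
}
```

With these defined, something like `swap_alert 1024 && echo "swap is in use"` would correspond to the ">= 1MB" condition described above.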
Next problem: since EB regularly deploys and terminates the instances, Server Density ends up monitoring servers that no longer exist. We needed a way to automatically stop Server Density from monitoring instances that have been terminated.
Solution: I made a CloudWatch rule that triggers whenever instances are stopped or terminated. The events are then pushed to a Lambda function which calls Server Density's API to remove the monitoring.
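For reference, a rule like this could be set up from the AWS CLI roughly as follows. Treat it as a sketch under assumptions: the rule name and Lambda ARN are placeholders you pass in, the function names are made up, and the Lambda function itself (the part that calls Server Density's API) is not shown here.

```shell
#!/bin/sh
# Hypothetical helpers, for illustration only.

# Print an event pattern that matches EC2 instances entering the
# "stopped" or "terminated" state.
event_pattern() {
    cat <<'EOF'
{
  "source": ["aws.ec2"],
  "detail-type": ["EC2 Instance State-change Notification"],
  "detail": { "state": ["stopped", "terminated"] }
}
EOF
}

# Create the rule and attach a Lambda function as its target.
# $1 = rule name, $2 = ARN of the Lambda function (both placeholders).
create_rule() {
    rule_name=$1
    lambda_arn=$2
    aws events put-rule --name "$rule_name" --event-pattern "$(event_pattern)"
    aws events put-targets --rule "$rule_name" --targets "Id=1,Arn=$lambda_arn"
}
```

Calling `create_rule my-rule-name arn-of-my-lambda` would create the rule. The Lambda function also needs a resource policy that allows the events service to invoke it (for example via `aws lambda add-permission`), which this sketch leaves out.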
Here's the architecture that I came up with:
I think CloudWatch has a way to monitor swap space, but the last time I checked, AWS SNS (a separate AWS service that sends notifications) couldn't send SMS messages to Philippine numbers, so it couldn't wake me up (not joking, unfortunately haha) whenever there are server problems.
Update: It turns out that Linux's default swappiness value is 60, which means that it will start using swap ahead of time even though around half (or 80%? the docs have conflicting calculations) of the RAM is still available. To avoid this situation, set the swappiness to 1. You can even set it to 0 if you want:
sysctl vm.swappiness=1
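Note that a value set with sysctl this way only lasts until the next reboot. One possible way to make it stick is a drop-in file under /etc/sysctl.d, sketched below. The function name and the file name 99-swappiness.conf are my own choices for illustration, and writing to /etc/sysctl.d requires root.

```shell
#!/bin/sh
# Hypothetical helper, for illustration only.

# Write a sysctl drop-in file that sets vm.swappiness.
# $1 = swappiness value, $2 = config directory (defaults to /etc/sysctl.d).
write_swappiness_conf() {
    value=$1
    conf_dir=${2:-/etc/sysctl.d}
    printf 'vm.swappiness = %s\n' "$value" > "$conf_dir/99-swappiness.conf"
}
```

Running `write_swappiness_conf 1` as root and then `sysctl -p /etc/sysctl.d/99-swappiness.conf` should apply the value immediately and keep it across reboots.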