AWS: Automatically reboot an EC2 instance when the number of RDS database connections reaches a threshold
A. Scenario:
1. A website is using an EC2 instance to connect to an RDS instance.
2. CloudWatch is monitoring the metrics of the RDS instance (number of database connections).
3. When the number of DB connections reaches a certain number, the website becomes unresponsive. CloudWatch generates an alarm and sends an email notification.
4. Manually rebooting the EC2 instance brings the website back to normal.
B. Problem Statement:
Is it possible to use CloudWatch to automatically reboot the EC2 instance when the number of database connections of the RDS instance reaches a certain value?
C. High Level Solution:
D. AWS Services used for this solution:
- EC2
- IAM
- SNS (Always Free Tier)
- SQS (Always Free Tier)
- Lambda (Always Free Tier)
- CloudWatch (Always Free Tier)
- RDS
- CloudTrail
E. Solution Summary:
CloudWatch alarm configured on the RDS DatabaseConnections metric → RDS DB connection threshold reached → CloudWatch alarm triggers → Alarm published to SNS topic → Message delivered to SQS queue → Lambda function gets triggered → Lambda executes the Python code to restart the EC2 instance
F. Solution Steps:
Step 1: Create an SNS topic.
Step 2: Create an SQS queue.
Step 3: Subscribe the SQS queue to the SNS topic created in step 1.
Step 4: Create the Lambda function to trigger the EC2 reboot.
Step 5: Create the EC2 restart policy and role to be assigned to the Lambda function.
Step 6: Assign the role to the Lambda function.
Step 7: Assign a policy to the role to allow the Lambda function to read messages from SQS.
Step 8: Add the SQS trigger to the Lambda function.
Step 9: Create an alarm in CloudWatch that fires when the DB connection threshold is reached and publishes to the topic.
G. Detailed Solution:
Step 1: Create an SNS Topic
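For readers who would rather script this step than use the console, here is a minimal boto3 sketch. The topic name `rds-db-connections-alarm` and the helper name `create_alarm_topic` are assumptions made for illustration; they do not come from the original setup.

```python
def create_alarm_topic(sns_client, topic_name="rds-db-connections-alarm"):
    """Create the SNS topic and return its ARN.

    sns_client is a boto3 SNS client, for example
    boto3.client("sns", region_name="xx-xxxxx-x").
    The default topic name is a placeholder chosen for this sketch.
    """
    response = sns_client.create_topic(Name=topic_name)
    return response["TopicArn"]

# Usage (needs AWS credentials):
#   import boto3
#   topic_arn = create_alarm_topic(boto3.client("sns", region_name="xx-xxxxx-x"))
```

The returned ARN is what the CloudWatch alarm and the SQS subscription refer to in the later steps.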
Step 2: Create an SQS Queue
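The queue can also be created from code. The sketch below assumes a hypothetical queue name and a visibility timeout of 360 seconds; the timeout should not be shorter than the timeout configured on the Lambda function that will read from the queue.

```python
def create_alarm_queue(sqs_client, queue_name="rds-db-connections-queue",
                       visibility_timeout=360):
    """Create a standard SQS queue and return (queue_url, queue_arn).

    queue_name and visibility_timeout are placeholders for this sketch.
    """
    created = sqs_client.create_queue(
        QueueName=queue_name,
        # SQS attribute values are always passed as strings.
        Attributes={"VisibilityTimeout": str(visibility_timeout)},
    )
    queue_url = created["QueueUrl"]
    attrs = sqs_client.get_queue_attributes(
        QueueUrl=queue_url, AttributeNames=["QueueArn"]
    )
    return queue_url, attrs["Attributes"]["QueueArn"]

# Usage (needs AWS credentials):
#   import boto3
#   url, arn = create_alarm_queue(boto3.client("sqs", region_name="xx-xxxxx-x"))
```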
Step 3: Subscribe the SQS queue to the SNS topic created in step 1
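When the subscription is created from code rather than from the console, the queue also needs an access policy that lets SNS deliver messages to it. The following is a sketch under that assumption; the helper names are hypothetical, and the ARNs and URL come from the two previous steps.

```python
import json

def build_queue_policy(queue_arn, topic_arn):
    """Queue access policy that lets only the given topic send messages."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "sns.amazonaws.com"},
            "Action": "sqs:SendMessage",
            "Resource": queue_arn,
            "Condition": {"ArnEquals": {"aws:SourceArn": topic_arn}},
        }],
    }

def subscribe_queue_to_topic(sns_client, sqs_client, topic_arn, queue_url, queue_arn):
    """Attach the access policy, subscribe the queue, return the subscription ARN."""
    sqs_client.set_queue_attributes(
        QueueUrl=queue_url,
        Attributes={"Policy": json.dumps(build_queue_policy(queue_arn, topic_arn))},
    )
    response = sns_client.subscribe(
        TopicArn=topic_arn, Protocol="sqs", Endpoint=queue_arn
    )
    return response["SubscriptionArn"]
```

The `aws:SourceArn` condition keeps other topics from writing to the queue.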
Step 4: Create the Lambda function to trigger EC2 reboot
lambda_function.py
import boto3

region = 'xx-xxxxx-x'
instances = ['i-xxxxxxxxxxxxxxx']
ec2 = boto3.client('ec2', region_name=region)
ec2res = boto3.resource('ec2', region_name=region)

def lambda_handler(event, context):
    # Restart only the instances listed above, not every instance in the account.
    for instance in ec2res.instances.filter(InstanceIds=instances):
        if instance.state['Name'] != 'stopped':
            ec2.stop_instances(InstanceIds=[instance.id])
            # Wait until the instance is fully stopped. The Lambda timeout
            # must be long enough to cover this wait.
            instance.wait_until_stopped()
        ec2.start_instances(InstanceIds=[instance.id])
        instance.reload()  # refresh the cached attributes before printing
        print(
            "Id: {0}\nPlatform: {1}\nType: {2}\nPublic IPv4: {3}\nAMI: {4}\nState: {5}\nStatus: {6}\n".format(
                instance.id, instance.platform, instance.instance_type, instance.public_ip_address, instance.image.id, instance.state, instance.state['Name']
            )
        )
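The handler above restarts the instance on every invocation and ignores the incoming event. If it should react only to a specific alarm or state, the alarm details can be read from the event. The sketch below assumes the default SNS-to-SQS delivery (raw message delivery disabled), so that each SQS record body is an SNS notification whose `Message` field holds the CloudWatch alarm JSON. The helper name `extract_alarms` is hypothetical.

```python
import json

def extract_alarms(event):
    """Return a list of (alarm_name, new_state) from an SQS-triggered Lambda event."""
    alarms = []
    for record in event.get("Records", []):
        envelope = json.loads(record["body"])    # SNS notification envelope
        alarm = json.loads(envelope["Message"])  # CloudWatch alarm payload
        alarms.append((alarm["AlarmName"], alarm["NewStateValue"]))
    return alarms
```

A handler could call `extract_alarms(event)` first and skip the restart unless at least one entry has the state `ALARM`.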
Step 5: Create EC2 restart policy and role to be assigned to Lambda function
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Resource": "arn:aws:logs:*:*:*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "ec2:Describe*",
                "ec2:Start*",
                "ec2:Stop*",
                "ec2:Reboot*"
            ],
            "Resource": "*"
        }
    ]
}
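The role itself needs a trust policy that lets the Lambda service assume it, in addition to the permissions policy above. A minimal sketch of creating both with boto3 follows. The role name `LambdaEC2Restart` is the one used in this article; the inline policy name `EC2RestartPolicy` and the helper name are assumptions.

```python
import json

# Trust policy: allows the Lambda service to assume the role.
LAMBDA_TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "lambda.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}

def create_restart_role(iam_client, permissions_policy,
                        role_name="LambdaEC2Restart",
                        policy_name="EC2RestartPolicy"):
    """Create the execution role, attach the policy inline, return the role ARN.

    permissions_policy is the policy document above as a Python dict.
    policy_name is a placeholder chosen for this sketch.
    """
    role = iam_client.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(LAMBDA_TRUST_POLICY),
    )
    iam_client.put_role_policy(
        RoleName=role_name,
        PolicyName=policy_name,
        PolicyDocument=json.dumps(permissions_policy),
    )
    return role["Role"]["Arn"]
```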
Step 6: Assign role to Lambda function
Step 7: Assign policy to role to allow Lambda function to read messages from SQS
In a clean solution, a separate role would be created to grant the Lambda function access to SQS. As a quick solution here, I have attached the SQS policies to the same LambdaEC2Restart role created in the previous steps.
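The article does not say exactly which SQS policies were attached. One option, shown here as an assumption, is the AWS managed policy `AWSLambdaSQSQueueExecutionRole`, which grants the receive, delete, and get-attributes permissions that a Lambda SQS trigger needs. The helper name is hypothetical.

```python
# AWS managed policy for Lambda functions that poll an SQS queue.
SQS_EXECUTION_POLICY_ARN = (
    "arn:aws:iam::aws:policy/service-role/AWSLambdaSQSQueueExecutionRole"
)

def allow_sqs_polling(iam_client, role_name="LambdaEC2Restart"):
    """Attach the managed SQS execution policy to the Lambda role."""
    iam_client.attach_role_policy(
        RoleName=role_name, PolicyArn=SQS_EXECUTION_POLICY_ARN
    )
    return SQS_EXECUTION_POLICY_ARN
```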
Step 8: Add trigger to the Lambda function
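In API terms, the trigger is an event source mapping between the queue and the function. A minimal sketch, assuming the queue ARN from step 2 and a function name of your choosing (the helper name and the batch size of 1 are assumptions):

```python
def add_sqs_trigger(lambda_client, function_name, queue_arn, batch_size=1):
    """Create the SQS event source mapping and return its UUID.

    batch_size=1 means each alarm message invokes the function on its own.
    """
    mapping = lambda_client.create_event_source_mapping(
        EventSourceArn=queue_arn,
        FunctionName=function_name,
        BatchSize=batch_size,
        Enabled=True,
    )
    return mapping["UUID"]

# Usage (needs AWS credentials):
#   import boto3
#   add_sqs_trigger(boto3.client("lambda", region_name="xx-xxxxx-x"),
#                   "<function-name>", "<queue-arn>")
```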
Step 9: Create alarm to trigger alert when DB connection threshold is reached
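The same alarm can be defined from code. The sketch below builds the request for CloudWatch's `put_metric_alarm` on the `DatabaseConnections` metric in the `AWS/RDS` namespace. The alarm name, the 60-second period, the single evaluation period, and the `Average` statistic are assumptions; the threshold and DB instance identifier are yours to supply.

```python
def build_alarm_request(db_instance_id, topic_arn, threshold,
                        alarm_name="rds-db-connections-high"):
    """Build the keyword arguments for cloudwatch.put_metric_alarm."""
    return {
        "AlarmName": alarm_name,
        "Namespace": "AWS/RDS",
        "MetricName": "DatabaseConnections",
        "Dimensions": [{"Name": "DBInstanceIdentifier", "Value": db_instance_id}],
        "Statistic": "Average",
        "Period": 60,            # seconds per datapoint (assumed)
        "EvaluationPeriods": 1,  # alarm after one breaching datapoint (assumed)
        "Threshold": float(threshold),
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "AlarmActions": [topic_arn],  # publish to the SNS topic from step 1
    }

def create_connections_alarm(cloudwatch_client, db_instance_id, topic_arn, threshold):
    """Create or update the alarm and return its name."""
    request = build_alarm_request(db_instance_id, topic_arn, threshold)
    cloudwatch_client.put_metric_alarm(**request)
    return request["AlarmName"]
```

Keeping the request in a plain function makes it easy to review the alarm settings before anything is sent to AWS.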
H. Testing the Solution:
1. Connect to DB
Log in to the DB. The following command can be used from the EC2 instance to connect to the DB; it prompts for the password.
mysql -h <mysqldbhost> -P <dbport> -u <username> -p
2. Check the DB connections on the RDS DB dashboard.
3. Check CloudWatch alarm state
4. Verify SQS queue
In the alert queue we can verify that the alarm was published to the topic and that the resulting message arrived on the queue.
5. Check EC2 instance status
6. CloudWatch alarm status after the EC2 instance is restarted
7. RDS DB connection status
8. DB alert SQS queue status
9. CloudTrail logs
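Several of the checks above (alarm state, queue depth, instance state) can also be read programmatically instead of from the console. A minimal sketch, assuming you pass in the alarm name, queue URL, and instance ID used in your own setup; the helper name `check_pipeline` is hypothetical.

```python
def check_pipeline(cloudwatch_client, sqs_client, ec2_client,
                   alarm_name, queue_url, instance_id):
    """Collect the alarm state, queued message count, and EC2 instance state."""
    alarms = cloudwatch_client.describe_alarms(AlarmNames=[alarm_name])["MetricAlarms"]
    alarm_state = alarms[0]["StateValue"] if alarms else None

    attrs = sqs_client.get_queue_attributes(
        QueueUrl=queue_url, AttributeNames=["ApproximateNumberOfMessages"]
    )["Attributes"]
    queued = int(attrs["ApproximateNumberOfMessages"])

    reservations = ec2_client.describe_instances(InstanceIds=[instance_id])["Reservations"]
    instance_state = reservations[0]["Instances"][0]["State"]["Name"]

    return {
        "alarm_state": alarm_state,        # "OK", "ALARM" or "INSUFFICIENT_DATA"
        "queued_messages": queued,         # approximate, as reported by SQS
        "instance_state": instance_state,  # e.g. "running", "stopping", "stopped"
    }
```

Running it before and after opening the test connections shows the alarm moving to `ALARM` and the instance passing through `stopping` and `stopped` before it returns to `running`.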