AWS: Automatically reboot the EC2 instance when the number of database connections of the RDS instance reaches a threshold

Prince Arora
Jan 29, 2020

A. Scenario:

1. A website is using an EC2 instance to connect to an RDS instance.

2. CloudWatch is monitoring the metrics of the RDS instance (number of database connections).

3. When the number of DB connections reaches a certain number, the website becomes unresponsive. CloudWatch generates an alarm and sends an email notification.

4. Manually rebooting the EC2 instance brings the website back to normal.

B. Problem Statement:

Is it possible to use CloudWatch to automatically reboot the EC2 instance when the number of database connections of the RDS instance reaches a certain value?

C. High Level Solution:

[Figure: High Level Solution]

D. AWS Services used for this solution:

  1. EC2
  2. IAM
  3. SNS (Always Free Tier)
  4. SQS (Always Free Tier)
  5. Lambda (Always Free Tier)
  6. CloudWatch (Always Free Tier)
  7. RDS
  8. CloudTrail

E. Solution Summary:

CloudWatch alarm configured on the RDS DatabaseConnections metric → RDS DB threshold reached → CloudWatch alarm triggers → Alarm published to SNS topic → Message delivered to SQS queue → Lambda function gets triggered → Lambda executes the python code to restart the EC2 instance

F. Solution Steps:

Step 1: Create an SNS topic.
Step 2: Create an SQS queue.
Step 3: Subscribe the SQS queue to the SNS topic created in step 1.
Step 4: Create the lambda function to trigger the EC2 reboot.
Step 5: Create the EC2 restart policy and role to be assigned to the lambda function.
Step 6: Assign the role to the lambda function.
Step 7: Assign a policy to the role to allow the lambda function to read messages from SQS.
Step 8: Add the SQS trigger to the lambda function.
Step 9: Create an alarm in CloudWatch that triggers when the DB connection threshold is reached and publishes to the topic.

G. Detailed Solution:

Step 1: Create a SNS Topic

Open SNS service and Click create topic
Enter topic name and description. Click create
Topic Created
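If you prefer scripting to the console, the same topic can probably be created with boto3. This is only a minimal sketch, not the method used in this article: the helper name create_alert_topic is hypothetical, the default name topic_db_alert is the topic name used later in this article, and sns_client is assumed to be a boto3.client('sns') created with your own region and credentials.

```python
def create_alert_topic(sns_client, name="topic_db_alert"):
    """Create the SNS topic (or reuse the existing one) and return its ARN."""
    # create_topic is idempotent: calling it again with the same name
    # returns the ARN of the topic that already exists.
    response = sns_client.create_topic(Name=name)
    return response["TopicArn"]


# Usage (assumes boto3 is installed and AWS credentials are configured):
#   import boto3
#   topic_arn = create_alert_topic(boto3.client("sns", region_name="xx-xxxxx-x"))
```

The returned ARN is the value needed again when the CloudWatch alarm is configured in step 9.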

Step 2: Create a SQS Queue

2.1 Open SQS service and click get started now
2.2 Enter queue name and click create
2.3 Queue created
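A scripted equivalent of this step might look like the sketch below. The helper name create_alert_queue is hypothetical, the default name queue_db_alert is the queue name used later in this article, and sqs_client is assumed to be a boto3.client('sqs').

```python
def create_alert_queue(sqs_client, name="queue_db_alert"):
    """Create the SQS queue and return (queue_url, queue_arn)."""
    queue_url = sqs_client.create_queue(QueueName=name)["QueueUrl"]
    # The ARN is needed later when the queue is subscribed to the SNS topic.
    attrs = sqs_client.get_queue_attributes(
        QueueUrl=queue_url, AttributeNames=["QueueArn"]
    )
    return queue_url, attrs["Attributes"]["QueueArn"]


# Usage (assumes boto3 is installed and AWS credentials are configured):
#   import boto3
#   queue_url, queue_arn = create_alert_queue(boto3.client("sqs"))
```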

Step 3: Subscribe SQS queue to SNS topic created in step 1

3.1 Select the queue → Click Queue actions → Click Subscribe Queue to SNS Topic
3.2 Select the topic and click subscribe
3.3 Queue is subscribed to topic
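When the subscription is scripted instead of done in the console, the queue also needs an access policy that lets the topic deliver messages to it. The sketch below shows one way this could be done; the helper names are hypothetical, and sns_client and sqs_client are assumed to be boto3 clients for SNS and SQS.

```python
import json


def build_queue_policy(queue_arn, topic_arn):
    """Return a queue policy (JSON text) that lets the topic send to the queue."""
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "sns.amazonaws.com"},
            "Action": "sqs:SendMessage",
            "Resource": queue_arn,
            # Only accept messages that come from this specific topic
            "Condition": {"ArnEquals": {"aws:SourceArn": topic_arn}},
        }],
    })


def subscribe_queue_to_topic(sns_client, sqs_client, queue_url, queue_arn, topic_arn):
    """Allow the topic to write to the queue, then subscribe the queue."""
    sqs_client.set_queue_attributes(
        QueueUrl=queue_url,
        Attributes={"Policy": build_queue_policy(queue_arn, topic_arn)},
    )
    response = sns_client.subscribe(
        TopicArn=topic_arn, Protocol="sqs", Endpoint=queue_arn
    )
    return response["SubscriptionArn"]
```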

Step 4: Create the lambda function to trigger EC2 reboot

4.1 Open Lambda Service and Click Create Function
4.2 Select author from scratch
4.3 Enter function name, select runtime as python and click create function
4.4 Lambda function created. Scroll down to edit the code.
4.5 Edit the python code for the EC2 instance restart. Take care of the indentation in the python code. Provide the region and instance ID of your EC2 instance in the region and instances variables.

lambda_function.py

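import boto3
import time

# Replace with the region and instance ID of your EC2 instance
region = 'xx-xxxxx-x'
instances = ['i-xxxxxxxxxxxxxxx']
ec2 = boto3.client('ec2', region_name=region)
ec2res = boto3.resource('ec2', region_name=region)

# Note: the polling loop below runs longer than Lambda's default 3-second
# timeout, so raise the function timeout in the Lambda configuration.
def lambda_handler(event, context):
    # Restart only the configured instances, not every instance in the account
    for instance in ec2res.instances.filter(InstanceIds=instances):
        ec2_status = instance.state['Name']

        if ec2_status != 'stopped':
            ec2.stop_instances(InstanceIds=[instance.id])

        # Wait for the instance to stop; reload() refreshes the cached state
        while ec2_status != 'stopped':
            print('ec2_status...' + ec2_status)
            time.sleep(10)
            instance.reload()
            ec2_status = instance.state['Name']

        ec2.start_instances(InstanceIds=[instance.id])
        print(
            "Id: {0}\nPlatform: {1}\nType: {2}\nPublic IPv4: {3}\nAMI: {4}\nState: {5}\nStatus: {6}\n".format(
                instance.id, instance.platform, instance.instance_type, instance.public_ip_address,
                instance.image.id, instance.state, instance.state['Name']
            )
        )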

4.6 EC2 instance id can be retrieved from EC2 dashboard
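One practical detail about the function above: waiting for an instance to stop takes far longer than Lambda's default timeout of 3 seconds, so the timeout has to be raised (in the console this is under the function's basic settings). The sketch below shows how this might be scripted; the helper name raise_timeout and the 300-second default are assumptions, and lambda_client is assumed to be a boto3.client('lambda').

```python
def raise_timeout(lambda_client, function_name, seconds=300):
    """Raise the function timeout so the stop/start wait can finish."""
    # Lambda allows a timeout of at most 900 seconds (15 minutes).
    if not 1 <= seconds <= 900:
        raise ValueError("timeout must be between 1 and 900 seconds")
    response = lambda_client.update_function_configuration(
        FunctionName=function_name, Timeout=seconds
    )
    return response["Timeout"]


# Usage (function name is whatever you entered in step 4.3):
#   import boto3
#   raise_timeout(boto3.client("lambda"), "<your-function-name>")
```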

Step 5: Create EC2 restart policy and role to be assigned to lambda function

5.1 Go to IAM console and click create policy
5.2 Use the JSON text below and paste under the JSON tab to create the policy

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Resource": "arn:aws:logs:*:*:*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "ec2:Start*",
                "ec2:Stop*",
                "ec2:Reboot*"
            ],
            "Resource": "*"
        }
    ]
}

5.3 Provide policy name and click create policy
5.4 Policy created successfully
5.5 Go to roles and click on create role
5.6 Select lambda service and click next: permissions
5.7 Select the policy created in previous step and click next
5.8 Enter tags and click next
5.9 Enter role name and click create role
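Selecting the lambda service in step 5.6 gives the role a trust policy that lets Lambda assume it. If the role is created from a script instead, that trust policy has to be supplied explicitly. The following is a minimal sketch under assumptions: the helper names are hypothetical, the default role name LambdaEC2Restart is the one used later in this article, policy_arn is the ARN of the policy created in step 5.3, and iam_client is assumed to be a boto3.client('iam').

```python
import json


def lambda_trust_policy():
    """Trust policy (JSON text) that lets the Lambda service assume the role."""
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "lambda.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }],
    })


def create_lambda_role(iam_client, policy_arn, role_name="LambdaEC2Restart"):
    """Create the role, attach the EC2 restart policy, and return the role ARN."""
    role = iam_client.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=lambda_trust_policy(),
    )
    iam_client.attach_role_policy(RoleName=role_name, PolicyArn=policy_arn)
    return role["Role"]["Arn"]
```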

Step 6: Assign role to lambda function

6.1 Open the lambda function created in the previous steps. Go to the Execution role section. In the select existing role drop-down, select the role created in the previous steps for the lambda function. In case the role doesn't appear in the drop-down, click the refresh button beside it.

Step 7: Assign policy to role to allow lambda function to read messages from SQS

In a clean solution, a separate role would be created to grant the lambda function access to SQS. As a quick solution here, I have assigned the SQS policy to the same LambdaEC2Restart role we created in the previous steps.

7.1 Go to roles and search for LambdaEC2Restart role we created previously. Select the role.
7.2 Click on attach policies.
7.3 Assign the highlighted SQS policy to the role.
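The screenshot showing which SQS policy was selected is not reproduced here, so the exact policy is left as-is. As a narrower alternative, a lambda function triggered by SQS needs at least permission to receive messages, delete messages and read queue attributes on that queue. The sketch below grants only those as an inline policy; the helper names and the policy name LambdaReadDbAlertQueue are hypothetical, and iam_client is assumed to be a boto3.client('iam').

```python
import json


def sqs_read_policy(queue_arn):
    """Policy (JSON text) with the minimum SQS actions for a Lambda trigger."""
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": [
                "sqs:ReceiveMessage",
                "sqs:DeleteMessage",
                "sqs:GetQueueAttributes",
            ],
            "Resource": queue_arn,
        }],
    })


def allow_queue_read(iam_client, queue_arn, role_name="LambdaEC2Restart"):
    """Attach the policy inline to the role used by the lambda function."""
    iam_client.put_role_policy(
        RoleName=role_name,
        PolicyName="LambdaReadDbAlertQueue",  # hypothetical policy name
        PolicyDocument=sqs_read_policy(queue_arn),
    )
```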

Step 8: Add trigger to the lambda function

8.1 Open the lambda function created in previous steps. Click on Add trigger.
8.2 Search and select SQS
8.3 Select the queue we created in the previous steps (queue_db_alert)
8.4 The lambda function will now be triggered by messages arriving on the SQS queue
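The function in step 4 ignores the content of the message and restarts the instance on every delivery. If you want to restart only when the alarm has actually entered the ALARM state, the event can be unpacked first. The sketch below assumes the default SNS-to-SQS delivery (raw message delivery disabled), where each SQS record body is the SNS envelope and its Message field holds the CloudWatch alarm as JSON text. The helper names alarm_states and should_restart are hypothetical.

```python
import json


def alarm_states(event):
    """Return (alarm_name, new_state) for each SQS record in a Lambda event."""
    results = []
    for record in event.get("Records", []):
        # The SQS body is the SNS envelope
        envelope = json.loads(record["body"])
        # The SNS "Message" field holds the CloudWatch alarm as JSON text
        alarm = json.loads(envelope["Message"])
        results.append((alarm["AlarmName"], alarm["NewStateValue"]))
    return results


def should_restart(event):
    """True if any record reports an alarm that has entered the ALARM state."""
    return any(state == "ALARM" for _, state in alarm_states(event))
```

Inside lambda_handler, a check such as "if not should_restart(event): return" placed before the restart logic would then skip OK and INSUFFICIENT_DATA notifications, should the alarm ever be configured to send them.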

Step 9: Create Alarm to trigger alert when DB connection threshold is reached

9.1 Go to cloudwatch and click on alarm
9.2 Select metric
9.3 Click RDS
9.4 Click Per-Database Metrics
9.5 Select DatabaseConnections Metric and click select metric
9.6 Select Statistic as Minimum
9.7 Specify the condition and threshold value required. For the demo I have set it to 1
9.8 In configure actions, select the SNS topic topic_db_alert we created in the previous steps by specifying its ARN
9.9 You can get the topic ARN from the Topics dashboard
9.10 Provide the alert name and click next
9.11 Click create alarm
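The console choices above correspond roughly to the parameters of CloudWatch's put_metric_alarm call. The sketch below only builds those parameters. Several values are assumptions not stated in this article: the alarm name, the 60-second period, the single evaluation period, and the "greater than" comparison (inferred from the test further down, where the alarm fires at 2 connections with a threshold of 1). The helper name db_connections_alarm is hypothetical.

```python
def db_connections_alarm(db_instance_id, topic_arn, threshold=1):
    """Build keyword arguments for cloudwatch.put_metric_alarm."""
    return {
        "AlarmName": "db_connections_alert",  # hypothetical alarm name
        "Namespace": "AWS/RDS",
        "MetricName": "DatabaseConnections",
        "Dimensions": [{"Name": "DBInstanceIdentifier", "Value": db_instance_id}],
        "Statistic": "Minimum",
        "Period": 60,                 # seconds per datapoint (assumed)
        "EvaluationPeriods": 1,       # assumed
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",  # assumed condition
        "AlarmActions": [topic_arn],  # publish to the SNS topic on ALARM
    }


# Usage (assumes boto3 is installed and AWS credentials are configured):
#   import boto3
#   boto3.client("cloudwatch").put_metric_alarm(
#       **db_connections_alarm("<db-instance-id>", "<topic-arn>"))
```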

H. Testing the Solution:

1. Connect to DB
Number of DB connections is 0 initially

Log in to the DB. The following command can be used from the EC2 instance to connect to the DB; the -p option prompts for the password.

mysql -h <mysqldbhost> -P <dbport> -u <username> -p

2. Check the DB connections on the RDS DB dashboard.

Number of DB connection count = 1
Create another DB connection from second instance. Number of DB connection count = 2

3. Check cloudwatch alert state

Alarm was in OK state
After the number of DB connections reached 2, the threshold was crossed and the alarm was triggered

4. Verify SQS Queue

We can verify in the alert queue, the alarm published the alert on the topic which was read by the queue and message arrived on the queue.

5. Check EC2 instance status

EC2 was in running state
Lambda function got triggered when the message arrived in the queue. EC2 instance was shut down.
Lambda function is starting the EC2 instance now
EC2 instance is restarted

6. Cloudwatch alarm status after EC2 instance is restarted

The alarm is back to OK state.
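The alarm state can also be checked from a script instead of the console. A minimal sketch, assuming cloudwatch_client is a boto3.client('cloudwatch') and alarm_name is the name you entered in step 9.10; the helper name alarm_state is hypothetical.

```python
def alarm_state(cloudwatch_client, alarm_name):
    """Return the current state of a metric alarm, or None if it does not exist."""
    response = cloudwatch_client.describe_alarms(AlarmNames=[alarm_name])
    alarms = response["MetricAlarms"]
    # StateValue is one of "OK", "ALARM" or "INSUFFICIENT_DATA"
    return alarms[0]["StateValue"] if alarms else None
```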

7. RDS DB connection status

Number of DB connections is set to 0 again.

8. DB alert SQS queue status

No messages on queue. All messages were processed by lambda function
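The same check can be sketched with boto3 by reading the queue's approximate message count. The helper name queue_depth is hypothetical, and sqs_client is assumed to be a boto3.client('sqs'); the count is approximate by design, so treat a value of 0 as a strong hint rather than a guarantee.

```python
def queue_depth(sqs_client, queue_url):
    """Return the approximate number of visible messages on the queue."""
    attrs = sqs_client.get_queue_attributes(
        QueueUrl=queue_url, AttributeNames=["ApproximateNumberOfMessages"]
    )
    # SQS returns attribute values as strings
    return int(attrs["Attributes"]["ApproximateNumberOfMessages"])
```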

9. CloudTrail logs

We can verify that the alarm was triggered because the DB connection threshold was reached.
