Notifications and alarms in AWS are critical as they provide you with real-time insight into your resource utilization. You can set these alarms and notifications to trigger when certain conditions or thresholds are met.

AWS CloudWatch is a service that monitors resources and applications, collects data, tracks metrics, sets up alarms, and sends notifications. By configuring and managing CloudWatch well, an AWS SysOps Administrator can troubleshoot systems and take corrective action promptly.

For instance, CloudWatch can monitor CPU usage, disk reads and writes, and network traffic. If any of these metrics exceeds its defined threshold, CloudWatch triggers an alarm, allowing you to act before the issue escalates.
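The threshold idea can be illustrated with a small sketch. This is a simplified model written for this article, not CloudWatch's actual evaluation logic (which also handles missing data and "M out of N" datapoint settings). The function name `alarm_state` and the sample CPU values are hypothetical.

```python
from typing import Sequence


def alarm_state(datapoints: Sequence[float], threshold: float,
                evaluation_periods: int) -> str:
    """Return "ALARM", "OK", or "INSUFFICIENT_DATA" for the latest periods.

    Simplified assumption: the alarm fires only when every one of the most
    recent `evaluation_periods` datapoints is strictly above the threshold.
    """
    if len(datapoints) < evaluation_periods:
        return "INSUFFICIENT_DATA"
    recent = datapoints[-evaluation_periods:]
    if all(value > threshold for value in recent):
        return "ALARM"
    return "OK"


# Hypothetical 5-minute CPU averages, in percent.
cpu_averages = [55.0, 62.5, 71.0, 84.2]
state = alarm_state(cpu_averages, threshold=70.0, evaluation_periods=2)
print(state)
```

Requiring several consecutive breaching periods, rather than one, keeps a short spike from triggering the alarm.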

Taking Corrective Action with AWS

After CloudWatch detects an issue and sends an alarm, the next logical step is troubleshooting and resolving the problem. In AWS, several services enable SysOps Administrators to do this, including AWS Systems Manager OpsCenter and AWS Auto Scaling.

  • AWS Systems Manager OpsCenter: This provides a centralized dashboard where you can view, investigate, and resolve operational issues. It aggregates issues across your AWS accounts and regions, helping you resolve problems faster and reduce downtime.
  • AWS Auto Scaling: When an alarm results from high demand straining resources, AWS Auto Scaling can automatically scale resources to meet the demand. This keeps operation smooth and prevents resources from being overloaded.

A Practical Example

Let’s look at an example where we monitor an Amazon EC2 instance and take corrective action when CPU usage exceeds 70%. To do this, we would:

  • Step 1 – Set an Alarm in CloudWatch: Within AWS CloudWatch, go to Alarms –> Create Alarm. Select the EC2 instance and set CPU Utilization > 70%. Specify the email address for notifications, then name the alarm. The equivalent AWS CLI command is:

aws cloudwatch put-metric-alarm --alarm-name cpulimit --alarm-description "Alarm when CPU exceeds 70 percent" --metric-name CPUUtilization --namespace AWS/EC2 --statistic Average --period 300 --threshold 70 --comparison-operator GreaterThanThreshold --dimensions Name=InstanceId,Value=instance_id --evaluation-periods 2 --alarm-actions arn:aws:sns:us-east-1:123456789012:cpu_limit --unit Percent
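For readers who script alarms in Python, the same settings can be sketched as a dictionary of keyword arguments. This is a minimal sketch: the helper name `cpu_alarm_params` and the instance ID are hypothetical, and the block only builds the parameters without calling AWS. The keys are assumed to correspond to the boto3 CloudWatch `put_metric_alarm` parameters that match the CLI flags above.

```python
import json


def cpu_alarm_params(instance_id: str, topic_arn: str,
                     threshold: float = 70.0) -> dict:
    """Build keyword arguments that mirror the put-metric-alarm CLI flags."""
    return {
        "AlarmName": "cpulimit",
        "AlarmDescription": f"Alarm when CPU exceeds {threshold:g} percent",
        "MetricName": "CPUUtilization",
        "Namespace": "AWS/EC2",
        "Statistic": "Average",
        "Period": 300,                  # seconds per evaluation period
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "EvaluationPeriods": 2,
        "AlarmActions": [topic_arn],    # SNS topic that receives the alarm
        "Unit": "Percent",
    }


# Hypothetical instance ID; the topic ARN is the one used in the CLI example.
params = cpu_alarm_params(
    "i-0123456789abcdef0",
    "arn:aws:sns:us-east-1:123456789012:cpu_limit",
)
print(json.dumps(params, indent=2))
# With boto3 installed and credentials configured, these could be passed as
# boto3.client("cloudwatch").put_metric_alarm(**params).
```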

  • Step 2 – Set Up Corrective Action: After setting the alarm, define a corrective action. In the EC2 dashboard, select the instance and go to Actions –> Monitoring and Troubleshooting –> Manage CloudWatch Alarms –> Add/Edit Alarms. Here you can add an action such as “Stop this instance” when CPU > 70%.

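aws cloudwatch put-metric-alarm --alarm-name cpulimit-stop --alarm-description "Stop the instance when CPU exceeds 70 percent" --metric-name CPUUtilization --namespace AWS/EC2 --statistic Average --period 300 --threshold 70 --comparison-operator GreaterThanThreshold --dimensions Name=InstanceId,Value=instance_id --evaluation-periods 2 --alarm-actions arn:aws:automate:us-east-1:ec2:stop --unit Percent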
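The “Stop this instance” action from Step 2 is expressed in an alarm as a special action ARN. The sketch below builds such an ARN, assuming the `arn:aws:automate:<region>:ec2:<action>` format that CloudWatch uses for EC2 alarm actions. The helper name `ec2_action_arn` is hypothetical.

```python
# EC2 actions that a CloudWatch alarm can perform on an instance.
EC2_ALARM_ACTIONS = {"stop", "terminate", "reboot", "recover"}


def ec2_action_arn(region: str, action: str) -> str:
    """Return the alarm-action ARN for an EC2 instance action."""
    if action not in EC2_ALARM_ACTIONS:
        raise ValueError(f"unsupported EC2 alarm action: {action!r}")
    return f"arn:aws:automate:{region}:ec2:{action}"


print(ec2_action_arn("us-east-1", "stop"))
# → arn:aws:automate:us-east-1:ec2:stop
```

An ARN like this goes in the alarm's list of alarm actions, alongside or instead of an SNS topic ARN.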

As the example shows, alarms and notifications in AWS provide an indispensable system of checks and balances. They help ensure that resources are used optimally, operational issues are detected early, and systems adjust automatically to fluctuating demand. An AWS Certified SysOps Administrator should understand thoroughly how to set, manage, and use alarms and notifications to troubleshoot and take corrective action in an AWS environment.

Practice Test

True/False: CloudWatch can collect monitoring and operational data in the form of logs, metrics, and events, providing a unified view of AWS resources, applications, and services that run on AWS.

  • True
  • False

Answer: True

Explanation: This is one of the main functions of AWS CloudWatch. It provides actionable insights that help you monitor applications and understand and respond to system-wide performance changes.

Multiple Select: Which of the following are correct ways to troubleshoot a CloudFormation stack failure?

  • a) Use AWS CloudFormation Drift Detection.
  • b) Check for any template errors.
  • c) Use AWS CloudFormation StackSets.
  • d) Review the event messages in the AWS CloudFormation console.

Answer: b, d

Explanation: Checking for template errors and reviewing event messages in the AWS CloudFormation console are correct methods for troubleshooting failures. Drift detection and StackSets perform different functions.

Multiple Select: Which of the following can AWS CloudWatch monitor?

  • a) Applications
  • b) Infrastructure
  • c) Customer feedback
  • d) Environment conditions

Answer: a, b

Explanation: AWS CloudWatch is designed to monitor applications and infrastructure running on AWS and on on-premises servers, not environmental conditions or customer feedback.

True/False: As part of the troubleshooting process, it’s not necessary to review billing for any unexpected charges.

  • True
  • False

Answer: False

Explanation: Reviewing billing information can reveal hidden issues like resources that are unexpectedly cost-intensive or still running when they should not be. This makes it an important part of troubleshooting.

Single Select: How does Amazon SNS deliver notifications?

  • a) Email
  • b) HTTP
  • c) AWS Lambda
  • d) All of the above

Answer: d

Explanation: Amazon SNS is flexible and can deliver notifications via email, HTTP endpoint, and even trigger AWS Lambda functions.

True/False: CloudWatch only supports the monitoring of AWS resources.

  • True
  • False

Answer: False

Explanation: AWS CloudWatch can also monitor on-premises resources in addition to AWS resources.

Single Select: If a custom CloudWatch metric is not available, what could be the reason?

  • a) The metric name is misspelled
  • b) The resources are not tagged properly
  • c) The custom namespace does not exist
  • d) All of the above

Answer: d

Explanation: All these issues – misspelling the metric name, improper tagging of resources, or non-existence of the custom namespace – can lead to a custom CloudWatch metric not being available.

True/False: AWS Trusted Advisor is a tool that can be used to troubleshoot notifications and alarms.

  • True
  • False

Answer: True

Explanation: AWS Trusted Advisor provides insights about AWS resources, helping to optimize performance, security, and reduce costs – which also aids in troubleshooting.

Single Select: What is the first step in troubleshooting EC2 instance connectivity issues?

  • a) Verify your VPC settings
  • b) Check instance CPU utilization
  • c) Inspect the instance status checks
  • d) Test your internet connection

Answer: c

Explanation: The first step in troubleshooting EC2 instance connectivity issues is checking the instance status checks.

True/False: An EC2 alarm action can be set to automatically recover, stop, or terminate an instance when a system status check fails.

  • True
  • False

Answer: True

Explanation: These are some of the actions that can be automated in response to a system status check failure in EC2.

Interview Questions

In the Amazon RDS environment, what does an alarm signify?

In the Amazon RDS environment, an alarm represents a particular Amazon CloudWatch metric that is outside of the defined acceptable range.

How can you receive notifications about AWS CloudWatch alarms?

You can receive notifications about CloudWatch alarms through Amazon SNS (Simple Notification Service), which delivers them to subscribed endpoints such as email addresses.

What service should you use to automatically stop EC2 instances when CPU utilization is below a certain threshold for a specified period of time?

AWS CloudWatch alarms can be used to automatically stop or terminate Amazon EC2 instances when CPU utilization is below a certain threshold for a specified period of time.

What methods can be used to view Amazon RDS events?

Amazon RDS events can be viewed using the AWS Management Console, AWS CLI, or Amazon RDS API.

In AWS, what is the function of the CloudTrail service?

AWS CloudTrail enables governance, compliance, operational auditing, and risk auditing of your AWS account. It allows you to log, continuously monitor, and retain account activity related to actions across your AWS infrastructure.

What are the notification types in Amazon RDS?

The notification types (event categories) in Amazon RDS include Backup, Configuration Change, Creation, Deletion, Failover, Failure, Low Storage, Maintenance, Recovery, and Restoration.

If a CloudWatch Alarm goes off, meaning a certain threshold has been crossed, what are some of the automatic responses that can be set up?

Automatic responses to a CloudWatch alarm include sending a notification through Amazon SNS, performing automated actions on EC2 instances, triggering scaling policies through Auto Scaling, and invoking Lambda functions.

What is the role of the AWS Personal Health Dashboard?

The AWS Personal Health Dashboard provides alerts and remediation guidance when AWS is experiencing events that may impact your account.

How can you take corrective actions based on AWS CloudWatch alarms?

AWS CloudWatch alarm actions depend on the state of the alarm. An action can be an Amazon EC2 Auto Scaling action, an EC2 action that stops, terminates, or reboots an instance, or a notification sent to an Amazon SNS topic.
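The dependence on alarm state can be sketched as a lookup. This is an illustrative model only: the helper `actions_for_state` and the sample alarm are hypothetical, and the key names are assumed to follow the `OKActions`, `AlarmActions`, and `InsufficientDataActions` fields of a CloudWatch alarm definition.

```python
# Maps each alarm state to the field that lists the actions for that state.
ACTION_KEY_BY_STATE = {
    "OK": "OKActions",
    "ALARM": "AlarmActions",
    "INSUFFICIENT_DATA": "InsufficientDataActions",
}


def actions_for_state(alarm: dict, state: str) -> list:
    """Return the action ARNs configured for the given alarm state."""
    if state not in ACTION_KEY_BY_STATE:
        raise ValueError(f"unknown alarm state: {state!r}")
    return list(alarm.get(ACTION_KEY_BY_STATE[state], []))


# Hypothetical alarm: notify and stop the instance on ALARM, notify on OK.
alarm = {
    "AlarmName": "cpulimit",
    "AlarmActions": [
        "arn:aws:sns:us-east-1:123456789012:cpu_limit",
        "arn:aws:automate:us-east-1:ec2:stop",
    ],
    "OKActions": ["arn:aws:sns:us-east-1:123456789012:cpu_limit"],
}

print(actions_for_state(alarm, "ALARM"))
print(actions_for_state(alarm, "INSUFFICIENT_DATA"))  # no actions configured
```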

What is the purpose of event subscriptions in Amazon RDS?

Event subscriptions in Amazon RDS allow a user to get notifications via Amazon SNS for any event related to a DB instance, DB snapshot, DB parameter group, or security group.

What is the primary benefit of using the AWS CloudTrail service?

The primary benefit of using AWS CloudTrail is the ability to continuously monitor, log, and retain account activity related to actions across your AWS infrastructure for operational and security auditing.

Can a user in AWS receive the status of an alarm by email?

Yes, by using the Amazon Simple Notification Service (SNS), a user can receive notifications when an alarm status changes.

What types of actions can be taken by AWS CloudWatch Alarms?

AWS CloudWatch alarm actions can initiate Auto Scaling policies, stop, terminate, or reboot an EC2 instance, or send a message to an SNS topic when the defined conditions are met.

Is it possible for a user to view all the RDS DB events for the past week?

Yes. The AWS CLI and the RDS API can retrieve Amazon RDS events from up to the past 14 days, which covers the past week. The AWS Management Console shows only events from the past 24 hours.

In AWS CloudTrail, is it possible for log file validation to fail?

Yes, log file validation can fail in AWS CloudTrail if the log file was changed, deleted, or corrupted since CloudTrail delivered it. Validation is run on demand, for example with the aws cloudtrail validate-logs command, which reports any log files that no longer match their recorded hashes.
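The core idea behind this kind of integrity check is comparing a file's current hash with a hash recorded at delivery time. The sketch below illustrates only that comparison with SHA-256. It is not CloudTrail's full validation process, which also verifies signed digest files, and the function name `log_file_is_intact` and the sample content are hypothetical.

```python
import hashlib


def log_file_is_intact(content: bytes, recorded_sha256_hex: str) -> bool:
    """Return True when the content still matches its recorded SHA-256 hash."""
    return hashlib.sha256(content).hexdigest() == recorded_sha256_hex


# Hypothetical log content and the hash recorded when it was delivered.
original = b'{"Records": []}'
recorded_hash = hashlib.sha256(original).hexdigest()

# Any later change to the content, even one byte, changes the hash.
tampered = original + b" "

print(log_file_is_intact(original, recorded_hash))
print(log_file_is_intact(tampered, recorded_hash))
```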
