Notes on: Linux Academy: AWS CSAA: 16) Monitoring

Just a place to put some notes on the “AWS Certified Solutions Architect - Associate (New!)” course from https://linuxacademy.com

Monitoring

Image: Architecture: Monitoring Services

CloudWatch

Image: Architecture: CloudWatch

CloudWatch Essentials:

- CloudWatch is used to monitor AWS services, such as EC2, ELB and S3
- You monitor your environment by configuring and viewing CloudWatch metrics
- Metrics are specific to each AWS service or resource, and include such metrics as:
-- EC2 per-instance metrics:
--- CPUUtilization
--- CPUCreditUsage
-- S3 Metrics:
--- NumberOfObjects
--- BucketSizeBytes
-- ELB Metrics:
--- RequestCount
--- UnhealthyHostCount

- Detailed vs. Basic level monitoring:
-- Basic: Data is available automatically in 5-minute periods at no charge
-- Detailed: Data is available in 1-minute periods, at an additional charge

- CloudWatch Alarms can be created to trigger alerts (or other actions in your AWS account, such as publishing to an SNS topic), based on thresholds you set on CloudWatch metrics
- Auto Scaling heavily utilizes CloudWatch - relying on thresholds and alarms to trigger the addition (or removal) of instances from an auto scaling group

CloudWatch Alarms:

- CloudWatch Alarms allow you (or the system admin) to be notified when defined thresholds are crossed on CloudWatch metrics
- For example, you can set up an alarm to be triggered whenever the CPUUtilization metric on an EC2 instance goes above 70%
- Alarms can also be used to trigger other events in AWS, like publishing to an SNS topic or triggering auto scaling
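
As a rough illustration of the 70% CPU example above, here is a minimal sketch that builds the parameters for a CloudWatch alarm. The parameter names follow boto3's `put_metric_alarm`; the instance ID, SNS topic ARN, alarm name and the choice of two evaluation periods are made-up values for illustration, not something from the course.

```python
def build_cpu_alarm(instance_id, topic_arn, threshold=70.0):
    """Return keyword arguments shaped for boto3's CloudWatch put_metric_alarm."""
    return {
        "AlarmName": f"high-cpu-{instance_id}",   # hypothetical naming scheme
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,                # seconds; matches basic 5-minute monitoring
        "EvaluationPeriods": 2,       # assumption: two periods in a row above threshold
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [topic_arn],  # e.g. an SNS topic to notify
    }


# Placeholder identifiers, not real resources.
params = build_cpu_alarm(
    "i-0123456789abcdef0",
    "arn:aws:sns:us-east-1:123456789012:ops-alerts",
)

# With boto3 installed and credentials configured, the call would look like:
#   import boto3
#   boto3.client("cloudwatch").put_metric_alarm(**params)
```

Building the parameters separately from the API call keeps the sketch runnable without an AWS account.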

CloudWatch EC2 Monitoring:

System Status Checks: (things that are outside of our control)
- Loss of network connectivity
- Loss of system power
- Software issues on the physical host
- Hardware issues on the physical host
- How to solve: Generally stopping and starting the instance (not just rebooting it) will fix the issue. This causes the instance to launch on a different physical host.

Instance Status Checks: (software issues that we do control)
- Failed system status checks
- Misconfigured networking or startup configuration
- Exhausted memory
- Corrupted file system
- Incompatible kernel
- How to solve: Generally rebooting the instance, or fixing the underlying file system or configuration issue.
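
The two kinds of status check and their usual fixes can be sketched as a small triage function. The dictionary layout mirrors one entry of `InstanceStatuses` from boto3's `describe_instance_status`; the function name and the sample entries are hypothetical, and the suggested actions are just the rules of thumb from these notes.

```python
def suggest_fix(status):
    """Map one instance-status entry to the rule-of-thumb fix from the notes."""
    system = status["SystemStatus"]["Status"]
    instance = status["InstanceStatus"]["Status"]
    if system == "impaired":
        # Problem on the AWS side: move to a different physical host.
        return "stop and start the instance"
    if instance == "impaired":
        # Problem inside the instance: OS, networking or file system.
        return "reboot the instance or fix its configuration"
    # Any other status (ok, initializing, ...) is treated as "nothing to do" here.
    return "no action needed"


# Hand-written sample entries, not real API output.
host_problem = {
    "InstanceId": "i-0123456789abcdef0",
    "SystemStatus": {"Status": "impaired"},
    "InstanceStatus": {"Status": "impaired"},
}
os_problem = {
    "InstanceId": "i-0123456789abcdef1",
    "SystemStatus": {"Status": "ok"},
    "InstanceStatus": {"Status": "impaired"},
}

print(suggest_fix(host_problem))
print(suggest_fix(os_problem))
```

The system check is tested first because a failed system status check can itself cause the instance status check to fail.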

By default, CloudWatch will automatically monitor metrics that can be viewed at the host level (NOT the software level), such as:
- CPUUtilization
- Network in/out
- CPUCreditBalance
- CPUCreditUsage

OS-level metrics that require a custom script (Perl, provided by AWS) to be installed on the instance:
- Memory utilization, memory used, and memory available
- Swap utilization
- Disk space utilization, disk space used, disk space available
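
To show what such a script has to do, here is a minimal Python sketch (not the AWS Perl script itself) that computes memory utilization from `/proc/meminfo`-style text and shapes it as a custom metric. The payload keys follow boto3's `put_metric_data`; the namespace, instance ID, sample numbers and the "total minus available" formula are assumptions for illustration.

```python
# Made-up sample in the format of Linux /proc/meminfo (values in kB).
SAMPLE_MEMINFO = """\
MemTotal:        8000000 kB
MemFree:         1000000 kB
MemAvailable:    2000000 kB
SwapTotal:       2000000 kB
SwapFree:        1500000 kB
"""


def parse_meminfo(text):
    """Parse 'Key:   value kB' lines into a dict of integers."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        if rest.strip():
            values[key.strip()] = int(rest.split()[0])
    return values


def memory_utilization_percent(info):
    """Assumed definition: share of total memory that is not available."""
    used = info["MemTotal"] - info["MemAvailable"]
    return 100.0 * used / info["MemTotal"]


def build_metric_payload(instance_id, info):
    """Return keyword arguments shaped for boto3's CloudWatch put_metric_data."""
    return {
        "Namespace": "Custom/System",  # hypothetical custom namespace
        "MetricData": [{
            "MetricName": "MemoryUtilization",
            "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
            "Value": memory_utilization_percent(info),
            "Unit": "Percent",
        }],
    }


info = parse_meminfo(SAMPLE_MEMINFO)
payload = build_metric_payload("i-0123456789abcdef0", info)
# On a real instance you would read open("/proc/meminfo").read() and send:
#   boto3.client("cloudwatch").put_metric_data(**payload)
```

The point is that the hypervisor cannot see inside the guest OS, so something running on the instance has to measure memory and push it to CloudWatch.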

CloudTrail

Image: Architecture: CloudTrail

CloudTrail Essentials:

- CloudTrail is an API logging service that logs all API calls made to AWS
- It does not matter whether the API calls come from the command line, SDK, or console
- All created logs are placed into a designated S3 bucket - so they are highly available by default
- CloudTrail logs help when addressing security concerns, by allowing you to view what actions users on your AWS account have performed
- Since AWS is just one big API - CloudTrail can log every single action taken in your account
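
To make "view what actions users have performed" concrete, here is a small sketch that counts API calls per user in a CloudTrail-style log. Delivered log files are JSON with a top-level `Records` list; the records below are invented sample data, and `summarize` is a hypothetical helper name.

```python
import json
from collections import Counter

# Invented sample in the shape of a CloudTrail log file (heavily trimmed).
SAMPLE_LOG = json.dumps({"Records": [
    {"eventTime": "2018-01-01T12:00:00Z", "eventSource": "ec2.amazonaws.com",
     "eventName": "StopInstances", "sourceIPAddress": "203.0.113.10",
     "userIdentity": {"type": "IAMUser", "userName": "alice"}},
    {"eventTime": "2018-01-01T12:05:00Z", "eventSource": "ec2.amazonaws.com",
     "eventName": "StopInstances", "sourceIPAddress": "203.0.113.10",
     "userIdentity": {"type": "IAMUser", "userName": "alice"}},
    {"eventTime": "2018-01-01T12:10:00Z", "eventSource": "s3.amazonaws.com",
     "eventName": "CreateBucket", "sourceIPAddress": "198.51.100.7",
     "userIdentity": {"type": "IAMUser", "userName": "bob"}},
]})


def summarize(log_text):
    """Count (user, API call) pairs across all records in one log file."""
    counts = Counter()
    for record in json.loads(log_text)["Records"]:
        # Not every identity type carries a userName, so fall back to "unknown".
        user = record.get("userIdentity", {}).get("userName", "unknown")
        counts[(user, record["eventName"])] += 1
    return counts


for (user, action), n in sorted(summarize(SAMPLE_LOG).items()):
    print(f"{user} called {action} {n} time(s)")
```

Real log files in the S3 bucket are gzip-compressed, so they would need to be decompressed (for example with the `gzip` module) before being passed to a function like this.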

Flow Logs

VPC Flow Logs:

- VPC Flow Logs allow you to collect information about the IP traffic going to and from network interfaces in your VPC
- VPC Flow Log data is stored in a log group in CloudWatch Logs
- Flow logs can be created on a specific VPC, Subnet or Network interface
- Flow logs created on a VPC or Subnet will include all network interfaces in that VPC or subnet
- Each network interface will have its own unique log stream
- You can set the log to capture data on accepted traffic, rejected traffic, or all traffic
- Flow logs are NOT captured in “real-time”. The capture window is approx. 10 minutes, and then data is published
- Each VPC Flow Log record describes the network traffic for a specific 5-tuple
- A 5-tuple is a set of five values that identify a TCP/IP connection. It includes:
-- (1) Source IP address and (2) source port number
-- (3) Destination IP address and (4) destination port number
-- (5) Protocol
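
The 5-tuple can be pulled out of a flow log record with a few lines of Python. This sketch assumes the default (version 2) space-separated record format; the sample line is illustrative, and `FiveTuple` and `parse_flow_record` are hypothetical names.

```python
from typing import NamedTuple

# Field order of the default flow log record format.
FIELDS = [
    "version", "account_id", "interface_id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log_status",
]


class FiveTuple(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: int  # IANA protocol number, e.g. 6 = TCP, 17 = UDP


def parse_flow_record(line):
    """Split one record and return (five_tuple, action).

    Assumes a record with log status OK; NODATA/SKIPDATA records use '-'
    placeholders and would need separate handling.
    """
    record = dict(zip(FIELDS, line.split()))
    five = FiveTuple(
        record["srcaddr"], int(record["srcport"]),
        record["dstaddr"], int(record["dstport"]),
        int(record["protocol"]),
    )
    return five, record["action"]


# Illustrative record: SSH traffic (TCP port 22) that was accepted.
SAMPLE = ("2 123456789010 eni-1235b8ca123456789 172.31.16.139 172.31.16.21 "
          "20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK")

five, action = parse_flow_record(SAMPLE)
print(five, action)
```

Filtering parsed records on an action of REJECT is one simple way to start troubleshooting traffic that never reaches an instance.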

Benefits of VPC Flow Logs:
- Troubleshoot why certain traffic is not reaching an EC2 instance
- Add a security layer by letting you monitor the traffic that reaches your EC2 instances

Limitations of VPC Flow Logs:
- Traffic NOT captured by VPC Flow Logs:
-- Traffic between an EC2 instance and an Amazon DNS Server
-- Traffic generated by requests for instance metadata (requests to 169.254.169.254)
-- DHCP Traffic


Quiz: Monitoring Quiz

T: CloudWatch is a service that allows you to view resource level metrics and create alarms based on metric thresholds.

Q: Why does stopping and starting an instance (usually) fix a System Status Check error?
A: Stopping and starting an instance causes the instance to be provisioned on different AWS hardware.
E: Unless you have dedicated tenancy enabled, stopping and starting an instance will generally cause it to be launched onto different AWS host hardware.

Q: CloudTrail can log API calls from?
E: AWS is basically one big API, so it does not matter whether the API calls come from the command line, SDK, or console; they are all logged by CloudTrail.

Q: Which of the following CloudWatch EC2 metrics will require a custom script to enable?
A: Memory Utilization
E: Custom scripts are needed to enable OS-level monitoring of EC2 instances. Memory Utilization falls into that category, while CPU Credit Usage and CPU Utilization do not (those are host-level metrics).

T: System Status Checks are AWS hardware/software issues that we have no control over.
T: CloudTrail is an API Logging service.
