Notes on: Linux Academy: AWS CSAA: 20) Certified Solution Architect Concepts

Just a place to put some notes on the “AWS Certified Solutions Architect - Associate (New!)” course from https://linuxacademy.com

Implementation & Deployment

How to Design Cloud Services & Best Practices:

- Design for failure, and create self-healing application environments
- Always design applications with instances in at least two availability zones
- Guarantee that you have “reserved” capacity in the event of an emergency by purchasing reserved instances in a designated recovery availability zone (AWS does not guarantee on-demand instance capacity)
- Rigorously test to find single points of failure and apply high availability
- Always enable RDS Multi-AZ and automated backups (for MySQL, only InnoDB tables are supported)
- Utilize Elastic IP addresses for failover to “stand-by” instances when auto scaling and load balancing are not available (see the sketch below)
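
A minimal sketch of Elastic IP failover using boto3 (the allocation and instance IDs are placeholders); re-associating the address moves traffic to the standby instance:

    import boto3

    ec2 = boto3.client('ec2', region_name='us-east-1')

    # Re-point the Elastic IP at the standby instance; AllowReassociation
    # lets the address move even though it is currently attached elsewhere.
    ec2.associate_address(
        AllocationId='eipalloc-0123456789abcdef0',  # placeholder Elastic IP
        InstanceId='i-0fedcba9876543210',           # placeholder standby instance
        AllowReassociation=True,
    )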

- Use Route 53 to implement failover DNS techniques that include:
-- Latency based routing
-- Failover DNS routing

- Have a disaster recovery and backup strategy that utilizes:
-- Multiple Regions
-- Maintain up-to-date AMIs (and copy AMIs from one region to another)
-- Copy EBS snapshots to other regions (use cron jobs that take snapshots of EBS; see the sketch after this list)
-- Automate everything in order to easily re-deploy resources in the event of a disaster
-- Utilize bootstrapping to quickly bring up new instances with minimal configuration, which allows for “generic” AMIs
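
A minimal sketch of the snapshot-and-copy pattern with boto3 (volume ID and regions are placeholders); a cron job could run a script like this on a schedule:

    import boto3

    source_region, dr_region = 'us-east-1', 'us-west-2'

    # Snapshot the volume in the source region
    ec2 = boto3.client('ec2', region_name=source_region)
    snap = ec2.create_snapshot(
        VolumeId='vol-0123456789abcdef0',   # placeholder volume ID
        Description='Nightly DR snapshot',
    )
    ec2.get_waiter('snapshot_completed').wait(SnapshotIds=[snap['SnapshotId']])

    # Copy the completed snapshot to the disaster recovery region
    boto3.client('ec2', region_name=dr_region).copy_snapshot(
        SourceRegion=source_region,
        SourceSnapshotId=snap['SnapshotId'],
        Description='DR copy of nightly snapshot',
    )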

- Decouple application components using services such as SQS (when available)
- “Throw away” old or broken instances
- Utilize CloudWatch to monitor infrastructure changes and health
- Utilize multipart upload for S3 uploads (for objects over 100MB; see the sketch after this list)
- Cache static content on Amazon CloudFront using EC2 or S3 Origins
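
A sketch of a multipart upload using boto3's transfer manager (bucket and file names are placeholders); boto3 switches to multipart automatically once the file exceeds the threshold:

    import boto3
    from boto3.s3.transfer import TransferConfig

    # Use multipart for anything over 100 MB, uploading parts in parallel
    config = TransferConfig(
        multipart_threshold=100 * 1024 * 1024,
        multipart_chunksize=25 * 1024 * 1024,
        max_concurrency=4,
    )

    s3 = boto3.client('s3')
    s3.upload_file('backup.tar.gz', 'my-example-bucket',
                   'backups/backup.tar.gz', Config=config)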

- Protect your data in transit by using HTTPS/SSL endpoints
- Protect data at rest using encrypted file systems or EBS/S3 encryption options
- Connect to instances inside of the VPC using a bastion host or VPN connection
- Use IAM roles on EC2 instances instead of using API keys (never store API keys on an AMI; see the sketch below)
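
When an instance has an IAM role attached, boto3 picks up temporary credentials from the instance metadata automatically; a minimal sketch (the bucket name is a placeholder):

    import boto3

    # No access keys anywhere in code or on the AMI: boto3 falls through its
    # credential chain to the instance profile attached to this EC2 instance.
    s3 = boto3.client('s3')
    for obj in s3.list_objects_v2(Bucket='my-example-bucket').get('Contents', []):
        print(obj['Key'])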

Monitoring your AWS Environment

Use CloudWatch for:
- Shutting down inactive instances
- Monitoring changes in your AWS environment with CloudTrail integration
- Monitor instance resources and create alarms based on usage and availability (see the alarm sketch after this list)
-- EC2 instances have “basic” monitoring, which CloudWatch supports out of the box and which includes all metrics that can be monitored at the hypervisor level
-- Status checks, which can automate recovery from failed status checks by stopping and starting the instance again
-- EC2 metrics that require custom scripts to push data to CloudWatch:
--- Disk Usage: Available Disk Space
--- Swap Usage: Available swap
--- Memory Usage: Available Memory
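
A sketch of an alarm on a standard hypervisor-level metric (the instance ID and SNS topic ARN are placeholders); memory, swap, and disk alarms would first need a custom script pushing those metrics:

    import boto3

    cloudwatch = boto3.client('cloudwatch')

    # Alarm when average CPU stays above 80% for two 5-minute periods
    cloudwatch.put_metric_alarm(
        AlarmName='high-cpu-web-1',
        Namespace='AWS/EC2',
        MetricName='CPUUtilization',
        Dimensions=[{'Name': 'InstanceId', 'Value': 'i-0123456789abcdef0'}],
        Statistic='Average',
        Period=300,
        EvaluationPeriods=2,
        Threshold=80.0,
        ComparisonOperator='GreaterThanThreshold',
        AlarmActions=['arn:aws:sns:us-east-1:123456789012:ops-alerts'],
    )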

Use CloudTrail for:
- Security and compliance
- Monitoring all actions taken against the AWS account
- Monitoring (and being notified) of changes to IAM accounts (with CloudWatch/SNS Integration)
- View which API keys/users performed any given API action against an environment (i.e. view which user terminated a set of instances or an individual instance; see the sketch after this list)
- Fulfilling auditing requirements inside of organizations
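
A sketch of answering “who terminated that instance?” via the CloudTrail API:

    import boto3

    cloudtrail = boto3.client('cloudtrail')

    # Find recent TerminateInstances calls and who made them
    events = cloudtrail.lookup_events(
        LookupAttributes=[{'AttributeKey': 'EventName',
                           'AttributeValue': 'TerminateInstances'}],
    )
    for event in events['Events']:
        print(event.get('Username'), event['EventTime'])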

Use AWS Config for:
- Receiving detailed configuration information about an AWS environment
- Taking a point in time “snapshot” of all supported AWS resources to determine the state of your environment
- Viewing historical configurations within your environment by viewing the “snapshots”
- Receiving notifications whenever resources are created, modified, or deleted
- Viewing relationships between resources (e.g. which EC2 instance an EBS volume is attached to; see the sketch after this list)
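
A sketch of pulling discovered resources and configuration history from AWS Config (the instance ID is a placeholder); note the boto3 client name for AWS Config is 'config':

    import boto3

    config = boto3.client('config')

    # List EC2 instances AWS Config has discovered
    discovered = config.list_discovered_resources(
        resourceType='AWS::EC2::Instance')
    for res in discovered['resourceIdentifiers']:
        print(res['resourceId'])

    # View historical configuration "snapshots" for one instance
    history = config.get_resource_config_history(
        resourceType='AWS::EC2::Instance',
        resourceId='i-0123456789abcdef0')   # placeholder instance ID
    for item in history['configurationItems']:
        print(item['configurationItemCaptureTime'],
              item['configurationItemStatus'])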

Architectural Trade-off Decisions:

Storage Trade-off Options
- S3 Standard Storage
-- 99.999999999% durability and 99.99% availability, but is the most expensive
- S3 RRS
-- Reduced redundancy durability is 99.99%, but the storage cost is lower
-- Should be used for easily reproducible data, and you should take advantage of lost object notification using S3 events
- Glacier
-- Requires an extended timeframe to archive data and to retrieve it from the archive
-- Costs are significantly reduced compared to S3 storage options

Database Trade-Off Options

- Running databases on EC2 instances:
-- Have to manage the underlying operating system
-- Have to build for high availability
-- Have to manage your own backups
-- Can use additional software to cluster MySQL
-- Requires more time to manage than RDS

- Managed RDS database provides:
-- Fully managed database updates, with no need to manage the underlying OS
-- Provides automatic point in time backups
-- Easily enable Multi-AZ failover; when a failover occurs, DNS is switched from the primary instance to the standby instance (see the sketch after this list)
-- If Multi-AZ is enabled, backups are taken against the standby to reduce I/O freezes, and updates are applied to the standby first, which is then promoted to primary
-- Easily create read replicas
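
A sketch of provisioning an RDS instance with Multi-AZ and automated backups enabled (identifiers and credentials are placeholders):

    import boto3

    rds = boto3.client('rds')

    rds.create_db_instance(
        DBInstanceIdentifier='app-db',     # placeholder identifier
        DBInstanceClass='db.m5.large',
        Engine='mysql',
        AllocatedStorage=100,
        MasterUsername='admin',            # placeholder credentials
        MasterUserPassword='CHANGE_ME',
        MultiAZ=True,                      # synchronous standby in another AZ
        BackupRetentionPeriod=7,           # enables automated backups (7 days)
    )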

Elasticity and Scalability:

- Proactive Cycle Scaling: Scaling that occurs at a fixed interval (see the sketch after this list)
- Proactive Event-based scaling: Scaling that occurs in anticipation of an event
- Auto-scaling based on demand: Scaling that occurs based off of increase in demand for the application
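
A sketch of proactive cycle scaling via a scheduled Auto Scaling action (the group name and sizes are placeholders); demand-based scaling would use scaling policies instead:

    import boto3

    autoscaling = boto3.client('autoscaling')

    # Scale out every weekday morning ahead of the daily traffic cycle
    autoscaling.put_scheduled_update_group_action(
        AutoScalingGroupName='web-asg',        # placeholder group name
        ScheduledActionName='weekday-morning-scale-out',
        Recurrence='0 8 * * 1-5',              # cron format, UTC
        MinSize=4,
        MaxSize=12,
        DesiredCapacity=6,
    )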

- Plan to scale out rather than up (horizontal scaling):
-- Add more EC2 instances to handle increases in capacity rather than increasing instance size
-- Be sure to design for the proper instance size to start
-- Use tools like Auto Scaling and ELB
-- A scaled service should be fault tolerant and operationally efficient
-- A scalable service should become more cost-effective as it grows

- DynamoDB is a fully managed NoSQL service from AWS:
-- With high availability and scaling already built in
-- All the developer has to do is specify the required throughput for the tables (see the sketch after this list)
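
A sketch of creating a DynamoDB table where the only capacity decision is the provisioned throughput (table and attribute names are placeholders):

    import boto3

    dynamodb = boto3.client('dynamodb')

    dynamodb.create_table(
        TableName='sessions',                                    # placeholder
        AttributeDefinitions=[
            {'AttributeName': 'session_id', 'AttributeType': 'S'},
        ],
        KeySchema=[
            {'AttributeName': 'session_id', 'KeyType': 'HASH'},
        ],
        # The only scaling knob the developer has to specify:
        ProvisionedThroughput={'ReadCapacityUnits': 50,
                               'WriteCapacityUnits': 10},
    )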

- RDS requires scaling in a few different ways:
-- RDS does not support a cluster of instances to load balance traffic across
-- Because of this there are a few different methods to scale traffic with RDS:
--- Utilize read replicas to offload heavy read-only traffic (see the sketch after this list)
--- Increase the instance size to handle increase in load
--- Utilize ElastiCache clusters for caching database session information
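
A sketch of the read-replica approach (instance identifiers are placeholders); the application then sends read-only queries to the replica's endpoint:

    import boto3

    rds = boto3.client('rds')

    # Offload heavy read traffic from the primary to a replica
    rds.create_db_instance_read_replica(
        DBInstanceIdentifier='app-db-read1',     # placeholder replica name
        SourceDBInstanceIdentifier='app-db',     # placeholder source instance
    )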

Security Architecture with AWS

Shared Security Responsibility Model:

- AWS is responsible for portions of the cloud, and you as the customer have portions of the cloud that you are responsible for - thus creating shared security responsibility

- Reduces the operational burden (on you) as AWS operates, manages, and controls the components from the host operating system and virtualization layer, down to the physical security of the facilities in which the services operate

- As the customer (you), using AWS means you assume the responsibility and management of the guest operating system (including updates and security patches), other associated application software, as well as the configuration of the AWS-provided security group firewall.

- You are also responsible for your own coded applications and custom applications built on top of the cloud

AWS is responsible for (EC2 example)
- Facilities
- Physical security of hardware
- Network infrastructure
- Virtualization infrastructure

You (as the customer) are responsible for (EC2 example)
- Amazon Machine Images (AMIs)
- Operating systems
- Applications
- Data-in-transit
- Data-at-rest
- Data stores
- Credentials
- Policies and configuration

AWS Platform Compliance and Security Services:

The AWS cloud infrastructure has been architected to be flexible and secure, with world-class protection provided by its built-in security features:
- Secure access - Use API endpoints, HTTPS, and SSL/TLS
- Built-in firewalls - Virtual Private Cloud (VPC)
- Unique users - AWS Identity and Access Management (IAM)
- Multi-factor authentication (MFA)
- Private subnets - AWS allows private subnets in your VPC
- Encrypted data storage - Encrypt your data in EBS, S3, Glacier, Redshift, and RDS
- Dedicated connection option - AWS Direct Connect
- Perfect Forward Secrecy - ELB and CloudFront offer SSL/TLS cipher suites for PFS
- Security logs - AWS CloudTrail
- Asset identification and configuration - AWS Config
- Centralized key management - AWS Key Management Service (KMS)
- Isolated GovCloud - Meet US ITAR regulations using AWS GovCloud (US)
- CloudHSM - Hardware Security Module (HSM) hardware-based cryptographic key storage
- Trusted Advisor - With premier support (identify security holes)

Incorporating Common Conventional Security Products:

OS-side Firewalls
- iptables
- firewalld
- Windows Firewall

AntiVirus Software
- TrendMicro (integrates into AWS EC2 instances)

DDoS Mitigation:

When mitigating DoS/DDoS attacks, use the same practices you would use on your on-premises components:
- Firewalls:
-- Security groups
-- Network access control lists
-- Host-based firewalls
- Web application firewalls (WAFs)
- Host-based or inline IDS/IPS (Trend Micro)
- Traffic shaping/rate limiting

Along with your traditional approaches to DoS/DDoS attack mitigation, AWS provides capabilities based on its elasticity:
- You can potentially use CloudFront to absorb DoS/DDoS flooding attacks
- A potential attacker trying to attack content behind a CloudFront distribution is likely to send most requests to CloudFront edge locations, where the AWS infrastructure will absorb the extra requests with minimal to no impact on the back-end customer web servers

Two additional notes:
- You MUST have permission from AWS before performing port scanning against any of your EC2 instances!
- AWS performs ingress filtering on all incoming traffic onto its network

Encryption Solutions:

S3 has built-in features that allow you to encrypt your data:
- 256-bit AES encryption (AES-256) that encrypts data-at-rest in an S3 bucket
- AWS will decrypt the data and send it to you when you download it (see the sketch below)
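
A sketch of requesting server-side encryption on upload (bucket and key names are placeholders); S3 encrypts the object at rest and decrypts it transparently on download:

    import boto3

    s3 = boto3.client('s3')

    # Ask S3 to encrypt the object at rest with AES-256 (S3-managed keys)
    s3.put_object(
        Bucket='my-example-bucket',
        Key='reports/q1.csv',
        Body=open('q1.csv', 'rb'),
        ServerSideEncryption='AES256',
    )

    # On download, AWS decrypts automatically; no extra parameters needed
    s3.get_object(Bucket='my-example-bucket', Key='reports/q1.csv')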

EBS encrypted volumes:
- You can choose to have all data stored on an EBS volume encrypted
- If a snapshot is taken, that snapshot is automatically encrypted (see the sketch below)
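
A sketch of creating an encrypted volume (the AZ and size are placeholders); any snapshot taken from it is encrypted automatically:

    import boto3

    ec2 = boto3.client('ec2')

    volume = ec2.create_volume(
        AvailabilityZone='us-east-1a',   # placeholder AZ
        Size=100,                        # GiB
        VolumeType='gp2',
        Encrypted=True,                  # all data at rest is encrypted
    )

    # Snapshots of an encrypted volume are themselves encrypted
    ec2.create_snapshot(VolumeId=volume['VolumeId'])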

RDS encryption:
- Aurora, MySQL, Oracle, PostgreSQL, and MS SQL all support this feature
- Encrypts the underlying storage space for the instance
- Automated Backups are encrypted (as well as snapshots)
- Read Replicas are encrypted
- RDS provides an SSL endpoint to encrypt the connection to a DB instance

Complex Access Control:

- Through IAM policies, AWS gives us the ability to create extremely complex and granular permission policies for our users (all the way down to the resource level)
- IAM policies with resource level permissions:
-- EC2: Create permissions for actions such as reboot, start, stop, or terminate, scoped all the way down to the instance ID
-- EBS volumes: Attach, Delete, Detach
-- EC2 actions other than those above are not governed by resource-level permissions at this time
- This is not limited to EC2; it also includes services such as RDS, S3, etc. (see the policy sketch after this list)
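
A sketch of a resource-level policy attached to a user (account ID, region, instance ID, and names are placeholders): the user may stop, start, or reboot one specific instance and nothing else:

    import boto3, json

    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["ec2:StartInstances", "ec2:StopInstances",
                       "ec2:RebootInstances"],
            # Granular: scoped down to a single instance ID
            "Resource": "arn:aws:ec2:us-east-1:123456789012:instance/i-0123456789abcdef0"
        }]
    }

    boto3.client('iam').put_user_policy(
        UserName='ops-junior',                  # placeholder user
        PolicyName='manage-one-instance',
        PolicyDocument=json.dumps(policy),
    )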

- Additional security measures, such as MFA authentication are also available when acting on certain resources:
-- For example, you can require MFA before an API request to delete an object within an S3 bucket (see the sketch below)
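
One way to sketch that MFA requirement is a bucket policy (the bucket name is a placeholder): deletes are denied unless the caller authenticated with MFA:

    import boto3, json

    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:DeleteObject",
            "Resource": "arn:aws:s3:::my-example-bucket/*",
            # Deny whenever the request was not MFA-authenticated
            "Condition": {"BoolIfExists":
                          {"aws:MultiFactorAuthPresent": "false"}}
        }]
    }

    boto3.client('s3').put_bucket_policy(
        Bucket='my-example-bucket',
        Policy=json.dumps(policy),
    )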

CloudWatch for the Security Architect:

CloudWatch Security
- Requests are signed with an HMAC-SHA1 signature, calculated from the request and the user’s private key
- CloudWatch control API is only accessible via SSL encrypted endpoints
- CloudWatch access is granted via IAM permission policies, following the principle of least privilege (only give users access to CloudWatch if they need it)
- Use CloudWatch and CloudTrail to monitor changes inside the AWS environment
-- We can ask CloudWatch to notify us (via SNS) when there have been changes, for example (see the sketch after this list):
--- Changes to IAM security credentials
--- Assigning access policies to users
--- Adding/deleting users
- It is important to know how we can use CloudWatch for security in our AWS environment
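
A sketch of one way to wire this up, assuming CloudTrail already delivers to a CloudWatch Logs group (the log group name, topic ARN, and metric names are placeholders): a metric filter counts IAM user changes, and an alarm notifies via SNS:

    import boto3

    logs = boto3.client('logs')
    cloudwatch = boto3.client('cloudwatch')

    # Count IAM user create/delete events in the CloudTrail log group
    logs.put_metric_filter(
        logGroupName='CloudTrail/DefaultLogGroup',   # placeholder log group
        filterName='iam-user-changes',
        filterPattern='{ ($.eventName = CreateUser) || ($.eventName = DeleteUser) }',
        metricTransformations=[{
            'metricName': 'IAMUserChanges',
            'metricNamespace': 'CloudTrailMetrics',
            'metricValue': '1',
        }],
    )

    # Notify via SNS as soon as any such change is recorded
    cloudwatch.put_metric_alarm(
        AlarmName='iam-user-changes',
        Namespace='CloudTrailMetrics',
        MetricName='IAMUserChanges',
        Statistic='Sum',
        Period=300,
        EvaluationPeriods=1,
        Threshold=0,
        ComparisonOperator='GreaterThanThreshold',
        AlarmActions=['arn:aws:sns:us-east-1:123456789012:security-alerts'],
    )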

CloudHSM:

- HSM (Hardware Security Module) is a dedicated physical machine/appliance isolated in order to store security keys and other types of encryption keys used within an application
- The key is used within the domain of the HSM appliance instead of being exposed outside the appliance

- HSM Appliances have special security mechanisms to make them more secure:
-- The security key is only used within the HSM
-- An HSM client is used to expose the APIs of the HSM
-- So an application can communicate with HSM to do the encryption (or decryption) of the data that we are requesting
-- The appliance is physically isolated from other resources
-- Tamper resistant (built to notify via advanced logging)
-- On AWS, even though they are hosting the appliance, AWS engineers have NO access to the keys (only to manage and update the appliance)
-- If the keys used to access the appliance are lost or reset, you will never be able to access the data stored on the appliance

- Some types of keys that might be stored on HSMs:
-- Keys used to encrypt file systems
-- Keys used to encrypt databases
-- Keys used to provide DRM
-- Used with S3 encryption

- When to use CloudHSM instead of something like Key Management Service?
-- Generally, when compliance requirements or internal security policies require it
-- Not even AWS engineers have access to the keys on the CloudHSM appliance, only access to “manage” the appliance

Disaster Recovery

Disaster Recovery:

Business disaster recovery key words (very important for AWS CSA Exam)

Recovery time objective (RTO): The time it takes after a disruption to restore operations back to their regular service level, as defined by the company's operational level agreement (i.e. if the RTO is 4 hours, you have 4 hours to restore the service back to an acceptable level)

Recovery point objective (RPO): The acceptable amount of data loss measured in time (i.e. if the system goes down at 10PM and the RPO is 2 hours, then on recovery the application's data should be restored to its state as of 8PM)

Not only should you design for disaster recovery for your applications running on AWS, you can also use AWS as a disaster recovery solution for your on-premises applications or data. The AWS services used should be determined based on the business's RTO and RPO operational agreement.

Pilot light: A minimal version of your production environment that is running in AWS. This allows for replication from on-premises servers to AWS, and in the event of a disaster the AWS environment spins up more capacity (elastically/automatically) and DNS is switched from on-premises to AWS. It is important to keep up-to-date AMIs and instance configurations if following the pilot light protocol.

Warm Standby: Has a larger footprint than a pilot light setup, and would most likely be running business-critical applications in “standby”. This type of configuration could also be used as a test area for applications.

Multi-Site Solution: Essentially clones your “production” environment, which can be either in the cloud or on-premises. Has an active-active configuration, meaning full-size instances and capacity are running at all times and can take over at the flip of a switch. Methods like this could also be used to “load balance” using latency based routing, or Route 53 failover in the event of an issue.

Services Examples:
- Elastic Load Balancer and Auto Scaling
- Amazon EC2 VM Import Connector
- AMI’s with up to date configurations
- Replication from on-premises database servers to RDS
- Automate the increasing of resources in the event of a disaster
- Use AWS Import/Export to copy large amounts of data to speed up replication times (also used for off-site archiving)
- Route 53 DNS Failover/Latency Based Routing Solutions (see the sketch after this list)
- Storage Gateway (Gateway-cached volumes/Gateway-stored volumes)
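
A sketch of the DNS failover piece with boto3 (the zone ID, domain, IPs, and health check ID are placeholders): Route 53 serves the primary record while its health check passes, otherwise the secondary:

    import boto3

    route53 = boto3.client('route53')

    def failover_record(identifier, role, ip, health_check=None):
        # Build one half of a PRIMARY/SECONDARY failover record pair
        record = {
            'Name': 'app.example.com',       # placeholder domain
            'Type': 'A',
            'SetIdentifier': identifier,
            'Failover': role,                # 'PRIMARY' or 'SECONDARY'
            'TTL': 60,
            'ResourceRecords': [{'Value': ip}],
        }
        if health_check:
            record['HealthCheckId'] = health_check
        return {'Action': 'UPSERT', 'ResourceRecordSet': record}

    route53.change_resource_record_sets(
        HostedZoneId='Z0123456789ABCDEF',    # placeholder hosted zone
        ChangeBatch={'Changes': [
            failover_record('primary', 'PRIMARY', '203.0.113.10',
                            health_check='abcdef01-2345-6789-abcd-ef0123456789'),
            failover_record('secondary', 'SECONDARY', '198.51.100.20'),
        ]},
    )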

Quiz: Certified Solutions Architect Concepts Quiz

T: When designing for elasticity and scalability, you want to strive for scaling out (adding more instances) instead of scaling up (increasing instance sizes). However, you must make sure you start with the proper instance size.

Q: What best describes Recovery Time Objective (RTO)?
A: The time it takes after a disruption to restore operations back to its regular service level.
E: The Recovery Time Objective (RTO) is the time it takes after a disruption to restore operations back to its regular service level (as defined by a company's operational level agreement).

Q: What service is best for logging all actions taken against the AWS API?
A: CloudTrail
E: CloudTrail is AWS's logging service that can be used to log all actions taken inside your AWS account.

Q: In the shared security responsibility model, what are items that you are responsible for managing? (choose all that apply)
A: Guest operating systems, AMIs
E: AWS is responsible for everything physical. That includes the security of the physical hardware at their data centers and their network infrastructure. You are responsible for selecting and managing the security for AMIs and the OS you install on instances.

T: S3 offers 256-bit encryption for data-at-rest.
E: S3 offers 256-bit encryption for data-at-rest, which is an option you can turn on/off. AWS manages the keys and will decrypt the data when you request to download it.

Q: When designing cloud services, what design elements should you always consider? (select all that apply)
A1: Design for failure
A2: Create self-healing application environments
A3: Decouple applications
E: When designing cloud architecture, you always want to start by designing for failure, and create self-healing environments whenever possible. Decoupling your application is also best practice. However, you should always use a MIN of TWO Availability Zones. Only using one Availability Zone does not allow for high availability.

Q: What AWS service, if used as part of your application's architecture, has an added benefit of helping to mitigate DDoS attacks from hitting your back-end instances?
A: CloudFront
E: When CloudFront is used as part of your application's architecture, traffic from a DDoS attack will most likely be redirected to the cached data at an edge location (instead of being routed to your application's EC2 instances).

Q: Perfect Forward Secrecy is used to offer SSL/TLS cipher suites for which two AWS services?
A1: CloudFront
A2: Elastic Load Balancing

Q: What feature should you utilize for redundancy if auto scaling and load balancing are not available?
A: Elastic IP address set up for failover to "stand-by" instances
E: Setting up an Elastic IP address and having it ready for failover is a great solution when other services that provide high availability and fault tolerance are not available.

Q: What best describes CloudHSM?
A: A dedicated appliance that is used to store security keys
E: CloudHSM (which is not a feature specific to AWS) is a dedicated appliance that is used to store security keys.

Q: What is it called when you have a minimal version of your production environment running (which can be easily increased in size) as a disaster recovery solution?
A: Pilot light
E: A pilot light is the practice of having a minimally active version of your environment set up and running in a separate region. If there is a catastrophic failure in your primary environment, you can quickly spin up the pilot light environment to become your primary environment.
