AWS CSA Notes '22: Part 1 of 8 - Storage (+ Introduction)

Introduction

A few notes I'm compiling to hopefully help me pass the AWS Certified Solutions Architect – Associate (SAA-C02) exam, or CSAA for short (65 questions, 130 minutes). I passed it a few years ago (2017) and now need to take it again, and, unsurprisingly, there have been some new things in the last 5 years.

Note: SAA-C02 actually expires on August 29, 2022!

Deep Dive into AWS Solutions Architect Exam Domains

1 Storage

1.1 File Storage

1.1.1 Amazon Elastic File System (EFS)

EFS can be created in two ways:

  1. Standard storage classes: store data redundantly across multiple Availability Zones.
  2. One Zone storage classes: store data in a single Availability Zone (less expensive).
Standard classes are subdivided into:
  • Amazon EFS Standard-Infrequent Access (EFS Standard-IA)
  • Amazon EFS Standard
One Zone classes are subdivided into:
  • Amazon EFS One Zone-Infrequent Access (EFS One Zone-IA)
  • Amazon EFS One Zone
Which makes 4 EFS storage classes.
EFS enables access to high levels of IOPS and throughput with consistently low latencies.
EFS supports a broad spectrum of use cases: machine learning, application development, testing, database backups, serverless and persistent file storage for containers.
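
To keep the four classes straight, here is a minimal sketch that maps two yes/no questions (multi-AZ or not, frequently accessed or not) onto them. The names `EfsStorageClass` and `pick_efs_class` are my own for illustration; they are not part of any AWS SDK.

```python
from enum import Enum


class EfsStorageClass(Enum):
    """The four EFS storage classes from the notes above."""
    STANDARD = "EFS Standard"
    STANDARD_IA = "EFS Standard-IA"
    ONE_ZONE = "EFS One Zone"
    ONE_ZONE_IA = "EFS One Zone-IA"


def pick_efs_class(multi_az: bool, frequently_accessed: bool) -> EfsStorageClass:
    """Hypothetical helper: choose a class from redundancy and access pattern."""
    if multi_az:
        if frequently_accessed:
            return EfsStorageClass.STANDARD
        return EfsStorageClass.STANDARD_IA
    if frequently_accessed:
        return EfsStorageClass.ONE_ZONE
    return EfsStorageClass.ONE_ZONE_IA


# Print the full 2x2 grid of choices.
for multi_az in (True, False):
    for hot in (True, False):
        print(multi_az, hot, pick_efs_class(multi_az, hot).value)
```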

1.1.2 Amazon FSx for Lustre

Targeted at high-performance computing workloads: electronic design automation, machine learning, video processing, and financial modelling.

Offers: millions of IOPS, throughput of up to hundreds of GB/s, sub-millisecond latencies, and concurrent access to the same directory from thousands of EC2 instances.

Is compatible with popular Linux-based AMIs: CentOS, SUSE Linux, Ubuntu, Red Hat...

1.1.3 Amazon FSx for Windows File Server

  • Accessible over SMB.
  • Use for highly available Microsoft SQL Server deployments, lift-and-shift Windows applications, data analytics, and media workflows.
  • Create a fully managed file system in a single AZ or across multiple AZs.
  • Support for automatic durable backups.

1.1.4 Amazon FSx for NetApp ONTAP

  • Supports multiple GBps of throughput per file system and multi-protocol access to data using iSCSI, SMB and NFS protocols.
  • Supports SnapMirror replication, and on-premises caching solutions: NetApp FlexCache and Global File Cache.
  • Provides fully elastic, low-cost, unlimited storage capacity.
  • Provides access to ONTAP features such as replication (SnapMirror), clones (FlexClone), and snapshots.
  • Has high durability and availability with multi-AZ deployment options.
  • Offers shared storage for up to thousands of clients running in Amazon EKS, Amazon EC2, Amazon WorkSpaces, VMware Cloud on AWS, Amazon AppStream 2.0 instances, and Amazon ECS.
  • Supports petabyte-scale data sets in a single namespace, data deduplication, compression, and compaction to reduce storage consumption costs.
  • Features elastic capacity pool tiering between two storage tiers:
    • primary storage
    • capacity pool storage
  • Supports the following data security and protection features:
    • File access auditing
    • On-demand anti-virus scanning
    • Encryption of data in transit using Kerberos for SMB and NFS
    • Automatic encryption of file system backups and data at rest using the keys managed in AWS KMS
    • Authorization and authentication using Active Directory

1.1.5 Amazon FSx for OpenZFS

  • Provides up to 12.5 GB/s of throughput and up to 1 million IOPS for frequently accessed cached data, and up to 4 GB/s and 160,000 IOPS for data read from persistent storage.
  • Further improve throughput by enabling data compression on your file system.
  • Supports thousands of simultaneous access points from different clients.
  • Supports Zstandard compression technologies to help you reduce storage costs.
  • Supports NFS v3, 4, 4.1 and 4.2.
  • With AWS Transit Gateway or VPC peering, file systems can be accessed from another VPC irrespective of the AWS Region, as well as from your on-premises data center via AWS Direct Connect or VPN.
  • Automatically backs up your file system to Amazon S3 for disaster recovery.
  • Encrypts your file system data at rest using encryption keys in AWS KMS, and automatically encrypts data in transit when accessed from supported EC2 instances.

1.2 Object Storage

1.2.1 Amazon S3

Common use cases: enterprise applications, big data analytics, IoT devices, mobile applications, websites.

Storage classes offered by Amazon S3 include:

  • S3 Standard
    • Mission-critical production data that requires frequent data access.
    • High performance, 99.99% availability, 99.999999999% durability (11 nines)
    • Supports lifecycle policies, events, encryption, cross-region replication
  • S3 Standard-Infrequent Access (S3 Standard-IA)
    • 99.9% availability, same durability, same features as Standard.
  • S3 One Zone-Infrequent Access (S3 One Zone-IA)
    • Same as S3 Standard-IA, but data is stored in a single AZ, so it is cheaper and less resilient (99.5% availability).
  • Amazon S3 Glacier (S3 Glacier)
    • 99.999999999% durability (11 nines), 99.99% availability
    • 3 retrieval options:
      • expedited: 1 to 5 minutes
      • standard: 3 to 5 hours
      • bulk: 5 to 12 hours
  • Deep Archive (S3 Glacier Deep Archive)
  • Amazon S3 Glacier Instant Retrieval
  • S3 on Outposts
  • S3 Intelligent-Tiering
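
The availability and durability percentages above are easier to remember once converted into concrete numbers. This is a rough back-of-the-envelope sketch, not an AWS formula: it assumes a 365-day year and treats durability as an independent annual per-object loss probability. The function names are mine.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years ignored


def max_downtime_hours(availability_pct: float) -> float:
    """Hours per year a service may be unavailable at the given availability."""
    return HOURS_PER_YEAR * (1 - availability_pct / 100)


def expected_objects_lost_per_year(num_objects: int, nines: int) -> float:
    """Expected annual object loss if durability is `nines` nines (e.g. 11)."""
    annual_loss_probability = 10.0 ** -nines
    return num_objects * annual_loss_probability


# 99.99% (S3 Standard) vs 99.9% (S3 Standard-IA) availability.
for pct in (99.99, 99.9):
    print(f"{pct}% availability: about {max_downtime_hours(pct):.2f} h/year of downtime")

# With 11 nines and 10 million objects, the expected loss is about one
# object every 1 / loss years.
loss = expected_objects_lost_per_year(10_000_000, 11)
print(f"expected objects lost per year: {loss:.6f}")
print(f"roughly one object every {1 / loss:,.0f} years")
```

So 99.99% leaves under an hour of downtime per year while 99.9% leaves most of a working day, and 11 nines on 10 million objects works out to losing about one object every 10,000 years.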

S3 uses TLS/SSL to protect data in transit. To protect data at rest, S3 supports client-side encryption (encrypt before uploading) and server-side encryption (encrypt on write, decrypt on download).
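
For the server-side case, here is a minimal sketch of the request parameters you would pass to boto3's `put_object`. It only builds the keyword arguments, so it runs without boto3 or AWS credentials. The helper name `sse_put_kwargs`, the bucket name, and the key alias are hypothetical placeholders.

```python
from typing import Optional


def sse_put_kwargs(bucket: str, key: str, body: bytes,
                   kms_key_id: Optional[str] = None) -> dict:
    """Build put_object arguments that request server-side encryption.

    Without a KMS key ID, ask for S3-managed keys (SSE-S3);
    with one, ask for KMS-managed keys (SSE-KMS).
    """
    kwargs = {"Bucket": bucket, "Key": key, "Body": body}
    if kms_key_id is None:
        kwargs["ServerSideEncryption"] = "AES256"
    else:
        kwargs["ServerSideEncryption"] = "aws:kms"
        kwargs["SSEKMSKeyId"] = kms_key_id
    return kwargs


# Placeholder names, for illustration only.
print(sse_put_kwargs("my-example-bucket", "notes.txt", b"hello"))
print(sse_put_kwargs("my-example-bucket", "notes.txt", b"hello",
                     kms_key_id="alias/my-example-key"))

# With boto3 installed and credentials configured, the upload would be:
#   import boto3
#   boto3.client("s3").put_object(**sse_put_kwargs(...))
```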

Other S3 features: data access controls, versioning, metadata tagging, protection against unauthorized access and accidental deletion, and running big data analytics on stored data.

S3 stores data as objects within buckets. A single object can store up to 5 TB of data.
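
An object that large cannot go up in one request; it has to be split with multipart upload. The sketch below works out the smallest usable part size for a given object. The limits are assumptions from my memory of the S3 documentation (at most 10,000 parts, parts between 5 MiB and 5 GiB, and the 5 TB maximum read as 5 TiB), so double-check them before relying on this. The function name is mine.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3
TIB = 1024 ** 4

# Assumed multipart upload limits (verify against the S3 docs).
MAX_OBJECT_SIZE = 5 * TIB
MAX_PARTS = 10_000
MIN_PART_SIZE = 5 * MIB
MAX_PART_SIZE = 5 * GIB


def smallest_part_size(object_size: int) -> int:
    """Smallest part size (bytes) that fits the object into MAX_PARTS parts."""
    if not 0 < object_size <= MAX_OBJECT_SIZE:
        raise ValueError("object size must be between 1 byte and 5 TiB")
    # Integer ceiling division: bytes per part if all 10,000 parts are used.
    needed = (object_size + MAX_PARTS - 1) // MAX_PARTS
    return max(MIN_PART_SIZE, needed)


for size in (100 * MIB, 1 * TIB, 5 * TIB):
    part = smallest_part_size(size)
    print(f"{size / GIB:>8.1f} GiB object -> parts of at least {part / MIB:.1f} MiB")
```

Even at the maximum object size, the required part size stays well below the 5 GiB per-part ceiling.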

1.2.2 Amazon S3 Multi-Region Access Points

Use cases for Amazon S3 Multi-Region Access Points:

  • Accelerating performance of S3 over the public internet
  • Connecting to S3 from on-premises or a VPC in an AWS Region
  • Connecting to S3 from an AWS Region

Limitations and restrictions of Amazon S3 Multi-Region Access Points:

  • Multi-Region Access Point names must:
    • Be distinct in each AWS account
    • Be between 3 and 50 characters long
    • Begin with a lowercase letter or number
    • Not end or start with a dash
    • Not contain periods, uppercase letters, or underscores
  • You can't reuse or edit the Multi-Region Access Point aliases generated by Amazon S3 or use the Access Points as the distribution origin for Amazon CloudFront.
  • You can't access data through Multi-Region Access Points using interface endpoints or gateway endpoints.
  • Multi-Region Access Points don't support IPv6, S3 Batch Operations, S3 on Outposts buckets, or CopyObject (as either source or destination).
  • A single AWS account can have a maximum of 100 Multi-Region Access Points and a maximum of 20 regions for a single Multi-Region Access Point.
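
The naming rules above fit into a single regular expression. This is a sketch based only on the rules as listed in these notes; the function name is mine, and uniqueness within the account can't be checked locally, so it is left out.

```python
import re

# First and last character: lowercase letter or digit (so no leading or
# trailing dash). Middle: 1 to 48 lowercase letters, digits, or dashes.
# Total length is therefore 3 to 50. Periods, uppercase letters, and
# underscores are simply not in the allowed character set.
_MRAP_NAME = re.compile(r"[a-z0-9][a-z0-9-]{1,48}[a-z0-9]")


def is_valid_mrap_name(name: str) -> bool:
    """Check a candidate Multi-Region Access Point name against the rules above."""
    return _MRAP_NAME.fullmatch(name) is not None


for candidate in ("my-mrap-1", "ab", "-abc", "abc-", "My-Mrap", "a.b.c", "a_b_c"):
    print(f"{candidate!r}: {is_valid_mrap_name(candidate)}")
```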

1.3 Block Storage

1.3.1 Amazon Elastic Block Store (EBS)

Persistent storage for Amazon EC2 instances; the data persists independently of the life of an instance.

  • An Amazon EBS volume can normally be attached to only one EC2 instance at a time, and that instance must be in the same AZ as the volume.
    • Exception: with Multi-Attach, a single Provisioned IOPS SSD volume can be attached to up to 16 Nitro-based instances within the same AZ.
  • Amazon EBS storage is primarily grouped into:
    • SSD-backed storage - appropriate for transactional workloads like boot volumes and databases.
    • HDD-backed storage - appropriate for throughput-intensive workloads like MapReduce and log processing.
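
The attachment rules above can be modelled as a small predicate. This is a toy model for revision purposes, not the EC2 API: the `Instance` and `Volume` classes and `can_attach` are hypothetical, and real EBS has further constraints that are ignored here.

```python
from dataclasses import dataclass, field
from typing import List

MAX_MULTI_ATTACH = 16  # Multi-Attach limit from the notes above


@dataclass
class Instance:
    instance_id: str
    az: str
    nitro: bool = True


@dataclass
class Volume:
    volume_id: str
    az: str
    provisioned_iops_ssd: bool = False
    multi_attach: bool = False
    attached_to: List[str] = field(default_factory=list)  # instance IDs


def can_attach(volume: Volume, instance: Instance) -> bool:
    """Return True if the simplified rules allow this attachment."""
    if volume.az != instance.az:
        return False  # volume and instance must share an AZ
    if not volume.multi_attach:
        return not volume.attached_to  # ordinary volume: one instance only
    # Multi-Attach: Provisioned IOPS SSD only, Nitro instances only, max 16.
    if not (volume.provisioned_iops_ssd and instance.nitro):
        return False
    return len(volume.attached_to) < MAX_MULTI_ATTACH


plain = Volume("vol-plain", "eu-west-1a", attached_to=["i-1"])
shared = Volume("vol-shared", "eu-west-1a", provisioned_iops_ssd=True,
                multi_attach=True, attached_to=["i-1"])
second = Instance("i-2", "eu-west-1a")
print(can_attach(plain, second))   # ordinary volume already in use
print(can_attach(shared, second))  # Multi-Attach volume with room left
```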

1.3.2 Amazon EC2 Instance Store

Ephemeral, on-disk storage of an EC2 instance, physically connected to a host computer. Perfect for temporary storage of data that frequently changes, or data replicated across multiple instances.

  • Unlike Amazon EBS, the data in an EC2 instance store persists only for its associated instance's lifetime (terminate, hibernate, stop, disk fails - it's gone.)
  • EC2 instance store does not have snapshot support.

Summary of Storage Types and Their Use Cases

  • Throughput scale:
    • EBS [Block]: Single gigabyte per second
    • S3 [Object]: Multiple gigabytes per second
    • EFS [File]: Multiple gigabytes per second
  • Per-operation latency:
    • EBS [Block]: Lowest, consistent
    • S3 [Object]: Low (for mixed request types) and integration with CloudFront
    • EFS [File]: Low, consistent
  • Access:
    • EBS [Block]: Single EC2 instance in a single AZ
    • S3 [Object]: 1 to millions of web connections
    • EFS [File]: 1 to thousands of on-premises servers or EC2 instances, from multiple AZs
  • Data durability/availability:
    • EBS [Block]: Data redundantly stored within a single AZ.
    • S3 [Object]: Data redundantly stored across multiple AZs.
    • EFS [File]: Data redundantly stored across multiple AZs.
  • Use cases:
    • EBS [Block]: NoSQL and transactional databases, boot volumes, ETL, and data warehousing.
    • S3 [Object]: Entertainment and media, big data analytics, backups, data lakes, web serving, and content management.
    • EFS [File]: Big data analytics, home directories, developer tools, database backups, enterprise applications, entertainment and media, web serving, and content management, container storage.


Glossary:
  • AZ = Availability Zone
  • KMS = AWS Key Management Service
