Notes on: Linux Academy: AWS CSAA: 10) Storage Services: S3 (Simple Storage Service)

Just a place to put some notes on the “AWS Certified Solutions Architect - Associate (New!)” course from https://linuxacademy.com

Expanding on: AWS Essentials: S3


Things to Know


S3 Essentials:

- As AWS’ main storage service, S3 can serve many purposes when designing highly available, fault-tolerant, and secure application architectures, including:
-- Bulk (basically unlimited) static object storage
-- Various storage classes to optimize cost vs. needed object availability/durability
-- Object versioning
-- Access restrictions via S3 bucket policies/permissions
-- Object management via lifecycle policies
-- Origin for CloudFront CDN
-- File shares and backup/archiving for hybrid networks (via AWS Storage Gateway)

Important S3 Facts:
- Objects stay within an AWS region and are synced across all AZs in that region for extremely high availability and durability
- You should always create an S3 bucket in a region that makes sense for its purpose:
-- Serving content to customers
-- Sharing Data with EC2

S3 Read Consistency Rules:
- ALL regions now support read-after-write consistency for PUTs of new objects into S3.
-- An object can be read immediately after it is “put” into S3
- All regions use eventual consistency for PUTs that overwrite existing objects and for DELETEs of objects

NOTE: Since December 2020, S3 provides strong read-after-write consistency for all PUTs and DELETEs, so the eventual consistency rule above describes the service as it was when the course was recorded.

S3 Buckets:

- Buckets are the main storage containers in S3. Each bucket holds a group of objects and can be divided into sub-namespaces that behave like (and are called) folders
- Tags can be used to organize buckets (e.g. tagging a bucket with the application it belongs to)
- Each bucket must have a unique name across ALL of AWS
- Bucket limitations:
-- By default, only 100 buckets can exist in an AWS account at a time
-- Bucket ownership cannot be transferred once a bucket is created
-- Bucket names must be in lowercase
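
The naming limitation above can be checked before a create call is attempted. The sketch below is only a rough format check: the lowercase rule comes from these notes, while the 3 to 63 character length and the allowed characters (letters, digits, dots, hyphens) are assumptions taken from the general S3 naming guidelines. The function name is hypothetical, and global uniqueness can only be confirmed by AWS itself.

```python
import re

# Lowercase is the rule stated in the notes. Length (3-63) and the allowed
# character set are assumptions based on the general S3 naming guidelines.
_BUCKET_NAME = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


def is_valid_bucket_name(name: str) -> bool:
    """Return True if the name passes the basic format checks above.

    This does NOT check global uniqueness; only AWS can answer that.
    """
    return _BUCKET_NAME.fullmatch(name) is not None


print(is_valid_bucket_name("my-app-logs"))   # lowercase, valid format
print(is_valid_bucket_name("My-App-Logs"))   # uppercase letters are rejected
```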

S3 Objects:

- Objects are static files that contain metadata information:
-- Set of name-value pairs
-- Contain information specified by the user, and AWS information such as storage class
- Each object must be assigned a storage class, which determines the object’s availability, durability, and cost
- By default, all objects are private
- Objects can:
-- Be as small as 0 bytes and as large as 5 TB
-- Have multiple versions (if versioning is enabled)
-- Be made publicly available via a URL
-- Automatically switch to a different storage class or be deleted (via lifecycle policies)
-- Be encrypted
-- Be organized into “sub-namespaces” called folders

Object Encryption:
- SSE (Server Side Encryption):
-- S3 can encrypt the object before saving it on the partitions in the data centers and decrypt it when it is downloaded
-- AES-256
- Or you can use your own encryption keys:
-- Considered client side encryption where you encrypt the data before upload
- SSL terminated endpoints for the API
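
As a minimal sketch of requesting SSE with AES-256 on upload, the helper below only builds the request arguments. It assumes the boto3 S3 client would be used, where these would be passed as `s3.put_object(**args)`. The helper name, bucket, and key are hypothetical.

```python
def sse_put_arguments(bucket: str, key: str, body: bytes) -> dict:
    """Build put_object arguments that ask S3 to encrypt the object at rest.

    "AES256" is the value that selects S3-managed AES-256 server-side
    encryption. No AWS call is made here.
    """
    return {
        "Bucket": bucket,
        "Key": key,
        "Body": body,
        "ServerSideEncryption": "AES256",
    }


# Hypothetical bucket and key, for illustration only.
args = sse_put_arguments("example-bucket", "reports/q1.csv", b"col1,col2\n")
print(args["ServerSideEncryption"])
```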

S3 Folders:

- For simplicity, S3 supports the concept of “folders”
- This is done only as a means of grouping objects
- Amazon S3 does this by using key-name prefixes for objects

Amazon S3 has a flat structure; there is no hierarchy like you would see in a typical file system.
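
To make the prefix idea concrete, here is a small in-memory sketch (no AWS calls) that groups flat object keys into “folders” using a prefix and a delimiter, roughly the way a listing with a prefix and delimiter behaves. The function name and the sample keys are made up for illustration.

```python
def list_folder(keys, prefix="", delimiter="/"):
    """Split flat keys under `prefix` into (subfolders, objects).

    A "folder" is just the part of a key up to the next delimiter;
    nothing hierarchical is actually stored.
    """
    folders, objects = set(), []
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter in rest:
            # Keep only the first path segment after the prefix.
            folders.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            objects.append(key)
    return sorted(folders), sorted(objects)


# Hypothetical keys in a single bucket.
keys = [
    "index.html",
    "photos/2017/cat.jpg",
    "photos/2017/dog.jpg",
    "photos/readme.txt",
]
print(list_folder(keys))
print(list_folder(keys, prefix="photos/"))
```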

S3 Permissions:

- All buckets and objects are private by default - only the resource owner has access
- The resource owner can grant access to the resource (buckets/objects) through S3 “resource based policies” OR access can be granted through a traditional IAM user policy
- Resource based policies (for S3) are:

+ Bucket policies
-- Are policies that are attached only to the S3 bucket (not an IAM user)
-- The permissions in the policy are applied to all objects in the bucket
-- The policy specifies what actions are allowed or denied for a particular user of that bucket - such as:
--- Granting access to an anonymous user
--- Who (a “principal”) can execute certain actions like PUT or DELETE
--- Restricting access based on IP address (generally used for CDN management)

+ S3 access control lists
-- Grant access to users in other AWS accounts or to the public
-- Both buckets and objects have ACLs
-- Object ACLs allow us to share an S3 object with the public via a URL link
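
A bucket policy is a JSON document. The sketch below builds one that lets anonymous users read objects only from a given IP range, combining two of the cases listed above. The function name, statement ID, bucket name, and CIDR range are placeholders, and a real policy should be reviewed before being attached to a bucket.

```python
import json


def read_only_policy_for_ip_range(bucket: str, cidr: str) -> str:
    """Return a bucket policy (JSON text) allowing GetObject from one CIDR range."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowGetFromIpRange",      # hypothetical statement ID
                "Effect": "Allow",
                "Principal": "*",                   # anonymous / any principal
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",  # all objects in the bucket
                "Condition": {"IpAddress": {"aws:SourceIp": cidr}},
            }
        ],
    }
    return json.dumps(policy, indent=2)


# Placeholder bucket name and documentation-only IP range.
print(read_only_policy_for_ip_range("example-bucket", "203.0.113.0/24"))
```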

S3 Storage Classes:

- A storage class represents the “classification” assigned to each Object in S3. Current Storage Class types include:
-- Standard
-- Reduced Redundancy Storage (RRS)
-- Infrequent Access (S3-IA)
-- Glacier

- Each storage class has varying attributes that dictate things like:
-- Storage cost
-- Object availability
-- Object durability
-- Frequency of access (to the object)

+ Standard:
- Designed for general, all-purpose storage
- Is the default storage option
- 99.999999999% object durability (“eleven nines”)
- 99.99% object availability
- Is the most expensive storage class

+ Reduced Redundancy Storage (RRS):
- Designed for non-critical, reproducible objects
- 99.99% object durability
- 99.99% object availability
- Is less expensive than the standard storage class

+ Infrequent Access (S3-IA):
- Designed for objects that you do not frequently access, but must be immediately available when accessed
- 99.999999999% object durability
- 99.90% object availability
- Is less expensive than the standard/RRS storage classes

+ Glacier:
- Designed for long-term archival storage (not to be used for backups)
- May take several hours for objects stored in Glacier to be retrieved
- 99.999999999% object durability
- Is the cheapest S3 storage class (very low cost)
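
The durability figures above are easier to compare as an expected number of objects lost per year. The sketch below assumes durability can be read as an average annual loss rate per object, and it uses `Decimal` so that eleven nines is not blurred by floating-point rounding. The dictionary keys are just labels for this example.

```python
from decimal import Decimal

# Durability percentages as quoted in the notes above.
DURABILITY_PERCENT = {
    "STANDARD": Decimal("99.999999999"),
    "RRS": Decimal("99.99"),
    "STANDARD_IA": Decimal("99.999999999"),
    "GLACIER": Decimal("99.999999999"),
}


def expected_annual_loss(object_count: int, storage_class: str) -> Decimal:
    """Average number of objects expected to be lost in one year."""
    loss_rate = (Decimal(100) - DURABILITY_PERCENT[storage_class]) / Decimal(100)
    return loss_rate * object_count


# 10,000 objects in RRS: about one object lost per year on average.
print(expected_annual_loss(10_000, "RRS"))
# 10,000,000 objects in Standard: about one object lost every 10,000 years.
print(expected_annual_loss(10_000_000, "STANDARD"))
```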

Glacier:

- Amazon Glacier is an archival storage type
- Used for data that is NOT accessed frequently
- “Check out” and “check in” jobs (retrieving and/or changing data) can take several hours
- Integrates with Amazon S3 lifecycle policies for easy archiving
- Very inexpensive and cost effective archival storage solution
- Glacier should NOT be used as a backup solution

NOTE: Glacier now offers three levels of data retrieval (pricing varies):
- Expedited: 1-5 minutes
- Standard: 3-5 hours
- Bulk: 5-12 hours

S3 Versioning:

- S3 versioning is a feature to manage and store all old/new/deleted versions of an object
- By default, versioning is disabled on all buckets/objects
- Once versioning is enabled, you can only “suspend” versioning (it cannot be fully disabled)
- Suspending versioning only prevents new versions from being created. All objects with existing versions will keep their older versions.
- Versioning can only be set on the bucket level and applies to ALL objects in the bucket
- Lifecycle policies can be applied to specific versions of an object
- Versioning and lifecycle policies can both be enabled on a bucket at the same time
- Versioning can be used with lifecycle policies to create a great archiving and backup solution in S3
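
The “enable or suspend, never disable” rule shows up directly in the configuration. As a sketch, assuming the boto3 S3 client, the dictionary below would be passed as `VersioningConfiguration` to `put_bucket_versioning`. The helper name is hypothetical.

```python
def versioning_configuration(status: str) -> dict:
    """Build a bucket-level versioning configuration.

    Only "Enabled" and "Suspended" are accepted, mirroring the rule that
    versioning cannot be fully disabled once it has been turned on.
    """
    allowed = {"Enabled", "Suspended"}
    if status not in allowed:
        raise ValueError(f"status must be one of {sorted(allowed)}, got {status!r}")
    return {"Status": status}


print(versioning_configuration("Enabled"))
print(versioning_configuration("Suspended"))
```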

Lifecycle Policies:

An object lifecycle policy is a set of rules that automate moving an object to a different storage class (or deleting it) based on specified time intervals:
- By default, lifecycle policies are disabled on a bucket/object
- Are customizable to meet your company’s data retention policies
- Great for automating the management of object storage and to be more cost efficient
- Can be used with versioning to create a great archiving and backup solution in S3

Example:
(1) I have a work file that I am going to access every day for the next 30 days
(2) After 30 days, I may only need to access that file once a week for the next 60 days
(3) After which (90 days total) I will probably never access that file again but want to keep it just in case
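
One way to express the example above as a lifecycle rule is sketched below. The mapping of each stage to a storage class (Standard for the first 30 days, Infrequent Access until day 90, Glacier afterwards) is an assumption, since the notes do not name the classes. The rule ID and function name are hypothetical. With boto3, the result would be passed as `LifecycleConfiguration` to `put_bucket_lifecycle_configuration`.

```python
def work_file_lifecycle(prefix: str = "") -> dict:
    """Lifecycle configuration for the 30-day / 90-day example.

    Objects start in Standard (the default), move to Infrequent Access
    after 30 days, and move to Glacier after 90 days.
    """
    return {
        "Rules": [
            {
                "ID": "work-file-archive",       # hypothetical rule name
                "Filter": {"Prefix": prefix},    # empty prefix = whole bucket
                "Status": "Enabled",
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 90, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }


for step in work_file_lifecycle("work/")["Rules"][0]["Transitions"]:
    print(step["Days"], step["StorageClass"])
```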

S3 Event Notifications:

- S3 event notifications allow you to set up automated communication between S3 and other AWS services when a selected event occurs in an S3 bucket

- Common event notification triggers include:
-- RRSObjectLost (used for automating the recreation of lost RRS objects)
-- ObjectCreated (for all of, or any one of, the following API calls)
--- Put
--- Post
--- Copy
--- CompleteMultipartUpload

- Event notifications can be sent to the following AWS services:
-- SNS
-- Lambda
-- SQS Queue

Note: RRS objects might be things like thumbnails.
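
As a sketch of how the triggers and destinations above fit together, the helper below builds a notification configuration that sends every ObjectCreated event to a Lambda function and lost RRS objects to an SNS topic. It assumes the boto3 shape used by `put_bucket_notification_configuration`. In the API, the RRSObjectLost trigger is spelled `s3:ReducedRedundancyLostObject`. The ARNs are placeholders.

```python
def notification_configuration(lambda_arn: str, topic_arn: str) -> dict:
    """Route new-object events to Lambda and lost-RRS-object events to SNS."""
    return {
        "LambdaFunctionConfigurations": [
            {
                "LambdaFunctionArn": lambda_arn,
                # The wildcard covers Put, Post, Copy and CompleteMultipartUpload.
                "Events": ["s3:ObjectCreated:*"],
            }
        ],
        "TopicConfigurations": [
            {
                "TopicArn": topic_arn,
                "Events": ["s3:ReducedRedundancyLostObject"],
            }
        ],
    }


# Placeholder ARNs, for illustration only.
config = notification_configuration(
    "arn:aws:lambda:us-east-1:123456789012:function:make-thumbnail",
    "arn:aws:sns:us-east-1:123456789012:rrs-object-lost",
)
print(sorted(config))
```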

S3 Static Web Hosting:

- Amazon S3 provides an option for a low-cost, highly reliable web hosting service for static websites (content that does not change frequently)
- When enabled, static web hosting will provide you with a unique endpoint (URL) that you can point to any properly formatted file stored in an S3 bucket. Supported formats include:
-- HTML
-- CSS
-- JavaScript
- Amazon Route 53 can also map human-readable domain names to static web hosting buckets, which is ideal for DNS failover solutions
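
Enabling static web hosting mostly comes down to naming an index document and, optionally, an error document. A minimal sketch, assuming the boto3 shape used by `put_bucket_website`; the helper name and the file names are assumptions.

```python
def website_configuration(index: str = "index.html", error: str = "error.html") -> dict:
    """Build a static website configuration for a bucket.

    `index` is served for directory-style requests and `error` is served
    when a requested key does not exist.
    """
    return {
        "IndexDocument": {"Suffix": index},
        "ErrorDocument": {"Key": error},
    }


print(website_configuration())
```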

Cross-Origin Resource Sharing (CORS):
- CORS is a method of allowing a web application located in one domain to access and use resources in another domain
- This allows web applications running JavaScript or HTML5 to access resources in an S3 bucket without using a proxy server
- For AWS, this (commonly) means that a web application hosted in one S3 bucket can access resources in another S3 bucket
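
A CORS configuration is a list of rules that say which origins may call the bucket and with which HTTP methods. The sketch below allows GET requests from a single website origin and assumes the boto3 shape used by `put_bucket_cors`. The helper name, origin URL, and max-age value are placeholders.

```python
def cors_configuration(allowed_origin: str) -> dict:
    """Allow read-only cross-origin requests from one website origin."""
    return {
        "CORSRules": [
            {
                "AllowedOrigins": [allowed_origin],
                "AllowedMethods": ["GET"],    # read-only access
                "AllowedHeaders": ["*"],
                "MaxAgeSeconds": 3000,        # how long browsers may cache the preflight
            }
        ]
    }


# Placeholder website endpoint of the bucket hosting the web application.
print(cors_configuration("http://example-site.s3-website-us-east-1.amazonaws.com"))
```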

Image: CORS configuration


Single Operation Upload:

- A single operation upload is a “traditional” upload where you upload the file in one part
- A single operation upload can upload a file up to 5GB in size; however, any file over 100MB should use multipart upload

Multipart Upload:

- Multipart upload allows you to upload a single object as a set of parts
- Allows for uploading parts of a file concurrently
- Allows for stopping/resuming file uploads
- If transmission of any part fails, you can retransmit that part without affecting other parts
- After all parts of your object are uploaded Amazon S3 assembles these parts and creates the object
- Required for objects larger than 5GB, and highly suggested for objects larger than 100MB
- Can be used to upload a file up to 5TB in size
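
The size thresholds from the two upload sections can be turned into a small decision helper, plus a planner that splits a file into byte ranges that could be uploaded concurrently. This is a sketch only: it treats 1MB as 1024 × 1024 bytes and uses a 100MB part size, both of which are assumptions, and it makes no AWS calls.

```python
MB = 1024 ** 2
GB = 1024 ** 3
TB = 1024 ** 4

MULTIPART_RECOMMENDED = 100 * MB   # above this, multipart is suggested
SINGLE_PUT_LIMIT = 5 * GB          # above this, multipart is required
MAX_OBJECT_SIZE = 5 * TB           # largest object S3 will store


def choose_upload_method(size_bytes: int) -> str:
    """Pick an upload method from the thresholds in the notes."""
    if size_bytes < 0 or size_bytes > MAX_OBJECT_SIZE:
        raise ValueError("object size must be between 0 bytes and 5TB")
    if size_bytes > SINGLE_PUT_LIMIT:
        return "multipart-required"
    if size_bytes > MULTIPART_RECOMMENDED:
        return "multipart-recommended"
    return "single"


def plan_parts(size_bytes: int, part_size: int = 100 * MB) -> list:
    """Return (offset, length) byte ranges covering the whole file."""
    return [
        (offset, min(part_size, size_bytes - offset))
        for offset in range(0, size_bytes, part_size)
    ]


print(choose_upload_method(500 * MB))
print(len(plan_parts(500 * MB)))
```

Because each range is independent, a failed part can be retried on its own without touching the others.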

AWS Import/Export:

- AWS Import/Export gives the ability to take on-premise data and physically snail mail it to AWS (using a device that you own)
- AWS will import that data to S3, EBS, or Glacier within one business day of the physical device arriving at AWS
- Benefits:
-- Off-site backup policy
-- Quickly migrate LARGE amounts of data to the cloud (up to 16TB per job)
-- Disaster recovery (AWS will even take S3 data and ship it back to you)

Snowball:

- Snowball is a petabyte-scale data transport solution
- Snowball uses an AWS provided secure transfer appliance
- Quickly move large amounts of data into and out of the AWS cloud

Storage Gateway:

- Connects local data center software appliances to cloud based storage such as Amazon S3

Gateway-Cached Volumes
- Create storage volumes and mount them as iSCSI devices on the on-premise servers
- The gateway will store the data written to this volume in Amazon S3 and will cache frequently accessed data on-premise in the storage device

Gateway-Stored Volumes
- Store all the data locally (on-premise) in storage volumes
- The gateway will periodically take snapshots of the data as incremental backups and store them on Amazon S3

Quiz

T: S3 can be used as an option for low-cost, reliable web hosting for STATIC (not dynamic) web sites.

Q: Through what process are objects moved from the standard storage class to Glacier?
A: Lifecycle policies
E: Objects uploaded and stored using the standard storage class must use lifecycles to move them to Glacier.

T: All S3 buckets are private by default.

Q: You have a static web page hosted in an S3 bucket. Your requests for a file from a website in another S3 bucket keep failing. What is the most likely solution?
A: Enable CORS configuration on the S3 buckets
E: The S3 buckets are in different domains. CORS (cross-origin resource sharing) allows domains to share resources, so enabling CORS on the S3 buckets is the best solution.

T: The S3 infrequent access (S3-IA) storage class has object durability of 99.999999999% and availability of 99.90%
E: S3-IA has the same durability as S3-standard but has slightly lower availability since these objects are expected to be accessed much less frequently.

Q: You are currently running an application on AWS that hosts customers' photo albums. For each main photo uploaded, your application generates a thumbnail for use in the mobile version of the application. What is the most cost effective storage solution, while also providing the highest level of availability and durability?
A: Use the standard storage class for the main photos and the reduced redundancy storage class for the thumbnails.
E: Since the customers' main photos cannot be reproduced, storing them in the standard storage class will provide the highest level of availability and durability. The thumbnails can be easily reproduced from the main photos, so you can store them in reduced redundancy storage, which has lower durability, but is cheaper than standard.

Q: If you need to upload a file to S3 that is 500MB in size, what data transit option should you use?
A: Multi-part upload
E: Multi-part upload should be used when uploading any file over 100MB in size (and is required for an object over 5GB in size, up to 5TB). Single operation upload may be used but is not recommended. Import/export and Snowball are used for datasets that are larger than 5TB.

Q: Your company has petabytes of data that it wants to move from their on-premise network to AWS. What AWS solution should you use?
A: AWS Snowball
E: Snowball is a service provided by AWS for moving extremely large amounts (petabytes) of data into AWS.

Q: You work for a hospital that is required to store patients' medical records for a minimum of 10 years. Most of these records will never be accessed but must be made available upon request (within a few hours). What is the most cost-effective storage option?
A: Glacier
E: Glacier is an AWS solution for archival storage, which is designed for long-term storage of data that is very rarely accessed.

Q: What best describes what occurs when you suspend object versioning?
A: All existing objects retain their current and past versions, and no new versions are created when updated objects are uploaded.
E: When you suspend versioning, S3 retains all current and existing past versions. However, all new objects will overwrite the existing current version. No new versions will be created.

Q: What is the object durability and availability advertised by AWS for their S3 standard storage service?
A: Durability of 99.999999999% and availability of 99.99%
E: S3 standard storage class is advertised as having object durability of 99.999999999% (known as 11 nines) and availability of 99.99%

Architecture Diagrams

Image: AWS Account & Services Layer (Storage Services)
AWS’s main storage service is S3. S3 has many different methods of importing, exporting, and syncing data with on-premise networks.

Image: AWS Account & Services Layer (Simple Storage Service)

Image: Using Route 53 to target Static Web Hosting on S3
