Just a place to put some notes on the “AWS Certified Solutions Architect - Associate (New!)” course from https://linuxacademy.com
Expanding on: AWS Essentials: S3
Documentation
Things to Know
S3 Essentials:
- As AWS’ main storage service, S3 can serve many purposes when designing highly available, fault-tolerant, and secure application architectures, including:
-- Bulk (basically unlimited) static object storage
-- Various storage classes to optimize cost vs. needed object availability/durability
-- Object versioning
-- Access restrictions via S3 bucket policies/permissions
-- Object management via lifecycle policies
-- Origin for the CloudFront CDN
-- File shares and backup/archiving for hybrid networks (via AWS Storage Gateway)
Important S3 Facts:
- Objects stay within an AWS region and are synced across all AZs for extremely high availability and durability
- You should always create an S3 bucket in a region that makes sense for its purpose:
-- Serving content to customers
-- Sharing data with EC2
S3 Read Consistency Rules:
- ALL regions now support read-after-write consistency for PUTs of new objects into S3.
-- New objects are immediately available after being “put” into S3
- All regions use eventual consistency for PUTs that overwrite existing objects and for DELETEs of objects
S3 Buckets:
- Buckets are the main storage containers of S3; they hold a grouping of objects and have sub-namespaces that are similar to folders (called folders)
- Tags can be used to organize buckets (e.g., tag based on the application the bucket belongs to)
- Each bucket must have a unique name across ALL of AWS
- Bucket limitations:
-- Only 100 buckets can be created in an AWS account at a time
-- Bucket ownership cannot be transferred once a bucket is created
-- Bucket names must be lowercase
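A minimal boto3 sketch of creating a bucket in a chosen region (the bucket name and region here are hypothetical placeholders):

import boto3

s3 = boto3.client("s3")

# Bucket names must be globally unique and lowercase.
# For us-east-1, omit CreateBucketConfiguration entirely.
s3.create_bucket(
    Bucket="my-example-notes-bucket",  # hypothetical name
    CreateBucketConfiguration={"LocationConstraint": "us-west-2"},
)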
S3 Objects:
- Objects are static files that contain metadata information:
-- A set of name-value pairs
-- Contains information specified by the user, as well as AWS information such as storage class
- Each object must be assigned a storage class, which determines the object’s availability, durability, and cost
- By default, all objects are private
- Objects can:
-- Be as small as 0 bytes and as large as 5 TB
-- Have multiple versions (if versioning is enabled)
-- Be made publicly available via a URL
-- Automatically switch to a different storage class or be deleted (via lifecycle policies)
-- Be encrypted
-- Be organized into “sub-namespaces” called folders
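As a sketch, uploading a private object with user-defined metadata via boto3 might look like this (bucket and key are hypothetical):

import boto3

s3 = boto3.client("s3")

# The object is private by default; Metadata holds user-specified
# name-value pairs, stored alongside AWS-managed metadata.
s3.put_object(
    Bucket="my-example-notes-bucket",  # hypothetical
    Key="reports/2019/summary.txt",    # "reports/2019/" is just a key prefix
    Body=b"hello from S3",
    Metadata={"department": "finance"},
)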
Object Encryption:
- SSE (Server-Side Encryption):
-- S3 can encrypt the object before saving it on the partitions in the data centers and decrypt it when it is downloaded
-- Uses AES-256
- Or you can use your own encryption keys:
-- Considered client-side encryption, where you encrypt the data before upload
- SSL-terminated endpoints for the API
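For example, requesting SSE with S3-managed AES-256 keys at upload time might look like this in boto3 (names hypothetical):

import boto3

s3 = boto3.client("s3")

# S3 encrypts the object at rest with AES-256 and transparently
# decrypts it when downloaded.
s3.put_object(
    Bucket="my-example-notes-bucket",  # hypothetical
    Key="private/secrets.txt",
    Body=b"sensitive data",
    ServerSideEncryption="AES256",
)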
S3 Folders:
- For simplicity, S3 supports the concept of “folders”
- This is done only as a means of grouping objects
- Amazon S3 does this by using key-name prefixes for objects
- Amazon S3 has a flat structure; there is no hierarchy like you would see in a typical file system
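A quick sketch of listing by prefix, which is how “browsing a folder” actually works against S3’s flat keyspace (bucket and prefix hypothetical):

import boto3

s3 = boto3.client("s3")

# "reports/2019/" is not a real directory, just a shared key-name prefix.
resp = s3.list_objects_v2(
    Bucket="my-example-notes-bucket",  # hypothetical
    Prefix="reports/2019/",
    Delimiter="/",  # groups deeper prefixes under CommonPrefixes
)
for obj in resp.get("Contents", []):
    print(obj["Key"])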
S3 Permissions:
- All buckets and objects are private by default - only the resource owner has access
- The resource owner can grant access to the resource (buckets/objects) through S3 “resource-based policies”, OR access can be granted through a traditional IAM user policy
- Resource-based policies (for S3) are:
+ Bucket policies
-- Policies that are attached only to the S3 bucket (not to an IAM user)
-- The permissions in the policy are applied to all objects in the bucket
-- The policy specifies what actions are allowed or denied for a particular user of that bucket, such as:
--- Granting access to an anonymous user
--- Who (a “principal”) can execute certain actions like PUT or DELETE
--- Restricting access based on IP address (generally used for CDN management)
+ S3 access control lists (ACLs)
-- Grant access to users in other AWS accounts or to the public
-- Both buckets and objects have ACLs
-- Object ACLs allow us to share an S3 object with the public via a URL link
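As an illustration, a bucket policy granting anonymous read access to every object in a bucket could be applied like this (bucket name hypothetical; only do this for intentionally public content):

import boto3
import json

s3 = boto3.client("s3")

# Allow everyone (the "*" principal) to GET any object in the bucket.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": "*",
        "Action": "s3:GetObject",
        "Resource": "arn:aws:s3:::my-example-notes-bucket/*",  # hypothetical
    }],
}
s3.put_bucket_policy(
    Bucket="my-example-notes-bucket",
    Policy=json.dumps(policy),
)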
S3 Storage Classes:
- A storage class represents the “classification” assigned to each object in S3. Current storage class types include:
-- Standard
-- Reduced Redundancy Storage (RRS)
-- Infrequent Access (S3-IA)
-- Glacier
- Each storage class has varying attributes that dictate things like:
-- Storage cost
-- Object availability
-- Object durability
-- Frequency of access (to the object)
+ Standard:
- Designed for general, all-purpose storage
- Is the default storage option
- 99.999999999% object durability (“eleven nines”)
- 99.99% object availability
- Is the most expensive storage class
+ Reduced Redundancy Storage (RRS):
- Designed for non-critical, reproducible objects
- 99.99% object durability
- 99.99% object availability
- Is less expensive than the Standard storage class
+ Infrequent Access (S3-IA):
- Designed for objects that you do not frequently access, but that must be immediately available when accessed
- 99.999999999% object durability
- 99.90% object availability
- Is less expensive than the Standard/RRS storage classes
+ Glacier:
- Designed for long-term archival storage (not to be used for backups)
- May take several hours for objects stored in Glacier to be retrieved
- 99.999999999% object durability
- Is the cheapest S3 storage class (very low cost)
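The storage class is chosen per object at upload time; for example, with boto3 (names hypothetical; Glacier is normally reached via lifecycle transitions rather than direct upload):

import boto3

s3 = boto3.client("s3")

# STANDARD is the default; STANDARD_IA or REDUCED_REDUNDANCY can be
# requested explicitly when the object is uploaded.
s3.put_object(
    Bucket="my-example-notes-bucket",  # hypothetical
    Key="archive/old-report.txt",
    Body=b"rarely accessed data",
    StorageClass="STANDARD_IA",
)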
Glacier:
- Amazon Glacier is an archival storage type
- Used for data that is NOT accessed frequently
- “Check-out” and “check-in” jobs can take several hours - that is how long it can take for the data to be changed and/or retrieved
- Integrates with Amazon S3 lifecycle policies for easy archiving
- A very inexpensive and cost-effective archival storage solution
- Glacier should NOT be used as a backup solution
NOTE: Glacier now offers three levels of data retrieval (pricing varies):
- Expedited: 1-5 minutes
- Standard: 3-5 hours
- Bulk: 5-12 hours
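Retrieving a Glacier-class object means initiating a restore job first; a boto3 sketch choosing one of the three retrieval tiers (names hypothetical):

import boto3

s3 = boto3.client("s3")

# Start a restore job; a temporary copy becomes available for 7 days.
# Tier can be "Expedited", "Standard", or "Bulk".
s3.restore_object(
    Bucket="my-example-notes-bucket",  # hypothetical
    Key="archive/old-report.txt",
    RestoreRequest={
        "Days": 7,
        "GlacierJobParameters": {"Tier": "Standard"},  # 3-5 hours
    },
)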
S3 Versioning:
- S3 versioning is a feature to manage and store all old/new/deleted versions of an object
- By default, versioning is disabled on all buckets/objects
- Once versioning is enabled, you can only “suspend” versioning (it cannot be fully disabled)
- Suspending versioning only prevents new versions from being created. All objects with existing versions will maintain their older versions.
- Versioning can only be set at the bucket level and applies to ALL objects in the bucket
- Lifecycle policies can be applied to specific versions of an object
- Versioning and lifecycle policies can both be enabled on a bucket at the same time
- Versioning can be used with lifecycle policies to create a great archiving and backup solution in S3
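Enabling (and later suspending) versioning happens at the bucket level; a boto3 sketch (bucket hypothetical):

import boto3

s3 = boto3.client("s3")

# Versioning applies to ALL objects in the bucket once enabled.
s3.put_bucket_versioning(
    Bucket="my-example-notes-bucket",  # hypothetical
    VersioningConfiguration={"Status": "Enabled"},
)

# Once enabled, versioning can only be suspended, never fully disabled:
# VersioningConfiguration={"Status": "Suspended"}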
Lifecycle Policies:
An object lifecycle policy is a set of rules that automates the migration of an object to a different storage class (or its deletion) based on specified time intervals:
- By default, lifecycle policies are disabled on a bucket/object
- Customizable to meet your company’s data retention policies
- Great for automating the management of object storage and for being more cost-efficient
- Can be used with versioning to create a great archiving and backup solution in S3
Example:
(1) I have a work file that I am going to access every day for the next 30 days
(2) After 30 days, I may only need to access that file once a week for the next 60 days
(3) After which (90 days total) I will probably never access that file again, but want to keep it just in case
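The three steps above map naturally onto one lifecycle rule; a boto3 sketch (bucket and prefix hypothetical - S3 has no “once a week” notion, so step 2 is modeled as a transition to S3-IA at day 30 and step 3 as a transition to Glacier at day 90):

import boto3

s3 = boto3.client("s3")

# Day 0-30: Standard; day 30-90: S3-IA; day 90+: Glacier.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-example-notes-bucket",  # hypothetical
    LifecycleConfiguration={
        "Rules": [{
            "ID": "work-file-archiving",
            "Filter": {"Prefix": "work-files/"},  # hypothetical prefix
            "Status": "Enabled",
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
        }],
    },
)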
S3 Event Notifications:
- S3 event notifications allow you to set up automated communication between S3 and other AWS services when a selected event occurs in an S3 bucket
- Common event notification triggers include:
-- RRSObjectLost (used for automating the re-creation of lost RRS objects)
-- ObjectCreated (for all, or for the following specific, APIs called):
--- Put
--- Post
--- Copy
--- CompleteMultiPartUpload
- Event notifications can be sent to the following AWS services:
-- SNS
-- Lambda
-- SQS Queue
Note: RRS objects might be things like thumbnails.
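Wiring an ObjectCreated event to a Lambda function might look like this in boto3 (the bucket and function ARN are placeholders, and the Lambda function must already grant S3 permission to invoke it):

import boto3

s3 = boto3.client("s3")

# Invoke a Lambda function whenever any object is created in the bucket.
s3.put_bucket_notification_configuration(
    Bucket="my-example-notes-bucket",  # hypothetical
    NotificationConfiguration={
        "LambdaFunctionConfigurations": [{
            # placeholder ARN
            "LambdaFunctionArn": "arn:aws:lambda:us-west-2:123456789012:function:make-thumbnail",
            "Events": ["s3:ObjectCreated:*"],
        }],
    },
)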
S3 Static Web Hosting:
- Amazon S3 provides an option for low-cost, highly reliable web hosting for static websites (content that does not change frequently)
- When enabled, static web hosting provides you with a unique endpoint (URL) that you can point to any properly formatted file stored in an S3 bucket. Supported formats include:
-- HTML
-- CSS
-- JavaScript
- Amazon Route 53 can also map human-readable domain names to static web hosting buckets, which is ideal for DNS failover solutions
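Enabling static web hosting is a single bucket-level configuration call; a boto3 sketch (bucket and document names hypothetical):

import boto3

s3 = boto3.client("s3")

# After this, the bucket gets a website endpoint URL of the form
# http://<bucket>.s3-website-<region>.amazonaws.com (objects must be public).
s3.put_bucket_website(
    Bucket="my-example-notes-bucket",  # hypothetical
    WebsiteConfiguration={
        "IndexDocument": {"Suffix": "index.html"},
        "ErrorDocument": {"Key": "error.html"},
    },
)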
Cross-Origin Resource Sharing (CORS):
- CORS is a method of allowing a web application located in one domain to access and use resources in another domain
- This allows web applications running JavaScript or HTML5 to access resources in an S3 bucket without using a proxy server
- For AWS, this (commonly) means that a web application hosted in one S3 bucket can access resources in another S3 bucket
Image: CORS configuration
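A minimal CORS configuration allowing GETs from one other origin might look like this in boto3 (bucket and origin are hypothetical):

import boto3

s3 = boto3.client("s3")

# Allow GET requests from a site in another domain (e.g. another
# bucket's website endpoint) without a proxy server.
s3.put_bucket_cors(
    Bucket="my-example-notes-bucket",  # hypothetical
    CORSConfiguration={
        "CORSRules": [{
            "AllowedOrigins": ["http://other-site.example.com"],  # hypothetical
            "AllowedMethods": ["GET"],
            "AllowedHeaders": ["*"],
            "MaxAgeSeconds": 3000,
        }],
    },
)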
Single Operation Upload:
- A single operation upload is the “traditional” upload where you upload the file in one part
- A single operation upload can upload a file up to 5 GB in size; however, any file over 100 MB should use multipart upload
Multipart Upload:
- Multipart upload allows you to upload a single object as a set of parts
- Allows for uploading parts of a file concurrently
- Allows for stopping/resuming file uploads
- If transmission of any part fails, you can retransmit that part without affecting the other parts
- After all parts of your object are uploaded, Amazon S3 assembles the parts and creates the object
- Required for objects 5 GB and larger, and highly suggested when objects are 100 MB and larger
- Can be used to upload a file up to 5 TB in size
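boto3’s managed transfer handles multipart upload automatically above a size threshold; a sketch that lowers the threshold to 100 MB per the guidance above (file and bucket names hypothetical):

import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")

# Files above 100 MB are split into parts uploaded concurrently;
# a failed part is retried without resending the whole file.
config = TransferConfig(
    multipart_threshold=100 * 1024 * 1024,  # 100 MB
    max_concurrency=4,
)
s3.upload_file(
    "big-video.mp4",            # hypothetical local file
    "my-example-notes-bucket",  # hypothetical bucket
    "uploads/big-video.mp4",
    Config=config,
)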
AWS Import/Export:
- AWS Import/Export gives you the ability to take on-premises data and physically snail-mail it to AWS (using a device that you own)
- AWS will import that data to S3, EBS, or Glacier within one business day of the physical device arriving at AWS
- Benefits:
-- Off-site backup policy
-- Quickly migrate LARGE amounts of data to the cloud (up to 16 TB per job)
-- Disaster recovery (AWS will even take S3 data and ship it back to you)
Snowball:
- Snowball is a petabyte-scale data transport solution
- Snowball uses an AWS-provided secure transfer appliance
- Quickly moves large amounts of data into and out of the AWS cloud
Storage Gateway:
- Connects local data center software appliances to cloud-based storage such as Amazon S3
Gateway-Cached Volumes:
- Create storage volumes and mount them as iSCSI devices on the on-premises servers
- The gateway stores the data written to these volumes in Amazon S3 and caches frequently accessed data on-premises in the storage device
Gateway-Stored Volumes:
- Store all the data locally (on-premises) in storage volumes
- The gateway periodically takes snapshots of the data as incremental backups and stores them in Amazon S3
Quiz
T: S3 can be used as an option for low-cost, reliable web hosting for STATIC (not dynamic) websites.
Q: Through what process are objects moved from the Standard storage class to Glacier?
A: Lifecycle policies
E: Objects uploaded and stored using the Standard storage class must use lifecycle policies to move them to Glacier.
T: All S3 buckets are private by default.
Q: You have a static web page hosted in an S3 bucket. Your requests for a file from a website in another S3 bucket keep failing. What is the most likely solution?
A: Enable a CORS configuration on the S3 buckets
E: The S3 buckets are in different domains. CORS (cross-origin resource sharing) allows domains to share resources, so enabling CORS on the S3 buckets is the best solution.
T: The S3 Infrequent Access (S3-IA) storage class has object durability of 99.999999999% and availability of 99.90%.
E: S3-IA has the same durability as S3 Standard but slightly lower availability, since these objects are expected to be accessed much less frequently.
Q: You are currently running an application on AWS that hosts customers' photo albums. For each main photo uploaded, your application generates a thumbnail for use in the mobile version of the application. What is the most cost-effective storage solution that also provides the highest level of availability and durability?
A: Use the Standard storage class for the main photos and the Reduced Redundancy storage class for the thumbnails.
E: Since the customers' main photos cannot be reproduced, storing them in the Standard storage class provides the highest level of availability and durability. The thumbnails can easily be reproduced from the main photos, so you can store them in Reduced Redundancy Storage, which has lower durability but is cheaper than Standard.
Q: If you need to upload a file to S3 that is 500 MB in size, what data transit option should you use?
A: Multipart upload
E: Multipart upload should be used when uploading any file over 100 MB in size (and is required for objects over 5 GB, up to 5 TB in size). Single operation upload may be used but is not recommended. Import/Export and Snowball are used for datasets larger than 5 TB.
Q: Your company has petabytes of data that it wants to move from its on-premises network to AWS. What AWS solution should you use?
A: AWS Snowball
E: Snowball is a service provided by AWS for moving extremely large amounts of data (petabytes) into AWS.
Q: You work for a hospital that is required to store patients' medical records for a minimum of 10 years. Most of these records will never be accessed but must be made available upon request (within a few hours). What is the most cost-effective storage option?
A: Glacier
E: Glacier is an AWS solution for archival storage, designed for long-term storage of data that is very rarely accessed.
Q: What best describes what occurs when you suspend object versioning?
A: All existing objects retain their current and past versions, and no new versions are created when updated objects are uploaded.
E: When you suspend versioning, S3 retains all current and existing past versions. However, new uploads will overwrite the existing current version; no new versions will be created.
Q: What are the object durability and availability advertised by AWS for their S3 Standard storage service?
A: Durability of 99.999999999% and availability of 99.99%
E: The S3 Standard storage class is advertised as having object durability of 99.999999999% (known as “eleven nines”) and availability of 99.99%.
Architecture Diagrams
Image: AWS Account & Services Layer (Storage Services)
AWS’s main storage service is S3. S3 has many different methods of importing, exporting, and syncing data with on-premises networks.
Image: AWS Account & Services Layer (Simple Storage Service)
Image: Using Route 53 to target Static Web Hosting on S3