Notes on: Linux Academy: AWS CSAA: 14) Database Services

Just a place to put some notes on the “AWS Certified Solutions Architect - Associate (New!)” course from https://linuxacademy.com

Database Services

Image: Architecture: Account & Services Layer: Database Services

i) Relational Database Service

Image: Architecture: Account & Services Layer: Relational Database Service

RDS Essentials:

- RDS is a fully managed Relational Database Service:
+ Does not allow access to the underlying operating system (fully-managed)
+ You connect to the RDS database server in the same way you would connect to a traditional on-premises database instance (e.g. with the MySQL command line client)
+ RDS has the ability to provision/resize hardware on demand for scaling
+ You can enable Multi-AZ deployments for backup and high availability solutions
+ Utilize Read Replicas (MySQL/MariaDB/PostgreSQL/Aurora) - to help offload hits on your primary database
+ Relational databases are databases that organize stored data into tables
+ The associated tables have defined relationships between them

- Databases Supported by RDS:
+ MySQL
+ MariaDB
+ PostgreSQL
+ Oracle
+ MS SQL Server
+ Aurora:
-- Is a home-grown AWS relational database engine that is compatible with MySQL
-- AWS claims up to five times the performance of standard MySQL, at a lower price point than commercial databases

- Benefits of running RDS instead of a database on your own instance:
+ Automatic minor updates
+ Automatic backups (point-in-time-snapshots)
+ Not required to manage the operating system
+ Multi-AZ with a single click
+ Automatic recovery in the event of a failure (via failover)

RDS Multi-AZ Failover:

- Multi-AZ failover (Automatic AZ-Failover) synchronously replicates data to a backup (stand-by) database instance located in another availability zone (but in the same region)

- AWS will automatically switch the CNAME DNS record from the primary instance to the stand-by instance in the event of:
+ Service outage in an availability zone
+ Primary DB instance failure
+ A change of the instance server type
+ A manually initiated failover
+ A software version update
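
The CNAME switch can be pictured with a small sketch. The Python below is a toy model, not an AWS API: the class `MultiAZDeployment`, the endpoint name and the host names are all hypothetical, and a plain dictionary stands in for DNS. It only illustrates why an application that connects through the endpoint name does not need to be reconfigured after a failover.

```python
class MultiAZDeployment:
    """Toy model of a Multi-AZ deployment behind one DNS endpoint (not an AWS API)."""

    def __init__(self, endpoint, primary_host, standby_host):
        self.endpoint = endpoint
        self.primary_host = primary_host
        self.standby_host = standby_host
        # A dictionary stands in for the CNAME record: endpoint name -> serving host.
        self.dns = {endpoint: primary_host}

    def resolve(self, name):
        """Return the host that the endpoint name currently points to."""
        return self.dns[name]

    def failover(self):
        """Swap the primary and stand-by roles and repoint the CNAME record.

        In this toy model the old primary simply becomes the new stand-by.
        """
        self.primary_host, self.standby_host = self.standby_host, self.primary_host
        self.dns[self.endpoint] = self.primary_host


# Hypothetical names, for illustration only.
deployment = MultiAZDeployment(
    endpoint="mydb.example.internal",
    primary_host="db-host-in-az-a",
    standby_host="db-host-in-az-b",
)
before = deployment.resolve("mydb.example.internal")
deployment.failover()
after = deployment.resolve("mydb.example.internal")
print(before, "->", after)
```

The application only ever uses the endpoint name, so the same connection settings keep working once the record points at the former stand-by.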

- If Multi-AZ is enabled, RDS backups are taken against the stand-by instance, which reduces I/O freezes and slowdowns on the primary instance

- In order for Multi-AZ to work, your primary database instance must be launched into a “subnet group”
-- NOTE: An RDS instance must be launched into a subnet (inside a VPC) just like an EC2 instance. So the same security/connectivity rules, and highly available/fault tolerant concepts apply.

RDS Backups:

- AWS provides automated point-in-time backups against the RDS database instance
- Automated backups are deleted once the database instance is deleted and cannot be recovered (but you can take your own manual snapshot before deleting the instance)
- Automated backups are currently available for all supported database engines, but they only work reliably when the storage engine is “transactional”
- MySQL requires InnoDB for reliable backups

RDS Read Replicas:

- Read replicas are asynchronous copies of the primary database that are used for read only purposes (only allow “read connections”)
- When you write new data to the primary database, AWS copies it for you to the read replica
- You can create and have multiple read replicas for a primary database
- Read replicas can be created from other read replicas (so no performance hit on the primary database)
- MySQL, MariaDB, PostgreSQL, and Aurora currently support read replicas
- You can monitor replication lag using CloudWatch
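
A minimal sketch of how an application might split traffic between a primary and its read replicas, and why asynchronous replication means a replica can briefly return stale data. Everything here is hypothetical: `ReplicatedDatabase` is a toy in-memory class, not an RDS API, and "lag" is counted as the number of pending changes rather than the seconds that CloudWatch would report.

```python
import itertools


class ReplicatedDatabase:
    """Toy model of a primary with asynchronous read replicas (not an RDS API)."""

    def __init__(self, replica_count):
        if replica_count < 1:
            raise ValueError("at least one read replica is required")
        self.primary = {}
        self.replicas = [{} for _ in range(replica_count)]
        # Changes applied to the primary but not yet copied to the replicas.
        self._pending = []
        # Round-robin over the replicas to spread read traffic.
        self._next_replica = itertools.cycle(range(replica_count))

    def write(self, key, value):
        """All writes go to the primary; replicas are updated later."""
        self.primary[key] = value
        self._pending.append((key, value))

    def read(self, key):
        """Reads are served by a replica, so they may lag behind the primary."""
        replica = self.replicas[next(self._next_replica)]
        return replica.get(key)

    def replication_lag(self):
        """Simplified lag: the number of changes the replicas have not seen yet."""
        return len(self._pending)

    def replicate(self):
        """Copy all pending changes to every replica (the asynchronous step)."""
        for key, value in self._pending:
            for replica in self.replicas:
                replica[key] = value
        self._pending.clear()


db = ReplicatedDatabase(replica_count=2)
db.write("user:1", "alice")
stale_read = db.read("user:1")   # replicas have not caught up yet
lag_before = db.replication_lag()
db.replicate()
fresh_read = db.read("user:1")   # now visible on the replicas
lag_after = db.replication_lag()
print(stale_read, lag_before, fresh_read, lag_after)
```

In this sketch a read issued right after a write misses the new value until `replicate()` runs, which is the behaviour that replication lag monitoring is meant to surface.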

Benefits of using Read Replicas:
- Read replicas allow read traffic to be redirected from the primary database to the read replicas. This can greatly improve the performance of the primary database.
- Read replicas allow for elasticity in RDS - you can add more read replicas as demand increases
- You can promote a read replica to a primary instance
- MySQL:
-- Replicate for importing/exporting data to RDS
-- Can replicate across regions

When should you use Read Replicas?
- High volume, non-cached database read traffic (elasticity)
- Running business functions such as data warehousing
- Importing/Exporting data into RDS
- Rebuilding indexes:
-- Ability to promote read replica to a primary instance

ii) DynamoDB

DynamoDB Essentials:

- DynamoDB is a fully-managed, NoSQL database service provided by AWS
- It is similar to MongoDB, but is a home-grown AWS solution
- Is schemaless and uses a key-value store
- You specify the required throughput capacity, and DynamoDB does the rest (being fully-managed)
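
To make "schemaless key-value store" concrete, here is a minimal in-memory sketch. It is not the DynamoDB API or an AWS SDK: `KeyValueTable` is a hypothetical toy class, and the method names `put_item` and `get_item` are only chosen to echo DynamoDB's operation names. The point is that every item must carry the key attribute, while all other attributes can differ from item to item.

```python
class KeyValueTable:
    """Toy schemaless key-value table (an illustration, not the DynamoDB API)."""

    def __init__(self, key_name):
        self.key_name = key_name
        self._items = {}

    def put_item(self, item):
        """Store an item; only the key attribute is mandatory."""
        if self.key_name not in item:
            raise ValueError(f"item is missing key attribute {self.key_name!r}")
        # Copy the item so later changes by the caller do not affect the table.
        self._items[item[self.key_name]] = dict(item)

    def get_item(self, key):
        """Return a copy of the item stored under the key, or None if absent."""
        item = self._items.get(key)
        return dict(item) if item is not None else None


# Hypothetical user-profile items with different attributes per item.
profiles = KeyValueTable(key_name="user_id")
profiles.put_item({"user_id": "u1", "name": "Alice", "theme": "dark"})
profiles.put_item({"user_id": "u2", "high_score": 9001})

alice = profiles.get_item("u1")
missing = profiles.get_item("u3")
print(alice, missing)
```

No table-wide column list is declared anywhere; the two items above share nothing except the key attribute.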

- Being fully-managed means:
+ Service manages all provisioning (and scaling) of underlying hardware
+ Fully distributed, and scales automatically with demand and growth
+ Built as a fault tolerant highly available service
-- On the back end, it replicates the data across multiple availability zones within the region you create the DynamoDB tables in

- DynamoDB also easily integrates with other AWS services, such as Elastic MapReduce
+ Can easily move data to a Hadoop cluster in Elastic MapReduce

- Popular use cases include:
+ IoT (storing metadata)
+ Gaming (storing session information, leaderboards)
+ Mobile (storing user profiles, personalization)

iii) ElastiCache

ElastiCache Essentials:

- ElastiCache is a fully managed, in-memory cache engine
- ElastiCache is used to improve database performance by caching the results of queries that are made to a database
- ElastiCache is great for large, high-performance or high-taxing queries - their results can be stored inside of a cache (an ElastiCache cluster) and accessed later (instead of repeated requests continually hitting the primary database)
- So it reduces load on the database, which increases performance
- ElastiCache allows for managing web sessions and also caching dynamically generated data

- Available engines to power ElastiCache include:
+ Memcached (pronounced “mem-cache-dee”)
+ Redis
- Generally, applications need to be built to work with either Redis or Memcached
- Popular options like MySQL have Memcached plugins, which allow an application to easily work with ElastiCache (if using Memcached as the engine)
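
The caching idea above is often implemented as a "cache-aside" lookup: check the cache first, and only query the database on a miss. The sketch below is a minimal illustration under stated assumptions: a Python dictionary stands in for the Redis or Memcached cluster, and `CacheAside`, `slow_query` and the key names are hypothetical, not part of any AWS or cache client library.

```python
import time


class CacheAside:
    """Cache-aside lookup with a time-to-live; a dict stands in for the cache cluster."""

    def __init__(self, load_from_database, ttl_seconds=60.0, clock=time.monotonic):
        self._load = load_from_database   # called only on a cache miss
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache = {}                  # key -> (expires_at, value)
        self.hits = 0
        self.misses = 0

    def get(self, key):
        now = self._clock()
        entry = self._cache.get(key)
        if entry is not None and entry[0] > now:
            # Fresh entry: answer from the cache without touching the database.
            self.hits += 1
            return entry[1]
        # Missing or expired: query the database and remember the result.
        self.misses += 1
        value = self._load(key)
        self._cache[key] = (now + self._ttl, value)
        return value


database_calls = []


def slow_query(key):
    """Hypothetical stand-in for an expensive database query."""
    database_calls.append(key)
    return f"result for {key}"


cache = CacheAside(slow_query)
first = cache.get("top-10-players")    # miss: goes to the database
second = cache.get("top-10-players")   # hit: served from the cache
print(len(database_calls), cache.hits, cache.misses)
```

The second lookup never reaches `slow_query`, which is the load reduction described above. The time-to-live bounds how long a stale result can be served after the underlying data changes.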

iv) Redshift

Redshift Essentials:

- Amazon Redshift is a petabyte-scale data warehousing service
- It is fully-managed and scalable
- Generally used for big-data analytics and it can integrate with most popular business intelligence tools, including:
+ Jaspersoft
+ MicroStrategy
+ Pentaho
+ Tableau
+ Business Objects
+ Cognos

Quiz: Databases Quiz

T: AWS provides automated backups of RDS databases, which are point-in-time snapshots.

Q: What are two benefits of using read replicas?
A1: Creates elasticity in RDS
A2: Improves performance of the primary database by taking workload from it
E: You can add/remove read replicas based on demand, so it creates elasticity for RDS. Read replicas can take read only workloads off of the primary database, thus improving performance.

Q: The Availability Zone that your RDS database instance is located in is suffering from outages, and you have lost access to the database. What could you have done to prevent losing access to your database (in the event of this type of failure) without any downtime?
A: Enabled multi-AZ failover
E: If multi-AZ failover is enabled, a duplicate copy of the database is kept in a separate AZ. If there is failure in the primary database's AZ, AWS will automatically switch the CNAME DNS record from the primary to the failover backup instance.

Q: What database service should you choose if you need petabyte-scale data warehousing?
A: Redshift
E: Redshift is for petabyte-scale data warehousing.

T: When setting up a DynamoDB database, you only need to specify the required throughput capacity. There is no instance size or storage type to choose from. AWS scales compute power with your needs.

T: A read replica can be promoted to the primary instance.

Q: How does using ElastiCache help to improve database performance?
A: It can cache the results of high-taxing queries
E: ElastiCache is designed for large, high-performance or taxing queries. It can store the query results to alleviate hits to the database.

Q: What database service offers petabyte-scale data warehousing?
A: Redshift
E: Redshift offers petabyte-scale data warehousing that is generally used for big data analytics.

Q: What are the "engine" options for ElastiCache?
A: Redis & Memcached

Q: What are three attributes of DynamoDB?
A1: Fully-managed
A2: A NoSQL database platform
A3: Uses key-value store
