Suggestions for using AWS purpose-built databases for microservices #reinvent [DAT209-L]

AWS re:Invent 2019

2019.12.06


This post is a session report on DAT209-L: Leadership session: AWS purpose-built databases, presented at AWS re:Invent 2019.

A Japanese version of this post is also available.

Overview

In this session, Shawn Bice, VP of databases, discusses the AWS purpose-built database strategy and explains why your application should drive the requirements for which database(s) to use, not the other way around. Learn about the purpose of each AWS database service and how AWS customers are using purpose-built databases to build some of the most scalable applications on the planet. If you are a technology or engineering leader and you’re trying to understand how to modernize your data strategy, this session is for you. We discuss using various approaches for new application development, lifting-and-shifting to managed services, and refactoring monolithic database architectures to use purpose-built databases.

Speakers

Shawn Bice
- VP, Databases, Amazon Web Services
Tobias Ternstrom
- Director, RDS & Aurora, Amazon Web Services
Joseph Idziorek
- Principal Product Manager, Amazon Web Services

To build applications on a microservices architecture, you need to understand how AWS database services are categorized and when to use each of them. In this session, AWS experts explained seven categories of databases and illustrated their use cases with real customer stories.

App architectures & patterns have evolved over the years...

Builders today are...
- Not really building monolith applications
- Looking towards purpose-built systems
- Taking a big app and breaking it into smaller parts
- Picking the right tool for the right job

60s: Mainframe
80s: Client Server
1. Separate app logic from a database
90s: Three tier
1. The internet arrived
2. A client layer, an application layer and single database layer
Microservices
1. In this new era of the cloud, systems are far more specialized
2. Databases are more specialized than they have ever been before

Common database categories

Easy way to think of a data strategy -> Categorize

Relational
1. Make sure that responses are strongly consistent
2. Enforce constraints on data types
Key-value
1. A table performs the same with one item or a trillion items because it scales out
Document
1. Define the data model on the fly as JSON
In-memory
1. Serve frequently accessed data from memory instead of running a full table scan
Graph
1. Highly connected data
Time-series
1. Time is the primary axis of the data model
2. No updates; inserts are append-only
Ledger
1. Once I write to it, I can never change it
2. Immutable transaction log with cryptographic verifiability

Top of mind for our customers

Move to Managed
1. Services: RDS, Aurora, ElastiCache, DocDB
2. Tools: SCT, DMS
3. Programs: MAP, Pro Serve, Partner
Break Free
1. Services: Aurora, Amazon Redshift
2. Tools: SCT, DMS
3. Programs: MAP, DM Freedom, Pro Serve, Partner
New Modern Apps
- New Requirements
  - Users: 1 million+
  - Data Volume: TB-PB-EB
  - Performance: Milli-Micro sec
  - Request Rate: Millions+
  - Access: Any device
  - Scale: up-out-in
  - Economics: Pay as you go
  - Developer Access: Managed API

Customer Stories

Lyft using Key-value: Amazon DynamoDB

Needs performance that holds whether the number of users is 10 or 10 million
Uses DynamoDB to store the individual GPS locations for each ride
The key-value pattern lets them fetch the data for an individual rider by a known key
DynamoDB scales horizontally, virtually without limit, and still delivers millisecond performance
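
As a rough illustration of the key-value pattern described above, the following pure-Python sketch models a table with a partition key and a sort key. The names (`RideLocationTable`, `ride_id`, `ts`) are hypothetical; this is neither the DynamoDB API nor Lyft's actual schema. It only shows why a lookup by a known key does not depend on how many other items are stored.

```python
from bisect import insort
from collections import defaultdict


class RideLocationTable:
    """Toy key-value table: partition key = ride_id, sort key = timestamp."""

    def __init__(self):
        # Each partition holds only the items for one ride, sorted by timestamp.
        self._partitions = defaultdict(list)

    def put(self, ride_id, ts, lat, lon):
        # Insert while keeping the partition ordered by timestamp.
        insort(self._partitions[ride_id], (ts, lat, lon))

    def query(self, ride_id):
        # Goes straight to one partition; other rides are never scanned.
        return list(self._partitions.get(ride_id, []))


table = RideLocationTable()
table.put("ride-1", 1005, 47.6070, -122.3330)
table.put("ride-1", 1000, 47.6062, -122.3321)
table.put("ride-2", 1001, 37.7749, -122.4194)

print(table.query("ride-1"))
```

In a real deployment the partitions would be spread across many servers, which is what makes horizontal scaling possible; here they are just entries in one dictionary.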

ZipRecruiter using Relational: Amazon Aurora and Amazon RDS

Needs a rich query experience across a million businesses and 100 million job seekers
Cannot predict exactly who is going to search for what
Uses read replicas to support the workload

Liberty Mutual using Document: Amazon DocumentDB

Needs a flexible JSON data model that stores information about customers, policies and assets
Iterate fast and deliver new features without changing the schema and the database
If you don't know the access pattern but need boundless scale, DocumentDB is preferred over DynamoDB
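
To make the "no schema change" point concrete, here is a minimal sketch using plain Python dictionaries as JSON documents. The field names and the `find` helper are assumptions made up for illustration; this is not the DocumentDB API or Liberty Mutual's data model.

```python
import json

# Hypothetical collection: a plain list standing in for a document store.
collection = []

collection.append(
    {"customer": "alice", "policy": {"type": "auto", "vehicle": {"make": "Ford"}}}
)
# A newer feature adds an "assets" field; older documents are left untouched
# and no schema migration is needed.
collection.append(
    {"customer": "bob", "policy": {"type": "home"},
     "assets": [{"kind": "house", "value": 350000}]}
)


def find(docs, **criteria):
    """Return documents whose top-level fields match all criteria."""
    return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]


print(json.dumps(find(collection, customer="bob"), indent=2))
```

Documents with different shapes live side by side, and queries simply ignore fields a document does not have.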

UBISOFT using In-memory: Amazon ElastiCache

Optimized for latency over durability, which means both reads and writes will get microsecond latency
Minimal latency is significant for online games
Practical data structures make it easy to create and update a leaderboard
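
The leaderboard idea can be sketched in a few lines of Python. The `Leaderboard` class below is a hypothetical in-process stand-in, not the ElastiCache API; with a Redis-compatible cache one would more likely use a sorted set, which is an assumption about the setup rather than something stated in the session.

```python
class Leaderboard:
    """In-memory leaderboard: player -> score, ranked on demand."""

    def __init__(self):
        self._scores = {}

    def add_score(self, player, points):
        # Create the entry on first sight, otherwise increment it.
        self._scores[player] = self._scores.get(player, 0) + points
        return self._scores[player]

    def top(self, n):
        # Highest score first; ties are broken by player name for a stable order.
        ranked = sorted(self._scores.items(), key=lambda kv: (-kv[1], kv[0]))
        return ranked[:n]


board = Leaderboard()
board.add_score("ana", 120)
board.add_score("ben", 90)
board.add_score("ana", 15)
board.add_score("cho", 140)

print(board.top(2))
```

Everything stays in memory, so updates and reads are fast, but nothing survives a restart; that is the latency-over-durability trade-off mentioned above.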

Nike using Graph: Amazon Neptune

Nodes (the circles) represent nouns, and edges represent directed connections between them
Can actually query on connections
Build a social graph to connect to customers and athletes
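
Querying on connections can be illustrated with a small adjacency structure. The `Graph` class, the edge labels (`FOLLOWS`, `PLAYS`) and the node names below are all hypothetical; this is not the Neptune API or Nike's data, just a sketch of traversing directed, labeled edges.

```python
from collections import defaultdict


class Graph:
    """Directed, labeled edges between nodes (the 'nouns')."""

    def __init__(self):
        # (source, label) -> set of target nodes
        self._out = defaultdict(set)

    def add_edge(self, source, label, target):
        self._out[(source, label)].add(target)

    def neighbors(self, source, label):
        # Return a copy so callers cannot mutate the stored edges.
        return set(self._out.get((source, label), set()))

    def two_hops(self, source, first, second):
        # Follow `first` edges from source, then `second` edges from each result.
        result = set()
        for mid in self.neighbors(source, first):
            result |= self.neighbors(mid, second)
        return result


g = Graph()
g.add_edge("customer:kim", "FOLLOWS", "athlete:a1")
g.add_edge("customer:kim", "FOLLOWS", "athlete:a2")
g.add_edge("customer:lee", "FOLLOWS", "athlete:a1")
g.add_edge("athlete:a1", "PLAYS", "sport:running")
g.add_edge("athlete:a2", "PLAYS", "sport:tennis")

# Which sports are played by the athletes that kim follows?
print(sorted(g.two_hops("customer:kim", "FOLLOWS", "PLAYS")))
```

The same question in a relational model would need joins across several tables; in a graph the traversal follows the connections directly.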

Klarna using Ledger: Amazon QLDB

Make sure no one comes in and pokes around with your records
Traditional ways of protecting records, such as limiting access and auditing, rely on humans
Uses a cryptographic hash to verify that nothing has changed
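
The verification idea can be sketched with a simple hash chain, where each entry's hash covers the previous hash plus the record. This is a swapped-in, simplified technique for illustration only; it is not QLDB's API or its actual journal implementation, and the `Ledger` class and record fields are hypothetical.

```python
import hashlib
import json


def _digest(prev_hash, record):
    # Hash the previous entry's hash together with a canonical form of the record.
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("utf-8") + payload).hexdigest()


class Ledger:
    GENESIS = "0" * 64

    def __init__(self):
        self.entries = []  # list of (record, hash) in append order

    def append(self, record):
        prev = self.entries[-1][1] if self.entries else self.GENESIS
        entry_hash = _digest(prev, record)
        self.entries.append((record, entry_hash))
        return entry_hash

    def verify(self):
        # Recompute every hash from the start; any edited record breaks the chain.
        prev = self.GENESIS
        for record, stored in self.entries:
            if _digest(prev, record) != stored:
                return False
            prev = stored
        return True


ledger = Ledger()
ledger.append({"payment_id": 1, "amount": 100})
ledger.append({"payment_id": 2, "amount": 250})
print(ledger.verify())                 # chain is intact

ledger.entries[0][0]["amount"] = 999   # someone edits a stored record
print(ledger.verify())                 # the recomputed hash no longer matches
```

Verification here is mechanical rather than dependent on access controls or human audits, which is the point the session made about ledger databases.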

Fender - The right tool for the job

Migrated their product data, images, and purchase orders
- from SQL Servers
- to DynamoDB, Amazon S3, AWS Lambda and Amazon ElastiCache
Lowered costs by 20%
Increased speed by 50%
Migrated the whole system to the cloud in less than 6 months

Recent Updates

Amazon Managed Cassandra Service (Preview)

Challenges of managing large Cassandra clusters at scale
- Specialized expertise to set up, configure, and maintain infrastructure and software
- Scaling clusters is time-consuming, manual, and error-prone, so many overprovision capacity
- Manual backups and error-prone restore processes to maintain integrity
- Unreliable upgrades with clunky rollback and debugging capabilities

Federated Query for Amazon Athena (Preview)

Challenges querying data from multiple databases
- Microservices minimize the blast radius, but it becomes difficult to look across the individual data stores
- Imagine an e-commerce store with a microservices architecture
- Accessing multiple systems can be challenging
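
To show what the challenge looks like in practice, the sketch below joins data from two separate stores in application code, which is the kind of work a federated query is meant to take over. The stores, field names and `revenue_by_customer` function are hypothetical, and this is not Athena syntax.

```python
# Hypothetical per-service data stores for an e-commerce application.
# Store owned by the customer service:
customers_db = {"c1": {"name": "Ana"}, "c2": {"name": "Ben"}}
# Store owned by the order service:
orders_db = [
    {"order_id": "o1", "customer_id": "c1", "total": 40},
    {"order_id": "o2", "customer_id": "c2", "total": 15},
    {"order_id": "o3", "customer_id": "c1", "total": 25},
]


def revenue_by_customer(customers, orders):
    """Join the two stores in application code and total revenue per customer name."""
    totals = {}
    for order in orders:
        name = customers[order["customer_id"]]["name"]
        totals[name] = totals.get(name, 0) + order["total"]
    return totals


print(revenue_by_customer(customers_db, orders_db))
```

Every cross-service question needs its own hand-written join like this one, and each additional data store adds another client, format and failure mode to handle.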

Amazon Aurora integration with ML

Challenges with integrating machine learning (ML) with your database
- Select and train the model
- Create application code to read data from the database
- Query and format the data for the ML algorithm
- Call an ML service to run the algorithm
- Format the output
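
The manual steps listed above can be sketched as a small pipeline. Here `sqlite3` stands in for the application database and `score_sentiment` is a hypothetical stub standing in for a call to an external ML service; model selection and training are left out. None of this is the Aurora ML integration itself, only the glue code it is meant to replace.

```python
import sqlite3


def score_sentiment(texts):
    """Stand-in for a call to an external ML service (hypothetical stub)."""
    return [1.0 if "great" in t.lower() else 0.0 for t in texts]


def run_pipeline(conn):
    # 1. Application code reads data from the database.
    rows = conn.execute("SELECT id, body FROM reviews ORDER BY id").fetchall()
    # 2. Format the data for the ML algorithm.
    texts = [body.strip() for _, body in rows]
    # 3. Call the ML service to run the algorithm.
    scores = score_sentiment(texts)
    # 4. Format the output for the application.
    return [{"id": rid, "positive": s >= 0.5} for (rid, _), s in zip(rows, scores)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reviews (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO reviews (body) VALUES (?)",
    [("Great guitar!",), ("Arrived late.",)],
)

print(run_pipeline(conn))
```

Each numbered comment corresponds to one of the steps in the list, which gives a sense of how much custom code sits between the database and the model.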