Amazon Managed Streaming for Apache Kafka

2022.08.23


Introduction to Amazon Managed Streaming for Apache Kafka (Amazon MSK)

Amazon MSK makes it easy to ingest and process streaming data in real time with fully managed Apache Kafka.

APACHE KAFKA AT A HIGH LEVEL

What fully managed Apache Kafka on AWS (Amazon MSK) can do:

  • Allows you to create, update, and delete clusters
  • MSK creates and manages the Kafka broker nodes and ZooKeeper nodes for you
  • Deploy the MSK cluster in your VPC across multiple AZs (up to 3 for high availability)
  • Automatic recovery from common Apache Kafka failures
  • Data is stored on EBS volumes

  • You can build producers and consumers of data
  • You can create custom configurations for your clusters
  • The default maximum message size is 1 MB
  • Larger messages (e.g., 10 MB) can be sent into Kafka after applying a custom configuration
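Raising the message size limit involves both the brokers and the clients. The sketch below assumes "10 MB" means 10 MiB and collects the standard Apache Kafka properties usually raised together (`message.max.bytes` and `replica.fetch.max.bytes` on the broker, `max.request.size` on the producer, `max.partition.fetch.bytes` on the consumer). It then renders the broker part as `server.properties` text. The helper names are hypothetical.

```python
# Minimal sketch: Kafka properties that are usually raised together to allow
# messages larger than the ~1 MB default. Helper names are hypothetical; the
# property keys are standard Apache Kafka settings.

MIB = 1024 * 1024  # assumption: "10MB" is read as 10 MiB


def large_message_settings(max_bytes: int) -> dict:
    """Return broker, producer, and consumer properties for a max message size."""
    if max_bytes <= 0:
        raise ValueError("max_bytes must be positive")
    return {
        # Broker: accept larger record batches and let followers replicate them.
        "broker": {
            "message.max.bytes": max_bytes,
            "replica.fetch.max.bytes": max_bytes,
        },
        # Producer: allow a single request of that size.
        "producer": {"max.request.size": max_bytes},
        # Consumer: fetch up to that many bytes per partition.
        "consumer": {"max.partition.fetch.bytes": max_bytes},
    }


def to_server_properties(props: dict) -> str:
    """Render sorted key=value lines, the format of a server.properties file."""
    return "".join(f"{key}={value}\n" for key, value in sorted(props.items()))


if __name__ == "__main__":
    settings = large_message_settings(10 * MIB)
    print(to_server_properties(settings["broker"]), end="")
    # message.max.bytes=10485760
    # replica.fetch.max.bytes=10485760
```

In this sketch, the broker text is the content an MSK custom configuration would carry. The `ServerProperties` field of the `CreateConfiguration` API takes it as bytes. The producer and consumer values go into the client configuration instead.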

MSK – Configurations 

  • Choose the number of AZs (3 recommended, or 2)
  • Choose the VPC and subnets
  • Choose the broker instance type (e.g., kafka.m5.large)
  • Choose the number of brokers per AZ (you can add brokers later)
  • Choose the size of your EBS volumes (1 GB to 16 TB)
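The choices above map onto the core parameters of an MSK `CreateCluster` request. The following is a minimal sketch, assuming one client subnet per AZ, that validates the choices and assembles the request. The total broker count is brokers-per-AZ times the number of AZs. The subnet and security group IDs, the Kafka version string, and the helper name are hypothetical placeholders.

```python
# Minimal sketch: assemble core CreateCluster parameters from the choices above.
# IDs, the version string, and the helper name are hypothetical placeholders.


def build_cluster_request(name, kafka_version, subnet_ids, security_group_ids,
                          brokers_per_az, instance_type="kafka.m5.large",
                          volume_gib=100):
    az_count = len(subnet_ids)  # assumption: one client subnet per AZ
    if az_count not in (2, 3):
        raise ValueError("use 2 or 3 subnets (3 is recommended)")
    if brokers_per_az < 1:
        raise ValueError("need at least one broker per AZ")
    if not 1 <= volume_gib <= 16384:  # 1 GiB to 16 TiB per broker
        raise ValueError("EBS volume size must be between 1 GiB and 16 TiB")
    return {
        "ClusterName": name,
        "KafkaVersion": kafka_version,
        # Brokers are spread evenly, so the total is per-AZ count times AZs.
        "NumberOfBrokerNodes": brokers_per_az * az_count,
        "BrokerNodeGroupInfo": {
            "InstanceType": instance_type,
            "ClientSubnets": list(subnet_ids),
            "SecurityGroups": list(security_group_ids),
            "StorageInfo": {"EbsStorageInfo": {"VolumeSize": volume_gib}},
        },
    }
```

The resulting dict could be passed as keyword arguments to boto3's `kafka` client, for example `boto3.client("kafka").create_cluster(**request)`. Check which Kafka versions MSK currently supports before choosing `kafka_version`.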

MSK – Security 

Encryption:

    • Optional in-flight encryption using TLS between the brokers
    • Optional in-flight encryption using TLS between the clients and the brokers
    • At-rest encryption of your EBS volumes using KMS
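In the MSK API, these three options correspond to the `EncryptionInfo` section of a `CreateCluster` request. The sketch below builds that section. `ClientBroker` covers client-to-broker traffic and accepts `TLS`, `TLS_PLAINTEXT`, or `PLAINTEXT`. `InCluster` covers broker-to-broker traffic. `DataVolumeKMSKeyId` selects a customer-managed KMS key; when it is omitted, MSK encrypts at rest with an AWS managed key. The KMS key ARN and the helper name are hypothetical.

```python
# Minimal sketch: the EncryptionInfo section of an MSK CreateCluster request.
# The KMS key ARN and the helper name are hypothetical placeholders.

VALID_CLIENT_BROKER = {"TLS", "TLS_PLAINTEXT", "PLAINTEXT"}


def build_encryption_info(kms_key_arn=None, client_broker="TLS", in_cluster=True):
    if client_broker not in VALID_CLIENT_BROKER:
        raise ValueError(f"unsupported ClientBroker value: {client_broker}")
    info = {
        "EncryptionInTransit": {
            # Clients <-> brokers: TLS only, TLS and plaintext, or plaintext only.
            "ClientBroker": client_broker,
            # Broker <-> broker traffic inside the cluster.
            "InCluster": in_cluster,
        }
    }
    if kms_key_arn:
        # At rest: encrypt the EBS volumes with a customer-managed KMS key.
        # Without this entry, an AWS managed key is used.
        info["EncryptionAtRest"] = {"DataVolumeKMSKeyId": kms_key_arn}
    return info
```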

Network Security:

    • Authorize specific security groups for your Apache Kafka clients
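A common way to authorize clients is to allow the client security group on the brokers' security group, using only the port of the access method in use. For private connectivity, MSK uses 9092 (plaintext), 9094 (TLS), 9096 (SASL/SCRAM), and 9098 (IAM). The sketch below builds ingress rules in the `IpPermissions` shape of EC2's `AuthorizeSecurityGroupIngress`. The group IDs and the helper name are hypothetical.

```python
# Minimal sketch: ingress rules letting a client security group reach MSK brokers.
# Group IDs and the helper name are hypothetical placeholders.

BROKER_PORTS = {
    "plaintext": 9092,
    "tls": 9094,
    "sasl_scram": 9096,
    "iam": 9098,
}


def client_ingress_rules(client_sg_id, methods):
    rules = []
    for method in methods:
        port = BROKER_PORTS[method]  # KeyError for an unknown access method
        rules.append({
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            # Reference the client security group instead of a CIDR range.
            "UserIdGroupPairs": [{"GroupId": client_sg_id}],
        })
    return rules
```

The list could be passed as `boto3.client("ec2").authorize_security_group_ingress(GroupId=broker_sg_id, IpPermissions=rules)`, where `broker_sg_id` is the security group attached to the brokers.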

Authentication & Authorization

    • Define who can read from and write to which topics
    • Mutual TLS (AuthN) + Kafka ACLs (AuthZ)
    • SASL/SCRAM (AuthN) + Kafka ACLs (AuthZ)
    • IAM Access Control (AuthN + AuthZ)
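Each option has a cluster-side setting, `ClientAuthentication` in `CreateCluster`, and a matching set of Kafka client properties. The sketch below pairs the two. Its assumptions are as follows:

- Mutual TLS uses ACM Private CA ARNs.
- SCRAM uses `SCRAM-SHA-512`, the mechanism MSK supports.
- IAM uses the class names from the `aws-msk-iam-auth` library.

Kafka ACLs for the first two options are managed separately and are not shown. The mode names, helper names, and CA ARN are hypothetical.

```python
# Minimal sketch: cluster-side ClientAuthentication plus matching client properties.
# Mode names, helper names, and any CA ARNs are hypothetical placeholders.


def build_client_authentication(mode, ca_arns=None):
    if mode == "mtls":
        if not ca_arns:
            raise ValueError("mutual TLS needs at least one ACM Private CA ARN")
        return {"Tls": {"Enabled": True, "CertificateAuthorityArnList": list(ca_arns)}}
    if mode == "scram":
        return {"Sasl": {"Scram": {"Enabled": True}}}
    if mode == "iam":
        return {"Sasl": {"Iam": {"Enabled": True}}}
    raise ValueError(f"unknown mode: {mode}")


def client_properties(mode):
    if mode == "mtls":
        # Also needs ssl.keystore.* settings pointing at the client certificate.
        return {"security.protocol": "SSL"}
    if mode == "scram":
        # Also needs sasl.jaas.config with the ScramLoginModule credentials.
        return {"security.protocol": "SASL_SSL", "sasl.mechanism": "SCRAM-SHA-512"}
    if mode == "iam":
        # Requires the aws-msk-iam-auth library on the client classpath.
        return {
            "security.protocol": "SASL_SSL",
            "sasl.mechanism": "AWS_MSK_IAM",
            "sasl.jaas.config": "software.amazon.msk.auth.iam.IAMLoginModule required;",
            "sasl.client.callback.handler.class":
                "software.amazon.msk.auth.iam.IAMClientCallbackHandler",
        }
    raise ValueError(f"unknown mode: {mode}")
```

With IAM, authorization comes from IAM policies rather than Kafka ACLs, which is why one option covers both AuthN and AuthZ.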

MSK – Monitoring 

CloudWatch Metrics:

    • Basic monitoring (cluster and broker metrics)
    • Enhanced monitoring (adds enhanced broker metrics)
    • Topic-level monitoring (adds enhanced topic-level metrics)
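In the MSK API, these tiers are values of the `EnhancedMonitoring` setting. Basic is `DEFAULT`, enhanced is `PER_BROKER`, and topic-level is `PER_TOPIC_PER_BROKER`. There is also a finer `PER_TOPIC_PER_PARTITION` level. The sketch below maps tier names to those values and builds the arguments for an `UpdateMonitoring` call, which also requires the cluster's current version. The tier names and helper are hypothetical.

```python
# Minimal sketch: map monitoring tiers to the MSK EnhancedMonitoring values.
# Tier names and the helper name are hypothetical.

ENHANCED_MONITORING = {
    "basic": "DEFAULT",                      # cluster and broker metrics
    "enhanced": "PER_BROKER",                # adds enhanced broker metrics
    "topic": "PER_TOPIC_PER_BROKER",         # adds topic-level metrics
    "partition": "PER_TOPIC_PER_PARTITION",  # finest level, per partition
}


def monitoring_update(cluster_arn, current_version, tier):
    """Build keyword arguments for an UpdateMonitoring call."""
    if tier not in ENHANCED_MONITORING:
        raise ValueError(f"unknown tier: {tier}")
    return {
        "ClusterArn": cluster_arn,
        # MSK rejects updates that do not name the cluster's current version.
        "CurrentVersion": current_version,
        "EnhancedMonitoring": ENHANCED_MONITORING[tier],
    }
```

The arguments could be passed to `boto3.client("kafka").update_monitoring(**args)`. The current version is available from `describe_cluster`.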

Prometheus (open-source monitoring):

    • Opens a port on the brokers to export cluster, broker, and topic-level metrics
    • Set up the JMX Exporter (metrics) and/or the Node Exporter (CPU and disk metrics)
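With MSK open monitoring, each broker serves JMX Exporter metrics on port 11001 and Node Exporter metrics on port 11002. The brokers' security group must allow those ports from the Prometheus server. The sketch below builds the `OpenMonitoring` setting for the MSK API and a list of scrape targets that could go into a Prometheus static configuration. The broker hostnames and helper names are hypothetical.

```python
# Minimal sketch: MSK OpenMonitoring settings and Prometheus scrape targets.
# Broker hostnames and helper names are hypothetical placeholders.

JMX_EXPORTER_PORT = 11001   # Kafka/JMX metrics
NODE_EXPORTER_PORT = 11002  # host metrics such as CPU and disk


def open_monitoring(jmx=True, node=True):
    return {
        "Prometheus": {
            "JmxExporter": {"EnabledInBroker": jmx},
            "NodeExporter": {"EnabledInBroker": node},
        }
    }


def scrape_targets(broker_hosts, jmx=True, node=True):
    """Return host:port strings for every enabled exporter on every broker."""
    targets = []
    for host in broker_hosts:
        if jmx:
            targets.append(f"{host}:{JMX_EXPORTER_PORT}")
        if node:
            targets.append(f"{host}:{NODE_EXPORTER_PORT}")
    return targets
```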

Broker Log Delivery:

    • Delivery to CloudWatch Logs
    • Delivery to Amazon S3
    • Delivery to Kinesis Data Firehose
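Broker log destinations are set through the `LoggingInfo` structure of the MSK API. It has three entries under `BrokerLogs`: `CloudWatchLogs`, `S3`, and `Firehose` (Kinesis Data Firehose). Each entry has an `Enabled` flag plus its own target field. The sketch below enables only the destinations that are given. The log group, bucket, and delivery stream names and the helper name are hypothetical.

```python
# Minimal sketch: the LoggingInfo structure for MSK broker log delivery.
# Resource names and the helper name are hypothetical placeholders.


def broker_logs_info(log_group=None, s3_bucket=None, s3_prefix=None,
                     delivery_stream=None):
    cloudwatch = {"Enabled": log_group is not None}
    if log_group is not None:
        cloudwatch["LogGroup"] = log_group

    s3 = {"Enabled": s3_bucket is not None}
    if s3_bucket is not None:
        s3["Bucket"] = s3_bucket
        if s3_prefix:
            s3["Prefix"] = s3_prefix  # optional key prefix inside the bucket

    firehose = {"Enabled": delivery_stream is not None}
    if delivery_stream is not None:
        firehose["DeliveryStream"] = delivery_stream

    return {"BrokerLogs": {"CloudWatchLogs": cloudwatch, "S3": s3, "Firehose": firehose}}
```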

MSK Serverless 

  • Run Apache Kafka on MSK without managing the capacity
  • MSK automatically provisions resources and scales compute & storage
  • You just define your topics and your partitions and you're good to go!
  • Security: IAM Access Control for all clusters
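For a serverless cluster, there is no broker type, broker count, or volume size to choose. The request mainly carries networking and the IAM authentication setting. The sketch below builds a `Serverless` request in the shape used by the `CreateClusterV2` API. The subnet and security group IDs and the helper name are hypothetical. Topics and partitions are then created with a regular Kafka admin client, which is not shown.

```python
# Minimal sketch: a CreateClusterV2 request body for an MSK Serverless cluster.
# IDs and the helper name are hypothetical placeholders.


def serverless_cluster_request(name, subnet_ids, security_group_ids):
    if not subnet_ids:
        raise ValueError("at least one subnet is required")
    return {
        "ClusterName": name,
        "Serverless": {
            "VpcConfigs": [{
                "SubnetIds": list(subnet_ids),
                "SecurityGroupIds": list(security_group_ids),
            }],
            # MSK Serverless uses IAM access control for client authentication.
            "ClientAuthentication": {"Sasl": {"Iam": {"Enabled": True}}},
        },
    }
```

The dict could be passed as `boto3.client("kafka").create_cluster_v2(**request)`. Capacity is then provisioned and scaled by MSK.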

Kinesis Data Streams vs Amazon MSK