Amazon Managed Streaming for Apache Kafka
Introduction to Amazon Managed Streaming for Apache Kafka (Amazon MSK)
Amazon MSK is a fully managed Apache Kafka service that makes it easy to ingest and process streaming data in real time.
[Figure: Apache Kafka at a high level]
What fully managed Apache Kafka on AWS (Amazon MSK) does for you:
- Lets you create, update, and delete clusters
- Creates and manages the Kafka broker nodes and ZooKeeper nodes for you
- Deploys the MSK cluster in your VPC, across multiple AZs (up to 3 for high availability)
- Recovers automatically from common Apache Kafka failures
- Stores data on EBS volumes
- Lets you build producers and consumers of data
- Lets you create custom configurations for your clusters
- Default maximum message size is 1 MB
- Larger messages (e.g., 10 MB) can be sent into Kafka after applying a custom configuration
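Sending larger messages usually means raising several limits together, not just one. The sketch below assumes the standard Apache Kafka property names (`message.max.bytes`, `max.request.size`, and the consumer fetch sizes); the helper functions and the 1 KB headroom are hypothetical choices for illustration, and the exact set you need depends on your Kafka version and clients.

```python
# Minimal sketch (hypothetical helpers): settings commonly raised together so a
# ~10 MB message can go producer -> broker -> consumer. Property names are
# standard Apache Kafka configs; the headroom value is an illustrative guess.

ONE_MB = 1024 * 1024


def large_message_settings(max_message_bytes: int, headroom: int = 1024) -> dict:
    """Return broker, producer, and consumer properties sized for max_message_bytes."""
    limit = max_message_bytes + headroom  # leave room for record/batch overhead
    return {
        # Broker side: would go into an MSK custom configuration
        "broker": {
            "message.max.bytes": limit,        # largest record batch the broker accepts
            "replica.fetch.max.bytes": limit,  # often raised to match (not a hard limit in recent Kafka)
        },
        # Producer side: largest request the producer will send
        "producer": {
            "max.request.size": limit,
        },
        # Consumer side: often raised to match (not hard limits in recent Kafka)
        "consumer": {
            "max.partition.fetch.bytes": limit,
            "fetch.max.bytes": limit,
        },
    }


def to_properties(settings: dict) -> str:
    """Render one group of settings in Java .properties format, sorted by key."""
    return "\n".join(f"{key}={value}" for key, value in sorted(settings.items()))


if __name__ == "__main__":
    cfg = large_message_settings(10 * ONE_MB)
    print(to_properties(cfg["broker"]))
    print(to_properties(cfg["producer"]))
```

Keeping the producer, broker, and consumer values derived from one number avoids the common failure where the broker accepts a message that a producer or consumer was never configured to handle.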
MSK – Configurations
- Choose the number of AZs (3 recommended, or 2)
- Choose the VPC & subnets
- The broker instance type (e.g., kafka.m5.large)
- The number of brokers per AZ (you can add brokers later)
- Size of your EBS volumes (1 GB to 16 TB per broker)
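The configuration choices above can be checked before creating a cluster. This is a hypothetical sketch, not an AWS SDK API: the `ClusterPlan` class and its field names are illustrative, and it only encodes the limits listed in these notes (2 or 3 AZs, one subnet per AZ, at least one broker per AZ, 1 GB to 16 TB of EBS per broker).

```python
# Hypothetical sketch: validate an MSK provisioned-cluster plan against the
# limits listed above. Class and field names are illustrative only.
from dataclasses import dataclass

MIN_EBS_GIB = 1
MAX_EBS_GIB = 16 * 1024  # 16 TB upper bound, expressed as 16384 GiB per broker


@dataclass
class ClusterPlan:
    az_count: int
    subnets: list[str]
    broker_instance_type: str
    brokers_per_az: int
    ebs_volume_gib: int

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means the plan looks valid."""
        errors = []
        if self.az_count not in (2, 3):
            errors.append("az_count must be 2 or 3 (3 recommended)")
        if len(self.subnets) != self.az_count:
            errors.append("provide exactly one subnet per AZ")
        if not self.broker_instance_type.startswith("kafka."):
            errors.append("broker instance types look like 'kafka.m5.large'")
        if self.brokers_per_az < 1:
            errors.append("need at least one broker per AZ")
        if not MIN_EBS_GIB <= self.ebs_volume_gib <= MAX_EBS_GIB:
            errors.append("EBS volume must be between 1 GB and 16 TB")
        return errors

    @property
    def total_brokers(self) -> int:
        # Brokers are spread evenly, so the total is AZs x brokers per AZ
        return self.az_count * self.brokers_per_az


if __name__ == "__main__":
    plan = ClusterPlan(3, ["subnet-a", "subnet-b", "subnet-c"], "kafka.m5.large", 2, 1000)
    print(plan.validate(), plan.total_brokers)
```

Because brokers are added per AZ, "adding brokers later" on a 3-AZ cluster grows the total in steps of three.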
MSK – Security
Encryption:
- Optional in-flight encryption using TLS between the brokers
- Optional in-flight encryption using TLS between the clients and brokers
- At-rest encryption of your EBS volumes using KMS
Network Security:
- Authorize specific security groups for your Apache Kafka clients
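Authorizing client security groups in practice means opening the right broker port for each enabled auth mode. The port numbers below are the ones we understand MSK uses for private (in-VPC) access; the helper and the rule dictionary's keys are hypothetical and do not mirror the EC2 API.

```python
# Sketch (hypothetical helper): ingress rules the cluster's security group
# would need so that clients in client_sg can reach the brokers.
# Assumed MSK private-access broker ports per auth mode.

BROKER_PORTS = {
    "plaintext": 9092,
    "tls": 9094,         # TLS encryption, also used for mutual TLS auth
    "sasl_scram": 9096,
    "iam": 9098,
}


def required_ingress(auth_modes: set[str], client_sg: str) -> list[dict]:
    """Build one TCP ingress rule per enabled auth mode, sourced from client_sg."""
    rules = []
    for mode in sorted(auth_modes):  # sorted for deterministic output
        if mode not in BROKER_PORTS:
            raise ValueError(f"unknown auth mode: {mode}")
        rules.append({
            "protocol": "tcp",
            "port": BROKER_PORTS[mode],
            "source_security_group": client_sg,
        })
    return rules


if __name__ == "__main__":
    for rule in required_ingress({"iam", "tls"}, "sg-0123"):
        print(rule)
```

Referencing the client security group as the source, rather than a CIDR range, keeps access tied to the specific Kafka clients the notes describe.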
Authentication & Authorization:
- Define who can read/write to which topics
- Mutual TLS (AuthN) + Kafka ACLs (AuthZ)
- SASL/SCRAM (AuthN) + Kafka ACLs (AuthZ)
- IAM Access Control (AuthN + AuthZ)
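On the client side, each authentication option maps to a different set of Kafka client properties. The sketch below uses standard Kafka property names; the IAM entries assume the `aws-msk-iam-auth` library is on the client's classpath, and the keystore path, username, and password are placeholders. Kafka ACLs (the AuthZ half for mutual TLS and SASL/SCRAM) are enforced on the brokers, so they do not appear here.

```python
# Sketch: client properties for each MSK auth option. Property names are
# standard Kafka client settings; IAM entries assume the aws-msk-iam-auth
# library. Credentials and paths passed in are placeholders.


def client_properties(mode: str, **opts: str) -> dict[str, str]:
    if mode == "mtls":
        # Mutual TLS: the client authenticates with its own certificate
        return {
            "security.protocol": "SSL",
            "ssl.keystore.location": opts["keystore"],
            "ssl.keystore.password": opts["keystore_password"],
        }
    if mode == "scram":
        # SASL/SCRAM: username/password authentication over TLS
        jaas = (
            "org.apache.kafka.common.security.scram.ScramLoginModule required "
            f'username="{opts["username"]}" password="{opts["password"]}";'
        )
        return {
            "security.protocol": "SASL_SSL",
            "sasl.mechanism": "SCRAM-SHA-512",
            "sasl.jaas.config": jaas,
        }
    if mode == "iam":
        # IAM Access Control: both AuthN and AuthZ come from IAM policies
        return {
            "security.protocol": "SASL_SSL",
            "sasl.mechanism": "AWS_MSK_IAM",
            "sasl.jaas.config": "software.amazon.msk.auth.iam.IAMLoginModule required;",
            "sasl.client.callback.handler.class":
                "software.amazon.msk.auth.iam.IAMClientCallbackHandler",
        }
    raise ValueError(f"unsupported mode: {mode}")


if __name__ == "__main__":
    for key, value in client_properties("iam").items():
        print(f"{key}={value}")
```

IAM Access Control needs no credentials in the properties file because the callback handler picks up AWS credentials from the client's environment.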
MSK – Monitoring
CloudWatch Metrics:
- Basic monitoring (cluster and broker metrics)
- Enhanced monitoring (adds enhanced broker metrics)
- Topic-level monitoring (adds enhanced topic-level metrics)
Prometheus (open-source monitoring):
- Opens a port on the brokers to export cluster, broker, and topic-level metrics
- Set up the JMX Exporter (metrics) and/or the Node Exporter (CPU and disk metrics)
Broker Log Delivery:
- Delivery to CloudWatch Logs
- Delivery to Amazon S3
- Delivery to Amazon Data Firehose (formerly Kinesis Data Firehose)
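With Prometheus open monitoring, a Prometheus server scrapes each broker's exporter ports. This sketch writes a Prometheus file-based service discovery (`file_sd`) target list; it assumes the JMX Exporter listens on port 11001 and the Node Exporter on 11002 on every broker, and the broker hostnames are placeholders for your own cluster's broker endpoints.

```python
# Sketch: build a Prometheus file_sd target list for MSK open monitoring.
# Assumes JMX Exporter on port 11001 and Node Exporter on port 11002;
# broker hostnames are placeholders.
import json

JMX_EXPORTER_PORT = 11001
NODE_EXPORTER_PORT = 11002


def file_sd_targets(broker_hosts: list[str]) -> list[dict]:
    """One target group per exporter, labelled so each gets its own job."""
    return [
        {
            "targets": [f"{host}:{JMX_EXPORTER_PORT}" for host in broker_hosts],
            "labels": {"job": "jmx"},
        },
        {
            "targets": [f"{host}:{NODE_EXPORTER_PORT}" for host in broker_hosts],
            "labels": {"job": "node"},
        },
    ]


if __name__ == "__main__":
    hosts = ["broker-1.example.internal", "broker-2.example.internal"]
    print(json.dumps(file_sd_targets(hosts), indent=2))
```

Writing the output to a `targets.json` file referenced by a `file_sd_configs` entry lets Prometheus pick up broker changes without restarting.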
MSK Serverless
- Run Apache Kafka on MSK without managing capacity
- MSK automatically provisions resources and scales compute & storage
- You just define your topics and your partitions and you're good to go!
- Security: IAM Access Control for all clusters
Kinesis Data Streams vs Amazon MSK