Amazon Managed Streaming for Apache Kafka



Introduction to Amazon Managed Streaming for Apache Kafka (Amazon MSK)

Amazon MSK lets you ingest and process streaming data in real time using fully managed Apache Kafka.


What fully managed Apache Kafka on AWS can do:

  • Create, update, and delete clusters
  • MSK creates and manages the Kafka broker nodes and ZooKeeper nodes for you
  • Deploy the MSK cluster in your VPC across multiple AZs (up to 3 for high availability)
  • Automatic recovery from common Apache Kafka failures
  • Data is stored on EBS volumes
  • You can build producers and consumers of data
  • You can create custom configurations for your clusters
  • Default maximum message size of 1 MB
  • Larger messages (e.g., 10 MB) can be sent after raising the limit in a custom configuration
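An MSK custom configuration is a set of Kafka server properties. The Python sketch below assumes the boto3 `kafka` client and raises the broker limit through the standard Kafka properties `message.max.bytes` and `replica.fetch.max.bytes`; the function names, the configuration name, and the 10 MB value are illustrative, not a prescribed recipe.

```python
def build_server_properties(max_message_bytes: int) -> bytes:
    """Return server.properties content that raises the broker message size limit."""
    if max_message_bytes <= 0:
        raise ValueError("max_message_bytes must be positive")
    lines = [
        # Largest record batch the broker will accept from producers.
        f"message.max.bytes={max_message_bytes}",
        # Follower replicas must be able to fetch batches at least this large.
        f"replica.fetch.max.bytes={max_message_bytes}",
    ]
    return ("\n".join(lines) + "\n").encode("utf-8")


def create_large_message_configuration(name: str, max_message_bytes: int) -> str:
    """Register the properties as an MSK configuration and return its ARN.

    Hypothetical helper: requires boto3 and AWS credentials when called.
    """
    import boto3  # imported lazily so build_server_properties works without boto3

    kafka = boto3.client("kafka")
    response = kafka.create_configuration(
        Name=name,
        ServerProperties=build_server_properties(max_message_bytes),
    )
    return response["Arn"]


TEN_MB = 10 * 1024 * 1024
print(build_server_properties(TEN_MB).decode("utf-8"))
```

The broker limit is only half of the change: client settings such as the producer's `max.request.size` (1 MB by default) usually need raising as well.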

MSK – Configurations 

  • Choose the number of AZs (3 recommended, or 2)
  • Choose the VPC and subnets
  • Choose the broker instance type (e.g., kafka.m5.large)
  • Choose the number of brokers per AZ (you can add brokers later)
  • Choose the size of your EBS volumes (1 GB – 16 TB per broker)
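The configuration choices above map onto the request for the MSK `CreateCluster` API. This Python sketch builds that request as a plain dict (for boto3's `kafka.create_cluster(**request)`); the helper name, subnet and security group IDs, and the Kafka version string are placeholders, so check the versions MSK currently supports before using one.

```python
from typing import Any, Dict, List


def build_cluster_request(
    cluster_name: str,
    kafka_version: str,             # placeholder: pick a version MSK supports
    subnet_ids: List[str],          # one subnet per AZ
    security_group_ids: List[str],
    instance_type: str = "kafka.m5.large",
    brokers_per_az: int = 1,
    volume_size_gib: int = 100,
) -> Dict[str, Any]:
    """Hypothetical helper that assembles a CreateCluster request body."""
    if len(subnet_ids) not in (2, 3):
        raise ValueError("MSK clusters span 2 or 3 AZs: pass one subnet per AZ")
    if brokers_per_az < 1:
        raise ValueError("need at least one broker per AZ")
    if not 1 <= volume_size_gib <= 16384:
        raise ValueError("EBS volume size must be between 1 GiB and 16 TiB")
    return {
        "ClusterName": cluster_name,
        "KafkaVersion": kafka_version,
        # Brokers are spread evenly, so the total is brokers per AZ x number of AZs.
        "NumberOfBrokerNodes": brokers_per_az * len(subnet_ids),
        "BrokerNodeGroupInfo": {
            "InstanceType": instance_type,
            "ClientSubnets": subnet_ids,
            "SecurityGroups": security_group_ids,
            "StorageInfo": {"EbsStorageInfo": {"VolumeSize": volume_size_gib}},
        },
    }


request = build_cluster_request(
    "demo-cluster", "3.5.1", ["subnet-a", "subnet-b", "subnet-c"], ["sg-clients"],
    brokers_per_az=2,
)
print(request["NumberOfBrokerNodes"])
```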

MSK – Security 

Encryption:

    • Optional in-flight encryption using TLS between the brokers
    • Optional in-flight encryption using TLS between the clients and the brokers
    • At-rest encryption of your EBS volumes using KMS
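In the MSK API these encryption options live in the `EncryptionInfo` block of the cluster request. The Python sketch below builds that block; the helper name and the key ARN are hypothetical. At-rest encryption is always applied, so the KMS key is optional: leaving it out lets MSK use an AWS managed key.

```python
from typing import Any, Dict, Optional

# Allowed values for client-to-broker traffic in the MSK API.
CLIENT_BROKER_MODES = ("TLS", "TLS_PLAINTEXT", "PLAINTEXT")


def build_encryption_info(
    tls_between_brokers: bool = True,
    client_broker: str = "TLS",
    kms_key_arn: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper that assembles the EncryptionInfo block."""
    if client_broker not in CLIENT_BROKER_MODES:
        raise ValueError(f"client_broker must be one of {CLIENT_BROKER_MODES}")
    info: Dict[str, Any] = {
        "EncryptionInTransit": {
            # TLS_PLAINTEXT accepts both TLS and plaintext client connections.
            "ClientBroker": client_broker,
            # True enables TLS for broker-to-broker traffic.
            "InCluster": tls_between_brokers,
        }
    }
    if kms_key_arn is not None:
        # Customer managed key; omit it to fall back to an AWS managed key.
        info["EncryptionAtRest"] = {"DataVolumeKMSKeyId": kms_key_arn}
    return info


print(build_encryption_info(client_broker="TLS_PLAINTEXT"))
```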

Network Security:

    • Authorize specific security groups for your Apache Kafka clients
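Authorizing a client security group means adding inbound rules on the cluster's security group for the broker port each authentication mode uses (9092 plaintext, 9094 TLS, 9096 SASL/SCRAM, 9098 IAM for clients inside AWS). This Python sketch builds the `IpPermissions` list for boto3's `ec2.authorize_security_group_ingress`; the helper name and security group IDs are hypothetical.

```python
from typing import Any, Dict, List

# Broker ports MSK exposes to clients inside AWS, by authentication mode.
MSK_CLIENT_PORTS = {
    "plaintext": 9092,
    "tls": 9094,
    "sasl_scram": 9096,
    "iam": 9098,
}


def client_ingress_permissions(client_sg_id: str, auth_modes: List[str]) -> List[Dict[str, Any]]:
    """Hypothetical helper: one TCP rule per auth mode, scoped to the client security group."""
    permissions = []
    for mode in auth_modes:
        port = MSK_CLIENT_PORTS[mode]  # KeyError for an unknown mode
        permissions.append({
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            # Referencing a security group instead of CIDR ranges admits only its members.
            "UserIdGroupPairs": [{"GroupId": client_sg_id}],
        })
    return permissions


rules = client_ingress_permissions("sg-clients", ["tls", "iam"])
# With boto3 and credentials (cluster security group ID is a placeholder):
# boto3.client("ec2").authorize_security_group_ingress(GroupId="sg-msk-cluster", IpPermissions=rules)
print([r["FromPort"] for r in rules])
```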

Authentication & Authorization:

    • Define who can read/write to which topics
    • Mutual TLS (AuthN) + Kafka ACLs (AuthZ)
    • SASL/SCRAM (AuthN) + Kafka ACLs (AuthZ)
    • IAM Access Control (AuthN + AuthZ)
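On the client side, each authentication option becomes a different set of Kafka client properties. The Python sketch below generates them; the helper names, keystore path, bootstrap address, and credentials are placeholders. The IAM settings assume the `aws-msk-iam-auth` library is on the client's classpath, and SCRAM on MSK uses `SCRAM-SHA-512`.

```python
from typing import Dict


def client_properties(mode: str, bootstrap_servers: str) -> Dict[str, str]:
    """Hypothetical helper returning Kafka client properties for one auth mode."""
    props = {"bootstrap.servers": bootstrap_servers}
    if mode == "mtls":
        # Mutual TLS: the client presents a certificate; Kafka ACLs authorize it.
        props.update({
            "security.protocol": "SSL",
            "ssl.keystore.location": "/tmp/kafka.client.keystore.jks",  # placeholder path
            "ssl.keystore.password": "<keystore-password>",
            "ssl.key.password": "<key-password>",
        })
    elif mode == "sasl_scram":
        # SASL/SCRAM: username/password authentication; Kafka ACLs authorize it.
        props.update({
            "security.protocol": "SASL_SSL",
            "sasl.mechanism": "SCRAM-SHA-512",
            "sasl.jaas.config": (
                "org.apache.kafka.common.security.scram.ScramLoginModule required "
                'username="<user>" password="<password>";'
            ),
        })
    elif mode == "iam":
        # IAM: IAM policies handle both authentication and authorization.
        props.update({
            "security.protocol": "SASL_SSL",
            "sasl.mechanism": "AWS_MSK_IAM",
            "sasl.jaas.config": "software.amazon.msk.auth.iam.IAMLoginModule required;",
            "sasl.client.callback.handler.class":
                "software.amazon.msk.auth.iam.IAMClientCallbackHandler",
        })
    else:
        raise ValueError(f"unknown auth mode: {mode}")
    return props


def to_properties_file(props: Dict[str, str]) -> str:
    """Render the properties in the key=value format Kafka CLI tools read."""
    return "\n".join(f"{key}={value}" for key, value in props.items()) + "\n"


print(to_properties_file(client_properties("iam", "b-1.demo.example:9098")))
```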

MSK – Monitoring 

CloudWatch Metrics:

    • Basic monitoring (cluster and broker metrics)
    • Enhanced monitoring (adds enhanced broker metrics)
    • Topic-level monitoring (adds enhanced topic-level metrics)
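The three CloudWatch tiers correspond to values of the `EnhancedMonitoring` setting in the MSK API. This Python sketch maps the tiers to those values and, optionally, applies one through boto3's `update_monitoring`, which needs the cluster's current version; the tier names and helper functions are hypothetical.

```python
# Hypothetical tier names mapped to the MSK API's EnhancedMonitoring values.
MONITORING_LEVELS = {
    "basic": "DEFAULT",
    "enhanced_broker": "PER_BROKER",
    "topic_level": "PER_TOPIC_PER_BROKER",
}


def enhanced_monitoring_value(tier: str) -> str:
    """Translate a tier name into the API value, rejecting unknown tiers."""
    try:
        return MONITORING_LEVELS[tier]
    except KeyError:
        raise ValueError(f"unknown tier {tier!r}; expected one of {list(MONITORING_LEVELS)}")


def set_monitoring_level(cluster_arn: str, tier: str) -> None:
    """Apply the tier to an existing cluster (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the mapping works without boto3

    kafka = boto3.client("kafka")
    # Cluster updates must reference the cluster's current version.
    current = kafka.describe_cluster(ClusterArn=cluster_arn)["ClusterInfo"]["CurrentVersion"]
    kafka.update_monitoring(
        ClusterArn=cluster_arn,
        CurrentVersion=current,
        EnhancedMonitoring=enhanced_monitoring_value(tier),
    )


print(enhanced_monitoring_value("topic_level"))
```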

Prometheus (Open-Source Monitoring):

    • Opens ports on the brokers to export cluster, broker, and topic-level metrics
    • Set up the JMX Exporter (JMX metrics) or the Node Exporter (CPU and disk metrics)
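With open monitoring enabled, each broker serves the JMX Exporter on port 11001 and the Node Exporter on port 11002, and a Prometheus server scrapes those endpoints. The Python sketch below builds the `OpenMonitoring` block for the MSK API and a list of scrape targets; the broker host names are placeholders (in practice they could be read from the MSK `ListNodes` API), and the cluster's security group must allow the Prometheus host to reach both ports.

```python
from typing import Any, Dict, List

JMX_EXPORTER_PORT = 11001   # Kafka/JMX metrics
NODE_EXPORTER_PORT = 11002  # host CPU and disk metrics


def open_monitoring_config(jmx: bool = True, node: bool = True) -> Dict[str, Any]:
    """Build the OpenMonitoring block that enables the exporters on each broker."""
    return {
        "Prometheus": {
            "JmxExporter": {"EnabledInBroker": jmx},
            "NodeExporter": {"EnabledInBroker": node},
        }
    }


def prometheus_targets(broker_hosts: List[str], jmx: bool = True, node: bool = True) -> List[str]:
    """Hypothetical helper listing host:port targets for a Prometheus scrape config."""
    targets = []
    for host in broker_hosts:
        if jmx:
            targets.append(f"{host}:{JMX_EXPORTER_PORT}")
        if node:
            targets.append(f"{host}:{NODE_EXPORTER_PORT}")
    return targets


print(prometheus_targets(["b-1.demo.example", "b-2.demo.example"]))
```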

Broker Log Delivery:

    • Delivery to CloudWatch Logs
    • Delivery to Amazon S3
    • Delivery to Amazon Kinesis Data Firehose
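In the MSK API, broker log destinations are configured in a `LoggingInfo` block whose `BrokerLogs` entry has one section each for CloudWatch Logs, S3, and Kinesis Data Firehose. This Python sketch enables whichever destinations you pass; the helper name, log group, bucket, and stream names are placeholders.

```python
from typing import Any, Dict, Optional


def build_logging_info(
    log_group: Optional[str] = None,
    bucket: Optional[str] = None,
    s3_prefix: Optional[str] = None,
    firehose_stream: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: a destination is enabled only when its target is given."""
    broker_logs: Dict[str, Dict[str, Any]] = {
        "CloudWatchLogs": {"Enabled": log_group is not None},
        "S3": {"Enabled": bucket is not None},
        "Firehose": {"Enabled": firehose_stream is not None},
    }
    if log_group is not None:
        broker_logs["CloudWatchLogs"]["LogGroup"] = log_group
    if bucket is not None:
        broker_logs["S3"]["Bucket"] = bucket
        if s3_prefix:
            broker_logs["S3"]["Prefix"] = s3_prefix  # optional key prefix for log objects
    if firehose_stream is not None:
        broker_logs["Firehose"]["DeliveryStream"] = firehose_stream
    return {"BrokerLogs": broker_logs}


print(build_logging_info(log_group="/msk/demo", bucket="demo-msk-logs"))
```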

MSK Serverless 

  • Run Apache Kafka on MSK without managing the capacity
  • MSK automatically provisions resources and scales compute & storage
  • You only define your topics and partitions; MSK takes care of the rest
  • Security: IAM Access Control is used for all clusters
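Because MSK Serverless manages capacity, its creation request has no broker count, instance type, or volume size: only networking and authentication. This Python sketch builds a request for boto3's `kafka.create_cluster_v2`, with IAM access control enabled as Serverless requires; the helper name, subnet IDs, and security group IDs are placeholders. Topics and partitions are still created afterwards with standard Kafka clients or admin tools.

```python
from typing import Any, Dict, List


def build_serverless_request(
    cluster_name: str,
    subnet_ids: List[str],
    security_group_ids: List[str],
) -> Dict[str, Any]:
    """Hypothetical helper that assembles a CreateClusterV2 request for MSK Serverless."""
    if not subnet_ids:
        raise ValueError("at least one subnet is required")
    return {
        "ClusterName": cluster_name,
        "Serverless": {
            # Where the cluster's client endpoints are placed inside your VPC.
            "VpcConfigs": [
                {"SubnetIds": subnet_ids, "SecurityGroupIds": security_group_ids}
            ],
            # MSK Serverless authenticates and authorizes clients with IAM.
            "ClientAuthentication": {"Sasl": {"Iam": {"Enabled": True}}},
        },
    }


request = build_serverless_request("demo-serverless", ["subnet-a", "subnet-b"], ["sg-clients"])
# With boto3 and credentials: boto3.client("kafka").create_cluster_v2(**request)
print(request["Serverless"]["ClientAuthentication"])
```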

Kinesis Data Streams vs Amazon MSK