How Kinesis Data Streams Works


Kinesis Data Streams

Kinesis Data Streams is a serverless streaming data service that makes it easy to capture, process, and store data streams at any scale.

Characteristics of Kinesis Data Streams include, but are not limited to:

  • Retention from 1 day (the default) up to 365 days
  • Ability to reprocess (replay) data
  • Once data is inserted into Kinesis, it can't be deleted; it is removed only when the retention period expires (immutability)
  • Records that share the same partition key go to the same shard (ordering)
  • Producers: AWS SDK, Kinesis Producer Library (KPL), Kinesis Agent
  • Consumers:
      • Write your own: Kinesis Client Library (KCL), AWS SDK
      • Managed: AWS Lambda, Kinesis Data Firehose, Kinesis Data Analytics
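To make the ordering point concrete: Kinesis hashes each partition key with MD5 and sends the record to the shard whose hash key range contains the result. The sketch below approximates that routing locally. It assumes the shards split the hash space evenly, looks only at the top 32 bits of the hash, and needs md5sum (present on Linux and in CloudShell). shard_for_key is a hypothetical helper, not part of the AWS CLI.

```bash
# Approximate which shard a partition key maps to.
# Assumptions: evenly split hash key ranges; only the top 32 bits of the MD5 hash are used.
shard_for_key() {
  # $1 = partition key, $2 = number of shards
  top32=$(printf '%s' "$1" | md5sum | cut -c1-8)
  # Scale the 32-bit value into the range 0 .. (shards - 1)
  echo $(( 0x$top32 * $2 >> 32 ))
}

# Example: shard_for_key user1 4
# The same key always yields the same index, which is why per-key ordering holds.
```

The real shard boundaries come from describe-stream (the HashKeyRange of each shard), so treat the index printed here as an illustration rather than the actual shard ID.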

Kinesis Data Streams – Capacity Modes

Provisioned mode:

  • You choose the number of shards and scale manually or through the API
  • Each shard gets 1 MB/s in (or 1000 records per second)
  • Each shard gets 2 MB/s out (shared by all classic consumers, or per consumer with enhanced fan-out)
  • You pay per shard provisioned per hour
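The per-shard limits above give a quick way to size a provisioned stream: take the largest of the three requirements. This is a rough sketch, assuming integer inputs in KB/s and records/s, treating 1 MB as 1024 KB, and a single classic consumer on the read side. shards_needed is a hypothetical helper name.

```bash
# Estimate how many shards a provisioned stream needs.
# Inputs: $1 = ingest KB/s, $2 = records/s, $3 = egress KB/s (all integers).
shards_needed() {
  by_in=$(( ($1 + 1023) / 1024 ))       # 1 MB/s in per shard, rounded up
  by_records=$(( ($2 + 999) / 1000 ))   # 1000 records/s per shard, rounded up
  by_out=$(( ($3 + 2047) / 2048 ))      # 2 MB/s out per shard, rounded up
  max=$by_in
  if [ "$by_records" -gt "$max" ]; then max=$by_records; fi
  if [ "$by_out" -gt "$max" ]; then max=$by_out; fi
  echo "$max"
}

# Example: shards_needed 3000 2500 4000   # prints 3
```

In the example, ingest needs 3 shards, the record rate needs 3, and egress needs 2, so the stream needs 3.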

On-demand mode: 

  • No need to provision or manage the capacity
  • Default capacity provisioned (4 MB/s in or 4000 records per second)
  • Scales automatically based on the observed throughput peak during the last 30 days
  • Pay per stream per hour & data in/out per GB

Kinesis Data Streams Security 

  • Control access/authorization using IAM policies
  • Encryption in flight using HTTPS endpoints
  • Encryption at rest using KMS
  • You can implement encryption/decryption of data on the client side (harder to implement)
  • VPC endpoints are available so that Kinesis can be accessed from within a VPC
  • Monitor API calls using CloudTrail 
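As an illustration of the IAM point, here is a minimal sketch of a policy that lets a producer write to one stream and nothing else. The function name producer_policy, the region, the account ID, and the stream name in the example are all placeholders; substitute your own values.

```bash
# Print a least-privilege IAM policy for a producer that only writes to one stream.
producer_policy() {
  # $1 = region, $2 = account ID, $3 = stream name
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["kinesis:PutRecord", "kinesis:PutRecords"],
      "Resource": "arn:aws:kinesis:$1:$2:stream/$3"
    }
  ]
}
EOF
}

# Example: producer_policy us-east-1 111122223333 DemoStream > producer-policy.json
```

A consumer would need read actions such as kinesis:GetRecords and kinesis:GetShardIterator instead.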

Navigate to the AWS Console, search for Kinesis, and select Kinesis Data Streams.

Fill in the stream name.

Select one shard for demo purposes; you can provision as many shards as you need.

I'm using CloudShell to run the commands below.
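If you prefer the command line to the console, the stream can be created from CloudShell as well. This is a small sketch, assuming your credentials allow kinesis:CreateStream; create_demo_stream is a hypothetical wrapper around two real AWS CLI commands.

```bash
# Create a provisioned stream and wait until it is ACTIVE.
create_demo_stream() {
  # $1 = stream name, $2 = shard count
  aws kinesis create-stream --stream-name "$1" --shard-count "$2" &&
  aws kinesis wait stream-exists --stream-name "$1"
}

# Example: create_demo_stream DemoStream 1
```

The wait command polls the stream status, so the function returns only once the stream is ready to accept records.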

For Producer

aws kinesis put-record --stream-name DemoStream --partition-key user1 --data "user signup" --cli-binary-format raw-in-base64-out

Replace DemoStream with your own stream name, then press Enter.

The command returns the ShardId and SequenceNumber of the stored record.
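To see the partition key at work, you can send a few records in a row and compare the ShardId in each response. This is a sketch only; put_demo_records is a hypothetical helper and the "key=payload" argument format is my own convention, not something the CLI defines.

```bash
# Send several records; records that share a partition key land on the same shard.
put_demo_records() {
  # $1 = stream name; remaining arguments are "partitionKey=payload" pairs
  stream=$1
  shift
  for pair in "$@"; do
    aws kinesis put-record --stream-name "$stream" \
      --partition-key "${pair%%=*}" --data "${pair#*=}" \
      --cli-binary-format raw-in-base64-out
  done
}

# Example:
# put_demo_records DemoStream "user1=user signup" "user1=user login" "user2=user signup"
```

With a single shard every record returns the same ShardId; with more shards, both user1 records should still share one.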

For Consumer

# Describe the stream

aws kinesis describe-stream --stream-name DemoStream

# Consume some data

aws kinesis get-shard-iterator --stream-name DemoStream --shard-id shardId-000000000000 --shard-iterator-type TRIM_HORIZON

# Replace <ShardIterator> with the value returned by get-shard-iterator

aws kinesis get-records --shard-iterator <ShardIterator>
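The two consumer steps can be chained so the iterator does not have to be copied by hand. The sketch below uses the CLI's --query and --output options to capture the iterator. Because get-records returns each record's Data field base64-encoded, it also includes a small decoder. read_shard and decode_data are hypothetical helper names.

```bash
# Decode the base64 Data field that get-records returns.
decode_data() {
  printf '%s' "$1" | base64 --decode
}

# Read a shard from its oldest available record and print the get-records response.
read_shard() {
  # $1 = stream name, $2 = shard ID
  iterator=$(aws kinesis get-shard-iterator --stream-name "$1" --shard-id "$2" \
    --shard-iterator-type TRIM_HORIZON --query 'ShardIterator' --output text)
  aws kinesis get-records --shard-iterator "$iterator"
}

# Example: read_shard DemoStream shardId-000000000000
# Example: decode_data "dXNlciBzaWdudXA="   # prints: user signup
```

The response also contains a NextShardIterator, which you would pass to the next get-records call to keep reading.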