Kinesis Producers
A producer for Amazon Kinesis Data Streams is an application that feeds user data records into a Kinesis data stream (also called data ingestion). The Kinesis Producer Library (KPL) makes it easier to construct producer applications by allowing developers to achieve high write throughput to a Kinesis data stream.
There are several ways to stream data into Amazon Kinesis Data Streams:
- Kinesis SDK
- Kinesis Producer Library (KPL)
- Kinesis Agent
Third-party libraries and integrations include:
Spark, Log4J Appenders, Flume, Kafka Connect, NiFi
Kinesis Producer SDK - PutRecord(s)
- The SDK uses the PutRecord (one record) and PutRecords (many records) APIs.
- PutRecords batches several records into one request, which means fewer HTTP calls and higher throughput.
- AWS Mobile SDKs are also available: Android, iOS, etc.
- Managed AWS sources for Kinesis Data Streams: AWS IoT, CloudWatch Logs, Kinesis Data Analytics
Use cases:
low-throughput workloads, workloads that tolerate higher latency, cases where a simple API is preferred, producers running in AWS Lambda
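
As a rough illustration of the PutRecords batching described above, the sketch below splits a list of records into requests of at most 500 records (the per-call limit of PutRecords) and collects any entries the service rejected. The `client` argument is an assumption: any object exposing a `put_records(StreamName=..., Records=...)` method shaped like the boto3 Kinesis client would do. The helper names `chunked` and `put_in_batches` are hypothetical.

```python
from typing import Any, Dict, Iterator, List

MAX_RECORDS_PER_CALL = 500  # PutRecords accepts at most 500 records per request


def chunked(records: List[Dict[str, Any]],
            size: int = MAX_RECORDS_PER_CALL) -> Iterator[List[Dict[str, Any]]]:
    """Yield consecutive slices of at most `size` records."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


def put_in_batches(client: Any, stream_name: str,
                   records: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Send records with PutRecords and return the ones that were rejected.

    `client` is assumed to behave like a boto3 Kinesis client; each record
    is a dict such as {"Data": b"...", "PartitionKey": "..."}.
    """
    failed: List[Dict[str, Any]] = []
    for batch in chunked(records):
        response = client.put_records(StreamName=stream_name, Records=batch)
        # PutRecords can partially fail: result entries carrying an
        # ErrorCode correspond to records that were not stored.
        for record, result in zip(batch, response["Records"]):
            if "ErrorCode" in result:
                failed.append(record)
    return failed
```

The returned list can be fed back into the same function after a delay, which is how partial failures are usually handled.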
Kinesis Producer Library (KPL)
- Easy to use and highly configurable C++/Java library
- Used for building high-performance, long-running producers
- Automated and configurable retry mechanism
- Synchronous or Asynchronous APIs (better performance for async)
- Submits metrics to CloudWatch for monitoring
- Batching (both mechanisms turned on by default) increases throughput and decreases cost:
- Collect: write records bound for multiple shards in the same PutRecords API call
- Aggregate: store multiple user records in a single Kinesis record, at the cost of increased latency
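
To make the idea of aggregation concrete, here is a minimal sketch that packs several small user records into as few blobs as possible. It is not the KPL's actual wire format: the 4-byte length-prefix framing, the function names, and the 1 MiB default for `max_bytes` are all assumptions made for illustration.

```python
import struct
from typing import Iterable, List


def aggregate(user_records: Iterable[bytes],
              max_bytes: int = 1024 * 1024) -> List[bytes]:
    """Pack user records into blobs of at most `max_bytes` each.

    Each user record is framed as a 4-byte big-endian length followed by
    its payload. This framing is illustrative only, not the KPL format.
    """
    blobs: List[bytes] = []
    current = bytearray()
    for payload in user_records:
        framed = struct.pack(">I", len(payload)) + payload
        if len(framed) > max_bytes:
            raise ValueError("single record exceeds max_bytes")
        # Start a new blob when the next record would not fit.
        if current and len(current) + len(framed) > max_bytes:
            blobs.append(bytes(current))
            current = bytearray()
        current += framed
    if current:
        blobs.append(bytes(current))
    return blobs


def deaggregate(blob: bytes) -> List[bytes]:
    """Reverse aggregate() for one blob, as a consumer would have to."""
    records: List[bytes] = []
    offset = 0
    while offset < len(blob):
        (length,) = struct.unpack_from(">I", blob, offset)
        offset += 4
        records.append(blob[offset:offset + length])
        offset += length
    return records
```

The point of the sketch is the trade-off: fewer, larger records use each shard's limits more efficiently, but a record has to wait for its blob to fill (or time out) before it is sent, and the consumer must unpack it.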
Kinesis Producer Library (KPL) Batching
Batching efficiency can be tuned by introducing some delay with RecordMaxBufferedTime (default 100 ms): the longer records are allowed to wait in the buffer, the fuller each batch becomes.
NOTE: When not to use the Kinesis Producer Library
- The KPL can incur an additional processing delay of up to RecordMaxBufferedTime within the library (user-configurable)
- Larger values of RecordMaxBufferedTime result in higher packing efficiencies and better performance
- Applications that cannot tolerate this additional delay may need to use the AWS SDK directly
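
The latency trade-off above can be sketched with a small buffer that flushes when a batch is full or when its oldest record has waited too long. This is a simplified model, not the KPL's implementation: the class name `BufferedSender`, its parameters, and the injected `clock` are hypothetical, with `max_buffered_seconds` playing the role of RecordMaxBufferedTime.

```python
import time
from typing import Callable, List


class BufferedSender:
    """Buffer records; flush when the batch is full or has waited too long."""

    def __init__(self, send: Callable[[List[bytes]], None],
                 max_records: int = 500,
                 max_buffered_seconds: float = 0.1,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._send = send
        self._max_records = max_records
        self._max_buffered_seconds = max_buffered_seconds
        self._clock = clock
        self._buffer: List[bytes] = []
        self._oldest = 0.0  # arrival time of the oldest buffered record

    def add(self, record: bytes) -> None:
        if not self._buffer:
            self._oldest = self._clock()
        self._buffer.append(record)
        self.poll()

    def poll(self) -> None:
        """Flush if either limit is reached; call periodically from a timer."""
        if not self._buffer:
            return
        waited = self._clock() - self._oldest
        if (len(self._buffer) >= self._max_records
                or waited >= self._max_buffered_seconds):
            self.flush()

    def flush(self) -> None:
        """Send whatever is buffered, if anything."""
        if self._buffer:
            batch, self._buffer = self._buffer, []
            self._send(batch)
```

A larger `max_buffered_seconds` gives each batch more time to fill, which is the packing-efficiency gain described above, while every record may wait up to that long before it leaves the producer.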
Kinesis Agent
- Monitors log files and sends them to Kinesis Data Streams
- Java-based agent, built on top of KPL
- Installs in Linux-based server environments
Features:
- Write from multiple directories and write to multiple streams
- Routing feature based on directory/log file
- Pre-process data before sending to streams (single line, CSV to JSON, log to JSON)
- The agent handles file rotation, checkpointing, and retry upon failures
- Emits metrics to CloudWatch for monitoring
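
The routing and pre-processing features above are driven by the agent's JSON configuration file. The sketch below builds such a configuration in Python and serializes it. The key names (`flows`, `filePattern`, `kinesisStream`, `dataProcessingOptions`, `optionName`) follow the agent's documented format as best recalled and should be checked against the current documentation; the file paths, stream names, and field names are hypothetical.

```python
import json
from typing import Any, Dict


def build_agent_config() -> Dict[str, Any]:
    """Return an example agent configuration with two flows.

    Each flow routes files matching one pattern to one stream and
    optionally converts each line before sending it.
    """
    return {
        "cloudwatch.emitMetrics": True,
        "flows": [
            {
                # Hypothetical CSV files, converted to JSON objects.
                "filePattern": "/var/log/app/orders*.csv",
                "kinesisStream": "orders-stream",
                "dataProcessingOptions": [
                    {
                        "optionName": "CSVTOJSON",
                        "customFieldNames": ["order_id", "amount"],
                    }
                ],
            },
            {
                # Hypothetical web server logs, parsed into JSON.
                "filePattern": "/var/log/httpd/access_log*",
                "kinesisStream": "weblogs-stream",
                "dataProcessingOptions": [
                    {
                        "optionName": "LOGTOJSON",
                        "logFormat": "COMMONAPACHELOG",
                    }
                ],
            },
        ],
    }


if __name__ == "__main__":
    # Print the JSON that would be written to the agent's config file.
    print(json.dumps(build_agent_config(), indent=2))
```

Generating the file from code is only one option; the same JSON can be written by hand.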
AWS Kinesis API - Exceptions
ProvisionedThroughputExceededException
- Happens when a producer sends more data than a shard allows (exceeding the MB/s or TPS limit of any shard)
- Make sure you don't have a hot shard (for example, a poorly chosen partition key that sends too much data to one shard)
Solution:
- Retry with backoff
- Increase the number of shards (scaling)
- Ensure your partition key distributes data evenly
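
Retrying with backoff can be sketched as follows. This is a minimal example under stated assumptions, not an official client feature: `put_batch` is a hypothetical callable that sends a list of records and returns the ones that were rejected, and the base delay, cap, and jitter strategy are illustrative choices.

```python
import random
import time
from typing import Callable, List, Sequence


def backoff_delays(attempts: int, base: float = 0.1,
                   cap: float = 5.0) -> List[float]:
    """Upper bound for each retry's sleep: base * 2**n, capped at `cap`."""
    return [min(cap, base * (2 ** n)) for n in range(attempts)]


def put_with_retries(put_batch: Callable[[list], list],
                     records: Sequence,
                     max_attempts: int = 5,
                     sleep: Callable[[float], None] = time.sleep,
                     rng: Callable[[], float] = random.random) -> list:
    """Resend rejected records with exponential backoff and jitter.

    Returns the records still rejected after `max_attempts` tries
    (an empty list on success).
    """
    pending = list(records)
    delays = backoff_delays(max_attempts - 1)
    for attempt in range(max_attempts):
        pending = put_batch(pending)
        if not pending:
            break
        if attempt < len(delays):
            # Sleep a random fraction of the bound so that many producers
            # do not retry at the same instant.
            sleep(rng() * delays[attempt])
    return pending
```

Backoff only smooths over short bursts. If the exception persists, the shard count or the partition key is the thing to change.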