Kinesis Producers

2022.08.23

A producer for Amazon Kinesis Data Streams is an application that feeds user data records into a Kinesis data stream (also called data ingestion). The Kinesis Producer Library (KPL) makes it easier to construct producer applications by allowing developers to achieve high write throughput to a Kinesis data stream.

There are several ways to stream data into Amazon Kinesis Data Streams:

  • Kinesis SDK 
  • Kinesis Producer Library (KPL) 
  • Kinesis Agent 

Third-party libraries and connectors include:

Spark, Log4J Appenders, Flume, Kafka Connect, NiFi

Kinesis Producer SDK - PutRecord(s)

  • Uses the PutRecord (one record) and PutRecords (many records) APIs.
  • PutRecords batches records into one request, which improves throughput because fewer HTTP calls are made.
  • Also available through the AWS Mobile SDKs: Android, iOS, etc.
  • Managed AWS sources for Kinesis Data Streams:
    • AWS IoT
    • CloudWatch Logs
    • Kinesis Data Analytics

Use cases:

Low throughput, tolerance for higher latency, a simple API, and producers running in AWS Lambda.
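
The difference between one call per record and batched PutRecords calls can be sketched as follows. This is a minimal illustration, not a production producer: the helper names `chunked` and `put_in_batches` are hypothetical, the `client` object is assumed to expose a boto3-style `put_records(StreamName=..., Records=...)` method, and the limit of 500 records per call is an assumption based on the documented PutRecords quota at the time of writing.

```python
from typing import Any, Dict, Iterable, Iterator, List, Tuple

# Assumed PutRecords quota: at most 500 records per call.
MAX_RECORDS_PER_CALL = 500


def chunked(items: List[Any], size: int) -> Iterator[List[Any]]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def put_in_batches(client: Any, stream_name: str,
                   records: Iterable[Tuple[str, bytes]]) -> int:
    """Send (partition_key, data) pairs via PutRecords.

    Returns the total number of records the service reported as failed.
    """
    entries: List[Dict[str, Any]] = [
        {"PartitionKey": key, "Data": data} for key, data in records
    ]
    failed = 0
    for batch in chunked(entries, MAX_RECORDS_PER_CALL):
        # One HTTP call carries up to MAX_RECORDS_PER_CALL records.
        response = client.put_records(StreamName=stream_name, Records=batch)
        # PutRecords is not all-or-nothing: individual records can fail.
        failed += response.get("FailedRecordCount", 0)
    return failed
```

With boto3, `client` would typically be `boto3.client("kinesis")`. Sending 1,201 records this way takes three calls instead of 1,201 PutRecord calls.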

Kinesis Producer Library (KPL) 

  • Easy to use and highly configurable C++/Java library
  • Used for building high-performance, long-running producers
  • Automated and configurable retry mechanism
  • Synchronous or Asynchronous APIs (better performance for async)
  • Submits metrics to CloudWatch for monitoring 
  • Batching (both mechanisms are turned on by default) increases throughput and decreases cost:
    • Collect: groups records and writes to multiple shards in the same PutRecords API call
    • Aggregate: packs multiple user records into a single Kinesis record, at the cost of increased latency

Kinesis Producer Library (KPL) Batching

Batching efficiency can be tuned by introducing some delay with RecordMaxBufferedTime (default 100 ms): the longer records are buffered, the more of them can be packed into each request.

NOTE: When not to use the Kinesis Producer Library 

  • The KPL can incur an additional processing delay of up to RecordMaxBufferedTime within the library (user-configurable)
  • Larger values of RecordMaxBufferedTime result in higher packing efficiencies and better performance
  • Applications that cannot tolerate this additional delay may need to use the AWS SDK directly
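
The trade-off between buffering delay and packing efficiency can be illustrated with a toy buffer. This is only a sketch of the idea, not how the KPL is implemented: the class `BufferedProducer`, its `flush` callback, and the injectable `clock` are all hypothetical names introduced for illustration, and `max_buffered_ms` merely plays the role of RecordMaxBufferedTime.

```python
import time
from typing import Callable, List


class BufferedProducer:
    """Toy buffer that flushes once the oldest record has waited long enough."""

    def __init__(self, flush: Callable[[List[bytes]], None],
                 max_buffered_ms: float = 100.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._flush = flush                      # called with each full batch
        self._max_wait_s = max_buffered_ms / 1000.0
        self._clock = clock                      # injectable for testing
        self._buffer: List[bytes] = []
        self._oldest = 0.0                       # arrival time of oldest record

    def add(self, record: bytes) -> None:
        """Buffer a record, then flush if the deadline has passed."""
        if not self._buffer:
            self._oldest = self._clock()
        self._buffer.append(record)
        self.poll()

    def poll(self) -> None:
        """Flush the buffer if its oldest record exceeded the maximum wait."""
        if self._buffer and self._clock() - self._oldest >= self._max_wait_s:
            batch, self._buffer = self._buffer, []
            self._flush(batch)
```

A larger `max_buffered_ms` lets more records accumulate per flush, while every record may wait up to that long before it is sent, which is the delay the note above warns about.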

Kinesis Agent 

  • Monitors log files and sends them to Kinesis Data Streams
  • Java-based agent, built on top of the KPL
  • Installed in Linux-based server environments

Features: 

  • Reads from multiple directories and writes to multiple streams
  • Routing feature based on directory/log file
  • Pre-process data before sending to streams (single line, CSV to JSON, log to JSON)
  • The agent handles file rotation, checkpointing, and retry upon failures
  • Emits metrics to CloudWatch for monitoring 
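
To make the pre-processing feature concrete, here is roughly what a CSV-to-JSON conversion of a single log line amounts to. This is an illustrative sketch in Python, not the agent's own code or configuration: the function `csv_line_to_json` and the field names used in the usage note are hypothetical.

```python
import csv
import json
from typing import List


def csv_line_to_json(line: str, field_names: List[str],
                     delimiter: str = ",") -> str:
    """Convert one delimited log line into a JSON object string."""
    # csv.reader handles quoted fields that contain the delimiter.
    values = next(csv.reader([line], delimiter=delimiter))
    if len(values) != len(field_names):
        raise ValueError(
            "expected %d fields, got %d" % (len(field_names), len(values))
        )
    return json.dumps(dict(zip(field_names, values)))
```

For example, `csv_line_to_json("2022-08-23,INFO,started", ["date", "level", "message"])` yields a JSON object with the keys `date`, `level`, and `message`.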

AWS Kinesis API - Exceptions 

Provisioned Throughput Exceeded Exceptions 

  • Occurs when you send more data than a shard allows (exceeding the MB/s or TPS limit of any shard)
  • Make sure you don't have a hot shard (for example, a poorly chosen partition key that sends too much data to a single shard)

Solution:

  • Retry with backoff
  • Increase the number of shards (scaling)
  • Ensure your partition key distributes data evenly
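
Retrying with backoff can be sketched as below. This is a minimal example under stated assumptions, not a definitive implementation: `put_with_backoff` is a hypothetical helper, `client` is assumed to expose a boto3-style `put_records` method whose response lists per-record results in request order (failed ones carrying an `ErrorCode`), and the delay values are arbitrary. It only handles per-record failures; exceptions raised for the whole call are not caught here.

```python
import random
import time
from typing import Any, Callable, Dict, List


def put_with_backoff(client: Any, stream_name: str,
                     entries: List[Dict[str, Any]],
                     max_attempts: int = 5,
                     base_delay_s: float = 0.1,
                     sleep: Callable[[float], None] = time.sleep
                     ) -> List[Dict[str, Any]]:
    """Call PutRecords, resending only the failed entries.

    Returns the entries that still failed after `max_attempts` attempts.
    """
    pending = entries
    for attempt in range(max_attempts):
        response = client.put_records(StreamName=stream_name, Records=pending)
        # Keep only entries whose result carries an ErrorCode
        # (for example ProvisionedThroughputExceededException).
        pending = [
            entry
            for entry, result in zip(pending, response["Records"])
            if "ErrorCode" in result
        ]
        if not pending:
            return []
        if attempt < max_attempts - 1:
            # Exponential backoff with full jitter before the next attempt.
            sleep(random.uniform(0, base_delay_s * 2 ** attempt))
    return pending
```

Resending only the failed entries avoids writing duplicates of records that already succeeded, and the jitter keeps many producers from retrying in lockstep. If the same shard keeps throttling, backoff alone will not help and the partition key or shard count needs attention.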