I thought about OpenSearch configuration patterns
Recently, we experienced an issue with our OpenSearch Service used for business operations.
We couldn't access the Dashboard, and when checking the logs, there was an endless stream of Java error messages.
Upon investigation, we found that the t3.small instances being used for data nodes had become unstable, resulting in a loss of quorum between these two nodes.
Since the usage was limited to internal company purposes, the impact wasn't too severe. However, during our investigation, we discovered that our current configuration of two data nodes (using t3.small instances) is not recommended for production environments.
Since we had been using the existing setup somewhat mindlessly, I'd like to take this opportunity to consider various configuration patterns for OpenSearch from the perspective of instance types and server count.
※This article focuses on OpenSearch version 2.13.
- Reference documents
- AWS OpenSearch Service Best Practices
- Amazon OpenSearch Service Quotas
- Cost Optimization
- Amazon OpenSearch Service Pricing
- How do I improve the fault tolerance of my Amazon OpenSearch Service domain?
- Amazon OpenSearch Service Dedicated Master Nodes
- Four Common Misunderstandings About Operating Elasticsearch
Instance Type Selection
First, let's look at instance type selection.
The documentation recommends r6g.large for small production workloads (both as data nodes and dedicated master nodes).
I've listed relatively low-cost instance types in this range:
- Instance usage fees (ap-northeast-1) ※As of September 2024
| Instance Type | vCPU | Memory (GiB) | Hourly Cost (USD) | Monthly Cost (USD) |
|---|---|---|---|---|
| t3.small.search | 2 | 2 | 0.056 | 40.8 |
| t3.medium.search | 2 | 4 | 0.112 | 81.6 |
| r6g.large.search | 2 | 16 | 0.202 | 147.6 |
| r6g.xlarge.search | 4 | 32 | 0.404 | 295.2 |
| r7g.medium.search | 1 | 8 | 0.107 | 78.12 |
| r7g.large.search | 2 | 16 | 0.214 | 156.24 |
| r7g.xlarge.search | 4 | 32 | 0.429 | 312.48 |
| m6g.large.search | 2 | 8 | 0.164 | 119.52 |
| m7g.large.search | 2 | 8 | 0.175 | 127.68 |
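As a quick sanity check on the monthly figures, here is a small Python helper. It assumes a ~730-hour month (24 × 365 ÷ 12); the table's own figures appear to use slightly different rounding, so treat the results as estimates.

```python
# Rough monthly-cost helper for the table above.
# Assumes ~730 hours per month; the article's figures round slightly
# differently, so results are estimates, not billing-exact numbers.
HOURS_PER_MONTH = 730

def monthly_cost(hourly_usd: float) -> float:
    """Estimate the monthly on-demand cost from an hourly rate."""
    return round(hourly_usd * HOURS_PER_MONTH, 2)

# r6g.large.search at $0.202/hour:
print(monthly_cost(0.202))  # ≈ 147.46, close to the table's 147.6
# t3.small.search at $0.056/hour:
print(monthly_cost(0.056))  # ≈ 40.88, close to the table's 40.8
```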
- T-series instances
Burstable T-series instances (such as T2 or T3) can handle temporary load spikes but become unstable under sustained load, so they should be avoided in production environments.
Conversely, for development environments, T-series instances may be chosen to keep costs down.
- r6g.xxx.search series instances
OpenSearch Service is constantly adopting new Amazon EC2 instance types that provide better performance at a lower cost. We recommend always using the latest generation instances.
Don't use T2 or t3.small instances for production domains because they can become unstable under sustained heavy loads. r6g.large instances are an option for small production workloads (both as data nodes and dedicated master nodes).
For stability, it seems safest to use r6g.large.search in production environments.
If higher specifications are needed, r6g.xlarge.search would be the next best choice.
- r7g.xxx.search series instances
The r7g series, which is the next generation after the recommended r6g.large, includes the r7g.medium which is less expensive despite having somewhat lower specifications.
Since it's not a burstable instance, it offers stability, but the CPU and memory directly impact search and write performance, so caution is needed.
If you can sacrifice some performance, this could be an option to consider when trying to reduce costs.
On the other hand, r7g.large.search is more expensive than the sixth-generation r6g.large.search.
OpenSearch Service is constantly adopting new Amazon EC2 instance types that provide better performance at a lower cost. We recommend always using the latest generation instances.
Although the documentation states this, the pricing may be revised downward in the future. Let's hope for that.
- m6g.xxx.search series instances
According to this document, m6g.large.search is included in the recommendations as a minimum instance type for dedicated master nodes.
It's less expensive compared to the r6g.xxx series, so it might be a viable option.
- Other instance types
There are other instance types like c6g.xxx.search and higher-spec instances, but including them would give us too many options to consider, so I'll omit them for now.
Master Node & Data Node Count Selection
Next, let's look at selecting the number of master and data nodes.
Here we'll consider:
- What is the appropriate number of master and data nodes?
- Whether to add dedicated master nodes or not?
The recommended configuration is quite large with 3 dedicated master nodes + 3 data nodes, so I've enumerated some smaller patterns below.
| Pattern | Master Nodes | Data Nodes | Characteristics | Use Case |
|---|---|---|---|---|
| Minimum Configuration | - | 1 | Low fault tolerance | Development/Testing |
| Low Workload Configuration | - | 3 | Can continue with 1 node down | Development or light production |
| Recommended Configuration | 3 | 3 | Cluster stability and redundancy | Production |
| Serverless Configuration | - | - | Automatic provisioning | Production |
※In configurations with data nodes only, the data nodes also serve as master nodes.
- Minimum Configuration
With only one data node, fault tolerance is low, and the entire cluster will stop if the node goes down.
Therefore, this is an option to consider for development or test environments where cost is a priority.
- Low Workload Configuration
In a low workload configuration with 3 data nodes, operations can continue even if one node fails.
However, without dedicated master nodes to focus on cluster management (master election, shard allocation, etc.), there's a risk of unstable performance.
Since the documentation recommends using dedicated master nodes for production workloads, this configuration is suitable for development environments or light production workloads.
- Recommended Configuration
The recommended configuration consists of 3 dedicated master nodes + 3 data nodes, providing high cluster stability and redundancy.
This configuration also allows the use of Multi-AZ with Standby for higher availability.
However, the increased number of instances results in higher costs.
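To make the recommended layout concrete, here is a sketch of the cluster settings as they might be passed to boto3's `create_domain` for the OpenSearch Service client. The domain name and the surrounding call are illustrative only; verify the key names against the boto3 documentation for your SDK version before use.

```python
# Sketch of the ClusterConfig for the recommended 3 masters + 3 data nodes
# layout. This is a plain dict, not a live API call.
recommended_cluster_config = {
    "InstanceType": "r6g.large.search",         # data nodes
    "InstanceCount": 3,
    "DedicatedMasterEnabled": True,
    "DedicatedMasterType": "r6g.large.search",  # dedicated master nodes
    "DedicatedMasterCount": 3,
    "ZoneAwarenessEnabled": True,               # spread nodes across AZs
    "ZoneAwarenessConfig": {"AvailabilityZoneCount": 3},
    "MultiAZWithStandbyEnabled": True,          # Multi-AZ with Standby
}

# The actual call (requires AWS credentials and permissions) would look
# roughly like:
# import boto3
# boto3.client("opensearch").create_domain(
#     DomainName="my-domain", ClusterConfig=recommended_cluster_config)
```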
- Serverless Configuration
For larger-scale operations, OpenSearch Service Serverless can also be considered.
OpenSearch Service Serverless doesn't require provisioning instances in advance and automatically scales, ensuring flexible scalability.
Computing power is measured in OpenSearch Compute Units (OCUs), and you're billed based on the number of OCUs used per hour.
For pricing details, this blog is helpful.
- Minimum price with replicas enabled (2 OCU: 0.5 × 4) ※for production: $488.976/month
- Minimum price with replicas disabled (1 OCU: 0.5 × 2) ※for development/testing: $244.488/month
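As a back-of-envelope check on these minimums, the per-OCU hourly rate below is derived from the $488.976 figure and a ~730-hour month, not taken from the current AWS price list, so treat it as an approximation.

```python
# Back-of-envelope check of the serverless minimum prices above.
# OCU_HOURLY is derived from the article's monthly figure, not from the
# official AWS price list.
HOURS_PER_MONTH = 730
FULL_MIN_MONTHLY = 488.976  # 2-OCU minimum, replicas enabled
OCU_HOURLY = FULL_MIN_MONTHLY / (2 * HOURS_PER_MONTH)  # ≈ $0.335/OCU-hour

def serverless_monthly(ocus: float) -> float:
    """Estimated monthly cost for a given average OCU count."""
    return round(ocus * OCU_HOURLY * HOURS_PER_MONTH, 3)

print(serverless_monthly(2))  # 488.976 (replicas enabled)
print(serverless_monthly(1))  # 244.488 (replicas disabled)
```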
- Other patterns
First, configurations with an even number of nodes, such as 2 data nodes, should be avoided.
This was something I misunderstood as well, but with a configuration of only 2 data nodes, if one data node goes down, the cluster will stop due to quorum loss.
OpenSearch's quorum is calculated as "(number of master-eligible nodes / 2, rounded down) + 1". With only 2 master-eligible nodes the quorum is 2, so losing one node stops the cluster; to keep the cluster running through a single-node failure, at least 3 data nodes are needed.
So, a configuration with only 2 data nodes might seem highly available, but from an availability perspective, it's no different from a configuration with just 1 data node. For more details, refer to this document.
Having an odd number of nodes in the cluster ensures that during a network partition, there will be a group that meets the quorum (majority) requirement and can elect a new master.
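The quorum arithmetic above can be sketched in a few lines of Python:

```python
# Quorum needed for a cluster to keep electing a master:
# floor(master_eligible_nodes / 2) + 1.
def quorum(master_eligible: int) -> int:
    return master_eligible // 2 + 1

def survives_one_failure(master_eligible: int) -> bool:
    """Can the cluster still reach quorum after losing one node?"""
    return master_eligible - 1 >= quorum(master_eligible)

print(quorum(2), survives_one_failure(2))  # 2 False -> 2 nodes: quorum lost
print(quorum(3), survives_one_failure(3))  # 2 True  -> 3 nodes: still fine
```

This is why 2 data nodes buy no more availability than 1: the quorum for 2 nodes is 2, so any single failure halts the cluster.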
When adding dedicated master nodes, you can choose between 3 or 5 master nodes, but a 5-node configuration becomes too expensive, so I've omitted it from this discussion.
Personally Recommended Patterns
For Development Environments
If cost is the primary consideration, for development environments:
- Instance type: t3.small.search
- Data nodes: 1

would be sufficient.
If you want to increase the specifications, selecting t3.medium.search might be a good option.
For Low Workload Environments
Personally, since the recommended configuration is quite costly, if you can sacrifice some performance and stability:
- Instance type: r7g.medium.search
- Data nodes: 3

might be a good starting point.
From there, you can adjust the instance type and node count, or add dedicated master nodes, based on your workload.
T-series burstable instances often become unstable when CPU usage increases during shard reallocation (such as when adding nodes or updating versions), so I recommend avoiding them for production use.
In my experience, identifying the cause and recovering from such issues can take quite some time.
For Production Environments
It's a difficult decision, but starting with the recommended configuration:
- Instance type: r6g.large.search
- Nodes: 3 dedicated master nodes + 3 data nodes

would be the safest approach. (Though it costs quite a bit: $147.6/month × 6 nodes = $885.6/month.)
On the other hand, a serverless configuration (with replicas enabled) would have a minimum usage fee of $488.976/month, which is cheaper than the recommended configuration. However, if you have a stable workload over a long period, the serverless configuration might end up costing more.
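The trade-off can be put in rough numbers. All figures below are the article's, and the break-even point is simple arithmetic on them, not an official AWS number.

```python
# Rough comparison of the two production cost floors discussed above.
provisioned = round(147.6 * 6, 1)   # 3 masters + 3 data, r6g.large.search
serverless_min = 488.976            # 2-OCU serverless minimum (replicas on)
per_ocu_month = serverless_min / 2  # approx. monthly cost of one OCU

# Serverless stays cheaper only while average usage is below this many OCUs:
break_even_ocus = provisioned / per_ocu_month

print(provisioned, round(break_even_ocus, 1))  # 885.6 3.6
```

So the serverless minimum is roughly half the recommended configuration's cost, but a steady workload averaging more than about 3.6 OCUs would tip the balance the other way.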
Also, OpenSearch Serverless doesn't support some OpenSearch API operations and OpenSearch plugins.
If these constraints are acceptable, considering a serverless configuration might be a good idea.
One more point to note is that data migration between an OpenSearch Service domain and an OpenSearch Serverless Collection isn't yet provided by AWS, so if you decide to switch, you'll need to migrate the data yourself.
Conclusion
I found there's quite a lot to consider when selecting instance types and server configurations for OpenSearch.
Especially for production environments, it's important to use an odd number of master-eligible nodes to avoid losing quorum and to introduce dedicated master nodes to keep the cluster stable.
I hope this article will be helpful for future OpenSearch deployments and operations.