I'm new to Elasticsearch and I'm trying to understand the best practices for improving performance in my scenario. I currently have 18 indices, one per environment and location; each index has 1 shard and is < 1 GB in size:
> Examples: `search_books_us`, `search_forum_us`, `search_books_br`, `search_forum_br`, etc.
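For reference, this is roughly how I check the sizes (the `search_*` pattern just matches my naming above):

```
GET _cat/indices/search_*?v&h=index,pri,store.size&s=index
```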
The document structure is basically the same across indices:
{
  "title": "text",
  "description": "text",
  "content": "text",
  "type": "text"
}
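For context, the mapping is essentially the same on every index; here is a minimal sketch (settings and analyzers omitted, and I've shown `type` as `keyword` on the assumption that it's only used for exact term filters):

```
PUT search_books_us
{
  "mappings": {
    "properties": {
      "title":       { "type": "text" },
      "description": { "type": "text" },
      "content":     { "type": "text" },
      "type":        { "type": "keyword" }
    }
  }
}
```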
I want to keep the separation by location (US, PT, etc.), but I'm not sure if it’s better to:
1. Keep the 18 small indices (~1 GB each).
2. Merge related indices, for example:
   * merge each search_books_* index with its search_forum_* counterpart
   * end up with ~9 indices (~2 GB each)
   * distinguish document types with a term filter (see the query sketch after this list):
     "filter": [{ "term": { "type": "FORUM" } }]
3. Merge everything into a single large index (~18 GB) and filter by both type and an additional location field. (I'm not very comfortable with this option because I prefer keeping the location context separate.)
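To make options 2 and 3 concrete, this is the kind of filtered query I have in mind (a sketch; the merged index names `search_us` / `search_all` and the `location` field are hypothetical):

```
# Option 2: one merged index per location, distinguish documents by type
GET search_us/_search
{
  "query": {
    "bool": {
      "must":   [{ "match": { "content": "some query" } }],
      "filter": [{ "term": { "type": "FORUM" } }]
    }
  }
}

# Option 3: one big index, filter by both type and location
GET search_all/_search
{
  "query": {
    "bool": {
      "must":   [{ "match": { "content": "some query" } }],
      "filter": [
        { "term": { "type": "FORUM" } },
        { "term": { "location": "US" } }
      ]
    }
  }
}
```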
I’ve read some topics about this, but they are based on old/deprecated Elasticsearch versions:
- https://stackoverflow.com/questions/11042021/how-can-i-organize-my-elasticsearch-indexes
- https://stackoverflow.com/questions/39233992/separate-indices-or-use-type-field-in-elasticsearch
I also read recommendations saying that shards in the 20–40 GB range are generally considered healthy, but in my case options 1 and 2 would still result in much smaller shards, even after merging. So my question is:
- Is it better to keep many small indices, merge some of them and filter by type, or aim for fewer, larger shards?
Am I heading in the right direction? Any guidance on the trade-offs (query speed, resource usage, cluster/heap overhead, search strategy, etc.) would be really helpful. Thanks!