Index shard allocation for single node Elasticsearch environment


#1

I'm currently working with a single-node Elastic Stack deployment. System specifications are 2 CPU / 8 GB RAM / 500 GB SSD. I've implemented most, if not all, optimization best practices (minus index / shard sizing).

I'm starting to notice that queries are taking longer and longer as the number of indices and shards grows. We've got 59 indices / 455 total shards / 220 unassigned shards / 28 million documents / 21 GB of data.
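For reference, the shard counts above can be confirmed with the cluster health API (the filter_path parameter is optional and just trims the response):

GET /_cluster/health?filter_path=status,unassigned_shards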

I've done some reading in the docs and came to the following conclusions:

  1. Since I'm using a single Elasticsearch node, an index config with 5 shards / 1 replica per shard (shown below) is consuming unnecessary resources. A replica is never allocated on the same node as its primary, so on a single node every replica shard stays unassigned.

GET /winlogbeat-6.4.0-redacted/_settings

{
  "winlogbeat-6.4.0-redacted": {
    "settings": {
      "index": {
        "mapping": {
          "total_fields": {
            "limit": "10000"
          }
        },
        "refresh_interval": "5s",
        "number_of_shards": "5",
        "provided_name": "winlogbeat-6.4.0-redacted",
        "creation_date": "redacted",
        "number_of_replicas": "1",
        "uuid": "redacted",
        "version": {
          "created": "redacted"
        }
      }
    }
  }
}

Question: Should I consider shrinking my indices to 3 shards per index with 0 replicas? Or would it be even better to shrink my indices to 1 shard / 0 replicas? I don't anticipate any index growing above 10 GB in production.
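For anyone following along: number_of_replicas is a dynamic setting that can be dropped to 0 in place, while reducing the primary shard count requires the shrink API. Note that the shrink target's shard count must be a factor of the source's, so a 5-shard index can shrink to 1 shard but not to 3. A sketch, assuming the existing index name from above (the "-shrunk" target name is just an example):

PUT /winlogbeat-*/_settings
{
  "index": {
    "number_of_replicas": 0
  }
}

PUT /winlogbeat-6.4.0-redacted/_settings
{
  "index.blocks.write": true
}

POST /winlogbeat-6.4.0-redacted/_shrink/winlogbeat-6.4.0-redacted-shrunk
{
  "settings": {
    "index.number_of_shards": 1,
    "index.number_of_replicas": 0
  }
}

The write block is required before shrinking; on a single node the usual requirement that all source shards sit on one node is already satisfied.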

This would only be useful for a single-node setup, I think. When we scale to 3 Elasticsearch nodes, I'll switch to 3 or 5 shards with 1 replica per shard.

Thanks for reading!


(Christian Dahlqvist) #2

Please read this blog post for guidance on shard sizes and sharding. I would recommend going down to a single primary shard per index, and perhaps also using the rollover API so each index can cover a time period longer than 1 day, especially if you have a longer retention period.
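As a sketch of what that could look like: an index template forcing 1 primary / 0 replicas on new indices, combined with a size/age-based rollover against a write alias. The template name, order value (chosen to override the default Beats template), alias name, and rollover conditions below are all examples, not prescriptions:

PUT /_template/winlogbeat-single-node
{
  "index_patterns": ["winlogbeat-*"],
  "order": 1,
  "settings": {
    "index.number_of_shards": 1,
    "index.number_of_replicas": 0
  }
}

POST /winlogbeat-write/_rollover
{
  "conditions": {
    "max_age": "7d",
    "max_size": "10gb"
  }
}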


#3

Thanks for the link and recommendations. The link you provided was one of the resources I used to help guide my understanding of indices and shards.

Assuming I understand correctly, the max size for a single shard should not exceed 20-40 GB when dealing with time-series data. In my case I'm dealing with about 400 unique fields (Windows Event Logs / Winlogbeat).

I'll be a bit more conservative since my data isn't time series and aim for 10-20 GB per shard. If my indices exceed that, I'll likely want to expand to a multi-node cluster and/or use multiple shards per index.
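One way to keep an eye on this is the cat shards API, sorted by on-disk size (the h and s parameters just select and sort the columns):

GET /_cat/shards?v&h=index,shard,prirep,state,store&s=store:desc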

Does this sound correct?


(Christian Dahlqvist) #4

That sounds like a reasonable starting point.


(system) #5

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.