Rollover API creates indices with 10 shards when the first index was created with 30 shards


(sgu) #1

Elasticsearch/Kibana 5.2 and Logstash 5.0.2 with Kafka 0.9

We created the first indices with 30 shards, named logstash*-000001, and we use Curator with the rollover API and an alias to roll the indices over based on time and log count. The rollover does create new indices, but with 10:1 shards:replicas. We are not sure where ES or Logstash is getting 10:1 instead of 30:1. Any help would be greatly appreciated.

FYI, we have other indices for a different app created with 10:1 on the same Elasticsearch cluster; not sure whether it matters.

Thank you in advance


(Aaron Mildenstein) #2

Did you create this manually with curl? The Console in Kibana? Some other way? The answer matters a great deal.

It's almost certain you have an index template that sets the default number of shards, and that is being applied. Rollover does not pass on shard count by inheritance.
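As a rough illustration of how a template could produce an unexpected shard count, the sketch below models template matching in simplified form: every template whose pattern matches the new index name is applied, lowest `order` first, so a higher-order template overrides a lower one. The template names, patterns, and values are hypothetical, and this is a plain-Python model of the merge rule, not Elasticsearch's implementation or a diagnosis of this cluster.

```python
from fnmatch import fnmatch

# Hypothetical templates; names, patterns, and shard counts are illustrative only.
TEMPLATES = [
    {"name": "app_logs", "pattern": "logstash-app-*", "order": 0,
     "settings": {"number_of_shards": 30, "number_of_replicas": 1}},
    {"name": "catch_all", "pattern": "logstash*", "order": 1,
     "settings": {"number_of_shards": 10}},
]

def effective_settings(index_name, templates):
    """Merge the settings of all matching templates, lowest order first."""
    merged = {}
    for tpl in sorted(templates, key=lambda t: t["order"]):
        if fnmatch(index_name, tpl["pattern"]):
            # Later (higher-order) templates overwrite earlier values.
            merged.update(tpl["settings"])
    return merged

settings = effective_settings("logstash-app-000002", TEMPLATES)
print(settings)
```

In this made-up setup the 30-shard template does match, but a broader template with a higher order wins on `number_of_shards`, so the new index ends up at 10:1. Listing all templates on the cluster and comparing their patterns and `order` values is one way to check for this kind of overlap.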

If you want to force the next index to have a certain number of shards/replicas via Curator, you can do so using the extra_settings option.
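For context, the Elasticsearch rollover API accepts a `settings` object in the request body next to `conditions`, and my understanding is that Curator's extra_settings is passed through as those settings. The sketch below only builds such a body in Python so the shape is visible; the thresholds are hypothetical (the thread does not state the real ones), and the function name is made up for illustration.

```python
import json

def rollover_body(max_age, max_docs, shards, replicas):
    """Build a rollover request body that sets shard counts explicitly."""
    return {
        "conditions": {"max_age": max_age, "max_docs": max_docs},
        "settings": {
            "index.number_of_shards": shards,
            "index.number_of_replicas": replicas,
        },
    }

# Hypothetical thresholds; substitute the conditions actually in use.
body = rollover_body(max_age="1d", max_docs=100_000_000, shards=30, replicas=1)
print(json.dumps(body, indent=2))
```

A body like this would be sent as a POST to the alias's `_rollover` endpoint. Settings passed this way take precedence over matching index templates, which is why this approach works even when a template is setting a different shard count.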

I would fix the index template, however, to ensure the mappings and everything else are what they should be.

All of this raises the question, though, why do you need 30 shards per index? How many data nodes do you have? How many events per second are you trying to index?


(sgu) #3

Yes. The first indices were created manually with 30:1, because indices were being created with 10:1 even though the template was set to 30:1. We also verified that the index template is 30:1, but no luck; not sure why.

Thank you for the extra_settings option. Will give it a try.

We are in a testing phase and trying different options, but let me give the details; I would appreciate any comments or suggestions.
18 data nodes with 2 masters, trying to scale for our peak traffic of 120 million/hour.


(Aaron Mildenstein) #4

120 million events/hour translates into roughly 33,333 events per second. You should not need 30 shards to accomplish that. I think you'd probably be okay with 9:1 (9 primaries plus 9 replicas, 18 shards for your 18 data nodes), so that each node has 1 primary or 1 replica, which maximizes the I/O on each node, i.e. each node is only writing to one shard at a time. It may take time to get to that balance, since you have a larger number of shards than can be equally balanced on nodes right now, but in the end it will be worth it.
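The arithmetic behind those figures can be checked with a short sketch. It uses only the numbers from this thread (120 million events/hour, 18 data nodes, 30:1 versus 9:1); the function names are made up for illustration.

```python
def events_per_second(events_per_hour):
    """Convert an hourly event rate to events per second."""
    return events_per_hour / 3600

def shards_per_node(primaries, replicas, data_nodes):
    """Average shards per data node for a single index."""
    total_shards = primaries * (1 + replicas)
    return total_shards / data_nodes

rate = events_per_second(120_000_000)
per_node_9 = shards_per_node(9, 1, 18)    # 18 shards over 18 nodes
per_node_30 = shards_per_node(30, 1, 18)  # 60 shards over 18 nodes
print(round(rate), per_node_9, per_node_30)
```

With 9:1 the active index puts exactly one shard on each of the 18 data nodes, while 30:1 averages more than three shards per node for the same index.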

You should also have 3 master nodes, because with 2, if even one goes down, your entire cluster goes offline.
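The reasoning is a majority calculation. The sketch below assumes `discovery.zen.minimum_master_nodes` is set to the usual majority of master-eligible nodes (n / 2 + 1, rounded down), which is the recommended way to avoid split brain in this version; the function names are made up for illustration.

```python
def quorum(master_eligible):
    """Majority of master-eligible nodes needed to elect a master."""
    return master_eligible // 2 + 1

def tolerated_failures(master_eligible):
    """How many master-eligible nodes can be lost while keeping a quorum."""
    return master_eligible - quorum(master_eligible)

for n in (2, 3):
    print(n, quorum(n), tolerated_failures(n))
```

With 2 masters the quorum is 2, so losing either one leaves the cluster without a master. With 3 masters the quorum is still 2, so one can go down safely.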

A shard count that high will also have dramatic repercussions for cluster management and performance if you end up keeping more than 1000 shards per node in total. Even that ceiling assumes a 30g heap on each data node; the number of shards a node can hold before hitting those consequences falls off if the heap is smaller than 30g.
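To put that ceiling in terms of retained indices, here is a back-of-the-envelope sketch. It takes the 1000 shards per node figure above at face value (so it assumes a 30g heap), uses the 18 data nodes from this thread, and ignores any other indices on the cluster, so treat the results as upper bounds rather than targets.

```python
def max_indices(data_nodes, shards_per_node_limit, primaries, replicas):
    """Number of indices that fit before the per-node shard ceiling is reached."""
    cluster_limit = data_nodes * shards_per_node_limit
    shards_per_index = primaries * (1 + replicas)
    return cluster_limit // shards_per_index

at_30 = max_indices(18, 1000, 30, 1)  # 60 shards per index
at_9 = max_indices(18, 1000, 9, 1)    # 18 shards per index
print(at_30, at_9)
```

Under these assumptions, 30:1 reaches the ceiling after about 300 retained indices, while 9:1 leaves room for about 1000.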


(sgu) #5

Thanks, I really appreciate your input. Will try these.
I would also like to ask about indexer and ES performance tuning:

ES index buffer sizes, queue sizes, and memory suggestions?

Logstash indexer: any instance setup considerations?
flush_size?
pool_max?
pool_max_per_route?
Any conf suggestions?

Thanks


(Aaron Mildenstein) #6

Please start a new thread/topic for those questions.


(sgu) #7

Thank you. The extra_settings option did work.

