When should you split your indexes?


(Spuder) #1

I have Logstash pulling logs from web servers and database servers, and the data is displayed in Kibana for the IT department.

The 3 sources of logs:

  • web-searches
  • web-errors
  • database-errors

All data is being pushed to daily Logstash indexes.

The development team wants to run a substantial number of queries against just one subset of the data (web-searches).

While they could use a filter to search only the logs tagged as 'web-searches', I want to know:

  • Would there be any performance advantage to putting the web-searches into their own index?
  • What guidelines determine when to create a new index?
  • Do filters slow down searches or require a lot of CPU?
  • How do you change the default number of shards?

(Mark Walkom) #2

Personally I'd split all three out, depending on size; more for data hygiene reasons.
There are no guidelines on when to make a new index.
Filters are great; you are better off using a filter than a query, as it's a lot more efficient and the result is also cached.
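
For example, on Elasticsearch 1.x a filter goes inside a filtered query. A minimal sketch, assuming your events carry a type field identifying the log source (the field, value, and index name here are placeholders for your setup):

# Assumes a 'type' field distinguishes the log sources; purely illustrative.
curl -XGET 'localhost:9200/logstash-2015.05.19/_search' -d '{
  "query": {
    "filtered": {
      "query": { "match_all": {} },
      "filter": { "term": { "type": "web-searches" } }
    }
  }
}'

The term filter's results are cached, so repeated searches over the same subset stay cheap.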


(Spuder) #3

So would you recommend making new indexes for every type of server that I add? I anticipate that the 3 sources could grow to 10. It seems like that would be a lot of shards and indexes.


(Mark Walkom) #4

Ideally you want to put similarly structured data into the same indices: system syslog in one, network logs in another, and so on.
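
If you do that split in Logstash, a rough sketch of what the output config could look like (the type values and index name patterns are assumptions about how you tag events):

# Sketch only; assumes your inputs/filters set a 'type' field on each event.
output {
  if [type] == "web-searches" {
    elasticsearch { index => "web-searches-%{+YYYY.MM.dd}" }
  } else if [type] == "syslog" {
    elasticsearch { index => "syslog-%{+YYYY.MM.dd}" }
  } else {
    elasticsearch { index => "logstash-%{+YYYY.MM.dd}" }
  }
}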


(Tyler Langlois) #5

Hey @spuder, good to see you on here after meeting you at OpenWest. :smile:

Do you have some numbers regarding the quantity and size of the documents and your ES nodes? Getting a rough idea of what your daily indices look like can help determine where some optimizations can be made. A sample of the past few days' indices from /_cat/indices would probably be enough. Looking at your nodes with something like /_cat/nodes?h=host,heapPercent,heapMax,ramPercent,ramMax,load&v can also help determine what kind of load your nodes can handle. (/_cat/health is useful for seeing the overall shard count as well; see the cat API documentation if you need more information.)
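
For reference, those checks as curl commands (assuming a node on the default port; adjust host and port for your cluster):

# Run against any node in the cluster; ?v adds column headers.
curl 'localhost:9200/_cat/indices?v'
curl 'localhost:9200/_cat/nodes?h=host,heapPercent,heapMax,ramPercent,ramMax,load&v'
curl 'localhost:9200/_cat/health?v'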

Like warkolm mentioned, keeping your documents/logs separated by index can help keep things organized.


(Spuder) #6

Thanks @tylerjl, it was good meeting you too.

My indexes range from a few hundred megs to 25GB per day. Almost all of my indexes are less than 5GB.

I had about 30 indexes open at one time.

30 indexes
5 shards
1 replica

30 * 5 * 2 = 300 shards open at once.

I've since dropped that down to about two weeks' worth to help with a related performance problem.

/_cat/indices
....
green open  logstash-2015.05.19  5 1 22459568 0  18.7gb   9.6gb
green open  logstash-2015.05.14  5 1  5710772 0     6gb   2.9gb
/_cat/nodes?h=host,heapPercent,heapMax,ramPercent,ramMax,load
swat-elasticsearch02.ndlab.local 17 7.8gb 57 15.6gb 0.57
swat-elasticsearch03.ndlab.local 28 7.8gb 57 15.6gb 0.44
swat-elasticsearch01.ndlab.local 28 7.8gb 57 15.6gb 0.52

If I do split the indexes up by log type, that will mean way more indexes:

90 indexes (30 * 3)
5 shards
1 replica

90 * 5 * 2 = 900 shards

Is going from 300 shards to 900 shards going to reduce performance?
Should I reduce the shard count from 5 down to 2?


(Mark Walkom) #7

That many shards will reduce performance unless you have them spread across multiple nodes.

I'd definitely reduce the shard count.
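
To circle back to the earlier question about changing the default number of shards: new daily indices pick up whatever settings are in a matching index template, so something like the sketch below would do it. The pattern and the value of 2 are just examples, and it only affects indices created after the template is in place; existing indices keep their current shard count.

# Example only: new logstash-* indices get 2 primary shards and 1 replica.
# Shard count is fixed at index creation, so existing indices are unaffected.
curl -XPUT 'localhost:9200/_template/logstash' -d '{
  "template": "logstash-*",
  "settings": {
    "index.number_of_shards": 2,
    "index.number_of_replicas": 1
  }
}'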


(Spuder) #8

Is there a guideline for how many indices and shards are too many?


(Mark Walkom) #9

Nothing hard and fast at the moment; it's more experience gained in the trenches :wink:

