When should you split your indexes?

I have Logstash pulling logs from web servers and database servers, which are displayed in Kibana for the IT department.

The 3 sources of logs:

  • web-searches
  • web-errors
  • database-errors

All data is being pushed to daily logstash indexes.

The development team wants to run a substantial number of queries against just one subset of the data (web-searches).

While they could apply a filter to search only logs tagged as 'web-searches', I want to know:

  • Would there be any performance advantage to putting the web-searches into their own index?
  • What guidelines constitute making a new index?
  • Do filters slow down searches, or require lots of CPU?

Personally I'd split all three out, depending on size, more for data hygiene reasons than performance.
There are no hard guidelines on when to make a new index.
Filters are great; you are better off doing a filter than a query, as it's a lot more efficient and the results are cached.
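
To illustrate the filter route, here's a sketch against the 1.x-era REST API this thread is using; the index name, the `type` field, and the search terms are all illustrative, based on the setup described above:

```
# Term filter wrapped in a filtered query (Elasticsearch 1.x syntax).
# Filters skip scoring and their results are cached, so repeated
# runs of the same filter are cheap.
curl -XGET 'localhost:9200/logstash-2015.05.19/_search' -d '{
  "query": {
    "filtered": {
      "query":  { "match": { "message": "timeout" } },
      "filter": { "term":  { "type": "web-searches" } }
    }
  }
}'
```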

So would you recommend making new indexes for every type of server that I add? I anticipate that the 3 sources could grow to 10. It seems like that would be a lot of indexes and shards.

Ideally you want to put similarly structured data into the same indices, so system syslog in one, network logs in another, and so on.
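
One common way to get that separation is to route on the event type in the Logstash output; a minimal sketch, assuming each input sets a `type` field (host and field values here are illustrative):

```
# Logstash output: send events to per-type daily indices,
# e.g. web-searches-2015.05.19, database-errors-2015.05.19.
output {
  elasticsearch {
    host  => "localhost"
    index => "%{type}-%{+YYYY.MM.dd}"
  }
}
```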

Hey @spuder, good to see you on here after meeting you at OpenWest. :smile:

Do you have some numbers on the quantity and size of the documents and on your ES nodes? Getting a rough idea of what your daily indices look like can help determine where some optimizations can be made. A sample of the past few days' indices from `/_cat/indices` would probably be enough. Looking at your nodes with something like `/_cat/nodes?h=host,heapPercent,heapMax,ramPercent,ramMax,load&v` can also help determine what kind of load your nodes can handle. (`/_cat/health` is useful for seeing the overall shard count as well; see the cat API documentation if you need more information.)

Like warkolm mentioned, keeping your documents/logs separated by index can help keep things organized.

Thanks @tylerjl, it was good meeting you too.

My indexes range from a few hundred megs to 25GB per day. Almost all of my indexes are less than 5GB.

I had about 30 indexes open at one time.

30 indexes
5 shards per index
1 replica (2 copies of each shard)

30 * 5 * 2 = 300 shards open at once.

I've since dropped that down to about two weeks' worth to help with a related performance problem.
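
One way to trim that window with the raw API is to close indices past the cutoff; a sketch, with an illustrative index name rather than one from this thread:

```
# Close an old daily index so it stops consuming heap (ES 1.x).
# Closed indices can be reopened later with _open if needed.
curl -XPOST 'localhost:9200/logstash-2015.05.01/_close'
```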

```
/_cat/indices
....
green open  logstash-2015.05.19  5 1 22459568 0  18.7gb   9.6gb
green open  logstash-2015.05.14  5 1  5710772 0     6gb   2.9gb
```

```
/_cat/nodes?h=host,heapPercent,heapMax,ramPercent,ramMax,load
swat-elasticsearch02.ndlab.local 17 7.8gb 57 15.6gb 0.57
swat-elasticsearch03.ndlab.local 28 7.8gb 57 15.6gb 0.44
swat-elasticsearch01.ndlab.local 28 7.8gb 57 15.6gb 0.52
```

If I do split the data out into separate indexes per source, that will mean way more indexes and shards:

90 indexes (30 * 3)
5 shards per index
1 replica

90 * 5 * 2 = 900 shards

Is going from 300 shards to 900 shards going to reduce performance?
Should I reduce the shard count from 5 down to 2? That would be 90 * 2 * 2 = 360 shards.

That many shards will reduce performance unless you have them spread across multiple nodes.

I'd definitely reduce the shard count.
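
Since shard counts are fixed at index creation, the usual way to change them for future daily indices is an index template; a minimal sketch (the template name is made up, and 2 shards is just the number floated above):

```
# Give all future logstash-* indices 2 primary shards and 1 replica.
# Existing indices keep their current shard count; only newly
# created daily indices pick up the template.
curl -XPUT 'localhost:9200/_template/logstash_shards' -d '{
  "template": "logstash-*",
  "settings": {
    "number_of_shards": 2,
    "number_of_replicas": 1
  }
}'
```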

Is there a guideline for how many indices and shards are too many?

Nothing hard at the moment, it's more experience gained in the trenches :wink: