When should you split your indexes?


(Spuder) #1

I have Logstash pulling logs from web servers and database servers, and the data is displayed in Kibana for the IT department.

The 3 sources of logs:

  • web-searches
  • web-errors
  • database-errors

All data is being pushed to daily Logstash indexes.

The development team wants to run a substantial number of queries against just one subset of the data (web-searches).

While they could use a filter to search only the logs tagged as 'web-searches', I want to know:

  • Would there be any performance advantage to putting the web-searches into their own index?
  • What guidelines determine when to create a new index?
  • Do filters slow down searches or require a lot of CPU?
  • How do you change the default number of shards?

(Mark Walkom) #2

Personally I'd split all three out, depending on size; more for data hygiene reasons.
There are no guidelines on when to make a new index.
Filters are great; you are better off using a filter than a query, as it's a lot more efficient and the result is also cached.
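
For example, on Elasticsearch 1.x a filter goes inside a filtered query. A minimal sketch, assuming your events carry a type field identifying the log source (the field, value, and index name here are placeholders for your setup):

# Assumes a 'type' field distinguishes the log sources; purely illustrative.
curl -XGET 'localhost:9200/logstash-2015.05.19/_search' -d '{
  "query": {
    "filtered": {
      "query": { "match_all": {} },
      "filter": { "term": { "type": "web-searches" } }
    }
  }
}'

The term filter's results are cached, so repeated searches over the same subset stay cheap.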


(Spuder) #3

So would you recommend making new indexes for every type of server that I add? I anticipate that the 3 sources could grow to 10. It seems like that would be a lot of shards and indexes.


(Mark Walkom) #4

Ideally you want to put similarly structured data into the same indices: system syslog in one, network logs in another, and so on.
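
If you do that split in Logstash, a rough sketch of what the output config could look like (the type values and index name patterns are assumptions about how you tag events):

# Sketch only; assumes your inputs/filters set a 'type' field on each event.
output {
  if [type] == "web-searches" {
    elasticsearch { index => "web-searches-%{+YYYY.MM.dd}" }
  } else if [type] == "syslog" {
    elasticsearch { index => "syslog-%{+YYYY.MM.dd}" }
  } else {
    elasticsearch { index => "logstash-%{+YYYY.MM.dd}" }
  }
}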


(Tyler Langlois) #5

Hey @spuder, good to see you on here after meeting you at OpenWest. :smile:

Do you have some numbers regarding the quantity and size of the documents and your ES nodes? Getting a rough idea of what your daily indices look like can help determine where some optimizations can be made. A sample of the past few days' indices from /_cat/indices would probably be enough. Looking at your nodes with something like /_cat/nodes?h=host,heapPercent,heapMax,ramPercent,ramMax,load&v can also help determine what kind of load your nodes can handle. (/_cat/health is useful for seeing the overall shard count as well; see the cat API documentation if you need more information.)
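
For reference, those checks as curl commands (assuming a node on the default port; adjust host and port for your cluster):

# Run against any node in the cluster; ?v adds column headers.
curl 'localhost:9200/_cat/indices?v'
curl 'localhost:9200/_cat/nodes?h=host,heapPercent,heapMax,ramPercent,ramMax,load&v'
curl 'localhost:9200/_cat/health?v'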

Like warkolm mentioned, keeping your documents/logs separated by index can help keep things organized.


(Spuder) #6

Thanks @tylerjl, it was good meeting you too.

My indexes range from a few hundred megs to 25GB per day. Almost all of my indexes are less than 5GB.

I had about 30 indexes open at one time.

30 indexes
5 shards
1 replica

30 * 5 * 2 = 300 shards open at once.

I've since dropped that down to about two weeks' worth to help with a related performance problem.

/_cat/indices
....
green open  logstash-2015.05.19  5 1 22459568 0  18.7gb   9.6gb
green open  logstash-2015.05.14  5 1  5710772 0     6gb   2.9gb
/_cat/nodes?h=host,heapPercent,heapMax,ramPercent,ramMax,load
swat-elasticsearch02.ndlab.local 17 7.8gb 57 15.6gb 0.57
swat-elasticsearch03.ndlab.local 28 7.8gb 57 15.6gb 0.44
swat-elasticsearch01.ndlab.local 28 7.8gb 57 15.6gb 0.52

If I do split the indexes up by log type, that will mean way more indexes:

90 indexes (30 * 3)
5 shards
1 replica

90 * 5 * 2 = 900 shards

Is going from 300 shards to 900 shards going to reduce performance?
Should I reduce the shard count from 5 down to 2?


(Mark Walkom) #7

That many shards will reduce performance unless you have them spread across multiple nodes.

I'd definitely reduce the shard count.
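
To circle back to the earlier question about changing the default number of shards: new daily indices pick up whatever settings are in a matching index template, so something like the sketch below would do it. The pattern and the value of 2 are just examples, and it only affects indices created after the template is in place; existing indices keep their current shard count.

# Example only: new logstash-* indices get 2 primary shards and 1 replica.
# Shard count is fixed at index creation, so existing indices are unaffected.
curl -XPUT 'localhost:9200/_template/logstash' -d '{
  "template": "logstash-*",
  "settings": {
    "index.number_of_shards": 2,
    "index.number_of_replicas": 1
  }
}'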


(Spuder) #8

Is there a guideline for how many indices and shards are too many?


(Mark Walkom) #9

Nothing hard and fast at the moment; it's more experience gained in the trenches :wink:

