I have a 5-node ES cluster and around 45 servers sending logs through Kafka to ES. I have built around 8 visualizations from my logs and saved them to a dashboard.
Now my dashboard takes a long time to load, and if the time frame is more than a day, it does not load at all. If I keep the time frame short, like 15 minutes or an hour, it loads perfectly fine.
I have no idea why it is behaving like that.
Kibana is only as fast as ES, so what specs are your nodes, how much data is there, and how many indices and shards do you have?
I have 5 nodes, each with 8 GB RAM, 2 CPUs, and a 200 GB volume. I have 344 shards, which were created by default. I have allocated a 4 GB heap to Elasticsearch. Below is the amount of data collected each day, which is sent to ES constantly throughout the day. Two new indices get created each day with the following sizes.
health status index                      pri rep docs.count docs.deleted store.size pri.store.size
green  open   collectdmetrics-2016.05.03   5   1   98612060            0       37gb         18.5gb
green  open   tomcatlog-2016.04.22         5   1   10571390            0      6.2gb          3.1gb
You should reduce the shard count; you really only need one primary shard for that data size.
By default, ES creates 5 shards per index. So I should reduce it to 1 shard per index?
That's my suggestion, yes.
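For daily indices, one way to apply this going forward is an index template, since in ES 2.x the primary shard count of an existing index cannot be changed in place. Below is a minimal sketch, assuming ES 2.x template syntax (the "template" field holds the index pattern). The URL, the template name single_shard_logs, and the tomcatlog-* pattern are assumptions to adapt; nothing is sent until put_template is called.

```shell
#!/bin/sh
# Sketch: make future daily indices default to one primary shard.
# ES_URL, TEMPLATE_NAME and INDEX_PATTERN are assumptions, adjust for your cluster.
ES_URL="${ES_URL:-http://localhost:9200}"
TEMPLATE_NAME="single_shard_logs"   # hypothetical template name
INDEX_PATTERN="tomcatlog-*"         # pattern of the daily indices

# ES 2.x templates select indices through the "template" field.
TEMPLATE_BODY=$(cat <<EOF
{
  "template": "${INDEX_PATTERN}",
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  }
}
EOF
)

# Registers the template; only indices created afterwards are affected.
put_template() {
  curl -s -XPUT "${ES_URL}/_template/${TEMPLATE_NAME}" -d "${TEMPLATE_BODY}"
}

# Usage (needs a reachable cluster):
#   put_template
```

Existing indices keep their 5 shards; only indices created after the template is registered pick up the new setting.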
Then in what cases do we need more shards?
Lowering the shard count didn't help much. I do see better performance, but not a huge improvement. Kibana still takes around a minute to load the dashboards.
Which versions are you running?
Also, if your old indices still have 5 shards (e.g., you didn't reindex them), then it's still going to be slow whenever it touches those days' indices. A 3 GB shard is pretty small, so you could even look at reindexing an entire month (after it's done) into a single 1 or 2 shard index to improve query performance.
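A rough sketch of that monthly reindex, assuming the _reindex API that was introduced in ES 2.3 is available on your cluster. The destination name tomcatlog-2016.04-merged is hypothetical, and the source pattern assumes daily names like tomcatlog-2016.04.22. Nothing runs until reindex_month is called.

```shell
#!/bin/sh
# Sketch: copy one month of daily indices into a single 1-shard index.
# ES_URL and DEST_INDEX are assumptions; _reindex requires ES 2.3 or later.
ES_URL="${ES_URL:-http://localhost:9200}"
SOURCE_PATTERN="tomcatlog-2016.04.*"     # all daily indices for April
DEST_INDEX="tomcatlog-2016.04-merged"    # hypothetical destination name

DEST_SETTINGS='{"settings": {"number_of_shards": 1, "number_of_replicas": 1}}'
REINDEX_BODY="{\"source\": {\"index\": \"${SOURCE_PATTERN}\"}, \"dest\": {\"index\": \"${DEST_INDEX}\"}}"

reindex_month() {
  # Create the destination first so it gets exactly one primary shard.
  curl -s -XPUT "${ES_URL}/${DEST_INDEX}" -d "${DEST_SETTINGS}"
  # Copy every document from the daily indices into it.
  curl -s -XPOST "${ES_URL}/_reindex" -d "${REINDEX_BODY}"
}

# Usage (needs a reachable cluster):
#   reindex_month
```

If Kibana uses a wildcard index pattern such as tomcatlog-*, it would match both the merged index and the old daily ones, so compare document counts and then delete the daily indices to avoid counting documents twice.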
Thinking about what ES has to do whenever Kibana sends it a time range request:
- Look up each index being requested (e.g., let's say 7 days).
- Find all of their shards.
- If each index has 5 shards, then that's 5 * 7 (35) shards that need to be queried. Even if it's fast, that's going to take some time to do all of the communication.
- Even worse, each shard needs to return a minimum number of documents. Let's say each shard needs to return 10 documents (it's more), then that's 350 documents that need to be passed around the cluster wastefully.
All of that adds up to a slower response because there are too many shards, which is why fixing the sharding will improve things. Similarly, once you have finished writing to yesterday's index, you should look into "optimizing" that index by force merging it.
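To put numbers on the fan-out described above, here is a small arithmetic sketch. The 7 days and 10 documents per shard are the illustrative figures from this post, not measurements.

```shell
#!/bin/sh
# Sketch: shard fan-out for a 7-day time range query.
days=7
docs_per_shard=10   # illustrative; the real per-shard number is higher

# Default layout: 5 primary shards per daily index.
shards_default=$((days * 5))
docs_default=$((shards_default * docs_per_shard))

# Suggested layout: 1 primary shard per daily index.
shards_single=$((days * 1))
docs_single=$((shards_single * docs_per_shard))

echo "5 shards/index: ${shards_default} shards queried, ${docs_default} docs passed around"
echo "1 shard/index:  ${shards_single} shards queried, ${docs_single} docs passed around"
```

With these figures the single-shard layout touches 7 shards instead of 35 for the same time range.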
Thank you for getting back to me and making the idea of shards clear to me.
I am using Kibana 4.5 and ES 2.3.1. I have reduced the number of shards per index to 1, and I am now searching only within the single-shard indices. I also force merged all of my indices for the past 7 days, like below:
curl -XPOST 'http://localhost:9200/log-2016.05.10/_forcemerge?max_num_segments=1'
That did help a little bit, but the same performance issue persists.
Then it comes down to what the visualizations themselves are doing. Each visualization series of data represents an aggregation. Aggregations can be relatively cheap or very expensive based on what they are doing. The more indices that get touched (by looking across days), the more expensive it becomes.
My suggestion would be to take a look at the raw request from Kibana (viewable below the chart in the Visualization section) and see how each one performs individually within ES. You are likely to find a bottleneck there.
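One way to test a request in isolation is to save the body Kibana shows into a file and send it straight to ES, then read the "took" field (milliseconds) from the response. This is only a sketch: request.json is a hypothetical file holding the copied request body, and the URL and index pattern are assumptions.

```shell
#!/bin/sh
# Sketch: replay a request copied from Kibana and report how long ES took.
# ES_URL, INDEX_PATTERN and REQUEST_FILE are assumptions, adjust as needed.
ES_URL="${ES_URL:-http://localhost:9200}"
INDEX_PATTERN="tomcatlog-*"
REQUEST_FILE="request.json"   # hypothetical file with the request body from Kibana

# Print the "took" value (milliseconds) found in a search response on stdin.
extract_took() {
  sed -n 's/.*"took"[ ]*:[ ]*\([0-9][0-9]*\).*/\1/p'
}

# Send the saved body to _search and print only the time ES reports.
time_request() {
  curl -s -XPOST "${ES_URL}/${INDEX_PATTERN}/_search" -d @"${REQUEST_FILE}" | extract_took
}

# Usage (needs a reachable cluster and a saved request body):
#   time_request
```

Running this once per visualization, with the same time range as the dashboard, should show which aggregation dominates the load time.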