I've read a lot on this, but most of it is dated information, and I wondered if things have changed or if anyone knows of a good way to handle this in Kibana/Elasticsearch.
So, given an index name that has a date in it, e.g. events_10_2018, with 10 being October and 2018 being the year.
When we set up an index pattern of events_*, that obviously selects every month.
If we ask for the last 30 days in the time picker in Kibana, it appears to search all the indexes, including those with no data in the selected range, e.g. events_09_2018. This is causing massive overhead and a performance problem.
I know we could create an index pattern for each month or year, but that is a maintenance headache, and since each visualisation links to an index pattern it's an issue.
So is there a better way to make Kibana aware that the index name contains the month and year, and have it use that when filtering with the time picker? Or can I use the value of the time picker in a filter on the _index field for each visualisation?
Kibana used to use the _field_stats API in Elasticsearch directly to determine which indices these time-based queries should be executed against. However, Elasticsearch added an additional search phase which generalizes this approach (it's explained further here) and eliminates the need for Kibana to implement this functionality.
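To see that shard pre-filtering in action, you can send a plain range query against the wildcard pattern; by default the pre-filter phase only kicks in above a certain number of targeted shards (128, if I remember right), but you can lower pre_filter_shard_size to force it. The @timestamp field below is just a placeholder for whatever your time field is:

```
GET events_*/_search?pre_filter_shard_size=1
{
  "query": {
    "range": {
      "@timestamp": {
        "gte": "now-30d/d",
        "lte": "now"
      }
    }
  }
}
```

The _shards.skipped count in the response tells you how many shards Elasticsearch was able to rule out without actually searching them.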
Have you tried using the Elasticsearch profile API to see which component of your search is taking the longest in your environment?
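Something along these lines, taking the same kind of range query and adding "profile": true to the body (again, substitute your own time field):

```
GET events_*/_search
{
  "profile": true,
  "query": {
    "range": {
      "@timestamp": { "gte": "now-30d/d" }
    }
  }
}
```

The profile section of the response breaks down how long each query component took on each shard.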
Hi Brandon, thank you for your time and reply.
Yes, I have read up on the _field_stats API. I've not used the profile API, but I've confirmed that Kibana is looking at all of the indexes and not just those covering the time period.
So I set up two index patterns: one which was events_* and one that was events_10_2018. Obviously the latter only covers one month of indexes, and the other covers all of them. Selecting 30 days in Kibana, in theory Kibana should know that it only needs to search the indexes containing events from the last 30 days. With the * pattern, it takes 5-10 seconds just to return anything, even a count in Discover. That's with only 5000 documents in it spanning over 5 years. If I do the same with the month-specific pattern it's instant.
While we could set up index patterns for every month/year, the visualisations don't really lend themselves well to switching index pattern, e.g. each year.
Am I wrong in thinking that at the moment there is no other way of telling Kibana that my index names have a date in them, so it can make use of this when filtering with the time picker?
Just to give an example with the explain API: for the query against the * pattern, you can see Elasticsearch has skipped the indexes not needed for the search, but it has still taken much longer to execute.
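For anyone reading along, this is the kind of output I mean (the numbers here are illustrative, not my real ones): most of the shards end up marked as skipped in the _shards summary of the search response, yet the overall took time for the wildcard pattern is still far higher than for the single-month index.

```
{
  "took": 6200,
  "_shards": {
    "total": 120,
    "successful": 120,
    "skipped": 114,
    "failed": 0
  }
}
```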
Am I wrong in thinking that at the moment there is no other way of telling Kibana that my index names have a date in them, so it can make use of this when filtering with the time picker?
That is correct for Kibana 6.0+, after Elasticsearch introduced that search optimization.
Just to give an example with the explain API: for the query against the * pattern, you can see Elasticsearch has skipped the indexes not needed for the search, but it has still taken much longer to execute.
This search request is accessing quite a few shards. How large are your shards?
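You can check the per-shard document counts and sizes with the cat shards API, e.g.:

```
GET _cat/shards/events_*?v&s=index
```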
Hi Brandon, this is just with test data, so only 5000 documents spanned over the 5 years, so maybe 1 or 2 documents per month. We've pushed in 100k documents over the same number of shards and confirmed that the times don't really increase with the volume of documents, only the volume of shards.
We've scaled up on AWS to double the number of cores, and confirmed that increases performance by a factor of 2, sometimes more. So that is an option for us. We can also consolidate indexes. We are currently creating an index per document type (we have around 30 types with different properties), per month, per year. I don't see any reason why we couldn't drop the month, reducing the number of shards by a factor of 12. I think creating index patterns for each year would be a pain at the moment because Kibana doesn't make it easy to switch index pattern for the visualisations.
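As a rough sketch of what dropping the month would look like, a single index template along these lines would give us yearly indices with one primary shard each (the template name and pattern are just examples):

```
PUT _template/events_yearly
{
  "index_patterns": ["events_*"],
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  }
}
```

Then we'd simply index into events_2018, events_2019, and so on.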
Some of this may become a non-issue if our clients don't need 5 years of data; we don't know yet, so we are working on the worst-case scenario for our performance tests. But we all know that if you ask a client how long they want to keep data for, they will always say forever. I've designed everything around a hot/warm architecture, but currently only have a need for the hot tier; potentially we are already set up to move older data to smaller nodes (X-Pack costs may make this impossible, as we are 3-node Platinum licence holders).
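For reference, the hot/warm piece is just shard allocation filtering: the warm nodes get a node attribute in elasticsearch.yml, e.g. node.attr.box_type: warm, and older indices are then moved with an index settings update (the attribute and index names here are only examples):

```
PUT events_2017/_settings
{
  "index.routing.allocation.require.box_type": "warm"
}
```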
Thank you for your help thus far. I do hope these niggles get ironed out in some respect in the product over the next few versions; I know 7 is in development. So many people have asked similar questions over the years that I'm sure the team is aware of these limitations.
@amageek, the other thing you guys might look at is reducing the number of shards per index to 1. The general rule of thumb is to keep shard size under 50GB, and to start at 1 and scale up from there.
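For existing indices you could also look at the shrink API rather than reindexing. Roughly like this (the index and node names are placeholders): first make the source index read-only and collocate a copy of every shard on one node, then shrink it down to a single primary shard.

```
PUT events_10_2018/_settings
{
  "index.routing.allocation.require._name": "shrink-node-1",
  "index.blocks.write": true
}

POST events_10_2018/_shrink/events_10_2018_shrunk
{
  "settings": {
    "index.number_of_shards": 1,
    "index.number_of_replicas": 1
  }
}
```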