Daily indices for logs; the average index size is 500MB-1GB (around 20 are 10GB+).
I am having an issue when searching far back (around 6 months+): I am getting timeouts/shard failed errors. There is not really anything useful in the Elasticsearch logs except this (occasionally, not every time):
org.elasticsearch.transport.RemoteTransportException: [sys-elastic-data-1][172.16.10.122:9300][indices:data/read/search[phase/query]]
Caused by: org.elasticsearch.common.breaker.CircuitBreakingException: [parent] Data too large, data for [<agg [1]>] would be [4031321656/3.7gb], which is larger than the limit of [4013975142/3.7gb], real usage: [4031316536/3.7gb], new bytes reserved: [5120/5kb]
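For reference, the breaker usage that this error is reporting can also be seen via the standard node stats API (shown here in Dev Tools / console syntax):

```
# Show per-node circuit breaker usage, including the parent breaker that is tripping above
GET _nodes/stats/breaker
```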
Sometimes when I get the timeouts, if I then click refresh the data loads straight away, so it is almost like it is nearly loading/caching it and then showing it.
I am just looking for some advice on improving performance.
What I am thinking at the moment is to force merge all indices - will this help?
Will increasing my node count help? Since all my indices have 3 shards, I don't see how scaling up my cluster could help unless I re-index and change the number of shards to match the node count.
If you have 1 replica configured you have around 1500 shards in the cluster, which sounds like a lot for the amount of heap you have. If you have a long retention period, try switching to monthly or weekly indices so you get the average shard size over 10GB.
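You can sanity-check the shard count and index sizes with the _cat APIs, e.g.:

```
# Total number of shards (primaries + replicas) in the cluster
GET _cat/shards?v

# Per-index primary/replica counts and on-disk size, largest first
GET _cat/indices?v&h=index,pri,rep,store.size&s=store.size:desc
```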
If you are suffering from heap pressure, forcemerging older indices down to a single segment should help.
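Something along these lines, using the _forcemerge API (the index pattern is just an example, adjust it to your naming scheme):

```
# Force merge older daily indices down to a single segment each
# (index pattern is an example; run it against indices that are no longer written to)
POST /logstash-2019.*/_forcemerge?max_num_segments=1
```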
When I look in the Monitoring section in Kibana, the heap usage never really goes over 50%.
At the moment it is sitting at 9.0 GB / 20.7 GB.
We don't really search this ES cluster a lot; it is basically just for statistics etc. (it's not being searched every minute), probably only a few times a day.
In regards to force merge, do I need to make the index read-only before doing this? Currently they are all open; however, only the latest day's index is written to.
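If it does need to be read-only first, I assume it would just be the write block setting, something like this (index name is just an example):

```
# Block writes on an older daily index before force merging it
# (index name is an example)
PUT /logstash-2019.06.01/_settings
{
  "index.blocks.write": true
}
```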
Would increasing the number of nodes I have help? Or would this not be useful unless I reindexed everything to have, say, 6 shards (for 6 nodes)? I read somewhere that your sharding policy should be around 1:1 with your number of data nodes.
The problem is, I don't know beforehand. These are my download CDN logs, and some days they are 1GB, some days they are 20GB, but it's the same index, just different days. So in terms of creating them, there is no way for Logstash to know in advance.
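For reference, I assume I can check how the shards are currently spread across the data nodes with:

```
# Shard counts and disk usage per data node
GET _cat/allocation?v
```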
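I did come across the Rollover API, which looks like it could handle the variable daily sizes by cutting a new index on size rather than on date (the alias name and thresholds here are just placeholders):

```
# Roll over to a new index once the current one is too old or too large
# (alias name and thresholds are examples)
POST /cdn-logs-write/_rollover
{
  "conditions": {
    "max_age": "7d",
    "max_size": "30gb"
  }
}
```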