Hello,
I use the latest version of Elasticsearch and my cluster contains 3 nodes. I receive 100GB of data per day.
Currently, I have 6GB of heap size in total.
My Elasticsearch has a lot of latency when I search data. I would like to know the best configuration to improve performance.
What is the specification of the hosts your cluster is deployed on? How much data do you have in the cluster? How many indices and shards is this data distributed across?
Each host has 4GB of RAM with 2GB for the heap, a 120GB hard disk, 4 vCPUs and Ubuntu 16.04.
Currently, I have 172,000,000 documents and I create one index per day.
The configuration of my index is 5 shards and 1 replica.
That sounds excessive.
How many indices do you have in the cluster?
Currently, I have 43 indices.
That means that you have 430 shards? That is a lot given the amount of data you have. You should probably look to reduce this significantly, e.g. through the shrink index API. Also have a look at this blog post for some guidance on sharding.
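For reference, shrinking looks roughly like this (the index and node names below are just placeholders). The source index first has to be made read-only with a copy of every shard on a single node, then it can be shrunk to fewer primary shards:

```
# step 1: make the source index read-only and move a copy of every shard to one node
curl -X PUT "localhost:9200/logstash-2018.04.16/_settings" -H 'Content-Type: application/json' -d'
{
  "index.blocks.write": true,
  "index.routing.allocation.require._name": "node-1"
}'

# step 2: shrink the 5-shard index into a new single-shard index
curl -X POST "localhost:9200/logstash-2018.04.16/_shrink/logstash-2018.04.16-shrunk" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "index.number_of_shards": 1,
    "index.number_of_replicas": 1
  }
}'
```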
I have 127 shards. In fact, I left the number of shards at the default in the configuration.
Have you identified what is limiting performance? What does CPU usage, memory usage, disk I/O (and iowait) look like?
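A quick way to get an overview (adjust the host and port to your setup) is the cat nodes API, plus iostat on each host for the iowait figure:

```
curl "localhost:9200/_cat/nodes?v&h=name,heap.percent,ram.percent,cpu,load_1m"
iostat -x 5    # from the sysstat package; check the %iowait column
```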
My CPU usage is at 25%, so I don't think that is the problem.
My memory usage is at 91%, which seems problematic, and I think improving it could be a solution.
The heap size is at 1GB on each node.
My disk I/O is between 15 and 20 MB/s at the maximum.
Do you see messages about long or frequent GC in the Elasticsearch logs?
No, I have no logs about that, and the last log entry goes back to 16/04.
I thought you said the heap size was 2GB per node.
If this is constant you probably need to increase the size of the heap.
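For reference, the heap is set in jvm.options (the values below are only an example; keep -Xms and -Xmx equal and at or below roughly half of the host's RAM, so going above 2GB on a 4GB host would also mean adding RAM):

```
# /etc/elasticsearch/jvm.options  (path used by the .deb/.rpm packages)
# example values only
-Xms4g
-Xmx4g
```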
How much iowait do you see?
Yes, you are right. It's 2GB per node and 1GB is used.
Yes, it's constant over time.
The iowait changes a lot; I see 45% as the maximum value.
I create one index per day, and each index contains 47 million log entries.
What's the best number of shards for this?
Currently, I have 5 shards and 3 nodes.
That depends on the size. Have a look at the blog post I linked to earlier for some guidance.
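As a sketch (the template name and index pattern below are assumptions, so adjust them to your Logstash index names; on 5.x the pattern key is "template" rather than "index_patterns"), you could make new daily indices use a single primary shard with an index template:

```
curl -X PUT "localhost:9200/_template/logstash-shards" -H 'Content-Type: application/json' -d'
{
  "index_patterns": ["logstash-*"],
  "settings": {
    "index.number_of_shards": 1,
    "index.number_of_replicas": 1
  }
}'
```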
Is it possible that searches are simply slow with 275 million documents, or can the search be improved with good optimisation?
What type of data do you have? How are you modelling your data? What type of queries are you running?
I have logs.
All my logs pass through the Logstash filter, but I don't know if that is what you mean by "modelling".
It's in Discover: when I change the date filter and select a larger time range, the search is very slow.
Make sure that you do not have a lot of small shards, as that can be inefficient and cause performance problems. Have a look at this blog post (linked to earlier) for some practical guidelines.
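To see how big each shard currently is, you can list them with the cat shards API:

```
curl "localhost:9200/_cat/shards?v"
```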
If you can provide the full output of the cluster stats API, we will get a better view of the state of your cluster.
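That is, the output of something like:

```
curl "localhost:9200/_cluster/stats?human&pretty"
```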
How many visualisations do you have in the dashboards that are slow? Are they slow for shorter time periods as well?
I have 6 visualisations, and no, if I select a shorter period with fewer logs the response time is good.