Question: I create one index with one shard per day on Elasticsearch. Queries that span several days' worth of these indices are very slow. Could anyone who has implemented a similar architecture give some guidance on what could be changed to make the queries more efficient?
Description of the system:
- Using the Elasticsearch hot-warm-cold architecture for time-based indices
- One index of data per day, each around 140 MB
- One shard per index.
- An index stays in the hot phase for 1 day
- From day 1 to day 60 it is in the warm phase
- From day 60 to day 180 the index goes to the cold phase. After that it is deleted
- The health metrics of our ES cluster look fine. There does not seem to be much CPU or memory pressure
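For reference, the lifecycle described in the list above corresponds roughly to an ILM policy like the sketch below. This is only an approximation of the setup, not the actual policy: the policy name and the `data` node attribute are assumptions made for illustration, and the body has not been validated against a specific Elasticsearch version. The last lines work out how many primary shards and how much primary data a query over the full retention window would touch, ignoring replicas.

```python
import json

# Assumptions for illustration only: the policy name and the node attribute
# used for allocation are hypothetical. The phase ages come from the list above.
POLICY_NAME = "daily-index-policy"
NODE_ATTR = "data"  # assumes nodes are tagged node.attr.data = hot | warm | cold

ilm_policy = {
    "policy": {
        "phases": {
            # Day 0 to day 1: index lives on the high-IO (hot) nodes.
            "hot": {"min_age": "0ms", "actions": {}},
            # Day 1 to day 60: moved to the high-storage nodes.
            "warm": {
                "min_age": "1d",
                "actions": {"allocate": {"require": {NODE_ATTR: "warm"}}},
            },
            # Day 60 to day 180: cold phase.
            "cold": {
                "min_age": "60d",
                "actions": {"allocate": {"require": {NODE_ATTR: "cold"}}},
            },
            # After day 180 the index is deleted.
            "delete": {"min_age": "180d", "actions": {"delete": {}}},
        }
    }
}

# Rough footprint of a query spanning the whole retention window.
RETENTION_DAYS = 180
SHARDS_PER_INDEX = 1
INDEX_SIZE_MB = 140

primary_shards = RETENTION_DAYS * SHARDS_PER_INDEX
primary_data_gb = RETENTION_DAYS * INDEX_SIZE_MB / 1000

print(f"PUT _ilm/policy/{POLICY_NAME}")
print(json.dumps(ilm_policy, indent=2))
print(f"primary shards: {primary_shards}, primary data: {primary_data_gb} GB")
```

So a search across all retained indices fans out to about 180 primary shards holding roughly 25 GB in total, which is small in data volume but relatively large in shard count.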
The machines in the cluster are described below. The high-IO ones are used in the hot phase. The high-storage ones are used in the warm and cold phases. The data is replicated in two regions, as shown in the picture.
Description of the problem:
When I query across these indices, the query is very slow. To illustrate, I ran a query for a single _id, and it took more than 40 seconds. The image below is the Kibana output of that query.
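In case the image does not load, the kind of request meant here can be sketched as follows. The index pattern and the document id are placeholders, not the real values, and the exact request used may have differed (for example a `term` query on `_id` instead of an `ids` query).

```python
import json

# Placeholders for illustration: the real index pattern and id are not shown here.
INDEX_PATTERN = "myindex-*"
DOC_ID = "example-id"


def build_id_search(doc_id, profile=False):
    """Build a search body that looks up a single document by its _id."""
    body = {"query": {"ids": {"values": [doc_id]}}}
    if profile:
        # Ask Elasticsearch to include per-shard timing in the response.
        body["profile"] = True
    return body


request_line = f"GET {INDEX_PATTERN}/_search"
body = build_id_search(DOC_ID, profile=True)

print(request_line)
print(json.dumps(body, indent=2))
```

Because the pattern matches every daily index, this single-id lookup is sent to every shard behind the pattern rather than to one index.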
It seems to me that ES does not query the indices in parallel: in the profiler, the total time appears to be the sum of the times for each index. The image below shows the profiler result for the same _id query.
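One way to check this impression is to compare the sum of the per-shard query times with the slowest single shard, using the `profile` section of a search response. The sketch below assumes the response shape of the Profile API (`profile.shards[].searches[].query[].time_in_nanos`); the `sample` response is made up purely to show the arithmetic and is not data from this cluster.

```python
def shard_query_times_ms(response):
    """Return {shard_id: query time in ms} from a profiled search response."""
    times = {}
    for shard in response["profile"]["shards"]:
        # Sum only the top-level query entries; each one already includes
        # the time spent in its children.
        nanos = sum(
            query["time_in_nanos"]
            for search in shard["searches"]
            for query in search["query"]
        )
        times[shard["id"]] = nanos / 1_000_000
    return times


# Made-up sample response, for illustration only (not measured data).
sample = {
    "took": 6,
    "profile": {
        "shards": [
            {"id": "[node-a][myindex-2020.01.01][0]",
             "searches": [{"query": [{"time_in_nanos": 2_000_000}]}]},
            {"id": "[node-b][myindex-2020.01.02][0]",
             "searches": [{"query": [{"time_in_nanos": 3_000_000}]}]},
            {"id": "[node-c][myindex-2020.01.03][0]",
             "searches": [{"query": [{"time_in_nanos": 5_000_000}]}]},
        ]
    },
}

per_shard = shard_query_times_ms(sample)
total_ms = sum(per_shard.values())
slowest_ms = max(per_shard.values())

print(f"sum of shard times: {total_ms} ms")
print(f"slowest shard:      {slowest_ms} ms")
print(f"reported took:      {sample['took']} ms")
```

If the reported `took` is close to the sum, the shards were effectively searched one after another; if it is close to the slowest shard, they ran in parallel and the summed figure in the profiler is only a cumulative total.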
I also posted this on SO: https://stackoverflow.com/questions/59898636/elasticsearch-query-over-multiple-indexes-very-slow?noredirect=1#comment105932184_59898636