I have the following issue: our largest data stream ("C") is responding poorly to queries.
The main parts of the stack:
10*hot nodes (each: 16 cores, 60GB+ memory)
6*warm nodes (each: 16 cores, 60GB+ memory)
The data stream ("C") specs:
retention on hot: 170 hours
retention on warm: 31 days
backing indices' size: ~1.4TB
30 primary shards per index
on average 3,200,000,000 documents per index
The allocation policy is 5 shards per node.
During the warm phase, the segments are force-merged to 1.
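(To clarify, the "5 shards per node" allocation policy is roughly the per-index setting sketched below; the index name is a placeholder, and for a data stream it would normally live in the index template.)

# cap on how many shards of a single index one node may hold
PUT my-backing-index/_settings
{
  "index.routing.allocation.total_shards_per_node": 5
}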
If I make a request for the past 5 days, I get a response in ~10 seconds. However, if I send the very same request for an interval that only touches the warm nodes, it takes 100-200 seconds (it's a prod environment). A query spanning 30 days took 25 minutes to finish.
While I do understand that the hot tier has more processing power (4 more VMs), those nodes are constantly indexing data (indexing rate: ~100K docs/s on average across the cluster). So I don't think the "idle" warm nodes should respond this slowly.
I checked, and CPU usage is well below 50%. I don't see issues with the JVM heap either (I was monitoring it during one of the requests). I/O also isn't reaching half of the SSDs' capacity. Naturally, the setup of all the nodes is the same, they're in the same DC, and there isn't any rogue process blocking ES on the warm nodes.
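If it helps, I can also grab a hot threads snapshot from one of the warm nodes while such a query is running, roughly like the sketch below (the node name is a placeholder), to see whether the search threads are actually busy or just waiting:

# busiest threads on a specific warm node, captured while the slow query runs
GET _nodes/warm-node-01/hot_threads?threads=5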
I'd really like to avoid throwing resources at the issue, which wouldn't solve anything, only hide it, and money is naturally a much more finite resource.
Does anyone have any pointers on what can be done to speed up search on the warm nodes? I respectfully don't care about the query performance (it can be optimized a little bit), since, as I said, it's running 10-20 times faster on nodes that are a lot busier than the warm tier. I'm happy to provide further information if needed.
Right, a couple of things I would check before throwing resources at it:
how many fields?
are fields properly mapped?
what is the size of a single primary shard on an index?
what is the size of your task queue, and which nodes do the tasks hit?
These are some initial areas I would look at to get an idea of where problems might be occurring, keeping in mind that they can compound (sometimes exponentially) if multiple areas are hit.
I understand you don't care about the performance of your queries, but I do feel like there is a lot going on, as the response times you mention are nowhere near what Elastic is "known" for.
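A rough sketch of how I'd pull some of those numbers (the .ds-C-* pattern is just a guess at your backing index naming):

# shards of the data stream with their size and the node they live on
GET _cat/shards/.ds-C-*?v&h=index,shard,prirep,store,node&s=store:desc

# search thread pool activity, queueing and rejections per node
GET _cat/thread_pool/search?v&h=node_name,active,queue,rejected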
There are 30-40 fields per document (~36 on average).
Yep, all fields are mapped per type properly.
One shard is around 48GB.
It's a big prod cluster, so there is constantly something to do. At the moment I checked, there were 40 tasks running, with a maximum of 4 on any one node.
Regarding the query: it's a cardinality aggregation, which is resource-intensive, and it's a main KPI that we must have. Over the past years (before 8.x and data streams) I already started optimizing the query, and I'll do the last step soon (moving all possible filters to pre-processing). That's why I wrote that I don't want to focus on it.
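For illustration, the shape of the query is roughly the sketch below; the field names are placeholders and the real query carries more filters:

# stripped-down version of the KPI query: time-range filter plus a cardinality agg
GET C/_search
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        { "range": { "@timestamp": { "gte": "now-30d/d", "lte": "now" } } }
      ]
    }
  },
  "aggs": {
    "unique_ids": {
      "cardinality": { "field": "user.id", "precision_threshold": 3000 }
    }
  }
}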
I was doing some calculations, and it seems that while the hot nodes hold only ~1.1 shards per node, the warm ones hold ~8. Still, I feel like the resources aren't utilized on the warm nodes, since I don't see peaks in any of the charts while these queries are running. Maybe this has something to do with thread pools and can't really be helped, but I wanted to ask around here too, in case I'm missing something.
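For reference, the per-node shard counts can be pulled straight from the cat allocation API, e.g.:

# shard count and disk usage per node, handy for comparing hot vs warm
GET _cat/allocation?v&h=node,shards,disk.indices,disk.used,disk.avail&s=shards:desc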
There are 30-40 fields per document (~36 on average). That should be more than OK, even if you end up with thousands of unique fields in total (try to stay below 10,000, though).
Yep, all fields are mapped per type properly. Great, that should help.
One shard is around 48GB. That should be okay; shard sizes should be aimed at roughly 40-60GB.
It's a big prod cluster, so there is constantly something to do. At the moment I checked, there were 40 tasks running, with a maximum of 4 on any one node. Makes sense. Do you feel like tasks are falling behind?
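If you want to double-check whether anything is queueing up behind the scenes, something like this is a quick sanity check:

# cluster-level pending tasks (should normally be empty or near-empty)
GET _cat/pending_tasks?v

# currently running tasks, grouped by parent task
GET _tasks?detailed=true&group_by=parents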
Follow-up question time:
If possible, could you:
share your ILM policy (preferably the JSON export)?
share the query you are sending?
tell me how many shards there are per node on your warm tier?
My thinking:
I want to understand what actions you are taking when moving from the hot to the warm tier, e.g. whether a force merge is done or not.
The query, to better understand how it's built up.
ES has a built-in limit of 1,000 shards per node. This could have been raised or disabled (which is not recommended), and oversharding could explain what you are experiencing.
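To grab the first and third of those, something along these lines should do (the policy name is a placeholder):

# JSON export of the ILM policy attached to the data stream
GET _ilm/policy/my-datastream-policy

# effective cluster-wide shard limit (defaults to 1000 per non-frozen data node)
GET _cluster/settings?include_defaults=true&filter_path=**.max_shards_per_node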
One of the things you could do with your query, btw, is use the Search Profiler in Dev Tools to figure out the expensive parts.
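The same can be done through the API by adding the profile flag to a trimmed-down version of the query, e.g.:

# per-shard timing breakdown of query and aggregation execution
GET C/_search
{
  "profile": true,
  "size": 0,
  "query": { "range": { "@timestamp": { "gte": "now-30d/d" } } }
}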
Do all nodes have access to local SSDs, or are you using some SSD-backed networked storage?
So you have 7 days' worth of indices on the hot nodes and 31 days on the warm nodes. How many shards are on each hot and warm node? Do you have the same number of replicas in the hot and warm tiers?
Is the interval length the same in both queries, but just further back in time?
So you are force merging down to a single segment. Are you making any other changes, e.g. to the number of replicas, or perhaps shrinking to a smaller number of shards per index?
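For illustration only (the policy name, age and numbers are made up), a warm phase that does more than just the force merge could look roughly like this:

# hypothetical warm phase combining shrink, force merge and a replica change
PUT _ilm/policy/my-datastream-policy
{
  "policy": {
    "phases": {
      "warm": {
        "min_age": "170h",
        "actions": {
          "shrink": { "number_of_shards": 6 },
          "forcemerge": { "max_num_segments": 1 },
          "allocate": { "number_of_replicas": 1 }
        }
      }
    }
  }
}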