Slow query concerns: how to optimize?

Originally there was a single index, a1. I then changed the setup so that index a1 and a new index a2 both sit behind the alias A. When querying through alias A, the query time increases from 7ms to 60ms compared with querying a1 directly. However, after I delete index a2 and query index a1 directly again, the time doesn't drop back to 7ms; it stays around 50ms. What could be the reason for this?

Where did you get the measurements from?
Search time varies depending on the size of your "hit" document set. 50ms is considered very good.
Caching is one of the reasons you might occasionally see a very fast result (i.e. issuing the same search very quickly back to back).

7ms to 50ms is a bad comparison/reference IMO.
I would increase the range of your search to include more documents, so your search time starts in the hundreds of ms, and compare from there.
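For example, clearing the request cache between runs and timing a query that matches a much larger document set gives a steadier baseline. Below is a minimal sketch using Python's requests library; the cluster address, index name, and field names are placeholder assumptions:

```python
import requests

# Placeholder cluster address and index name -- adjust to your setup.
ES = "http://localhost:9200"
INDEX = "a1"

# Clear the shard request cache so repeated runs are not served from cache.
requests.post(f"{ES}/{INDEX}/_cache/clear", params={"request": "true"})

# A query over a wider time window so it matches many more documents;
# "message" and "timestamp" are placeholder field names.
query = {
    "query": {
        "bool": {
            "must": [{"match": {"message": "error"}}],
            "filter": [{"range": {"timestamp": {"gte": "now-7d"}}}],
        }
    }
}

resp = requests.post(f"{ES}/{INDEX}/_search", json=query).json()
# "took" is the server-side time in milliseconds (network latency excluded);
# hits.total is an object on Elasticsearch 7 and later.
print(f"hits={resp['hits']['total']['value']} took={resp['took']}ms")
```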

A cluster of 3 nodes was set up, handling approximately 5 billion records. As the system has kept running, the search time has increased from the original 7 milliseconds to 200 milliseconds. In my scenario, an index takes in around 100 million documents, about 20 gigabytes, per day, and I need to retain data for approximately 30 days. How should I design the index layout? The searches are full-text searches that need to match across the entire dataset.
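A minimal sketch of the kind of layout the question describes: one index per day behind a single search alias, with indices older than the 30-day retention window deleted. The index names, settings, and cluster address here are only placeholders:

```python
import datetime
import requests

# Placeholder cluster address; naming scheme and settings are illustrative only.
ES = "http://localhost:9200"
TODAY = datetime.date.today()

# Create today's index and attach it to a single alias used for searching.
index_name = f"logs-{TODAY:%Y.%m.%d}"
requests.put(f"{ES}/{index_name}", json={
    "settings": {"number_of_shards": 3, "number_of_replicas": 1},
    "aliases": {"logs-search": {}},
})

# Drop the index that has just fallen outside the 30-day retention window.
cutoff = TODAY - datetime.timedelta(days=31)
requests.delete(f"{ES}/logs-{cutoff:%Y.%m.%d}")
```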

I believe your 7ms is an anomaly. Our system is much bigger than yours and we rarely get 7ms responses.
Our searches often take hundreds of ms, if not seconds for complex aggregations.
Search performance is related more to how your schema is designed than to the number of indices.

3 nodes means you have 3 machines' worth of CPU; that's all you get. Your shard count is at most 3 to fully utilize those CPUs (going beyond 3 would not give you any benefit). So increasing the number of your data nodes will also help.
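To illustrate that point: the primary shard count is fixed when an index is created, so it has to be chosen with the data-node count in mind up front. A minimal sketch, with the index name and cluster address as placeholders:

```python
import requests

ES = "http://localhost:9200"  # placeholder cluster address

# The primary shard count cannot be changed after creation (short of reindexing),
# so match it to the number of data nodes you expect to spread the work across.
requests.put(f"{ES}/a1", json={
    "settings": {
        "number_of_shards": 3,   # one primary per data node in a 3-node cluster
        "number_of_replicas": 1,
    }
})
```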

I assume your system is write-heavy like most DBs. Looking at bulk-write performance is more critical IMO. We optimize for writes in our system. Search performance only comes up when there's a gross issue, like a search taking 30 seconds. That usually relates to a very complex aggregation, and we will optimize the query string to improve it. If that isn't enough, we end up redesigning the schema to make it simpler to search. The end goal is to eliminate complex search strings.
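Since the heavy path is indexing, most of the write tuning happens around the _bulk API: sending many documents per request instead of indexing them one at a time. A minimal sketch with placeholder index and field names:

```python
import json
import requests

ES = "http://localhost:9200"  # placeholder cluster address
INDEX = "a1"                  # placeholder index name

# The _bulk API takes newline-delimited JSON: an action line followed by a source line.
docs = [{"message": f"event {i}", "timestamp": "2024-01-01T00:00:00Z"} for i in range(1000)]

lines = []
for doc in docs:
    lines.append(json.dumps({"index": {"_index": INDEX}}))
    lines.append(json.dumps(doc))
body = "\n".join(lines) + "\n"  # the body must end with a newline

resp = requests.post(
    f"{ES}/_bulk",
    data=body,
    headers={"Content-Type": "application/x-ndjson"},
).json()
# "errors" is true if any individual action failed; inspect resp["items"] in that case.
print("errors:", resp["errors"])
```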

No, no, no. While our system is write-heavy, our queries mostly involve simple WHERE clauses without aggregations. However, the query volume is very high, so we have strict requirements for query response times. When I tried using more threads in the client to increase concurrency, I noticed that as the number of connections to the Elasticsearch cluster grows, the disk bottleneck on the ES cluster hosts becomes more apparent, reaching or even exceeding 100% utilization. That directly makes queries dozens or even hundreds of times slower. So, without changing the hardware, figuring out how to structure the ES indices has become the challenge. Do you have any insights or ideas on this?
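One way to cross-check the disk bottleneck from the Elasticsearch side is the nodes stats API, which exposes per-node filesystem and thread-pool counters. A minimal sketch, assuming the cluster is reachable on localhost:9200:

```python
import requests

ES = "http://localhost:9200"  # placeholder cluster address

# Filesystem and thread-pool stats per node; saturated disks together with
# rejected searches point at an I/O bottleneck rather than a query problem.
stats = requests.get(f"{ES}/_nodes/stats/fs,thread_pool").json()

for node_id, node in stats["nodes"].items():
    fs_total = node["fs"]["total"]
    search_pool = node["thread_pool"]["search"]
    print(
        node["name"],
        "free_disk_bytes=", fs_total["available_in_bytes"],
        "search_rejected=", search_pool["rejected"],
    )
```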

If you want to increase the read capacity of the cluster, you can consider adding nodes and increasing the number of replicas of your indices.
But before this, I'd recommend checking whether the hardware could also be upgraded (using SSDs, for example).
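The replica count is a dynamic setting, so it is cheap to experiment with once there are spare data nodes to host the extra copies. A minimal sketch with a placeholder index name:

```python
import requests

ES = "http://localhost:9200"  # placeholder cluster address

# number_of_replicas can be changed on a live index; the extra copies only help
# read throughput if additional data nodes are available to hold them.
requests.put(f"{ES}/a1/_settings", json={"index": {"number_of_replicas": 2}})
```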


That's what I would do, as @dadoonet suggested. If you already know the disk is the bottleneck, using the fastest SSD available is the first thing to try. Next is to add more data nodes to increase capacity. I haven't had success improving read performance by increasing replica counts, even though it is supposed to help. (Maybe that's due to our write-heavy nature?)
The next thing I would try is moving to local-SSD instances (I assume you are using AWS, GCP, or some other cloud hosting service) instead of EBS. This gives a huge disk performance boost. The downside is that you really need enough data nodes to tolerate forced hardware shutdowns/replacements.

But again, 7ms is still an anomaly IMO. Shooting for sub-10ms consistently is not realistic. Others might be able to chime in on this.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.