I'm using Elasticsearch's msearch to run a large number of batch queries. The problem is that this is very slow.
I'm primarily using this for doing searches / counts over an index.
For example, if I have 2000 such queries lined up, they take forever to complete.
Is there a way to speed this whole thing up?
What is the specification of your cluster with respect to node count and hardware? How much data do you have in the cluster that you are querying? How many indices and shards is this distributed across? What do CPU and disk I/O look like while you are querying? What latencies and query throughput are you seeing?
I may not have responses to all of your questions but I'll try and answer whatever I can:
- I have about 300 GB of data
- 4 indices, don't know about shards (?)
- If I use very large batch sizes, one of two things happens:
  - Either I get an error (something along the lines of cannot receive beyond 2GB of data)
  - Or for moderately large batch sizes, ~8000 samples, I get a spike in RAM usage that eats up almost all of it
For this reason I currently use smaller batch sizes (200-1000). This works when the total number of samples is something like 1000-4000, but when I have to run search queries over 100k samples for whatever reason, it becomes super slow. I've also noticed that larger batches take longer to complete.
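For the batching described above, a minimal sketch of splitting the queries into fixed-size `_msearch` requests could look like this. It assumes a cluster reachable at a `base_url` you supply, a hypothetical index and field name, and the Elasticsearch 7+ response shape (`hits.total.value`). The helper names `chunked`, `phrase_query`, `build_msearch_body`, and `run_batches` are made up for illustration and are not part of any client library. Setting `size` to 0 asks only for counts, so no documents are returned in the response.

```python
import json
import urllib.request
from typing import Iterable, Iterator, List, Optional


def chunked(items: list, size: int) -> Iterator[list]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def phrase_query(field: str, phrase: str) -> dict:
    """Build a match_phrase query (field name is an assumption)."""
    return {"match_phrase": {field: phrase}}


def build_msearch_body(index: str, queries: Iterable[dict]) -> str:
    """Build the NDJSON body for _msearch: one header line and one
    body line per query, terminated by a trailing newline."""
    lines = []
    for query in queries:
        lines.append(json.dumps({"index": index}))
        # size 0: return only the hit count, no documents.
        lines.append(json.dumps(
            {"query": query, "size": 0, "track_total_hits": True}
        ))
    return "\n".join(lines) + "\n"


def run_batches(base_url: str, index: str, queries: list,
                batch_size: int = 500) -> List[Optional[int]]:
    """Send the queries in batches and collect one count per query.
    A query that failed on the server is recorded as None."""
    counts: List[Optional[int]] = []
    for batch in chunked(queries, batch_size):
        request = urllib.request.Request(
            f"{base_url}/_msearch",
            data=build_msearch_body(index, batch).encode("utf-8"),
            headers={"Content-Type": "application/x-ndjson"},
            method="POST",
        )
        with urllib.request.urlopen(request) as response:
            payload = json.load(response)
        for item in payload["responses"]:
            if "error" in item:
                counts.append(None)
            else:
                counts.append(item["hits"]["total"]["value"])
    return counts
```

Usage would be something like `run_batches("http://localhost:9200", "my-index", [phrase_query("text", p) for p in phrases])`, where the URL, index, and field are placeholders. Whether this is faster depends on where the bottleneck is; it mainly keeps each response small.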
What types of queries are you running? How many nodes? What hardware specification?
I'm mainly doing search queries, e.g. searching for words or phrases in an index. I don't know about nodes, but if you're asking about the hardware specs of the system I'm running this on, it's an 8 core i7-8850H with 16 GB of RAM.
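If the node and shard counts are unknown, they can be read from the `_cat/nodes` and `_cat/shards` endpoints with `format=json`. The following is a rough sketch, assuming the cluster is reachable without authentication at a `base_url` you supply; `fetch_json` and `summarize_shards` are hypothetical helper names.

```python
import json
import urllib.request
from collections import Counter


def fetch_json(url: str):
    """GET a URL and decode the JSON response."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def summarize_shards(rows: list) -> dict:
    """Count primary and replica shards per index from the rows
    returned by _cat/shards?format=json."""
    primaries: Counter = Counter()
    replicas: Counter = Counter()
    for row in rows:
        if row.get("prirep") == "p":
            primaries[row["index"]] += 1
        else:
            replicas[row["index"]] += 1
    return {
        index: {"primaries": primaries[index], "replicas": replicas[index]}
        for index in sorted(set(primaries) | set(replicas))
    }


def describe_cluster(base_url: str) -> dict:
    """Return the node count and a per-index shard summary."""
    nodes = fetch_json(f"{base_url}/_cat/nodes?format=json")
    shards = fetch_json(f"{base_url}/_cat/shards?format=json")
    return {"node_count": len(nodes), "shards": summarize_shards(shards)}
```

Calling `describe_cluster("http://localhost:9200")` (placeholder URL) would return the numbers asked about in this thread.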
Are you running this on a single host? What type of disk do you have?
What does CPU usage look like when you run the queries? What do disk I/O and iowait look like?
With that much data and queries I would not be surprised if you are limited by disk performance.
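One way to check iowait while the queries run is to sample the aggregate `cpu` line of `/proc/stat` twice and compare. This sketch is Linux-only (an assumption, since the operating system is not stated in the thread), and `parse_cpu_line`, `iowait_fraction`, and `sample_iowait` are illustrative names. A consistently high fraction during querying would be consistent with a disk bottleneck.

```python
import time


def parse_cpu_line(line: str) -> list:
    """Parse the aggregate 'cpu' line of /proc/stat into integer
    counters: user, nice, system, idle, iowait, irq, softirq, steal, ..."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(value) for value in fields[1:]]


def iowait_fraction(before: str, after: str) -> float:
    """Fraction of CPU time spent in iowait between two samples.
    Only the first eight counters are summed, because guest time is
    already included in user time."""
    start = parse_cpu_line(before)[:8]
    end = parse_cpu_line(after)[:8]
    deltas = [b - a for a, b in zip(start, end)]
    total = sum(deltas)
    return deltas[4] / total if total else 0.0


def read_cpu_line(path: str = "/proc/stat") -> str:
    """Read the first line of /proc/stat (Linux only)."""
    with open(path) as handle:
        return handle.readline()


def sample_iowait(interval_seconds: float = 5.0) -> float:
    """Measure the iowait fraction over the given interval."""
    before = read_cpu_line()
    time.sleep(interval_seconds)
    return iowait_fraction(before, read_cpu_line())
```

Running `sample_iowait()` in a second terminal while a batch is in flight gives a rough number to report back; tools such as `iostat` give the same information in more detail.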