Implications of changing search queue size?


(Andrew Swan) #1

Hi, I've got an application that uses the same basic index strategy as Logstash (one index per day), but my data covers several years, which means I have thousands of indexes. Now I want to run an aggregation across these years of data, but that quickly overflows the default 1000-element search queue.
I realize that an ideal solution would be to change the index strategy to have fewer indexes rather than a very large number of small ones, but for various reasons, that's not easy to do with this application.
The most expedient solution appears to be increasing the search queue size, but I'm having a hard time finding any concrete description of what consequences this might have. I understand that some search operations may take longer if they have to wait in a longer queue. Are there other things I should be concerned about? Is the memory consumed by requests waiting in the search queue anything to worry about (particularly given that there are gigabytes of segment data already filling up the heap)?
Another possible solution is to perform a series of smaller aggregations from my application and then combine the results myself, but that feels wasteful as it duplicates a task that ES is already quite good at.
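For what it's worth, the client-side merge I have in mind would look roughly like this minimal Python sketch. It assumes each smaller request returns the usual `aggregations` / `buckets` / `doc_count` response shape; the `by_status` aggregation name and the sample numbers are made up for illustration, and it only handles plain counts (averages or percentiles would need more care).

```python
from collections import Counter


def merge_bucket_counts(responses, agg_name):
    """Sum doc_count per bucket key across several aggregation responses."""
    totals = Counter()
    for response in responses:
        for bucket in response["aggregations"][agg_name]["buckets"]:
            totals[bucket["key"]] += bucket["doc_count"]
    return dict(totals)


# Made-up responses, e.g. one per year's worth of daily indexes.
responses = [
    {"aggregations": {"by_status": {"buckets": [
        {"key": "ok", "doc_count": 120},
        {"key": "error", "doc_count": 7},
    ]}}},
    {"aggregations": {"by_status": {"buckets": [
        {"key": "ok", "doc_count": 95},
        {"key": "timeout", "doc_count": 3},
    ]}}},
]

merged = merge_bucket_counts(responses, "by_status")
print(merged)
```

Note that this is only exact if every smaller request returns all of its buckets; if the buckets are truncated to a top-N, the merged counts can be off.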
So, any guidance or words of wisdom?

Thanks in advance...

-Andrew


(Mark Walkom) #2

Well, a larger queue means more requests are held in memory until they are processed, which leaves less heap for querying.


(Andrew Swan) #3

Thanks for the reply. I was hoping to quantify the overhead further, though. As I understand it, an incoming query turns into one search operation per shard that the query addresses. My queries generally aren't huge; the JSON representation is a few hundred bytes. If an item in the search queue is just the raw query plus the identity of the shard, that sounds like it would be on the order of one or at most a few kilobytes. With a queue size of 10,000, that would be in the neighborhood of 10 MB, which doesn't sound like a substantial cost compared to the gigabytes of heap consumed by segment memory, etc.
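Here is that back-of-the-envelope estimate as a small Python sketch. The per-entry sizes (1 KB and 4 KB), the three-year span, and the one-shard-per-index figure are my own guesses, not measured values, and the function names are just for illustration.

```python
def shard_level_requests(num_indices: int, shards_per_index: int) -> int:
    """One search operation per shard addressed by the query."""
    return num_indices * shards_per_index


def queue_memory_bytes(queue_size: int, bytes_per_entry: int) -> int:
    """Heap held by a full queue if every entry costs bytes_per_entry."""
    return queue_size * bytes_per_entry


# Hypothetical inputs: three years of daily indexes, one shard each.
requests = shard_level_requests(num_indices=3 * 365, shards_per_index=1)
print(requests, "shard-level requests vs. a default queue of 1000")

# Assumed cost per queued entry: 1 KB (low guess) to 4 KB (high guess).
for bytes_per_entry in (1024, 4096):
    total = queue_memory_bytes(10_000, bytes_per_entry)
    print(f"{bytes_per_entry} B/entry -> {total / 1024**2:.1f} MiB")
```

If a queued entry carries much more than the raw query and shard identity, the per-entry guess is what would need to change.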
I'm not very familiar with the ES internals. Is my crude understanding of what goes into the search queue more or less correct, or am I way off the mark?

-Andrew

