Range query blocks Elasticsearch

Hi everyone,
I'm almost a beginner with Elasticsearch, and I need a little help.
I built a monitoring infrastructure with ELK for 60 nodes: a single instance of the whole ELK stack, with the monitoring plugin, running on one node, and it has been working for three months. Basically, my Logstash collects three different inputs:

  • Input from Metricbeat with 7 metricsets of the system module from about 60 nodes, at a 1-minute interval;
  • Input from 15 exec inputs with simple Ruby filter parsing, executed every minute;
  • Input from 1 exec input with simple Ruby filter parsing, executed every minute.

So, for each day, Elasticsearch has three different indices. For example, yesterday's indices are (data taken from the monitoring plugin):

  • Index1-2017.08.15: document count 2.4m, data 917.7 MB, index rate 0/s, search rate 0/s, unassigned shards 5
  • Index2-2017.08.15: document count 20.2k, data 299.7 MB, index rate 0/s, search rate 0/s, unassigned shards 5
  • Index3-2017.08.15: document count 1.4k, data 52.1 MB, index rate 0/s, search rate 0/s, unassigned shards 5

Each day a crontab job deletes the indices older than 8 weeks, in order to release space and shards (so there are about 860 shards in Elasticsearch in total on any given day).
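Roughly, the cleanup does something like this (a simplified sketch; the real job is a plain crontab entry, shown here with the Node.js client just to illustrate the idea):

    // Simplified sketch of the daily cleanup; the real job is a crontab entry.
    // It deletes the three daily indices that are now 8 weeks old.
    const elasticsearch = require('elasticsearch');
    const client = new elasticsearch.Client({ host: 'localhost:9200' });

    // Build the "YYYY.MM.DD" suffix of the day that is 8 weeks in the past.
    const cutoff = new Date(Date.now() - 8 * 7 * 24 * 60 * 60 * 1000);
    const suffix = cutoff.toISOString().slice(0, 10).replace(/-/g, '.');

    client.indices.delete({ index: ['Index1-' + suffix, 'Index2-' + suffix, 'Index3-' + suffix] })
      .then(() => console.log('Deleted indices for ' + suffix))
      .catch(err => console.error(err));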

Each index has a field called type, which lets me distinguish its documents from the others. For instance, all documents in index1* have type: "type1", and so on.

I need to collect all the data in a time range (for example, the last 5 minutes) and select it by the type field so that I don't pick up the wrong documents. Since there are a lot of documents in a 5-minute window, I use the scroll API, with, for example, a 20s scroll context and a page size of 100.

Sometimes I also need to run another module that calls this query several times over a given period. For example, to gather all of last month's metrics divided into 5-minute intervals, I run the query below 5 windows at a time until the whole month is covered; there are never more than 5 jobs at once, to avoid overloading Elasticsearch.
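To make that more concrete, the batching logic is roughly the following (a simplified sketch; runQuery is a placeholder for a wrapper around the query shown below):

    // Simplified sketch of the "other module": split a period into 5-minute
    // windows and never run more than 5 queries at the same time.
    // runQuery is a placeholder for a function that executes the query below
    // for one window and resolves with its documents.
    async function collectRange(from, to, runQuery) {
      const FIVE_MINUTES = 5 * 60 * 1000;

      // Build all 5-minute windows covering [from, to).
      const windows = [];
      for (let start = from.getTime(); start < to.getTime(); start += FIVE_MINUTES) {
        windows.push({ gte: new Date(start), lt: new Date(start + FIVE_MINUTES) });
      }

      // Process the windows in batches of 5 so that Elasticsearch never sees
      // more than 5 concurrent jobs.
      const results = [];
      for (let i = 0; i < windows.length; i += 5) {
        const batch = windows.slice(i, i + 5).map(w => runQuery(w));
        results.push(...await Promise.all(batch));
      }
      return results;
    }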

For example, the query I built is:

// I need to scroll to get all results
const searchParams = {
  scroll: '20s',
  size: 100,
  // Sorting by _doc lets Elasticsearch skip scoring/sorting, because I don't need any particular order
  sort: ['_doc'],
  body: {
    query: {
      bool: {
        filter: [
          // Filter documents by type
          { terms: { type: ['type1', 'type2', 'type3'] } },
          // Filter documents by arrival time
          { range: {
              '@timestamp': {
                gte: 'now-290s',
                lte: 'now'
              }
          } }
        ]
      }
    }
  }
};
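To give an idea of how I call it, this is roughly the surrounding Node.js code (a simplified sketch, assuming the legacy elasticsearch client; at this point I simply let the scroll context expire on its own):

    // Simplified sketch of how I execute the query above with the scroll API
    // (legacy "elasticsearch" Node.js client; searchParams is the object above).
    const elasticsearch = require('elasticsearch');
    const client = new elasticsearch.Client({ host: 'localhost:9200' });

    async function fetchWindow() {
      const docs = [];
      let response = await client.search(searchParams);

      // Keep asking for the next page until one comes back empty;
      // the 20s scroll context is simply left to expire afterwards.
      while (response.hits.hits.length > 0) {
        docs.push(...response.hits.hits);
        response = await client.scroll({ scroll: '20s', scrollId: response._scroll_id });
      }
      return docs;
    }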

Now, I run this query with Node.js every 5 minutes, and I use the other module, which runs the same query over 5-minute intervals, to collect the last month's metrics. It works properly for the first few minutes, but after a while the Elasticsearch node stops working (Logstash doesn't deliver any events, Elasticsearch doesn't respond, and Kibana reports that the Elasticsearch plugin is red).

I don't know whether the problem is reindexing, a heavy query, the Elasticsearch settings, or something else.

Can someone help me?

Thank you :grinning:

Hi @Marco_42,

What do you mean by that? Does the Elasticsearch process die? If so, what's the error message? Does the node run out of memory? I suggest that you install X-Pack Monitoring to get an overview of potential bottlenecks (you can use the free X-Pack Basic license), or you can use the various cluster APIs to find out how Elasticsearch is doing.
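For example, something as simple as this (a rough sketch, reusing the Node.js client you already have) will show the cluster status and any thread pool rejections:

    // Rough sketch, reusing the Node.js client from above: check the overall
    // cluster status and look at the "rejected" column of the thread pools.
    client.cluster.health().then(health => console.log(health.status));
    client.cat.threadPool({ v: true }).then(table => console.log(table));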

Daniel

Hi @danielmitterdorfer,
Thanks for your answer.
Fortunately, I've already found a solution. I'll explain it here in case it is useful to anyone else. To be clear, there was more than one problem, but it was my fault, not ELK's. I had already installed the monitoring pack, and it is very helpful.
So basically the Elasticsearch process didn't die, but it stopped answering queries. It returned this error:

org.elasticsearch.transport.RemoteTransportException: [QwB-F9A]
                              [127.0.0.1:9300][indices:data/read/search[phase/query]]
Caused by: org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution of org.elasticsearch.transport.TransportService$7@1da35753 on EsThreadPoolExecutor

I found that this could be solved by giving Elasticsearch bigger thread pools, but I suspected that the real problem was the complex query I had built.
Then, after some research, I found that:

  • Each time I open a scroll and use the "_doc" sort, I have to close it explicitly, even when it is exhausted (https://github.com/elastic/elasticsearch/issues/16929). This was a problem, because I had been leaving a lot of scroll contexts open at a time (see the sketch after the new query below);

  • Querying by the type field is not optimal for time-series data: the query runs against all indices, because any of them could contain matching documents. Fortunately, a new set of indices is created each day, so, knowing that I only need data between two dates and that the names of the indices start with the type, I fixed the problem with this new query:

    const searchParams = {
        index: ['Index1-2017.08.15', 'Index2-2017.08.15', /* ... */],
        scroll: '20s',
        size: 100,
        sort: ['_doc'],
        body: {
            query: {
                bool: {
                    filter: {
                        range: {
                            '@timestamp': {
                                gte: 'now-290s',
                                lte: 'now'
                            }
                        }
                    }
                }
            }
        }
    };
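Putting both fixes together, the calling code now looks roughly like this (a simplified sketch with hypothetical helper names): it restricts the search to the matching daily indices and always clears the scroll context when it is done:

    // Simplified sketch of the fixed calling code: params is the new query
    // object above, with "index" already set to the daily indices that can
    // contain the requested window (e.g. ["Index1-2017.08.15", ...]).
    async function fetchWindowFixed(client, params) {
      const docs = [];
      let response = await client.search(params);
      try {
        // Page through the results as before.
        while (response.hits.hits.length > 0) {
          docs.push(...response.hits.hits);
          response = await client.scroll({ scroll: '20s', scrollId: response._scroll_id });
        }
      } finally {
        // Always free the scroll context, even when it is already exhausted
        // (https://github.com/elastic/elasticsearch/issues/16929).
        await client.clearScroll({ scrollId: response._scroll_id });
      }
      return docs;
    }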
    

Thank you again


This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.