Query performance questions

carlosrdrz · February 24, 2016, 8:02pm

Hello!

I have some questions about how queries work and what's the effect of some config options.

I have 14 indexes (I'm using the logstash-YYYY-MM as their names to have documents from the same time range in the same index) each one with 5 shards and 1 replica. Each index has (approx) 10 GB of data and 25 millions of documents.

I'm using the following mapping:

{
  "template": "logstash-*",
  "settings": {
    "index": {
      "refresh_interval": "5s",
      "cache": {
        "query": {
          "enable": "true"
        }
      }
    }
  },
  "mappings": {
    "_default_": {
      "dynamic_templates": [
        {
          "strings": {
            "match": "*",
            "match_mapping_type": "string",
            "mapping":   { "type": "string", "doc_values": true, "index": "not_analyzed" }
          }
        }
      ],
      "_all":    { "enabled": false },
      "_source": { "enabled": true },
      "properties": {
        "@timestamp":         { "type": "date",    "doc_values": true }
      }
    }
  }
}

I'm mostly interested in analytics, so I'm using aggregations heavily. Most aggregations are data_histograms on @timestamp, but I also use different terms aggregations on string fields (let's call them A, B and C). Also, it's worth mentioning that I want to support full-year queries in acceptable times, which will get data from 12 indexes.

Currently, I'm using a 5 node cluster where each node has 2 cores and 7.5 GB of RAM, so in total I have 10 cores and 37,5 GB of RAM.

My main issue is that I'm getting high response times (30 seconds, and after that the request timeouts) and seeing 100% CPU usage when trying to display a full-year dashboard with some charts, and I want to understand if I'm just hitting a CPU bottleneck because of the amount of data that is being queried, or if something is wrongly configured.

These are some observations and some questions:

Only 25% of RAM is being used. I thought that aggregations relied on RAM to make them work faster, so I was expecting to have much more data in the heap (although I know I'm using doc_values, which are supposed to store fielddata in disk).
When I use the cat API to show fielddata on nodes and how many RAM they're using, I can see A, B and C listed there, but timestamp is not. Is that normal? Shouldn't timestamp be listed there, too?
I tried the following: I requested a year dashboard with Kibana and then I used the cat api to list the thread pools. I saw that each node had 6 search threads for the search pool, with like 100 pending tasks on each pool. AFAIK each thread processes one shard, so does that mean that if I reduce the number of shards per index, I would have less pending tasks on the pool and therefore better response times or that would just reduce the pending tasks but make them run longer?
It's stated in the docs in several places that it's okay to overallocate shards. Doesn't that have more performance issues like the one pointed out in point 3? Could reduce the number of shards per index improve query response times?
Is there any way of knowing why a query is taking so long to process? Where time is being spent? Keep in mind I don't have access to log files, so I can't use the slow log.
I keep seeing in Kibana (full-year queries) that I have acceptable query times (in the order of 2-3 seconds) but the request time is much higher (20-25 seconds). Any idea of what the reason could be?
Is my cluster just too small to handle that kind of queries for that amount of data?

I'm highly interested in any other guideline/tip/recommendation that you guys could give me about other stuff I could check to gain more information about the cluster or to understand how queries work and where the bottleneck could be.

Thanks!

Topic		Replies	Views
Our elastic search query performance is VERY low Elasticsearch	12	1633	May 11, 2017
Slow Query Performance Elasticsearch	10	791	July 6, 2017
Massive performance issues on our production cluster Elasticsearch	5	2589	July 6, 2017
Performance problems Elasticsearch	12	632	July 6, 2017
Query Performance Elasticsearch	10	1905	July 6, 2017

Query performance questions

Related topics