Elasticsearch 7.0 SQL ACCESS request cache

c81b4c93b1f70af5398f · June 19, 2019, 12:38pm

Looks like SQL query is not cached, and equivalent dsl qeury is cached.

Every single time this SQL query is executed, it takes over 3 seconds. (The number of documents is about 6,000,000)

    POST /_sql?format=json
    {
        "query": "SELECT HISTOGRAM(\"batchDate\", INTERVAL 1 DAY) AS \"h_date\", SUM(\"aggrCount\") AS \"cnt\" FROM \"my.index-*\" WHERE \"batchDate\" < 1560988799999 AND \"batchDate\" >= 1560340034000 GROUP BY \"h_date\""
    }

And when I'm using QueryDSL using request body got from /_sql/translate API with the sql query above, it took over 3~5 seconds only first time, and next it took under 1ms. Maybe it might be cached.

    POST /my.index-*/_search
    {
      "size" : 0,
      "query" : {
        "range" : {
          "batchDate" : {
            "from" : 1560340034000,
            "to" : 1560988799999,
            "include_lower" : true,
            "include_upper" : false,
            "boost" : 1.0
          }
        }
      },
      "_source" : false,
      "stored_fields" : "_none_",
      "aggregations" : {
        "groupby" : {
          "composite" : {
            "size" : 1000,
            "sources" : [
              {
                "25579" : {
                  "date_histogram" : {
                    "field" : "batchDate",
                    "missing_bucket" : true,
                    "value_type" : "date",
                    "order" : "asc",
                    "interval" : 86400000,
                    "time_zone" : "Z"
                  }
                }
              }
            ]
          },
          "aggregations" : {
            "25580" : {
              "sum" : {
                "field" : "aggrCount"
              }
            }
          }
        }
      }
    }

Andrei_Stefan · June 24, 2019, 9:18am

@c81b4c93b1f70af5398f the json we generate as a query can be different every time you call translate. The relevant parts of the query will not be different, obviously, but the name of the aggregations, for example, can be different - 25579 and 25580 from your example. And I think caching works on the body of the query (the json itself). If you generate the query once, run it several times, and then generate it again with translate and run it, does it take seconds or returns instantly?

nuwan · June 24, 2019, 11:18am

Hi

Have you setup elasticsearch cluster with V 7 at least 3 node cluster ?

Nuwan

c81b4c93b1f70af5398f · June 24, 2019, 11:26am

The name of aggregations depends like you mentioned in translate API.
I repeated the query dsl request several times after generating translated json body(dsl) once, and it got much faster after several times. But it gets fast only with translated dsl that has same name of aggregations. If I generate new translated json body(with another aggregation name), it takes long again.
When it comes with SQL, It takes over 3 seconds even after several times.
Can I fix the name of aggregations when requesting by SQL?

c81b4c93b1f70af5398f · June 24, 2019, 11:28am

Hi nuwan.
My cluster has 9 nodes and every node is V 7.0.0

nuwan · June 24, 2019, 11:47am

Hi

Thank you very much for quick respond my configuration is look like below . appciate if you can help me to setup 3 node cluster . I'm going go update my 3 node production cluster from v 6.5.1 to v 7.1.1

[root@orc-app1 ~]# cat /etc/elasticsearch/elasticsearch.yml
cluster.name: ElasticDemO
node.name: ${HOSTNAME}
network.host: 192.168.60.4
xpack.security.enabled: false
bootstrap.system_call_filter: true
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
discovery.seed_hosts:

192.168.60.4:9300
192.168.60.5:9300
192.168.60.6:9300
cluster.initial_master_nodes: ["192.168.60.4","192.168.60.5","192.168.60.6"]
http.cors.enabled: true
discovery.zen.minimum_master_nodes: 2
node.master: true

[root@orc-app1 ~]#

http://192.168.60.4:9200/_cat/nodes?v

ip heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
192.168.60.4 25 94 5 0.00 0.02 0.05 mdi * orc-app1.dev

nuwan · June 24, 2019, 11:49am

appreciate if you can send me sample configuration file or if you can go through my configuation file and let me know what need to be modified

Andrei_Stefan · June 24, 2019, 12:06pm

@nuwan please, open a new thread of discussion and keep this discussion focused on the initial issue. Thank you.

nuwan · June 24, 2019, 12:14pm

yes I did still I'm unable to get the answer for that appreciate if any one help me regarding my concerns Any one setup 3 node cluster with elasticsearch v 7

Thank you

nuwan · June 24, 2019, 12:15pm

Andrei_Stefan · June 24, 2019, 12:29pm

@c81b4c93b1f70af5398f no, you cannot control the name of the generated aggregation in a query, everything is transparent and ES SQL should behave just like any other SQL facing tool.

I've created a github issue and we'll discuss internally if this is feasible to implement: https://github.com/elastic/elasticsearch/issues/43531

c81b4c93b1f70af5398f · June 26, 2019, 8:31am

Thanks for your follow up. Hope to be fixed soon.

system · July 24, 2019, 8:31am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
SQL Vs. Query DSL Elasticsearch	2	704	March 28, 2022
Query execution time Elasticsearch	4	3951	January 15, 2019
Queries cached Elasticsearch	2	245	July 6, 2017
Search time consistency Elasticsearch	3	307	July 6, 2017
Date histogram cache? Elasticsearch	2	449	July 12, 2019

Elasticsearch 7.0 SQL ACCESS request cache

Related topics