Elasticsearch doesn't cache query with rounded date

tlinhart · April 9, 2021, 12:59pm

I originally posted the question to Stack Overflow, but I guess it's more appropriate to ask here. (For reference, the original question is "Elasticsearch doesn't cache query with rounded date" on Stack Overflow.)

I'm using this Elasticsearch query:

{
  "query": {
    "constant_score": {
      "filter": {
        "bool": {
          "must": [
            {"term": {"address.country.keyword": "Germany"}},
            {"term": {"address.city.keyword": "Berlin"}},
            {"range": {"timestamp": {"gte": "now-3d/h"}}}
          ]
        }
      }
    }
  },
  "size": 0
}

AFAICT this should qualify the query for caching, at least based on these resources:

However, according to the /_stats/request_cache endpoint, caching is not involved: hit_count stays the same. If I remove the range clause, it works. If I change the time specification to now/h, it also works. Am I missing something?

Elasticsearch's version is 6.2.3.

nik9000 · April 9, 2021, 1:32pm

now disables the query cache. Now that I think of it, I'm not sure it has to, especially when rounded like that.

nik9000 · April 9, 2021, 1:42pm

Sorry, that can't be right. I looked at the master branch and we do indeed cache now queries. And you are talking about the request cache anyway, which is "higher" than the query cache.

6.2 is quite old at this point. I'd try again in 7.12 - it may still not hit the request cache. But I think it ought to.

tlinhart · April 9, 2021, 1:48pm

Thanks for the response. As mentioned in a comment on the original post, it works in 7.9.1. However, I can't easily upgrade to 7.x due to breaking changes. I'd at least need to find the lowest version that fixed this issue. Anyway, I'd expect it to work, as the 6.2 documentation (linked in the question) suggests it should.

nik9000 · April 9, 2021, 3:50pm

Oh! Sorry. I hadn't read closely enough. Old versions are bound to have bugs because we don't backport everything everywhere.

dadoonet · April 9, 2021, 4:38pm

I'd also put everything in the filter clause instead of the must clause.

dadoonet · April 9, 2021, 4:38pm

At least upgrade to the latest 6.8.

tlinhart · April 9, 2021, 5:01pm

OK I'll give it a try and upgrade to 6.8, thanks.

tlinhart · April 12, 2021, 9:27am

@dadoonet I tested the exact same query on both 6.8 and the newest 7.12.0 (both single-node setups running in Docker) and caching wasn't involved in either case. At least querying /_stats/request_cache doesn't show anything. When I remove the range using now, I see a cache miss the first time and cache hits every time after that. But when now is involved, no caching happens. So there are basically three options, I guess:

I'm doing something wrong. Either I specify the query incorrectly or I'm "debugging" the cache usage the wrong way.
Whenever now is used in the query (even with rounding), it prevents cache usage. In that case I guess the documentation is misleading.
Or, there's a bug.

Is there anything else I could try? Both in terms of reformulating the query or debugging further.
Thanks

dadoonet · April 12, 2021, 9:48am

If you are using only now then no cache is involved I think.
now-3d/h should be cached for one hour or so I guess...

Are you running the query multiple times like 5x or 10x? I think that you need to do that to see it cached.
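To make the rounding idea above concrete, here is a minimal Python sketch of how an expression like now-3d/h resolves to the same instant for every request sent within one clock hour. The helper name resolve_now_minus_days is made up for illustration, it only handles the /h and /d units in UTC, and it rounds down as a gte bound would. It says nothing about whether a given Elasticsearch version actually caches such a request; it only shows why the resolved value is stable.

```python
from datetime import datetime, timedelta, timezone


def resolve_now_minus_days(now: datetime, days: int, unit: str) -> datetime:
    """Approximate `now-<days>d/<unit>` as a lower bound (rounded down).

    Hypothetical helper for illustration; supports only 'h' and 'd'.
    """
    shifted = now - timedelta(days=days)
    if unit == "h":
        return shifted.replace(minute=0, second=0, microsecond=0)
    if unit == "d":
        return shifted.replace(hour=0, minute=0, second=0, microsecond=0)
    raise ValueError(f"unsupported rounding unit: {unit!r}")


# Two requests in the same hour, and one just after the hour boundary.
t1 = datetime(2021, 4, 12, 9, 48, 5, tzinfo=timezone.utc)
t2 = datetime(2021, 4, 12, 9, 59, 59, tzinfo=timezone.utc)
t3 = datetime(2021, 4, 12, 10, 0, 1, tzinfo=timezone.utc)

same_hour = resolve_now_minus_days(t1, 3, "h") == resolve_now_minus_days(t2, 3, "h")
next_hour = resolve_now_minus_days(t2, 3, "h") == resolve_now_minus_days(t3, 3, "h")
print(same_hour, next_hour)
```

With /d instead of /h, all three timestamps resolve to the same value, because they fall on the same UTC day.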

tlinhart · April 12, 2021, 10:00am

Yeah, I tested almost every possible combination. When using plain now, it's not cached, as expected. But the cache is not involved even when rounding using /d or /h. I created a tiny "test suite" which contains a script that sends prepared queries multiple times and another script that monitors the request cache and also the query cache using the _stats endpoint. The query cache is leveraged, however the shard-level cache is not.

dadoonet · April 12, 2021, 10:16am

I wonder if this excellent blog post written by @spinscale would help to understand what is happening?

tlinhart · April 12, 2021, 10:25am

Yeah, maybe it's just that I don't really understand what's going on. I'll go through the article and get back if there are any concerns left. Thanks a lot for sharing!

tlinhart · April 12, 2021, 4:05pm

I've read the article, but I wouldn't say I understand it significantly better, especially with regard to the usage of now and date math / rounding. I made a final test with three different queries: hits only, aggs only, and a combination of both:

Query 1 (hits only):

{
  "query": {
    "constant_score": {
      "filter": {
        "bool": {
          "filter": [
            {"term": {"address.country.keyword": "Česko"}},
            {"term": {"address.city.keyword": "Praha"}},
            {"range": {"last_seen": {"gte": "now-3d/d"}}}
          ]
        }
      }
    }
  },
  "size": 5
}

Query 2 (aggs only):

{
  "query": {
    "constant_score": {
      "filter": {
        "bool": {
          "filter": [
            {"term": {"address.country.keyword": "Česko"}},
            {"term": {"address.city.keyword": "Praha"}},
            {"range": {"last_seen": {"gte": "now-3d/d"}}}
          ]
        }
      }
    }
  },
  "aggs": {
    "area": {"stats": {"field": "area"}}
  },
  "size": 0
}

Query 3 (both hits and aggs):

{
  "query": {
    "constant_score": {
      "filter": {
        "bool": {
          "filter": [
            {"term": {"address.country.keyword": "Česko"}},
            {"term": {"address.city.keyword": "Praha"}},
            {"range": {"last_seen": {"gte": "now-3d/d"}}}
          ]
        }
      }
    }
  },
  "aggs": {
    "area": {"stats": {"field": "area"}}
  },
  "size": 5
}

I was sending these queries to Elasticsearch in an infinite loop (with a 200 ms delay between requests) and monitored the request and query cache usage with:

watch -n 1 "curl -s -X GET http://localhost:9200/_stats | jq '.indices.realestate.total | {request_cache, query_cache}'"
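For anyone who prefers a script over watch and jq, the same extraction can be sketched in Python. The function names cache_stats and fetch_stats are made up for illustration, the numbers in the sample document are invented placeholders rather than measurements, and fetch_stats assumes an unsecured node at http://localhost:9200 as in the command above.

```python
import json
import urllib.request


def cache_stats(stats: dict, index: str) -> dict:
    """Pick the request_cache and query_cache sections for one index,
    mirroring the jq filter `.indices.<index>.total | {request_cache, query_cache}`."""
    total = stats["indices"][index]["total"]
    return {
        "request_cache": total["request_cache"],
        "query_cache": total["query_cache"],
    }


def fetch_stats(base_url: str = "http://localhost:9200") -> dict:
    """Fetch the _stats document (assumes no authentication is required)."""
    with urllib.request.urlopen(f"{base_url}/_stats") as resp:
        return json.load(resp)


# Invented sample shaped like a trimmed _stats response, so the sketch
# runs without a cluster. The values are placeholders only.
sample = {
    "indices": {
        "realestate": {
            "total": {
                "docs": {"count": 3},
                "request_cache": {"hit_count": 0, "miss_count": 1},
                "query_cache": {"hit_count": 4, "miss_count": 2},
            }
        }
    }
}

summary = cache_stats(sample, "realestate")
print(json.dumps(summary, indent=2))
```

Against a live node, `cache_stats(fetch_stats(), "realestate")` called in a loop with a one-second sleep would approximate the watch command.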

The results for each examined version of Elasticsearch are here:
6.2.3 (version I have in production): request cache not used for any query, query cache not used for any query
6.8.15: request cache not used for any query, query cache used for all three queries
7.12.0: request cache not used for any query, query cache used for all three queries

So I'm not really sure what's going on, especially why the request cache is not used. Is there anything else I can do to understand this better, or to leverage any cache (effectively whatever can speed up the queries) in 6.2.3 while using now with rounding?

spinscale · April 13, 2021, 2:33pm

Hey,

I tried to create a minimal reproduction to check the caching behaviour. Minimal implies that the query cache will not be used, but I suppose that is fine for our testing.

DELETE test

PUT test 
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  }
}

PUT test/_doc/1
{
  "address" : {
    "country" : "T1",
    "city" : "T2"
  },
  "last_seen" : "2021-04-10T12:34:56.789Z",
  "area" : 10.0
}

PUT test/_doc/2
{
  "address" : {
    "country" : "T1",
    "city" : "T2"
  },
  "last_seen" : "2021-04-11T12:34:56.789Z",
  "area" : 10.0
}

PUT test/_doc/3
{
  "address" : {
    "country" : "T1",
    "city" : "T2"
  },
  "last_seen" : "2021-04-12T12:34:56.789Z",
  "area" : 10.0
}

# verify fields are mapped correctly
GET test/_mapping

GET test/_stats?filter_path=_all.primaries.request_cache,_all.primaries.query_cache

# also try without the parameter
GET test/_search?request_cache=true
{
  "query": {
    "constant_score": {
      "filter": {
        "bool": {
          "filter": [
            {
              "term": {
                "address.country.keyword": "T1"
              }
            },
            {
              "term": {
                "address.city.keyword": "T2"
              }
            },
            {
              "range": {
                "last_seen": {
                  "gte": "now-3d/d"
                }
              }
            }
          ]
        }
      }
    }
  },
  "aggs": {
    "area": {
      "stats": {
        "field": "area"
      }
    }
  },
  "size": 0
}

So this indexes three documents matching your query (well, right now, because of the now part). On 7.12 this immediately fills up the request cache, even if I do not specify request_cache=true in the URL.

Is this behaviour different to yours?

That said, the query cache will not work because there is not enough data, as mentioned in the blog post, but let's take it one cache at a time.

--Alex

tlinhart · April 14, 2021, 8:14am

Hi Alex, thanks a lot for staying with me.
When I run the exact sequence of calls you provided, I get a request cache miss for the first time. What's weird to me is that when I subsequently call the search API with the exact same query, the cache stats stay the same:

{
  "_all" : {
    "primaries" : {
      "query_cache" : {
        "memory_size_in_bytes" : 0,
        "total_count" : 0,
        "hit_count" : 0,
        "miss_count" : 0,
        "cache_size" : 0,
        "cache_count" : 0,
        "evictions" : 0
      },
      "request_cache" : {
        "memory_size_in_bytes" : 0,
        "evictions" : 0,
        "hit_count" : 0,
        "miss_count" : 1
      }
    }
  }
}
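One way to read these counters is to compare snapshots taken before and after each search. The Python sketch below does that with the request_cache values reported above. The helper names and the all-zero baseline for a freshly created index are assumptions made for illustration, and the "no_lookup" label is only an interpretation of counters that did not move.

```python
def request_cache_delta(before: dict, after: dict) -> dict:
    """Difference in request cache hit/miss counters between two
    `_stats?filter_path=_all.primaries.request_cache` style snapshots."""
    b = before["_all"]["primaries"]["request_cache"]
    a = after["_all"]["primaries"]["request_cache"]
    return {key: a[key] - b[key] for key in ("hit_count", "miss_count")}


def classify(delta: dict) -> str:
    """Label what one search did to the request cache counters."""
    if delta["hit_count"] > 0:
        return "hit"
    if delta["miss_count"] > 0:
        return "miss"
    # Neither counter moved: the cache was apparently not consulted.
    return "no_lookup"


def snapshot(hit_count: int, miss_count: int) -> dict:
    """Build a minimal snapshot with the same nesting as the output above."""
    return {
        "_all": {
            "primaries": {
                "request_cache": {
                    "memory_size_in_bytes": 0,
                    "evictions": 0,
                    "hit_count": hit_count,
                    "miss_count": miss_count,
                }
            }
        }
    }


baseline = snapshot(0, 0)      # assumed state of a fresh index
after_first = snapshot(0, 1)   # values reported after the first search
after_second = snapshot(0, 1)  # unchanged after the second search

first = classify(request_cache_delta(baseline, after_first))
second = classify(request_cache_delta(after_first, after_second))
print(first, second)
```

A working request cache would be expected to show "miss" followed by "hit"; the sequence here is "miss" followed by "no_lookup", which matches the symptom described.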

To provide complete information, here's the exact version I'm using:

{
  "name": "f09a1378ed75",
  "cluster_name": "es-test",
  "cluster_uuid": "ZzobHNUiR86jwCS9iKZ7Qg",
  "version": {
    "number": "7.12.0",
    "build_flavor": "default",
    "build_type": "docker",
    "build_hash": "78722783c38caa25a70982b5b042074cde5d3b3a",
    "build_date": "2021-03-18T06:17:15.410153305Z",
    "build_snapshot": false,
    "lucene_version": "8.8.0",
    "minimum_wire_compatibility_version": "6.8.0",
    "minimum_index_compatibility_version": "6.0.0-beta1"
  },
  "tagline": "You Know, for Search"
}

spinscale · April 15, 2021, 7:47am

Just to get this right: Even when you configure via the parameter to use the cache, it does not get used?

I also tested on 7.12, but not on Docker. How much memory are you giving to the Elasticsearch process, and are there any configuration options that you set?

--Alex

tlinhart · April 15, 2021, 7:56am

Yeah, even when I use the query parameter request_cache=true, it doesn't use it.

I use this custom Dockerfile:

FROM docker.elastic.co/elasticsearch/elasticsearch:7.12.0

RUN ./bin/elasticsearch-plugin install analysis-icu

RUN mkdir -p ./config/hunspell/cs_CZ \
 && cd ./config/hunspell/cs_CZ \
 && curl -O https://issues.apache.org/jira/secure/attachment/12541597/cs_CZ.aff \
 && curl -O https://issues.apache.org/jira/secure/attachment/12541598/cs_CZ.dic \
 && echo "strict_affix_parsing: false" > ./settings.yml

This is the command I use to run the Docker container:

docker run --name=elastic -d -p 9200:9200 -p 9300:9300 --restart=always \
  -e ES_JAVA_OPTS="-Xms2g -Xmx2g" -e "cluster.name=es-test" \
  -e "discovery.type=single-node" -e "bootstrap.memory_lock=true" \
  --ulimit memlock=-1:-1 \
  elasticsearch:test

The physical host has 4 GB of RAM.

spinscale · April 15, 2021, 8:02am

Can you share the cluster state/nodes stats/nodes info output somewhere in a gist? The shard-level request cache can be deactivated dynamically via the update index settings API, but is enabled by default. The setting is index.requests.cache.enable.
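As a rough aid for checking that setting, the Python sketch below inspects a parsed response from GET test/_settings?include_defaults=true. The function name is made up, and the nested response shape used here (an index name key containing "settings" and "defaults" sections with string values) is an assumption that should be checked against real output from the cluster in question.

```python
def request_cache_enabled(settings_response: dict, index: str) -> bool:
    """Return the effective value of index.requests.cache.enable.

    Looks at explicit settings first, then at defaults. Falls back to
    True when the key is absent, since the cache is enabled by default.
    """
    entry = settings_response.get(index, {})
    for section in ("settings", "defaults"):
        value = (
            entry.get(section, {})
            .get("index", {})
            .get("requests", {})
            .get("cache", {})
            .get("enable")
        )
        if value is not None:
            return str(value).lower() == "true"
    return True


# Hypothetical responses, written by hand for illustration.
explicitly_disabled = {
    "test": {
        "settings": {"index": {"requests": {"cache": {"enable": "false"}}}},
        "defaults": {"index": {"requests": {"cache": {"enable": "true"}}}},
    }
}
default_only = {
    "test": {
        "settings": {"index": {"number_of_shards": "1"}},
        "defaults": {"index": {"requests": {"cache": {"enable": "true"}}}},
    }
}

print(request_cache_enabled(explicitly_disabled, "test"))
print(request_cache_enabled(default_only, "test"))
```

If the explicit section reports the cache as disabled, that alone would explain counters that never move.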

tlinhart · April 15, 2021, 9:30am

Here is the requested output - https://gist.github.com/tlinhart/857a46ff46fd6d0afeb43fa04dd62920. If there is anything else I could provide, please let me know.