Add tags or booleans to speed up queries?

GabrielM · April 22, 2021, 10:14am

Hello,
I want to query documents based on whether the length of a field is more than 1k, 10k, etc.

I'm trying to follow an Elasticsearch best practice and avoid queries like "if field_length > 1000" (Tune for search speed | Elasticsearch Guide [7.12] | Elastic)

I see 2 solutions to do this:

Add an array of string tags, for example:

{
  "my_long_field": "xxxxxxx",
  "tags": ["is_more_than_1k", "is_more_than_10k"]
}

And query:

{
  "query": {
    "match": {
      "tags": "is_more_than_10k"
    }
  }
}

Add booleans:

{
  "my_long_field": "xxxxxxx",
  "is_more_than_1k": true,
  "is_more_than_10k": true,
  "is_more_than_100k": false
}

And query:

{
  "query": {
    "match": {
      "is_more_than_10k": true
    }
  }
}
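
Either way, the flags would be computed once at index time rather than at query time. Here is a minimal sketch in Python of how both variants could be derived from the field. It assumes "1k" means 1,000 characters (and so on), and the helper names `length_tags`, `length_flags` and `build_document` are made up for illustration:

```python
# Assumed thresholds; the keys mirror the field/tag names used in this post.
THRESHOLDS = {
    "is_more_than_1k": 1_000,
    "is_more_than_10k": 10_000,
    "is_more_than_100k": 100_000,
}


def length_tags(text: str) -> list:
    """Solution 1: list only the tags whose threshold is exceeded."""
    return [name for name, limit in THRESHOLDS.items() if len(text) > limit]


def length_flags(text: str) -> dict:
    """Solution 2: one boolean per threshold."""
    return {name: len(text) > limit for name, limit in THRESHOLDS.items()}


def build_document(text: str) -> dict:
    """Build a document body carrying both variants (keep only the one you use)."""
    doc = {"my_long_field": text, "tags": length_tags(text)}
    doc.update(length_flags(text))
    return doc
```

For a 10,001-character value, `build_document` yields the two tags shown in the first example and the three booleans shown in the second.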

Do you know which solution is the fastest, please?

Thank you in advance, have a nice day!

dadoonet · April 22, 2021, 10:29am

Another solution would be to actually store the exact size of the field in the document and then use a range query.
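
As a rough sketch of that idea in Python: the field name `my_long_field_length` and both helper names are hypothetical, and the length is assumed to be counted in characters and indexed as an integer field.

```python
def with_length(text: str) -> dict:
    """Document body that stores the exact length next to the field itself."""
    return {
        "my_long_field": text,
        # Hypothetical field name; assumed to be mapped as an integer.
        "my_long_field_length": len(text),
    }


def longer_than(min_length: int) -> dict:
    """Range query body matching documents whose stored length exceeds min_length."""
    return {
        "query": {
            "range": {
                "my_long_field_length": {"gt": min_length}
            }
        }
    }
```

With this layout any threshold can be queried later, e.g. `longer_than(10_000)`, without reindexing to add a new tag.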

GabrielM · April 22, 2021, 10:59am

Hello, and thank you for your response. The official documentation advises adding tags instead of doing a range query: Tune for search speed | Elasticsearch Guide [7.12] | Elastic

dadoonet · April 22, 2021, 12:30pm

Agreed. I'm "just" saying that using ranges is more flexible.
Also, depending on the data size, you might not notice a big difference between the two approaches, especially when the range queries always use the same values. I guess they will be cached at some point.
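
On the caching point: clauses placed under `bool.filter` run in filter context, so they are not scored and are candidates for Elasticsearch's filter caching. A minimal Python sketch of such request bodies follows. It assumes `tags` is mapped as a keyword field, and `my_long_field_length` and the helper names are hypothetical:

```python
def tag_filter(tag: str) -> dict:
    """Exact-match tag lookup in filter context (no scoring)."""
    return {
        "query": {
            "bool": {
                # Assumes `tags` is a keyword field, so `term` matches exactly.
                "filter": [{"term": {"tags": tag}}]
            }
        }
    }


def length_filter(min_length: int) -> dict:
    """The same idea with a range on a hypothetical stored-length field."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"range": {"my_long_field_length": {"gt": min_length}}}
                ]
            }
        }
    }
```

Whether one is measurably faster than the other on a given dataset is something only a benchmark on real data can tell.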

GabrielM · April 22, 2021, 12:37pm

I agree, it's more flexible, but we are talking about a 50+ TB dataset here, and this small optimization could save my life.
