Search query is slow and first query always takes too much time

Code_Wizard · October 29, 2020, 7:23am

Hi, search queries are slow when I use should clauses with multiple search terms, and also when matching nested documents. The first query takes 7-10 sec and later ones take 5-6 sec thanks to the Elasticsearch cache, but queries on non-nested objects with just a match run fast, i.e. within 100 ms.

I'm running Elasticsearch on an AWS instance with 250 GB RAM and 500 GB disk space. I have one template and 204 indices with a total of around 107 million documents indexed, with 2 shards per index on a single node, and I have set the heap size to 30 GB.

A document can have more than 50k nested objects, so I have increased the nested objects limit to 500k. Searching on these nested objects takes too much time, and any OR (should match) operation on fields other than the nested ones is also slow. Is there any way I can boost query performance for nested objects, or is there anything wrong in my configuration?
And is there any way I can make the first query faster as well?

The following is my sample mapping.

      {
      "index_patterns": [
        "product_*"
      ],
      "template": {
        "settings": {
          "index.store.type": "mmapfs",
          "number_of_shards":2,
          "number_of_replicas": 0,
          "index": {
            "store.preload": [
              "*"
            ],
            "mapping.nested_objects.limit": 500000,
            "analysis": {
              "analyzer": {
                "cust_product_name": {
                  "type": "custom",
                  "tokenizer": "standard",
                  "filter": [
                    "lowercase",
                    "english_stop",
                    "name_wordforms",
                    "business_wordforms",
                    "english_stemmer",
                    "min_value"
                  ],
                  "char_filter": [
                    "html_strip"
                  ]
                },
                "entity_name": {
                  "type": "custom",
                  "tokenizer": "standard",
                  "filter": [
                    "lowercase",
                    "english_stop",
                    "business_wordforms",
                    "name_wordforms",
                    "english_stemmer"
                  ],
                  "char_filter": [
                    "html_strip"
                  ]
                },
                "cust_text": {
                  "type": "custom",
                  "tokenizer": "standard",
                  "filter": [
                    "lowercase",
                    "english_stop",
                    "name_wordforms",
                    "english_stemmer",
                    "min_value"
                  ],
                  "char_filter": [
                    "html_strip"
                  ]
                }
              },
              "filter": {
                "min_value": {
                  "type": "length",
                  "min": 2
                },
                "english_stop": {
                  "type": "stop",
                  "stopwords": "_english_"
                },
                "business_wordforms": {
                  "type": "synonym",
                  "synonyms_path": "<some path>/business_wordforms.txt"
                },
                "name_wordforms": {
                  "type": "synonym",
                  "synonyms_path": "<some path>/name_wordforms.txt"
                },
                "english_stemmer": {
                  "type": "stemmer",
                  "language": "english"
                }
              }
            }
          }
        },
        "mappings": {
          "dynamic": "strict",
          "properties": {
            "product_number": {
              "type": "text",
              "analyzer": "keyword"
            },
            "product_name": {
              "type": "text",
              "analyzer": "cust_product_name"
            },
            "first_fetch_date": {
              "type": "date",
              "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||yyyy-MM||yyyy"
            },
            "last_fetch_date": {
              "type": "date",
              "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||yyyy-MM||yyyy"
            },
            "review": {
              "type": "nested",
              "properties": {
                "text": {
                  "type": "text",
                  "analyzer": "cust_text"
                },
                "review_date": {
                  "type": "date",
                  "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||yyyy-MM||yyyy"
                }
              }
            }
          }
        },
        "aliases": {
          "all_products": {}
        }
      },
      "priority": 200,
      "version": 1
    }

If I search for any specific term in the review text, the response takes too much time.

{
    "_source":{
        "excludes":["review"]
    },
    "size":1,
    "track_total_hits":true,
    "query":{
        "nested":{
            "path":"review",
            "query":{
                "match":{
                    "review.text":{
                        "query":"good",
                        "zero_terms_query":"none"
                    }
                }
            }
        }
    },
    "highlight":{
        "pre_tags":[
            "<b>"
        ],
        "post_tags":[
            "</b>"
        ],
        "fields":{
            "product_name":{
                
            }
        }
    }
}

I'm sure I'm missing something obvious!

Mark_Harwood · October 29, 2020, 12:28pm

Perhaps worth revisiting your rationale for using nested docs:

The example query is not one that warrants the use of nested docs.

Code_Wizard · October 29, 2020, 12:59pm

Thanks for the reply. Nested objects fit our data, but you say the example query is not one that warrants the use of nested docs. May I know why?

Mark_Harwood · October 29, 2020, 2:01pm

Nested docs and queries are only ever needed when you query two or more fields of the same object, e.g. text:good AND author:john. Without nested docs we have a problem called "cross matching". This problem is explained in these slides that first proposed adding support to Lucene.

If you only query one field at a time then the type object can be used instead of nested, saving resources and query complexity.
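To make cross matching concrete, here is a small plain-Python sketch, not Elasticsearch code. It assumes a hypothetical product with two reviews and mimics how an array of objects is pooled per field under the object type, versus matched per review under nested. The `author` field and the function names are made up for illustration.

```python
# Illustrative only: hypothetical reviews attached to one product document.
reviews = [
    {"text": "good", "author": "mary"},
    {"text": "bad", "author": "john"},
]


def matches_flattened(reviews, text, author):
    """Object-style matching: values of each field are pooled across all reviews."""
    texts = {r["text"] for r in reviews}
    authors = {r["author"] for r in reviews}
    return text in texts and author in authors


def matches_nested(reviews, text, author):
    """Nested-style matching: both conditions must hold within the same review."""
    return any(r["text"] == text and r["author"] == author for r in reviews)


# "good" was written by mary, not john, so only the pooled check matches.
flattened_hit = matches_flattened(reviews, "good", "john")
nested_hit = matches_nested(reviews, "good", "john")
```

With a single-field query such as text:good, both functions agree, which is why the object type is enough in that case.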

Code_Wizard · October 30, 2020, 6:52am

We do query two or more fields, for example review.text:good AND review.review_date:[now-2d TO now] AND product_name:abc. For this, the best approach seems to be a nested query. We also have some other fields that we keep as object, which are not shown in the sample data above.
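For illustration, that combined query could look roughly like the search body below, built here as a Python dict. This is a sketch, not our production query: the helper name `build_review_query` and its parameters are hypothetical, and the field names come from the sample mapping above.

```python
def build_review_query(product_name, review_text, since="now-2d"):
    """Return a search body: product_name match AND a nested review clause."""
    return {
        "query": {
            "bool": {
                "must": [
                    # Top-level field on the product document.
                    {"match": {"product_name": product_name}},
                    {
                        # Text and date must match within the same review.
                        "nested": {
                            "path": "review",
                            "query": {
                                "bool": {
                                    "must": [
                                        {"match": {"review.text": review_text}}
                                    ],
                                    "filter": [
                                        {
                                            "range": {
                                                "review.review_date": {
                                                    "gte": since,
                                                    "lte": "now",
                                                }
                                            }
                                        }
                                    ],
                                }
                            },
                        }
                    },
                ]
            }
        }
    }


body = build_review_query("abc", "good")
```

The date range sits in a filter clause because it does not need to contribute to scoring.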

Mark_Harwood · October 30, 2020, 7:24am

Might that be a case for keeping a “reviews” index separate from the products index?

Code_Wizard · October 30, 2020, 10:51am

I had thought about it, but it means redundant product data in the reviews: I can have more than 50k reviews for a product, and if I split the data into a review index and a product index, I need to store the product details in each review, otherwise I can't search on product details together with reviews. I don't feel that's a good approach. Hope you get my point!

Mark_Harwood · October 30, 2020, 11:58am

Your original question was about improving speed. Denormalisation improves query speed but is paid for with added disk space and cost of updates.
It comes down to physics. You just can't magic certain things to be faster without physically re-organising things.

That's a scary number of review objects in a single JSON document.

Code_Wizard · October 30, 2020, 12:25pm

I agree. When you say re-organising things, what would be the optimal way for my use case?

Mark_Harwood · October 30, 2020, 12:31pm

That's for you to determine. "Optimal" depends on the content of your queries/docs, any SLAs, disk costs, update latencies, rates of product changes etc.

One option is denormalizing where appropriate: copying small subsets of product data onto review documents in a "reviews" index.
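As a rough sketch of what that copying could look like, the plain-Python snippet below turns one product document with nested reviews into flat review documents. The helper name `denormalize_reviews`, the choice of copied fields, and the sample values are assumptions for illustration; which product fields are worth copying depends on your queries.

```python
def denormalize_reviews(product, copied_fields=("product_number", "product_name")):
    """Yield one flat review document per review, carrying the copied product fields."""
    shared = {field: product[field] for field in copied_fields}
    for review in product.get("review", []):
        yield {**shared, "text": review["text"], "review_date": review["review_date"]}


# Hypothetical product document shaped like the sample mapping.
product = {
    "product_number": "P-1",
    "product_name": "abc",
    "review": [
        {"text": "good", "review_date": "2020-10-28"},
        {"text": "bad", "review_date": "2020-10-01"},
    ],
}

review_docs = list(denormalize_reviews(product))
```

Each resulting document is small and flat, so a query on review text, review date and product name no longer needs a nested clause, at the cost of repeating the copied product fields in every review.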

It's hard to advise without knowing the costs of the various trade-offs required.

system · December 1, 2020, 9:24pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
