Elasticsearch simple query string and highlighter leads to timeout

Hi!

I am facing a strange problem with Elasticsearch:

I have this query, which times out because it takes more than 60 seconds:

GET /website/_search
{
  "query": {
    "simple_query_string": {
      "query": "mbs regex replace all",
      "fields": ["content_primary"],
      "default_operator": "and"
    }
  },
  "highlight": {
    "fields": {
      "content_primary": {}
    },
    "fragment_size": 200
  }
}

Any one of these three changes makes the query finish quickly:

  • Remove the highlight
  • Change the default_operator to "OR"
  • Remove the word "all" from the query term

Without the highlighter, the query finishes in 9 milliseconds.
Searching for "mbs regex replace" (without the word "all", with highlighting enabled and operator AND) finishes in 166 milliseconds.
Changing the operator to OR (with highlighting enabled) finishes in 11 milliseconds.

But my application needs all three exactly as specified: the query "mbs regex replace all" comes from the user, the frontend needs the highlighter, and the operator has to be AND so the search behaves the way users expect.

What is wrong here?

The index has 96743 documents.
Elasticsearch v7.11.2 hosted on cloud.elastic.co

Edit: these are the index settings and mapping:


PUT /website
{
  "settings": {
    "analysis": {
      "filter" : {
        "custom_synonym" :  {
          "type" : "synonym",
          "lenient": true,
          "synonyms": [ "FM => FileMaker", "DDR => Datenbank Design Report", "DB => Datenbank", "TO, Table Occurence => Tabellenauftreten" ]
        },
        "german_stemming": {
          "type": "stemmer",
          "language": "german"
        },
        "german_stopping": {
          "type": "stop",
          "stopwords": "_german_"
        }
      },
      "analyzer": {
        "custom_german_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "asciifolding", "custom_synonym"]
        }
      }
    }
  },  
    "mappings" : {
      "properties" : {
        "authors" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "classifications" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "content_primary" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            },
            "german" : {
              "type": "text",
              "analyzer": "custom_german_analyzer"
            }
          }
        },
        "date" : {
          "type" : "date"
        },
        "description" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "domain" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "id" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "keywords" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "language" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "properties" : {
          "properties" : {
            "Plugin Komponente" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "osversion" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "softwareversion" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            }
          }
        },
        "source" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "title" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            },
            "german" : {
              "type": "text",
              "analyzer": "custom_german_analyzer"
            }
          }
        },
        "url" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "urlToImagePreview" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    }
}

PS: I also asked this on Stack Overflow, but nobody answered: Elasticsearch simple query string and highlighter leads to timeout - Stack Overflow

Thank you!

How large is the content_primary field on average? Have you tried using a different highlighter or tuning the settings?

Thank you for your questions, which pointed me in the right direction.
The average document has about 400 words. But my query "mbs regex replace all" with the AND operator returned 5 documents that were each about 200 KB in size, and the default unified highlighter took about 12 seconds for each of them. The plain highlighter takes 178 milliseconds for the whole result.
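The thread does not show the final request, but switching highlighters is a per-field setting. A minimal sketch, assuming the same `website` index and `content_primary` field as above, sent to `GET /website/_search`: the query is unchanged and only `"type": "plain"` is added to the highlighted field.

```json
{
  "query": {
    "simple_query_string": {
      "query": "mbs regex replace all",
      "fields": ["content_primary"],
      "default_operator": "and"
    }
  },
  "highlight": {
    "fields": {
      "content_primary": { "type": "plain" }
    },
    "fragment_size": 200
  }
}
```

The plain highlighter fragments and scores text differently from the unified one, so the snippets shown in the UI may change; that is the trade-off still to be checked.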

So first I will get rid of these very long documents, and I think I will also switch to the plain highlighter, but I have to check what that means for my UI.
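Another tuning option, not tried in this thread, is to store offsets in the index. When a field is mapped with `"index_options": "offsets"`, the unified highlighter reads term offsets from the postings instead of re-analyzing the full text at query time, which is the expensive part for 200 KB documents. Whether that alone brings these documents down to an acceptable time would need measuring. `index_options` cannot be changed on an existing field, so this sketch assumes a new index (the name `website_v2` is hypothetical) followed by a reindex. It shows only the changed field; the `settings` block (which defines `custom_german_analyzer`) and all other fields stay exactly as in the original mapping.

```json
{
  "mappings": {
    "properties": {
      "content_primary": {
        "type": "text",
        "index_options": "offsets",
        "fields": {
          "keyword": { "type": "keyword", "ignore_above": 256 },
          "german": { "type": "text", "analyzer": "custom_german_analyzer" }
        }
      }
    }
  }
}
```

Storing offsets makes the index somewhat larger, and it leaves the default unified highlighter in place, so the frontend would keep its current snippet style.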

Thank you for asking the right questions!
