High CPU load when highlighting

Hello everyone,
I'm having an issue when I use highlighting in a query.
Currently, we are using Elasticsearch for full-text search on a single index of around 10 GB of documents. The index runs on a single node with the following specs:

  • 30 GB RAM, with a 4 GB heap allocated to Elasticsearch and a 2 GB heap for Logstash (JDBC)
  • 30 GB SSD, holding the one ~10 GB index
  • 2-core CPU

The mapping of this index is below:

{
  "mappings": {
    "properties": {
      "id": {
        "type": "keyword"
      },
      "translations": {
        "properties": {
          "en": {
            "properties": {
              "content": {
                "type": "keyword",
                "ignore_above": 20,
                "fields": {
                  "default": {
                    "type": "text",
                    "term_vector": "with_positions_offsets",
                    "analyzer": "indonesian"
                  },
                  "exact": {
                    "type": "text",
                    "term_vector": "with_positions_offsets",
                    "analyzer": "indonesian_exact"
                  }
                }
              },
              "title": {
                "type": "keyword",
                "ignore_above": 20,
                "fields": {
                  "default": {
                    "type": "text",
                    "term_vector": "with_positions_offsets",
                    "analyzer": "indonesian"
                  },
                  "exact": {
                    "type": "text",
                    "term_vector": "with_positions_offsets",
                    "analyzer": "indonesian_exact"
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
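The mapping above repeats one multi-field definition (a keyword field with "default" and "exact" text sub-fields) for both title and content. A minimal Python sketch that generates the same body, assuming more languages could later be added under translations; the helper names are hypothetical, and "indonesian_exact" is assumed to be a custom analyzer defined in the index settings, which are not shown here.

```python
import json


def text_subfield(analyzer):
    """One text sub-field with term vectors, as used for 'default' and 'exact'."""
    return {
        "type": "text",
        # positions + offsets are what the fvh highlighter needs
        "term_vector": "with_positions_offsets",
        "analyzer": analyzer,
    }


def multi_field(ignore_above=20):
    """Keyword field with 'default' and 'exact' text sub-fields (title/content)."""
    return {
        "type": "keyword",
        "ignore_above": ignore_above,
        "fields": {
            "default": text_subfield("indonesian"),
            # assumption: custom analyzer defined in the index settings
            "exact": text_subfield("indonesian_exact"),
        },
    }


def build_mapping(languages=("en",)):
    """Return the mapping body with one translations.<lang> object per language."""
    return {
        "mappings": {
            "properties": {
                "id": {"type": "keyword"},
                "translations": {
                    "properties": {
                        lang: {
                            "properties": {
                                "content": multi_field(),
                                "title": multi_field(),
                            }
                        }
                        for lang in languages
                    }
                },
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_mapping(), indent=2))
```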

When I run a simple_query_string search, it takes only ~200 ms:

{
  "query": {
    "simple_query_string": {
      "query": "force majeur",
      "fields": [
        "translations.*.title.default",
        "translations.*.content.default"
      ]
    }
  }
}

But when I add a single highlight to the same query, it takes around 59,000 ms (~1 min), and the CPU load chart in Kibana jumps to nearly 100%:

{
  "query": {
    "simple_query_string": {
      "query": "force majeur",
      "fields": [
        "translations.*.title.default",
        "translations.*.content.default"
      ]
    }
  },
  "highlight": {
    "type": "fvh",
    "order": "score",
    "pre_tags": [
      "<mark>"
    ],
    "post_tags": [
      "</mark>"
    ],
    "fields": {
      "translations.*.content.default": {
        "fragment_size": 500,
        "number_of_fragments": 3
      }
    }
  }
}
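To keep the highlighted field in sync with the queried fields (and avoid typos in the field pattern), the request above can be generated from shared constants. A minimal Python sketch using only the standard library; build_search_body and the constant names are hypothetical. The `highlighter` parameter only makes it easy to time the same request with another highlighter type ("unified", "plain" or "fvh"); it is not a confirmed fix.

```python
import json

# Field patterns used by the query; reusing them in the highlight
# section keeps both parts of the request pointing at the same fields.
TITLE_FIELD = "translations.*.title.default"
CONTENT_FIELD = "translations.*.content.default"

HIGHLIGHTER_TYPES = ("unified", "plain", "fvh")


def build_search_body(text, highlighter="fvh", fragment_size=500, number_of_fragments=3):
    """Return a simple_query_string search with a highlight on the content field."""
    if highlighter not in HIGHLIGHTER_TYPES:
        raise ValueError(f"unknown highlighter type: {highlighter}")
    return {
        "query": {
            "simple_query_string": {
                "query": text,
                "fields": [TITLE_FIELD, CONTENT_FIELD],
            }
        },
        "highlight": {
            "type": highlighter,
            "order": "score",
            "pre_tags": ["<mark>"],
            "post_tags": ["</mark>"],
            "fields": {
                CONTENT_FIELD: {
                    "fragment_size": fragment_size,
                    "number_of_fragments": number_of_fragments,
                }
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_search_body("force majeur"), indent=2))
```

Sending the output of `build_search_body("force majeur")` and `build_search_body("force majeur", highlighter="unified")` and comparing the `took` values of the two responses would show how much of the time the highlighter type accounts for.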

I have read some articles about this, but I still have some questions:

  • If I add more nodes to this cluster (for example, 2 more nodes, with 1 primary shard (10 GB) and 2 replica shards), would the resulting 3-node cluster reduce the query time?
  • Would it help to index documents with the term vector option "with_positions_offsets_payloads" instead of the current "with_positions_offsets"?
  • Would it help to upgrade the CPU from 2 cores to 4 cores?
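For the first two experiments, the request bodies would look roughly like this. A minimal Python sketch; the helper names are hypothetical. `number_of_replicas` is a dynamic index setting sent via `PUT /<index>/_settings`, while a different `term_vector` value is assumed to require creating a new index with the changed mapping and reindexing into it, since term vectors are written at index time.

```python
import json

TERM_VECTOR_OPTIONS = ("with_positions_offsets", "with_positions_offsets_payloads")


def replica_settings(number_of_replicas):
    """Body for PUT /<index>/_settings to change the replica count."""
    if number_of_replicas < 0:
        raise ValueError("number_of_replicas must be >= 0")
    return {"index": {"number_of_replicas": number_of_replicas}}


def text_subfield(analyzer, term_vector="with_positions_offsets"):
    """A 'default'/'exact' sub-field with a selectable term_vector option."""
    if term_vector not in TERM_VECTOR_OPTIONS:
        raise ValueError(f"unexpected term_vector: {term_vector}")
    return {"type": "text", "term_vector": term_vector, "analyzer": analyzer}


if __name__ == "__main__":
    # Question 1: 1 primary shard + 2 replicas spread over 3 nodes
    print(json.dumps(replica_settings(2)))
    # Question 2: the sub-field definition for a new index
    print(json.dumps(text_subfield("indonesian", "with_positions_offsets_payloads")))
```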

The highlight feature is really important in our use case. If you need more information, I'm happy to share it.
Do you have any other suggestions for improving highlight query time?
Thanks a lot.
