Hello everyone,
I'm having a performance issue when I use highlighting in a query.
Currently, we are using Elasticsearch for full-text search on a single index holding around 10 GB of documents. The index runs on a single node with the following specifications:
- 30 GB RAM, with a 4 GB heap allocated to Elasticsearch and a 2 GB heap for Logstash JDBC
- 30 GB SSD, holding the single index of around 10 GB
- CPU with two cores
The mapping of this index is as follows:
{
  "mappings": {
    "properties": {
      "id": {
        "type": "keyword"
      },
      "translations": {
        "properties": {
          "en": {
            "properties": {
              "content": {
                "type": "keyword",
                "ignore_above": 20,
                "fields": {
                  "default": {
                    "type": "text",
                    "term_vector": "with_positions_offsets",
                    "analyzer": "indonesian"
                  },
                  "exact": {
                    "type": "text",
                    "term_vector": "with_positions_offsets",
                    "analyzer": "indonesian_exact"
                  }
                }
              },
              "title": {
                "type": "keyword",
                "ignore_above": 20,
                "fields": {
                  "default": {
                    "type": "text",
                    "term_vector": "with_positions_offsets",
                    "analyzer": "indonesian"
                  },
                  "exact": {
                    "type": "text",
                    "term_vector": "with_positions_offsets",
                    "analyzer": "indonesian_exact"
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
When I run a simple query string search, it takes only ~200 ms:
{
  "query": {
    "simple_query_string": {
      "query": "force majeur",
      "fields": [
        "translations.*.title.default",
        "translations.*.content.default"
      ]
    }
  }
}
But when I add a highlight section to this query, it takes around 59,000 ms (about 1 minute), and the CPU load chart in Kibana suddenly jumps to nearly 100%:
{
  "query": {
    "simple_query_string": {
      "query": "foo bar",
      "fields": [
        "translations.*.title.default",
        "translations.*.content.default"
      ]
    }
  },
  "highlight": {
    "type": "fvh",
    "order": "score",
    "pre_tags": [
      "<mark>"
    ],
    "post_tags": [
      "</mark>"
    ],
    "fields": {
      "translations.*.content.default": {
        "fragment_size": 500,
        "number_of_fragments": 3
      }
    }
  }
}
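To make the comparison easier to reproduce, here is a rough Python sketch of how the slow request differs from the fast one. The helper name `add_highlight` and the constant `BASE_QUERY` are made up for illustration and are not part of any Elasticsearch client. The script only builds and prints the request body; it does not send it anywhere.

```python
import copy
import json

# Same shape as the fast query; only the search text varies between runs.
BASE_QUERY = {
    "query": {
        "simple_query_string": {
            "query": "foo bar",
            "fields": [
                "translations.*.title.default",
                "translations.*.content.default",
            ],
        }
    }
}


def add_highlight(body, field, fragment_size=500, number_of_fragments=3):
    """Return a copy of `body` with an fvh highlight section for `field`.

    Hypothetical helper for illustration only.
    """
    highlighted = copy.deepcopy(body)  # leave the original request untouched
    highlighted["highlight"] = {
        "type": "fvh",
        "order": "score",
        "pre_tags": ["<mark>"],
        "post_tags": ["</mark>"],
        "fields": {
            field: {
                "fragment_size": fragment_size,
                "number_of_fragments": number_of_fragments,
            }
        },
    }
    return highlighted


if __name__ == "__main__":
    slow = add_highlight(BASE_QUERY, "translations.*.content.default")
    print(json.dumps(slow, indent=2))
```

Sending both bodies to the same index and comparing the `took` value in each response should show the difference I described.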
I have read some articles about this, but I still have some questions:
- If I add more nodes to this cluster (for example, 2 more nodes, so that there is 1 primary shard (10 GB) and 2 replica shards), would this 3-node cluster reduce the query time?
- Would it help to index the documents with the term vector option "with_positions_offsets_payloads" instead of the current "with_positions_offsets"?
- Would it help to upgrade the CPU from 2 cores to 4 cores?
The highlight feature is really important for our use case. If you need more information, I'm happy to share it.
Do you have any other suggestions for improving the highlight query time?
Thank you very much.