This post refers to my GitHub issue: https://github.com/elastic/elasticsearch/issues/41231
Elasticsearch Version: 7.0.0
OpenJDK Runtime Environment (build 1.8.0_201-b09)
Centos 7 Linux 3.10.0-957.5.1.el7.x86_64
After upgrading from Elasticsearch 6.7 to 7.0, roughly once a minute queries take 500 - 1000 ms to deliver results instead of 9 - 12 ms. This mostly lasts for 3 - 5 seconds, then everything works as usual again.
It's a very small dataset of 40K records that fits completely into memory.
Worth mentioning: every minute Logstash imports updated documents into Elasticsearch, currently around 100 - 200 updated documents per run. There is nothing unusual in the Logstash log, and Logstash was also upgraded to 7.0.0.
The two nodes we run don't have any significant load (<5%).
Is your cluster fully upgraded to 7.0.0, or are there still nodes running 6.7? Both nodes were upgraded to 7.0.0.
How much heap have you configured for your cluster? 4 GB (we have 40,000 documents here, around 30 MB of data).
Could you check the logs for anything unusual when the query becomes slow? There is nothing in the logs regarding this, no warnings, no errors.
Could you check if there is GC activity during this period that leads to the slow response? According to the log there is GC activity all the time:
...
[2019-04-16T07:47:52.661+0000][29643][safepoint ] Application time: 1.0002825 seconds
[2019-04-16T07:47:52.661+0000][29643][safepoint ] Entering safepoint region: Cleanup
[2019-04-16T07:47:52.662+0000][29643][safepoint ] Leaving safepoint region
[2019-04-16T07:47:52.662+0000][29643][safepoint ] Total time for which application threads were stopped: 0.0003800 seconds, Stopping threads took: 0.0000731 seconds
[2019-04-16T07:47:53.662+0000][29643][safepoint ] Application time: 1.0002561 seconds
[2019-04-16T07:47:53.662+0000][29643][safepoint ] Entering safepoint region: Cleanup
[2019-04-16T07:47:53.662+0000][29643][safepoint ] Leaving safepoint region
[2019-04-16T07:47:53.662+0000][29643][safepoint ] Total time for which application threads were stopped: 0.0004199 seconds, Stopping threads took: 0.0000839 seconds
[2019-04-16T07:47:54.663+0000][29643][safepoint ] Application time: 1.0002671 seconds
[2019-04-16T07:47:54.663+0000][29643][safepoint ] Entering safepoint region: Cleanup
[2019-04-16T07:47:54.663+0000][29643][safepoint ] Leaving safepoint region
[2019-04-16T07:47:54.663+0000][29643][safepoint ] Total time for which application threads were stopped: 0.0003651 seconds, Stopping threads took: 0.0000822 seconds
[2019-04-16T07:47:55.664+0000][29643][safepoint ] Application time: 1.0002670 seconds
[2019-04-16T07:47:55.664+0000][29643][safepoint ] Entering safepoint region: Cleanup
[2019-04-16T07:47:55.664+0000][29643][safepoint ] Leaving safepoint region
[2019-04-16T07:47:55.664+0000][29643][safepoint ] Total time for which application threads were stopped: 0.0003404 seconds, Stopping threads took: 0.0000649 seconds
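The safepoint entries above only show sub-millisecond Cleanup pauses. As a cross-check (a diagnostic suggestion, not something already tried above), the per-collector pause totals can also be read from the nodes stats API:

GET _nodes/stats/jvm

The response's jvm.gc.collectors section reports collection_count and collection_time_in_millis per collector and node; comparing two snapshots taken around a slow minute would show whether a long collection actually coincides with the latency spike.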
PUT dating
{
  "mappings": {
    "profile": {
      "properties": {
        "appid": {
          "type": "text"
        },
        "first_name": {
          "type": "text"
        },
        "country": {
          "type": "text"
        },
        "geo_pt": {
          "type": "geo_point"
        },
        "last_update_dt": {
          "type": "date"
        },
        "member_id": {
          "type": "integer"
        },
        "mstatus": {
          "type": "byte"
        },
        "is_hidden": {
          "type": "byte"
        },
        ...
{
  "size": 100,
  "from": 0,
  "script_fields": {
    "geo_distance": {
      "script": {
        "params": {
          "lat": xxx,
          "lon": xxx
        },
        "source": "doc['geo_pt'].arcDistance(params.lat, params.lon)"
      }
    }
  },
  "_source": [
    "first_name",
    "last_request_dt",
    "geo_distance"
  ],
  "sort": [
    { "member_id": { "order": "desc" } }
  ],
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "mstatus": 1
          }
        },
        {
          "terms": {
            "appid": [ xxx, xxx ]
          }
        },
        {
          "term": {
            "country": "xxx"
          }
        }
      ],
      "must_not": [
        {
          "term": {
            "is_hidden": 1
          }
        },
        {
          "term": {
            "member_id": xxx
          }
        }
      ]
    }
  }
}
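To narrow down where the time goes during a slow window, the same search could be rerun with profiling enabled (a suggestion on top of the report, assuming the dating index from the mapping above). Adding "profile": true at the top level of the request body makes Elasticsearch return a per-shard breakdown of query, collector, and fetch timings in the response:

GET dating/_search
{
  "profile": true,
  "query": {
    ...
  }
}

Comparing the profile output of a fast run against a slow run should show whether the extra time is spent in the query phase itself or elsewhere.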
It seems it's not related to the off-heap terms index, as the data is not larger than the available memory.
Is there anything I can try before I downgrade back to 6.7?