We are migrating Elasticsearch from 6.7 to 7.10.
The indices have been reindexed into the new Elasticsearch 7 cluster, and we ran some comparisons between the two versions.
simple load test is slow
A simple load test shows that the same set of queries takes 91 ms per request on 6 and 112 ms per request on 7, which is about 23% slower.
An example query looks like this: request.json · GitHub
profile shows slower parts
The profile details show that many queries take 100% longer in 7 than in 6.
{
"type": "PointInSetQuery",
"description": "brandIds:{59 (...omitted 100 brands) 31389}",
"time_6_baseline": "1.2531 ms",
"time_7_baseline": "4.4807 ms"
}
The breakdowns for the same query differ greatly.
The left one is from ES6 and the right one is from ES7.
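To compare the two profiles in bulk rather than one entry at a time, something like the following could help. It is only a sketch: it assumes the raw responses of a search sent with "profile": true, and the two tiny responses at the bottom are hypothetical stand-ins built from the PointInSetQuery timings quoted above. Profile times include the time of child queries, so the ratios are compared per query type and are not summed across types.

```python
from collections import defaultdict


def time_by_type(profile_response):
    """Sum time_in_nanos per query type over all shards, walking children too."""
    totals = defaultdict(int)

    def walk(node):
        totals[node["type"]] += node["time_in_nanos"]
        for child in node.get("children", []):
            walk(child)

    for shard in profile_response["profile"]["shards"]:
        for search in shard["searches"]:
            for query in search["query"]:
                walk(query)
    return dict(totals)


def slowdown(baseline, candidate):
    """Return candidate/baseline time ratio for every type present in both."""
    return {
        qtype: candidate[qtype] / baseline[qtype]
        for qtype in baseline
        if qtype in candidate and baseline[qtype] > 0
    }


# Hypothetical minimal responses; real ones contain many more fields.
es6 = {"profile": {"shards": [{"searches": [{"query": [
    {"type": "PointInSetQuery", "time_in_nanos": 1253100}]}]}]}}
es7 = {"profile": {"shards": [{"searches": [{"query": [
    {"type": "PointInSetQuery", "time_in_nanos": 4480700}]}]}]}}

ratios = slowdown(time_by_type(es6), time_by_type(es7))
for qtype, ratio in sorted(ratios.items()):
    print(f"{qtype}: {ratio:.2f}x")
```

Sorting the result by ratio over a full set of test queries should show whether the regression is concentrated in one query type or spread across all of them.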
page cache is used
We know that Elasticsearch 7 introduced off-heap changes and moved the terms index out of the heap, so we checked the page cache: sudo lsof +D /mnt/elasticsearch/data/nodes/0/indices/wu47nPk0TEuVBzLo3WEOsQ
The output looks like this:
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
java 30779 elasticsearch mem REG 259,0 108409303 15728663 /mnt/elasticsearch/data/nodes/0/indices/Dznr2xCpSS6VDJV0qLDkqQ/7/index/_5.cfs
java.........................................................................cfs
java.........................................................................doc
java.........................................................................dvd
java.........................................................................kdd
java.........................................................................kdi
java.........................................................................nvd
java.........................................................................tim
java.........................................................................tip(these files are loaded in mem)
java 30779 elasticsearch mem REG 259,0 71445 15729187 /mnt/elasticsearch/data/nodes/0/indices/Dznr2xCpSS6VDJV0qLDkqQ/4/index/_9_Lucene84_0.tip
java 30779 elasticsearch 373r REG 259,0 33969522 15728698 /mnt/elasticsearch/data/nodes/0/indices/Dznr2xCpSS6VDJV0qLDkqQ/1/index/_d_Lucene84_0.pos
java.........................................................................ckp
java.........................................................................fdt
java.........................................................................fdx
java.........................................................................lock
java.........................................................................pos
java.........................................................................tlog(these files are not loaded in mem)
java 30779 elasticsearch 731w REG 259,0 88 15728658 /mnt/elasticsearch/data/nodes/0/indices/Dznr2xCpSS6VDJV0qLDkqQ/7/translog/translog.ckp
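The grouping above was done by hand. A small script could produce it from the raw lsof output, as sketched below. It assumes the nine-column layout shown in the header line and paths without spaces. Note that lsof reports mem for files that are memory-mapped, which by itself does not tell how many of their pages are currently resident in the page cache.

```python
import os
from collections import defaultdict


def group_by_mapping(lsof_output):
    """Group file extensions by whether lsof lists them as mmapped or as plain fds."""
    groups = defaultdict(set)
    for line in lsof_output.splitlines():
        fields = line.split()
        # Skip the header and anything that is not a full nine-column row.
        if len(fields) < 9 or fields[0] == "COMMAND":
            continue
        fd, path = fields[3], fields[-1]
        extension = os.path.splitext(path)[1].lstrip(".")
        kind = "mmap" if fd == "mem" else "fd"
        groups[kind].add(extension)
    return {kind: sorted(exts) for kind, exts in groups.items()}


# The four full rows quoted in this post.
sample = """COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
java 30779 elasticsearch mem REG 259,0 108409303 15728663 /mnt/elasticsearch/data/nodes/0/indices/Dznr2xCpSS6VDJV0qLDkqQ/7/index/_5.cfs
java 30779 elasticsearch mem REG 259,0 71445 15729187 /mnt/elasticsearch/data/nodes/0/indices/Dznr2xCpSS6VDJV0qLDkqQ/4/index/_9_Lucene84_0.tip
java 30779 elasticsearch 373r REG 259,0 33969522 15728698 /mnt/elasticsearch/data/nodes/0/indices/Dznr2xCpSS6VDJV0qLDkqQ/1/index/_d_Lucene84_0.pos
java 30779 elasticsearch 731w REG 259,0 88 15728658 /mnt/elasticsearch/data/nodes/0/indices/Dznr2xCpSS6VDJV0qLDkqQ/7/translog/translog.ckp
"""

grouped = group_by_mapping(sample)
print(grouped)
```

Running the same script on the ES6 node and diffing the two results would show exactly which file types are mapped in 6 but not in 7.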
We tried different index store settings (changing the store type to mmapfs and preloading specific file types). With them, more files are loaded into memory, as in Elasticsearch 6, but overall performance is still not as good as on 6.
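For reference, the kind of settings body we mean can be built as in the sketch below. The setting names index.store.type and index.store.preload are the real Elasticsearch ones, but the function name and the preloaded extensions are illustrative assumptions, not the exact values from our tests. Both settings are static, so they have to be supplied when the index is created or while it is closed.

```python
import json


def store_settings(store_type="mmapfs", preload=("nvd", "dvd")):
    """Build an index-creation body that sets the store type and preloaded file types.

    The default extensions are placeholders; pick the ones relevant to your queries.
    """
    return {
        "settings": {
            "index.store.type": store_type,
            "index.store.preload": list(preload),
        }
    }


body = store_settings(preload=("nvd", "dvd", "tim", "tip"))
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of the create-index request for the test index.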
query optimization
Query optimization improves performance, but it does not close the gap between 6 and 7.
We tried replacing must with filter and saw some improvement, but 7 is still slower than 6.
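The must-to-filter change amounts to the rewrite sketched below. It is a simplified illustration, not our production code: it only touches the top-level bool query, and it is only appropriate when the moved clauses do not need to contribute to scoring, since filter clauses are not scored. The sample query is a hypothetical, shortened version of the brandIds clause.

```python
import copy


def must_to_filter(query):
    """Return a copy of a bool query with its must clauses moved into filter."""
    rewritten = copy.deepcopy(query)
    bool_body = rewritten.get("bool")
    if bool_body is None:
        return rewritten  # not a bool query, nothing to move

    must = bool_body.pop("must", [])
    existing = bool_body.get("filter", [])
    # Elasticsearch accepts a single clause as an object; normalise to lists.
    if isinstance(must, dict):
        must = [must]
    if isinstance(existing, dict):
        existing = [existing]

    if must or existing:
        bool_body["filter"] = existing + must
    return rewritten


original = {"bool": {"must": [{"terms": {"brandIds": [59, 31389]}}]}}
print(must_to_filter(original))
```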
index and cluster brief
The testing index and cluster are:
- 1 index with 2.6 million products, 9 shards and 1 replica. Size: 5.8 GB (11.6 GB including the replica).
- We tried two Elasticsearch heap sizes: 8 GB and 15 GB.
- The cluster has 2 nodes. Each node is an i3.xlarge (4 vCPUs, 30.5 GB RAM, SSD storage).
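A rough back-of-the-envelope check of these numbers is sketched below. It assumes shards are spread evenly over the two nodes and ignores memory used by the OS and by anything other than the JVM heap, so it is only an estimate of whether the index data can fit in the page cache.

```python
PRIMARY_GB = 5.8      # index size without replicas
SHARDS = 9
REPLICAS = 1
NODES = 2
NODE_RAM_GB = 30.5


def capacity(heap_gb):
    """Estimate per-node data size and the RAM left outside the JVM heap."""
    data_per_node = PRIMARY_GB * (1 + REPLICAS) / NODES
    outside_heap = NODE_RAM_GB - heap_gb
    return {
        "avg_shard_gb": PRIMARY_GB / SHARDS,
        "data_per_node_gb": data_per_node,
        "ram_outside_heap_gb": outside_heap,
        "fits_in_page_cache": data_per_node <= outside_heap,
    }


for heap in (8, 15):
    print(heap, capacity(heap))
```

Under these assumptions each node holds about 5.8 GB of index data, which is smaller than the memory left outside the heap for both heap sizes we tried.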
why these 2 versions
Because we use the amazon/opendistro-for-elasticsearch image, in versions 0.9.0 and 1.3.2. We plan to move to 8 without using that image.
question
So the question is what causes the performance issue?