We are upgrading our Elasticsearch cluster from 1.7.4 to 5.4. We made mapping and code modifications so that everything works functionally, but performance is much worse than on 1.7.4 (exact same cluster size, data size, and node types). With no load on the cluster, a query takes 10x as long as it does on 1.7.4. The worst performance seems to be in queries that pull back a lot of records. Here are some things we've looked at while trying to improve performance:
- We saw a fuzzy match query on a few fields that returned 5000 records go from 2-3 seconds on 1.7.4 to 30+ seconds on 5.4, so we thought the new BM25 similarity algorithm might be slowing things down. We changed the similarity algorithm to classic, with no impact.
- We double-checked memory usage: we have 16GB allocated to each node across 16 nodes, the same as before. We compared memory usage during activity between 1.7.4 and 5.4, and it's very similar.
- We tried queries on numeric fields, since previously most of our fields were mapped as string, and with 5.4 we auto-mapped them so they became text + keyword. Numeric queries were also slower, but not quite as extreme.
- We also double-checked that index files are memory mapped. We merged segments down to 5 per shard. We made sure mlockall was enabled. We confirmed no swapping is occurring at the OS level.
(For context, we have between half a billion and a billion records. The records are pretty complex, with hundreds of attributes in each record.)
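For anyone trying to reproduce the checks listed above, here is a rough sketch of the kind of REST requests they correspond to on 5.x. It only builds the method, path, and body for each request and does not talk to a cluster. The index name `records`, the field names, and the search text are hypothetical placeholders, not our real mapping, and the helper function names are made up for illustration.

```python
import json

# Hypothetical names used only for illustration; not from our real cluster.
INDEX = "records"
FIELDS = ["first_name", "last_name"]


def classic_similarity_settings():
    """Index-creation body that sets the default similarity to classic (TF/IDF)."""
    return {
        "settings": {
            "index": {"similarity": {"default": {"type": "classic"}}}
        }
    }


def fuzzy_search_request(index, fields, text, size=5000):
    """A fuzzy multi-field match that asks for a large page of hits."""
    body = {
        "size": size,
        "query": {
            "multi_match": {"query": text, "fields": fields, "fuzziness": "AUTO"}
        },
    }
    return ("POST", f"/{index}/_search", body)


def force_merge_request(index, max_segments=5):
    """Force merge each shard of the index down to at most max_segments segments."""
    return ("POST", f"/{index}/_forcemerge?max_num_segments={max_segments}", None)


def mlockall_check_request():
    """Ask every node whether memory locking actually took effect."""
    return ("GET", "/_nodes?filter_path=**.mlockall", None)


if __name__ == "__main__":
    requests_to_send = [
        ("PUT", f"/{INDEX}", classic_similarity_settings()),
        fuzzy_search_request(INDEX, FIELDS, "jonson"),
        force_merge_request(INDEX),
        mlockall_check_request(),
    ]
    for method, path, body in requests_to_send:
        print(method, path)
        if body is not None:
            print(json.dumps(body, indent=2))
```

The tuples can be sent with any HTTP client. Note that in this sketch the similarity setting is applied when the index is created, which is an assumption about how one might set it rather than a statement of exactly what we ran.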
Has anybody noticed performance differences between older versions of ES and ES 5.x (with 5.x being slower)? Any recommendations on what to look at? Just ask if there are clarifications that would help with an answer.
Any thoughts would be greatly appreciated.
Thanks!