Problem
Elasticsearch search latency degrades over time, from ~2 ms up to fluctuating 50-100 ms, until we eventually start getting circuit_breaking_exceptions.
Current Fix
When we restart our Node.js process (which runs in Kubernetes on AWS), latency drops back to ~2 ms, until the problem rears its ugly head again.
Stack
Node.js 14, latest elasticsearch-js client, latest Express, Alpine Linux, etc.
We've tried both the CPU Optimized and the High I/O Optimized hardware profiles.
Scale
We send around 100-150 requests per second to the ES cluster. Every query is unique, with a unique latitude + longitude. I can share the mappings and queries on request, but I don't think they're the cause, since a Node.js restart "fixes" the problem; if the queries themselves were non-performant, we ought to see slow responses consistently, all the time.
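For illustration, each search is roughly this shape (simplified; the location field name and the 5km radius are placeholders rather than our exact values):

// Roughly what every query looks like; only lat/lon change between requests
const query = {
  bool: {
    filter: {
      geo_distance: {
        distance: '5km',                    // placeholder radius
        location: {lat: 59.33, lon: 18.07}  // unique per incoming request
      }
    }
  }
};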
Indexing and Client Setup
We reindex all of our documents every 20 minutes from a cronjob and then update the aliases. We have three indices: one with ~1k documents, one with ~60k documents, and one with around 60-70k documents.
Our client:
const {Client} = require('@elastic/elasticsearch');

const options = {
  maxRetries: 1,
  requestTimeout: 1500,
  agent: {
    timeout: 1500
  },
  cloud: {
    id: process.env.ELASTICSEARCH_CLOUD_ID
  },
  auth: {
    username: process.env.ELASTICSEARCH_USERNAME,
    password: process.env.ELASTICSEARCH_PASSWORD
  }
};

module.exports = new Client(options);
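For reference, this is roughly how the module above is used per request (a simplified sketch; the route, the places alias, and the location field name are placeholders, not our real names):

const express = require('express');
const es = require('./es-client'); // the client module exported above (path is a placeholder)

const app = express();

app.get('/nearby', async (req, res, next) => {
  try {
    // One search per incoming request, each with a unique lat/lon
    // (same query shape as sketched in the Scale section)
    const {body} = await es.search({
      index: 'places', // alias that the reindexing cronjob repoints
      body: {
        query: {
          geo_distance: {
            distance: '5km',
            location: {lat: Number(req.query.lat), lon: Number(req.query.lon)}
          }
        }
      }
    });
    res.json(body.hits.hits);
  } catch (err) {
    next(err);
  }
});

app.listen(3000);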
Index Settings
await es.indices.create({
  index: indexName,
  body: {
    settings: {
      'number_of_shards': 1,
      'number_of_replicas': 1,
      'refresh_interval': -1,
      'index.translog.sync_interval': '30s', // We've tried without this too
      'index.translog.durability': 'async', // We've tried without this too
      'analysis': analysis
    },
    mappings
  }
});
The reason for refresh_interval: -1 is that we refresh manually from the cronjob once reindexing is done.
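The cronjob itself does roughly this (a simplified sketch of the flow, not our exact job; the alias naming, the bulk call, and the wildcard removal are placeholders):

const es = require('./es-client'); // the client module above (path is a placeholder)

async function reindexAndSwap(alias, docs, settings, mappings) {
  const newIndex = `${alias}-${Date.now()}`;

  // 1. Create the new index (same settings block as above, refresh disabled)
  await es.indices.create({index: newIndex, body: {settings, mappings}});

  // 2. Bulk index every document into the new index
  const body = docs.flatMap((doc) => [{index: {_index: newIndex}}, doc]);
  await es.bulk({body});

  // 3. Refresh manually, once, after indexing is done
  //    (this is why refresh_interval is -1 on the index)
  await es.indices.refresh({index: newIndex});

  // 4. Repoint the alias in one atomic _aliases call and detach the old index
  //    (assumes an older `${alias}-...` index already holds the alias)
  await es.indices.updateAliases({
    body: {
      actions: [
        {remove: {index: `${alias}-*`, alias}},
        {add: {index: newIndex, alias}}
      ]
    }
  });
}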
Symptoms
The big drops in latency coincide with our Node.js redeploys.
What can cause this? Restarting Node.js over and over again doesn't really seem like a solution.