Hi everyone,
I'm almost a beginner in Elasticsearch and I need a little help.
I built a monitoring infrastructure with ELK for 60 nodes, with a single instance of the whole ELK stack (plus the monitoring plugin) on a single node, and it has been working for three months. Basically, my Logstash collects three different inputs:
- Input from Metricbeat, with 7 metricsets of the system module from about 60 nodes, at a 1-minute interval;
- Input from 15 exec inputs with simple Ruby filter parsing, executed every minute;
- Input from 1 exec input with simple Ruby filter parsing, executed every minute.
So, for each day, Elasticsearch has three different indices. For example, yesterday's indices are (data taken from the monitoring plugin):
- Index1-2017.08.15: document count 2.4m, data 917.7 MB, index rate 0/s, search rate 0/s, unassigned shards 5
- Index2-2017.08.15: document count 20.2k, data 299.7 MB, index rate 0/s, search rate 0/s, unassigned shards 5
- Index3-2017.08.15: document count 1.4k, data 52.1 MB, index rate 0/s, search rate 0/s, unassigned shards 5
Each day a crontab job deletes indices older than 8 weeks, in order to free space and shards (so there is a total of about 860 shards in Elasticsearch at any time).
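The selection step of that cleanup can be sketched like this (a hedged sketch with hypothetical index names and a hypothetical helper, just to show the date cutoff logic; the real crontab job may work differently):

```javascript
// Sketch: given daily index names like "index1-2017.08.15", return the
// ones older than 8 weeks, as the nightly cleanup job would select them.
// Names and the 8-week cutoff match the setup described above; the
// function itself is hypothetical.
function indicesToDelete(indexNames, today) {
  const EIGHT_WEEKS_MS = 8 * 7 * 24 * 60 * 60 * 1000;
  const cutoff = new Date(today.getTime() - EIGHT_WEEKS_MS);
  return indexNames.filter((name) => {
    // Parse the daily date suffix "YYYY.MM.DD" from the index name
    const m = name.match(/-(\d{4})\.(\d{2})\.(\d{2})$/);
    if (!m) return false; // skip names without a daily date suffix
    const indexDate = new Date(Date.UTC(+m[1], +m[2] - 1, +m[3]));
    return indexDate < cutoff;
  });
}
```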
Each index has a field called type, which lets me select its documents apart from the others. For instance, all documents in index1* have type: "type1", and so on.
I need to collect all data in a time range (for example, the last 5 minutes of data), and to select it by the type field in order to avoid collecting the wrong data. Since there are a lot of documents in 5 minutes, I use the scroll API, with for example a 20s scroll context and 100 documents per page.
I also sometimes need to run another module which calls this query several times over a given period. For example, to gather all of the past month's metrics, I need them divided into 5-minute intervals, so I call the following query repeatedly, in batches of 5 at a time, until I cover the whole month. There can't be more than 5 concurrent jobs, to avoid overloading Elasticsearch.
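The windowing and batching described above can be sketched as follows (hypothetical helper names; a sketch of the intent, not the module's actual code):

```javascript
// Sketch: split a time range into 5-minute windows, then group the
// windows into batches of at most 5, so no more than 5 queries run
// at a time. Function names are hypothetical.
function fiveMinuteWindows(startMs, endMs) {
  const STEP = 5 * 60 * 1000; // 5 minutes in milliseconds
  const windows = [];
  for (let t = startMs; t < endMs; t += STEP) {
    windows.push({ gte: t, lte: Math.min(t + STEP, endMs) });
  }
  return windows;
}

function batches(items, size) {
  const out = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}
```

Each batch of 5 windows would then be queried concurrently before moving on to the next batch.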
The query I built is (for example):
// I need to scroll to get all results
scroll: '20s',
size: 100,
// Sorting on _doc is the cheapest order, since I don't need any sorting
sort: ['_doc'],
body: {
  query: {
    bool: {
      filter: [
        // Filter documents by type
        { terms: { type: ['type1', 'type2', 'type3'] } },
        // Filter documents by arrival time
        { range: {
            '@timestamp': {
              gte: 'now-290s',
              lte: 'now'
            }
        }}
      ]
    }
  }
}
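For context, the scroll loop around this query looks roughly like the sketch below. The `search`, `scroll`, and `clearScroll` parameters are stand-ins for the Elasticsearch client calls (hypothetical wiring, not my exact code); the part I'm unsure about is whether I should be clearing each scroll context explicitly like this, so contexts don't pile up across the repeated 5-minute runs:

```javascript
// Sketch: drain all pages of a scrolled search, then free the scroll
// context. `search` runs the first request (the query shown above),
// `scroll` fetches the next page by scroll id, `clearScroll` releases
// the context. All three are hypothetical stand-ins for client calls.
async function drainScroll(search, scroll, clearScroll) {
  let res = await search(); // first page, opens the 20s scroll context
  const docs = [];
  while (res.hits.hits.length > 0) {
    docs.push(...res.hits.hits);
    res = await scroll(res._scroll_id); // next page within the context
  }
  await clearScroll(res._scroll_id); // free the context explicitly
  return docs;
}
```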
Now, I run this query with Node.js every 5 minutes, and I also use the other module, which runs this query over 5-minute intervals to collect last month's metrics. It works properly for the first few minutes, but after a while the Elasticsearch node stops working (Logstash doesn't deliver any events, Elasticsearch doesn't respond, and Kibana reports that the elasticsearch plugin is red).
I don't know if it is a problem with reindexing, a heavy query, Elasticsearch settings, etc.
Can someone help me?
Thank you