Elasticsearch Configuration
1 zone - 4 GB RAM - 120 GB storage
Logstash Configuration
4 vCPU - 16 GB RAM - t2.xlarge instance - Logstash 7.7.0
Workflow
- One tool gets data from different sources and creates CSV files in different folders.
- Logstash is configured to process those CSV files and send them to Elasticsearch.
- In 24 hours, we create about 5 GB of files in total, spread across many CSV files of around 1-2 MB each.
- Logstash uploads about 4000 entries per minute to Elasticsearch by processing those files. As soon as a new file is created, it is uploaded to Elasticsearch.
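For a rough sense of scale, the figures above can be turned into daily counts. This is only a back-of-the-envelope sketch in Python: it assumes 1 GB = 1024 MB and a constant rate of 4000 entries per minute, and the function names are made up for illustration.

```python
def files_per_day(total_mb, min_file_mb, max_file_mb):
    """Return (fewest, most) files needed to hold total_mb of data."""
    fewest = int(total_mb // max_file_mb)  # every file at the largest size
    most = int(total_mb // min_file_mb)    # every file at the smallest size
    return fewest, most


def entries_per_day(entries_per_min):
    """Entries uploaded in 24 hours at a constant per-minute rate."""
    return entries_per_min * 60 * 24


daily_mb = 5 * 1024  # about 5 GB per day (assumes 1 GB = 1024 MB)
print(files_per_day(daily_mb, 1, 2))  # (2560, 5120)
print(entries_per_day(4000))          # 5760000
```

So we produce roughly 2,500 to 5,000 files per day, and Logstash pushes about 5.76 million entries per day when it keeps up.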
Logstash conf file
input {
  file {
    path => "/home/ubuntu/output/**/*.csv"
    start_position => "beginning"
    sincedb_write_interval => 15
    mode => "read"
    file_completed_action => "delete"
  }
}
filter {
  csv {
    separator => "$"
    columns => ["url", "index", "hash", "all", "subject", "test1", "test12", "test13", "test14", "test5", "test56", "test57", "date1", "date2"]
  }
}
output {
  elasticsearch {
    hosts => ["https://hash.us-east-1.aws.found.io:9243"]
    user => "user"
    password => "pass"
    index => "testindex"
  }
}
logstash.yml (/etc/logstash/logstash.yml)
path.data: /var/lib/logstash
pipeline.ordered: auto
queue.type: memory
queue.type: persisted
queue.max_bytes: 8gb
path.logs: /var/log/logstash
cloud.id: "BB_user:pass"
cloud.auth: "user:pass"
Note: I have removed all commented lines from logstash.yml while posting here.
Problem we are facing
- Logstash works perfectly for a few hours, then starts lagging by a few minutes, and later the delay grows to a few hours.
- When we restart it, it works correctly again, i.e. it works perfectly for a few hours and then starts lagging again.
- Yesterday it stopped working completely. After we restarted it, it did not upload anything to Elasticsearch, even though 35k CSV files were pending.
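To put a number on the delay, the backlog can be measured directly on disk. Since the file input runs in read mode with file_completed_action set to delete, any CSV file still present under the output folder has presumably not been fully processed yet. The following Python sketch counts those files and reports the age of the oldest one. The function name csv_backlog is hypothetical, and it assumes the file modification time is close to the time the file was created.

```python
import time
from pathlib import Path


def csv_backlog(root, now=None):
    """Return (number of .csv files under root, age in seconds of the oldest)."""
    now = time.time() if now is None else now
    mtimes = []
    for path in Path(root).glob("**/*.csv"):
        try:
            if path.is_file():
                mtimes.append(path.stat().st_mtime)
        except FileNotFoundError:
            # Logstash may delete a file between listing and stat; skip it.
            continue
    if not mtimes:
        return 0, 0.0
    return len(mtimes), now - min(mtimes)


# Example call (path taken from the Logstash config above):
# count, oldest = csv_backlog("/home/ubuntu/output")
# print(f"{count} files pending, oldest is {oldest / 60:.1f} minutes old")
```

Running this every few minutes would show whether the backlog grows steadily or jumps at a particular point.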
Can you please take a look at this and help us with it?