Hi All,
My ELK setup is:
Logstash running in k8s, version 7.16.2
Elasticsearch running on VMs, a cluster of 4 data nodes and 2 coordinating nodes, all on version 7.16.2
Issue: Data is not saved into Elasticsearch, and there are no errors in Elasticsearch or Logstash.
The data we are sending is microservice request and response logs. For some time we have seen that some of the data is not saved in Elasticsearch. We send around 10k records per minute. The data arrives over TCP at one Logstash, which sends it to Kafka; another Logstash reads from Kafka and saves it to Elasticsearch. Our indices have only 1 replica, the cluster status is green, and there are no sharding issues.
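For context, the chain looks roughly like this; the port, topic name, and broker address below are placeholders, not our actual values:

# Pipeline 1: receive logs over TCP and forward them to Kafka
input {
  tcp {
    port  => 5000             # placeholder port
    codec => json_lines
  }
}
output {
  kafka {
    bootstrap_servers => "kafka:9092"     # placeholder broker
    topic_id          => "service-logs"   # placeholder topic
    codec             => json
  }
}

# Pipeline 2: read from Kafka and index into Elasticsearch
input {
  kafka {
    bootstrap_servers => "kafka:9092"
    topics            => ["service-logs"]
    codec             => json
  }
}
output {
  elasticsearch {
    hosts => ["${ELASTICSEARCH_1_URL}"]
    index => "test-data-usage"
  }
}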
We have only seen data go missing when it is sent to Elasticsearch. For testing, we ran a script that sends 100 events into Logstash, and Logstash saves them to Elasticsearch. This is our output config in the pipeline:
output {
  elasticsearch {
    hosts => ["${ELASTICSEARCH_1_URL}"]
    index => "test-data-usage"
  }
}
For testing we changed the output to a file, and we were able to get all the events into the file. Later on, we added a new pipeline to read the data from that file and send it to Elasticsearch; this also results in data loss. We only get around 25-30 of the 100 events that we send.
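Roughly, that file-based test setup looks like this; the file path is a placeholder:

# Test pipeline A: write every event to a file instead of Elasticsearch
output {
  file {
    path  => "/tmp/usage-events.json"   # placeholder path
    codec => json_lines
  }
}

# Test pipeline B: read the same file back and index it into Elasticsearch
input {
  file {
    path           => "/tmp/usage-events.json"
    start_position => "beginning"
    sincedb_path   => "/dev/null"       # re-read the file on every run
    codec          => json
  }
}
output {
  elasticsearch {
    hosts => ["${ELASTICSEARCH_1_URL}"]
    index => "test-data-usage"
  }
}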
Could you please share any info to help us debug this issue?
Here is some more testing we did:
- We started testing by sending 100 events into Logstash using Postman. There was no delay in the Logstash pipeline, and the batch size was 80. We were able to see the events flowing from the Logstash output into Elasticsearch. The data is saved into one index for that day, but only 30% of the data we sent was in the usage index.
- We ran the same test, but with a delay of 20 seconds and the batch size set to 125 in Logstash. After this run the usage index still only shows 30%.
- For the next test we used the same batch size but no delay, and in the pipeline we saved the documents with the trace ID as the document ID. After this we also only got 30% of the data in the usage index (a sketch of the document_id variants is after this list).
- We kept the same settings as above and added a second output to the pipeline to save the data into a file, so the pipeline sends the data to Elasticsearch and to the file at the same time (the dual output sketch is after this list). After this test the file had all the records, but the usage index had only 30% of them.
- We re-ran the above test case with only one output at a time: with the file output only, the file had all 100 records; when we changed to the Elasticsearch output only, only 30% were there.
- We updated the pipeline to save only into a file and created another pipeline to read that file. When we checked, the file had all the records, but the index only had 30%.
- After this we added a delay of 6 seconds before sending to Elasticsearch (see the sleep filter sketch after this list). After adding this we were able to increase the number of records in the usage index; there was around 20% more data.
- For the next test we added a new Elasticsearch coordinating node and updated the pipeline to send to both. For this test we used the pipeline with the file input to read the events written by the main pipeline. The usage index still only had 30% of the data.
- We re-ran all the earlier tests using multiple coordinators, but the result was the same: only around 30% of the records, except that with the delay in the pipeline we had around 50% of the data.
- For the next test we updated the pipeline to save each document in a different index, so the new index name is test-usage-document_id-current_date (see the per-document index sketch after this list). We are still using the file input to get the data; with this we were able to get all the data across the separate indices. We sent 100 events and a total of 100 indices were created.
- Next, we added the output as a file and read the events from it; this also resulted in only 30% of the events.
- We also tried saving the documents with the trace_id, the document_id, or a random UUID; this did not increase the number of events (the UUID variant is included in the document_id sketch after this list).
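For reference, the document_id variants mentioned above look roughly like this; traceid and doc_uuid stand for the field names in our events and are shown here only as examples:

# Variant 1: use the trace ID from the event as the Elasticsearch document ID
output {
  elasticsearch {
    hosts       => ["${ELASTICSEARCH_1_URL}"]
    index       => "test-data-usage"
    document_id => "%{traceid}"          # field name from our events
  }
}

# Variant 2: generate a random UUID per event and use that as the document ID
filter {
  uuid {
    target => "doc_uuid"                 # example target field
  }
}
output {
  elasticsearch {
    hosts       => ["${ELASTICSEARCH_1_URL}"]
    index       => "test-data-usage"
    document_id => "%{doc_uuid}"
  }
}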
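The dual output test and the delay test were set up roughly like this; whether the sleep filter pauses per event or per batch depends on its every option:

# Dual output: the same event goes to a file and to Elasticsearch at the same time
output {
  file {
    path  => "/tmp/usage-events.json"    # placeholder path
    codec => json_lines
  }
  elasticsearch {
    hosts => ["${ELASTICSEARCH_1_URL}"]
    index => "test-data-usage"
  }
}

# Delay test: pause events in the filter stage before they reach the outputs
filter {
  sleep {
    time => "6"    # seconds
  }
}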
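And the per-document index test used a sprintf-style index name along these lines; document_id stands for the ID field in our events, and its value has to be lowercase for the index name to be valid:

# Per-document index: every event is written to its own daily index
output {
  elasticsearch {
    hosts => ["${ELASTICSEARCH_1_URL}"]
    index => "test-usage-%{document_id}-%{+YYYY.MM.dd}"
  }
}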