ElasticSearch deletes documents in an index automatically

joanjanku2000 · November 7, 2022, 10:51am

I have configured an ELK cluster with 5 nodes, one being the master and the others slaves. I index logs into the cluster once a day using Logstash. I use a cron job (script) to copy the log files to the configured Logstash directory. I have also manually set a .sincedb path for Logstash.

However, a tricky thing happens. Almost every 3 days, the index seems to lose documents: everything prior to a certain date is deleted. I haven't configured any ILM policy, nor is there any script performing a delete by query or deleting the full index. When I call _cat/indices formatted to show the creation date of the index, I see that it was created almost 2 weeks ago. However, the documents that should cover those 2 weeks aren't there anymore, and today it only had documents from the last 3 days.

Does anyone know why this behaviour could be happening or what can trigger it?
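One thing that may help narrow this down is the docs.deleted column of _cat/indices. Elasticsearch counts deleted and overwritten documents there until segment merges reclaim them, so a non-zero value on an index that is only ever appended to hints at deletes or overwrites. Below is a minimal Python sketch that summarizes rows assumed to come from `GET _cat/indices?format=json&h=index,docs.count,docs.deleted,creation.date.string`. The function name and the sample numbers are made up for illustration; the index names are the ones from the pipeline posted further down the thread.

```python
from typing import Dict, List


def summarize_indices(rows: List[Dict[str, str]]) -> List[Dict[str, object]]:
    """Summarize rows of `_cat/indices?format=json` (values arrive as strings)."""
    summary = []
    for row in rows:
        # Closed indices can report null counts, so fall back to 0.
        live = int(row.get("docs.count") or 0)
        deleted = int(row.get("docs.deleted") or 0)
        summary.append({
            "index": row["index"],
            "created": row.get("creation.date.string"),
            "live_docs": live,
            "deleted_docs": deleted,
            # True means documents were deleted or overwritten at some point
            # and have not yet been merged away.
            "has_deletes": deleted > 0,
        })
    return summary


# Invented sample rows, shaped like the JSON output of the cat indices API.
sample_rows = [
    {"index": "logs_haproxy", "docs.count": "1200", "docs.deleted": "3400",
     "creation.date.string": "2022-10-24T09:00:00.000Z"},
    {"index": "digital_footprint_logs", "docs.count": "500", "docs.deleted": "0",
     "creation.date.string": "2022-10-24T09:00:00.000Z"},
]

for entry in summarize_indices(sample_rows):
    print(entry)
```

A zero in docs.deleted does not prove that nothing was deleted, since merges eventually drop those entries, so this is only a hint and not an audit trail.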

Christian_Dahlqvist · November 7, 2022, 11:25am

Elasticsearch does not have a master and slaves; the nodes form a cluster. Because of this, you should always aim to have 3 master-eligible nodes in the cluster, as a single master-eligible node is otherwise a single point of failure.

There is nothing in Elasticsearch that deletes data in indices on its own, and ILM deletes complete indices, not individual documents. It would help if you shared your Logstash pipeline, as an issue there could cause this behaviour, e.g. if you are incorrectly setting document IDs in your pipeline.
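To illustrate why document IDs matter here: indexing a document with an ID that already exists replaces the old document, so older data can appear to vanish without any delete request. The following is a toy Python model of that behaviour, not Elasticsearch code. The `ToyIndex` class and the "line number as ID" scheme are hypothetical and only show how a reused ID could make each day's file overwrite the previous day's documents.

```python
import uuid
from typing import Dict, Optional


class ToyIndex:
    """Toy stand-in for an index: maps a document ID to its source."""

    def __init__(self) -> None:
        self.docs: Dict[str, dict] = {}

    def index(self, source: dict, doc_id: Optional[str] = None) -> str:
        # No ID supplied: generate a fresh one, so every event adds a document.
        if doc_id is None:
            doc_id = uuid.uuid4().hex
        # ID already present: the previous document is silently replaced.
        self.docs[doc_id] = source
        return doc_id


auto_ids = ToyIndex()   # IDs generated per event
fixed_ids = ToyIndex()  # hypothetical pipeline that uses the line number as ID

daily_files = {
    "2022-11-05": ["request a", "request b"],
    "2022-11-06": ["request c", "request d"],
}

for day, lines in daily_files.items():
    for line_number, message in enumerate(lines):
        event = {"day": day, "message": message}
        auto_ids.index(event)
        fixed_ids.index(event, doc_id=str(line_number))

print(len(auto_ids.docs))                                   # 4
print(len(fixed_ids.docs))                                  # 2
print(sorted({d["day"] for d in fixed_ids.docs.values()}))  # ['2022-11-06']
```

With generated IDs all four events survive. With the reused IDs only the most recent day's documents remain, which would look very much like older documents being deleted.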

joanjanku2000 · November 7, 2022, 11:29am

input {
  file {
    path => "/home/somedir/haproxy/.log"
    type => "haproxy"
  }
  file {
    path => "/home/somedir/.log"
    type => "webmap"
  }
}

filter {
  if [type] == "haproxy" {
    grok {
      match => [
        "message", "%{NOTSPACE:month}%{SPACE:sp}%{NOTSPACE:day} %{NOTSPACE:time} %{NOTSPACE:smth} %{NOTSPACE:process_and_id} %{IP:client_ip}:%{NUMBER:client_port:int} \[%{NOTSPACE:haproxy_timestamp}\] %{NOTSPACE:frontend_name} %{NOTSPACE:backend_name}/%{NOTSPACE:server_name} %{NUMBER:TRequest:int}/%{NUMBER:TQueues:int}/%{NUMBER:TConnectionInServer:int}/%{NUMBER:TResponseFromServer:int}/%{NUMBER:TRequestActiveInHaproxy:int} %{NUMBER:status_code} %{NUMBER:bytes_read:int} %{NOTSPACE:captured_request_cookie} %{NOTSPACE:captured_response_cookie} %{NOTSPACE:termination_state} %{NOTSPACE:some_flags} %{NOTSPACE:srv_queue}/%{NOTSPACE:backend_queue} {%{HAPROXYCAPTUREDREQUESTHEADERS}} \"%{NOTSPACE:http_verb} %{NOTSPACE:http_host}?%{NOTSPACE:http_params} %{NOTSPACE:http_protocol}\"",
        "message", "%{NOTSPACE:month}%{SPACE:sp}%{NOTSPACE:day} %{NOTSPACE:time} %{NOTSPACE:smth} %{NOTSPACE:process_and_id} %{IP:client_ip}:%{NUMBER:client_port:int} \[%{NOTSPACE:haproxy_timestamp}\] %{NOTSPACE:frontend_name} %{NOTSPACE:backend_name}/%{NOTSPACE:server_name} %{NUMBER:TRequest:int}/%{NUMBER:TQueues:int}/%{NUMBER:TConnectionInServer:int}/%{NUMBER:TResponseFromServer:int}/%{NUMBER:TRequestActiveInHaproxy:int} %{NUMBER:status_code} %{NUMBER:bytes_read:int} %{NOTSPACE:captured_request_cookie} %{NOTSPACE:captured_response_cookie} %{NOTSPACE:termination_state} %{NOTSPACE:some_flags} %{NOTSPACE:srv_queue}/%{NOTSPACE:backend_queue} {%{HAPROXYCAPTUREDREQUESTHEADERS}} \"%{NOTSPACE:http_verb} %{NOTSPACE:http_host} %{NOTSPACE:http_protocol}\""
      ]
    }
    date {
      match => ["haproxy_timestamp", "dd/MMM/yyyy:HH:mm:ss.SSS"]
      target => "@timestamp"
    }
  } else if [type] == "webmap" {
    grok {
      match => { "message" => "(?<timestamp>%{DATE_EU} %{TIME}) %{WORD:timezoneContinent}/%{WORD:timezoneState} %{LOGLEVEL:loglevel} %{WORD:logername} - %{WORD:user}--%{DATA:action}--%{WORD:entity_name}--%{WORD:id}--%{WORD:methodName}" }
    }
    date {
      match => ["timestamp", "dd.MM.yyyy HH:mm:ss"]
      target => "@timestamp"
    }
  }
}

output {
  if [type] == "haproxy" {
    elasticsearch {
      hosts => ["192.168.X.XXX:9200"]
      index => "logs_haproxy"
    }
  } else if [type] == "webmap" {
    elasticsearch {
      hosts => ["192.168.X.XXX:9200"]
      index => "digital_footprint_logs"
    }
  }
}

Hi, yes, I am attaching the Logstash pipeline.

Could deleting files from the input directory cause documents to be deleted?

Christian_Dahlqvist · November 7, 2022, 11:29am

Please do not post screenshots of text - it can be very hard to read. Instead copy and paste and format using the tools available.

It seems like you are not setting any ID so that means the Logstash pipeline is likely not the problem. I can however see that you have not secured your cluster. That means someone could access and delete data without your knowledge. I would recommend securing it.

joanjanku2000 · November 7, 2022, 11:32am

Yes, I just edited the response above.

joanjanku2000 · November 7, 2022, 11:33am

Yes, I am in the process of setting up security in the test environment. Thanks for the suggestion.

joanjanku2000 · November 7, 2022, 11:34am

It's strange because it works perfectly fine in the test environment. This problem only occurs in prod.

joanjanku2000 · November 7, 2022, 11:35am

I also use runtime-mapped fields. If the index were being deleted and re-created, the runtime fields would have to be configured again through Painless scripts. That is only done by some bash scripts I have, and those are not in any crontab.

Christian_Dahlqvist · November 7, 2022, 11:35am

Which version of Elasticsearch are you using?

joanjanku2000 · November 7, 2022, 11:36am

That means the index itself is probably not being deleted; rather, specific documents are being deleted.

joanjanku2000 · November 7, 2022, 11:36am

It's 7.17

Christian_Dahlqvist · November 7, 2022, 11:36am

That means that some external process is likely deleting them.

joanjanku2000 · November 7, 2022, 11:37am

I also checked the logs in prod. I didn't find any trace of documents being deleted.

Christian_Dahlqvist · November 7, 2022, 11:39am

That would not be logged unless you have audit logging.

joanjanku2000 · November 7, 2022, 11:39am

I don't actually.

joanjanku2000 · November 7, 2022, 11:39am

I should probably turn audit logging on

joanjanku2000 · November 7, 2022, 11:40am

Would that tell me where the deletion is coming from, though?

Christian_Dahlqvist · November 7, 2022, 11:41am

It is not logged, so there is no way to tell. If you enable security, any script trying to delete without the correct credentials would fail.

joanjanku2000 · November 7, 2022, 11:45am

But it would log the fact that something is trying to delete a specific document, right?

Christian_Dahlqvist · November 7, 2022, 11:51am

No. Only audit logging would do that, and I believe that is a commercial feature. But if you secure the cluster and the deleting stops, you might be able to conclude that something external was deleting the documents.
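To judge whether the deleting really stops after securing the cluster, it may help to record per-day document counts and compare them over time. Here is a rough Python sketch, assuming the counts come from a `date_histogram` aggregation on `@timestamp` (the target field of the date filters in the pipeline above). The function names and the aggregation name `per_day` are made up for illustration, and the sample response is invented.

```python
from typing import Dict, Tuple


def daily_count_query(field: str = "@timestamp") -> dict:
    """Search body that returns one bucket per calendar day and no hits."""
    return {
        "size": 0,
        "aggs": {
            "per_day": {
                "date_histogram": {"field": field, "calendar_interval": "1d"}
            }
        },
    }


def counts_by_day(response: dict) -> Dict[str, int]:
    """Flatten the aggregation part of a search response into {day: count}."""
    buckets = response["aggregations"]["per_day"]["buckets"]
    return {b["key_as_string"]: b["doc_count"] for b in buckets}


def shrunken_days(before: Dict[str, int],
                  after: Dict[str, int]) -> Dict[str, Tuple[int, int]]:
    """Days whose count dropped (or vanished) between two snapshots."""
    return {
        day: (count, after.get(day, 0))
        for day, count in before.items()
        if after.get(day, 0) < count
    }


# Invented snapshots: yesterday's saved counts and today's search response.
yesterday = {"2022-11-01T00:00:00.000Z": 100, "2022-11-05T00:00:00.000Z": 80}
today_response = {
    "aggregations": {"per_day": {"buckets": [
        {"key_as_string": "2022-11-05T00:00:00.000Z", "doc_count": 80},
        {"key_as_string": "2022-11-07T00:00:00.000Z", "doc_count": 60},
    ]}}
}

print(shrunken_days(yesterday, counts_by_day(today_response)))
```

Since the indices here are only ever appended to, any day whose count shrinks between two snapshots points at documents being removed or overwritten in that window.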
