I have configured an ELK cluster with 5 nodes, one being master and the others slaves. I index logs into the cluster once a day using Logstash. A cron job (script) copies the log files to the directory Logstash is configured to read, and I have also manually set a sincedb path for Logstash.
However, something odd happens. Roughly every 3 days, the index seems to lose documents: everything prior to a certain date is deleted. I haven't configured any ILM policy, nor is there any script performing a delete by query or deleting the full index. When I call _cat/indices formatted to show the creation date of the index, I see that it was created almost 2 weeks ago. However, the documents from those 2 weeks aren't there anymore; even today it only had documents going back 3 days.
Does anyone know why this behaviour could be happening, or what could trigger it?
Elasticsearch does not have master and slaves; the nodes form a cluster. Because of this you should always aim to have 3 master-eligible nodes in the cluster, as a single master-eligible node is otherwise a single point of failure.
Nothing in Elasticsearch deletes documents from an index on its own, and ILM deletes complete indices. It would help if you shared your Logstash pipeline, as an issue there could cause this behaviour, e.g. if you are incorrectly setting document IDs in your pipeline.
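To illustrate the document ID point, here is a minimal sketch of the kind of misconfiguration meant. It is a hypothetical output block, not taken from any real pipeline; using `client_ip` as the ID is an assumption made purely for illustration. Because the default action of the `elasticsearch` output is `index`, an event whose ID already exists replaces the stored document, so older documents appear to vanish even though nothing explicitly deletes them.

```
output {
  elasticsearch {
    hosts => ["192.168.X.XXX:9200"]
    index => "logs_haproxy"
    # Hypothetical mistake: the ID is built from a field that is not unique
    # per event, so every new event from the same client overwrites the
    # previously indexed document with that ID.
    document_id => "%{client_ip}"
  }
}
```

Leaving `document_id` unset lets Elasticsearch generate a unique ID for every event, which rules out this kind of overwrite.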
input {
file {
path => "/home/somedir/haproxy/.log"
type => "haproxy"
}
file {
path => "/home/somedir/.log"
type => "webmap"
}
}
filter {
if [type] == "haproxy" {
grok {
match => ["message", "%{NOTSPACE:month}%{SPACE:sp}%{NOTSPACE:day} %{NOTSPACE:time} %{NOTSPACE:smth} %{NOTSPACE:process_and_id} %{IP:client_ip}:%{NUMBER:client_port:int} \[%{NOTSPACE:haproxy_timestamp}\] %{NOTSPACE:frontend_name} %{NOTSPACE:backend_name}/%{NOTSPACE:server_name} %{NUMBER:TRequest:int}/%{NUMBER:TQueues:int}/%{NUMBER:TConnectionInServer:int}/%{NUMBER:TResponseFromServer:int}/%{NUMBER:TRequestActiveInHaproxy:int} %{NUMBER:status_code} %{NUMBER:bytes_read:int} %{NOTSPACE:captured_request_cookie} %{NOTSPACE:captured_response_cookie} %{NOTSPACE:termination_state} %{NOTSPACE:some_flags} %{NOTSPACE:srv_queue}/%{NOTSPACE:backend_queue} {%{HAPROXYCAPTUREDREQUESTHEADERS}} \"%{NOTSPACE:http_verb} %{NOTSPACE:http_host}?%{NOTSPACE:http_params} %{NOTSPACE:http_protocol}\"",
"message", "%{NOTSPACE:month}%{SPACE:sp}%{NOTSPACE:day} %{NOTSPACE:time} %{NOTSPACE:smth} %{NOTSPACE:process_and_id} %{IP:client_ip}:%{NUMBER:client_port:int} \[%{NOTSPACE:haproxy_timestamp}\] %{NOTSPACE:frontend_name} %{NOTSPACE:backend_name}/%{NOTSPACE:server_name} %{NUMBER:TRequest:int}/%{NUMBER:TQueues:int}/%{NUMBER:TConnectionInServer:int}/%{NUMBER:TResponseFromServer:int}/%{NUMBER:TRequestActiveInHaproxy:int} %{NUMBER:status_code} %{NUMBER:bytes_read:int} %{NOTSPACE:captured_request_cookie} %{NOTSPACE:captured_response_cookie} %{NOTSPACE:termination_state} %{NOTSPACE:some_flags} %{NOTSPACE:srv_queue}/%{NOTSPACE:backend_queue} {%{HAPROXYCAPTUREDREQUESTHEADERS}} \"%{NOTSPACE:http_verb} %{NOTSPACE:http_host} %{NOTSPACE:http_protocol}\""
]
}
date {
match => ["haproxy_timestamp","dd/MMM/yyyy:HH:mm:ss.SSS"]
target => "@timestamp"
}
} else if [type] == "webmap" {
grok {
match => {"message" => "(?<timestamp>%{DATE_EU} %{TIME}) %{WORD:timezoneContinent}/%{WORD:timezoneState} %{LOGLEVEL:loglevel} %{WORD:logername} - %{WORD:user}--%{DATA:action}--%{WORD:entity_name}--%{WORD:id}--%{WORD:methodName}" }
}
date {
match => ["timestamp","dd.MM.yyyy HH:mm:ss"]
target => "@timestamp"
}
}
}
output {
if [type] == "haproxy" {
elasticsearch {
hosts => ["192.168.X.XXX:9200"]
index => "logs_haproxy"
}
}
else if [type] == "webmap" {
elasticsearch {
hosts => ["192.168.X.XXX:9200"]
index => "digital_footprint_logs"
}
}
}
Hi, yes, I am attaching the Logstash pipeline.
Could deleting files from the input directory cause documents to be deleted?
Please do not post screenshots of text - it can be very hard to read. Instead copy and paste and format using the tools available.
It seems like you are not setting any ID so that means the Logstash pipeline is likely not the problem. I can however see that you have not secured your cluster. That means someone could access and delete data without your knowledge. I would recommend securing it.
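As a rough sketch of what the Logstash side might look like once security is enabled on the cluster (`xpack.security.enabled: true` in `elasticsearch.yml`): the user name `logstash_writer` and the `ES_PWD` variable below are hypothetical placeholders, and the user would have to be created in Elasticsearch with write privileges on the target index.

```
output {
  if [type] == "haproxy" {
    elasticsearch {
      hosts    => ["192.168.X.XXX:9200"]
      index    => "logs_haproxy"
      # Hypothetical credentials: the user name is a placeholder and the
      # password is resolved from an environment variable or the Logstash
      # keystore entry named ES_PWD.
      user     => "logstash_writer"
      password => "${ES_PWD}"
    }
  }
}
```

With this in place, any other script or client that talks to the cluster without valid credentials is rejected, which is what makes it possible to tell whether an outside process was responsible for the deletions.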
Yes, just edited the response above
Yes, I am in the process of working on security in the test environment. Thanks for the suggestion.
It's strange because it works perfectly fine in the test environment. This problem only occurs in prod.
I also use runtime mapped fields. So if the index were being deleted and re-created, the runtime fields would also have to be configured again through Painless scripts, which can only be done by some bash scripts I have, and those are not in any crontab.
Which version of Elasticsearch are you using?
Meaning that the index is likely not being deleted; rather, specific documents are being deleted.
That means that some external process is likely deleting them.
I also checked the logs when I was in prod. I didn't find any trace of documents being deleted.
That would not be logged unless you have audit logging.
I should probably turn audit logging on.
Would that tell me where the deletion is coming from, though?
It is not logged, so there is no way to tell. If you enable security, any script trying to delete without the correct credentials would fail.
But it would log the fact that something is trying to delete a specific document, right?
No. Only audit logging would do that, and that I believe is a commercial feature. But if you secure the cluster and the deleting stops you might be able to conclude that something was deleting it.