Hi all,
I use Elasticsearch 6.2.2 in a really basic test environment (no cluster, single node).
I've just realized that every record is duplicated exactly twice. The two copies have different _id values, but all other fields are identical.
Here is a short example:
record 1
  @timestamp: March 22nd 2018, 17:17:20.700
  @version:   1
  _id:        HEGATmIBZYl-EB49OMRc
  _index:     logstash-2018.03.22
  message:    Connection lost after 0 seconds
record 2
  @timestamp: March 22nd 2018, 17:17:20.700
  @version:   1
  _id:        JUGATmIBZYl-EB49OsRI
  _index:     logstash-2018.03.22
  message:    Connection lost after 0 seconds
I ran Logstash in debug mode but didn't see any duplicate data.
I also tried adding a second Elasticsearch output in Logstash: the data is not duplicated in that one, but it is always duplicated in the first Elasticsearch.
This is my logstash.conf:
input {
  udp {
    host => "127.0.0.1"
    port => 10514
    codec => "json"
    type => "rsyslog"
  }
}

# This is an empty filter block. You can later add other filters here to further process
# your log lines
# https://gist.github.com/mesimeris/bf6cd912d11b674c4a2b
filter {
  if [sysloghost] == "192.168.0.1" {
    grok {
      patterns_dir => "/etc/logstash/grok/mikrotik.pattern"
      match => { "message" => "%{MIKROTIKFIREWALL}" }
    }
  }
}

# This output block will send all events of type "rsyslog" to Elasticsearch at the configured
# host and port into daily indices of the pattern, "rsyslog-YYYY.MM.DD"
output {
  if "_grokparsefailure" in [tags] {
    # write events that didn't match to a file
    file { path => "/tmp/grok_failures.txt" }
  } else {
    if [type] == "rsyslog" {
      elasticsearch { hosts => [ "127.0.0.1:9200" ] }
      elasticsearch { hosts => [ "192.168.0.122:9200" ] }
      stdout { codec => rubydebug }
    }
  }
}
Logstash concatenates config files it finds in its directory. Do you by any chance have any other config files that specify the same Elasticsearch output?
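To illustrate what that means in practice (a sketch only, the file names are made up): if your config directory contains, say, 10-rsyslog.conf and 20-other.conf, Logstash merges them into one single pipeline, so every event from every input passes through every output block unless the block is guarded by a conditional:

# 10-rsyslog.conf (hypothetical)
output {
  if [type] == "rsyslog" {                           # guarded: only rsyslog events land here
    elasticsearch { hosts => [ "127.0.0.1:9200" ] }
  }
}

# 20-other.conf (hypothetical)
output {
  elasticsearch { hosts => [ "127.0.0.1:9200" ] }    # unguarded: receives ALL events,
                                                     # so rsyslog events get indexed twice
}

With an unguarded output like the second one, each event is indexed once per block and gets a fresh auto-generated _id each time, which would match the symptom you describe.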
Yes, I have two other Logstash configs with the same output but different inputs and filters.
I added the second Elasticsearch output to the second Logstash config as well. The data is still not duplicated on the second Elasticsearch and still duplicated on the first.
input {
  # this is the actual live log file to monitor
  #file {
  #  path => ["/home/cowrie/cowrie-git/log/cowrie.json"]
  #  codec => json
  #  type => "cowrie"
  #}
  # this is to send old logs to for reprocessing
  #tcp {
  #  port => 3333
  #  type => "cowrie"
  #}
  beats {
    port => 5044
    type => "cowrie"
  }
}
filter {
  if [type] == "cowrie" {
    json {
      source => "message"
    }
    date {
      match => [ "timestamp", "ISO8601" ]
    }
    mutate {
      convert => { "dst_port" => "integer" }
      convert => { "src_port" => "integer" }
    }
    if [src_ip] {
      mutate {
        add_field => { "src_host" => "%{src_ip}" }
      }
      dns {
        reverse => [ "src_host" ]
        nameserver => [ "192.168.0.1" ]
        action => "replace"
        hit_cache_size => 4096
        hit_cache_ttl => 900
        failed_cache_size => 512
        failed_cache_ttl => 900
      }
      geoip {
        source => "src_ip"
        target => "geoip"
        #database => "/opt/logstash/vendor/geoip/GeoLite2-City.dat"
        database => "/usr/share/logstash/vendor/bundle/jruby/2.3.0/gems/logstash-filter-geoip-5.0.3-java/vendor/GeoLite2-City.mmdb"
      }
    }
    mutate {
      remove_tag => [ "beats_input_codec_plain_applied" ]
      # cut useless fields added by filebeat, if you use it of course
      remove_field => [ "source", "offset", "input_type" ]
    }
  }
}
output {
  if [type] == "cowrie" {
    elasticsearch {
      hosts => ["localhost:9200"]
    }
    elasticsearch { hosts => [ "192.168.0.122:9200" ] }
    #file {
    #  path => "/tmp/cowrie-logstash.log"
    #  codec => json
    #}
    #stdout {
    #  codec => rubydebug
    #}
  }
}
Thank you all for your support and tips.
I found the configuration error in my last Logstash conf file:
the "if [type]" line was commented out in the output block.
It's a silly mistake, but not an easy one to spot.
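For anyone hitting the same symptom, the mistake was roughly of this shape (a sketch, not the actual file; the type name here is made up). With the conditional commented out, this output receives every event from the merged pipeline, including the ones another config file already indexes, hence the exact duplicates with different _id values:

output {
  # if [type] == "sometype" {                        # <- commented out by mistake
    elasticsearch { hosts => [ "127.0.0.1:9200" ] }  # now catches ALL events
  # }
}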