Logstash duplicating log events?

Hi, I have a fairly simple Logstash setup:

  • Logstash runs on a web server and sends events to Redis.
  • Logstash on another server reads the events from Redis and outputs them to Elasticsearch (ES).

I occasionally, though not infrequently, see duplicated events in ES. The web server's log file contains only one copy of the event, but it shows up in ES twice, and each ES document has a different _id. I know the events are duplicates because many of them carry a unique ID in a field specific to the application doing the logging. Those IDs are not sufficient for ES _ids, so I don't use them as such.
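For context: when the elasticsearch output is not given a document_id, Elasticsearch auto-generates an _id for each document, so any event that reaches the indexer twice is stored as two documents with different _ids. One common mitigation (not something tried in this thread) is to derive a deterministic _id from the event content, so that a second delivery overwrites the first document instead of creating a new one. Below is a minimal sketch for the indexing host, assuming the raw log line in the message field is distinctive enough to identify an event; the key value, the [@metadata][fingerprint] field name, and the ES address are placeholders:

filter {
  # Hash the raw log line into a metadata field (@metadata is not indexed).
  fingerprint {
    source => "message"
    target => "[@metadata][fingerprint]"
    method => "SHA1"
    key => "any-fixed-string"   # placeholder HMAC key
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]   # placeholder ES address
    # Reuse the hash as _id so a re-delivered event overwrites instead of duplicating.
    document_id => "%{[@metadata][fingerprint]}"
  }
}

The trade-off is that two genuinely identical log lines (same client, same second, same request) would collapse into a single document. This also hides duplicates rather than explaining where they come from.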

The duplicates don't seem to be associated with restarts of Logstash, Redis, or ES. I do occasionally see one server or another having trouble connecting to my Redis instance, but those problems don't always coincide with the duplicates.

I am running Logstash 2.0 and ES 2.0. (I plan to upgrade, but I don't think that will help with this particular problem.)

Thanks,
John Ouellette

I was about to post another message about this, having forgotten that I'd posted this a couple of months ago. A little more info:
I am now running Logstash 2.1.1 with the following config:

On the host generating the log events:

output {
  redis {
    host => ["host1", "host2"]
    data_type => "list"
    key => "apache-raw"
  }
}

On the host doing the indexing:

input {
  redis {
    host => "host1"
    data_type => "list"
    key => 'apache-raw'
  }
}
input {
  redis {
    host => "host2"
    data_type => "list"
    key => 'apache-raw'
  }
}

In Elasticsearch (2.1.1) I am seeing duplicate events. The only explanation I can think of is that the redis output sends each event to every host listed in the host parameter. Admittedly, the docs don't say that it won't, but I recall reading that it sends all events to only one of the hosts and fails over to the other host only when it can't connect to the first.
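For reference, the redis output plugin documentation describes the host option this way: when host is an array, Logstash picks one host at random to connect to and switches to another only if that host is disconnected. Each event should therefore land in only one of the two lists (worth double-checking against the plugin version bundled with 2.1.1). The related shuffle_hosts option (default true) controls that random pick. The following is a sketch of the same shipper output with host1 preferred and host2 used only for failover:

output {
  redis {
    # Events go to one host at a time; the other is used only after a disconnect.
    host => ["host1", "host2"]
    # Keep the listed order instead of shuffling at startup, so host1 should be tried first.
    shuffle_hosts => false
    data_type => "list"
    key => "apache-raw"
  }
}

With either setting, the indexer still needs an input for each host, because after a failover some events will sit in host2's list until they are read.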

I'm no Redis expert, but if those two hosts are in a cluster, then you are getting duplicate events because of the two inputs. I don't know the recommended way to read from two hosts with the input. Take one input out and see whether the duplication stops.

Thanks, @Joe_Lawson. They aren't in a cluster, but I will try removing an input.
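Here is a variation on that test that keeps both inputs running: label each redis input with the host it reads from, then compare the label on the two documents of a duplicate pair. If the copies came from different Redis hosts, that points at the shipper writing to both lists. If they came from the same host, the duplication is probably happening somewhere other than the two-host setup. A sketch follows; the redis_source field name is a hypothetical choice, while add_field is the standard option available on every Logstash input:

input {
  redis {
    host => "host1"
    data_type => "list"
    key => "apache-raw"
    # Record which Redis server this event was read from.
    add_field => { "redis_source" => "host1" }
  }
  redis {
    host => "host2"
    data_type => "list"
    key => "apache-raw"
    add_field => { "redis_source" => "host2" }
  }
}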