Using logstash/elasticsearch for firewall syslog indexing: syslogs are lost on the way


(Antoine Brun) #1

Hello,

I'm really new to elasticsearch and I'm running an evaluation
(Elasticsearch vs. Solr) in order to decide which one I should use for
high-volume log indexing.
I have a very basic setup:
syslogInjector -> logstash listening on port 514 -> elasticsearch.

input {
  tcp {
    port => 514
    type => "syslog"
  }
  udp {
    port => 514
    type => "syslog"
  }
}
filter {
  grok {
    match => { "message" => "%{GREEDYDATA:syslog_message}" }
    add_field => [ "received_at", "%{@timestamp}" ]
    add_field => [ "received_from", "%{host}" ]
  }
}
output {
  elasticsearch { host => "localhost" }
}
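One way to narrow down where the loss happens is to add a second output that writes every event reaching the output stage to a local file, then compare the line count with the Elasticsearch document count. A minimal sketch (the file path is a placeholder, not from the original setup):

```
output {
  elasticsearch { host => "localhost" }
  # Every event that makes it through the pipeline is also appended
  # here, one line per event, so `wc -l /tmp/logstash-events.log`
  # tells you how many events logstash actually emitted.
  file { path => "/tmp/logstash-events.log" }
}
```

If the file has 10000 lines but the index has fewer documents, the loss is between logstash and Elasticsearch; if the file is already short, the loss is at the input or filter stage.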

My test is:
inject 10000 syslog messages of 1 KB each, at a rate of 2000/sec.

I did a tcpdump on the server that receives the logs and all 10000 logs are
received.

BUT, when I look at the number of documents indexed in Lucene, I almost
never get 10000 documents; sometimes I do, but most of the time I lose up to
40% of the logs.

Is there anything I can do to make sure that logstash doesn't lose data? (I
assume the problem is in logstash, but I'm not sure.)
Is logstash doing some buffering?
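At 2000 msg/sec over UDP, a likely culprit is the udp input dropping datagrams that arrive faster than the pipeline drains them. tcpdump still sees those packets, because it captures them before the application reads the socket, which would explain the symptoms above. A sketch, assuming the `queue_size` and `workers` options are available in the Logstash version in use:

```
input {
  udp {
    port => 514
    type => "syslog"
    # Assumption: these options exist in your Logstash version.
    queue_size => 10000   # packets buffered ahead of the filter stage
                          # (the default is much smaller)
    workers    => 4       # threads draining that queue
  }
}
```

It may also help to raise the kernel's UDP receive buffer limit on Linux (e.g. `sysctl -w net.core.rmem_max=16777216`), since the OS silently discards datagrams once the socket buffer is full.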

Thank you for your response

Antoine Brun

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/656cec27-895b-4ade-988b-054c4e79c325%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Antoine Brun) #2

Hello,

maybe my post wasn't specific enough...
Well, I'm still trying to validate logstash, and I'm still losing up to 40%
of the logs:

  • 10000 logs injected.
  • 10000 logs detected by tcpdump (listening on port 514)
  • 6000-10000 documents created in the elasticsearch index.

Is there a way to count, in logstash, the number of events that were
processed and forwarded to elasticsearch?
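One way to count events inside logstash is the `metrics` filter, which emits a periodic counter event that can be routed to stdout. A sketch, assuming the metrics filter is available in this version (the field names follow its documented meter output):

```
filter {
  metrics {
    meter => "events"
    add_tag => "metric"
  }
}
output {
  # Print the running event count on each periodic flush,
  # keeping the counter events out of elasticsearch.
  if "metric" in [tags] {
    stdout {
      codec => line { format => "events seen: %{[events][count]}" }
    }
  }
}
```

On the Elasticsearch side, `curl 'localhost:9200/_count?q=type:syslog'` returns the number of indexed documents, so the two figures can be compared after each run.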

Thanks

Antoine

On Wednesday, March 26, 2014 at 15:13:27 UTC+1, Antoine Brun wrote:




(system) #3