1.5.0 works from CLI, but not from Puppet elasticsearc/logstash module


(Chris Neal) #1

Hi all. Super strange problem here. I upgraded my Logstash version that is managed by the Puppet Forge module elasticsearch/logstash from 1.5.0 beta 1 to 1.5.0-1. The uninstall of the beta worked fine. The install of the new version went fine.

What I'm seeing is that when starting LS from the service definition that the Puppet module creates, it starts up, but NO events ever process. If I start LS with the same config file but from the CLI, then events process fine. I've been looking at it for hours now and cannot figure this one out. Both of the process strings are the same from a "ps -ef"...I'm just stumped.

When running LS as a service with --debug, I do see everything load successfully:

{:timestamp=>"2015-05-18T21:49:46.552000-0400", :message=>"config LogStash::Outputs::ElasticSearch/@retry_max_interval = 5", :level=>:debug, :file=>"logstash/config/mixin.rb", :line=>"112", :method=>"config_init"}
{:timestamp=>"2015-05-18T21:49:46.905000-0400", :message=>"Registering file input", :path=>["/var/log/httpd/kibana3_access_ssl.log"], :level=>:info, :file=>"logstash/inputs/file.rb", :line=>"76", :method=>"register"}
{:timestamp=>"2015-05-18T21:49:46.907000-0400", :message=>"Registering file input", :path=>["/var/log/httpd/kibana3_error_ssl.log"], :level=>:info, :file=>"logstash/inputs/file.rb", :line=>"76", :method=>"register"}
{:timestamp=>"2015-05-18T21:49:46.910000-0400", :message=>"_sincedb_open: /tmp/access: No such file or directory - /tmp/access", :level=>:debug, :file=>"filewatch/tail.rb", :line=>"213", :method=>"_sincedb_open"}
{:timestamp=>"2015-05-18T21:49:46.912000-0400", :message=>"_discover_file_glob: /var/log/httpd/kibana3_access_ssl.log: glob is: []", :level=>:debug, :file=>"filewatch/watch.rb", :line=>"132", :method=>"_discover_file"}
{:timestamp=>"2015-05-18T21:49:46.916000-0400", :message=>"_sincedb_open: /tmp/error: No such file or directory - /tmp/error", :level=>:debug, :file=>"filewatch/tail.rb", :line=>"213", :method=>"_sincedb_open"}
{:timestamp=>"2015-05-18T21:49:46.917000-0400", :message=>"Adding type with date config", :type=>"", :field=>"timestamp", :format=>"dd/MMM/yyyy:HH:mm:ss Z", :level=>:debug, :file=>"logstash/filters/date.rb", :line=>"174", :method=>"setupMatcher"}
{:timestamp=>"2015-05-18T21:49:48.672000-0400", :message=>"Registering kafka producer", :topic_id=>"app_kibana3_raw", :broker_list=>"bdprodk01.dbhotelcloud.com:9092,bdprodk02.dbhotelcloud.com:9092", :level=>:info, :file=>"logstash/outputs/kafka.rb", :line=>"144", :method=>"register"}
{:timestamp=>"2015-05-18T21:49:48.683000-0400", :message=>"Create client to elasticsearch server on bdprodes02.dbhotelcloud.com:", :level=>:info, :file=>"logstash/outputs/elasticsearch.rb", :line=>"340", :method=>"register"}
{:timestamp=>"2015-05-18T21:49:50.338000-0400", :message=>"Automatic template management enabled", :manage_template=>"true", :level=>:info, :file=>"logstash/outputs/elasticsearch.rb", :line=>"348", :method=>"register"}
{:timestamp=>"2015-05-18T21:49:50.753000-0400", :message=>"Using mapping template", :template=>{"template"=>"logstash-*", "settings"=>{"index.refresh_interval"=>"5s"}, "mappings"=>{"_default_"=>{"_all"=>{"enabled"=>true, "omit_norms"=>true}, "dynamic_templates"=>#<Java::JavaUtil::ArrayList:0x702e7fb5>, "properties"=>{"@version"=>{"type"=>"string", "index"=>"not_analyzed"}, "geoip"=>{"type"=>"object", "dynamic"=>true, "properties"=>{"location"=>{"type"=>"geo_point"}}}}}}}, :level=>:info, :file=>"logstash/outputs/elasticsearch.rb", :line=>"396", :method=>"get_template"}
{:timestamp=>"2015-05-18T21:49:50.821000-0400", :message=>"New Elasticsearch output", :cluster=>"elasticsearch-prod", :host=>["bdprodes02.dbhotelcloud.com"], :port=>"9300-9305", :embedded=>false, :protocol=>"transport", :level=>:info, :file=>"logstash/outputs/elasticsearch.rb", :line=>"357", :method=>"register"}
  {:timestamp=>"2015-05-18T21:49:50.831000-0400", :message=>"Pipeline started", :level=>:info, :file=>"logstash/pipeline.rb", :line=>"86", :method=>"run"}

Same output for starting it from the CLI, but the events from the Apache server actually get processed.

I'm very confused by this behavior to say the least.

Here is my logstash.conf file:

input {
  file {
    path => "/var/log/httpd/kibana3_access_ssl.log"
    type => "kibana3.access.bdprodm05"
    sincedb_path => "/tmp/access"
  }
  file {
    path => "/var/log/httpd/kibana3_error_ssl.log"
    type => "kibana3.error.bdprodm05"
    sincedb_path => "/tmp/error"
  }
}

filter {
  date {
    match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
  }

  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
    tag_on_failure => [ ]
  }

  geoip {
    source => "clientip"
    target => "geoip"
    add_field => { "[geoip][coordinates]" => "%{[geoip][longitude]}" }
    add_field => { "[geoip][coordinates]" => "%{[geoip][latitude]}"  }
  }

  # Fix types
  mutate {
    convert => { "[geoip][coordinates]" => "float"}
  }
}

output {

#  stdout { codec => rubydebug }
  kafka {
    batch_num_messages => 10
    broker_list => "mykafkahost1:9092,mykafkahost2:9092"
    client_id => "logstash"
    compression_codec => "gzip"
    producer_type => "async"
    topic_id => "app_kibana3_raw"
  }
  elasticsearch {
    bind_host => "myeshost1"
    cluster => "elasticsearch-prod"
    flush_size => 10
    host => "myeshost1"
    index => "app-kibana3-%{+YYYYMMdd}"
    document_type => "app_kibana3"
    node_name => "logstash-bdprodm05"
    protocol => "transport"
  }
} 

Does anyone have any idea what might be going on? Something I can try?
Thanks so much for your time.
Chris


(Chris Neal) #2

Sorry, I probably should have said I'm on Centos 6.5


(Spuder) #3

I've been encountering the same problem with logstash 1.5rc4 and kafka 0.8.2. I installed logstash from the chef cookbook.

For 2 days, logstash would not read any data from kafka when started as a service, but when I started it from the command line, the data would stream in correctly.

I rebooted my kafka servers, and the problem appears to have mostly gone away. I suspect it was a zookeeper related problem.

I can't reproduce it right now, but it smells like it is related to this:

Check your kafka logs. I had the following errors in kafka about the time that logstash was acting funny.

log4j, [2015-05-18T11:51:10.419] ERROR: kafka.consumer.ZookeeperConsumerConnector: [logstash_swat-logstash01-1431970927138-f1e70431], error during syncedRebalance
kafka.common.ConsumerRebalanceFailedException: logstash_swat-logstash01-1431970927138-f1e70431 can't rebalance after 4 retries
	at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.syncedRebalance(ZookeeperConsumerConnector.scala:633)
	at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anon$1.run(ZookeeperConsumerConnector.scala:551)
log4j, [2015-05-18T11:51:10.816] ERROR: kafka.consumer.ConsumerFetcherThread: [ConsumerFetcherThread-logstash_swat-logstash01-1431971470431-2d3a852c-0-1664195530], Current offset 135524246 for partition [ndstreamer,2] out of range; reset offset to 135944171

Also, check how many kafka consumers you have. You can only consume as many threads as you have kafka partitions. I set mine to 8, and give logstash 3 threads. That lets me connect 5 more consumers for debugging.


Logstash stops processing files after a while
(Chris Neal) #4

Thanks very much for your reply. :smile:

It does sound similar to what I'm seeing, however I'm producing to Kafka not consuming from Kafka. I see Apache logs stream by from a tail -F, but nothing gets picked up by LogStash....unless I run it from the command line.

I checked my Kafka logs, and I didn't have any "reset offset" messages. I never had LogStash crash on me either, just never process any events!

I did poke around in the github pages for the Puppet module, and the version compatibility table says the current version works with up to 1.4.2, so there could be something that needs updating in the module to make it compatible with versions >=1.5.0. I asked a similar question on the GitHub page as well, in hopes that the author might chime in and see if that might be the case.

Again, thanks for your time.
Chris


(Spuder) #5

Are you on CentOs?

This could be a problem with the permissions of the file. The init script runs as a logstash user who may not be able to read the file.

You may need to use setfacl to grant access.

Check if the puppet module is overwriting the init script.


(Chris Neal) #6

Again, thanks for the reply!

I am on CentOS, yes.

The Puppet module creates the init.d/logstash script itself, but it looks like it has been updated to properly set the $HOME:

start() {

  LS_JAVA_OPTS="${LS_JAVA_OPTS} -Djava.io.tmpdir=${LS_HOME}"
  HOME=${LS_HOME}
  export PATH HOME LS_HEAP_SIZE LS_JAVA_OPTS LS_USE_GC_LOGGING
<snip>

Sill looking... :smile:
Chris


(Spuder) #7

Actually, this is the bug I should have referred to.


(Chris Neal) #8

Thanks!

It's a little hard for me to tell from the long discussion whether or not I'm hitting that or not :smile:
It doesn't appear to have been committed and made part of a versioned module yet, if I'm reading the status correctly. Is there a quick hack/work-around that I could try to see if indeed this is my issue?

Many thanks!
Chris


(Spuder) #9

Try starting from the command line as the logstash user. You should experience the same symptoms as if you started with the init script.

sudo -u logstash -g logstash -H logstash /opt/logstash/bin/logstash -f foobar.conf

Or just try tailing a file as the logstash user

sudo -u logstash -g logstash -H logstash tail foobar.log

(Chris Neal) #10

Very nice!

I think we may have a winner here!

root@bdprodm05:[215]:/etc/init.d> sudo -u logstash -g logstash -H tail -f /var/log/httpd/kibana3_access_ssl.log
tail: cannot open `/var/log/httpd/kibana3_access_ssl.log' for reading: Permission denied

It looked like there was a "quick fix" suggestion to do the following (comment out this line of the init script):

#  nice -n ${LS_NICE} chroot --userspec $LS_USER:$LS_GROUP  /

Is that the current workaround until the fix is committed?
You have no idea how much I appreciate your time with this.
Chris


(Spuder) #11

I spent days trying to figure this out the first time I ran into this.

A better work around is to chmod or run setfacl the file so that the logstash user can read it. `
http://linux.die.net/man/1/setfacl

If that isn't feasible, then modify the script.
Be advised that if you comment out that line from the init script, then the logstash process will start as root. That means that /var/log/logstash*.log will be owned by root.

FYI this bug isn't present in ubuntu.


(Chris Neal) #12

Well....
This is embarrassing.
This got me to do some looking at permissions in general, and came to find out that the Apache logs directory was set to 744 root:root. WTH?
Changing that to a more appropriate 755 fixed the problem.

Sheesh.
Thanks for all your time and effort.
Chris


(system) #13