Logstash not parsing new files added to docker volume

I have created a docker-compose file for an ELK stack to parse logs.

[root@onw-kwah-2v ELK-compose]# cat docker-compose.yml
version: '3'
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:6.2.4
    container_name: elasticsearch
    environment:
      - "discovery.type=single-node"
      - "xpack.security.enabled=false"
      - "XPACK_MONITORING_ENABLED=false"
    volumes:
      - esdata:/usr/share/elasticsearch/data:rw
    ports:
      - 9200:9200
    networks:
      - elk
    restart: unless-stopped
  kibana:
    depends_on:
    - elasticsearch
    image: docker.elastic.co/kibana/kibana:6.2.4
    networks:
      - elk
    environment:
      - "xpack.security.enabled=false"
      - "XPACK_MONITORING_ENABLED=false"
    ports:
      - 5601:5601
    restart: unless-stopped
  logstash:
    image: ervikrant06/logstashbpimage:6.2.4
    depends_on:
      - elasticsearch
    networks:
      - elk
    environment:
      - "xpack.security.enabled=false"
      - "INPUT1=/var/tmp/log"
      - "XPACK_MONITORING_ENABLED=false"
    restart: unless-stopped
    volumes:
      - /tmp/logstash/:/var/tmp/log
volumes:
  esdata:
    driver: local
networks:
  elk:

I have used this Dockerfile to create my image. The purpose of the ENV is to take input from the user, depending on where the log files are located on the host system.

# cat Dockerfile
FROM docker.elastic.co/logstash/logstash:6.2.4
RUN rm -f /usr/share/logstash/pipeline/logstash.conf
ADD pipeline/ /usr/share/logstash/pipeline/
ENV INPUT1 ${variable:-/var/tmp/}
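
As a side note, Logstash config files can also supply a default directly in the ${VAR} reference, so the same "user overrides via environment" behaviour is available even without the Dockerfile ENV line. A minimal sketch reusing the INPUT1 name (the /var/tmp default here is just an example, not taken from the setup above):

input {
  file {
    # ${INPUT1:/var/tmp} falls back to /var/tmp when INPUT1 is unset
    path => "${INPUT1:/var/tmp}/syslog.log*"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}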

My pipeline file is:

[root@onw-kwah-2v logstash]# cat pipeline/bp-filter.conf
input {
  file {
    path => "${INPUT1}/syslog.log*"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
filter {
  grok {
    match => {"message" => ["%{TIMESTAMP_ISO8601:logdate} %{HOSTNAME:hostname} %{WORD:conatiner_name}: %{GREEDYDATA:[@metadata][messageline]}",
    "%{TIMESTAMP_ISO8601:logdate} %{HOSTNAME:hostname} %{WORD:container}\[%{INT:haprorxy_id}\]: %{GREEDYDATA:[@metadata][messageline]}"]}
  }
  if "_grokparsefailure" in [tags] {
    drop {}
  }
  mutate {
    remove_field => ["message", "@timestamp"]
  }
  json {
    source => "[@metadata][messageline]"
  }
  if "_jsonparsefailure" in [tags] {
    drop {}
  }
  date {
    match => ["logdate", "yyyy-MM-dd'T'HH:mm:ss.SSSSSSZ"]
  }
}
output {
  elasticsearch {
    hosts => ["elasticsearch:9200"]
    index => "logs-%{+yyyy-MM-dd}"
    document_type => "applicationlogs"
  }
}

All containers start successfully with docker-compose. Before bringing up the containers, the following files were added to the /tmp/logstash directory on the host machine. I was expecting all of them to be parsed by Logstash because of the file name pattern I used in the input.

[root@onw-kwah-2v logstash]# ls
syslog.log  syslog.log11thJune  syslog.log.gz

But it's only parsing the content of the syslog.log file, which contains messages from 8th June. Here is the list of indices present in ES. All log files contain similar kinds of log messages, so this is not a parsing issue.

# curl 172.19.0.2:9200/_cat/indices?pretty
green  open .monitoring-es-6-2018.06.21 Hy-F0j9LSoWqSOx8TZ8hrg 1 0 195 0 140.2kb 140.2kb
yellow open logs-2018-06-08             FdP1XDFyQJ-RClcjtoPlAw 5 1   6 0    83kb    83kb

Can anyone please help me to understand the following points:

  • Why is the monitoring index created in ES? I used the option to disable it in my compose file.
  • Why is Logstash not able to read all of the input log files?
  • If Kibana stores dashboards in the ES .kibana index, how can I save a dashboard so that users can access it after spinning up the ELK stack? I know a volume is used for persistent storage, but I am talking about a scenario in which this docker-compose setup is run on different machines, where storing the dashboard on a volume would not be an option.

Why is Logstash not able to read all of the input log files?

Things that I'd check (a debug sketch for the first two follows the list):

  • Is the grok filter failing, resulting in everything being dropped by the first drop filter?
  • Is there a problem with the JSON strings, resulting in everything being dropped by the second drop filter?
  • Is the .gz file somehow disturbing the file input?
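
One way to make the first two visible (a minimal debug sketch, not taken from the pipeline above): temporarily disable the two drop conditionals and print every event to the console so the failure tags can be seen, e.g.

# In the filter section, comment out the drops while debugging:
#   if "_grokparsefailure" in [tags] { drop {} }
#   if "_jsonparsefailure" in [tags] { drop {} }
output {
  # rubydebug prints each event, including its tags, to stdout
  stdout { codec => rubydebug }
}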

Thanks for providing the hints. Instead of using Docker, I tried running Logstash in the foreground as a process on my Mac machine. Here is what I am seeing on screen.

bin/logstash -f config/pipelines/bp-filter.conf
Sending Logstash's logs to /Users/viaggarw/Documents/ELK/logstash-6.2.4/logs which is now configured via log4j2.properties
[2018-06-21T21:24:35,515][INFO ][logstash.modules.scaffold] Initializing module {:module_name=>"netflow", :directory=>"/Users/viaggarw/Documents/ELK/logstash-6.2.4/modules/netflow/configuration"}
[2018-06-21T21:24:36,403][INFO ][logstash.modules.scaffold] Initializing module {:module_name=>"fb_apache", :directory=>"/Users/viaggarw/Documents/ELK/logstash-6.2.4/modules/fb_apache/configuration"}
[2018-06-21T21:24:47,156][WARN ][logstash.config.source.multilocal] Ignoring the 'pipelines.yml' file because modules or command line options are specified
[2018-06-21T21:25:07,098][INFO ][logstash.runner          ] Starting Logstash {"logstash.version"=>"6.2.4"}
[2018-06-21T21:25:14,160][INFO ][logstash.agent           ] Successfully started Logstash API endpoint {:port=>9600}
[2018-06-21T21:26:49,404][WARN ][logstash.outputs.elasticsearch] You are using a deprecated config setting "document_type" set in elasticsearch. Deprecated settings will continue to work, but are scheduled for removal from logstash in the future. Document types are being deprecated in Elasticsearch 6.0, and removed entirely in 7.0. You should avoid this feature If you have any questions about this, please visit the #logstash channel on freenode irc. {:name=>"document_type", :plugin=><LogStash::Outputs::ElasticSearch hosts=>[//127.0.0.1:9200], index=>"logs-%{+yyyy-MM-dd}", document_type=>"applicationlogs", id=>"7e9585c1de37bcdc062307f55d6729d231ec52fd6c399c380514d1787fdb315e", enable_metric=>true, codec=><LogStash::Codecs::Plain id=>"plain_3c9e6a2d-0fc1-45f2-881f-ff0826132842", enable_metric=>true, charset=>"UTF-8">, workers=>1, manage_template=>true, template_name=>"logstash", template_overwrite=>false, doc_as_upsert=>false, script_type=>"inline", script_lang=>"painless", script_var_name=>"event", scripted_upsert=>false, retry_initial_interval=>2, retry_max_interval=>64, retry_on_conflict=>1, action=>"index", ssl_certificate_verification=>true, sniffing=>false, sniffing_delay=>5, timeout=>60, pool_max=>1000, pool_max_per_route=>100, resurrect_delay=>5, validate_after_inactivity=>10000, http_compression=>false>}
[2018-06-21T21:26:51,459][INFO ][logstash.pipeline        ] Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>8, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>50}
[2018-06-21T21:27:02,865][INFO ][logstash.outputs.elasticsearch] Elasticsearch pool URLs updated {:changes=>{:removed=>[], :added=>[http://127.0.0.1:9200/]}}
[2018-06-21T21:27:03,150][INFO ][logstash.outputs.elasticsearch] Running health check to see if an Elasticsearch connection is working {:healthcheck_url=>http://127.0.0.1:9200/, :path=>"/"}
[2018-06-21T21:27:09,350][WARN ][logstash.outputs.elasticsearch] Restored connection to ES instance {:url=>"http://127.0.0.1:9200/"}
[2018-06-21T21:27:11,019][INFO ][logstash.outputs.elasticsearch] ES Output version determined {:es_version=>6}
[2018-06-21T21:27:11,137][WARN ][logstash.outputs.elasticsearch] Detected a 6.x and above cluster: the `type` event field won't be used to determine the document _type {:es_version=>6}
[2018-06-21T21:27:11,872][INFO ][logstash.outputs.elasticsearch] Using mapping template from {:path=>nil}
[2018-06-21T21:27:12,808][INFO ][logstash.outputs.elasticsearch] Attempting to install template {:manage_template=>{"template"=>"logstash-*", "version"=>60001, "settings"=>{"index.refresh_interval"=>"5s"}, "mappings"=>{"_default_"=>{"dynamic_templates"=>[{"message_field"=>{"path_match"=>"message", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false}}}, {"string_fields"=>{"match"=>"*", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false, "fields"=>{"keyword"=>{"type"=>"keyword", "ignore_above"=>256}}}}}], "properties"=>{"@timestamp"=>{"type"=>"date"}, "@version"=>{"type"=>"keyword"}, "geoip"=>{"dynamic"=>true, "properties"=>{"ip"=>{"type"=>"ip"}, "location"=>{"type"=>"geo_point"}, "latitude"=>{"type"=>"half_float"}, "longitude"=>{"type"=>"half_float"}}}}}}}}
[2018-06-21T21:27:14,927][INFO ][logstash.outputs.elasticsearch] New Elasticsearch output {:class=>"LogStash::Outputs::ElasticSearch", :hosts=>["//127.0.0.1:9200"]}
[2018-06-21T21:27:26,245][INFO ][logstash.pipeline        ] Pipeline started successfully {:pipeline_id=>"main", :thread=>"#<Thread:0x5315ece0 run>"}
[2018-06-21T21:27:26,874][INFO ][logstash.agent           ] Pipelines running {:count=>1, :pipelines=>["main"]}

The log files present at that location are:

$ ls
syslog.log_11thJune	syslog.log_8thJune

But still only one index is created in ES, and this time it's for 11th June; previously, while using Docker, it was created for 8th June.

$ curl localhost:9200/_cat/indices?pretty
yellow open logs-2018-06-11     C4RxEgDTTA-dSiHIPEyxmA 5 1     14 0 113.5kb 113.5kb

I am not able to understand what is happening. Here is my pipeline file.

$ cat config/pipelines/bp-filter.conf
input {
  file {
    path => "/var/tmp/logs/syslog*"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
filter {
  grok {
    match => {"message" => ["%{TIMESTAMP_ISO8601:logdate} %{WORD:hostname} %{WORD:container_name}: %{GREEDYDATA:[@metadata][messageline]}",
    "%{TIMESTAMP_ISO8601:logdate} %{WORD:hostname} %{WORD:container}\[%{INT:haprorxy_id}\]: %{GREEDYDATA:[@metadata][messageline]}"]}
  }
  if "_grokparsefailure" in [tags] {
    drop {}
  }
  mutate {
    remove_field => ["message", "@timestamp"]
  }
  json {
    source => "[@metadata][messageline]"
  }
  mutate {
    convert => {"pid" => "integer"}
  }
  if "_jsonparsefailure" in [tags] {
    drop {}
  }
  date {
    match => ["logdate", "yyyy-MM-dd'T'HH:mm:ss.SSSSSSZ"]
  }
}
output {
  elasticsearch {
    hosts => ["127.0.0.1:9200"]
    index => "logs-%{+yyyy-MM-dd}"
    document_type => "applicationlogs"
  }
}

So how do you know that the events aren't being dropped? Have you tried increasing Logstash's loglevel to get additional clues?

Thanks. I removed the if condition for _grokparsefailure, which helped me identify the issue. Basically, one hostname contained a '-' and the WORD pattern was not able to match it. Really appreciate your help.
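
For reference, a pattern that tolerates the dash would capture the hostname with HOSTNAME (as in the original Docker pipeline) or NOTSPACE instead of WORD; a sketch of the first match line only:

grok {
  # HOSTNAME matches names such as "my-host", which WORD does not
  match => {"message" => "%{TIMESTAMP_ISO8601:logdate} %{HOSTNAME:hostname} %{WORD:container_name}: %{GREEDYDATA:[@metadata][messageline]}"}
}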

Do you have any idea about the other two issues?

If Kibana stores dashboards in the ES .kibana index, how can I save a dashboard so that users can access it after spinning up the ELK stack? I know a volume is used for persistent storage, but I am talking about a scenario in which this docker-compose setup is run on different machines, where storing the dashboard on a volume would not be an option.

Why is the monitoring index created in ES? I used the option to disable it in my compose file.
