This data is visible in the Discover tab under the Index Pattern Cowrie-*, and the documents have _index: cowrie-logstash-2020.08.09-000001.
Important note: I can see a type field with the value Cowrie in the data under the filebeat index too.
Here are my doubts:
There is no Logstash configuration whose output is "filebeat", yet I am seeing documents indexed into filebeat at a rate suspiciously close to Cowrie's.
The ingestion rate is currently reduced because I have stopped the other hosts.
How do I:
Stop the duplication of data and write only to the cowrie-* index?
Is there a way to merge only the unique documents from filebeat into Cowrie?
I suspect something is wrong here too, since the Index Pattern for Cowrie has 560 fields but Filebeat has 6035 fields.
Until writing this post there was no Index Pattern for Filebeat, and I had never suspected data duplication. It was only because I was losing disk space that I went through these checks.
How are you running Logstash? What is the content of your pipelines.yml if you are running it as a service?
If you have 4 configurations, but your pipelines.yml points to a directory containing those files instead of defining one pipeline per configuration, Logstash will merge your files and you will have one big configuration with multiple inputs and outputs.
If this is the case, it doesn't matter that the inputs use different ports; the filter and output blocks will be applied to every event that enters the pipeline.
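For illustration only (the file and pipeline names here are made up), a pipelines.yml like this yields a single merged pipeline:

- pipeline.id: main
  path.config: "/etc/logstash/conf.d/*.conf"

while one entry per file keeps each configuration isolated:

- pipeline.id: pipeline-1
  path.config: "/etc/logstash/conf.d/1.conf"
- pipeline.id: pipeline-2
  path.config: "/etc/logstash/conf.d/2.conf"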
Share your logstash.yml and your pipelines.yml if possible.
Hello, thank you very much for replying. Please find the uncommented portions of the configuration files below; I'm giving selective portions for brevity. If you need the entire file, please let me know:
logstash.yml (uncommented section only)
# ------------ Data path ------------------
#
# Which directory should be used by logstash and its plugins
# for any persistent needs. Defaults to LOGSTASH_HOME/data
#
path.data: /var/lib/logstash
#
# ------------ Pipeline Settings --------------
#
# The ID of the pipeline.
#
# pipeline.id: main
#
# Set the number of workers that will, in parallel, execute the filters+outputs
# stage of the pipeline.
#
# This defaults to the number of the host's CPU cores.
#
# pipeline.workers: 2
#
# How many events to retrieve from inputs before sending to filters+workers
#
# pipeline.batch.size: 125
#
# How long to wait in milliseconds while polling for the next event
# before dispatching an undersized batch to filters+outputs
#
# pipeline.batch.delay: 50
#
# Force Logstash to exit during shutdown even if there are still inflight
# events in memory. By default, logstash will refuse to quit until all
# received events have been pushed to the outputs.
#
# WARNING: enabling this can lead to data loss during shutdown
#
# pipeline.unsafe_shutdown: false
#
# Set the pipeline event ordering. Options are "auto" (the default), "true" or "false".
# "auto" will automatically enable ordering if the 'pipeline.workers' setting
# is also set to '1'.
# "true" will enforce ordering on the pipeline and prevent logstash from starting
# if there are multiple workers.
# "false" will disable any extra processing necessary for preserving ordering.
#
pipeline.ordered: auto
# ------------ Debugging Settings --------------
#
# Options for log.level:
# * fatal
# * error
# * warn
# * info (default)
# * debug
# * trace
#
# log.level: info
path.logs: /var/log/logstash
pipelines.yml (entire file)
# This file is where you define your pipelines. You can define multiple.
# For more information on multiple pipelines, see the documentation:
# https://www.elastic.co/guide/en/logstash/current/multiple-pipelines.html
- pipeline.id: main
  path.config: "/etc/logstash/conf.d/*.conf"
This configuration could lead to data duplication:
- pipeline.id: main
  path.config: "/etc/logstash/conf.d/*.conf"
With this you do not have 4 different configurations, you have 1 pipeline called main composed of your 4 configuration files.
Consider this example where you have the files 1.conf and 2.conf in the directory /etc/logstash/conf.d/:
1.conf
input {
  beats {
    port => 5001
  }
}
filter {
  # some filters A
}
output {
  elasticsearch {
    hosts => ["http://es-hosts:9200"]
    index => "indexA"
  }
}
2.conf
input {
  beats {
    port => 5002
  }
}
filter {
  # some filters B
}
output {
  elasticsearch {
    hosts => ["http://es-hosts:9200"]
    index => "indexB"
  }
}
When you start logstash, it will merge the files and you will have one pipeline with the following configuration:
input {
  beats {
    port => 5001
  }
  beats {
    port => 5002
  }
}
filter {
  # some filters A
  # some filters B
}
output {
  elasticsearch {
    hosts => ["http://es-hosts:9200"]
    index => "indexA"
  }
  elasticsearch {
    hosts => ["http://es-hosts:9200"]
    index => "indexB"
  }
}
So, if you do not use conditionals in your filters and outputs, every event received from any of the inputs will pass through all filters and will be sent to every output.
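For example, a rough sketch of how conditionals could keep the merged streams apart (the tag and index names are illustrative, not taken from any configuration above):

input {
  beats {
    port => 5001
    tags => ["streamA"]    # illustrative tag so the origin of each event is known downstream
  }
  beats {
    port => 5002
    tags => ["streamB"]
  }
}
filter {
  if "streamA" in [tags] {
    # some filters A, applied only to events from port 5001
  }
}
output {
  if "streamA" in [tags] {
    elasticsearch {
      hosts => ["http://es-hosts:9200"]
      index => "indexA"
    }
  } else {
    elasticsearch {
      hosts => ["http://es-hosts:9200"]
      index => "indexB"
    }
  }
}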
If you want to completely separate your configuration files and avoid using conditionals, you need to change your pipelines.yml.
I will try this and come back. I added id => "honeypot_ingest" inside each of the configurations, but commented it out since after that cluster monitoring no longer gave details about the pipelines (the entire cluster is being monitored via Metricbeat). Should I use this id field in pipelines.yml to separate the configurations?
Hence my pipelines.yml would be:
# This file is where you define your pipelines. You can define multiple.
# For more information on multiple pipelines, see the documentation:
# https://www.elastic.co/guide/en/logstash/current/multiple-pipelines.html
- pipeline.id: honeypot_ingest
  path.config: "/etc/logstash/conf.d/cowrie.conf"
- pipeline.id: beats_ingest
  path.config: "/etc/logstash/conf.d/beats.conf"
- pipeline.id: packetbeat_ingest
  path.config: "/etc/logstash/conf.d/packetbeat.conf"
To separate your pipelines you just need to change your pipelines.yml; there is no id field in pipelines.yml, just pipeline.id.
The id setting you are talking about is the one used in inputs, filters and outputs to get metrics about each plugin, not to separate anything; there is no need to use it if you do not want to.
You can change pipeline-X to honeypot_ingest, for example; you choose the name of the pipeline. Only letters, numbers, - and _ are allowed, if I'm not wrong.
You can read more about running multiple pipelines in the documentation.
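For reference, a minimal sketch of where that plugin-level id would go if you did want per-plugin monitoring metrics (the value is just an illustrative label):

input {
  beats {
    port => 5001
    id => "cowrie_beats_input"    # only labels this plugin in the monitoring APIs
  }
}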