I'm using Filebeat to forward logs to Logstash, which creates multiple data streams in Elasticsearch. I noticed one index created with the name "logstash-2022.01.01-000001" that is storing some of the data from the different data streams. Why is that? Can anyone help me understand this behaviour?
I suggest posting your configurations (Filebeat + Logstash), some sample events, and the results; then perhaps someone can help answer the question.
Filebeat configuration:
filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /dir1/file1
  tags: ["file1"]
- type: log
  enabled: true
  paths:
    - /dir1/file2
  tags: ["file2"]
- type: log
  enabled: true
  paths:
    - /dir1/file3
  tags: ["file3"]
- type: filestream
  enabled: false
  paths:
  fields:
    level: debug
    review: 1

# ============================== Filebeat modules ==============================
filebeat.config.modules:
  # Glob pattern for configuration loading
  path: ${path.config}/modules.d/*.yml
  # Set to true to enable config reloading
  reload.enabled: false
  # Period on which files under path should be checked for changes
  reload.period: 5s

# ======================= Elasticsearch template setting =======================
setup.template.settings:
  index.number_of_shards: 12

# =================================== Kibana ===================================
# Starting with Beats version 6.0.0, the dashboards are loaded via the Kibana API.
# This requires a Kibana endpoint configuration.
setup.kibana:

# ================================== Outputs ===================================
output.logstash:
  # The Logstash hosts
  hosts: ["host1:5044", "host2:5044"]
  loadbalance: true
  worker: 1

# ================================= Processors =================================
processors:
  - add_host_metadata:
      when.not.contains.tags: forwarded
  - add_cloud_metadata: ~
  - add_docker_metadata: ~
  - add_kubernetes_metadata: ~

logging.level: info
logging.to_files: true
logging.files:
  path: "/var/log/filebeat"
  name: "filebeat"
  keepfiles: 7
  permissions: 0644
Logstash Config:
input {
  beats {
    type => "logs"
    port => "5044"
  }
}

filter {
  if "file1" in [tags] {
    grok {
      match => { "message" => ["%{IPORHOST:remote_ip} - %{DATA:user_name} \[%{HTTPDATE:access_time}\] \""] }
    }
    mutate {
      add_field => { "read_timestamp" => "%{@timestamp}" }
    }
    date {
      match => [ "timestamp", "dd/MMM/YYYY:H:m:s Z" ]
      remove_field => "timestamp"
    }
  }
  else if "file2" in [tags] {
    json {
      source => "message"
    }
    mutate {
      remove_field => ["ephemeral_id", "cloud", "agent"]
      add_field => { "read_timestamp" => "%{@timestamp}" }
    }
  }
  else if "file3" in [tags] {
    json {
      source => "message"
    }
    mutate {
      remove_field => ["ephemeral_id", "cloud", "agent"]
      add_field => { "read_timestamp" => "%{@timestamp}" }
    }
  }
}

output {
  stdout {
    codec => rubydebug
  }
  if "file1" in [tags] {
    elasticsearch {
      hosts => ["https://elasticsearch:9200"]
      user => "xxxxxx"
      password => "xxxxxx"
      data_stream => "true"
      data_stream_type => "logs"
      data_stream_dataset => "slm"
      data_stream_namespace => "file1"
      data_stream_auto_routing => "false"
    }
  }
  else if "file2" in [tags] {
    elasticsearch {
      hosts => ["https://elasticsearch:9200"]
      user => "xxxxxx"
      password => "xxxxxx"
      data_stream => "true"
      data_stream_type => "logs"
      data_stream_dataset => "prod"
      data_stream_namespace => "file2"
      data_stream_auto_routing => "false"
    }
  }
  else if "file3" in [tags] {
    elasticsearch {
      hosts => ["https://elasticsearch:9200"]
      user => "xxxxxx"
      password => "xxxxxx"
      data_stream => "true"
      data_stream_type => "logs"
      data_stream_dataset => "prod"
      data_stream_namespace => "file3"
      data_stream_auto_routing => "false"
    }
  }
In Elasticsearch:
Data streams created: logs-prod-file1, logs-prod-file2, logs-prod-file3
Additional index: logstash-{date}-0001 - this index contains some of the records from file1/2/3. Why is this happening?
My guess would be that you have more than one configuration file in the directory. By default Logstash will concatenate all files found into a single pipeline and all data will be processed by all filters and go to all outputs, so if any other file had a simple Elasticsearch output it would potentially index all events in addition to the outputs shown above.
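For illustration, a stray file like the following (hypothetical path and name) left in the same config directory would be silently merged into the pipeline; because the elasticsearch output's index option defaults to an ILM-managed "logstash" write alias, events it receives would land in a rollover index named exactly like the one you observed:

```
# /etc/logstash/conf.d/99-leftover.conf -- hypothetical leftover file
output {
  elasticsearch {
    hosts => ["https://elasticsearch:9200"]
    # No index/data_stream settings: with ILM enabled (the default),
    # events go via the "logstash" alias into logstash-{date}-000001, etc.
  }
}
```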
There is no configuration other than what is shown above. It's not every record, and it's not replication; it's random. Something like 5 out of 10,000 records from each of file1/file2/file3 end up in logstash-{date}-0001.
Is it continuously being indexed into?
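One quick way to check (a sketch via the Kibana Dev Tools console; the index pattern is an assumption based on the name you posted) is to look at the newest document in the stray index:

```
GET logstash-*/_search
{
  "size": 1,
  "sort": [ { "@timestamp": "desc" } ],
  "_source": [ "@timestamp", "tags" ]
}
```

If the top hit's @timestamp keeps advancing on repeated runs, something is still actively writing to that index.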
It looks like you have two Logstash instances. Have you verified none of them have any additional configuration files?
Do you see any unexpected connections to Elasticsearch that could indicate some other process is indexing into Elasticsearch?
Yes, both Logstash instances have the exact same configs and pipelines. There is no output other than the ones specific to file1/2/3. No other connections; everything has been verified on that front.
@stephenb Can you please take a look at the configs as requested?
Looks good to me.
I would isolate to a single logstash while debugging.
What does your pipelines.yml look like?
Also, I notice you are missing the last } in your Logstash pipeline; not sure if that is a cut-and-paste issue or if there is something below it.
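On the pipelines.yml point: the stock file globs an entire directory into one pipeline, so every matching file gets concatenated. A sketch of pinning it to a single known file instead (the paths here are assumptions; they vary by install):

```yaml
# pipelines.yml
- pipeline.id: main
  # The packaged default is a glob such as "/etc/logstash/conf.d/*.conf",
  # which merges every matching file into this one pipeline. Pointing at
  # one explicit file rules out stray configs being merged in.
  path.config: "/etc/logstash/conf.d/beats.conf"
```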
This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.